The goal of week 2 of #SoDS18 is to narrow down the plans made during the week 1 brainstorming sessions and work out how to accomplish them. I’ve decided to narrow it down to three main goals to be completed over the three months of #SoDS18. These goals are: (1) build an R Shiny app/dashboard; (2) complete a machine learning competition (and implement multiple models); and (3) learn about deep learning and implement a deep learning model. I've got my week 2 plan for #SoDS18 done!

Earlier this week, I decided to participate in the Summer of Data Science 2018 (#SoDS18)! Now that we are into June, I wanted to put up my first post about the Summer of Data Science and my plans. The first week is supposed to be spent brainstorming and looking up resources, so here are some of my ideas. Starting in week 2, I will be narrowing down my focus and making a plan for the rest of the summer.

On April 12th, 2018, Charlottesville and the Tom Tom Founders Festival played host to the second annual Applied Machine Learning Conference (AMLCville). I had the good fortune to attend AMLCville, listen to talks on a wide range of topics, and see the amazing ways that machine learning is being used out in the world. AMLCville was split into three different tracks with topic sessions on natural language processing, health care, computer vision, and geospatial analysis.

In December, I came across a guide to reproducible code in ecology and evolution published by the British Ecological Society. While my background is in physics research, I think the guide is incredibly useful for scientists of all disciplines. Reading through this guide also got me thinking about reproducibility in general and my own journey towards creating reproducible research. The British Ecological Society’s guide comes at a time when researchers across the various scientific disciplines are discussing the importance of reproducibility.

After seeing an announcement about it on R-Bloggers, I decided to test out the new xray package using the Titanic data set. The xray package provides a few functions for quickly getting a summary of anomalies and distributions of the variables in a data set. For anomalies, the anomalies() function outputs the number and percentage of NA’s, zeroes, empty strings, and infinities while also giving some useful information about distinct observations and variable class.
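To make the kind of summary described above concrete, here is a minimal base-R sketch that computes the same sorts of counts by hand. It does not call xray itself, and it is not meant to reproduce the package's actual output format. The toy data frame and the function names `summarize_column` and `summarize_anomalies` are hypothetical, made up for illustration.

```r
# Hypothetical toy data standing in for a few Titanic-style columns.
toy <- data.frame(
  age   = c(22, NA, 0, 35),
  fare  = c(7.25, 0, Inf, 8.05),
  cabin = c("", "C85", "", NA),
  stringsAsFactors = FALSE
)

# Count NAs, zeroes, empty strings, and infinities for a single column,
# plus the number of distinct values and the column class.
summarize_column <- function(x) {
  n       <- length(x)
  n_na    <- sum(is.na(x))
  n_zero  <- if (is.numeric(x)) sum(x == 0, na.rm = TRUE) else 0
  n_blank <- if (is.character(x)) sum(x == "", na.rm = TRUE) else 0
  n_inf   <- if (is.numeric(x)) sum(is.infinite(x)) else 0
  data.frame(
    n_na       = n_na,    pct_na    = 100 * n_na / n,
    n_zero     = n_zero,  pct_zero  = 100 * n_zero / n,
    n_blank    = n_blank, pct_blank = 100 * n_blank / n,
    n_inf      = n_inf,   pct_inf   = 100 * n_inf / n,
    n_distinct = length(unique(x)),  # NA counts as its own distinct value here
    class      = class(x)[1],
    stringsAsFactors = FALSE
  )
}

# Apply the per-column summary to every column and stack the results.
summarize_anomalies <- function(df) {
  out <- do.call(rbind, lapply(df, summarize_column))
  out <- data.frame(variable = names(df), out, stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

print(summarize_anomalies(toy))
```

Each row of the result describes one variable, which makes it easy to spot columns that are mostly missing or mostly blank before doing any modeling.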