Medium Steps

Data Science

March 2018

"Baby steps" are how we get from A to B. We do the hard work of learning the details, spending hours on hiccups and chasing rabbits down holes. But I find it difficult to really post about baby steps. You all don't need to know about every little aspect of my data science education, and I don't have time to write about them. Reporting falls prey to the law of diminishing returns.

But the pressure to avoid reporting baby steps can overshoot the mark, leading to a desire to post only polished, new material. If I only posted finished products, I wouldn't be able to write anything for weeks on end, and you, dear reader, would get no sense of the process. Also, I need to get over my fear of appearing too raw or unpolished. Thus, I'm going to try to be better about posting medium steps. Here are the medium steps I've taken lately:

  1. I discovered that GitHub hosts websites, which, of course, display Jupyter Notebooks beautifully and with ease. I spent an entire day trying to figure out a way to display my analysis of the 2017 Kaggle Survey here on WordPress [note from the future: not on WordPress anymore], only to have to resort to an awful scrolling embedded window. So, in retrospect, I should probably have built my entire site on GitHub Pages. Oh well. I've decided to marry the two for now. I'll continue to use this as my blog and primary writing outlet, but I'll host my portfolio projects on my GitHub site, which I hope to build soon.
  2. I've started working on a start-to-finish project concerning Florida high school performance! Start with what you know, right? I've downloaded a lot of raw data from the Florida DOE. I'm currently cleaning it up and getting it ready to visualize and run through some predictive modeling. I haven't decided on my fundamental questions yet, outside of "What factors contribute to failing schools, and how much?" Hopefully I'll get some time in the coming week to really jump into this.
  3. I'm three weeks into Andrew Ng's well-known Machine Learning course on Coursera. I was privileged enough to get to chat with Hugo Bowne-Anderson from DataCamp the other week, and he suggested it as a resource. I'm really enjoying it so far. I'm glad that Ng gets into the weeds a bit with the math. Having taken four semesters of calculus in undergrad, I feel confident that I can do the math, but I haven't had a great opportunity yet to dust off those skills.
  4. I finally finished Nate Silver's book The Signal and the Noise. I really enjoyed it and learned a lot, but I feel that it was a bit of a slog. He could have applied his general thesis/warning to fewer fields and still have written a great book. But I'm glad to have read it. I am going to try to dive deeper into some more data-science-specific material next (see both my progress page and my current reading page [these pages no longer exist] for more!).
  5. I've got several posts coming up explaining some data science basics: one on conditional probability, a follow-up on Bayesian analysis, and a third on gradient descent (I'm looking forward to building the visuals and mechanics behind this one!). So stay tuned!