Friday, March 14, 2014

Why Scientists Make Better Data Scientists

Have a look at this blog post by Mike Walker on why it's useful for data scientists to have a scientific background. The link came to me in a list of "featured articles" I receive weekly from Data Science Central. The tl;dr is that analysts without scientific training (the author singles out those with undergraduate business degrees) lack the tools for distinguishing correlation from causation. This leads to a range of maladies, including spurious correlations, cherry-picking data, and stringing "disconnected facts" together to construct a fallacious narrative. Walker acknowledges that not all successful analysis requires starting out with a hypothesis, but stresses that there are scientifically rigorous ways to explore data for unexpected relationships, such as A/B testing.
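
To make the spurious-correlation point concrete, here is a minimal sketch (an illustration only, not taken from Walker's post). It assumes a made-up setup: one random "target" series is compared against many unrelated random series, and the strongest correlation found is reported. The function names, sample sizes, and seed are all hypothetical choices for the example.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_spurious_correlation(n_obs=20, n_candidates=200, seed=0):
    """Correlate one random target against many unrelated random series
    and return the largest absolute correlation found.

    Every series is independent noise, so any strong correlation that
    turns up is spurious by construction.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    # Searching more candidates can only raise the best match found.
    for k in (1, 10, 200):
        r = strongest_spurious_correlation(n_candidates=k)
        print(f"{k:>3} candidates: strongest |r| = {r:.2f}")
```

The more unrelated series you search through, the stronger the best "relationship" looks, even though none of them has anything to do with the target. Picking that one winner and reporting it is the cherry-picking failure described above; a controlled comparison such as an A/B test avoids it by fixing the question before looking at the data.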

I find this refreshing, after spending a great deal of time lately looking at job ads for data scientists: most ads focus on experience with specific software packages, rather than experience conducting rigorous research. I suppose the former is more of an objective measure than the latter, but I'm not sure how useful it is to hire based on what applications a person has used before, especially in a profession where the state of the art changes rapidly. Another problem is that people have started slapping the title "data scientist" on a wide variety of jobs: I've seen it applied frequently to database architect positions, or even to positions that have more to do with software development than data analysis.

At the moment, all of this matters to me because the contract on which I was working ended last December, and I'm now looking for a job again. I've had two good interviews, but I'm finding it very hard to break into a profession with a background different from that of traditional data analysts and business analysts. One thing I have learned is the power of networking: one of my interviews came from a contact my wife made while carpooling, and the other resulted from my submitting a resume to a small-business group recommended by a former co-worker. (Oh, and if anyone has any good job leads, I'm happy to network here, too. :)

Those of you who frequently visit my links page might have noticed that I've updated it quite a bit over the past few weeks, particularly in the sections covering online courses ("Self-teaching Resources" and "Formal Learning Resources"). Coursera and Udacity have some interesting new offerings that you might want to check out. I'm also planning to add a section listing portals and other commercial websites, and I need to go through all the links to make sure the information on them is up to date. As ever, if you have any suggestions for additional resources, please let me know!
