Thursday, May 8, 2014

Why Data Science Needs Statistics

If you've read my earlier posts about why a scientific approach is important to data science, you won't be surprised that I recommend Jeff Leek's recent post on the Simply Statistics blog, "Why Big Data Is in Trouble: They Forgot about Applied Statistics". Leek, a biostatistics professor at Johns Hopkins and one of the instructors in Coursera's Data Science specialization, argues that several recent big data failures, including Google Flu Trends, can be chalked up to a lack of statistical knowledge among the researchers involved. He cites sampling, data collection, causal logic, model specification, and sensitivity analysis as areas where a solid grounding in applied statistics could have prevented serious errors. It's a short but cogent read.
