The Aspirational Data Scientist: Online Course Review: Coursera's Machine Learning, Part 2

Back in October, I reviewed Coursera's Machine Learning course, taught by Stanford professor and Coursera co-founder Andrew Ng. As I mentioned when I first reviewed the course, I wasn't able to finish it, because I was starting a new job and moving halfway across the country. I've just been able to complete the most recent iteration of Machine Learning (which was nearly identical to earlier versions), and I'd like to add a few more thoughts to my original review.

My last time through the course (the session that began on April 22nd, 2013), I completed almost all of the lessons on supervised machine learning methods, such as regression, logistic regression, and neural networks. In this session (which began on March 3rd, 2014), I repeated those lessons, and also finished the rest, most of which covered unsupervised techniques, such as clustering and recommender systems. I won't repeat the contents of my earlier review, except to note that what I said then remains true: Andrew Ng is a clear and charismatic lecturer, he covers advanced techniques, and he provides a number of practical tips, but the programming exercises are a bit canned, and may not fully prepare students to write their own scripts in Octave.

My new comments mostly reflect comparisons to other MOOCs, particularly the two courses from Coursera's Data Science specialization that I took recently. First of all, I think that Machine Learning could do more with the online format. Admittedly, most MOOCs consist largely of video-recorded lectures with a sprinkling of interactive content, but Machine Learning falls short even by that standard. The class does feature a very effective automatic grader, but it lacks links to additional resources and, more importantly, notes or slides from the lectures. While the latter omission may seem trivial (I didn't notice it the first time I took the course), a lack of lecture notes makes it difficult to go back later and review material from a lecture, except by watching the whole thing again. It's true that the programming exercises include detailed instructions, but not all of the course's topics are covered by these exercises, and at any rate the organization of the instructions can make it difficult to locate information on a specific subject.

I might also amplify my comment from the earlier review that the programming exercises involve mostly copying and pasting, rather than writing entire scripts. There's a reason for this: the focus of the course is on algorithms, not on other parts of solving machine learning problems. Nonetheless, my experiences taking other courses, especially those from the Data Science specialization, have demonstrated the practical value of forcing students to think about the nuts and bolts of a research project. Machine Learning's lack of a big final project also arguably deprives students of valuable practical experience, especially since these projects usually require students to explore the course material in greater depth than short exercises do. On the other hand, a final project can cover only a single topic from the course (or at most a handful of them), which calls the value of such projects into question.

My final concern is that Machine Learning seems to have gone on autopilot at this point, with little or no attention from Ng or anyone else who helped him prepare the course materials. Questions in the discussion forum are answered instead by "Community TAs", that is, volunteers who took earlier sessions of the course. Most disturbingly, the majority of reports of errors in the course materials go unanswered, and those that do get a response get it from Community TAs, who lack the ability to fix the errors. For example, a month ago I discovered that the automatic grader accepted one version of my code and rejected another, even though the two versions were algebraically equivalent. My report of this apparent bug still hasn't been answered.
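One possible cause of a discrepancy like this is floating-point rounding. This is only a guess: nothing here confirms how the grader actually compares answers. The short Python sketch below (Python rather than the course's Octave, with hypothetical function names) shows how two algebraically equivalent expressions can fail an exact equality check while passing a tolerance-based one.

```python
import math

# Two algebraically equivalent ways to add three numbers.
# These helpers are hypothetical and only illustrate the idea;
# they are not taken from the course's exercises or its grader.
def sum_left_to_right(a, b, c):
    return (a + b) + c

def sum_right_to_left(a, b, c):
    return a + (b + c)

left = sum_left_to_right(0.1, 0.2, 0.3)
right = sum_right_to_left(0.1, 0.2, 0.3)

# Exact comparison: rounding in the intermediate sums makes the
# two results differ in the last bit.
exact_match = left == right

# Tolerance-based comparison: treats the two results as equal.
tolerant_match = math.isclose(left, right, rel_tol=1e-9)

print(exact_match, tolerant_match)
```

If a grader compared submitted values with exact equality, or with a very tight tolerance, a harmless reordering of operations could in principle be enough to turn an accepted answer into a rejected one.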

Despite these concerns, I still heartily recommend Machine Learning as a valuable starting point for anyone interested in data science. While the course was offered twice in 2013, the start date of the next iteration, on June 16th, 2014, suggests that Coursera may be planning to offer sessions of the 10-week course almost back-to-back, meaning several sessions each year.

What's next for me? I'll soon be posting a review of Udacity's short Intro to Hadoop and MapReduce. After that, I'm considering taking two more courses from the Data Science specialization: first Exploratory Data Analysis, which will give me some practical experience with graphics programming in R, and then Practical Machine Learning, which will provide experience using R for machine learning, as well as a basis for comparison with the machine learning course reviewed above (though the Data Science specialization's course, at four weeks, is much shorter, and can't possibly cover the same ground).

In the meantime, while I'm still looking for work as a data scientist, I've had a number of interviews, and some of the potential employers have read and commented positively on this blog. I hope that provides an example for other social scientists out there that, yes, you can become a data scientist.

Thursday, May 29, 2014
