Thursday, May 29, 2014

Online Course Review: Coursera's Machine Learning, Part 2

Back in October, I reviewed Coursera's Machine Learning course, taught by Stanford professor and Coursera co-founder Andrew Ng. As I mentioned when I first reviewed the course, I wasn't able to finish it, because I was starting a new job and moving halfway across the country. I've just been able to complete the most recent iteration of Machine Learning (which was nearly identical to earlier versions), and I'd like to add a few more thoughts to my original review.

My last time through the course (the session that began on April 22nd, 2013), I completed almost all of the lessons on supervised machine learning methods, such as regression, logistic regression, and neural networks. In this session (which began on March 3rd, 2014), I repeated those lessons, and also finished the rest, most of which covered unsupervised techniques, such as clustering and recommender systems. I won't repeat the contents of my earlier review, except to note that what I said then remains true: Andrew Ng is a clear and charismatic lecturer, he covers advanced techniques, and he provides a number of practical tips, but the programming exercises are a bit canned, and may not fully prepare students to write their own scripts in Octave.

My new comments mostly reflect comparisons to other MOOC's, particularly the two courses from Coursera's Data Science specialization that I took recently. First of all, I think that Machine Learning could do more with the online format. In fact, most MOOC's consist largely of video-recorded lectures, with the addition of a sprinkling of interactive content, but Machine Learning falls short even by comparison with other online courses. The class does feature a very effective automatic grader, but it lacks any links to additional resources, or, very importantly, notes or slides from the lectures. While the latter omission may seem trivial (I didn't notice it the first time I took the course), a lack of lecture notes makes it difficult to go back later and review material from a lecture, except by watching the whole thing again. It's true that the programming exercises include detailed instructions, but not all of the course's topics are covered by these exercises, and at any rate the organization of the instructions can make it difficult to locate information on a specific subject.

I might also amplify my comment from the earlier review that the programming exercises involve mostly copying and pasting, rather than writing entire scripts. There's a reason for this: the focus of the course is on algorithms, not on other parts of solving machine learning problems. Nonetheless, my experiences taking other courses, especially those from the Data Science specialization, have demonstrated the practical value of forcing students to think about the nuts and bolts of a research project. Machine Learning's lack of a big final project also arguably deprives students of valuable practical experience, especially since these projects usually require students to explore the course material in greater depth than do short exercises; on the other hand, the fact that a final project can only cover a single topic from the course—or at most a handful of them—calls the value of such projects into question.

My final concern is that Machine Learning seems to have gone on autopilot at this point, with little or no attention from Ng or anyone else who helped him prepare the course materials. Questions in the discussion forum are answered instead by "Community TA's", that is, volunteers who took earlier sessions of the course. Most disturbingly, the majority of reports of errors in the course materials go unanswered, and those that are answered are answered by Community TA's, who lack the ability to fix the errors. For example, a month ago I discovered that the automatic grader accepted one version of my code and rejected another, even though the two versions were algebraically equivalent. My report of this apparent bug still hasn't been answered.

Despite these concerns, I still heartily recommend Machine Learning as a valuable starting point for anyone interested in data science. While the course was offered twice in 2013, the start date of the next iteration, on June 16th, 2014, suggests that Coursera may be planning to offer sessions of the 10-week course almost back-to-back, meaning several sessions each year.

What's next for me? I'll soon be posting a review of Udacity's short Intro to Hadoop and MapReduce. After that, I'm considering taking two more courses from the Data Science specialization, first Exploratory Data Analysis, which will give me some practical experience with graphics programming in R, and then Practical Machine Learning, which will provide experience using R for machine learning, as well as a basis for comparison with the machine learning course reviewed above (though the course for the Data Science specialization, at four weeks, is much shorter, and can't possibly cover the same ground).

In the meantime, while I'm still looking for work as a data scientist, I've had a number of interviews, and some of the potential employers have read and commented positively on this blog. I hope that provides an example for other social scientists out there that, yes, you can become a data scientist.

Thursday, May 8, 2014

Online Course Reviews: The Data Scientist's Toolbox, and Getting and Cleaning Data, from Coursera's Data Science Specialization

I recently completed Coursera's The Data Scientist's Toolbox and Getting and Cleaning Data, two courses that form part of the online learning provider's new Data Science specialization, taught by Brian Caffo, Jeffrey Leek, and Roger D. Peng, biostatistics professors at Johns Hopkins University, and, in the cases of Leek and Peng, authors of the Simply Statistics blog. Both of the courses I took were taught by Jeff Leek (referred to in my earlier post today). I found Getting and Cleaning Data to be an especially useful course, teaching some practical skills that are quite essential to the real-world practice of data science. However, I probably wouldn't recommend the entire specialization to anyone coming from the world of quantitative research in academia, since a big focus of the program is teaching the scientific method and the logic of statistical inference—that is, things a quantitative social scientist should know already. First, however, a little background on Coursera's specializations....

Coursera has recently introduced a handful of "specializations", each consisting of a series of short courses followed by a capstone project. The specializations continue Coursera's effort to monetize its offerings through the Signature Track, which offers a "Verified Certificate" for those who pay a fee (typically about $50-$100) to take the course.

The Signature Track in itself has dubious value. Allegedly, its purpose is to provide a more useful credential than the certificates Coursera has traditionally offered for its free classes. To make the Verified Certificate more useful (that is, more impressive to potential employers), Coursera takes measures to guarantee you did the work yourself, but these measures seem fairly easy to circumvent. Specializations add an additional sweetener: if you take every class in the specialization on the Signature Track, you can then take the capstone project (offered as an additional class), which is not available to students who take the courses for free (or even, for that matter, to students who pay for only some of the courses). Students completing the specialization also receive a specialization certificate.

The Data Science specialization includes 10 short (four-week) classes, including the capstone, each priced at $49 for the Signature Track. If you stump up the whole $490 at once, you can take any of the courses as many times as you like over the next two years (in case you don't pass the first time); if you pay for the courses one at a time, you can only retake each one once (which is probably enough; honestly, if you can't pass one of these classes, you probably don't belong in the profession, but sometimes life gets busy, and you can't finish the work for a class). Each of the first nine courses will be offered once a month; the first six are available already, and the remaining three will be offered for the first time in June. For a couple of the classes, there's also an option to substitute an alternate course on Coursera. The capstone has yet to be scheduled (word in the forums has it that it'll be offered in the fall), and I'm not sure how often Coursera plans to offer it.

The course that really interested me was Getting and Cleaning Data, but I signed up for The Data Scientist's Toolbox because it's required for the rest of the specialization; R Programming is also required, but I already had some experience with R, and I had no intention of completing the entire specialization, and so I skipped this one. Taking one of the later courses at the same time as the required intro course didn't pose any difficulties for me, but I think that someone who has no experience with R would probably want to complete that class before tackling any of the others.

Much of The Data Scientist's Toolbox is devoted to introducing the topics of the specialization's other eight courses; frankly, you can skip this if you don't intend to take those courses (or possibly, even if you do intend to take them; you will, after all, cover that information later, though if you're taking the whole specialization, you may need to watch the video lectures in question in order to complete the quiz for Week 1). For me, the most useful content of this class was its introductions to Git, GitHub, and RStudio (I had been using the plain old R Console, and RStudio makes things considerably easier). RStudio is required for the programming necessary to complete the assignments in the later courses, and Git and GitHub are necessary to complete the projects at the end of each course (you have to upload your work to GitHub so that other students can perform peer assessments on it). For the sake of full disclosure, let me say that I skipped the introductory lectures in Week 1 of this course (though I did pass all the quizzes), and did not complete the course project, which consisted of taking screenshots to prove that you'd installed Git, GitHub, and RStudio (I installed all three, but I wasn't really concerned with getting the course certificate).

I found Getting and Cleaning Data invaluable. I took the course because I wanted to learn how to get data off the web. For example, in the project I did for Coursera's Social Network Analysis last year, I ended up saving data from several hundred web pages by hand, which is not a particularly efficient way of doing things. Getting and Cleaning Data promises to teach students how to extract data from common data storage formats (including databases, specifically SQL, XML, JSON, and HDF5), and from the web using API's and web scraping. The syllabus also includes tips on using R to clean and recode data, and, in the last lecture, a long list of links to sources of data. It's also worth noting that the style of the video lectures is a bit different from those of other classes I've taken: there's never any video of the instructor, just the instructor's voice over the lecture notes.

Initially, I was skeptical, because most of the lectures amount to little more than a list of R packages, functions (with a few short examples), and links for further information. The information blows past you so fast that there's no hope of remembering much of it. However, the lecture notes (in both HTML5 and PDF; the HTML5 is a little awkward to navigate, but the links work, unlike in the PDF) provide a wonderful resource that you'll find yourself referring to again and again. I've often found that the hardest part of a project is knowing where to start, and the lectures in Getting and Cleaning Data point you in the right direction; in fact, I'm using information from the lectures on web-scraping and JSON right now to do an updated version of my project for Social Network Analysis, a statistically informed visualization of which cards in the game Android: Netrunner appear together in the decks designed by players. Look for that to be posted here soon!
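To give a rough idea of what counting which cards appear together involves, here is a minimal Python sketch. It is not the code from the project described above (the course itself teaches R), and everything in it is an assumption made for illustration: the JSON layout, the deck labels, the placeholder card names, and the `cooccurrence` helper.

```python
import json
from collections import Counter
from itertools import combinations

# Hypothetical JSON payload: a list of decks, each with a list of card names.
# Real data would come from a web API or a scraped page instead.
raw = """
[
  {"deck": "deck_1", "cards": ["card_a", "card_b", "card_c"]},
  {"deck": "deck_2", "cards": ["card_a", "card_b"]},
  {"deck": "deck_3", "cards": ["card_a", "card_c", "card_c"]}
]
"""


def cooccurrence(decks):
    """Count how many decks contain each unordered pair of distinct cards."""
    counts = Counter()
    for deck in decks:
        # Deduplicate and sort so each pair has one canonical ordering.
        cards = sorted(set(deck["cards"]))
        counts.update(combinations(cards, 2))
    return counts


decks = json.loads(raw)
pair_counts = cooccurrence(decks)

for pair, n in pair_counts.most_common():
    print(pair, n)
```

The resulting counts can be treated as edge weights in a network of cards, which is the sort of input a tool such as Gephi or igraph would take for visualization.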

Among the data science courses that I've taken online, Getting and Cleaning Data is the first one that taught me how to go out and get data and then put it in a form that's usable for analysis. By contrast, Coursera's Machine Learning, taught by Stanford's Andrew Ng, provides highly practical advice on selecting and using algorithms, but does so using very much canned programming exercises, in which the data has already been collected and processed. In fact, the two courses are highly complementary, at least inasmuch as they give you ideas about how to handle different stages of a data science project. It should be noted, though, that Machine Learning uses Octave (essentially the open-source version of MATLAB) rather than R; the Data Science specialization includes its own (much shorter) Practical Machine Learning course, as well as an earlier course on Regression Models that delves far more deeply into that topic than does Machine Learning.

I should add that, for this class too, I never completed the final project: it looks like a highly practical exercise, but I was short on time, and more interested in my own project; again, I didn't care much about earning a certificate, with my main concern being to learn the nuts and bolts of getting data from the web.

Finally, let me offer a few comments on the Data Science specialization as a whole. I would not recommend completing the entire specialization for anyone who's well-versed in statistics and the scientific method: if you're a competent social scientist (as opposed to someone who took one stats course as an undergraduate), you already understand important issues like sampling, causal inference, and reproducibility (though, admittedly, I've read more than a few articles by social scientists who evidently had shaky grasps on these concepts). For a specialization that labels itself as "Data Science", there's also scant coverage of databases. That being said, anyone interested in data science might find Getting and Cleaning Data, R Programming, and Practical Machine Learning useful, and for someone who doesn't have a background as a quantitative researcher, I can't recommend this specialization's focus on the scientific method and applied statistics highly enough.

Why Data Science Needs Statistics

If you've read my earlier posts about why a scientific approach is important to data science, you won't find it surprising that I recommend Jeff Leek's recent post on the Simply Statistics blog, "Why Big Data Is in Trouble: They Forgot about Applied Statistics". Leek, a biostatistics professor at Johns Hopkins, and one of the instructors in Coursera's Data Science specialization, argues that a number of recent big data failures, including that of Google Flu Trends, can be chalked up to a lack of statistical knowledge among the researchers in question. Leek cites sampling, data collection, causal logic, model specification, and sensitivity analysis as areas where a solid knowledge of applied statistics could have prevented serious errors. It's a short but cogent read.

Friday, March 14, 2014

Why Scientists Make Better Data Scientists

Have a look at this blog post by Mike Walker on why it's useful for data scientists to have a scientific background. The link came to me in a list of "featured articles" I receive weekly from Data Science Central. The tl;dr is that analysts without scientific training (the author singles out those with undergraduate business degrees) lack the tools for distinguishing correlation from causation. This leads to a range of maladies, including spurious correlations, cherry-picking data, and stringing "disconnected facts" together to construct a fallacious narrative. Walker acknowledges that not all successful analysis requires starting out with a hypothesis, but stresses that there are scientifically rigorous ways to explore data for unexpected relationships, such as A/B testing.
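For readers who haven't run one, the analysis behind a simple A/B test can be sketched as a two-proportion z-test. The Python below is only an illustration and is not taken from Walker's post; the function name and the visitor and conversion counts are hypothetical, and the normal approximation it relies on assumes reasonably large samples.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a is the conversion rate of variant A; likewise for variant B.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical counts: 1,000 visitors per variant.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Because visitors are assigned to the variants at random, a small p-value here supports a causal reading in a way that a correlation mined from existing data cannot.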

I find this refreshing, after spending a great deal of time lately looking at job ads for data scientists: most ads focus on experience with specific software packages, rather than experience conducting rigorous research. I suppose the former is more of an objective measure than the latter, but I'm not sure how useful it is to hire based on what applications a person has used before, especially in a profession where the state of the art changes rapidly. Another problem is that people have started slapping the label "data scientist" on a wide variety of jobs: I've seen it applied frequently to database architect positions, or even to positions that have more to do with software development than data analysis.

At the moment, all of this matters to me because the contract on which I was working ended last December, and I'm now looking for a job again. I've had two good interviews, but I'm finding it very hard to break into a profession with a background different from that of traditional data analysts and business analysts. One thing I have learned is the power of networking: one of my interviews came from a contact my wife made while carpooling, and the other resulted from my submitting a resume to a small-business group recommended by a former co-worker. (Oh, and if anyone has any good job leads, I'm happy to network here, too. :)


Those of you who frequently visit my links page might have noticed that I've updated it quite a bit over the past few weeks, particularly in the sections covering online courses ("Self-teaching Resources" and "Formal Learning Resources"). Coursera and Udacity have some interesting new offerings that you might want to check out. I'm also planning to add a section listing portals and other commercial websites, and I need to go through all the links to make sure the information on them is up to date. As ever, if you have any suggestions for additional resources, please let me know!

Tuesday, October 8, 2013

Online Course Reviews: Coursera's Machine Learning and Probabilistic Graphical Models

Whoops, I haven't posted in a while.

In May, I started a new job. It has nothing to do with data science, but it has given me experience in supervising other writers, and it's also kept me quite busy. The fact that work kept me busy explains why I haven't posted recently. It also explains the one caveat I have to add to the reviews I'm about to give you: I was never able to finish all the material for either course. I got busy with the new job and moving my family into a new house, and by the time I came up for air, it was too late to finish.

As I mentioned in my April post, I signed up for Coursera's Probabilistic Graphical Models, Machine Learning, and An Introduction to Interactive Programming in Python. I dropped An Introduction to Interactive Programming in Python almost immediately, after realizing that the course's focus on programming video games made it less useful for my purposes than I had hoped.

Probabilistic Graphical Models was taught by Stanford Professor and Coursera co-founder Daphne Koller. Coursera hasn't yet listed a new iteration of it, but if the previous pattern holds up, it should be offered again next year. As I mentioned before, I took this course because it includes Bayesian and Markov models, both of which show up in many job ads for data scientists. I decided not to take the optional programming track, figuring that it probably wasn't a good idea to be writing programs for two different courses in a language I was just learning (both this course and Machine Learning use MATLAB and/or the very similar Octave).

Machine Learning was taught by Andrew Ng, also a Stanford professor and Coursera co-founder, and is one of Coursera's best-known and most popular courses. It's also been taught by the University of Washington's Pedro Domingos, but Ng's version will be offered again starting October 14th. I signed up for the course because machine learning is one of the basic skills of data science, but I also wanted the chance to learn one of the most commonly used statistical programming languages, MATLAB/Octave.

As I said in my last post, Daphne Koller is not the most charismatic lecturer, and her explanations can be confusing. What I didn't say last time is that I don't think Koller entirely understands the medium in which she's working. In the classroom, asking questions of the professor can make up for a confusing lecture; Koller seems to be giving the same lecture she would give in the classroom, but without the opportunity to stop her and ask questions about each topic before moving on to the next, that same lecture doesn't work very well.

While the lectures are less than ideal, the quizzes are particularly troubling: rather than presenting a simple test of the material covered in the lecture, the quizzes ask students to move beyond the lecture material, drawing out implications on their own. Asking students to do this is a great pedagogical technique, especially in a graduate-level class. However, it works a lot better when the students have discussed the material in class, giving them the opportunity to start down that path together, with the professor's guidance. None of this is possible in an online class, and, even with discussion forums, rules that prevent students from providing answers to one another prevent full exploration of the quiz topics; part of the problem is that students can see the quiz questions before beginning their discussion, rather than receiving a quiz or homework assignment only after the classroom discussion is over.

It might help to begin each quiz with more straightforward questions, giving students a little practice, before moving on to the ones that require additional thinking. Far from adopting this model, Koller actually exacerbates the problem by adopting an unusually strict rule (by MOOC standards) for retaking quizzes: any attempt after the second is penalized. Because of this, I found myself taking quizzes I had no way to prepare for, because they introduced concepts for the first time, and I had no way to practice applying those concepts beforehand.

I want to stress here that I'm not simply some idiot who was in over my head. I'm trained in statistics, and I have experience using structural equation and time series models, both of which share similarities with probabilistic graphical models—and I was really interested in the course material. Koller acknowledges in her lectures that the course is challenging, and even seems to take pride in that fact. However, while the material is indeed challenging, the course is hard partly because it's badly taught. It's also possible that Koller is trying to cover too much material for the online format—the lack of classroom discussion not only makes individual topics more difficult, but increases the time required to cover each topic, since the teacher has to provide a much more detailed lecture, rather than relying on student questions to fill in holes.

While I didn't pursue the programming track, other reviewers have complained that they spent more time trying to figure out how to read in the data than they did conducting the analysis. Mind you, this is a problem that data scientists face in the real world, and so the criticism might not be completely fair.

Andrew Ng's Machine Learning is another beast altogether. Ng is in fact a charismatic, and very clear, lecturer; indeed, Koller uses a couple of his lectures in areas where the material in the two courses overlaps. Not only does Ng convey his topics clearly, but he stresses the practical aspects of the methods he's teaching, and provides useful tips about how to apply them in the real world. While Ng pulls students along at a pace much gentler than Koller's, he's still able to teach methods that, he insists, are advanced enough to be unfamiliar to many practicing data scientists. I should add that the automated system used to grade programming assignments works quite well. If I do have one criticism, it's that the programming assignments probably involve a little more copying and pasting than might be ideal for learning Octave, but then, copying and pasting isn't uncommon in real programming.

In short, this is a very good course, and I strongly recommend signing up for the session that starts October 14th. Now that things have calmed down a bit for me, I might even sign up for it myself.

Wednesday, April 24, 2013

Online Course Reviews: Coursera's Social Network Analysis and Foundations of Business Strategy—Plus New Courses to Check Out

I've recently completed Foundations of Business Strategy, taught by the University of Virginia's Michael J. Lenox, and I've submitted the final project for Social Network Analysis, taught by the University of Michigan's Lada Adamic, and I'd like to share some comments on both of these offerings from Coursera, as well as give readers a heads-up about other courses that have just started.

Social Network Analysis provided a good survey of the methods and applications in the field, covering random networks, measures of centrality, small world networks (and other topics related to the question of optimization), and the dynamic aspects of networks, such as contagion and opinion formation. Adamic's explanations were usually clear, and even a student with little knowledge of probability could have gotten the gist of most of the course material (and made use of Gephi to perform basic analysis), but equations were presented for those who wanted them, and the readings gave further detail. In fact, this is the only course I've had so far that made extensive use of academic journal articles (and a few written for a wider audience), some of them required and some recommended—they gave a much better impression of the history of social network analysis and the current state of the art than Professor Adamic could have given by herself. From a personal perspective, this topic particularly interests me because I can see how social network analysis might be applied to the study of ethnic politics, my previous area of research.
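As a concrete instance of one of the centrality measures mentioned above, degree centrality scores each node by the share of other nodes it is directly connected to. The Python below is a minimal sketch rather than anything from the course (which used Gephi, R, and NetLogo); the `degree_centrality` helper and the small star-shaped example network are hypothetical, and the code assumes an undirected graph with at least two nodes and no self-loops.

```python
def degree_centrality(edges):
    """Map each node to degree / (n - 1) for an undirected edge list."""
    neighbors = {}
    for u, v in edges:
        # Record the edge in both directions; sets ignore repeated edges.
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical star network: one hub connected to three otherwise isolated nodes.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c")]
centrality = degree_centrality(edges)

for node, score in sorted(centrality.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

In this example the hub reaches every other node and scores 1.0, while each outer node scores one third.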

The course's only weakness lay in the (optional) programming track: the first three programming assignments, two in R and one in NetLogo, were largely exercises in copy-and-paste, rather than posing full-fledged coding tasks; they were, however, enough to give students basic familiarity with the two programming languages, and with the igraph package for R. In contrast to these "canned" assignments, the final project was almost completely unstructured, and while this provided welcome freedom to explore whatever topic a student wished, it also meant a steep learning curve for someone whose experience with R or NetLogo was limited to the earlier exercises.

Compared to the other courses I've taken, Foundations of Business Strategy proved much less time-consuming, with required readings limited to very short chapters from a forthcoming book by Lenox (and when I say "short", I mean it's more a pamphlet than a book, with chapters only a few pages long), and a business case each week. The professor encouraged discussion of each case, both in small groups and in the discussion forum, and each week recorded a debriefing that made reference to students' comments in the forum; however, the only assignments that needed to be turned in before the final project were (relatively easy) weekly quizzes that covered the lecture topics. Quizzes, and hence the lectures they covered, could be completed at any time during the course, though completing the lectures late meant that a student had no chance to participate in the discussions of the associated cases. The final project was a 1500-word "executive summary" of a strategic analysis of an organization of each student's choice.

Despite the sparseness of the course material, the class provided a useful framework for business strategy, a framework—and this is the part that surprised and impressed me, after all the scurrilous rumors I've heard about business schools, and the weak business students I've taught in my own classes—that was solidly grounded in microeconomics, with no mention at all made of the latest management fads. No, someone who took this course isn't guaranteed to become a strategic genius, or even, necessarily, an effective strategic thinker, but that's because strategy requires making decisions in an environment that's inherently complex and ambiguous, the upshot of which is that giving students a good framework for organizing thought—and a chance to practice strategic thinking on real cases—is about the best that a teacher can do.

My one serious concern with the course was the rubric used for peer review of the final project: the assignment presented a set of criteria for grading that focused largely on the quality of analysis and writing, and specifically warned against trying to use all the methods of analysis that had been covered in the course; by contrast, the rubric that was actually used for peer grading of the assignment amounted to a checklist of topics in the course, and penalized students for not covering each one, while leaving little room for judging the quality of the report. As a former professor, I certainly understand that detailed rubrics, while beloved by students, tend to push attention in grading towards mechanical aspects of the assignment, and this probably goes double for peer assessment, but it is possible to create rubrics that give more or less clear guidance for making qualitative judgments. More importantly, whatever rubric is used needs to match the criteria that are spelled out in the original assignment.

New Courses to Check Out

I've signed up for three courses that have just started, though odds are I'll need to withdraw from one of them, due to time constraints. I stumbled upon Probabilistic Graphical Models; the term "graphical" didn't suggest anything I was interested in, but a look at the course description revealed that probabilistic graphical models (PGM's) include Bayesian and Markov networks, both of which feature in decision-making and machine learning, and both of which show up repeatedly in job ads for data scientists. Coursera co-founder Daphne Koller, of Stanford University, lacks the charisma and clear explanations of the three MOOC professors I've previously learned from; she also comes off (if I may be subjective here) as a bit pretentious, an impression that makes her inclusion of the Simpsons' family tree as an example of a genealogical network more cringeworthy than cool. I've found most of the material so far to be readily understandable, but I'm guessing it wouldn't be for someone without my background in statistics (especially since I've used structural equation and time series models in my research, and these feature many of the same concepts found in PGM's). This is a graduate-level course, and Koller herself describes it as challenging even by those standards. Needless to say, the other side of that coin is that anyone who gets through the course will have a solid foundation in PGM's, and also, for those taking the programming track, knowledge of Octave and/or MATLAB (the two are close relatives), especially given weekly programming assignments in an 11-week course.

I've also just started An Introduction to Interactive Programming in Python, taught by multiple instructors from Rice University, and Machine Learning, this iteration taught by Coursera co-founder Andrew Ng of Stanford, one of two professors on Coursera to teach this course; like Probabilistic Graphical Models, Machine Learning makes use of Octave. I doubt however that I'll have time for all three courses, meaning that I'll likely have to withdraw from one of them, which will probably be Probabilistic Graphical Models or Machine Learning, given their overlap in programming language, and, to a lesser extent, subject matter.

Thursday, March 14, 2013

Online Courses to Check Out

Having nearly completed Stanford Professor Jennifer Widom's Introduction to Databases, I've recently begun two more massive open online courses (MOOC's) that you, the reader, might want to take a look at, both of them offered by Coursera.

The first course is Social Network Analysis, taught by Lada Adamic of the University of Michigan. This methodology, which can be applied to topics as divergent as infrastructure and epidemiology (as well as the more obvious targets such as Facebook), obviously plays a prominent role in data science, which is one reason to take the course. A second reason is that the course features an optional programming track with four assignments (including a peer-graded final project), some using NetLogo and some using R, and in my case I'm taking the course in part as a way to learn R. The course also makes use of Gephi for basic network analysis. In the second week, there are two versions of the lectures, with an advanced version for students with a background in probability distributions and differential equations; it's not clear if this will be the case in later weeks. This is a nine-week class, and if you're reading this soon after I've posted it, you can still sign up and get full credit, since the first assignment isn't due until Friday night (March 15th).

Taking the advice of one of my contacts to learn something about business, I've also signed up for a non-technical course, Foundations of Business Strategy, taught by Michael J. Lenox of the University of Virginia. This six-week class features a textbook that Lenox is currently developing, as well as the case method typical of business-school education (Lenox recommends small-group discussion to get the full impact of this method). The most interesting feature of the course is a peer-graded final assignment in which each student writes a short but well-researched strategy memo for the CEO of a company of his or her choice; more interesting still, Lenox has invited organizations that would like their strategy assessed to join the course and offer themselves as cases for the students' final projects. Though we're already about 25% of the way through the course, the assignments all have the same deadline of April 14th, and so it's easy to catch up.

You might also keep a lookout for two courses starting the latter half of April, An Introduction to Interactive Programming in Python, taught by a team from Rice University, and the perennially popular Machine Learning, taught by Coursera co-founder Andrew Ng of Stanford (this one uses Octave, a close relative of MATLAB, for those keeping track of programming languages).