Friday, July 11, 2014

Programming Languages for Big Data, Part 2

I mentioned the recent study on the relative speeds of programming languages to Tommy Jones, a specialist in natural language processing and a fellow member of Data Community DC. Being more industrious than I, he dove into the code used by the paper's authors. In their R code, he found gems such as a triple-nested "for" loop inside a "while" loop (instead of the much faster "apply" functions), which made the comparisons pretty useless, at least in the case of R. See Tommy's blog, Biased Estimates, for more details.
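To give a rough feel for why coding style matters in this kind of benchmark, here is a minimal sketch in Python rather than R. It is not the paper's code or Tommy's; the function names and the matrix size are made up for illustration. It computes the same row sums twice, once with hand-written index loops and once by handing each row to a built-in function, and times both. How large the gap is (or whether there is one) will depend on your machine and interpreter, and it says nothing directly about R.

```python
import timeit


def make_matrix(n_rows, n_cols):
    """Build a small test matrix of integer-valued floats (hypothetical data)."""
    return [[float(i * n_cols + j) for j in range(n_cols)] for i in range(n_rows)]


def row_sums_loops(matrix):
    """Sum each row with an explicit 'while' loop wrapped around a 'for' loop."""
    sums = []
    i = 0
    while i < len(matrix):
        total = 0.0
        for j in range(len(matrix[i])):
            total += matrix[i][j]
        sums.append(total)
        i += 1
    return sums


def row_sums_builtin(matrix):
    """Sum each row by applying the built-in sum() to every row."""
    return [sum(row) for row in matrix]


if __name__ == "__main__":
    matrix = make_matrix(300, 300)

    # Both versions must agree before a speed comparison means anything.
    assert row_sums_loops(matrix) == row_sums_builtin(matrix)

    for fn in (row_sums_loops, row_sums_builtin):
        seconds = timeit.timeit(lambda: fn(matrix), number=20)
        print(f"{fn.__name__}: {seconds:.4f} s")
```

The point of the sketch is only that two programs producing identical answers can differ in speed because of how they are written, so a cross-language benchmark ends up measuring the programmer as much as the language.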

Nonetheless, it's a pretty interesting question, and I'd love to see someone who's proficient in all of the languages involved try this test again, using better code. I'm still intrigued by the very high speed of MATLAB/Octave, something that leads Andrew Ng to recommend those languages over R for prototyping. Tommy pointed out to me, though, that since R is closer to being a full-featured language, it's more flexible than MATLAB or Octave.
