Sunday, November 25, 2012

Why Become a Data Scientist?

Having decided teaching was not the right career for me (see my previous post), I began a career transition. I submitted a few applications before my professor job ended in May, mostly with the U.S. government (I figured that private-sector employers would want to hire someone immediately, not wait until I was available). In May, I started in earnest. I haven't had a lot of luck—not surprising given the current economy.

As it happens, my main region of study, Eastern and Central Europe, is not terribly fashionable right now. My other specialization, ethnic politics, is pretty trendy, but most of the government and private-sector jobs that would make use of my area and subject knowledge require an active security clearance. I've interviewed for a few jobs in those specialties, but none has panned out. I do, however, have considerable experience with advanced statistical methods and survey methods, and most of the jobs I've applied to would make use of those skills.

A couple of months ago, a friend of mine who, judging by the strange links he sends me (most involving cats), gets paid to browse random stuff on the internet, sent me a link to the infamous Harvard Business Review article that declared the data scientist to be "the sexiest job of the 21st century". He thought it was something that I could do. Judging by the clothes my wife makes me wear when we go out dancing, I'm a reasonably hip guy; I read the job description, and I thought, "Yeah, I could do that." Not only that, but working in a new field to tackle novel problems sounded like a whole lot of fun.

I therefore began to evaluate my strengths and weaknesses as a potential data scientist. Other social scientists considering careers as data scientists may share these.

Strengths:

  1. I'm a trained researcher. Davenport and Patil (the authors of the "sexiest job" article) point out how valuable scientific researchers can be in making sense of big data; articles by Press and Miller make the same point. To put it simply, we're very good at figuring out whether X really causes Y, or whether an apparent relationship between the two is either just dumb luck, or the result of the fact that Z actually causes both X and Y (yes, this is a longer version of the old saw that "correlation isn't causation"). This process involves coming up with good hypotheses, and clever ways to test them, using experimental or quasi-experimental methods.
  2. I know advanced statistical techniques. I am not a methods specialist (to me, stats are just a tool for answering interesting questions), but I've done research using structural equation modeling, time series modeling, and analysis of variance (ANOVA), and I've taught multiple regression and survey methods to undergraduates. I've been exposed to a lot more methods, and, most importantly, I understand how each method, and the assumptions behind it, can and can't be used to draw causal conclusions (see the above point).
  3. I have a decent technical background. I did coding on the job in college (using FORTRAN), I've used scripting languages for statistical packages (Mplus and gretl) as a researcher, and I've manipulated and cleaned survey databases (using SPSS). I've spent a lot of time with computers, even working as a tech support agent for a while, and, for fun, I've even coded in the obscure and not-so-practically-useful OOP language MUSHcode (for my overly detailed thoughts on MUSHcode, click here).
  4. I know psychology. Like most survey researchers, I understand the psychology of asking and responding to questions. Unlike most, I also know a great deal about social identity, which plays an important role in social networks.
  5. I have a great deal of international and intercultural experience. Not only have I conducted research in foreign countries, but the subject of my research has been ethnic politics. Conducting interviews on the topic of intergroup relations has given me experience dealing with sensitive issues.

Weaknesses:

  1. I don't know business database applications. Big data may be all about NoSQL, but the assumption is that everyone knows SQL in the first place. I'm studying MySQL right now.
  2. I don't know modern programming languages. As I mentioned above, I've coded using FORTRAN and even an OOP language, and I've used Pascal as well, but I don't know Java, Python, or R (let alone C++). I'm studying Java and R right now, and planning to tackle Python next. I should add that, based on both past experience and talking with friends in IT, I don't think that learning any of these languages will prove at all difficult.
  3. I don't have business experience. I read (or rather, listen to) the business section of The Economist, and, as a lifelong board and computer gamer, I'm a past master of strategy and resource allocation, but I have neither formal training nor work experience in business.

What's next, then? I have to learn the things I don't know and, in the meantime, convince potential employers that what I do know is valuable and that what I don't know I can learn. This blog will chronicle my efforts to do so, and, in the process, I hope to provide good advice to others in the same position.

Click here to see the useful resources I've found for bringing myself up to speed.

