Question

I'm totally naive to Data Science - that is, the relatively new, somewhat hyped field that is so popular at the moment. But I'm not naive to data ... as a scientist and researcher I've worked with all sorts of it in different roles in the past.

Now I'm in the lamentable position of having dug a lot of shallow holes, using different software systems and different types of data, and not really being professionally competent in anything.

My question is, if I want to "get up to speed" with data science, and perhaps leverage the different experiences I've had, how do I approach it? Ideally I'd like to make my research skills marketable - that is, become a data scientist of sorts, but with a greater emphasis on the research/reporting side.

Assuming I'm coming from scratch but have demonstrated capacity - I say that because, for example, I've used R before for some projects but after a break of a year or so I need to relearn it every time... Where do I start; how do I unify all these bits and pieces?

And what claim can I make to work in this field? (I've worked on all sorts of data, from gigabytes of climate data and earth science, to health registers to longitudinal surveys ... but none of it under the moniker of a data scientist).

Specifically, what tool(s) do I learn and what theory do I need to grasp? (Keeping in mind that all my coding and statistical competencies are mostly self-taught.)

Unlike this (fascinating) question, I don't have a business background and don't necessarily want to move towards the business analyst path - I still want to play with physical (earth science) or social data. Neither do I want to work on the data management side so much - I want databases and coding to be a means, not an end. And finally, I'm not much inclined towards the theory and mathematics. Perhaps the best way to summarise my inclination and position is that I don't want to become a data science expert, but want to be able to become an expert in given subjects through data science.

My inclination is perhaps to concentrate on something like Python, and use it to exploit R and other functionality?

Tools I've used in the past (in order of exposure) -

  • SAS (for statistics and research, not the warehouse side)
  • VBA/VB6/Excel/Access (data manipulation, reporting)
  • GIS (ArcGIS for analysis/research, not database management)
  • R (stats...)
  • Some HTML/JS
  • Some Python

One thing I find is that my existing competencies don't provide me with a useful tool for bringing together different data and getting it into the state I want for analysis (ETL I suppose?), hence the inclination to re-learn Python.

Thanks for your thoughts!

Solution

The Udacity Data Analyst Nanodegree gives a very gentle introduction, using Python and R, to mostly exploratory data analysis, which I think is what you are looking for. It's not tailored towards business analysis. The courses are free and you can just take the ones you think are interesting; in your case, I would skip the ones about data visualization with JavaScript and data wrangling with MongoDB. They provide a bunch of resources to go from there. Of course, there are plenty of other online courses.

If you are more into books, you could check out Tukey's Exploratory Data Analysis, a classic that is still highly relevant.

In terms of tools, R and Python are the most commonly used, and I would focus on them for now.
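To give a feel for the "bringing together different data" step you mention, here is a minimal sketch in Python using only the standard library. The station readings, column names, and function names are all made up for illustration; in practice you would likely read real files and might prefer a library such as pandas for the same job.

```python
import csv
import io

# Hypothetical inputs: two small CSV sources sharing a "station" key.
READINGS_CSV = """station,year,temp_c
A,2010,11.5
A,2011,12.1
B,2010,8.0
B,2011,
"""

STATIONS_CSV = """station,region
A,Coastal
B,Alpine
"""


def load_rows(text):
    """Extract: parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def combine(readings, stations):
    """Transform: drop missing temperatures, convert types, attach the region."""
    region_by_station = {row["station"]: row["region"] for row in stations}
    combined = []
    for row in readings:
        if not row["temp_c"]:
            continue  # skip rows with no measurement
        combined.append({
            "station": row["station"],
            "region": region_by_station.get(row["station"], "unknown"),
            "year": int(row["year"]),
            "temp_c": float(row["temp_c"]),
        })
    return combined


def mean_temp_by_region(rows):
    """Summarise: average temperature per region."""
    totals = {}
    for row in rows:
        total, count = totals.get(row["region"], (0.0, 0))
        totals[row["region"]] = (total + row["temp_c"], count + 1)
    return {region: total / count for region, (total, count) in totals.items()}


clean = combine(load_rows(READINGS_CSV), load_rows(STATIONS_CSV))
summary = mean_temp_by_region(clean)
print(summary)
```

The point is only the shape of the workflow: read each source, clean and join on a shared key, then summarise. The same pattern carries over to R or to larger Python tooling.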

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange