Question

In recent years, the term "data" seems to have become a term widely used without specific definition. Everyone seems to use the phrase. Even people as technology-impaired as my grandparents use the term and seem to understand words like "data breach." But I don't understand what makes "data science" a new discipline. Data has been the foundation of science for centuries. Without data, there would be no Mendel, no Schrödinger, etc. You can't have science without interpreting and analyzing data.

But clearly it means something. Everyone is talking about it. So what exactly do people mean by data when they use terms like "big data" and why has this become a discipline in itself? Also, if it is an emerging discipline, where can I find more serious/in-depth information so I can better educate myself?

Thanks!

Was it helpful?

Solution

I get asked this question all the time, so earlier this year I wrote an article (What is Data Science?) based on a presentation I've given a few times. Here's the gist...

First, a few definitions of data science offered by others:

Josh Wills from Cloudera says a data scientist is someone "who is better at statistics than any software engineer and better at software engineering than any statistician."

A frequently-heard joke is that a "Data Scientist" is a Data Analyst who lives in California.

According to Big Data Borat, Data Science is statistics on a Mac.

In Drew Conway's famous Data Science Venn Diagram, it's the intersection of Hacking Skills, Math & Statistics Knowledge, and Substantive Expertise.

Here's another good definition I found on the ITProPortal blog:

"A data scientist is someone who understands the domains of programming, machine learning, data mining, statistics, and hacking"

Here's how we define Data Science at Altamira (my current employer):

data science diagram

The bottom four rows are the table stakes -- the cost of admission just to play the game. These are foundational skills that all aspiring data scientists must obtain. Every data scientist must be a competent programmer. He or she must also have a solid grasp of math, statistics, and analytic methodology. Data science and "big data" go hand-in-hand, so all data scientists need to be familiar with frameworks for distributed computing. Finally, data scientists must have a basic understanding of the domains in which they operate, as well as excellent communications skills and the ability to tell a good story with data.

With these basics covered, the next step is to develop deep expertise in one or more of the vertical areas. "Data Science" is really an umbrella term for a collection of interrelated techniques and approaches taken from a variety of disciplines, including mathematics, statistics, computer science, and software engineering. The goal of these diverse methods is to extract actionable intelligence from data of all kinds, enabling clients to make better data-driven decisions. No one person can ever possibly master all aspects of data science; doing so would require multiple lifetimes of training and experience. The best data scientists are therefore "T-shaped" individuals -- that is, they possess a breadth of knowledge across all areas of data science, along with deep expertise in at least one. Accordingly, the best data science teams bring together a set of individuals with complementary skillsets spanning the entire spectrum.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange
scroll top