Question

I'm not sure if this question is appropriate for this forum, so excuse me if it's not (if not, any suggestions on a better place would be very much appreciated). I'm currently an undergrad in a quantitative field, and for the summer the company I am working for has given me an opportunity to do a data project. I'm not really sure where to start. I had a conversation with one of the business owners today to get a better handle on how the business works and what kind of data they have. We talked a little about what sorts of questions they have and what would be nice to know. That seems to be the main question: what questions should I ask? My initial thought is to first just look at the data via traditional descriptive stats methods (histograms, scatter plots, etc.) and see whether that creates some ideas. If anyone has some tips, or even some good links (yes, I have already Googled it quite a bit), I would be grateful. Thanks.

Solution

The thing you need to understand as completely as possible is how they expect a data analysis to enable them to achieve their objective. They are a business, so their overall objective is likely related to maximising profit. However, there will be a more immediate objective underneath that heading. To maximise profit you can either reduce costs or increase sales. In turn, to increase sales you can increase the number of customers or increase the amount of sales to each customer etc.

The question then turns on how you can use data science to achieve one of those objectives.

For example, questions that can almost be answered with data science could be 'how do I better identify potential customers?' or 'how do I increase existing customers' spend?' These are still very high-level questions, but they are the sort of questions that you need to have in mind as you start to do your descriptive stats etc.
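As a rough illustration of keeping those questions in mind during the descriptive stage, the sketch below splits total sales into the number of customers and the spend per customer, using only the Python standard library. The transaction records and the `summarize_sales` helper are invented for illustration; real data would come from the company's own systems and would probably look quite different.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical transaction records: (customer_id, amount).
# These values are made up purely for illustration.
transactions = [
    ("c1", 20.0),
    ("c1", 35.0),
    ("c2", 10.0),
    ("c3", 50.0),
    ("c3", 5.0),
    ("c3", 15.0),
]


def summarize_sales(records):
    """Split total sales into customer count and spend per customer."""
    spend_by_customer = defaultdict(float)
    for customer_id, amount in records:
        spend_by_customer[customer_id] += amount

    spends = list(spend_by_customer.values())
    return {
        "total_sales": sum(spends),
        "n_customers": len(spends),
        "mean_spend": mean(spends),
        "median_spend": median(spends),
    }


summary = summarize_sales(transactions)
print(summary)
```

A large gap between the mean and median spend, for instance, might suggest that a few customers account for most of the sales, which could then shape which of the two questions is worth pursuing first.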

Bear in mind that this is an iterative process and it is completely normal to start off in a fuzzy sort of area. At this stage, the question you have in mind is almost a MacGuffin: it will kick things off, but it may not be the question you end up answering.

CRISP-DM (Cross-Industry Standard Process for Data Mining) is a process model built for data mining. It discusses how to iteratively use results from analyses and models to increase your understanding of the customer's situation, and hence drive the development of a better business objective for use in a data science project.

OTHER TIPS

I asked a similar question a couple of weeks ago and got some good feedback. One of the answers to my question is also from Robert de Graaf. Here's the link: Tips for a new data scientist

It seems like the first task is often just collecting the data from various sources and cleaning it. Based on my limited experience, I think that is a good place to start and can actually take quite a bit of time. Once the data is organized, moving on to data visualization/exploration will help you get a feel for the data. Kaggle has some good data visualization tutorials and I will post one if I find it.
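To give a feel for what that cleaning step can involve, here is a minimal sketch using only the Python standard library. The raw export, its column names, and the `clean_rows` helper are all hypothetical. Dropping exact duplicate rows is an assumption made for the example; in real data, two identical purchases may be perfectly legitimate, so check with the business first.

```python
import csv
import io

# Hypothetical raw export; the columns and values are invented for illustration.
raw = """customer_id,amount
c1,20.0
c1,20.0
c2,
c3, 50.0
"""


def clean_rows(text):
    """Parse CSV text, skip rows with a missing amount, drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        customer = row["customer_id"].strip()
        amount_text = row["amount"].strip()
        if not amount_text:
            # Missing amount: skipped here, though it may deserve a closer look.
            continue
        record = (customer, float(amount_text))
        if record in seen:
            # Assumption: an exact repeat is an export duplicate, not a second sale.
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


rows = clean_rows(raw)
print(rows)
```

Keeping a count of how many rows each rule removes is also worthwhile, since those counts are often the first interesting thing you can report back.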

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange