Question

I am trying to forecast the total attendance (i.e. the number of entrances, which is also the number of tickets bought) of a festival just two days after it has started. That is, knowing how many people attended during the first two days, how can I predict the total number of people who will have visited the festival by the end?

I know this seems difficult at first glance, and that in theory I could only make fairly poor forecasts, but here's the deal: I have organized more than thirty festivals in the past and have collected data on each of them. Specifically, for each of these festivals, I have a daily time series in which I know:

  • the number of tickets bought daily
  • whether the day was a weekday or a weekend day
  • whether the day fell during public and school holidays
  • the daily weather

I have observed that all of these time series follow several common patterns. For instance, attendance is always highest on Saturdays and lowest on Tuesdays. Likewise, more people always seem to come in the very last days of the event than at the beginning. These patterns are the same for almost all of the festivals: when decomposing the time series, I observe similar trends and similar seasonal components.
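One way to quantify the weekday pattern described above is a relative index per weekday: each day's ticket count divided by the mean daily count of its own festival, then averaged across festivals. The sketch below is only an illustration. The `history` data and the `weekday_index` function are hypothetical, and weekdays follow Python's convention (0 = Monday, 6 = Sunday).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy history: one list of (weekday, tickets) pairs per festival.
# Weekdays use Python's convention: 0 = Monday ... 6 = Sunday.
history = [
    [(4, 100), (5, 200), (6, 150), (0, 50)],            # Friday to Monday
    [(5, 400), (6, 300), (0, 100), (1, 80), (2, 120)],  # Saturday to Wednesday
]

def weekday_index(festivals):
    """Per weekday, average of (daily tickets / that festival's mean daily tickets)."""
    ratios = defaultdict(list)
    for days in festivals:
        daily_mean = mean(tickets for _, tickets in days)
        for weekday, tickets in days:
            ratios[weekday].append(tickets / daily_mean)
    return {weekday: mean(values) for weekday, values in sorted(ratios.items())}

index = weekday_index(history)
for weekday, value in index.items():
    print(weekday, round(value, 2))
```

An index above 1 means the weekday tends to be busier than the festival's average day. Note that dividing by each festival's own mean is a simplification: festivals covering different sets of weekdays are not strictly comparable, and the end-of-festival rise is not modelled here.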

Another thing, which I guess is not good news, is that the events had different durations: some lasted 4 days, others 5, 6, 7 or even 8 days. Some started on a Monday, others on a Saturday.

So my question is: how could I use these time series as training data to forecast total attendance at an event, knowing the attendance of the very first days? That is to say, which model could I use to predict the total attendance given all of this data? I was of course thinking of machine learning (or deep learning) since I have a lot of training data, but I'm unsure whether it can be easily implemented in R or Python...

To make forecasts, I do of course know, for the ongoing festival, how long it will last, whether it will take place during public and school holidays, whether it will span a weekend, and I have the weather forecast for each day.
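Since the schedule of the ongoing festival is known in advance, a simple baseline (not a machine-learning model) is to scale the tickets sold so far by the share of attendance those days are expected to represent. The sketch below assumes relative weekday weights have already been estimated from past festivals. The function `forecast_total`, the `weekday_weight` values, and the `holiday_boost` multiplier are all hypothetical, and weather is left out for brevity.

```python
def forecast_total(observed, weekdays, holiday_flags, weekday_weight, holiday_boost=1.0):
    """Scale the tickets sold so far up to the whole festival.

    observed       : tickets for the days already elapsed (e.g. the first two)
    weekdays       : weekday of every festival day, 0 = Monday ... 6 = Sunday
    holiday_flags  : True for days falling in public or school holidays
    weekday_weight : relative attendance level per weekday (hypothetical values)
    holiday_boost  : hypothetical multiplier applied to holiday days
    """
    # Expected relative attendance of each day of the ongoing festival.
    expected = [
        weekday_weight[day] * (holiday_boost if is_holiday else 1.0)
        for day, is_holiday in zip(weekdays, holiday_flags)
    ]
    # Relative weight of the days that have already been observed.
    seen = sum(expected[: len(observed)])
    return sum(observed) * sum(expected) / seen

# Hypothetical weights, as if estimated from the past festivals.
weekday_weight = {0: 0.5, 1: 0.4, 2: 0.6, 3: 0.7, 4: 0.8, 5: 1.8, 6: 1.4}

# Hypothetical ongoing festival: Thursday to Sunday, no holidays,
# with Thursday and Friday already observed.
total = forecast_total(
    observed=[700, 800],
    weekdays=[3, 4, 5, 6],
    holiday_flags=[False, False, False, False],
    weekday_weight=weekday_weight,
)
print(round(total))
```

A baseline like this handles different durations and start days naturally, and gives a reference point against which any regression or machine-learning model trained on the thirty past festivals could be compared.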

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange