This is a perfect example of a situation where itertools.groupby is your best friend!
Please forgive me for not expanding on your answer, but I'm not too familiar with pandas, so I opted to use the csv module.
With two key functions for grouping the data (get_season and get_year), it's only a matter of iterating over the groups and writing the results to a new csv file.
import csv
from datetime import datetime
from itertools import groupby

# Season definition used here: Winter = Nov-Jan, Spring = Feb-Apr,
# Summer = May-Jul, Autumn = Aug-Oct
LOOKUP_SEASON = {
    11: 'Winter',
    12: 'Winter',
    1: 'Winter',
    2: 'Spring',
    3: 'Spring',
    4: 'Spring',
    5: 'Summer',
    6: 'Summer',
    7: 'Summer',
    8: 'Autumn',
    9: 'Autumn',
    10: 'Autumn'
}


def get_season(row):
    """Return the season label for a row, e.g. 'Winter 1950/1951'."""
    date = datetime.strptime(row[0], '%d/%m/%Y')
    season = LOOKUP_SEASON[date.month]
    if season == 'Winter':
        # Winter spans two calendar years (Nov, Dec, Jan)
        if date.month == 1:
            last_year, next_year = date.year - 1, date.year
        else:
            last_year, next_year = date.year, date.year + 1
        return '{} {}/{}'.format(season, last_year, next_year)
    else:
        return '{} {}'.format(season, date.year)


def get_year(row):
    """Return the "year" a row belongs to, running from Autumn to Summer."""
    date = datetime.strptime(row[0], '%d/%m/%Y')
    if date.month < 8:
        return date.year - 1
    else:
        return date.year


# Python 3: open csv files in text mode with newline=''.
# On Python 2, open the output with open('outfile.csv', 'wb') instead.
with open('NJDATA.csv', newline='') as data_file, \
        open('outfile.csv', 'w', newline='') as out_file:
    headers = next(data_file)  # skip the header line
    reader = csv.reader(data_file)
    writer = csv.writer(out_file)
    # Loop over groups distinguished by the "year" from Autumn->Summer,
    # defined by the `get_year` function
    for year, seasons in groupby(reader, get_year):
        mean_data = []
        # Loop over the data in the current year, grouped by season, defined
        # by the `get_season` function. Since the required "season string"
        # (e.g. Autumn 1952) can be used as an identifier for the seasons,
        # `get_season` returns the specific string which is used
        # in the output, so you don't have to compile that one more time
        # inside the for loops
        for season_str, iter_data in groupby(seasons, get_season):
            data = list(iter_data)
            mean = sum(float(row[1]) for row in data) / len(data)
            # Use the next line instead if you want to control the precision
            # mean = '{:.3f}'.format(sum(float(row[1]) for row in data) / len(data))
            mean_data.extend([season_str, mean])
        # One output row per year: season label, mean, season label, mean, ...
        writer.writerow(mean_data)
The basic idea here is to first group your data by year (Autumn -> Summer), and then group each year's data again by season. The groupby function accepts two arguments: an iterable and a key function. It iterates over the iterable, and whenever the value returned by the key function changes, it starts a new group. Because only consecutive rows with the same key end up in the same group, the input has to be in date order already.
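To make the "new group whenever the key changes" behaviour concrete, here is a minimal sketch on a made-up list of month numbers (the list and the is_winter helper are only for illustration and are not part of the script above). It also shows why unsorted input produces split groups.

```python
from itertools import groupby


def is_winter(month):
    # Hypothetical key function: True for Nov, Dec and Jan
    return month in (11, 12, 1)


months = [11, 12, 1, 2, 3, 12]

# groupby starts a new group every time the key value changes,
# so the trailing 12 does NOT join the first winter group.
runs = [(key, list(group)) for key, group in groupby(months, is_winter)]
print(runs)
# [(True, [11, 12, 1]), (False, [2, 3]), (True, [12])]

# Sorting by the same key first puts equal keys next to each other.
merged = [(key, list(group))
          for key, group in groupby(sorted(months, key=is_winter), is_winter)]
print(merged)
# [(False, [2, 3]), (True, [11, 12, 1, 12])]
```

For the csv data this means the rows should be in chronological order before they are handed to groupby.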
Consider this sample data:
01/01/1951,1
02/01/1951,-0.13161201
01/04/1951,1
02/04/1951,-0.13161201
03/04/1951,-0.271796132
04/06/1951,-0.258977158
05/06/1951,-0.198823057
06/08/1951,0.167794502
...
09/02/1952,-0.121824587
The first groupby call groups the data based on your year definition (implemented in get_year), giving the following groups of data:
# get_year returns 1950
01/01/1951,1
...
05/06/1951,-0.198823057
# get_year returns 1951
06/08/1951,0.167794502
...
09/02/1952,-0.121824587
The second groupby call splits each of the above groups by season (implemented in get_season). Let's consider the first group:
# get_season returns 'Winter 1950/1951'
01/01/1951,1
02/01/1951,-0.13161201
# get_season returns 'Spring 1951'
01/04/1951,1
02/04/1951,-0.13161201
03/04/1951,-0.271796132
# get_season returns 'Summer 1951'
04/06/1951,-0.258977158
05/06/1951,-0.198823057
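If you want to check this walkthrough yourself, the following sketch runs the same two-level grouping on the sample rows held in memory instead of a file. It assumes Python 3, leaves out the rows elided by "..." in the sample, and redefines compact versions of get_year and get_season so that it runs on its own. The names SAMPLE and result are just for this illustration.

```python
import csv
import io
from datetime import datetime
from itertools import groupby

# Sample rows from above (the rows hidden behind "..." are left out)
SAMPLE = """\
01/01/1951,1
02/01/1951,-0.13161201
01/04/1951,1
02/04/1951,-0.13161201
03/04/1951,-0.271796132
04/06/1951,-0.258977158
05/06/1951,-0.198823057
06/08/1951,0.167794502
09/02/1952,-0.121824587
"""

# Same season definition as in the script: month number -> season name
LOOKUP_SEASON = {
    month: name
    for name, months in [('Winter', (11, 12, 1)), ('Spring', (2, 3, 4)),
                         ('Summer', (5, 6, 7)), ('Autumn', (8, 9, 10))]
    for month in months
}


def get_year(row):
    date = datetime.strptime(row[0], '%d/%m/%Y')
    return date.year - 1 if date.month < 8 else date.year


def get_season(row):
    date = datetime.strptime(row[0], '%d/%m/%Y')
    season = LOOKUP_SEASON[date.month]
    if season != 'Winter':
        return '{} {}'.format(season, date.year)
    start = date.year - 1 if date.month == 1 else date.year
    return '{} {}/{}'.format(season, start, start + 1)


rows = csv.reader(io.StringIO(SAMPLE))
result = {}
for year, year_rows in groupby(rows, get_year):
    # For each year, record every season label and how many rows it holds
    result[year] = [(season, len(list(season_rows)))
                    for season, season_rows in groupby(year_rows, get_season)]

for year, seasons in result.items():
    print(year, seasons)
# 1950 [('Winter 1950/1951', 2), ('Spring 1951', 3), ('Summer 1951', 2)]
# 1951 [('Autumn 1951', 1), ('Spring 1952', 1)]
```

The inner groups are consumed completely before the outer loop moves on. That matters because the sub-iterators returned by groupby share the underlying reader.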