I came up with a solution, but I'd really appreciate suggestions for alternative approaches, or for a more efficient way of implementing this one.
I first used the pandas shift method to add shifted lon/lat columns (inspired by this SO question), so that each distance calculation could be performed over a single row.
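As a minimal sketch of the shift step, assuming a small made-up DataFrame with the animal_id, lons and lats column names used in this answer (the values are hypothetical): shifting within each animal_id group keeps one animal's last position from becoming the next animal's origin.

```python
import pandas as pd

# Hypothetical sample data: two animals, a few positions each
df = pd.DataFrame({
    'animal_id': [1, 1, 1, 2, 2],
    'lons': [10.0, 10.5, 11.0, 20.0, 20.5],
    'lats': [50.0, 50.2, 50.4, 60.0, 60.1],
})

# Shift within each animal so every row sees that animal's previous position
grouped = df.groupby('animal_id')
df['lons+1'] = grouped['lons'].shift(1)
df['lats+1'] = grouped['lats'].shift(1)

# The first row of each animal has no previous position; reuse its own
df['lons+1'] = df['lons+1'].fillna(df['lons'])
df['lats+1'] = df['lats+1'].fillna(df['lats'])
```

With origin and destination on the same row, a row-wise function only needs to read four columns.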
Then I used the pandas apply method (as was suggested here) to implement the pyproj.Geod.inv calculation, looping through slices of the pandas DataFrame for each individual in the population.
def calc_distspd(df):
    '''Broadcast pyproj distance calculation over pandas dataframe'''
    import pyproj
    import numpy as np

    def calcdist(x):
        '''Pandas broadcast function for pyproj distance calculations'''
        # Geod.inv returns (fwd azimuth, back azimuth, distance in metres)
        return g.inv(x['lons+1'], x['lats+1'], x['lons'], x['lats'])[2]

    # Define Earth ellipsoid for dist calculations
    g = pyproj.Geod(ellps='WGS84')

    # Create array of zeros to initialize new columns
    fill_data = np.zeros(df['date'].shape)

    # Create new columns for calculated values
    df['dist'] = fill_data
    df['sog'] = fill_data
    df['lons+1'] = fill_data
    df['lats+1'] = fill_data

    # Get list of unique animal_ids
    animal_ids = np.unique(df.animal_id.values)

    # Perform function broadcast for each individual
    for animal_id in animal_ids:
        idx = df['animal_id'] == animal_id

        # Add shifted position columns for dist calculations,
        # shifting only within this animal's rows
        df.loc[idx, 'lons+1'] = df.loc[idx, 'lons'].shift(1)  # lon+1 = origin position
        df.loc[idx, 'lats+1'] = df.loc[idx, 'lats'].shift(1)  # lat+1 = origin position

        # Copy 1st position over shifted column nans to prevent error
        idx2 = idx & np.isnan(df['lons+1'])
        df.loc[idx2, 'lons+1'] = df.loc[idx2, 'lons']
        df.loc[idx2, 'lats+1'] = df.loc[idx2, 'lats']

        # Use .loc to avoid chained assignment on a copy
        df.loc[idx, 'dist'] = df[idx].apply(calcdist, axis=1)
        df.loc[idx, 'sog'] = df.loc[idx, 'dist'] / 24.  # Calc hourly speed

    # Remove shifted position columns from df
    del df['lons+1']
    del df['lats+1']
    return df
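On the efficiency question, one possible alternative is sketched below. It assumes the same animal_id, lons and lats columns, and relies on pyproj.Geod.inv accepting arrays, so the row-wise apply and the loop over animals can be replaced by a grouped shift and a single call. The name calc_dist_vectorized and the sample data are hypothetical, and this is an unbenchmarked sketch rather than a drop-in replacement.

```python
import pandas as pd
import pyproj


def calc_dist_vectorized(df):
    '''Sketch: per-animal step distance (m) and hourly speed, one Geod.inv call'''
    g = pyproj.Geod(ellps='WGS84')
    out = df.copy()

    # Previous position within each animal; the first row falls back to itself
    grouped = out.groupby('animal_id')
    prev_lons = grouped['lons'].shift(1).fillna(out['lons'])
    prev_lats = grouped['lats'].shift(1).fillna(out['lats'])

    # Geod.inv returns (forward azimuth, back azimuth, distance in metres)
    _, _, dist = g.inv(prev_lons.to_numpy(), prev_lats.to_numpy(),
                       out['lons'].to_numpy(), out['lats'].to_numpy())

    out['dist'] = dist
    out['sog'] = out['dist'] / 24.  # same divide-by-24 step as the function above
    return out


# Hypothetical data: two animals on the equator
sample = pd.DataFrame({
    'animal_id': [1, 1, 2, 2],
    'lons': [0.0, 1.0, 5.0, 5.0],
    'lats': [0.0, 0.0, 0.0, 0.0],
})
result = calc_dist_vectorized(sample)
```

Unlike calc_distspd, this version returns a copy and leaves the input DataFrame unchanged, so no temporary columns need to be deleted afterwards.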