سؤال

I am trying to work out an effective date for any given date. The dataframe has a column which is populated by the BMonthEnd (last business day of month taking into account holidays - calculated by code not shown here)

the partial dataframe shown below has the EffectiveDate equal to the Date as 1st step

            Date        BMonthEnd   EffectiveDate
2014-08-24  2014-08-24  2014-08-29  2014-08-24
2014-08-25  2014-08-25  2014-08-29  2014-08-25
2014-08-26  2014-08-26  2014-08-29  2014-08-26
2014-08-27  2014-08-27  2014-08-29  2014-08-27
2014-08-28  2014-08-28  2014-08-29  2014-08-28
2014-08-29  2014-08-29  2014-08-29  2014-08-29
2014-08-30  2014-08-30  2014-08-29  2014-08-30
2014-08-31  2014-08-31  2014-08-29  2014-08-31

I now try to select out the data that need to be changed with:

df[~(df.Date<df.BMonthEnd)].EffectiveDate  # giving the expected slice
# but 
df[~(df.Date<df.BMonthEnd)].EffectiveDate = 1
# gives error

SettingWithCopyWarning: A value is trying to be set on a copy of a slice
from a DataFrame. Try using .loc[row_index,col_indexer] = value instead
self[name] = value

following the warning i tried the alternate method i tried:

df.loc[~(df.Date<df.BMonthEnd)].EffectiveDate = 1

this also gives the same error. (note the 1 used in assignment is just placeholder for another function) and the assignment does not reflect on the original dataframe. I understand that I am effectively assigning to a copy so that it does not change the original dataframe as intended.

How do I however achieve my goal of using the selecting syntax to assign. I really do not want to have to iterate over the dataframe.

هل كانت مفيدة؟

المحلول

Figured it out. Selecting out the Series in the Dataframe effectively allows me to assign to it and the original dataframe. this allows me to use the slicing syntac to apply logic influencing the results:

# not all methods, classes shown
def effective_date(dr):
    df = pd.DataFrame(dr, index=dr, columns=['Date'])
    df['BMonthEnd'] = df.Date.apply(h.last_business_day)
    df['MonthEnd'] = df.Date.apply(h.month_end)
    df['EffectiveDate'] = df.Date
    # df.EffectiveDate[~(df.Date<df.BMonthEnd)] = df.MonthEnd
    df.loc[~(df.Date<df.BMonthEnd),'EffectiveDate'] = df.MonthEnd
    return df.EffectiveDate

Have Updated it with Jeff's suggestion. See now why chain indexing can get you into trouble. Have done a few timeits and they seem to be faster, but when assigning to the dataframe .loc is the better option.

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top