It's a floating-point precision error. Because the s_act column has a missing value (the default integer dtype in pandas cannot hold missing values), it reads s_act in with dtype=float (dtypes are defined at the column level in pandas). A float64 can only represent integers exactly up to 2**53, so you're basically seeing the following:
>>> x = 4321113141090630389
>>> float(x)
4.32111314109063e+18
>>> int(float(x))
4321113141090630144
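To make the threshold concrete, here is a minimal sketch using only plain Python. The helper name `survives_float` and the constant `LIMIT` are made up for illustration and are not part of pandas. It checks whether an integer comes back unchanged after being routed through a float, which is in effect what happens to each value in the column.

```python
# float64 has a 53-bit significand, so every integer up to 2**53 is exact.
LIMIT = 2 ** 53  # 9007199254740992


def survives_float(n):
    """Return True if n is unchanged by an int -> float -> int round trip."""
    return int(float(n)) == n


x = 4321113141090630389

print(survives_float(LIMIT))      # True: still exactly representable
print(survives_float(LIMIT + 1))  # False: rounds back down to 2**53
print(survives_float(x))          # False: x is far above 2**53
print(x - int(float(x)))          # 245: the amount lost to rounding
```

At the magnitude of x (between 2**61 and 2**62), adjacent float64 values are 512 apart, so the low digits of the integer cannot be kept.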
In terms of a solution, you could read s_act in as a string (the resulting dtype will be object) and convert it to an integer once the missing values have been filled in. For example:
import pandas as pd
from io import StringIO

data = """
id,val,x
1,4321113141090630389,4
2,,5
3,200,4
"""
# Read val as a string so the large integer never passes through float64.
df = pd.read_csv(StringIO(data), header=0, dtype={'val': str})
print(df)
   id                  val  x
0   1  4321113141090630389  4
1   2                  NaN  5
2   3                  200  4
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 2
Data columns (total 3 columns):
id     3 non-null int64
val    2 non-null object
x      3 non-null int64
# Replace the missing value with 0, then convert the column to 64-bit integers.
df['val'] = df['val'].fillna(0).astype('int64')
print(df)
   id                  val  x
0   1  4321113141090630389  4
1   2                    0  5
2   3                  200  4
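If 0 is a legitimate value in your data and you would rather keep the gap as missing, one possible variation (not part of the approach above, and assuming a pandas version that provides the nullable "Int64" extension dtype) is to convert the strings with Python's exact int() and store the result in a nullable integer column. The intermediate list name `exact` is only illustrative.

```python
from io import StringIO

import pandas as pd

data = """id,val,x
1,4321113141090630389,4
2,,5
3,200,4
"""

# Read val as text first so nothing is routed through float64.
df = pd.read_csv(StringIO(data), dtype={"val": str})

# Convert each string with Python's arbitrary-precision int();
# anything that is not a string (the missing entry) becomes None.
exact = [int(s) if isinstance(s, str) else None for s in df["val"]]

# "Int64" (capital I) is the nullable integer dtype: it stores int64
# values alongside a separate mask that marks missing entries.
df["val"] = pd.array(exact, dtype="Int64")

print(df)
print(df.dtypes)
```

The trade-off is that the column is now an extension dtype rather than a plain NumPy int64, which some downstream code may not expect.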