Question

I am trying to import data from an Excel file. The data is spread across several sheets within the file and looks like this:

sheet1:

Names  Values  pvalues   
Bread   3      0.001  
Milk    2      0.003  
Eggs    1      0.001

sheet2:

Names  Values  pvalues   
Bread     6    0.002  
Cheese    2    0.003  
Salad    11    0.0001

I would like to end up with a DataFrame with this structure:

Names  Values_sheet1  Values_sheet2  
Bread     3             6  
Milk      2             0  
Eggs      1             0  
Cheese    0             4  
Salad     0            11
  • How can I merge the 'Names' column so that each name appears only once, while keeping the corresponding data from each sheet in the other columns?

With the help of the documentation and other posts, I came up with the following:

import pandas as pd

input_handle = pd.ExcelFile('file.xls')

# get a dictionary with the sheet names as keys and their data as values
dfs = {sheet_name: input_handle.parse(sheet_name) for sheet_name in input_handle.sheet_names}

# keep track of the sheet names (dfs.keys must be called, not iterated directly)
SheetNames = list(dfs.keys())

# get a new dataframe with merged data from each spreadsheet
New_df = [pd.merge(dfs[name], dfs[name], on='Names') for name in SheetNames]

The last line in my code doesn't work... I managed to get pd.merge to work when merging 2 spreadsheets, but it only returns the names common to both and discards the rest...
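The rows disappear because pd.merge defaults to how='inner', which keeps only the keys present in both frames; how='outer' keeps the union. A minimal sketch of merging every sheet that way, assuming each sheet has Names, Values and pvalues columns. The dfs dict below is built by hand as a hypothetical stand-in for the one read from file.xls (pd.read_excel('file.xls', sheet_name=None) also returns such a dict of DataFrames keyed by sheet name).

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-in for the dict read from the Excel file:
# one DataFrame per sheet, keyed by sheet name.
dfs = {
    "sheet1": pd.DataFrame({"Names": ["Bread", "Milk", "Eggs"],
                            "Values": [3, 2, 1],
                            "pvalues": [0.001, 0.003, 0.001]}),
    "sheet2": pd.DataFrame({"Names": ["Bread", "Cheese", "Salad"],
                            "Values": [6, 2, 11],
                            "pvalues": [0.002, 0.003, 0.0001]}),
}

# Keep Names + Values and tag Values with its sheet name so that
# every non-key column name is unique before merging.
renamed = [
    df[["Names", "Values"]].rename(columns={"Values": f"Values_{name}"})
    for name, df in dfs.items()
]

# how="outer" keeps names found in any sheet; a name missing from a
# sheet gets NaN in that sheet's column, which is then replaced by 0.
merged = reduce(
    lambda left, right: pd.merge(left, right, on="Names", how="outer"),
    renamed,
)
value_cols = [c for c in merged.columns if c != "Names"]
merged[value_cols] = merged[value_cols].fillna(0).astype(int)
print(merged)
```

reduce chains the pairwise merges, so the same code works for any number of sheets.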

  • Is it also possible to keep track of the pvalues for each value?

Thank you very much for any insight or help!

Solution

Each sheet's Values column, indexed by Names, can be handled as a pandas Series, and concatenating Series side by side aligns them on their index. Here is an example that should help you out (note that I have omitted the pvalues column for brevity):

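import pandas as pd

sheet1 = pd.Series([3, 2, 1], index=['Bread', 'Milk', 'Eggs'], name='Values_sheet1')
sheet2 = pd.Series([6, 2, 11], index=['Bread', 'Cheese', 'Salad'], name='Values_sheet2')

# concatenate the two series side by side (aligned on the names), sort the
# rows alphabetically and fill in the missing data with zeros;
# astype(int) undoes the upcast to float caused by the missing values
result = pd.concat([sheet1, sheet2], axis=1, sort=True).fillna(0).astype(int)

print(result)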

This should print:

        Values_sheet1  Values_sheet2
Bread               3              6
Cheese              0              2
Eggs                1              0
Milk                2              0
Salad               0             11
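The same idea extends to any number of sheets without writing each Series by hand. A minimal sketch, assuming the sheets are in a dict of DataFrames with Names and Values columns; the dfs contents below are a hypothetical stand-in for the data read from file.xls.

```python
import pandas as pd

# Hypothetical stand-in for the dict of sheets read from the Excel file.
dfs = {
    "sheet1": pd.DataFrame({"Names": ["Bread", "Milk", "Eggs"],
                            "Values": [3, 2, 1]}),
    "sheet2": pd.DataFrame({"Names": ["Bread", "Cheese", "Salad"],
                            "Values": [6, 2, 11]}),
}

# One Series per sheet: Values indexed by Names, named after the sheet.
series = [
    df.set_index("Names")["Values"].rename(f"Values_{name}")
    for name, df in dfs.items()
]

# axis=1 aligns the Series on the names; sort=True gives alphabetical rows,
# and a name missing from a sheet gets 0 in that sheet's column.
result = pd.concat(series, axis=1, sort=True).fillna(0).astype(int)
print(result)
```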

Also, your desired final result has this row:

Cheese    0             4 

I'm guessing the 4 is a typo and should be 2, as in this row from sheet2:

Cheese    2    0.003 
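As for keeping the pvalues: one option is to concatenate whole sheets rather than single Series, labelling each with its sheet name through the keys argument of pd.concat, so every Values and pvalues column stays attached to its sheet. A minimal sketch, assuming the same sheet layout; dfs is again a hypothetical hand-built stand-in, and the column naming scheme is only one choice.

```python
import pandas as pd

# Hypothetical stand-in for the dict of sheets read from the Excel file.
dfs = {
    "sheet1": pd.DataFrame({"Names": ["Bread", "Milk", "Eggs"],
                            "Values": [3, 2, 1],
                            "pvalues": [0.001, 0.003, 0.001]}),
    "sheet2": pd.DataFrame({"Names": ["Bread", "Cheese", "Salad"],
                            "Values": [6, 2, 11],
                            "pvalues": [0.002, 0.003, 0.0001]}),
}

# Index each sheet by Names and place the sheets side by side; keys= adds
# the sheet name as the outer column level, e.g. ("sheet1", "pvalues").
combined = pd.concat(
    [df.set_index("Names") for df in dfs.values()],
    axis=1,
    keys=list(dfs.keys()),
    sort=True,
)

# Flatten the two column levels into names like Values_sheet1, pvalues_sheet2.
combined.columns = [f"{col}_{sheet}" for sheet, col in combined.columns]

# Missing Values become 0 as in the desired output; missing pvalues are
# left as NaN, since a p-value of 0 would be misleading.
value_cols = [c for c in combined.columns if c.startswith("Values_")]
combined[value_cols] = combined[value_cols].fillna(0).astype(int)
print(combined)
```

Leaving the two column levels unflattened also works; combined["sheet1"] would then select both columns of that sheet.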

Hope that helps.

Licensed under: CC-BY-SA with attribution