Pandas Pivot Table Display in ReportLab

https://stackoverflow.com/questions/18724778

28-06-2022

Question

I am trying to render the output of a Pandas pivot_table as a table in ReportLab, following the pattern found at https://stackoverflow.com/a/17652442/2478647.

import numpy as np
import pandas as pd
from reportlab.pdfgen import canvas
from reportlab.platypus import SimpleDocTemplate, Table, Paragraph
from reportlab.lib import colors
from reportlab.lib.pagesizes import letter, legal, portrait, landscape
from reportlab.lib.styles import getSampleStyleSheet

# 8 rows of random sample data in two value columns
df = pd.DataFrame(np.random.randn(8, 2), columns=['var A', 'var B'])
df['year'] = ['2013','2013','2013','2013','2014','2014','2014','2014']
df['run'] = ['base','base','option','option','base','base','option','option']
df['id'] = [1,2,1,2,1,2,1,2]

# newer pandas versions name these arguments index/columns (formerly rows/cols)
df.pivoted = pd.pivot_table(df, values=['var A','var B'], index=['id'], columns=['year','run'], aggfunc='sum')

doc = SimpleDocTemplate('temp.pdf', pagesize=landscape(letter), showBoundary=0, 
                            topMargin=72*.75,
                            bottomMargin=72*1,
                            leftMargin=72*.5,
                            rightMargin=72*.5)

lista = [df.pivoted.columns[:,].values.astype(str).tolist()] + df.pivoted.values.tolist()

elements = []
table = Table(lista, repeatRows=3) # repeat the header rows
elements.append(table)    
doc.build(elements)

I get this error at the 'lista = ...' line because of the multiple column labels:

ValueError: cannot set an array element with a sequence

How can I structure the code so that the pivot_table columns will play nice with reportlab? Alternatively, do you have any suggestions for a different approach to writing PDF reports with pivot_table output?

EDIT: I get pretty close with this modification, but it still doesn't retain the row labels (the pivot table's index):

lista = list(map(list, zip(*df.pivoted.columns.values))) + df.pivoted.values.tolist()

Solution

This function gets pretty close: it returns a list suitable as input to a reportlab Table, together with the number of table header rows to repeat. For some reason, it doesn't work well with simple tables, i.e. those with only one header row.

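def prepare_df_for_reportlab(df):
    df2 = df.reset_index() # reset the index so row labels show up in the reportlab table
    n = df2.columns.nlevels # number of table header rows to repeat
    if n > 1:
        # MultiIndex columns: transpose the label tuples so each level becomes one header row
        # (list(...) is needed on Python 3, where map returns an iterator)
        labels = list(map(list, zip(*df2.columns.values)))
    else:
        # single-level columns: one header row of string labels
        labels = [df2.columns.astype(str).tolist()]
    values = df2.values.tolist()
    datalist = labels + values
    return datalist, n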
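
To make the shape of the result concrete, here is a minimal sketch that runs the same idea on fixed sample data and needs only pandas. The helper name to_table_rows and the sample values are hypothetical and chosen for illustration; the call uses the index/columns keyword names from current pandas. It is a compact variant of the function above, not a replacement for it.

```python
import pandas as pd


def to_table_rows(frame):
    """Return (rows, n_header_rows): header rows first, then the data rows."""
    flat = frame.reset_index()  # move the row labels into ordinary columns
    n_header_rows = flat.columns.nlevels
    if n_header_rows > 1:
        # MultiIndex columns: every label is a tuple; transposing the tuples
        # turns each column level into one header row
        header = [list(level) for level in zip(*flat.columns)]
    else:
        # single-level columns: one header row of string labels
        header = [[str(label) for label in flat.columns]]
    return header + flat.values.tolist(), n_header_rows


# Fixed sample data (hypothetical values) with the same layout as in the question
df = pd.DataFrame({
    'var A': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    'var B': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
    'year': ['2013', '2013', '2013', '2013', '2014', '2014', '2014', '2014'],
    'run': ['base', 'base', 'option', 'option', 'base', 'base', 'option', 'option'],
    'id': [1, 2, 1, 2, 1, 2, 1, 2],
})
pivoted = pd.pivot_table(df, values=['var A', 'var B'], index=['id'],
                         columns=['year', 'run'], aggfunc='sum')

data, n_header = to_table_rows(pivoted)
for row in data:
    print(row)
```

Here the first three rows of data hold the value, year and run labels, with the id label in the first cell of the top row, and the remaining rows hold one id each. The pair can then be passed to reportlab as Table(data, repeatRows=n_header), as in the question.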

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow