Question

Using WinPython 3.4 and matplotlib 1.3.1, I'm pulling data into a dataframe from a MySQL database. The raw dataframe returned by the query looks like this:

            wafer_number test_type  test_pass  x_coord  y_coord  test_el_id wavelength intensity
        0       HT2731      T2          1       38       54          24      288.68   4413
        1       HT2731      T2          1       40       54          25      257.42   2595
        2       HT2731      T2          1       50       54          28      300.00   2836
        3       HT2731      T2          1       52       54          29      300.00   2862
        4       HT2731      T2          1       54       54          30      300.00   3145
        5       HT2731      T2          1       56       54          31      300.00   2804
        6       HT2731      T2          1       58       54          32      255.69   2803
        7       HT2731      T2          1       59       54          33      257.23   2991
        8       HT2731      T2          1       60       54          34      262.45   3946
        9       HT2731      T2          1       62       54          35      291.84   9398
        10      HT2801      T2          1       38       55          54      288.68   4125
        11      HT2801      T2          1       38       56          55      265.25   4258

What I need is to plot wavelength and intensity on the x and y axes respectively, with each wafer number as its own series. I need to keep the x_coord and y_coord variables so that I can later identify standout data points, ideally by clicking on them and adding them to a list. I'll work on that once I have the data plotted.
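
For the click-to-collect part, matplotlib's pick events are one way to do it. This is a minimal sketch under stated assumptions, not part of the original answer: each wafer is plotted as its own series with `picker=5` (a 5-point click tolerance), a hypothetical `groups` dict maps each plotted line back to the rows it came from, and a hypothetical `on_pick` handler appends `(wafer_number, x_coord, y_coord)` to a `picked` list. The sketch forces the non-interactive Agg backend so it runs headless; remove that line to actually click on points.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove this line to click in a real window
import matplotlib.pyplot as plt
import pandas as pd

# A few rows from the question's raw dataframe
df = pd.DataFrame({
    "wafer_number": ["HT2731", "HT2731", "HT2731", "HT2801", "HT2801"],
    "x_coord": [38, 40, 62, 38, 38],
    "y_coord": [54, 54, 54, 55, 56],
    "wavelength": [288.68, 257.42, 291.84, 288.68, 265.25],
    "intensity": [4413, 2595, 9398, 4125, 4258],
})

picked = []  # (wafer_number, x_coord, y_coord) for every point that gets clicked

fig, ax = plt.subplots()
groups = {}  # hypothetical helper: maps each plotted Line2D back to the rows it was drawn from
for wafer, group in df.groupby("wafer_number"):
    # picker=5 makes the line respond to clicks within 5 points of a marker
    (line,) = ax.plot(group["wavelength"], group["intensity"],
                      ls="", marker="o", label=wafer, picker=5)
    groups[line] = group.reset_index(drop=True)  # positions 0..n-1 match event.ind

def on_pick(event):
    # event.artist is the clicked line; event.ind lists the indices of the points hit
    group = groups[event.artist]
    for i in event.ind:
        row = group.iloc[i]
        picked.append((row["wafer_number"], int(row["x_coord"]), int(row["y_coord"])))

fig.canvas.mpl_connect("pick_event", on_pick)
ax.legend()
plt.show()
```

Because each series keeps its full sub-dataframe, the coordinates never have to be squeezed into the plot itself; the handler just looks them up by position.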

I thought that using the built-in dataframe plotting capability required me to call the pivot_table method

wl_vs_int = results.pivot_table(values='intensity', rows=['x_coord', 'y_coord','wavelength'], cols='wafer_number')

on my dataframe, which then turns the dataframe into:

        wafer_number    HT2478  HT2625  HT2644  HT2671  HT2673  HT2719  HT2731  HT2796  HT2801
 x_coord  y_coord   wavelength                                  
    27      35  289.07   NaN     NaN     NaN     5137    NaN     NaN     NaN     NaN     NaN
            36  250.88   4585    NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
            37  260.90   NaN     NaN     NaN     NaN     4270    NaN     NaN     NaN     NaN
            38  288.87   NaN     NaN     NaN     8191    NaN     NaN     NaN     NaN     NaN
            40  259.74   NaN     NaN     NaN     NaN     17027   NaN     NaN     NaN     NaN
            41  259.74   NaN     NaN     NaN     NaN     18742   NaN     NaN     NaN     NaN
            42  259.74   NaN     NaN     NaN     NaN     34098   NaN     NaN     NaN     NaN
    28      34  268.27   NaN     NaN     NaN     NaN     2080    NaN     NaN     NaN     NaN
            38  257.42   7727    NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
            44  260.13   NaN     NaN     NaN     NaN     55329   NaN     NaN     NaN     NaN

but now the index is a MultiIndex of the x and y coords and the wavelength, so when I simply try to plot wavelength against the columns,

plt.scatter(wl_vs_int.wavelength, wl_vs_int.columns)

I get the AttributeError:

AttributeError: 'DataFrame' object has no attribute 'wavelength'

I've tried to reindex the dataframe back to a default index, but I still get the error that the 'DataFrame' object has no attribute 'wavelength'.
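
The error comes from where the pivot put the data: wavelength became one level of the row MultiIndex, so it is no longer a column and attribute access fails. reindex() only conforms the data to a new set of labels; reset_index() is what moves index levels back into ordinary columns. A minimal sketch on three rows of the question's data, assuming a current pandas where pivot_table takes index=/columns= instead of the old rows=/cols= arguments:

```python
import pandas as pd

# Three rows from the question's raw dataframe
df = pd.DataFrame({
    "wafer_number": ["HT2731", "HT2731", "HT2801"],
    "x_coord": [38, 40, 38],
    "y_coord": [54, 54, 55],
    "wavelength": [288.68, 257.42, 288.68],
    "intensity": [4413, 2595, 4125],
})

# Newer pandas spells rows=/cols= as index=/columns=
wl_vs_int = df.pivot_table(values="intensity",
                           index=["x_coord", "y_coord", "wavelength"],
                           columns="wafer_number")

print(list(wl_vs_int.index.names))        # wavelength is an index level...
print("wavelength" in wl_vs_int.columns)  # ...not a column, hence the AttributeError

# reset_index() turns every index level back into a regular column
flat = wl_vs_int.reset_index()
print(flat["wavelength"].tolist())
```

Even after reset_index(), each wafer is still a separate column padded with NaN, which is why grouping the raw dataframe instead of pivoting it is usually the simpler route.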

There has to be a better way, either to rearrange the dataframe so this works with the built-in dataframe plotting, or to plot only selected columns against other columns (with the columns chosen dynamically). I'm clearly new to Python and pandas, but I've spent days trying this in different ways without success. Any help would be greatly appreciated. Thanks.
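
On plotting selected columns against other columns dynamically: column names are plain strings, so they can be passed around as variables. A minimal sketch, where plot_by_group is a hypothetical helper (not a pandas or matplotlib function) that scatters one column against another with one series per group. It relies on ax.scatter advancing matplotlib's color cycle on each call, which holds for matplotlib 2.0 and later (1.3.1 would draw every group in the same default color unless colors are passed explicitly).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd

def plot_by_group(df, x, y, by, ax=None):
    """Scatter column `x` against column `y`, one series per distinct value of column `by`."""
    if ax is None:
        _, ax = plt.subplots()
    for key, group in df.groupby(by):
        ax.scatter(group[x], group[y], label=str(key))  # each call takes the next color in the cycle
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend()
    return ax

# A few rows from the question's raw dataframe
df = pd.DataFrame({
    "wafer_number": ["HT2731", "HT2731", "HT2731", "HT2801", "HT2801"],
    "wavelength": [288.68, 257.42, 291.84, 288.68, 265.25],
    "intensity": [4413, 2595, 9398, 4125, 4258],
})

# The column names could just as well come from a loop, a config file, or user input
ax = plot_by_group(df, "wavelength", "intensity", "wafer_number")
plt.show()
```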


Solution

To plot wavelength and intensity on the x and y axes respectively, with each wafer number as its own series, you can group the data by wafer_number and then plot each group separately. No pivot is needed, and every group keeps its x_coord and y_coord columns:

import pandas as pd
from io import StringIO  # Python 3 (the question uses WinPython 3.4); Python 2 used "from StringIO import StringIO"
import matplotlib.pyplot as plt

data = \
"""wafer_number,test_type,test_pass,x_coord,y_coord,test_el_id,wavelength,intensity
HT2731,T2,1,38,54,24,288.68,4413
HT2731,T2,1,40,54,25,257.42,2595
HT2731,T2,1,50,54,28,300.00,2836
HT2731,T2,1,52,54,29,300.00,2862
HT2731,T2,1,54,54,30,300.00,3145
HT2731,T2,1,56,54,31,300.00,2804
HT2731,T2,1,58,54,32,255.69,2803
HT2731,T2,1,59,54,33,257.23,2991
HT2731,T2,1,60,54,34,262.45,3946
HT2731,T2,1,62,54,35,291.84,9398
HT2801,T2,1,38,55,54,288.68,4125
HT2801,T2,1,38,56,55,265.25,4258"""

df = pd.read_csv(StringIO(data), sep=',')
dfg = df.groupby('wafer_number')

colors = 'bgrcmyk'
fig, ax = plt.subplots()
# Iterating a groupby yields (wafer_number, sub-dataframe) pairs: one series per wafer
for i, (k, currentGroup) in enumerate(dfg):
    color = colors[i % len(colors)]  # cycle through the 7 single-letter colors
    ax.plot(currentGroup['wavelength'].values, currentGroup['intensity'].values,
            ls='', color=color, label=k, marker='o', markersize=8)
legend = ax.legend(loc='upper center', shadow=True)
plt.xlabel('wavelength')
plt.ylabel('intensity')
plt.show()
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow