Question

How to make a boxplot where each row in my dataframe object is a box in the plot?

I have some stock data that I want to plot as a box plot. My data is from Yahoo Finance and includes Open, High, Low, Close, Adjusted Close, and Volume for each trading day. I want a box plot where each box represents one day of OHLC price action.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.io.data import DataReader

# get daily stock price data from yahoo finance for S&P500
SP = DataReader("^GSPC", "yahoo") 

SP.head()
             Open        High        Low         Close       Volume          Adj Close
Date                        
2010-01-04   1116.56     1133.87     1116.56     1132.99     3991400000      1132.99
2010-01-05   1132.66     1136.63     1129.66     1136.52     2491020000      1136.52
2010-01-06   1135.71     1139.19     1133.95     1137.14     4972660000      1137.14
2010-01-07   1136.27     1142.46     1131.32     1141.69     5270680000      1141.69
2010-01-08   1140.52     1145.39     1136.22     1144.98     4389590000      1144.98

plt.figure()
bp = SP.boxplot()

But when I plot this data frame as a boxplot, I only get one box with the Open, High, Low, and Close values of the entire Volume column.

I also tried resampling my daily Adjusted Close prices to get weekly OHLC:

close = SP['Adj Close']
wk = close.resample('W', how='ohlc')
wk.head()

             open        high        low         close
Date                
2010-01-10   1132.99     1144.98     1132.99     1144.98
2010-01-17   1146.98     1148.46     1136.03     1136.03
2010-01-24   1150.23     1150.23     1091.76     1091.76
2010-01-31   1096.78     1097.50     1073.87     1073.87
2010-02-07   1089.19     1103.32     1063.11     1066.19

This yields a box plot with 4 boxes. Each box shows the range of a column, not a row. For example, the first box, 'open', shows the open, close, high, and low of the entire 'open' column.

But what I actually want is 1 box for each 'Date' (index or row of my DataFrame). So the first Box will show the OHLC of the first row, '2010-01-10'. Second box will be the second row ('2010-01-17').

What I really want, though, is for each row in my original daily data (the SP DataFrame) to be its own OHLC box. Essentially I want daily candlesticks, generated with boxplot().

                 Open        High        Low         Close     
    Date                        
    2010-01-04   1116.56     1133.87     1116.56     1132.99

How do I do this using the Pandas DataFrame and Matplotlib boxplot()? I just want a basic box plot where each row of the DataFrame is an OHLC box. Nothing fancy at this point. Thanks!

Solution

As I said in the comments, you don't really want boxplots. Instead you should be making a candlestick chart. Here's some code to get you started.

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.finance import candlestick
from pandas.io.data import DataReader

# get daily stock price data from yahoo finance for S&P500
SP = DataReader("^GSPC", "yahoo")

# move the Date index into a regular column so it can be converted
SP.reset_index(inplace=True)
print(SP.columns)

# candlestick() needs the dates as matplotlib date numbers (floats)
SP['Date2'] = SP['Date'].apply(lambda date: mdates.date2num(date.to_pydatetime()))

fig, ax = plt.subplots()
# each row is passed as (time, open, close, high, low), the order candlestick() expects
csticks = candlestick(ax, SP[['Date2', 'Open', 'Close', 'High', 'Low']].values)
plt.show()
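
If matplotlib.finance is not available in your installation, the same kind of chart can be approximated with plain matplotlib calls. The following is only a rough sketch, not the code from the answer above: the names ROWS, candle_geometry and plot_candles are made up for illustration, and the five sample rows are copied from the SP.head() output in the question so that no download is needed. Each row becomes one candle: a thin vertical line from Low to High, and a bar between Open and Close.

```python
# Sample rows copied from SP.head() in the question: (date, open, high, low, close)
ROWS = [
    ("2010-01-04", 1116.56, 1133.87, 1116.56, 1132.99),
    ("2010-01-05", 1132.66, 1136.63, 1129.66, 1136.52),
    ("2010-01-06", 1135.71, 1139.19, 1133.95, 1137.14),
    ("2010-01-07", 1136.27, 1142.46, 1131.32, 1141.69),
    ("2010-01-08", 1140.52, 1145.39, 1136.22, 1144.98),
]


def candle_geometry(rows):
    """Turn OHLC rows into the numbers needed to draw one candle per row."""
    candles = []
    for x, (date, open_, high, low, close) in enumerate(rows):
        candles.append({
            "x": x,                            # horizontal position of the candle
            "label": date,                     # tick label for this candle
            "wick": (low, high),               # thin line from Low to High
            "body_bottom": min(open_, close),  # body spans Open..Close
            "body_height": abs(close - open_),
            "color": "green" if close >= open_ else "red",
        })
    return candles


def plot_candles(rows, width=0.6):
    """Draw one candle per row with plain matplotlib (imported only when called)."""
    import matplotlib.pyplot as plt

    candles = candle_geometry(rows)
    fig, ax = plt.subplots()
    for cd in candles:
        # wick: vertical line from Low to High
        ax.vlines(cd["x"], cd["wick"][0], cd["wick"][1], colors="black", linewidth=1)
        # body: bar between Open and Close, drawn on top of the wick
        ax.bar(cd["x"], cd["body_height"], width=width,
               bottom=cd["body_bottom"], color=cd["color"], zorder=3)
    ax.set_xticks([cd["x"] for cd in candles])
    ax.set_xticklabels([cd["label"] for cd in candles], rotation=45, ha="right")
    ax.set_ylabel("Price")
    fig.tight_layout()
    return fig, ax
```

Calling plot_candles(ROWS) and then plt.show() should display five candles, one per trading day. The geometry is kept in its own function so it can be checked without opening a plot window. Candles are placed at integer positions rather than at date numbers, which avoids gaps for weekends.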
Licensed under: CC-BY-SA with attribution