Skip to content Skip to sidebar Skip to footer

Is There A More Efficient Way To Convert Periodicity Of An Intraday Ohlc Dataframe In Python

In Python, using the pandas library I would like to convert my minutely data into daily data. After loading in the data (from a csv) and making a DatetimeIndex as the index the obj

Solution 1:

In master/0.13 (releasing very shortly), you can do this (in 0.12 this is a bit more manual to do, as you have to do it on the series individually)

In [7]: df = DataFrame(np.random.randn(10000,2),index=date_range('20130101 09:00:00',periods=10000,freq='1Min'),columns=['last','volume'])

In [8]: df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 10000 entries, 2013-01-0109:00:00to2013-01-0807:39:00
Freq: T
Data columns (total 2 columns):
last10000  non-nullvalues
volume    10000  non-nullvalues
dtypes: float64(2)
In [9]: df.resample('1D',how='ohlc')
Out[9]: 
                last                                  volume                              
                open      high       low     closeopen      high       low     close2013-01-010.8019823.343166-3.203291-0.3615020.2553562.723863-3.3194141.0733762013-01-020.1016873.378843-3.219792-1.1219001.2260994.103099-3.463014-0.4525942013-01-03-0.0518064.290010-4.099700-0.6373210.7131893.622728-3.236652-0.1044582013-01-040.8212153.058024-3.907862-1.5954490.8362342.821551-3.191774-0.3996032013-01-050.0849733.458210-3.1914551.426380-0.4024352.777447-2.9661651.2273982013-01-06-0.6699223.232865-3.9022371.846017-0.4400553.088109-3.7106403.0667252013-01-07-0.1227273.300163-3.3155011.7181631.0850663.373251-4.0296790.1878282013-01-080.3117853.073488-3.013702-0.627721-0.5022582.795292-2.772738-0.654676

[8rows x 8 columns]

This will work in 0.12

pd.concat(dict([ (k,df[k].resample('1D',how='ohlc')) for k in df.columns ]),axis=1)

Solution 2:

I'm very new to pandas and python but I came up with this which allows conversion to any time period

In my example, minData is minute data, stored in a flat format without any commas. My data came from quantquote.com

columnHeadings = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume', 'Split Factor', 'Earnings', 'Dividends']

minData = pd.read_csv(
    filename,
    header = None,
    names = columnHeadings, 
    parse_dates = [["Date", "Time"]],
    date_parser = lambda x: datetime.datetime.strptime(x, '%Y%m%d %H%M'), 
    index_col = "Date_Time",
    sep=' ')

xx = minData.to_period(freq="min")

openCol = DataFrame(xx.Open)
openCol2 = openCol.resample("M", how = 'first')

highCol = DataFrame(xx.High)
highCol2 = highCol.resample("M", how = 'max')

lowCol = DataFrame(xx.Low)
lowCol2 = lowCol.resample("M", how = 'min')

closeCol = DataFrame(xx.Close)
closeCol2 = closeCol.resample("M", how = 'last')

volumeCol = DataFrame(xx.Volume)
volumeCol2 = volumeCol.resample("M", how = 'sum')

#splitFactorCol = DataFrame(xx.SplitFactor)
#splitFactorCol.resample("M", how = 'first')


monthlyData = DataFrame(openCol2.Open)

monthlyData["High"] = highCol2.High
monthlyData["Low"] = lowCol2.Low
monthlyData["Close"] = closeCol2.Close
monthlyData["Volume"] = volumeCol2.Volume

I'm sure there must be a more concise method, but this works with the data I have and it allows me to use the same code to generate 15min, 1 hour, daily, weekly and monthly. And its fast.

Any improvements/comments would be gratefully received.

Kind Regards,

-Jason

Post a Comment for "Is There A More Efficient Way To Convert Periodicity Of An Intraday Ohlc Dataframe In Python"