Is There A More Efficient Way To Convert Periodicity Of An Intraday Ohlc Dataframe In Python
In Python, using the pandas library I would like to convert my minutely data into daily data. After loading in the data (from a csv) and making a DatetimeIndex as the index the obj
Solution 1:
In master/0.13 (releasing very shortly), you can do this (in 0.12 this is a bit more manual to do, as you have to do it on the series individually)
In [7]: df = DataFrame(np.random.randn(10000,2),index=date_range('20130101 09:00:00',periods=10000,freq='1Min'),columns=['last','volume'])
In [8]: df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 10000 entries, 2013-01-0109:00:00to2013-01-0807:39:00
Freq: T
Data columns (total 2 columns):
last10000 non-nullvalues
volume 10000 non-nullvalues
dtypes: float64(2)
In [9]: df.resample('1D',how='ohlc')
Out[9]:
last volume
open high low closeopen high low close2013-01-010.8019823.343166-3.203291-0.3615020.2553562.723863-3.3194141.0733762013-01-020.1016873.378843-3.219792-1.1219001.2260994.103099-3.463014-0.4525942013-01-03-0.0518064.290010-4.099700-0.6373210.7131893.622728-3.236652-0.1044582013-01-040.8212153.058024-3.907862-1.5954490.8362342.821551-3.191774-0.3996032013-01-050.0849733.458210-3.1914551.426380-0.4024352.777447-2.9661651.2273982013-01-06-0.6699223.232865-3.9022371.846017-0.4400553.088109-3.7106403.0667252013-01-07-0.1227273.300163-3.3155011.7181631.0850663.373251-4.0296790.1878282013-01-080.3117853.073488-3.013702-0.627721-0.5022582.795292-2.772738-0.654676
[8rows x 8 columns]
This will work in 0.12
pd.concat(dict([ (k,df[k].resample('1D',how='ohlc')) for k in df.columns ]),axis=1)
Solution 2:
I'm very new to pandas and python but I came up with this which allows conversion to any time period
In my example, minData is minute data, stored in a flat format without any commas. My data came from quantquote.com
columnHeadings = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume', 'Split Factor', 'Earnings', 'Dividends']
minData = pd.read_csv(
filename,
header = None,
names = columnHeadings,
parse_dates = [["Date", "Time"]],
date_parser = lambda x: datetime.datetime.strptime(x, '%Y%m%d %H%M'),
index_col = "Date_Time",
sep=' ')
xx = minData.to_period(freq="min")
openCol = DataFrame(xx.Open)
openCol2 = openCol.resample("M", how = 'first')
highCol = DataFrame(xx.High)
highCol2 = highCol.resample("M", how = 'max')
lowCol = DataFrame(xx.Low)
lowCol2 = lowCol.resample("M", how = 'min')
closeCol = DataFrame(xx.Close)
closeCol2 = closeCol.resample("M", how = 'last')
volumeCol = DataFrame(xx.Volume)
volumeCol2 = volumeCol.resample("M", how = 'sum')
#splitFactorCol = DataFrame(xx.SplitFactor)
#splitFactorCol.resample("M", how = 'first')
monthlyData = DataFrame(openCol2.Open)
monthlyData["High"] = highCol2.High
monthlyData["Low"] = lowCol2.Low
monthlyData["Close"] = closeCol2.Close
monthlyData["Volume"] = volumeCol2.Volume
I'm sure there must be a more concise method, but this works with the data I have and it allows me to use the same code to generate 15min, 1 hour, daily, weekly and monthly. And its fast.
Any improvements/comments would be gratefully received.
Kind Regards,
-Jason
Post a Comment for "Is There A More Efficient Way To Convert Periodicity Of An Intraday Ohlc Dataframe In Python"