Add Missing Timestamp Row To A Dataframe
I have a dataframe which contains data that were measured at two hours interval each day, some time intervals are however missing. My dataset looks like below: 2020-12-01 08:00:00
Solution 1:
Create DatetimeIndex
and use DataFrame.asfreq
:
print (df)
date val
0 2020-12-01 08:00:00 145.9
1 2020-12-01 10:00:00 100.0
2 2020-12-01 16:00:00 99.3
3 2020-12-01 18:00:00 91.0
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date').asfreq('2H')
print (df)
val
date
2020-12-01 08:00:00 145.9
2020-12-01 10:00:00 100.0
2020-12-01 12:00:00 NaN
2020-12-01 14:00:00 NaN
2020-12-01 16:00:00 99.3
2020-12-01 18:00:00 91.0
Solution 2:
assuming your df looks like
datetimevalue02020-12-01T08:00:00 145.912020-12-01T10:00:00 100.022020-12-01T16:00:00 99.332020-12-01T18:00:00 91.0
make sure datetime column is dtype datetime;
df['datetime'] = pd.to_datetime(df['datetime'])
so that you can now resample to 2-hourly frequency:
df.resample('2H',on='datetime').mean()valuedatetime2020-12-01 08:00:00 145.92020-12-01 10:00:00 100.02020-12-01 12:00:00 NaN2020-12-01 14:00:00 NaN2020-12-01 16:00:00 99.32020-12-01 18:00:00 91.0
Note that you don't need to set the on=
keyword if your df already has a datetime index. The df resulting from resampling will have a datetime index.
Also note that I'm using .mean()
as aggfunc, meaning that if you have multiple values within the two hour intervals, you'll get the mean of that.
Solution 3:
You can try the following:
I have used datetime and timedelta for this,
from datetime import datetime, timedelta
# Asuming that the data is given like below.
data = ['2020-12-01 08:00:00 145.9',
'2020-12-01 10:00:00 100.0',
'2020-12-01 16:00:00 99.3',
'2020-12-01 18:00:00 91.0']
# initialize the starttimeusing data[0]
date= data[0].split()[0].split('-')
time= data[0].split()[1].split(':')
start= datetime(int(date[0]), int(date[1]), int(date[2]), int(time[0]), int(time[1]), int(time[2]))
newdata = []
newdata.append(data[0])
i =1
while i < len(data):
cur =start
nxt =start+ timedelta(hours=2)
if (str(nxt) != (data[i].split()[0] +' '+ data[i].split()[1])):
newdata.append(str(nxt) +' NaN')
else:
newdata.append(data[i])
i+=1start= nxt
newdata
NOTE : temedelta(hours=2) will add 2 hours to the existing time.
Post a Comment for "Add Missing Timestamp Row To A Dataframe"