Skip to content Skip to sidebar Skip to footer

Add Missing Timestamp Row To A Dataframe

I have a dataframe which contains data that were measured at two hours interval each day, some time intervals are however missing. My dataset looks like below: 2020-12-01 08:00:00

Solution 1:

Create DatetimeIndex and use DataFrame.asfreq:

print (df)

                  date    val
0  2020-12-01 08:00:00  145.9
1  2020-12-01 10:00:00  100.0
2  2020-12-01 16:00:00   99.3
3  2020-12-01 18:00:00   91.0

df['date'] = pd.to_datetime(df['date'])

df = df.set_index('date').asfreq('2H')

print (df)
                       val
date                      
2020-12-01 08:00:00  145.9
2020-12-01 10:00:00  100.0
2020-12-01 12:00:00    NaN
2020-12-01 14:00:00    NaN
2020-12-01 16:00:00   99.3
2020-12-01 18:00:00   91.0

Solution 2:

assuming your df looks like

datetimevalue02020-12-01T08:00:00  145.912020-12-01T10:00:00  100.022020-12-01T16:00:00   99.332020-12-01T18:00:00   91.0

make sure datetime column is dtype datetime;

df['datetime'] = pd.to_datetime(df['datetime'])

so that you can now resample to 2-hourly frequency:

df.resample('2H',on='datetime').mean()valuedatetime2020-12-01 08:00:00  145.92020-12-01 10:00:00  100.02020-12-01 12:00:00    NaN2020-12-01 14:00:00    NaN2020-12-01 16:00:00   99.32020-12-01 18:00:00   91.0

Note that you don't need to set the on= keyword if your df already has a datetime index. The df resulting from resampling will have a datetime index.

Also note that I'm using .mean() as aggfunc, meaning that if you have multiple values within the two hour intervals, you'll get the mean of that.

Solution 3:

You can try the following:

I have used datetime and timedelta for this,

from datetime import datetime, timedelta

# Asuming that the data is given like below.
data = ['2020-12-01 08:00:00 145.9',
'2020-12-01 10:00:00 100.0',
'2020-12-01 16:00:00 99.3',
'2020-12-01 18:00:00 91.0']

# initialize the starttimeusing data[0]
date= data[0].split()[0].split('-')
time= data[0].split()[1].split(':')
start= datetime(int(date[0]), int(date[1]), int(date[2]), int(time[0]), int(time[1]), int(time[2]))

newdata = []
newdata.append(data[0])

i =1
while i < len(data):
    cur =start
    nxt =start+ timedelta(hours=2)
    
    if (str(nxt) != (data[i].split()[0] +' '+ data[i].split()[1])):
        newdata.append(str(nxt) +' NaN')
    else:
        newdata.append(data[i])
        i+=1start= nxt
    
newdata

NOTE : temedelta(hours=2) will add 2 hours to the existing time.

Post a Comment for "Add Missing Timestamp Row To A Dataframe"