I have CSV data that looks like the sample below, and I want to create a tf.data.Dataset with window functionality. For example, the first 5 rows (unixtime 1700323218 and 1700323219) should be grouped together and the last 5 rows (unixtime 1700323220 and 1700323221) should be grouped together. After reading a few guides on tf.data.Dataset, I still could not figure out how to build such a dataset. In general, the window size is a number of consecutive seconds of unixtime, and I want to batch all rows that fall into a given window.
unixtime,id,feature1,feature2
1700323218,a,0.01,0.01
1700323218,b,0.01,0.01
1700323218,c,0.01,0.01
1700323219,a,0.01,0.01
1700323219,b,0.01,0.01
1700323220,a,0.01,0.01
1700323220,b,0.01,0.01
1700323220,c,0.01,0.01
1700323221,b,0.01,0.01
1700323221,c,0.01,0.01
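To make the expected result concrete, the dataset should yield two elements (one batch per 2-second window), roughly like this (written out by hand from the data above):

# window 1: unixtime 1700323218-1700323219 (5 rows)
{'unixtime': [1700323218, 1700323218, 1700323218, 1700323219, 1700323219],
 'id': ['a', 'b', 'c', 'a', 'b'],
 'feature1': [0.01, 0.01, 0.01, 0.01, 0.01],
 'feature2': [0.01, 0.01, 0.01, 0.01, 0.01]}
# window 2: unixtime 1700323220-1700323221 (5 rows)
{'unixtime': [1700323220, 1700323220, 1700323220, 1700323221, 1700323221],
 'id': ['a', 'b', 'c', 'b', 'c'],
 'feature1': [0.01, 0.01, 0.01, 0.01, 0.01],
 'feature2': [0.01, 0.01, 0.01, 0.01, 0.01]}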
I tried to build the dataset with the code below, but it doesn't work:
#! /usr/bin/python
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers

max_rows = 10  # upper bound on the number of rows sharing one unixtime

df = pd.read_csv("./test.csv")
ds = tf.data.Dataset.from_tensor_slices(dict(df))

# Map each distinct unixtime to an integer key for group_by_window.
unixtime_vocabulary = layers.IntegerLookup(
    vocabulary=np.unique(df['unixtime'].to_numpy()))
key_func = lambda x: unixtime_vocabulary(x['unixtime'])
# Batch all rows that share the same unixtime into a single element.
reduce_func = lambda key, dataset: dataset.batch(max_rows)
ds = ds.group_by_window(key_func=key_func,
                        reduce_func=reduce_func,
                        window_size=max_rows)

# Window over 2 consecutive seconds of per-second batches.
windows = ds.window(2, shift=1, stride=1)  # what to do next?
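My best guess for the missing step, replacing the last line above, is the sketch below. It assumes non-overlapping windows (shift equal to the window length, to match the "first 5 rows / last 5 rows" grouping described above), and the names window_len and final_ds are mine; I am not sure this is correct or idiomatic:

window_len = 2  # seconds per window (my assumption)
# Non-overlapping windows over the per-second batches produced by group_by_window.
windowed = ds.window(window_len, shift=window_len, drop_remainder=True)
# Each window element is a dict of small per-column datasets. Zip them back into
# a dataset of per-second batches, split those into single rows, then re-batch so
# that every 2-second window becomes exactly one element.
final_ds = windowed.flat_map(
    lambda w: tf.data.Dataset.zip(w).unbatch().batch(window_len * max_rows))
for elem in final_ds:
    print(elem['unixtime'].numpy(), elem['id'].numpy())

Is this the right way to flatten the nested windows, or is there a more direct approach?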