I’ve been trying to profile a model’s performance so that I can optimise its inference and training time.
There are several pieces to it, but the main issue is that even following the TensorFlow Profiler guide does not currently produce a summary with current versions of the packages (tf 2.17, Keras 3.4.1, etc.).
Most of the profiler’s functionality works, but the summary page does not; it returns a cryptic error: No step marker observed and hence the step time is unknown.
I’ve dug through the tensorflow and tensorflow/profiler issue threads, but the workarounds there are for Docker or permissions problems, neither of which applies here.
The functionality that works is most of the list shown here:
I’ve created a simple Colab to reproduce the results; this is the Python code for it:
# !pip uninstall tensorflow keras tensorboard-plugin-profile tensorboard tb-nightly tensorboardX -y
# !pip install -U tensorflow==2.17.0 keras==3.4.1 tensorboard-plugin-profile
import tensorflow as tf
import keras
from keras.api.layers import Dense, Flatten
from keras.api.callbacks import TensorBoard
from datetime import datetime
import tensorflow_datasets as tfds
tfds.disable_progress_bar()
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

def normalize_img(image, label):
    """Normalizes images: `uint8` -> `float32`."""
    return tf.cast(image, tf.float32) / 255., label
ds_train = ds_train.map(normalize_img)
ds_train = ds_train.batch(128)
ds_train = ds_train.cache()
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)
ds_test = ds_test.map(normalize_img)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)
model = keras.models.Sequential([
    keras.layers.Input(shape=(28, 28, 1)),
    keras.layers.Flatten(),
    keras.layers.Dense(412, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=keras.optimizers.Adam(0.001),
    metrics=['accuracy']
)
logs = "logs/" + datetime.now().strftime("%Y%m%d-%H%M%S")
options = tf.profiler.experimental.ProfilerOptions(
    host_tracer_level=3,
    python_tracer_level=1,
    device_tracer_level=1,
)
tboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=logs, histogram_freq=1, profile_batch='10, 15')
tf.profiler.experimental.start(logs, options=options)
model.fit(ds_train, epochs=4, validation_data=ds_test,
          callbacks=[tboard_callback], steps_per_epoch=1)
tf.profiler.experimental.stop()
%load_ext tensorboard
%tensorboard --logdir="{logs}"
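One thing that may be worth ruling out in the repro above: with `epochs=4` and `steps_per_epoch=1`, training runs only 4 batches in total, while the callback is asked to profile batches 10 to 15. The sketch below is plain Python, not TensorFlow code. It assumes the TensorBoard callback counts batches globally across epochs, starting at 1, and the helper name `callback_profile_window_reached` is made up for illustration.

```python
def callback_profile_window_reached(epochs, steps_per_epoch, start_batch, stop_batch):
    """Return True if training runs enough batches to enter the profiling window.

    Assumption: the callback counts batches globally across epochs, starting at 1.
    """
    total_batches = epochs * steps_per_epoch
    return start_batch <= stop_batch and total_batches >= start_batch


# Values from the repro: 4 epochs x 1 step per epoch, profile_batch='10, 15'
print(callback_profile_window_reached(4, 1, 10, 15))   # False
# Hypothetical alternative: 20 steps per epoch
print(callback_profile_window_reached(4, 20, 10, 15))  # True
```

If that assumption holds, the callback's own profiling window is never entered in this run, and only the manual `tf.profiler.experimental.start`/`stop` pair captures a trace. This does not by itself explain the missing step markers.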
Can anyone suggest what’s going on here and how to get it fixed?