I am trying to learn experiment tracking with TensorBoard, and this is the first time I am using it. However, even though I believe my code is correct, TensorBoard still will not open.
Here is the error message:
The tensorboard extension is already loaded. To reload it, use:
%reload_ext tensorboard
ERROR: Failed to launch TensorBoard (exited with 2).
Contents of stderr:
2024-05-04 05:30:38.549269: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-05-04 05:30:38.549336: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-05-04 05:30:38.550587: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-04 05:30:39.592717: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
usage: tensorboard [-h] [--helpfull] [--logdir PATH] [--logdir_spec PATH_SPEC] [--host ADDR]
                   [--bind_all] [--port PORT] [--reuse_port BOOL] [--load_fast {false,auto,true}]
                   [--extra_data_server_flags EXTRA_DATA_SERVER_FLAGS]
                   [--grpc_creds_type {local,ssl,ssl_dev}] [--grpc_data_provider PORT]
                   [--purge_orphaned_data BOOL] [--db URI] [--db_import] [--inspect]
                   [--version_tb] [--tag TAG] [--event_file PATH] [--path_prefix PATH]
                   [--window_title TEXT] [--max_reload_threads COUNT] [--reload_interval SECONDS]
                   [--reload_task TYPE] [--reload_multifile BOOL]
                   [--reload_multifile_inactive_secs SECONDS] [--generic_data TYPE]
                   [--samples_per_plugin SAMPLES_PER_PLUGIN] [--detect_file_replacement BOOL]
                   {serve,dev} ...
tensorboard: error: argument {serve,dev}: invalid choice: '/content/runs' (choose from 'serve', 'dev')
Here is my code:
import torch

from typing import Dict, List
from tqdm.auto import tqdm
from going_modular.going_modular.engine import train_step, test_step

# train() below is adapted from:
# https://github.com/mrdbourke/pytorch-deep-learning/blob/main/going_modular/going_modular/engine.py
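# For context: `writer` used inside train() below is a SummaryWriter that I
# create before calling train(). Simplified sketch (the exact log_dir here is
# just an example; it points at the same folder I later pass to %tensorboard):
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter(log_dir="/content/runs")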
def train(model: torch.nn.Module,
          train_dataloader: torch.utils.data.DataLoader,
          test_dataloader: torch.utils.data.DataLoader,
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int,
          device: torch.device) -> Dict[str, List]:
"""Trains and tests a PyTorch model.
Passes a target PyTorch models through train_step() and test_step()
functions for a number of epochs, training and testing the model
in the same epoch loop.
Calculates, prints and stores evaluation metrics throughout.
Args:
model: A PyTorch model to be trained and tested.
train_dataloader: A DataLoader instance for the model to be trained on.
test_dataloader: A DataLoader instance for the model to be tested on.
optimizer: A PyTorch optimizer to help minimize the loss function.
loss_fn: A PyTorch loss function to calculate loss on both datasets.
epochs: An integer indicating how many epochs to train for.
device: A target device to compute on (e.g. "cuda" or "cpu").
Returns:
A dictionary of training and testing loss as well as training and
testing accuracy metrics. Each metric has a value in a list for
each epoch.
In the form: {train_loss: [...],
train_acc: [...],
test_loss: [...],
test_acc: [...]}
For example if training for epochs=2:
{train_loss: [2.0616, 1.0537],
train_acc: [0.3945, 0.3945],
test_loss: [1.2641, 1.5706],
test_acc: [0.3400, 0.2973]}
"""
    # Create empty results dictionary
    results = {"train_loss": [],
               "train_acc": [],
               "test_loss": [],
               "test_acc": []
               }

    # Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                           dataloader=train_dataloader,
                                           loss_fn=loss_fn,
                                           optimizer=optimizer,
                                           device=device)
        test_loss, test_acc = test_step(model=model,
                                        dataloader=test_dataloader,
                                        loss_fn=loss_fn,
                                        device=device)

        # Print out what's happening
        print(
            f"Epoch: {epoch+1} | "
            f"train_loss: {train_loss:.4f} | "
            f"train_acc: {train_acc:.4f} | "
            f"test_loss: {test_loss:.4f} | "
            f"test_acc: {test_acc:.4f}"
        )

        # Update results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)

        ### New: Experiment tracking ###
        # Add loss results to SummaryWriter
        writer.add_scalars(main_tag="Loss",
                           tag_scalar_dict={"train_loss": train_loss,
                                            "test_loss": test_loss},
                           global_step=epoch)
        writer.add_scalars(main_tag="Accuracy",
                           tag_scalar_dict={"train_acc": train_acc,
                                            "test_acc": test_acc},
                           global_step=epoch)

    # Close the writer after all epochs are done
    writer.close()

    return results
set_seeds()
results = train(model,
                train_dataloader,
                test_dataloader,
                optimizer,
                loss_function,
                epochs=5,
                device=device)
%load_ext tensorboard
%tensorboard --logsdir /content/runs
I tried reinstalling TensorBoard, but that doesn't seem to help.