I’m deploying a machine learning model using Python scripts (.py files) within an automated server workflow. The core of the model training process resides in model_training.py, which contains functions for data preprocessing, model training with hyperparameter optimization using Optuna, and model evaluation.
The deployment flow is orchestrated through main.py, where I execute the entire pipeline. Up until the stage where I retrieve best_params for model training, everything runs smoothly. However, at the best_params stage, the script appears to get stuck indefinitely, similar to what’s illustrated in the provided image (even when I test with n_trials=1 and early_stopping_rounds=1).
Here is model_training.py:
```python
import warnings

import lightgbm as lgb
import numpy as np
import optuna
from optuna.integration import LightGBMPruningCallback
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

warnings.filterwarnings(
    "ignore",
    message="Found `n_estimators` in params. Will use it instead of argument",
)

optuna.logging.set_verbosity(optuna.logging.INFO)

seed = 42
np.random.seed(42)


def train_validation_test_split(X, y, test_size=0.2, random_state=seed):
    """
    Split the input data into training and test sets.

    Parameters:
        X (array-like): The input features.
        y (array-like): The target variable.
        test_size (float): The proportion of the dataset to include in the test split.
        random_state (int): Controls the randomness of the training and testing indices.

    Returns:
        X_train (array-like): Training data for input features.
        X_test (array-like): Testing data for input features.
        y_train (array-like): Training data for the target variable.
        y_test (array-like): Testing data for the target variable.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    return X_train, X_test, y_train, y_test


def pre_lgb_dataset(X_train, X_test, y_train, y_test, cat_cols):
    """
    Build LightGBM Datasets for the training and test data.

    Parameters:
        X_train: training data features
        X_test: testing data features
        y_train: training data labels
        y_test: testing data labels
        cat_cols: list of categorical columns

    Returns:
        train_data: LightGBM Dataset for the training data
        test_data: LightGBM Dataset for the test data
    """
    train_data = lgb.Dataset(X_train, label=y_train, categorical_feature=cat_cols, free_raw_data=False)
    test_data = lgb.Dataset(X_test, label=y_test, categorical_feature=cat_cols, free_raw_data=False)
    return train_data, test_data


def train_optuna_cv(train_data, n_folds=5, n_trials=1, logging_period=10, early_stopping_rounds=10):
    """
    Train a LightGBM model using Optuna for hyperparameter optimization with cross-validation.

    Parameters:
        train_data: LightGBM Dataset used for cross-validation.
        n_folds: Number of folds for cross-validation (default is 5).
        n_trials: Number of optimization trials to run.
        logging_period: Interval for logging evaluation metrics during training (default is 10).
        early_stopping_rounds: Rounds to trigger early stopping if no improvement (default is 10).

    Returns:
        best_params: Dictionary of the best hyperparameters found by Optuna.
    """
    def objective(trial):
        # Define the hyperparameter search space
        params = {
            'objective': 'regression',
            'metric': 'rmse',
            'lambda_l1': trial.suggest_float('lambda_l1', 1e-8, 10.0, log=True),
            'lambda_l2': trial.suggest_float('lambda_l2', 1e-8, 10.0, log=True),
            'learning_rate': trial.suggest_float('learning_rate', 1e-3, 5e-1, log=True),
            'num_leaves': trial.suggest_int('num_leaves', 2, 256),
            'feature_fraction': trial.suggest_float('feature_fraction', 0.4, 1.0),
            'bagging_fraction': trial.suggest_float('bagging_fraction', 0.4, 1.0),
            'bagging_freq': trial.suggest_int('bagging_freq', 1, 7),
            'verbosity': -1,  # Suppress internal LightGBM logging
        }

        # Perform cross-validation
        cv_results = lgb.cv(
            params,
            train_data,
            nfold=n_folds,
            stratified=False,  # Stratification is usually not needed for regression
            shuffle=True,      # Shuffle data before splitting
            callbacks=[
                lgb.early_stopping(stopping_rounds=early_stopping_rounds),
                lgb.log_evaluation(period=logging_period),
                LightGBMPruningCallback(trial, 'rmse'),
            ],
            seed=42,
        )

        # Get the best score from cross-validation
        best_score = cv_results['valid rmse-mean'][-1]
        return best_score

    # Create an Optuna study and optimize
    study = optuna.create_study(direction='minimize')
    study.optimize(objective, n_trials=n_trials)

    # Return the best hyperparameters found
    best_params = study.best_params
    return best_params


def model_pred(best_params, train_data, val_data):
    # Train the final model with the best parameters, evaluating on val_data
    best_model = lgb.train(best_params, train_data, valid_sets=[val_data])
    return best_model
```
Here’s a simplified structure of my workflow in main.py:
```python
from model_training import train_validation_test_split, pre_lgb_dataset, train_optuna_cv, model_pred
import pandas as pd
import numpy as np
import optuna

seed = 42
np.random.seed(42)


def main():
    # Data preparation and feature engineering steps here...

    # Model Training
    X_train, X_test, y_train, y_test = train_validation_test_split(df_features, df_target)
    train_data, test_data = pre_lgb_dataset(X_train, X_test, y_train, y_test, cat_cols)

    # Hyperparameter Optimization
    best_params = train_optuna_cv(train_data, n_trials=1, early_stopping_rounds=1)

    # Model Training with Best Parameters
    best_model = model_pred(best_params, train_data, test_data)

    # Further steps for model evaluation and deployment...


if __name__ == "__main__":
    main()
```
To debug, I tried training with a simplified, fixed sample_params dictionary instead of the Optuna search, and it ran without any issues:

```python
sample_params = {
    'objective': 'regression',
    'metric': 'rmse',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'num_threads': 4,
}
```
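For reference, a minimal version of that kind of check looks roughly like this (a sketch rather than my exact code; it passes sample_params straight to lgb.cv with the same train_data produced by pre_lgb_dataset, bypassing Optuna entirely):

```python
# Sanity check outside of Optuna: cross-validate with fixed params
# using the same LightGBM Dataset that train_optuna_cv receives.
import lightgbm as lgb

cv_results = lgb.cv(
    sample_params,
    train_data,
    nfold=5,
    stratified=False,   # regression target, so no stratification
    shuffle=True,
    callbacks=[
        lgb.early_stopping(stopping_rounds=1),
        lgb.log_evaluation(period=10),
    ],
    seed=42,
)

# Print the returned keys to confirm how the mean-RMSE entry is named
# (the objective in train_optuna_cv assumes 'valid rmse-mean').
print(list(cv_results.keys()))
```

A check along these lines finishes quickly, which is what makes the hang inside train_optuna_cv so confusing.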
What could be causing the script to get stuck at the best_params step despite simpler configurations running fine?
Any suggestions on how to troubleshoot or debug this issue further in an automated deployment environment?
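One isolation step I'm considering next (an untested sketch) is to run study.optimize with a dummy objective that never calls lgb.cv, to see whether the hang comes from the Optuna layer itself or from the cross-validation call inside the objective:

```python
# Untested sketch: if this finishes immediately, Optuna/study.optimize is
# fine and the hang must be inside lgb.cv or the pruning callback; if it
# also hangs, the problem is in the Optuna layer (e.g. logging/IO in the
# automated environment).
import optuna

def dummy_objective(trial):
    # Same kind of search-space call as the real objective, but no training.
    return trial.suggest_float('learning_rate', 1e-3, 5e-1, log=True)

study = optuna.create_study(direction='minimize')
study.optimize(dummy_objective, n_trials=1)
print(study.best_params)
```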
Any insights or advice would be greatly appreciated. Thank you!