I am trying to simulate the Online Federated Learning framework presented in the paper “Communication-Efficient Online Federated Learning Framework for Nonlinear Regression” by Gogineni et al., 2022. The simulation involves using Random Fourier Features (RFF) with a kernel least-mean-square (KLMS) algorithm to perform a nonlinear regression task across multiple clients in a federated setting.
Summary of the Implementation:
- Number of Clients: 100
- Global Iterations: 1000
- RFF Dimension: 200
- Learning Rate: 0.75
- Number of Participating Clients per Iteration: 20
- Number of Independent Monte Carlo Trials: 500 (the code snippet below uses 10)
In each global iteration, a subset of clients is selected, and each client updates its local model using streaming data. The clients then share their model updates with the global server, which aggregates these updates to form a new global model.
Problem:
The Mean Squared Error (MSE) computed during the simulation is not converging or decreasing as expected. Instead, the MSE fluctuates significantly or does not exhibit the steady decline that should be characteristic of a learning process. I have verified the implementation against the methodology described in the paper, but the results do not align with those presented in the paper’s simulations.
Key Aspects of the Simulation:
- The input signal at each client is generated using a first-order autoregressive (AR) model, with parameters sampled from uniform distributions as described in the paper.
- The clients apply a kernel LMS algorithm using random Fourier features to perform the local nonlinear regression.
- The global model is updated iteratively by averaging the weights of the selected clients in each global iteration.
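A minimal sketch of the first two steps above, for reference. It assumes a unit-bandwidth Gaussian kernel and a single `(W, b)` pair shared by every client, so that all clients' weights live in the same feature space; neither assumption is confirmed against the paper here. The helper names `rff_map` and `ar1_inputs` and the example AR parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_dim, rff_dim = 5, 200

# Assumption: one (W, b) pair shared by all clients, unit-bandwidth Gaussian kernel.
W = rng.standard_normal((feature_dim, rff_dim))
b = rng.uniform(0.0, 2.0 * np.pi, rff_dim)

def rff_map(x):
    """Map inputs of shape (..., feature_dim) to RFF features of shape (..., rff_dim)."""
    return np.sqrt(2.0 / rff_dim) * np.cos(x @ W + b)

def ar1_inputs(theta, mu, sigma2, num_samples):
    """First-order AR input: x[n] = theta * x[n-1] + sqrt(1 - theta^2) * u[n]."""
    u = rng.normal(mu, np.sqrt(sigma2), (num_samples, feature_dim))
    x = np.zeros_like(u)
    x[0] = u[0]
    for n in range(1, num_samples):
        x[n] = theta * x[n - 1] + np.sqrt(1.0 - theta**2) * u[n]
    return x

# Example values only; the post samples these per client from uniform ranges.
x = ar1_inputs(theta=0.5, mu=0.0, sigma2=0.5, num_samples=50)
z = rff_map(x)

# Inner products of the features should approximate exp(-||x - x'||^2 / 2).
approx = z @ z.T
exact = np.exp(-0.5 * np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1))
max_kernel_error = np.max(np.abs(approx - exact))
```

Comparing `approx` with `exact` is a quick way to check that the feature map behaves like the intended kernel before any federated training is run.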
Code Snippet:
```python
import numpy as np
import matplotlib.pyplot as plt

# Hyperparameters
num_clients = 100  # Number of clients in the simulation
independent_experiment = 10  # Number of independent Monte Carlo trials
feature_dim = 5  # Dimensionality of the input features
rff_dim = 200  # Dimensionality of the random Fourier features
num_participating_clients = 20  # Number of clients participating in each iteration
learning_rate = 0.75  # Learning rate for the local updates
num_iterations = 1000  # Number of iterations for training

# Initialize an array to store the MSE values across all trials
mse_values_all_trials = np.zeros(num_iterations)

# Main loop for averaging over multiple Monte Carlo trials
for _ in range(independent_experiment):
    global_weights = np.zeros(rff_dim)  # Initialize global weights
    x = np.zeros((num_clients, num_iterations, feature_dim))  # Input features for each client
    y = np.zeros((num_clients, num_iterations, 1))  # Target values for each client
    z = np.zeros((num_clients, num_iterations, rff_dim))  # Random Fourier features for each client
    W = np.random.randn(num_clients, feature_dim, rff_dim)  # Random weights for RFF
    b = np.random.uniform(0, 2 * np.pi, (num_clients, 1, rff_dim))  # Random bias for RFF

    # Generate data for each client
    for k in range(num_clients):
        theta_k = np.random.uniform(0.2, 0.9)  # Autoregressive coefficient
        mu_k = np.random.uniform(-0.2, 0.2)  # Mean of the process noise
        sigma2_uk = np.random.uniform(0.2, 1.2)  # Variance of the process noise
        sigma2_nuk = np.random.uniform(0.005, 0.03)  # Variance of the observation noise
        uk = np.random.normal(mu_k, np.sqrt(sigma2_uk), (num_iterations, feature_dim))  # Process noise
        nuk = np.random.normal(0, np.sqrt(sigma2_nuk), (num_iterations, 1))  # Observation noise

        # Generate the time series data
        x[k, 0] = uk[0]
        for n in range(1, num_iterations):
            x[k, n, :] = theta_k * x[k, n-1, :] + np.sqrt(1 - theta_k**2) * uk[n]
            y[k, n, :] = (np.sqrt(x[k, n, 0]**2 + np.sin(np.pi * x[k, n, 3])**2) +
                          (0.8 - 0.5*np.exp(-x[k, n, 1]**2)*x[k, n, 2])) + nuk[n]

        # Compute the random Fourier features
        z[k, :, :] = np.sqrt(2 / rff_dim) * np.cos(np.dot(x[k, :, :], W[k, :, :]) + b[k, :, :])

    local_weights = [np.zeros(rff_dim) for _ in range(num_clients)]  # Initialize local weights for each client
    mse_values_per_iteration = np.zeros(num_iterations)  # Store MSE for each iteration
    mse_values_per_iteration_per_client = np.zeros((num_clients, num_iterations))  # Store MSE for each client per iteration

    # Iterative training process
    for n in range(num_iterations):
        selected_indices = np.random.choice(num_clients, num_participating_clients, replace=False)  # Select random clients
        for k in selected_indices:
            local_weights[k] = global_weights  # Start with global weights
            epsilon = y[k, n, :] - np.dot(local_weights[k], z[k, n, :])  # Compute error
            local_weights[k] += learning_rate * z[k, n, :] * epsilon  # Update local weights
            mse_values_per_iteration_per_client[k, n] = epsilon**2  # Compute MSE for the current iteration
            mse_values_per_iteration[n] += mse_values_per_iteration_per_client[k, n]  # Aggregate MSE for selected clients
        mse_values_per_iteration[n] /= num_participating_clients  # Average MSE over participating clients

        global_weights = np.zeros(rff_dim)  # Reset global weights
        for k in selected_indices:
            global_weights += local_weights[k]  # Aggregate updated local weights
        global_weights /= num_participating_clients  # Average global weights

    mse_values_all_trials += mse_values_per_iteration  # Accumulate MSE across all trials

# Average MSE across all trials and normalize
mse_values_all_trials /= independent_experiment
mse_values_all_trials /= max(mse_values_all_trials)

# Convert MSE to decibels
mse_value_all_trials = 10 * np.log10(mse_values_all_trials)

# Plot the MSE values over iterations
plt.plot(mse_value_all_trials)
plt.xlabel("Iterations")
plt.ylabel("MSE (dB)")
plt.title("Mean Squared Error Over Iterations")
plt.show()
```
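One detail of the training loop that may be worth checking: `local_weights[k] = global_weights` binds a second name to the same NumPy array, so the following `+=` also modifies `global_weights` in place, and every selected client in that iteration then starts from an already-updated model. The standalone sketch below only illustrates this aliasing behaviour (the variable names are illustrative); whether it explains the MSE curve has not been verified.

```python
import numpy as np

global_weights = np.zeros(3)

# Plain assignment binds a second name to the same array object.
local_alias = global_weights
local_alias += 1.0  # in-place update also changes global_weights
alias_changed_global = bool(np.all(global_weights == 1.0))

# An explicit copy keeps the local update separate from the global model.
global_weights = np.zeros(3)
local_copy = global_weights.copy()
local_copy += 1.0
copy_changed_global = bool(np.any(global_weights != 0.0))

print(alias_changed_global, copy_changed_global)  # True False
```

If independent local updates are intended, starting each client from `global_weights.copy()` would keep them separate.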
What I Tried:
- Implemented the Simulation: I followed the methodology described in the paper by Gogineni et al., implementing the federated learning framework with random Fourier features (RFF) for kernel least-mean-square (KLMS) regression. This involved generating synthetic data for multiple clients, performing local model updates, and aggregating these updates on a global server.
- Verified Data Generation: I ensured that the input signal for each client was generated using a first-order autoregressive (AR) model with the parameters and noise characteristics specified in the paper. I also checked the implementation of the RFF transformation to map input data into the feature space.
- Adjusted Learning Rate: I experimented with different learning rates to see if it would stabilize the MSE. While the paper suggests a learning rate of 0.75, I tried smaller and larger values to see if this would have an effect.
- Checked Model Updates: I verified that the global model updates were correctly computed by averaging the local model weights from the selected clients in each iteration.
- Multiple Trials: The simulation was run over multiple independent Monte Carlo trials to average out randomness, as suggested by the paper.
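The model-update check can also be run in isolation. The sketch below is one possible form of a single global iteration, assuming every selected client starts from its own copy of the same global model and the server takes a plain average of the updated weights. The function name `federated_round` and the toy data (noise-free targets that are linear in the features) are hypothetical and not taken from the paper.

```python
import numpy as np

def federated_round(global_weights, z_n, y_n, selected, learning_rate):
    """One global iteration: local LMS step from the same global model, then averaging.

    z_n: (num_clients, rff_dim) features at the current time step
    y_n: (num_clients,) targets at the current time step
    """
    updated = np.empty((len(selected), global_weights.size))
    sq_errors = np.empty(len(selected))
    for i, k in enumerate(selected):
        w_k = global_weights.copy()            # independent local copy
        err = y_n[k] - w_k @ z_n[k]            # a priori error of the global model
        w_k += learning_rate * err * z_n[k]    # local LMS update
        updated[i] = w_k
        sq_errors[i] = err**2
    return updated.mean(axis=0), sq_errors.mean()

# Toy check (hypothetical data): targets are exactly linear in the features.
rng = np.random.default_rng(0)
num_clients, rff_dim, num_rounds = 10, 20, 300
w_true = rng.standard_normal(rff_dim)
global_weights = np.zeros(rff_dim)
mse_history = np.empty(num_rounds)
for n in range(num_rounds):
    z_n = rng.standard_normal((num_clients, rff_dim)) / np.sqrt(rff_dim)
    y_n = z_n @ w_true
    selected = rng.choice(num_clients, size=5, replace=False)
    global_weights, mse_history[n] = federated_round(
        global_weights, z_n, y_n, selected, learning_rate=0.75
    )
```

On a toy problem like this the squared error should fall steadily, which gives a baseline for the update and averaging logic that is independent of the data generation and the RFF mapping.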
What I Expected:
- MSE Convergence: Based on the paper’s results, I expected the MSE to show a consistent decrease over the iterations, reflecting the improvement of the global model as more data and updates are accumulated.
- Smoother MSE Curve: While some fluctuations are expected due to the random nature of client selection and data, I anticipated that the overall MSE curve would smooth out and converge to a lower value as the model learns over iterations.
- Results Consistent with the Paper: I expected my simulation results to align closely with the figures presented in the paper, particularly regarding the convergence rate and the final steady-state MSE values.