I’m trying to wrap my head around the optim()
function in R. Therefore I tried to optimize two beta parameters (beta1 and beta2) in a linear model so that the resulting R-squared value matches a specified target (e.g., 0.1). I’ve written the following code in R:
generate_data <- function(n_obs = 100, beta_params = c(2, 3), noise_sd = 1) {
# Extract beta parameters
beta1 <- beta_params[1]
beta2 <- beta_params[2]
# Generate predictors X1 and X2
X1 <- rnorm(n_obs) # Normally distributed predictor 1
X2 <- rnorm(n_obs) # Normally distributed predictor 2
# Generate response variable Y with added Gaussian noise
Y <- beta1 * X1 + beta2 * X2 + rnorm(n_obs, mean = 0, sd = noise_sd)
# Return data as a data frame
data <- data.frame(X1 = X1, X2 = X2, Y = Y)
return(data)
}
lm_r2 <- function(beta_params, target_r2, n_obs = 100, noise_sd = 1) {
# Generate data with current beta parameters
data <- generate_data(n_obs = n_obs, beta_params = beta_params, noise_sd = noise_sd)
# Fit a linear model
model <- lm(Y ~ X1 + X2, data = data)
# Calculate R-squared
actual_r2 <- summary(model)$r.squared
# Return the squared difference from the target R-squared
(actual_r2 - target_r2)^2
}
# Define the target R-squared
target_r2 <- 0.1
# Use optim to minimize the lm_r2 function
result <- optim(
par = c(1, 1), # Initial guesses for beta1 and beta2
fn = function(beta_params) lm_r2(beta_params, target_r2 = target_r2),
method = "L-BFGS-B"
)
# Extract optimized parameters
optimized_beta1 <- result$par[1]
optimized_beta2 <- result$par[2]
# Print results
cat("Optimized beta1:", optimized_beta1, "n")
cat("Optimized beta2:", optimized_beta2, "n")
# check if the optimized parameters result in the target R-squared
data <- generate_data(n_obs = 100, beta_params = c(optimized_beta1, optimized_beta2))
model <- lm(Y ~ X1 + X2, data = data)
summary(model)$r.squared
When I run this code, the optim() function completes, and I get optimized beta1 and beta2 values. However, when I use these optimized parameters to generate new data and compute the R-squared, the result is not close to my target R-squared (e.g., 0.1).
Observations:
- For larger target R-squared values (e.g., 0.8), the solution seems to work better but not perfectly.
- For smaller target R-squared values (e.g., 0.1), the discrepancy between the target and computed R-squared is larger.
- The optimization runs without errors, but it doesn’t converge to the desired result.
7