I’m working on a statistical analysis that calculates odds ratios and their 95% confidence intervals (CIs) for several variables at specific time points. The odds ratios are calculated correctly, but the CIs for those odds ratios do not come out as expected.
Here’s a simplified version of my code:
import numpy as np
import pandas as pd

# `data` (a DataFrame) and `result` (the fitted model) are defined elsewhere

# Define the time points and variables
time_points = [6, 12]
variables = ['var1', 'var2']

# Initialize dictionaries for odds ratios and confidence intervals
odds_ratios = {var: [] for var in variables}
conf_intervals = {var: [] for var in variables}

# Create a DataFrame for time points and generate spline terms
all_times = pd.DataFrame({'Time': time_points})
all_spline_terms = create_spline_terms(all_times, column='Time', num_knots=6)

# Loop through time points
for time in time_points:
    base_data = data.copy()
    base_data['Time'] = time
    for col in all_spline_terms.columns:
        base_data[col] = all_spline_terms.loc[all_times['Time'] == time, col].values[0]

    for var in variables:
        # Predict probabilities for var = 1 and var = 0
        base_data_var_1 = base_data.copy()
        base_data_var_1[var] = 1
        proba_1 = result.predict(base_data_var_1)
        odds_1 = proba_1 / (1 - proba_1)

        base_data_var_0 = base_data.copy()
        base_data_var_0[var] = 0
        proba_0 = result.predict(base_data_var_0)
        odds_0 = proba_0 / (1 - proba_0)

        # Calculate the odds ratio
        odds_ratio = odds_1.mean() / odds_0.mean() if odds_0.mean() != 0 else np.nan
        odds_ratios[var].append(odds_ratio)

        # Calculate Wald 95% confidence intervals
        coef_var = result.params[var]
        std_error_var = result.bse[var]
        interaction_coef = 0
        interaction_var = 0
        for col in all_spline_terms.columns:
            interaction_term = f"{var}:{col}"
            if interaction_term in result.params:
                interaction_coef += result.params[interaction_term]
                interaction_var += result.bse[interaction_term] ** 2

        total_coef = coef_var + interaction_coef
        total_std_error = np.sqrt(std_error_var ** 2 + interaction_var)
        lower_ci_log = total_coef - 1.96 * total_std_error
        upper_ci_log = total_coef + 1.96 * total_std_error
        lower_ci = np.exp(lower_ci_log)
        upper_ci = np.exp(upper_ci_log)
        conf_intervals[var].append((lower_ci, upper_ci))

# Output results
for var in variables:
    print(f"\nOdds ratios and 95% CI for {var}:")
    for time, odds_ratio, (lower_ci, upper_ci) in zip(time_points, odds_ratios[var], conf_intervals[var]):
        print(f' At time {time}: Odds ratio = {odds_ratio:.4f}, 95% CI = ({lower_ci:.4f}, {upper_ci:.4f})')
Here is create_spline_terms():
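import numpy as np
import patsy

def create_spline_terms(df, column='Time', num_knots=6, degree=3):
    # Place num_knots interior knots evenly between the min and max of the column
    # (degree is currently unused)
    knots = np.linspace(df[column].min(), df[column].max(), num_knots + 2)[1:-1]
    spline_terms = patsy.dmatrix(
        f"cr({column}, knots={knots.tolist()}, constraints='center')",
        df,
        return_type='dataframe'
    )
    # Drop the first (Intercept) column and keep only the spline basis columns
    return spline_terms.iloc[:, 1:]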
Issue:
The odds ratios are being calculated correctly, but the confidence intervals are not displaying the expected values. I suspect the issue may lie in how I’m calculating the total coefficients or standard errors, particularly when summing interaction terms.
Specific Questions:
- Is there a mistake in how I calculate the confidence intervals?
- How can I ensure that the interaction terms are being correctly accounted for in the confidence interval calculations?
- Are there any best practices for calculating odds ratios and confidence intervals in logistic regression that I might be overlooking?
What I Tried:
I implemented a loop to calculate the odds ratios and their corresponding confidence intervals for several variables at different time points. I ensured that each variable’s interaction terms were correctly included in the CI calculations by summing the coefficients and their variances.
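Written out, the interval computed in the loop is the following. The symbols are shorthand introduced here, not library notation: $\hat\beta_v$ stands for `result.params[var]`, $\hat\beta_{v:s_j}$ for the coefficient of the interaction between `var` and spline column $s_j$, and $\widehat{\mathrm{se}}(\cdot)$ for the matching entry of `result.bse`.

```latex
\widehat{\log\mathrm{OR}}_v = \hat\beta_v + \sum_j \hat\beta_{v:s_j},
\qquad
\widehat{\mathrm{SE}}_v = \sqrt{\widehat{\mathrm{se}}(\hat\beta_v)^2 + \sum_j \widehat{\mathrm{se}}(\hat\beta_{v:s_j})^2}

\mathrm{CI}_{95\%} = \Bigl(\exp\bigl(\widehat{\log\mathrm{OR}}_v - 1.96\,\widehat{\mathrm{SE}}_v\bigr),\;
                          \exp\bigl(\widehat{\log\mathrm{OR}}_v + 1.96\,\widehat{\mathrm{SE}}_v\bigr)\Bigr)
```

As written, neither expression involves the time point or the spline basis values at that time, and the standard error uses only the individual variances of the coefficients.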
What I Was Expecting:
I expected different confidence intervals for each variable at each time point, reflecting the unique relationships and variances at those specific points. The output should provide a distinct CI for every variable at every time, but the current implementation seems to produce incorrect or identical confidence intervals across variables and time points.
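As a point of comparison, here is a minimal sketch of the textbook Wald interval for a linear combination of coefficients, where the variance is the quadratic form w' Σ w over the full covariance matrix rather than a sum of individual variances. The function name `wald_ci_linear_combination`, the coefficient values, the covariance matrix, and the spline basis values are all hypothetical and chosen only for illustration. With a statsmodels results object the covariance matrix would presumably come from `result.cov_params()`, but that is an assumption about the setup, not something shown in the code above.

```python
import numpy as np

def wald_ci_linear_combination(params, cov, weights, z=1.96):
    """Return (odds ratio, lower, upper) for exp(w' beta) using the full covariance."""
    params = np.asarray(params, dtype=float)
    cov = np.asarray(cov, dtype=float)
    weights = np.asarray(weights, dtype=float)
    log_or = weights @ params            # point estimate on the log-odds scale
    variance = weights @ cov @ weights   # includes the covariance (off-diagonal) terms
    se = np.sqrt(variance)
    return np.exp(log_or), np.exp(log_or - z * se), np.exp(log_or + z * se)

# Hypothetical numbers: one main effect followed by two var:spline interactions
beta = np.array([0.50, 0.20, -0.10])
cov = np.array([
    [0.040, -0.010, 0.005],
    [-0.010, 0.020, -0.004],
    [0.005, -0.004, 0.030],
])

# Hypothetical spline basis values at each time point
basis_by_time = {6: np.array([0.3, -0.2]), 12: np.array([-0.1, 0.6])}

results = {}
for t, basis in basis_by_time.items():
    # Weight 1 on the main effect, basis values on the interaction coefficients
    w = np.concatenate(([1.0], basis))
    results[t] = wald_ci_linear_combination(beta, cov, w)
    odds_ratio, lower, upper = results[t]
    print(f"time {t}: OR = {odds_ratio:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

In this sketch the weights change with the time point, so both the point estimate and the interval width differ between time 6 and time 12.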