I am trying to estimate a multinomial logit in Python for a discrete choice model. I come from R, and the coefficient estimates match what I get there, but the predictions do not.
I am using the well-known yogurt dataset, which I load in R with data(yogurt), in long format.
I got my first estimates using the xlogit package:
Optimization terminated successfully.
Message: The gradients are close to zero
Iterations: 11
Function evaluations: 12
Estimation time= 0.0 seconds
---------------------------------------------------------------------------
Coefficient Estimate Std.Err. z-val P>|z|
---------------------------------------------------------------------------
_intercept.hiland -3.7156022 0.1454191 -25.5509897 1.21e-127 ***
_intercept.weight -0.6411846 0.0544983 -11.7652278 4.11e-31 ***
_intercept.yoplait 0.7345696 0.0806442 9.1087765 1.7e-19 ***
price -0.3665841 0.0243661 -15.0448680 5.79e-49 ***
feat 0.4914337 0.1200630 4.0931314 4.4e-05 ***
---------------------------------------------------------------------------
Significance: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log-Likelihood= -2656.888
AIC= 5323.776
BIC= 5352.717
I then tried statsmodels and loved it, because it reports many metrics that I need, such as McFadden's R2. For this, I converted the data to wide format.
Optimization terminated successfully.
Current function value: 1.064086
Iterations 8
MNLogit Regression Results
==============================================================================
Dep. Variable: choice No. Observations: 2412
Model: MNLogit Df Residuals: 2385
Method: MLE Df Model: 24
Date: Mon, 27 May 2024 Pseudo R-squ.: 0.09402
Time: 12:54:04 Log-Likelihood: -2566.6
converged: True LL-Null: -2832.9
Covariance Type: nonrobust LLR p-value: 2.632e-97
==================================================================================
choice=hiland coef std err z P>|z| [0.025 0.975]
----------------------------------------------------------------------------------
const 4.1582 1.971 2.110 0.035 0.295 8.021
featdannon -0.0743 0.802 -0.093 0.926 -1.646 1.498
feathiland 1.1137 0.462 2.412 0.016 0.209 2.019
featweight 0.5323 0.687 0.775 0.438 -0.814 1.879
featyoplait -0.1600 0.753 -0.212 0.832 -1.635 1.316
pricedannon 0.2312 0.144 1.600 0.110 -0.052 0.514
pricehiland -0.8970 0.155 -5.794 0.000 -1.201 -0.594
priceweight -0.2053 0.163 -1.259 0.208 -0.525 0.114
priceyoplait -0.2371 0.073 -3.244 0.001 -0.380 -0.094
----------------------------------------------------------------------------------
choice=weight coef std err z P>|z| [0.025 0.975]
---------------------------------------------------------------------------------
const 1.6081 0.878 1.831 0.067 -0.113 3.329
featdannon -0.3334 0.319 -1.046 0.295 -0.958 0.291
feathiland -1.1772 0.382 -3.079 0.002 -1.927 -0.428
featweight 0.7189 0.287 2.502 0.012 0.156 1.282
featyoplait 0.0913 0.280 0.326 0.745 -0.458 0.641
pricedannon 0.1513 0.059 2.570 0.010 0.036 0.267
pricehiland -0.5736 0.078 -7.392 0.000 -0.726 -0.422
priceweight -0.1966 0.076 -2.574 0.010 -0.346 -0.047
priceyoplait 0.1134 0.046 2.440 0.015 0.022 0.205
---------------------------------------------------------------------------------
choice=yoplait coef std err z P>|z| [0.025 0.975]
----------------------------------------------------------------------------------
const 0.0535 0.869 0.062 0.951 -1.650 1.757
featdannon 0.4482 0.314 1.427 0.154 -0.168 1.064
feathiland 0.0552 0.328 0.168 0.866 -0.587 0.698
featweight -0.4865 0.328 -1.483 0.138 -1.129 0.156
featyoplait 0.3402 0.224 1.519 0.129 -0.099 0.779
pricedannon 0.7023 0.067 10.503 0.000 0.571 0.833
pricehiland -0.1863 0.072 -2.586 0.010 -0.327 -0.045
priceweight -0.1383 0.080 -1.724 0.085 -0.296 0.019
priceyoplait -0.3676 0.035 -10.627 0.000 -0.435 -0.300
==================================================================================
Later, when I tried predicting the probabilities, I obtained different results from the two models. For the xlogit model I had to compute them manually, since it doesn't predict probabilities.
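For reference, the manual computation for a model of the xlogit form above is a softmax over the alternatives' utilities. This is a minimal sketch, assuming dannon is the base alternative (its intercept does not appear in the output) and using made-up price and feat values for a single choice situation; `predict_probs` is a hypothetical helper, not part of xlogit.

```python
import numpy as np

# Coefficients copied from the xlogit output above.
# Assumption: dannon is the base alternative, so its intercept is 0.
alts = ["dannon", "hiland", "weight", "yoplait"]
intercepts = np.array([0.0, -3.7156022, -0.6411846, 0.7345696])
beta_price = -0.3665841
beta_feat = 0.4914337


def predict_probs(price, feat):
    """Choice probabilities for arrays of shape (n_situations, n_alternatives)."""
    price = np.asarray(price, dtype=float)
    feat = np.asarray(feat, dtype=float)
    # Utility of each alternative: intercept plus generic price and feat effects.
    utility = intercepts + beta_price * price + beta_feat * feat
    # Subtract the row maximum for numerical stability (does not change the result).
    utility = utility - utility.max(axis=-1, keepdims=True)
    exp_u = np.exp(utility)
    return exp_u / exp_u.sum(axis=-1, keepdims=True)


# Hypothetical values for one choice situation, in the order given by `alts`.
price = [[8.1, 6.1, 7.9, 10.8]]
feat = [[0, 0, 0, 0]]
probs = predict_probs(price, feat)
print(dict(zip(alts, probs[0].round(4))))
```

Each row of the result sums to 1, and the columns follow the order in `alts`.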
I'd like to understand whether I can do something to obtain the same results from both predictions. Am I doing everything right with statsmodels?
Also, how can I compute McFadden's R2 manually if it is not possible to obtain it from xlogit?
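As I understand it, McFadden's R2 is defined as 1 - LL_model / LL_null, where LL_null is the log-likelihood of an intercept-only model. Below is a sketch using the log-likelihoods printed above, assuming both models were fitted on the same 2412 choice situations, so that the LL-Null reported by statsmodels also applies to the xlogit fit. The helper names are hypothetical.

```python
import numpy as np


def null_loglik(choice_counts):
    """Log-likelihood of an intercept-only model from per-alternative choice counts.

    Assumes every alternative was chosen at least once (no zero counts).
    """
    counts = np.asarray(choice_counts, dtype=float)
    shares = counts / counts.sum()
    return float(np.sum(counts * np.log(shares)))


def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R-squared: 1 - LL_model / LL_null."""
    return 1.0 - ll_model / ll_null


ll_xlogit = -2656.888      # Log-Likelihood from the xlogit output
ll_statsmodels = -2566.6   # Log-Likelihood from the statsmodels output
ll_null = -2832.9          # LL-Null from the statsmodels output

r2_xlogit = mcfadden_r2(ll_xlogit, ll_null)
r2_statsmodels = mcfadden_r2(ll_statsmodels, ll_null)
print(f"xlogit:      {r2_xlogit:.4f}")
print(f"statsmodels: {r2_statsmodels:.4f}")
```

If LL-Null were not available, `null_loglik` could be fed the number of times each brand was chosen in the data.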
Is there any other package that replicates the R libraries logitr or mlogit?