Thiết kế website giá rẻ

Question

I am trying to estimate a multinomial logit for a discrete choice model in Python. I come from R, and compared with R the coefficient estimates are the same, but the predictions are not.

I am using the well-known yogurt dataset, which I obtained in R with data(yogurt), in long format.

I got my first estimates using the xlogit package:

Optimization terminated successfully.
    Message: The gradients are close to zero
    Iterations: 11
    Function evaluations: 12
Estimation time= 0.0 seconds
---------------------------------------------------------------------------
Coefficient              Estimate      Std.Err.         z-val         P>|z|
---------------------------------------------------------------------------
_intercept.hiland      -3.7156022     0.1454191   -25.5509897     1.21e-127 ***
_intercept.weight      -0.6411846     0.0544983   -11.7652278      4.11e-31 ***
_intercept.yoplait      0.7345696     0.0806442     9.1087765       1.7e-19 ***
price                  -0.3665841     0.0243661   -15.0448680      5.79e-49 ***
feat                    0.4914337     0.1200630     4.0931314       4.4e-05 ***
---------------------------------------------------------------------------
Significance:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log-Likelihood= -2656.888
AIC= 5323.776
BIC= 5352.717

I then tried statsmodels, which I like because it reports many of the metrics I need, such as McFadden's pseudo R-squared. For this, I converted the data to wide format.

Optimization terminated successfully.
         Current function value: 1.064086
         Iterations 8
                          MNLogit Regression Results                          
==============================================================================
Dep. Variable:                 choice   No. Observations:                 2412
Model:                        MNLogit   Df Residuals:                     2385
Method:                           MLE   Df Model:                           24
Date:                Mon, 27 May 2024   Pseudo R-squ.:                 0.09402
Time:                        12:54:04   Log-Likelihood:                -2566.6
converged:                       True   LL-Null:                       -2832.9
Covariance Type:            nonrobust   LLR p-value:                 2.632e-97
==================================================================================
 choice=hiland       coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------
const              4.1582      1.971      2.110      0.035       0.295       8.021
featdannon        -0.0743      0.802     -0.093      0.926      -1.646       1.498
feathiland         1.1137      0.462      2.412      0.016       0.209       2.019
featweight         0.5323      0.687      0.775      0.438      -0.814       1.879
featyoplait       -0.1600      0.753     -0.212      0.832      -1.635       1.316
pricedannon        0.2312      0.144      1.600      0.110      -0.052       0.514
pricehiland       -0.8970      0.155     -5.794      0.000      -1.201      -0.594
priceweight       -0.2053      0.163     -1.259      0.208      -0.525       0.114
priceyoplait      -0.2371      0.073     -3.244      0.001      -0.380      -0.094
----------------------------------------------------------------------------------
choice=weight       coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------
const             1.6081      0.878      1.831      0.067      -0.113       3.329
featdannon       -0.3334      0.319     -1.046      0.295      -0.958       0.291
feathiland       -1.1772      0.382     -3.079      0.002      -1.927      -0.428
featweight        0.7189      0.287      2.502      0.012       0.156       1.282
featyoplait       0.0913      0.280      0.326      0.745      -0.458       0.641
pricedannon       0.1513      0.059      2.570      0.010       0.036       0.267
pricehiland      -0.5736      0.078     -7.392      0.000      -0.726      -0.422
priceweight      -0.1966      0.076     -2.574      0.010      -0.346      -0.047
priceyoplait      0.1134      0.046      2.440      0.015       0.022       0.205
---------------------------------------------------------------------------------
choice=yoplait       coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------
const              0.0535      0.869      0.062      0.951      -1.650       1.757
featdannon         0.4482      0.314      1.427      0.154      -0.168       1.064
feathiland         0.0552      0.328      0.168      0.866      -0.587       0.698
featweight        -0.4865      0.328     -1.483      0.138      -1.129       0.156
featyoplait        0.3402      0.224      1.519      0.129      -0.099       0.779
pricedannon        0.7023      0.067     10.503      0.000       0.571       0.833
pricehiland       -0.1863      0.072     -2.586      0.010      -0.327      -0.045
priceweight       -0.1383      0.080     -1.724      0.085      -0.296       0.019
priceyoplait      -0.3676      0.035    -10.627      0.000      -0.435      -0.300
==================================================================================
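For reference, the long-to-wide conversion can be sketched roughly as follows. This is only an illustration with pandas: the column names id, alt, price, feat and choice, and the two rows of data, are assumptions made up for the example, not the actual yogurt data. The resulting column names (pricedannon, feathiland, and so on) follow the pattern visible in the statsmodels summary.

```python
import pandas as pd

# Hypothetical long-format rows: one row per (choice situation, alternative).
# Column names and values are assumptions for illustration only.
long_df = pd.DataFrame({
    "id":     [1, 1, 1, 1, 2, 2, 2, 2],
    "alt":    ["dannon", "hiland", "weight", "yoplait"] * 2,
    "price":  [8.1, 6.1, 7.9, 10.8, 9.8, 6.4, 7.5, 10.8],
    "feat":   [0, 0, 0, 0, 0, 0, 0, 1],
    "choice": [0, 0, 1, 0, 1, 0, 0, 0],
})

# Spread the alternative-specific variables into one column per alternative.
wide = long_df.pivot(index="id", columns="alt", values=["price", "feat"])

# Flatten the (variable, alternative) column MultiIndex into names
# such as "pricedannon" or "featyoplait".
wide.columns = [f"{var}{alt}" for var, alt in wide.columns]

# Recover the label of the chosen alternative for each choice situation.
chosen = long_df.loc[long_df["choice"] == 1].set_index("id")["alt"]
wide["choice"] = chosen

print(wide)
```

In this layout every row is one choice situation, and the price and feat of all four brands enter as separate regressors.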

Later, when I predicted the choice probabilities with each model, I obtained different results. For the xlogit model I had to compute them manually, since I could not get it to return predicted probabilities.
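The manual calculation for the xlogit model looks roughly like the sketch below. It assumes the standard multinomial logit formula P(j) = exp(V_j) / sum_k exp(V_k), with dannon as the reference alternative (intercept 0) and a single price and feat coefficient shared by all brands, as in the xlogit summary. The price and feat values are made up for illustration.

```python
import numpy as np

# Point estimates copied from the xlogit summary; dannon is assumed to be
# the reference alternative, so its intercept is fixed at 0.
alts = ["dannon", "hiland", "weight", "yoplait"]
intercepts = np.array([0.0, -3.7156022, -0.6411846, 0.7345696])
beta_price = -0.3665841
beta_feat = 0.4914337

# Hypothetical attributes: rows are choice situations, columns are the
# four alternatives. These numbers are invented for the example.
price = np.array([[8.1, 6.1, 7.9, 10.8],
                  [9.8, 6.4, 7.5, 10.8]])
feat = np.array([[0, 0, 0, 0],
                 [0, 0, 0, 1]])

# Systematic utility of each alternative in each choice situation.
utility = intercepts + beta_price * price + beta_feat * feat

# Softmax over alternatives; subtracting the row maximum avoids overflow.
expu = np.exp(utility - utility.max(axis=1, keepdims=True))
probs = expu / expu.sum(axis=1, keepdims=True)

print(dict(zip(alts, probs[0].round(3))))
```

One thing visible in the two summaries: xlogit estimates 5 parameters (three intercepts plus one price and one feat coefficient), while the statsmodels fit reports Df Model: 24, with a separate coefficient for each variable in each choice equation. The two fits therefore do not appear to be the same specification, which may be part of why the predicted probabilities differ.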

I would like to understand whether there is anything I can do to obtain the same predictions from both models. Am I using statsmodels correctly?

Also, how can I compute McFadden's R2 manually if it cannot be obtained from xlogit?
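As far as I understand, McFadden's pseudo R-squared is 1 - LL_model / LL_null, where LL_null is the log-likelihood of an intercept-only model. A minimal sketch, assuming the LL-Null reported by statsmodels (-2832.9) is also the right null for the xlogit fit because both use the same 2412 observations; the helper names are my own:

```python
import numpy as np

def mcfadden_r2(ll_model: float, ll_null: float) -> float:
    """McFadden's pseudo R-squared: 1 - LL_model / LL_null."""
    return 1.0 - ll_model / ll_null

def null_loglik(counts) -> float:
    """Log-likelihood of an intercept-only model, given how often each
    alternative was chosen (every count is assumed to be positive)."""
    counts = np.asarray(counts, dtype=float)
    shares = counts / counts.sum()
    return float(np.sum(counts * np.log(shares)))

# Log-likelihoods taken from the two summaries in this post.
ll_null = -2832.9
r2_statsmodels = mcfadden_r2(-2566.6, ll_null)
r2_xlogit = mcfadden_r2(-2656.888, ll_null)

print(round(r2_statsmodels, 4), round(r2_xlogit, 4))
```

With the statsmodels log-likelihood this reproduces the reported Pseudo R-squ. up to rounding. If the null log-likelihood is not available, null_loglik can compute it from the number of times each brand was chosen.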

Is there any other Python package that replicates the R libraries logitr or mlogit?

Estimating Multinomial Logit using xlogit vs. statsmodels