
Question

Let’s say I have the following dataframe df1 corresponding to user1:

+-------------------+-------+--------+-------+-------+----------+----------------+
|      Models       |  MAE  |  MSE   | RMSE  | MAPE  | R² score | Runtime [h:m:s]|
+-------------------+-------+--------+-------+-------+----------+----------------+
| LinearRegression  | 4.906 | 27.784 | 5.271 | 0.405 |  -6.917  | 0:00:43.387145 |
+-------------------+-------+--------+-------+-------+----------+----------------+
|   Random Forest   | 2.739 | 10.239 |  3.2  | 0.231 |  -1.917  | 0:28:11.761681 |
+-------------------+-------+--------+-------+-------+----------+----------------+
|      XGBoost      | 2.826 | 10.898 | 3.301 | 0.234 |  -2.105  | 0:03:58.883474 |
+-------------------+-------+--------+-------+-------+----------+----------------+
|   MLPRegressor    | 5.234 | 30.924 | 5.561 | 0.43  |  -7.812  | 0:01:44.252276 |
+-------------------+-------+--------+-------+-------+----------+----------------+
|        SVR        | 5.061 | 29.301 | 5.413 | 0.417 |  -7.349  | 0:04:52.754769 |
+-------------------+-------+--------+-------+-------+----------+----------------+
| CatBoostRegressor | 2.454 | 8.823  | 2.97  | 0.201 |  -1.514  | 0:19:36.925169 |
+-------------------+-------+--------+-------+-------+----------+----------------+
|   LGBMRegressor   | 2.76  | 10.204 | 3.194 | 0.231 |  -1.907  | 0:04:51.223103 |
+-------------------+-------+--------+-------+-------+----------+----------------+

I also have the following dataframe df2 corresponding to user2:

+-------------------+-------+--------+-------+-------+----------+----------------+
|      Models       |  MAE  |  MSE   | RMSE  | MAPE  | R² score | Runtime [h:m:s]|
+-------------------+-------+--------+-------+-------+----------+----------------+
| LinearRegression  | 4.575 | 24.809 | 4.981 | 0.377 |  -6.079  | 0:00:45.055854 |
+-------------------+-------+--------+-------+-------+----------+----------------+
|   Random Forest   | 2.345 | 8.065  | 2.84  | 0.199 |  -1.301  | 0:10:55.468473 |
+-------------------+-------+--------+-------+-------+----------+----------------+
|      XGBoost      | 2.129 | 7.217  | 2.686 | 0.179 |  -1.059  | 0:01:01.575033 |
+-------------------+-------+--------+-------+-------+----------+----------------+
|   MLPRegressor    | 4.414 | 23.477 | 4.845 | 0.363 |  -5.699  | 0:00:31.231719 |
+-------------------+-------+--------+-------+-------+----------+----------------+
|        SVR        | 4.353 | 22.826 | 4.778 | 0.357 |  -5.513  | 0:02:12.258870 |
+-------------------+-------+--------+-------+-------+----------+----------------+
| CatBoostRegressor | 2.281 | 7.671  | 2.77  | 0.189 |  -1.189  | 0:08:16.526615 |
+-------------------+-------+--------+-------+-------+----------+----------------+
|   LGBMRegressor   | 2.511 |  9.18  | 3.03  | 0.212 |  -1.619  | 0:15:25.084937 |
+-------------------+-------+--------+-------+-------+----------+----------------+
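To make the question reproducible, the two tables can be rebuilt as pandas dataframes. This is a minimal sketch: the constant names and the `make_df` helper are hypothetical, and it assumes the Runtime column holds h:mm:ss strings (as the values suggest) rather than milliseconds, so it parses them with `pd.to_timedelta`.

```python
import pandas as pd

# Column names mirror the tables above; "Runtime" holds h:mm:ss strings.
COLUMNS = ["Models", "MAE", "MSE", "RMSE", "MAPE", "R² score", "Runtime"]

ROWS_USER1 = [
    ("LinearRegression", 4.906, 27.784, 5.271, 0.405, -6.917, "0:00:43.387145"),
    ("Random Forest", 2.739, 10.239, 3.2, 0.231, -1.917, "0:28:11.761681"),
    ("XGBoost", 2.826, 10.898, 3.301, 0.234, -2.105, "0:03:58.883474"),
    ("MLPRegressor", 5.234, 30.924, 5.561, 0.43, -7.812, "0:01:44.252276"),
    ("SVR", 5.061, 29.301, 5.413, 0.417, -7.349, "0:04:52.754769"),
    ("CatBoostRegressor", 2.454, 8.823, 2.97, 0.201, -1.514, "0:19:36.925169"),
    ("LGBMRegressor", 2.76, 10.204, 3.194, 0.231, -1.907, "0:04:51.223103"),
]
ROWS_USER2 = [
    ("LinearRegression", 4.575, 24.809, 4.981, 0.377, -6.079, "0:00:45.055854"),
    ("Random Forest", 2.345, 8.065, 2.84, 0.199, -1.301, "0:10:55.468473"),
    ("XGBoost", 2.129, 7.217, 2.686, 0.179, -1.059, "0:01:01.575033"),
    ("MLPRegressor", 4.414, 23.477, 4.845, 0.363, -5.699, "0:00:31.231719"),
    ("SVR", 4.353, 22.826, 4.778, 0.357, -5.513, "0:02:12.258870"),
    ("CatBoostRegressor", 2.281, 7.671, 2.77, 0.189, -1.189, "0:08:16.526615"),
    ("LGBMRegressor", 2.511, 9.18, 3.03, 0.212, -1.619, "0:15:25.084937"),
]


def make_df(rows):
    """Build one per-user results table and parse Runtime as a timedelta."""
    df = pd.DataFrame(rows, columns=COLUMNS)
    df["Runtime"] = pd.to_timedelta(df["Runtime"])
    return df


df1 = make_df(ROWS_USER1)
df2 = make_df(ROWS_USER2)
print(df1.sort_values("MAE").head(3))
```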

Let’s say I have many more dataframes like these, up to df1000 corresponding to user1000.

Problem statement:
For each dataframe, I want to rank the models by a specific column (e.g. MAE, sorted from best to worst) and then return how often each ranking of top models occurs across all dataframes (df1 to df1000). This is not something I can get directly with:

df["category"].value_counts()

So I definitely need a transformation step that produces, for each dataframe, the sorted list of model names (a list of strings). Including the user names in the final transformed dataframe could also be useful, although I did not include them in the expected output table below.
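A minimal sketch of that transformation step, assuming each user's results are kept in a dict keyed by user name (the `dfs` dict and the `ranked_models` helper are hypothetical names). Only the Models and MAE columns of df1 and df2 are reproduced, and lower MAE is treated as better, so the sort is ascending.

```python
import pandas as pd

# MAE values copied from df1 (user1) and df2 (user2); other columns omitted.
MODELS = ["LinearRegression", "Random Forest", "XGBoost", "MLPRegressor",
          "SVR", "CatBoostRegressor", "LGBMRegressor"]
df1 = pd.DataFrame({"Models": MODELS,
                    "MAE": [4.906, 2.739, 2.826, 5.234, 5.061, 2.454, 2.760]})
df2 = pd.DataFrame({"Models": MODELS,
                    "MAE": [4.575, 2.345, 2.129, 4.414, 4.353, 2.281, 2.511]})

# Hypothetical container: one dataframe per user (user1 ... user1000 in practice).
dfs = {"user1": df1, "user2": df2}


def ranked_models(df, metric="MAE", ascending=True):
    """Return model names ordered from best to worst on `metric`."""
    # Lower is better for error metrics such as MAE.
    return df.sort_values(metric, ascending=ascending)["Models"].tolist()


# One row per user; the "MAE" column holds that user's sorted list of models.
rankings = pd.DataFrame(
    {"User": list(dfs), "MAE": [ranked_models(df) for df in dfs.values()]}
)
print(rankings)
```

For a metric where higher is better, such as R² score, the same helper can be called with `ascending=False`.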

Expected output:

+--------------------+-------------------------------------------------------+--------+---------+
|        Rank        |                          MAE                          | counts | freq(%) |
+--------------------+-------------------------------------------------------+--------+---------+
| Top models(sorted) | ["CatBoostRegressor","Random Forest","LGBMRegressor", | 70     | 65%     |
|                    |  "XGBoost","LinearRegression","SVR","MLPRegressor"]   |        |         |
+--------------------+-------------------------------------------------------+--------+---------+
| Top models(sorted) | ["LGBMRegressor","CatBoostRegressor","Random Forest", | 20     | 12%     |
|                    |  "XGBoost","LinearRegression","SVR","MLPRegressor"]   |        |         |
+--------------------+-------------------------------------------------------+--------+---------+
| ...                | ...                                                   | ...    | ...     |
+--------------------+-------------------------------------------------------+--------+---------+
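Once each user's ranking is available, a table shaped like the expected output can be produced by counting identical rankings. Calling `value_counts()` directly on a column of lists fails because lists are unhashable, so this sketch converts each ranking to a tuple and counts with `collections.Counter`. The rankings for user1 and user2 are derived from the MAE columns of df1 and df2; user3 is a hypothetical user that reuses user1's ranking only so that one ranking repeats.

```python
from collections import Counter

import pandas as pd

# Rankings by MAE derived from df1 and df2; "user3" is hypothetical.
rank_user1 = ["CatBoostRegressor", "Random Forest", "LGBMRegressor", "XGBoost",
              "LinearRegression", "SVR", "MLPRegressor"]
rank_user2 = ["XGBoost", "CatBoostRegressor", "Random Forest", "LGBMRegressor",
              "SVR", "MLPRegressor", "LinearRegression"]
rankings = pd.DataFrame({"User": ["user1", "user2", "user3"],
                         "MAE": [rank_user1, rank_user2, rank_user1]})

# Tuples are hashable, so identical rankings can be counted as one key.
counts = Counter(tuple(r) for r in rankings["MAE"])
top = counts.most_common()  # most frequent ranking first

summary = pd.DataFrame({
    "MAE": [list(r) for r, _ in top],  # back to lists for display
    "counts": [n for _, n in top],
})
# Share of users whose ranking matches each row.
summary["freq(%)"] = (100 * summary["counts"] / len(rankings)).round(1)
summary.insert(0, "Rank", "Top models(sorted)")
print(summary)
```

With 1000 users, `len(rankings)` is 1000, so freq(%) is simply counts / 10.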

I was also wondering whether I could use TF-IDF, a Natural Language Processing (NLP) method, to handle this problem, using:

# import required module
from sklearn.feature_extraction.text import TfidfVectorizer

Potentially related posts I have checked:

How can I compute a histogram (frequency table) for a single Series?
Count the frequency that a value occurs in a dataframe column
Efficient way to get frequency of elements in a pandas column of lists
Calculate Frequency of item in list
Get the frequency of individual items in a list of each row of a column in a dataframe
count the frequency of elements in list of lists in Python
What’s the best alternative to using lists as elements in a pandas dataframe?
pandas – create dataframe with counts and frequency of elements
Python: Calculate PMF for List in Pandas Dataframe
Frequency plot of a Pandas Dataframe
python & pandas – How to calculate frequency under conditions in columns in DataFrame?

What is the best practice to calculate frequency of list of elements in python within multiple pandas dataframe?