I am running some tests with the YDF library through RandomForestLearner, comparing it a little with sklearn's RandomForestClassifier in Python. While working with YDF, I see that every training run generates a very long printout.
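For context, here is a minimal sketch of what I am running (the label column and number of trees are taken from the log below; the dataset path and CSV loading are placeholders):

```python
import pandas as pd
import ydf  # pip install ydf

# Placeholder path: any dataset with an "income" label column.
train_df = pd.read_csv("adult_train.csv")

# 300 trees, matching the "Training of tree .../300" lines in the log.
learner = ydf.RandomForestLearner(label="income", num_trees=300)
model = learner.train(train_df)  # every call prints the log below
```

Every call to `train()` prints something like this: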
```
WARNING:absl:The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=56 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.
[INFO 24-05-17 15:29:24.0753 UTC dataset.cc:407] max_vocab_count = -1 for column income, the dictionary will not be pruned by size.
Train model on 22792 examples
[INFO 24-05-17 15:29:24.1390 UTC learner.cc:142] Data spec:
Number of records: 22792
Number of columns: 15
Number of columns by type:
CATEGORICAL: 9 (60%)
NUMERICAL: 6 (40%)
Columns:
CATEGORICAL: 9 (60%)
0: "income" CATEGORICAL has-dict vocab-size:3 zero-ood-items most-frequent:"<=50K" 17308 (75.9389%) dtype:DTYPE_BYTES
2: "workclass" CATEGORICAL num-nas:1257 (5.51509%) has-dict vocab-size:8 num-oods:3 (0.0139308%) most-frequent:"Private" 15879 (73.7358%) dtype:DTYPE_BYTES
4: "education" CATEGORICAL has-dict vocab-size:17 zero-ood-items most-frequent:"HS-grad" 7340 (32.2043%) dtype:DTYPE_BYTES
6: "marital_status" CATEGORICAL has-dict vocab-size:8 zero-ood-items most-frequent:"Married-civ-spouse" 10431 (45.7661%) dtype:DTYPE_BYTES
7: "occupation" CATEGORICAL num-nas:1260 (5.52826%) has-dict vocab-size:14 num-oods:4 (0.018577%) most-frequent:"Prof-specialty" 2870 (13.329%) dtype:DTYPE_BYTES
8: "relationship" CATEGORICAL has-dict vocab-size:7 zero-ood-items most-frequent:"Husband" 9191 (40.3256%) dtype:DTYPE_BYTES
9: "race" CATEGORICAL has-dict vocab-size:6 zero-ood-items most-frequent:"White" 19467 (85.4115%) dtype:DTYPE_BYTES
10: "sex" CATEGORICAL has-dict vocab-size:3 zero-ood-items most-frequent:"Male" 15165 (66.5365%) dtype:DTYPE_BYTES
14: "native_country" CATEGORICAL num-nas:407 (1.78571%) has-dict vocab-size:41 num-oods:1 (0.00446728%) most-frequent:"United-States" 20436 (91.2933%) dtype:DTYPE_BYTES
NUMERICAL: 6 (40%)
1: "age" NUMERICAL mean:38.6153 min:17 max:90 sd:13.661 dtype:DTYPE_INT64
3: "fnlwgt" NUMERICAL mean:189879 min:12285 max:1.4847e+06 sd:106423 dtype:DTYPE_INT64
5: "education_num" NUMERICAL mean:10.0927 min:1 max:16 sd:2.56427 dtype:DTYPE_INT64
11: "capital_gain" NUMERICAL mean:1081.9 min:0 max:99999 sd:7509.48 dtype:DTYPE_INT64
12: "capital_loss" NUMERICAL mean:87.2806 min:0 max:4356 sd:403.01 dtype:DTYPE_INT64
13: "hours_per_week" NUMERICAL mean:40.3955 min:1 max:99 sd:12.249 dtype:DTYPE_INT64
Terminology:
nas: Number of non-available (i.e. missing) values.
ood: Out of dictionary.
manually-defined: Attribute whose type is manually defined by the user, i.e., the type was not automatically inferred.
tokenized: The attribute value is obtained through tokenization.
has-dict: The attribute is attached to a string dictionary e.g. a categorical attribute stored as a string.
vocab-size: Number of unique values.
[INFO 24-05-17 15:29:24.1653 UTC abstract_learner.cc:128] No input feature explicitly specified. Using all the available input features.
[INFO 24-05-17 15:29:24.1823 UTC abstract_learner.cc:142] The label "income" was removed from the input feature set.
[INFO 24-05-17 15:29:24.1976 UTC random_forest.cc:416] Training random forest on 22792 example(s) and 14 feature(s).
[INFO 24-05-17 15:29:24.3008 UTC random_forest.cc:802] Training of tree 1/300 (tree index:0) done accuracy:0.842143 logloss:5.68975
[INFO 24-05-17 15:29:24.3899 UTC random_forest.cc:802] Training of tree 11/300 (tree index:1) done accuracy:0.855271 logloss:2.88284
[INFO 24-05-17 15:29:24.4740 UTC random_forest.cc:802] Training of tree 21/300 (tree index:10) done accuracy:0.860728 logloss:1.95538
[INFO 24-05-17 15:29:24.5490 UTC random_forest.cc:802] Training of tree 31/300 (tree index:25) done accuracy:0.862188 logloss:1.53271
[INFO 24-05-17 15:29:24.6546 UTC random_forest.cc:802] Training of tree 41/300 (tree index:40) done accuracy:0.862452 logloss:1.33531
[INFO 24-05-17 15:29:24.7470 UTC random_forest.cc:802] Training of tree 51/300 (tree index:49) done accuracy:0.862452 logloss:1.17718
[INFO 24-05-17 15:29:24.8189 UTC random_forest.cc:802] Training of tree 61/300 (tree index:60) done accuracy:0.862978 logloss:1.07946
[INFO 24-05-17 15:29:24.8915 UTC random_forest.cc:802] Training of tree 71/300 (tree index:69) done accuracy:0.863461 logloss:0.984856
[INFO 24-05-17 15:29:24.9501 UTC random_forest.cc:802] Training of tree 81/300 (tree index:80) done accuracy:0.864514 logloss:0.947954
[INFO 24-05-17 15:29:25.0158 UTC random_forest.cc:802] Training of tree 91/300 (tree index:90) done accuracy:0.864645 logloss:0.877497
[INFO 24-05-17 15:29:25.0751 UTC random_forest.cc:802] Training of tree 101/300 (tree index:100) done accuracy:0.86447 logloss:0.846556
[INFO 24-05-17 15:29:25.1351 UTC random_forest.cc:802] Training of tree 111/300 (tree index:110) done accuracy:0.864163 logloss:0.804944
[INFO 24-05-17 15:29:25.1967 UTC random_forest.cc:802] Training of tree 121/300 (tree index:120) done accuracy:0.864953 logloss:0.783092
[INFO 24-05-17 15:29:25.2593 UTC random_forest.cc:802] Training of tree 131/300 (tree index:130) done accuracy:0.865128 logloss:0.761509
[INFO 24-05-17 15:29:25.3165 UTC random_forest.cc:802] Training of tree 141/300 (tree index:140) done accuracy:0.864207 logloss:0.737278
[INFO 24-05-17 15:29:25.3691 UTC random_forest.cc:802] Training of tree 151/300 (tree index:149) done accuracy:0.864909 logloss:0.722793
[INFO 24-05-17 15:29:25.4256 UTC random_forest.cc:802] Training of tree 161/300 (tree index:159) done accuracy:0.864338 logloss:0.707122
[INFO 24-05-17 15:29:25.4817 UTC random_forest.cc:802] Training of tree 171/300 (tree index:171) done accuracy:0.864953 logloss:0.695653
[INFO 24-05-17 15:29:25.5345 UTC random_forest.cc:802] Training of tree 181/300 (tree index:181) done accuracy:0.864909 logloss:0.684699
[INFO 24-05-17 15:29:25.5848 UTC random_forest.cc:802] Training of tree 191/300 (tree index:190) done accuracy:0.864733 logloss:0.674939
[INFO 24-05-17 15:29:25.6356 UTC random_forest.cc:802] Training of tree 201/300 (tree index:200) done accuracy:0.864865 logloss:0.662798
[INFO 24-05-17 15:29:25.6880 UTC random_forest.cc:802] Training of tree 211/300 (tree index:209) done accuracy:0.86504 logloss:0.653042
[INFO 24-05-17 15:29:25.7417 UTC random_forest.cc:802] Training of tree 221/300 (tree index:220) done accuracy:0.864996 logloss:0.641847
[INFO 24-05-17 15:29:25.7911 UTC random_forest.cc:802] Training of tree 231/300 (tree index:230) done accuracy:0.864733 logloss:0.631173
[INFO 24-05-17 15:29:25.8225 UTC random_forest.cc:802] Training of tree 262/300 (tree index:261) done accuracy:0.864821 logloss:0.6299
[INFO 24-05-17 15:29:25.9500 UTC random_forest.cc:802] Training of tree 272/300 (tree index:271) done accuracy:0.865216 logloss:0.599614
[INFO 24-05-17 15:29:26.0030 UTC random_forest.cc:802] Training of tree 282/300 (tree index:281) done accuracy:0.865479 logloss:0.588677
[INFO 24-05-17 15:29:26.0873 UTC random_forest.cc:802] Training of tree 292/300 (tree index:291) done accuracy:0.865567 logloss:0.583081
[INFO 24-05-17 15:29:26.1583 UTC random_forest.cc:802] Training of tree 300/300 (tree index:299) done accuracy:0.865391 logloss:0.576064
[INFO 24-05-17 15:29:26.1735 UTC random_forest.cc:882] Final OOB metrics: accuracy:0.865391 logloss:0.576064
Model trained in 0:00:02.149436
```
How can I prevent this output from being printed every time I train?
I ask because I am running this process many times in a loop, and after a while the accumulated output eats up my memory.
I was looking for an argument such as `verbose=False`, but I could not identify one.
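To make it concrete, this is the kind of switch I was hoping for (hypothetical usage; I could not find arguments like these in the documentation):

```python
# Hypothetical: this is what I searched for, not an API I actually found.
learner = ydf.RandomForestLearner(label="income", num_trees=300, verbose=False)
model = learner.train(train_df)
```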
Thank you so much.