I'm trying to emulate a cv.glmnet(family = "cox") call for a model with splines using mlr3.

The following code throws an error. Thank you in advance for your help.

require(mlr3)
require(mlr3proba)
require(mlr3learners)
require(mlr3tuning)
require(mlr3pipelines)
require(mlr3verse)
require(mlr3viz)
#- require(mlr3fda)
require(survival)
require(glmnet)
require(splines)

Simulate a survival dataset


set.seed(123)
n <- 100
p <- 3
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
time <- rexp(n, rate = 1)
status <- sample(0:1, n, replace = TRUE)
df <- as.data.frame(X)
df$time <- time
df$status <- status
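For reference, this is roughly the call being emulated, written directly against glmnet. It is a minimal sketch, assuming glmnet 4.1 or later (which accepts a Surv response for family = "cox"). The 3-column natural-spline expansion per covariate mirrors apply_splines; alpha = 1, nfolds = 5 and type.measure = "C" are my own illustrative choices, not taken from the question. Note that cv.glmnet only cross-validates lambda for a fixed alpha, whereas the mlr3 setup below also tunes alpha.

```r
library(survival)
library(glmnet)
library(splines)

# Same simulated data as above (same seed and call order)
set.seed(123)
n <- 100
p <- 3
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
df <- as.data.frame(X)
df$time <- rexp(n, rate = 1)
df$status <- sample(0:1, n, replace = TRUE)

# Expand each covariate into a 3-column natural-spline basis (like apply_splines)
x <- do.call(cbind, lapply(df[c("V1", "V2", "V3")],
                           function(v) unclass(ns(v, df = 3))))
colnames(x) <- paste0(rep(c("V1", "V2", "V3"), each = 3), "_ns", 1:3)

# Cox response; alpha is fixed here, only lambda is cross-validated
y <- Surv(df$time, df$status)
cvfit <- cv.glmnet(x, y, family = "cox", alpha = 1,
                   type.measure = "C", nfolds = 5)
cvfit$lambda.min
```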

Create a survival task

task <- TaskSurv$new("survival_task", backend = df, time = "time", event ="status")
task

#---- Perform initial split
initial_split <- rsmp("holdout")
initial_split$instantiate(task)

Separate the data into training and testing sets


train_task <- task$clone()$filter(initial_split$train_set(1))
test_task  <- task$clone()$filter(initial_split$test_set(1))

Load the glmnet learner


learner <- lrn("surv.glmnet")
#---- Define the hyperparameter search space
search_space <- ps(
  alpha  = p_dbl(lower = 0, upper = 1),
  lambda = p_dbl(lower = 0.0001, upper = 0.1, logscale = TRUE)
)

#---- Define objects needed for tuning 
#---- Create a pipeline for the splines transformation
#- library(paradox)
#- Define a function to apply splines transformation
apply_splines <- function(x) {
     as.data.table(splines::ns(x, df = 3))  
}  

#- Define the pipeline graph for applying splines transformation

graph <- gunion(list(
  po("colapply", id = "spline_V1", applicator = apply_splines,
     affect_columns = selector_name("V1")),
  po("colapply", id = "spline_V2", applicator = apply_splines,
     affect_columns = selector_name("V2")),
  po("colapply", id = "spline_V3", applicator = apply_splines,
     affect_columns = selector_name("V3"))
)) %>>%
  po("featureunion") %>>%
  learner

#-- Create the pipeline learner 

    pipeline <- GraphLearner$new(graph)

    #--- Define the resampling strategy for tuning
    resampling <- rsmp("cv", folds = 5)
    
    # Define the performance measure for survival analysis
    measure <- msr("surv.cindex")
    
    # Create the tuner
    tuner <- tnr("grid_search", resolution = 5)

    #-- Define the AutoTuner
    at <- AutoTuner$new(
    learner = pipeline,
    resampling = resampling,
    measure = measure,
    search_space = search_space,
    terminator = trm("evals", n_evals = 20),
    tuner = tuner
    )
    
    # Train the AutoTuner on the training set
    at$train(train_task)

… part of the output is omitted

INFO  [18:08:27.654] [mlr3] Finished benchmark
INFO  [18:08:27.692] [bbotk] Result of batch 20:
INFO  [18:08:27.694] [bbotk]  alpha    lambda surv.cindex warnings errors runtime_learners
INFO  [18:08:27.694] [bbotk]   0.25 -7.483402   0.4561424        0      0             1.52
INFO  [18:08:27.694] [bbotk]                                 uhash
INFO  [18:08:27.694] [bbotk]  a491c12c-47e5-448b-b365-34aa53350e01
INFO  [18:08:27.711] [bbotk] Finished optimizing after 20 evaluation(s)
INFO  [18:08:27.712] [bbotk] Result:
INFO  [18:08:27.714] [bbotk]  alpha    lambda learner_param_vals  x_domain surv.cindex
INFO  [18:08:27.714] [bbotk]  <num>     <num>             <list>    <list>       <num>
INFO  [18:08:27.714] [bbotk]   0.75 -4.029524          <list[8]> <list[2]>   0.4561424

Error in self$assert(xs, sanitize = TRUE) : 
  Assertion on 'xs' failed: Parameter 'alpha' not available. Did you mean 'spline_V1.applicator' / 'spline_V1.affect_columns' / 'spline_V2.applicator'?.
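A side note on the log above: with logscale = TRUE, paradox tunes log(lambda) and exponentiates it before passing it on, so the lambda column shows log values. As far as I can tell, grid_search with resolution = 5 places five evenly spaced points between log(0.0001) and log(0.1), which reproduces the -7.483402 and -4.029524 seen in the output. A base-R sketch of that grid:

```r
# Five evenly spaced points on the log scale between the p_dbl bounds,
# which is what tnr("grid_search", resolution = 5) is expected to evaluate
grid_log <- seq(log(1e-4), log(0.1), length.out = 5)
round(grid_log, 6)
# -9.210340 -7.483402 -5.756463 -4.029524 -2.302585

# Values handed to the learner after the exp() transformation
lambda_values <- exp(grid_log)
signif(lambda_values, 4)
```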

The issue explained

You define a GraphLearner that contains a learner somewhere inside it. When you define the AutoTuner, you provide the search_space of that learner, not of the learner as it is embedded in the larger GraphLearner.

The difference is that the bare learner names the parameters to tune alpha and lambda, while inside the GraphLearner they are named surv.glmnet.alpha and surv.glmnet.lambda. This triggers warnings because many lambdas are actually fitted (I think the search_space is effectively not used at all in your case). The error itself is raised at the end, when the AutoTuner tries to set the best configuration on the GraphLearner, which has no parameter called alpha. You can check this: if you pass just the learner to your AutoTuner, things work normally.

This is a general rule: the GraphLearner names every parameter <pipeop_id>.<param_name> so that it can tell apart the parameters of different PipeOps (the error message shows the same pattern, e.g. spline_V1.applicator).
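To see the naming rule directly, you can list the ids of a GraphLearner's parameter set. A minimal sketch, assuming mlr3 and mlr3pipelines are installed; to keep it dependency-light it uses classif.rpart (which ships with mlr3) as a stand-in for surv.glmnet, and identity as a placeholder applicator.

```r
library(mlr3)
library(mlr3pipelines)

# Stand-in pipeline: one colapply step followed by a learner
# (classif.rpart replaces surv.glmnet only to avoid extra packages)
graph <- po("colapply", id = "spline_V1", applicator = identity) %>>%
  lrn("classif.rpart")
grlrn <- as_learner(graph)

ids <- grlrn$param_set$ids()
# Learner parameters are prefixed with the learner id ...
ids[startsWith(ids, "classif.rpart.")]
# ... and PipeOp parameters with the PipeOp id, as in the error message
ids[startsWith(ids, "spline_V1.")]
# The bare name "cp" does not exist at the GraphLearner level
"cp" %in% ids
```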

Solution(s)

Suggested: define the search space on the learner itself with to_tune(); when the GraphLearner is constructed, the parameter prefix is added automatically.

learner = lrn("surv.glmnet")
learner$param_set$set_values(.values = list(
  alpha = to_tune(0, 1),
  lambda = to_tune(p_dbl(0.001, 0.1, logscale = TRUE))
))

Note that in this case you DO NOT need to use the search_space argument in AutoTuner.
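With the to_tune() approach, the search space is derived from the tune tokens, and the GraphLearner exposes it under the prefixed names. A sketch to check this before tuning, again with classif.rpart (cp, minsplit) standing in for surv.glmnet (alpha, lambda); it assumes mlr3, mlr3pipelines and paradox are installed.

```r
library(mlr3)
library(mlr3pipelines)
library(paradox)

learner <- lrn("classif.rpart")
learner$param_set$set_values(.values = list(
  cp       = to_tune(1e-4, 0.1, logscale = TRUE),
  minsplit = to_tune(2, 20)
))

# Wrapping the learner adds the "classif.rpart." prefix automatically
grlrn <- as_learner(po("learner", learner))
ss_ids <- grlrn$param_set$search_space()$ids()
ss_ids
```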

Alternatively, define the search space manually with the prefixed names, provided you do not change the learner's id (surv.glmnet), i.e.:

search_space = ps(
  surv.glmnet.alpha  = p_dbl(lower = 0, upper = 1),   
  surv.glmnet.lambda = p_dbl(lower = 0.0001, upper = 0.1, logscale = TRUE)
)
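If you go the manual route, a quick way to catch a naming mismatch before any tuning runs is to compare the search-space ids with the GraphLearner's parameter ids. A sketch with the same classif.rpart stand-in (so the prefix is classif.rpart. instead of surv.glmnet.), assuming mlr3, mlr3pipelines and paradox:

```r
library(mlr3)
library(mlr3pipelines)
library(paradox)

grlrn <- as_learner(po("learner", lrn("classif.rpart")))

# Prefixed names match the GraphLearner; a bare name does not
search_space_ok  <- ps(classif.rpart.cp = p_dbl(1e-4, 0.1, logscale = TRUE))
search_space_bad <- ps(cp = p_dbl(1e-4, 0.1, logscale = TRUE))

# Ids that the GraphLearner does not know about
setdiff(search_space_ok$ids(), grlrn$param_set$ids())   # character(0)
setdiff(search_space_bad$ids(), grlrn$param_set$ids())  # "cp"
```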

Suggestions

You can simplify the pipeline to a single colapply, since the same operation is applied to all columns (see the examples in the PipeOpColApply documentation).
Whenever you just want a simple train/test split, use:

# simple train/test split
part = partition(task)
at$train(task, row_ids = part$train)

Use the sugar function auto_tuner() to construct the AutoTuner:

at = auto_tuner(
  learner = pipeline, # better name: grlrn; it contains the `learner` set up as in the suggested solution above
  resampling = resampling,
  measure = measure,
  tuner = tuner,
  term_evals = 20
)
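For the first suggestion, a single po("colapply") with a type selector covers all numeric features at once. Selectors are functions of a task that return feature names, so you can check what would be affected. A sketch using a regression task built from data shaped like the question's (V1 to V3 plus an outcome column y, which is my own placeholder); in a TaskSurv, time and status are targets and are likewise not selected.

```r
library(mlr3)
library(mlr3pipelines)

set.seed(123)
d <- data.frame(V1 = rnorm(20), V2 = rnorm(20), V3 = rnorm(20), y = rnorm(20))
task_demo <- as_task_regr(d, target = "y")

# Only features are considered; the target "y" is never selected
sel <- selector_type("numeric")
sel(task_demo)

# One PipeOp replaces the three per-column colapply branches
po_spline <- po("colapply", id = "spline_all",
                applicator = identity,  # placeholder for apply_splines
                affect_columns = selector_type("numeric"))
```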

The revised code below works fine; your suggestions were very helpful. Note that I corrected the learner statement, which now reads learner = lrn("surv.glmnet").

    # Required libraries
    library(mlr3)
    library(mlr3proba)
    library(mlr3learners)
    library(mlr3extralearners)  # added
    library(mlr3tuning)
    library(mlr3pipelines)
    library(survival)
    library(splines)
    library(data.table)
    library(glmnet)

Simulate data

    set.seed(123)
    n = 100
    p = 3
    X = matrix(rnorm(n * p), nrow = n, ncol = p)
    time = rexp(n, rate = 1)
    status = sample(0:1, n, replace = TRUE)
    df = as.data.frame(X)
    df$time <- time
    df$status <- status

Define a survival task

    task <- TaskSurv$new("survival_task", backend = df,
                         time = "time", event = "status")

Load the learner

    learner <- lrn("surv.glmnet")
    learner$param_set$set_values(.values = list(
      alpha  = to_tune(0, 1),
      lambda = to_tune(p_dbl(0.00001, 1, logscale = TRUE))
    ))

Splines

    apply_splines <- function(x) {
     as.data.table(splines::ns(x, df = 3))
    }

    # Define grlrn for applying the splines transformation
    grlrn0 <-
      po("colapply", id = "spline_all",
         applicator = apply_splines,
         affect_columns = selector_type("numeric")) %>>%
      po("learner", learner)
    grlrn <- GraphLearner$new(grlrn0)
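One caveat worth checking: PipeOpColApply keeps no state, so, as I understand it, apply_splines is called again on the prediction data, and ns() then places its knots at quantiles of the test rows rather than the training rows. The base-R sketch below shows the difference, and how predict() on a training basis evaluates new data on the training knots; the train/test vectors are simulated placeholders.

```r
library(splines)

set.seed(123)
x_train <- rnorm(70)
x_test  <- rnorm(30)

# Basis fitted on training data: 3 columns, knots at training quantiles
basis_train <- ns(x_train, df = 3)
dim(basis_train)

# Re-running ns() on the test data (what a stateless step does) moves the knots
basis_test_naive <- ns(x_test, df = 3)

# predict() evaluates the test data on the training knots and boundary knots
basis_test <- predict(basis_train, x_test)

attr(basis_train, "knots")
attr(basis_test_naive, "knots")
identical(attr(basis_test, "knots"), attr(basis_train, "knots"))
```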

Tuning

    # Resampling strategy for tuning
    resampling <- rsmp("cv", folds = 5)

    # Performance measure for survival analysis
    measure <- msr("surv.cindex")
    
    # Create the tuner
    tuner <- tnr("grid_search", resolution = 10)
    
    # Define the AutoTuner
    at <- auto_tuner(
      learner = grlrn, 
      resampling = resampling,
      measure = measure, 
      ### search_space = search_space, # omitted
      terminator = trm("evals", n_evals = 50),
      tuner = tuner
     )
     
     # Simple train/test split
     part = partition(task)
     at$train(task, row_ids = part$train)
     at$model
     at$tuning_result
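When reading at$tuning_result, keep in mind that lambda is reported on the log scale; the x_domain column holds the values actually passed to the learner. A small sketch of the same pattern (to_tune() inside a GraphLearner, auto_tuner() without search_space), using classif.rpart on the built-in iris task as a quick stand-in and assuming mlr3, mlr3pipelines, mlr3tuning and paradox are installed:

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3tuning)
library(paradox)

learner <- lrn("classif.rpart")
learner$param_set$set_values(cp = to_tune(1e-4, 0.1, logscale = TRUE))
grlrn <- as_learner(po("learner", learner))

at <- auto_tuner(
  tuner = tnr("grid_search", resolution = 5),
  learner = grlrn,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  term_evals = 5
)
set.seed(1)
at$train(tsk("iris"))

# Tuned value on the log scale vs. the value the learner actually received
at$tuning_result$classif.rpart.cp
at$tuning_result$x_domain[[1]]$classif.rpart.cp
```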


Filed under: Programming knowledge @ 17:24

Tags: mlr3
