With the code below, I'm getting:

```
for i in range(metrics2.meanAbsoluteError):
TypeError: 'float' object cannot be interpreted as an integer
```

I've converted the two columns (`max_FPIWX_obs` and `max_fpiwx`) to `DoubleType` and still get this error. After searching Stack Overflow, I found that `range()` cannot take a float, so I tried `for i in range(int(metrics2.meanAbsoluteError))`, but then I get:

```
[CANNOT_INFER_SCHEMA_FOR_TYPE] Can not infer schema for type: int
```
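For reference, a plain-Python illustration of what both messages point at. It assumes only that `meanAbsoluteError` is a single float (which is how PySpark's `RegressionMetrics` exposes it), so there is nothing to iterate over. The helper names and the value `2.5` are hypothetical. `range()` rejects a float, and `int()` makes the loop run but produces loop indices rather than the metric. `createDataFrame(data, schema=["MAE_metrics"])` also treats each element of `data` as a row, and a bare number is not a row, which matches the `CANNOT_INFER_SCHEMA_FOR_TYPE` message.

```python
def range_error_message(value):
    """Return the TypeError message range() raises for value, or None if it succeeds."""
    try:
        range(value)
    except TypeError as exc:
        return str(exc)
    return None


def rows_from_scalars(values):
    """Wrap each scalar as a one-element tuple, i.e. one row with one column."""
    return [(value,) for value in values]


# Hypothetical MAE value: meanAbsoluteError is one float, not a collection.
mae = 2.5
print(range_error_message(mae))  # 'float' object cannot be interpreted as an integer
print(list(range(int(mae))))     # [0, 1]  (loop indices, not the metric)
print(rows_from_scalars([mae]))  # [(2.5,)]
```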
I also tried changing both columns to `IntegerType` after rounding (see the commented-out code), but got another [CANNOT_INFER_SCHEMA_FOR_TYPE] error: `DoubleType() cannot accept object '3' in type 'int'`. The data in both columns has no fractional values to begin with, and no nulls. Any ideas? Please see the screenshots for the errors. Thanks!
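A sketch of one likely cause of the `DoubleType()` message. `RegressionMetrics` in `pyspark.mllib.evaluation` builds an internal DataFrame whose prediction and observation columns are `DoubleType`, and PySpark's schema check for `DoubleType` accepts Python `float` values, not `int`. So the check is on the Python type, not on whether the value has a fractional part. The commented-out `IntegerType` casts turn the values into ints, which would then be rejected. (Those two `max_fpiwx` lines also read from `max_FPIWX_obs`.) The helper `to_float_pairs` and the sample rows are hypothetical. The commented Spark lines are the assumed equivalents and are not executed here.

```python
def to_float_pairs(rows):
    """Convert (prediction, observation) rows into tuples of Python floats.

    DoubleType accepts float but not int, so an int such as 3 must be
    converted even though it has no fractional part.
    """
    return [(float(prediction), float(observation)) for prediction, observation in rows]


# Hypothetical sample rows mixing ints and floats, as an IntegerType column might produce.
rows = [(3, 1.0), (2, 2.0)]
print(to_float_pairs(rows))  # [(3.0, 1.0), (2.0, 2.0)]

# Assumed Spark-side equivalents, using the column names from the question:
#   metrics = metrics.withColumn("max_fpiwx", F.col("max_fpiwx").cast(T.DoubleType()))
#   metrics = metrics.withColumn("max_FPIWX_obs", F.col("max_FPIWX_obs").cast(T.DoubleType()))
# or convert while building the RDD that is passed to RegressionMetrics:
#   pairs = mae_values.rdd.map(lambda r: (float(r[0]), float(r[1])))
```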
```python
from pyspark.mllib.evaluation import RegressionMetrics
from pyspark.sql import functions as F
from pyspark.sql import types as T

# ... (transform decorator and start of the function signature not shown)
    ctx, metrics, mae_output
):
    metrics = metrics.dataframe()
    # Drop sentinel rows and replace nulls with 0
    metrics = metrics.filter(metrics.max_fpiwx != -999)
    metrics = metrics.fillna({'max_FPIWX_obs': 0, 'max_fpiwx': 0})
    metrics = metrics.withColumn("max_FPIWX_obs", metrics.max_FPIWX_obs.cast(T.DoubleType()))
    # metrics = metrics.withColumn("max_FPIWX_obs", F.round(metrics.max_FPIWX_obs))
    # metrics = metrics.withColumn("max_FPIWX_obs", metrics.max_FPIWX_obs.cast(T.IntegerType()))
    # metrics = metrics.withColumn("max_fpiwx", F.round(metrics.max_FPIWX_obs))
    # metrics = metrics.withColumn("max_fpiwx", metrics.max_FPIWX_obs.cast(T.IntegerType()))
    metrics = metrics.withColumn("row_id", F.monotonically_increasing_id())
    # (prediction, observation) pairs for RegressionMetrics
    mae_values = metrics.select(F.col('max_fpiwx'), F.col('max_FPIWX_obs'))
    mae_values_rdd_new = mae_values.rdd.map(tuple)
    metrics2 = RegressionMetrics(mae_values_rdd_new)
    MAE_output = []
    for i in range(metrics2.meanAbsoluteError):  # TypeError raised here
        MAE_output.append(i)
    MAE_df = ctx.spark_session.createDataFrame(data=MAE_output, schema=["MAE_metrics"])
    MAE_df = MAE_df.withColumn("row_id", F.monotonically_increasing_id())
    MAE_df_final = metrics.join(MAE_df, ("row_id")).drop("row_id")
    mae_output.write_dataframe(MAE_df)
```
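A sketch of one way to restructure the end of the transform, assuming the goal is a one-row output that holds the MAE. `meanAbsoluteError` is already the metric, so no loop is needed. Wrapping it in a single one-column row gives `createDataFrame` something it can infer a schema from. The plain-Python `mean_absolute_error` below is a hypothetical stand-in that recomputes the same quantity (mean of |prediction − observation|), so the shape of the result can be checked without Spark. The commented PySpark lines are an assumed equivalent, not tested here.

```python
def mean_absolute_error(pairs):
    """Mean of |prediction - observation| over (prediction, observation) pairs.

    Plain-Python stand-in for RegressionMetrics.meanAbsoluteError, used only
    to illustrate the shape of the result.
    """
    diffs = [abs(float(p) - float(o)) for p, o in pairs]
    if not diffs:
        raise ValueError("need at least one (prediction, observation) pair")
    return sum(diffs) / len(diffs)


def mae_rows(pairs):
    """One row with one column: data shaped for createDataFrame(..., ["MAE_metrics"])."""
    return [(mean_absolute_error(pairs),)]


print(mae_rows([(3, 1), (2, 2), (0, 4)]))  # [(2.0,)]

# Assumed PySpark equivalent inside the transform:
#   mae = metrics2.meanAbsoluteError  # already a single float
#   MAE_df = ctx.spark_session.createDataFrame([(mae,)], schema=["MAE_metrics"])
#   mae_output.write_dataframe(MAE_df)
# or, skipping RDDs and mllib, an aggregate over the DataFrame itself:
#   MAE_df = metrics.select(
#       F.avg(F.abs(F.col("max_fpiwx") - F.col("max_FPIWX_obs"))).alias("MAE_metrics"))
```

On the design: `MAE_df_final` in the posted code is computed but never written. Also, `monotonically_increasing_id()` guarantees unique, increasing IDs but not consecutive ones, so joining two DataFrames on it is not a reliable way to line up rows.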