I am pretty new to Rust and I'm trying to convert Arrow record batches into Polars DataFrames, but I keep getting the same error:

    expected struct `Box<(dyn polars_arrow::array::Array + 'static)>` found reference `&dyn arrow_array::Array`

Clearly, the compiler cannot convert an arrow-rs `Array` (`arrow_array::Array`) into a polars-arrow `Array` (`polars_arrow::array::Array`).
    use arrow_odbc::OdbcReaderBuilder;
    use arrow_odbc::odbc_api as odbc_api;
    // use arrow::record_batch::RecordBatch;
    use odbc_api::{Environment, ConnectionOptions};
    use polars::prelude::*;
    use polars_arrow;
    // use polars::{prelude::{DataFrame, Field}, series::Series, functions::concat_df_diagonal};
    use arrow_array::RecordBatch;

    fn record_batch_to_dataframe(batch: RecordBatch) -> Result<DataFrame, PolarsError> {
        let schema = batch.schema();
        let mut columns: Vec<Series> = Vec::with_capacity(batch.num_columns());
        for (i, column) in batch.columns().iter().enumerate() {
            println!("{}", i);
            dbg!(&column);
            dbg!(&schema.fields().get(i).unwrap().name().to_string());
            let arrow = Box::<dyn polars_arrow::array::Array>::from(&**column);
            columns.push(Series::from_arrow(
                PlSmallStr::from_string(schema.fields().get(i).unwrap().name().to_string()),
                arrow,
            )?);
        }
        Ok(DataFrame::from_iter(columns))
    }

    fn main() -> Result<(), anyhow::Error> {
        let odbc_environment = Environment::new()?;

        // Connect with database.
        let connection = odbc_environment.connect_with_connection_string(
            CONNECTION_STRING,
            ConnectionOptions::default(),
        )?;

        // This SQL statement does not require any arguments.
        let parameters = ();
        // const PROJECT_ID: &str = "*********";
        const TABLE_ID: &str = "*********";
        let query = format!("SELECT * FROM `{}` LIMIT 5", TABLE_ID);

        // Execute query and create result set
        let cursor = connection
            .execute(&query, parameters)?
            .expect("SELECT statement must produce a cursor");

        // Read result set as arrow batches. Infer Arrow types automatically using the meta
        // information of `cursor`.
        let arrow_record_batches = OdbcReaderBuilder::new()
            // Use at most 256 MiB for transit buffer
            .with_max_bytes_per_batch(256 * 1024 * 1024)
            .build(cursor)?;

        for batch in arrow_record_batches.into_iter() {
            // ... process batch ...
            dbg!(&batch);
            let _ = record_batch_to_dataframe(batch.unwrap());
        }
        Ok(())
    }
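Separately from the type error, the loop in `main` calls `batch.unwrap()` and then throws the conversion result away with `let _ =`, so a failed read panics and a failed conversion goes unnoticed. Below is a minimal std-only sketch of the alternative, propagating both errors with `?`. `Batch`, `Frame`, `ConvertError`, `batch_to_frame` and `collect_frames` are hypothetical stand-ins for `RecordBatch`, `DataFrame`, the Arrow/Polars error types and `record_batch_to_dataframe`; they are not part of any of the crates used above.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical stand-in for `arrow_array::RecordBatch`.
#[derive(Debug)]
struct Batch {
    rows: usize,
}

// Hypothetical stand-in for a polars `DataFrame`.
#[derive(Debug)]
struct Frame {
    height: usize,
}

// Hypothetical error type standing in for `ArrowError` / `PolarsError`.
#[derive(Debug)]
struct ConvertError(String);

impl fmt::Display for ConvertError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "conversion failed: {}", self.0)
    }
}

impl Error for ConvertError {}

// Same shape as `record_batch_to_dataframe`: it can fail, so it returns a Result.
fn batch_to_frame(batch: Batch) -> Result<Frame, ConvertError> {
    if batch.rows == 0 {
        return Err(ConvertError("empty batch".to_string()));
    }
    Ok(Frame { height: batch.rows })
}

// Each item is itself a Result, like the batches the reader yields in the loop above.
// `batch?` replaces `batch.unwrap()`, and the trailing `?` replaces `let _ =`,
// so both read errors and conversion errors reach the caller.
fn collect_frames<I>(batches: I) -> Result<Vec<Frame>, Box<dyn Error>>
where
    I: IntoIterator<Item = Result<Batch, ConvertError>>,
{
    let mut frames = Vec::new();
    for batch in batches {
        frames.push(batch_to_frame(batch?)?);
    }
    Ok(frames)
}

fn main() -> Result<(), Box<dyn Error>> {
    let batches: Vec<Result<Batch, ConvertError>> =
        vec![Ok(Batch { rows: 5 }), Ok(Batch { rows: 3 })];
    let frames = collect_frames(batches)?;
    let total: usize = frames.iter().map(|f| f.height).sum();
    println!("{} frames, {} rows", frames.len(), total);
    Ok(())
}
```

With the real crates, the loop body would presumably become `let df = record_batch_to_dataframe(batch?)?;`. This assumes `anyhow::Error` can absorb both the reader's error type and `PolarsError` through `?`, which holds as long as both implement `std::error::Error + Send + Sync + 'static`.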
I already tried the answer to "Arrow RecordBatch as Polars DataFrame", but its solution no longer works in my case.
Can anybody help? Thank you very much.
EDIT: full error:

    error[E0308]: mismatched types
      --> src/main.rs:27:65
       |
    27 |         let arrow = Box::<dyn polars_arrow::array::Array>::from(&**column);
       |                     ------------------------------------------- ^^^^^^^^^ expected `Box<dyn Array>`, found `&dyn Array`
       |                     |
       |                     arguments to this function are incorrect
       |
       = note: expected struct `Box<(dyn polars_arrow::array::Array + 'static)>`
               found reference `&dyn arrow_array::Array`
    note: associated function defined here

at this line:

    let arrow = Box::<dyn polars_arrow::array::Array>::from(&**column);
    columns.push(Series::from_arrow(
        PlSmallStr::from_string(schema.fields().get(i).unwrap().name().to_string()),
        arrow,
    )?);
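To see why the compiler rejects this call, here is a std-only model of the situation. The modules `arrow_like` and `polars_like` and every type in them are hypothetical stand-ins, not the real crates. Each "crate" defines its own `Array` trait, so `&dyn arrow_like::Array` and `Box<dyn polars_like::Array>` are unrelated types, and `Box::<dyn polars_like::Array>::from(...)` only compiles if the target crate provides an explicit `From` impl bridging the two. In this model, deleting that impl should reproduce the same E0308 shape ("expected `Box<dyn Array>`, found `&dyn Array`"), because the only `From` impl left for the box type is the reflexive `From<T> for T`. My assumption, not verified here, is that polars-arrow provides such a bridge only when its `arrow_rs` feature is enabled, and only for the arrow-array version it was built against. If arrow-odbc pulls in a different arrow-array version, that version's `Array` is yet another unrelated trait.

```rust
// Hypothetical stand-ins: two independent modules, each with its own `Array`
// trait, modelling arrow-rs and polars-arrow. Same trait name, unrelated types.
mod arrow_like {
    pub trait Array {
        fn values(&self) -> Vec<i64>;
    }

    pub struct Int64Array(pub Vec<i64>);

    impl Array for Int64Array {
        fn values(&self) -> Vec<i64> {
            self.0.clone()
        }
    }
}

mod polars_like {
    pub trait Array {
        fn len(&self) -> usize;
    }

    pub struct PrimitiveArray(pub Vec<i64>);

    impl Array for PrimitiveArray {
        fn len(&self) -> usize {
            self.0.len()
        }
    }

    // The bridge. Without this impl, `Box::<dyn Array>::from(&dyn arrow_like::Array)`
    // does not compile: the only `From` impl left for `Box<dyn Array>` is the
    // reflexive `From<T> for T`, so rustc reports a type mismatch.
    impl From<&dyn crate::arrow_like::Array> for Box<dyn Array> {
        fn from(value: &dyn crate::arrow_like::Array) -> Self {
            // Copy the values into the "polars" representation.
            Box::new(PrimitiveArray(value.values()))
        }
    }
}

// Same call shape as `Box::<dyn polars_arrow::array::Array>::from(&**column)`.
fn to_polars_like(column: &dyn arrow_like::Array) -> Box<dyn polars_like::Array> {
    Box::<dyn polars_like::Array>::from(column)
}

fn main() {
    let column = arrow_like::Int64Array(vec![1, 2, 3]);
    let converted = to_polars_like(&column);
    println!("converted {} values", converted.len());
}
```

The point of the model is that the trait names alone mean nothing to the compiler; what matters is whether an impl for the exact pair of types is visible in the build.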