I have an S3 bucket which contains parquet files.
I need to analyse those parquet files and create the required tables in Redshift Serverless.
import pandas as pd
import pyarrow.parquet as pq

# Read the parquet file from S3 into a pandas dataframe
df = pq.read_table(f"s3://{bucket_name}/{s3_path}").to_pandas()
# Generate a CREATE TABLE statement from the dataframe's dtypes
table_create_statement = pd.io.sql.get_schema(df, table_name)
Using the above code, I was able to get the CREATE TABLE statement for the dataframe. But some df columns contain numbers that are 38 digits long, whereas the generated statement assigns them the INTEGER data type (32-bit precision) instead of NUMERIC(38, 0) or DECIMAL(38, 0).

And for some columns, the generated statement has the BOOLEAN type, whereas the actual data contains numbers or the strings TRUE or FALSE.
How do I get a table create statement that's compatible with Redshift Serverless?