Which option below is better for reading 200 million rows (about 21 GB) from Teradata and loading them into BigQuery? Are there better approaches than these two? I also want to automate the process with Airflow DAGs.
- Use Teradata TPT to export to a CSV file (21 GB) => gzip it (~1 GB) => upload to a GCS bucket => `bq load` into a BigQuery table (see the Airflow DAG sketch after this list)
- Use Spark to read directly from Teradata over a JDBC connection => stage in a temporary GCS bucket => write to the BigQuery table (see the PySpark sketch below)
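
For reference, here is a minimal sketch of how option 1 could be wired up as an Airflow DAG. It assumes the worker has the Teradata client tools installed, and the TPT job script path (`/opt/tpt/export_job.tpt`), the output file path, the bucket name, and the table names are all placeholders. Note that BigQuery accepts gzipped CSV directly from GCS (a single compressed CSV file must be under 4 GB, so the ~1 GB file fits comfortably):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.google.cloud.transfers.local_to_gcs import LocalFilesystemToGCSOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="teradata_to_bigquery",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # trigger manually, or set a cron expression
    catchup=False,
) as dag:
    # Run the TPT export; export_job.tpt is a hypothetical job script
    # that writes /data/export.csv on the worker.
    tpt_export = BashOperator(
        task_id="tpt_export",
        bash_command="tbuild -f /opt/tpt/export_job.tpt",
    )

    # Compress the 21 GB CSV before the network hop (~1 GB after gzip).
    compress = BashOperator(
        task_id="gzip_csv",
        bash_command="gzip -f /data/export.csv",
    )

    # Upload the compressed file to a staging bucket (name is a placeholder).
    upload = LocalFilesystemToGCSOperator(
        task_id="upload_to_gcs",
        src="/data/export.csv.gz",
        dst="teradata/export.csv.gz",
        bucket="my-staging-bucket",
    )

    # Load the gzipped CSV from GCS into BigQuery.
    load = GCSToBigQueryOperator(
        task_id="load_to_bq",
        bucket="my-staging-bucket",
        source_objects=["teradata/export.csv.gz"],
        destination_project_dataset_table="my_project.my_dataset.my_table",
        source_format="CSV",
        compression="GZIP",
        write_disposition="WRITE_TRUNCATE",
        autodetect=True,
    )

    tpt_export >> compress >> upload >> load
```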
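
And a sketch of option 2 with PySpark, assuming the spark-bigquery connector jar is on the classpath (e.g. via `--packages`) and the Teradata JDBC driver is available; the JDBC URL, credentials, bucket, table names, and the numeric partition column `id_col` are all assumptions. The partitioning options matter at this scale: without them Spark pulls all 200 million rows through a single JDBC connection.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("teradata_to_bq").getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:teradata://td-host/DATABASE=my_db")
    .option("driver", "com.teradata.jdbc.TeraDriver")
    .option("dbtable", "my_db.big_table")
    .option("user", "td_user")
    .option("password", "td_password")
    # Split the read into 32 parallel range scans on an assumed numeric column.
    .option("partitionColumn", "id_col")
    .option("lowerBound", "1")
    .option("upperBound", "200000000")
    .option("numPartitions", "32")
    .load()
)

(
    df.write.format("bigquery")
    .option("table", "my_project.my_dataset.my_table")
    # The connector stages the data in this GCS bucket, then runs a BigQuery load job.
    .option("temporaryGcsBucket", "my-staging-bucket")
    .mode("overwrite")
    .save()
)
```

In Airflow, this job could be submitted as its own task (for example with a Dataproc or spark-submit operator), which keeps the DAG structure similar to option 1.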