I am following the Data Engineering Zoomcamp course. In Module 2 (workflow orchestration with mage.ai, ETL: GCS to BigQuery), I am trying to load data from Google Cloud Storage into BigQuery, but I get this error:
NotFound: 404 Not found: Dataset [projectID]:ny_taxi was not found in location US
To recreate the data pipeline, follow Module 2 of the course for setup. Below are the three mage.ai blocks, a Data Loader, a Transformer, and a SQL Data Exporter, used to export the data from Google Cloud Storage to BigQuery.
Data Loader Script
from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.google_cloud_storage import GoogleCloudStorage
from os import path

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader
if 'test' not in globals():
    from mage_ai.data_preparation.decorators import test


@data_loader
def load_from_google_cloud_storage(*args, **kwargs):
    """
    Template for loading data from a Google Cloud Storage bucket.
    Specify your configuration settings in 'io_config.yaml'.

    Docs: https://docs.mage.ai/design/data-loading#googlecloudstorage
    """
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    bucket_name = 'mage'
    object_key = 'ny_taxi_data.parquet'

    return GoogleCloudStorage.with_config(ConfigFileLoader(config_path, config_profile)).load(
        bucket_name,
        object_key,
    )
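For reference, the loader reads its GCP credentials from the default profile in io_config.yaml; the relevant section is sketched below (the service account key file path is a placeholder):

```yaml
default:
  GOOGLE_SERVICE_ACC_KEY_FILEPATH: "/path/to/service_account_key.json"
```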
Transformer
if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer
if 'test' not in globals():
    from mage_ai.data_preparation.decorators import test


@transformer
def transform(data, *args, **kwargs):
    data.columns = (data.columns
                    .str.replace(' ', '_')
                    .str.lower())
    return data


@test
def test_output(output, *args) -> None:
    """
    Template code for testing the output of the block.
    """
    assert output is not None, 'The output is undefined'
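As a quick sanity check, the column cleanup in the transformer can be exercised on a small stand-in DataFrame (the sample column names here are hypothetical):

```python
import pandas as pd

# Stand-in for the DataFrame passed into transform()
df = pd.DataFrame({'VendorID': [1], 'Tpep Pickup Datetime': ['2021-01-01 00:30:00']})

# Same logic as the transformer block: spaces -> underscores, then lowercase
df.columns = (df.columns
              .str.replace(' ', '_')
              .str.lower())

print(list(df.columns))  # → ['vendorid', 'tpep_pickup_datetime']
```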
SQL Data Exporter
connection: BigQuery
profile: default
schema: nyc_taxi
table: yellow_cab_data