- I am using Pentaho Data Integration (PDI) to load incremental data from a production Postgres DB to a staging Postgres DB.
- To do this, I first get the max timestamp from the staging DB, then query the production DB for records with a timestamp greater than that max value.
- For the max timestamp I use a Table Input step; for the second query against the production DB I use a Database Join step (I tried Table Input as well), where the date parameter is replaced with the value from the previous step. Roughly, the two queries look like the sketch below.
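A minimal sketch of the two queries, assuming a placeholder table `my_table` with a timestamp column `updated_at` (the real table and column names differ):

```sql
-- Step 1 (Table Input, staging DB): get the high-water mark.
-- my_table and updated_at are placeholder names, not the actual schema.
SELECT COALESCE(MAX(updated_at), TIMESTAMP '1970-01-01') AS max_ts
FROM my_table;

-- Step 2 (Database Join, production DB): fetch rows newer than that mark.
-- PDI fills the ? placeholder with max_ts from the previous step.
SELECT *
FROM my_table
WHERE updated_at > ?;
```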
- When I add a limit of 1 lakh (100,000) records to the production select, the step completes; without the limit, the step does not complete even after 90+ hours and gives no error.
- Running only the count query returns a result in 15-20 minutes. The number of records to be pushed is currently more than 70 lakh (7 million).
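For comparison, the two variants that do finish (same placeholder names as above):

```sql
-- Completes: the same incremental select, capped at 1 lakh rows.
SELECT *
FROM my_table
WHERE updated_at > ?
LIMIT 100000;

-- Returns a result in 15-20 minutes.
SELECT COUNT(*)
FROM my_table
WHERE updated_at > ?;
```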
- Once these 70 lakh records are loaded, this job will run every hour.
- To load data into the staging Postgres DB I am using the PostgreSQL Bulk Loader step.