I have an apache beam pipeline that reads a CSV file and also makes a query to a bigquery table, I need to count how many registers are in each PCollection and based on that create a FinalResult object that has both counts, have not been able to figure out how to combine both result into one object.
Here is my code:
public Pipeline build() {
PCollection<QueryData> queryData = pipeline.apply("Read from BigQuery",
BigQueryIO.read(reader).fromQuery(reader.getQueryString()).usingStandardSql().withoutValidation());
PCollection<Incident> incidentes = pipeline.apply("READ CSV", TextIO.read().from("gs://bucket/incidents.csv"))
.apply("Convert to bean", ParDo.of(new IncidentCsvToBeanFunction()));
PCollection<Long> queryDataCount = queryData.apply("Count QueryData", Count.globally());
PCollection<Long> incidentesCount = incidentes.apply("Count Incidentes", Count.globally());
// Missing how to combine both counts into FinalResult object, which im going to save later into another bigquery table
return pipeline;
}
I am missing how to combine queryDataCount and incidentesCount into my FinalResult object which looks like:
public class FinalResult implements Serializable {
private Integer numberOfIncidents;
private Integer numberOfQueryData;
// getters setters
}