I’m reading data from BigQuery as TableRows and then converting them to Rows, and this conversion step takes a lot of time. Is it possible to read Rows directly from BigQuery instead? Or is there any other way to make this faster?
p.apply("BQ " + tableName, BigQueryIO.readTableRows().from(table))
.apply(ParDo.of(new TableRowToRowConverter(BQSchema)))
.setRowSchema(BQSchema);
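To make the question concrete, this is roughly what I mean by "reading Rows from the beginning" (a minimal sketch, assuming Beam's readTableRowsWithSchema() attaches the table schema so that Convert.toRows() can replace my manual conversion; I haven't measured whether it is actually faster):

import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.schemas.transforms.Convert;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

// Read TableRows with the BigQuery table schema already attached,
// then convert to Rows without a hand-written DoFn.
PCollection<Row> rows =
    p.apply("BQ " + tableName,
            BigQueryIO.readTableRowsWithSchema().from(table))
     .apply(Convert.toRows());

My current converter DoFn is: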
import java.util.HashMap;
import java.util.Map;

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.Row;

// DoFn is already Serializable, so no extra "implements Serializable" is needed.
public class TableRowToRowConverter extends DoFn<TableRow, Row> {

    private static final long serialVersionUID = 1L;
    private final Schema schema;

    public TableRowToRowConverter(Schema schema) {
        this.schema = schema;
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
        TableRow tableRow = c.element();
        // Copy each schema field out of the TableRow, substituting "" for nulls.
        // TableRow.get() is just a map lookup, so missing fields return null
        // rather than throwing; no try/catch is needed here.
        Map<String, Object> fieldValues = new HashMap<>();
        for (String fieldName : schema.getFieldNames()) {
            Object value = tableRow.get(fieldName);
            fieldValues.put(fieldName, value != null ? value : "");
        }
        c.output(Row.withSchema(schema).withFieldValues(fieldValues).build());
    }
}
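The other direction I can think of is skipping TableRow entirely and parsing the Avro records that BigQuery exports. A hypothetical sketch of what I have in mind, assuming all of my fields are STRING (the parse function and the toString() handling are placeholders, not tested code):

import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.coders.RowCoder;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.SchemaAndRecord;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

// Build a Row straight from the Avro GenericRecord, avoiding the
// intermediate TableRow materialization.
SerializableFunction<SchemaAndRecord, Row> parseFn = sar -> {
    GenericRecord record = sar.getRecord();
    Row.Builder builder = Row.withSchema(BQSchema);
    for (String fieldName : BQSchema.getFieldNames()) {
        Object value = record.get(fieldName);
        // Avro strings arrive as Utf8, so normalize via toString().
        builder.addValue(value != null ? value.toString() : "");
    }
    return builder.build();
};

PCollection<Row> rows = p.apply("BQ " + tableName,
    BigQueryIO.read(parseFn).from(table).withCoder(RowCoder.of(BQSchema)));

Would something like this be expected to beat the ParDo above, or is the conversion cost unavoidable either way?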