I am writing a cron job to fully dump some tables from a database to a CSV file. The table content is quite large, with more than 1 million records, which is why I chose to use JPA Batch Reading. I used the Stream approach for JPA Batch:
@Query(value = "SELECT client FROM Client as client " +
"JOIN FETCH client.company " +
"JOIN FETCH client.clientData")
@QueryHints(@QueryHint(name = HINT_FETCH_SIZE, value = "25"))
Stream<Client> findAllFetchingCompanyAndClientData();
I noticed that the process log prints separate SQL queries for fetching the Company and ClientData for each Client. It seems the JOIN FETCH is not working together with Stream batch. Has anyone used Stream together with JOIN FETCH without any issues?
I read somewhere that Spring JPA Batch with Stream does not support JOIN FETCH. I changed the above query to use Paging with two queries:
@Query(value = "SELECT client.id FROM Client as client ")
Page<Integer> findAllId(Pageable pageable);
@Query(value = "SELECT client FROM Client as client " +
"JOIN FETCH client.company " +
"JOIN FETCH client.clientData " +
"WHERE client.id IN (:ids)")
List<Client> findAllFetchingCompanyAndClientData(@Param("ids") Collection<Integer> ids);
The first query fetches the Client IDs for each page, and the second query fetches the Clients together with Company and ClientData based on those IDs. While the JOIN FETCH works, the performance is much worse. It takes more than 10x or even 20x longer to finish compared to Batching with Stream. I configured the Pageable to use 1000 rows per page.
Does anyone have any thoughts on this?