I am learning Spring Data JPA. I want to implement this query in it:
select count(*) from p
left join c on p.p_id = c.p_id
left join g on c.c_id = g.c_id;
I have three tables, p, c and g.
These are abbreviations for:
p = parent
c = child
g = grandchild
The relationship between them is as follows:
p
andc
have one to many relationshipc
andg
have one to many relationship
My entities:
P.java
@Entity
@Getter
@Setter
@ToString
public class P {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long pId;
@Column(nullable = false)
private String pName;
@OneToMany(mappedBy = "p")
private List<C> c;
}
C.java
@Entity
@Getter
@Setter
@ToString
public class C {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long cId;
@Column(nullable = false)
private String cName;
@ToString.Exclude
@ManyToOne
@JoinColumn(name = "p_id")
private P p;
@OneToMany(mappedBy = "c")
private List<G> g;
}
G.java
@Entity
@Getter
@Setter
@ToString
public class G {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long gId;
@Column(nullable = false)
private String gName;
@ToString.Exclude
@ManyToOne
@JoinColumn(name = "c_id")
private C c;
}
I am trying to fetch parents, their corresponding children and the grandchildren of those children into same object in a single join query, in my repository method like:
PRepository.java
public interface PRepository extends JpaRepository<P, Long> {
@Query("SELECT p FROM P p LEFT JOIN FETCH p.c c LEFT JOIN FETCH c.g")
List<P> findAllProjectedP();
}
However, this query throws MultipleBagFetchException
org.springframework.dao.InvalidDataAccessApiUsageException: org.hibernate.loader.MultipleBagFetchException: cannot simultaneously fetch multiple bags: [com.test.entity2.C.g, com.test.entity2.P.c]
I read it’s because JPA annotations are parsed not to allow more than two eagerly loaded collection, but the hibernate-specific annotations allow it.
I looked at this answer, that advises to use @LazyCollection(LazyCollectionOption.FALSE)
. However, the type LazyCollection is deprecated since Hibernate version 6.2 and this is Hibernate exclusive, I was hoping to achieve it in Spring Data JPA.
Another option I found was to use Set instead of List as the type of associated collection in the entity.
But Vlad Mihalcea features this solution as the worst solution in his answer as it introduces the Cartesian Product Problem.
However in my case, I am not joining one single table with 2 tables (ex. P with C & G). I am joining one table P with table C and the table C with table G. So I think this does not produce a cartesian product:
So in this case, is it okay to use Set to solve this problem?
And if I use Set, will it give rise to Cartesian problems with my other queries in future, and will I need to be careful with them?
And if I use Set here, is it necessary to override equals and hashcode here with some business key and if I do not have a business key modify my logic with some complex logic like mentioned in this answer in case Hibernate automatically inserts and removes same references in sets with different instances.
Is it really this complex to implement that three lines join query in Hibernate, or am I missing something?
Other solution for this exception I found was to split the query into multiple queries as mentioned in this article.
However, It is not straightforward for me since the result of first join returns a p List, while the result of the second join returns a c List. And since they are 2 different objects, I am not able to combine them automatically using hibernate.
For example, I tried with two methods in repositories:
- First method to return Parents p joined with Children c
@Query("SELECT p FROM P p LEFT JOIN FETCH p.c") List<P> findAllProjectedP();
- Second method in CRepository to join Children c with GrandChildren g
@Query("SELECT C FROM C c LEFT JOIN c.g ") List<C> findAllProjectedP();
Calling both of these methods hoping Hibernate will first fetch and save all the children with grandchildren in its first level cache / context on calling the first method findAllProjectedCG()
and then on calling findAllProjectedPC()
it will use those children in the cache to join it with the parents, resulting in only 2 queries:
@DataJpaTest
@AutoConfigureTestDatabase(replace = AutoConfigureTestDatabase.Replace.NONE)
class CartesianTest {
@Autowired
private PRepository pR;
@Autowired
private CRepository cR;
@Test
@Transactional
void test() {
List<C> cL = cR.findAllProjectedCG();
List<P> pL = pR.findAllProjectedPC();
System.out.println(pL);
}
}
However, this is not the case and the children returned in cL are not used. Many queries are still fired to load grandChildren for pL lazily. It also loads parents for children in first method lazily for some reasons.. generating more queries as they have bidirectional relationship.
When using this way, do I need to do entity specific modifications, removing parent from child, maybe affecting unrelated queries, and manually many mapping cL into pL, or there is some way Hibernate can do it for us?
I am hoping to avoid Hibernate-specific code. Please suggest the proper way to achieve it.
My PostgreSQL database scripts
CREATE TABLE p (
p_id BIGSERIAL PRIMARY KEY,
p_name VARCHAR(255) NOT NULL
);
CREATE TABLE c (
c_id BIGSERIAL PRIMARY KEY,
c_name VARCHAR(255) NOT NULL,
p_id BIGINT REFERENCES p(p_id)
);
CREATE TABLE g (
g_id BIGSERIAL PRIMARY KEY,
g_name VARCHAR(255) NOT NULL,
c_id BIGINT REFERENCES c(c_id)
);
Data Insertion
-- Insert data into p table
DO $$
BEGIN
FOR i IN 1..20 LOOP
INSERT INTO p (p_name) VALUES (CONCAT('p', i));
END LOOP;
END $$;
-- Insert data into c and g tables
DO $$
DECLARE
p_row RECORD;
c_row RECORD;
c_counter INT;
g_counter INT;
BEGIN
FOR p_row IN SELECT p_id, p_name FROM p LOOP
-- Insert children for each parent
c_counter := 1;
WHILE c_counter <= 3 LOOP
INSERT INTO c (c_name, p_id) VALUES (CONCAT(p_row.p_name, 'c', c_counter), p_row.p_id);
-- Insert 2-5 grandchildren for each child
g_counter := 1;
WHILE g_counter <= (2 + (RANDOM() * 3)::INT) LOOP
INSERT INTO g (g_name, c_id) VALUES (CONCAT(p_row.p_name, 'c', c_counter, 'g', g_counter),
(SELECT c_id FROM c WHERE c_name = CONCAT(p_row.p_name, 'c', c_counter) AND p_id = p_row.p_id));
g_counter := g_counter + 1;
END LOOP;
c_counter := c_counter + 1;
END LOOP;
END LOOP;
END $$;