Consider a PostgreSQL database containing
(1) N groups of Tags
Group 1: Adults, Kids
Group 2: Sci-Fi, Fantasy, Drama
Group 3: Human, Alien
Group 4: Digital, Paperback
...
(2) N different Stories with various combinations of those Tags assigned – describing what the Story is about:
Story 1: Adults, Kids, Fantasy, Drama, Human, Digital, Paperback
Story 2: Adults, Kids, Sci-Fi, Digital
Story 3: Digital, Adults, Human
...
And (3) N different people again with various combinations of Tags assigned – describing what the Person is interested in:
Person 1: Adults, Kids, Sci-Fi, Fantasy, Drama, Human, Alien, Digital
Person 2: Kids, Sci-Fi, Fantasy
Person 3: Sci-Fi, Digital, Paperback
Neither side always uses Tags from all Groups. Sometimes a Group is represented by multiple Tags (Person 1: Adults AND Kids). Sometimes one or more Groups are not used at all (Person 3 doesn’t use Groups 1 and 3; Story 3 doesn’t use Group 2).
I’m trying to load a list of all Stories related to People through their mutual Tag combinations, where:
- the Person.tags side is authoritative: a Story that is missing what the Person selected (subject to the Group rule below) is deemed unrelated, and Tags in Story.tags that don’t exist in Person.tags are ignored
- [the tricky bit that gets me]: for each Group used in Person.tags, at least one of the Person’s Tags from that Group must be present in Story.tags, but not necessarily all of them
In other words, I need the following logic to be applied:
Person 1: (Adults OR Kids) AND (Sci-Fi OR Fantasy OR Drama) AND (Human OR Alien) AND (Digital), matching Story 1
Person 2: (Kids) AND (Sci-Fi OR Fantasy), matching Story 1 and 2
Person 3: (Sci-Fi) AND (Digital OR Paperback), matching Story 2 only (Story 3 has no Sci-Fi Tag)
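To make the rule above concrete, here is a minimal Python sketch of it: AND across the Groups a Person uses, OR within each Group. The dictionaries and the `story_matches` helper are hypothetical names that only mirror the example data; this is not how the data is actually stored. Applied literally, the rule pairs Person 3 with Story 2 only, because Story 3 carries no Sci-Fi Tag.

```python
# Hypothetical in-memory copy of the example data (illustrative names only).
TAG_GROUPS = {
    1: {"Adults", "Kids"},
    2: {"Sci-Fi", "Fantasy", "Drama"},
    3: {"Human", "Alien"},
    4: {"Digital", "Paperback"},
}
STORIES = {
    1: {"Adults", "Kids", "Fantasy", "Drama", "Human", "Digital", "Paperback"},
    2: {"Adults", "Kids", "Sci-Fi", "Digital"},
    3: {"Digital", "Adults", "Human"},
}
PEOPLE = {
    1: {"Adults", "Kids", "Sci-Fi", "Fantasy", "Drama", "Human", "Alien", "Digital"},
    2: {"Kids", "Sci-Fi", "Fantasy"},
    3: {"Sci-Fi", "Digital", "Paperback"},
}


def story_matches(person_tags, story_tags):
    """AND across the Groups the Person uses, OR within each Group."""
    for group_tags in TAG_GROUPS.values():
        wanted = person_tags & group_tags  # the Person's Tags in this Group
        # A Group the Person does not use imposes no constraint.
        if wanted and not (wanted & story_tags):
            return False
    return True


matches = {
    person_id: sorted(
        story_id
        for story_id, story_tags in STORIES.items()
        if story_matches(person_tags, story_tags)
    )
    for person_id, person_tags in PEOPLE.items()
}
print(matches)  # {1: [1], 2: [1, 2], 3: [2]}
```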
I’ve tried a few approaches, but the best I could come up with was using the intersection of Tags on both sides to drive the match. That was incorrect: for instance, Person 3 was matched with Story 1 because of the “Digital” Tag present in Story 1, but Person 3 also needs the “Sci-Fi” Tag, which Story 1 doesn’t have, so Story 1 should not have been linked with Person 3.
I’m able to construct a correct query for a single Person in my application, but only after reading the Person.tags array to learn which Tags are there, and therefore what the WHERE clause should look like for that specific Person. I can’t do that when I want to match multiple People with multiple Stories in a single query.
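One set-based way to avoid building a WHERE clause per Person might be to count Groups: a Story matches when the number of distinct Groups covered by the shared Tags equals the number of distinct Groups the Person uses. The sketch below is only an illustration under assumptions: the normalized tables `tags`, `person_tags` and `story_tags` are hypothetical, and SQLite (through Python's `sqlite3`) stands in for PostgreSQL purely so the example runs anywhere. The query sticks to plain SQL, so I would expect it to carry over, and array columns could presumably be flattened into this shape with `unnest()` first.

```python
import sqlite3

# Hypothetical normalized schema; the real tables may look different.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (tag TEXT PRIMARY KEY, group_id INTEGER NOT NULL);
CREATE TABLE person_tags (person_id INTEGER, tag TEXT);
CREATE TABLE story_tags (story_id INTEGER, tag TEXT);
""")

groups = {1: ["Adults", "Kids"], 2: ["Sci-Fi", "Fantasy", "Drama"],
          3: ["Human", "Alien"], 4: ["Digital", "Paperback"]}
people = {1: ["Adults", "Kids", "Sci-Fi", "Fantasy", "Drama", "Human", "Alien", "Digital"],
          2: ["Kids", "Sci-Fi", "Fantasy"],
          3: ["Sci-Fi", "Digital", "Paperback"]}
stories = {1: ["Adults", "Kids", "Fantasy", "Drama", "Human", "Digital", "Paperback"],
           2: ["Adults", "Kids", "Sci-Fi", "Digital"],
           3: ["Digital", "Adults", "Human"]}

conn.executemany("INSERT INTO tags VALUES (?, ?)",
                 [(t, g) for g, ts in groups.items() for t in ts])
conn.executemany("INSERT INTO person_tags VALUES (?, ?)",
                 [(p, t) for p, ts in people.items() for t in ts])
conn.executemany("INSERT INTO story_tags VALUES (?, ?)",
                 [(s, t) for s, ts in stories.items() for t in ts])

QUERY = """
WITH person_groups AS (      -- how many Groups each Person uses
    SELECT pt.person_id, COUNT(DISTINCT t.group_id) AS n_groups
    FROM person_tags pt
    JOIN tags t ON t.tag = pt.tag
    GROUP BY pt.person_id
),
matched AS (                 -- Groups covered by Tags shared with each Story
    SELECT pt.person_id, st.story_id,
           COUNT(DISTINCT t.group_id) AS n_matched
    FROM person_tags pt
    JOIN tags t ON t.tag = pt.tag
    JOIN story_tags st ON st.tag = pt.tag
    GROUP BY pt.person_id, st.story_id
)
SELECT m.person_id, m.story_id
FROM matched m
JOIN person_groups pg ON pg.person_id = m.person_id
WHERE m.n_matched = pg.n_groups   -- every Group the Person uses is covered
ORDER BY m.person_id, m.story_id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 1), (2, 1), (2, 2), (3, 2)]
```

Each row is a (person_id, story_id) pair. Tags that appear only on the Story side never enter the join, so they are ignored, as the first rule requires.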
I think there could be some way along the lines of CROSS JOINing all of the possible combinations of the Tags all Persons are interested in, then joining it with a flattened list of Story Tags:
For person 1, that means:
Adults, Sci-Fi, Human, Digital
Adults, Fantasy, Human, Digital
Adults, Drama, Human, Digital
Adults, Sci-Fi, Alien, Digital
Adults, Fantasy, Alien, Digital
Adults, Drama, Alien, Digital
Kids, Sci-Fi, Human, Digital
Kids, Fantasy, Human, Digital
Kids, Drama, Human, Digital
Kids, Sci-Fi, Alien, Digital
Kids, Fantasy, Alien, Digital
Kids, Drama, Alien, Digital
Note how the Tag Groups are important: it’s not just a cross join of all Tags. There is no “Adults, Kids, Sci-Fi, Fantasy, Drama, Human, Alien” combination, for example.
For Person 3, that means:
Sci-Fi, Digital
Sci-Fi, Paperback
I could then inner join the combinations of Person.tags with the Story.tags, only accepting Story records where all Tags on the Person side are found in the list of Tags on the Story side. Some Stories could be linked to one person multiple times through different Tag combinations, but I could just select DISTINCT Stories to get around that.
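The combination idea can be checked outside the database first. The following Python sketch, using hypothetical names throughout, builds one combination per choice of a single Tag from each Group the Person uses (via `itertools.product`), then keeps the Stories that contain every Tag of at least one combination. The set comprehension plays the role of SELECT DISTINCT.

```python
from itertools import product

# Hypothetical copy of the example data (illustrative names only).
TAG_GROUPS = [
    {"Adults", "Kids"},
    {"Sci-Fi", "Fantasy", "Drama"},
    {"Human", "Alien"},
    {"Digital", "Paperback"},
]
STORIES = {
    1: {"Adults", "Kids", "Fantasy", "Drama", "Human", "Digital", "Paperback"},
    2: {"Adults", "Kids", "Sci-Fi", "Digital"},
    3: {"Digital", "Adults", "Human"},
}
PERSON_1 = {"Adults", "Kids", "Sci-Fi", "Fantasy", "Drama", "Human", "Alien", "Digital"}
PERSON_3 = {"Sci-Fi", "Digital", "Paperback"}


def combinations_for(person_tags):
    """One Tag per Group the Person uses; unused Groups are skipped."""
    per_group = [sorted(person_tags & group) for group in TAG_GROUPS]
    per_group = [tags for tags in per_group if tags]
    return [set(combo) for combo in product(*per_group)]


def matching_stories(person_tags):
    """Stories containing all Tags of at least one combination, de-duplicated."""
    return sorted({
        story_id
        for combo in combinations_for(person_tags)
        for story_id, story_tags in STORIES.items()
        if combo <= story_tags  # every Tag of the combination is on the Story
    })


print(len(combinations_for(PERSON_1)))  # 12
print(len(combinations_for(PERSON_3)))  # 2
print(matching_stories(PERSON_1))       # [1]
print(matching_stories(PERSON_3))       # [2]
```

The number of combinations is the product of the per-Group Tag counts (2 × 3 × 2 × 1 = 12 for Person 1), so it can grow quickly for People with many Tags in many Groups.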
My gut tells me that window functions (https://www.postgresql.org/docs/current/tutorial-window.html) might be the answer, but they’re well outside what I’m comfortable with today.
Am I at least going in the right direction? Any guidance would be greatly appreciated.