I’m writing a filtering mechanism for my tag system. What I’m having trouble is getting tag use count when given a set of some tags. I’ll explain my setup first.
I have two main tables: tags
and item_tags
. First one stores tag: ID
and name
, second one assigns tags to some items using two columns: IDitem
and IDtag
.
Filtering mechanism allows the user to select any set of tags and is supposed to display all items that are tagged with all of the selected tags.
Example database
CREATE TABLE `tags`(ID int, name text);
INSERT INTO `tags` VALUES
(1, "food"),
(2, "fruit"),
(3, "drink"),
(4, "absurdly_sweet"),
(5, "car");
CREATE TABLE `item_tags`(IDitem int, IDtag int);
INSERT INTO `item_tags` VALUES
(1, 1), (1, 2), -- Apple
(2, 1), (2, 3), -- Apple juice
(3, 5), -- A car
(4, 2), (4, 5), -- Apple-shaped car
(5, 1), (5, 3), (5, 4); -- Sweet drink
For example, if user selects “fruit” and “drink” tags, filtering system should display items 2 (Apple juice) and 5 (Sweet drink), because both items are tagged as both “fruit” and “drink”.
Counting tag usage
To get all tags with the number of items tagged with them I can use either query:
SELECT t.ID, t.name, (SELECT COUNT(*) FROM item_tags it WHERE it.IDtag=t.ID) as `count` FROM tags t;
SELECT t.ID, t.name, COUNT(*) AS `count` FROM tags t LEFT JOIN item_tags it ON it.IDtag=t.ID GROUP BY t.ID;
Getting tagged items
To get all items tagged with a specific set of tags (in this example, with tags “1” and “3”), I can use:
SELECT it.IDitem
FROM `item_tags` it
LEFT JOIN `tags` t
ON t.ID=it.IDtag
WHERE it.IDtag IN (1, 3)
GROUP BY it.IDitem
HAVING COUNT(*)=2 -- <- Must match number of tags we're looking for
Getting tag usage given a set of tags
What I’m struggling with is getting the tag usage. When filtering, I want to display a list of items with specified tags, all of the tags and their count in the resulting list.
For example, when searching for tags “1/food” and “3/drink”, user should see:
Items:
- 2: Apple juice #food #drink
- 5: Sweet drink #food #drink #absurdly_sweet
Tags:
- 1: food [2]
- 2: fruit [0]
- 3: drink [2]
- 4: absurdly_sweet [1]
- 5: car [0]
Initially I tried to get all tags and run a subquery that would count items tagged with (t.ID, 1, 3)
, but it seems that correlated queries can only go one level down:
SELECT t.ID, t.name,
(SELECT COUNT(*)
FROM (SELECT it.IDitem
FROM `item_tags` it
WHERE it.IDtag IN (t.ID, 1, 3)
GROUP BY it.IDitem
HAVING COUNT(*)=2) subcount
) AS `count`
FROM tags t
Running above query results in error “unknown column ‘t.ID'”.
Then I tried another query, but that only gives me tags that are used in the results:
SELECT t.ID, t.name, COUNT(*) AS `count`
FROM `tags` t
LEFT JOIN `item_tags` it
ON t.ID=it.IDtag
WHERE it.IDitem IN (
SELECT it.IDitem
FROM `tags` t
LEFT JOIN `item_tags` it on t.ID=it.IDtag
WHERE t.ID IN (1, 3)
GROUP BY it.IDitem
HAVING COUNT(*)=2
)
GROUP BY t.ID
After a lot of thinking I devised a query that seems to work, but I’m wondering if there is a better way to do it:
SELECT *
FROM (
SELECT t.ID AS `ID`, t.name, COUNT(*) as `count`
FROM `tags` t
JOIN `item_tags` it ON t.ID=it.IDtag
WHERE it.IDitem IN (
SELECT it.IDitem
FROM `tags` t
JOIN `item_tags` it ON t.ID=it.IDtag
WHERE t.ID IN (1, 3)
GROUP BY it.IDitem
HAVING COUNT(*)=2
)
GROUP BY t.ID
UNION
SELECT t.ID, t.name, 0 as count
FROM `tags` t
) tempTags
GROUP BY `ID`