From what I understand, the database is based on Apache Superset. I am ONLY allowed to run SELECT statements, so no data manipulation of any kind (for example, I can't run UNION statements to merge the data in a different manner). I also have no access to edit or alter the databases themselves. All I can do is query the data in this web-based application.
My question is this: how can I change the query below so that it stops pulling duplicated data?

    SELECT
        concat(opentripview.locid, '_', opentripview.assettype) AS Key,
        opentripview.locid AS GLID,
        opentripview.assettype AS Material,
        SUM(opentripview.tripdays) AS OpenTripDays,
        COUNT(opentripview.tripdays) AS OpenTripCount,
        SUM(closedtripview.tripdays) AS ClosedTripDays,
        COUNT(closedtripview.tripdays) AS ClosedTripCount
    FROM opentripview
    INNER JOIN closedtripview
        ON concat(opentripview.locid, '_', opentripview.assettype)
         = concat(closedtripview.locid, '_', closedtripview.assettype)
        AND opentripview.assettype = closedtripview.assettype
        AND opentripview.locid = closedtripview.locid
    WHERE (opentripview.assettype LIKE '%828' OR opentripview.assettype LIKE '%24024')
        AND opentripview.locid = '1000094424'
        AND opentripview.division = 'PA'
        AND closedtripview.TripStartDate < DATE '2024-05-01'
        AND closedtripview.TripEndDate >= DATE '2024-02-01'
        AND closedtripview.TripEndDate < DATE '2024-05-01'
    GROUP BY concat(opentripview.locid, '_', opentripview.assettype),
        opentripview.locid, opentripview.assettype;

My original data looks like this, but with thousands of lines per GLID/material (OTD/OTC = open trip days/count, CTD/CTC = closed trip days/count):

    Key  GLID        Material  OTD   OTC  CTD  CTC
    X_X  1000094424  828       1000  10   0    0
    X_X  1000094424  828       2000  20   0    0
    X_X  1000094424  828       0     0    500  10
    X_X  1000094424  828       0     0    750  10

I end up getting data similar to this:

    Key  GLID        Material  OTD   OTC  CTD   CTC
    X_X  1000094424  828       6000  60   2500  40

I don't really have anything to use as a key unless I concatenate columns. However, even with the concat, the join produces roughly 46x the data I actually need to see (the 46x is not shown above, but that is what it totals to across my entire dataset). I know my filtering and grouping work as intended, because if I run the above without the join I get the correct totals for both my open and closed data. Based on the above, the join seems to replicate the open trip data for every line of closed trip data, which causes the immense number of duplicates.

I have also tried the other JOIN types (RIGHT, LEFT, INNER, etc.), but that changes nothing in my results. I do believe a FULL JOIN is what I need, because some GLIDs in the open trip data will not be in the closed trip data, and I still need those open trips to come through with 0 as their closed values in that scenario.
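
The multiplication described here can be reproduced on a tiny inline dataset. This is only a minimal sketch, not the real views: `open_rows` and `closed_rows` are made-up stand-ins with two rows each (one trip per row), and the PostgreSQL-style `WITH ... AS (VALUES ...)` syntax is an assumption about what the underlying engine accepts.

```sql
-- Hypothetical stand-ins for opentripview / closedtripview (two trips each).
WITH open_rows (locid, assettype, tripdays) AS (
    VALUES ('1000094424', '828', 1000),
           ('1000094424', '828', 2000)
),
closed_rows (locid, assettype, tripdays) AS (
    VALUES ('1000094424', '828', 500),
           ('1000094424', '828', 750)
)
SELECT
    o.locid,
    o.assettype,
    SUM(o.tripdays)   AS OpenTripDays,    -- 6000, not 3000
    COUNT(o.tripdays) AS OpenTripCount,   -- 4, not 2
    SUM(c.tripdays)   AS ClosedTripDays,  -- 2500, not 1250
    COUNT(c.tripdays) AS ClosedTripCount  -- 4, not 2
FROM open_rows o
INNER JOIN closed_rows c
    ON  o.locid = c.locid
    AND o.assettype = c.assettype         -- every open row matches every closed row
GROUP BY o.locid, o.assettype;
```

Because the join key is not unique on either side, the 2 open rows pair with the 2 closed rows to give 4 joined rows before aggregation, so each side's totals are multiplied by the row count of the other side. The sums match the 6000 and 2500 in the sample output above.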
I originally thought a UNION would work better here, because I could apply the filters to each SELECT statement separately rather than to the query as a whole, but my database does not allow UNIONs.
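
The "filter each side separately" idea can also be written without a UNION, by aggregating each view in its own subquery and joining the two already-grouped results. The following is a minimal, untested sketch: it assumes the engine permits subqueries in the FROM clause, the aliases `o` and `c` are illustrative, and it assumes the closed-trip filters only need the date columns already used in the original query.

```sql
SELECT
    concat(o.locid, '_', o.assettype) AS Key,
    o.locid                           AS GLID,
    o.assettype                       AS Material,
    o.OpenTripDays,
    o.OpenTripCount,
    COALESCE(c.ClosedTripDays, 0)     AS ClosedTripDays,   -- 0 when no closed trips exist
    COALESCE(c.ClosedTripCount, 0)    AS ClosedTripCount
FROM (
    -- Open trips: filtered and collapsed to one row per locid/assettype.
    SELECT locid, assettype,
           SUM(tripdays)   AS OpenTripDays,
           COUNT(tripdays) AS OpenTripCount
    FROM opentripview
    WHERE (assettype LIKE '%828' OR assettype LIKE '%24024')
      AND locid = '1000094424'
      AND division = 'PA'
    GROUP BY locid, assettype
) o
LEFT JOIN (
    -- Closed trips: date filters applied here, then one row per locid/assettype.
    SELECT locid, assettype,
           SUM(tripdays)   AS ClosedTripDays,
           COUNT(tripdays) AS ClosedTripCount
    FROM closedtripview
    WHERE TripStartDate < DATE '2024-05-01'
      AND TripEndDate >= DATE '2024-02-01'
      AND TripEndDate < DATE '2024-05-01'
    GROUP BY locid, assettype
) c
    ON  o.locid = c.locid
    AND o.assettype = c.assettype;
```

Since each subquery returns at most one row per locid/assettype, the join can no longer multiply rows. Keeping the closed-trip date filters inside the subquery also matters: in the original query they sit in the outer WHERE clause, which discards any row where the closed side is NULL and may be why switching between join types appeared to change nothing. A FULL OUTER JOIN could replace the LEFT JOIN if closed-only keys are also needed, with COALESCE applied to the open columns as well.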
Am I missing something obvious within my JOIN statement, or will I not be able to pull both my open and closed trip data into one query since I do not have specific enough data for a proper key?
Any help is greatly appreciated!
If you look at the above, you will see everything I tried and expected in detail.