I have two tables, Printers and “Printer_audit” where every issue with printer is recorded
I am trying to understand how to calculate the avarage time its taking to fix a printer in general…
assuming each printer has only a single entry of “ISSUE_FOUND” following by a single entry of “FIXED”
while ignoring the ones that are “TRASHED” like Xerox3
I know how to calculate the time it took for a single row, but not how to do the average for all Printers
select (SELECT action_date from
printer_history where printer_name='Xerox1' and action_name='FIXED')
-
(SELECT action_date from
printer_history where printer_name='Xerox1' and action_name='ISSUE_FOUND') diff
from dual;
**Update – added issue ID in the Printer_audit table. **
Sample Fiddle:
https://sqlfiddle.com/oracle/online-compiler?id=2c5ffb16-6aad-432d-8bf1-c881cba08356
-- INIT database
CREATE TABLE printer (
printerID INT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
Name VARCHAR2(100)
);
-- INIT history
CREATE TABLE printer_history (
IssueId INT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
printer_name VARCHAR2(100),
action_name VARCHAR2(100),
action_date TIMESTAMP
);
INSERT INTO printer(Name) VALUES ('Xerox1');
INSERT INTO printer(Name) VALUES ('Xerox2');
INSERT INTO printer(Name) VALUES ('Xerox3');
INSERT INTO printer_history(printer_name,action_name,action_date) VALUES ('Xerox1','INSTALLED' , TO_TIMESTAMP('2014-07-02 06:44:00.743000000', 'YYYY-MM-DD HH24:MI:SS.FF') );
INSERT INTO printer_history(printer_name,action_name,action_date) VALUES ('Xerox1','ISSUE_FOUND', TO_TIMESTAMP('2014-07-02 06:45:00.744000000', 'YYYY-MM-DD HH24:MI:SS.FF') );
INSERT INTO printer_history(printer_name,action_name,action_date) VALUES ('Xerox1','FIXED' , TO_TIMESTAMP('2014-07-15 8:33:00.655000000', 'YYYY-MM-DD HH24:MI:SS.FF') );
INSERT INTO printer_history(printer_name,action_name,action_date) VALUES ('Xerox2','INSTALLED' , TO_TIMESTAMP('2015-08-01 05:11:00.722000000', 'YYYY-MM-DD HH24:MI:SS.FF') );
INSERT INTO printer_history(printer_name,action_name,action_date) VALUES ('Xerox2','ISSUE_FOUND', TO_TIMESTAMP('2015-09-02 03:44:00.732000000', 'YYYY-MM-DD HH24:MI:SS.FF') );
INSERT INTO printer_history(printer_name,action_name,action_date) VALUES ('Xerox2','FIXED' , TO_TIMESTAMP('2015-09-14 07:55:00.742000000', 'YYYY-MM-DD HH24:MI:SS.FF') );
INSERT INTO printer_history(printer_name,action_name,action_date) VALUES ('Xerox3','INSTALLED' , TO_TIMESTAMP('2018-08-01 05:11:00.752000000', 'YYYY-MM-DD HH24:MI:SS.FF') );
INSERT INTO printer_history(printer_name,action_name,action_date) VALUES ('Xerox3','ISSUE_FOUND' , TO_TIMESTAMP('2018-08-05 05:11:00.752000000', 'YYYY-MM-DD HH24:MI:SS.FF') );
INSERT INTO printer_history(printer_name,action_name,action_date) VALUES ('Xerox3','TRASHED' , TO_TIMESTAMP('2018-08-23 05:11:00.752000000', 'YYYY-MM-DD HH24:MI:SS.FF') );
select * from printer_history;
-- QUERY database
select (SELECT action_date from
printer_history where printer_name='Xerox1' and action_name='FIXED')
-
(SELECT action_date from
printer_history where printer_name='Xerox1' and action_name='ISSUE_FOUND') diff
from dual;
7
You say there can be only one ‘ISSUE_FOUND’ and one ‘FIXED’ per printer. This is a bit strange, considering that we are looking at a “printer_history”, but it makes the issue quite easy.
The only problem (at least in Oracle 11 which you tagged the request with) is that you cannot apply AVG
on intervals (the result when subtracting one timestamp from another), unfortunately.
Instead cast the timestamps to datetimes (in Oracle inappropriately called DATE
). Then you get the durations as fractions of days:
with
issue_found as
(
select *
from printer_history
where action_name = 'ISSUE_FOUND'
),
fixed as
(
select *
from printer_history
where action_name = 'FIXED'
),
durations as
(
select cast(f.action_date as date) - cast(if.action_date as date) as diff
from issue_found if
join fixed f using (printer_name)
)
select round(avg(diff), 1) as days
from durations;
Demo: https://dbfiddle.uk/p62dU2ux
3
From Oracle 12, you can use MATCH_RECOGNIZE
to perform row-by-row pattern matching:
SELECT COALESCE(printer_name, 'All printers') AS printer_name,
AVG(CAST(fixed_date AS DATE) - CAST(found_date AS DATE))
AS avg_fix_time_in_days
FROM printer_history
MATCH_RECOGNIZE(
PARTITION BY printer_name
ORDER BY action_date
MEASURES
issue_found.action_date AS found_date,
fixed.action_date AS fixed_date
PATTERN ( issue_found other*? fixed )
DEFINE
issue_found AS action_name = 'ISSUE_FOUND',
fixed AS action_name = 'FIXED'
)
GROUP BY ROLLUP(printer_name)
In earlier versions, you can use the LAG
analytic function:
SELECT COALESCE(printer_name, 'All printers') AS printer_name,
AVG(CAST(action_date AS DATE) - CAST(found_date AS DATE))
AS avg_fix_time_in_days
FROM (
SELECT printer_name,
action_name,
LAG(CASE action_name WHEN 'ISSUE_FOUND' THEN action_date END)
IGNORE NULLS OVER (
PARTITION BY printer_name ORDER BY action_date
) AS found_date,
action_date
FROM printer_history
WHERE action_name IN ('ISSUE_FOUND', 'FIXED')
)
WHERE action_name = 'FIXED'
GROUP BY ROLLUP(printer_name)
Note: Printers can have multiple issues and this will calculate the average time for each printer and globally over all the relevant issues.
Which, for the sample data, both output:
PRINTER_NAME | AVG_FIX_TIME_IN_DAYS |
---|---|
Xerox1 | 13.075 |
Xerox2 | 12.17430555555555555555555555555555555556 |
All printers | 12.62465277777777777777777777777777777778 |
fiddle