I have functions and procedures that contain loops and can run for several hours. I therefore want to implement some sort of progress monitoring (so I know they are actually making progress), similar to tqdm in Python or progress in R.
I have a rough working version but would like help making the logic cleaner. The example task below reads each integer in task_list and does an upsert to store double that integer in the second column:
CREATE TABLE example.task_list (
intt int NOT NULL PRIMARY KEY,
double_intt int
);
INSERT INTO example.task_list VALUES
(1, NULL), (2, NULL), (3, NULL), (4, NULL)
RETURNING *;
CREATE OR REPLACE function example.doloop()
returns void
LANGUAGE plpgsql AS
$func$
DECLARE
begin_time timestamptz = clock_timestamp();
num_tasks_done int := 0;
average_time_per_task interval := null;
tasks_remaining int;
f record;
time_to_do interval;
BEGIN
-- The subquery alias (t) is required before PostgreSQL 16 and harmless afterwards.
SELECT count(*) INTO tasks_remaining FROM (SELECT DISTINCT intt from example.task_list where double_intt is null) AS t;
raise notice '%: Starting. We have [%] tasks to do.', clock_timestamp(), tasks_remaining;
FOR f IN (SELECT DISTINCT intt from example.task_list where double_intt is null)
LOOP
begin_time = clock_timestamp();
raise notice '%: Putting in [%]', begin_time, f.intt;
raise notice '%: We have done [%] jobs and have [%] jobs remaining. It will take %', clock_timestamp(), num_tasks_done, tasks_remaining, average_time_per_task * tasks_remaining;
INSERT INTO example.task_list (intt, double_intt) values (f.intt, 2 * f.intt)
ON CONFLICT("intt") DO UPDATE
SET double_intt = EXCLUDED.double_intt;
time_to_do = clock_timestamp() - begin_time;
raise notice '%: Finished. It took : %', clock_timestamp(), time_to_do;
/* Do calculations for time remaining */
average_time_per_task = ((coalesce(average_time_per_task,'0 minutes') * num_tasks_done) + (time_to_do)) / (num_tasks_done+1);
num_tasks_done = num_tasks_done + 1;
tasks_remaining = tasks_remaining - 1;
END LOOP;
END;
$func$;
select example.doloop();
This works as intended: the function raises notices describing its progress and the estimated time to completion. The shortcomings are:
- I need two separate SQL queries: one to count the tasks for the monitoring and one to loop over.
- The logic and variables for the actual work and those for the timekeeping are mixed together, so it is hard to see what is going on.
Is there a way to 1) define the list of tasks once and 2) encapsulate the timekeeping and notice-raising logic separately?
The following does not work, but it would be great to be able to write loops like this and get the timekeeping for free:
CREATE OR REPLACE function example.do_loop_with_tqdm()
returns void
LANGUAGE plpgsql AS
$func$
DECLARE
f record;
task_list_generator any := make_generator(SELECT DISTINCT intt from example.task_list where double_intt is null)
BEGIN
FOR f IN (task_list_generator)
LOOP
INSERT INTO example.task_list (intt, double_intt) values (f.intt, 2 * f.intt)
ON CONFLICT("intt") DO UPDATE
SET double_intt = EXCLUDED.double_intt;
END LOOP;
END;
$func$;
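For comparison, here is a minimal sketch of one direction that is valid PL/pgSQL, though it falls well short of the generator syntax wished for above. It assumes the example.task_list table defined earlier. The helper example.report_progress and the function example.doloop_with_progress are hypothetical names introduced only for illustration. The total is obtained with count(*) OVER (), so the task list is written once, and all timekeeping sits in the helper. The estimate is simply the elapsed time divided by the tasks done, multiplied by the tasks remaining.

```sql
-- Hypothetical helper (illustration only): all timekeeping and notices live here.
CREATE OR REPLACE FUNCTION example.report_progress(
    started_at timestamptz,  -- when the whole loop started
    n_done     int,          -- tasks completed so far
    n_total    int           -- total number of tasks
) RETURNS void
LANGUAGE plpgsql AS
$func$
DECLARE
    elapsed interval := clock_timestamp() - started_at;
BEGIN
    -- The estimate is NULL until at least one task has finished.
    RAISE NOTICE '%: done [%] of [%], elapsed %, estimated remaining %',
        clock_timestamp(), n_done, n_total, elapsed,
        CASE WHEN n_done > 0 THEN elapsed / n_done * (n_total - n_done) END;
END;
$func$;

-- Hypothetical rewrite of the loop: the task list appears in one query only.
CREATE OR REPLACE FUNCTION example.doloop_with_progress()
RETURNS void
LANGUAGE plpgsql AS
$func$
DECLARE
    started_at timestamptz := clock_timestamp();
    n_done int := 0;
    f record;
BEGIN
    FOR f IN
        -- count(*) OVER () attaches the total number of rows to every row.
        SELECT intt, (count(*) OVER ())::int AS n_total
        FROM example.task_list
        WHERE double_intt IS NULL
    LOOP
        INSERT INTO example.task_list (intt, double_intt)
        VALUES (f.intt, 2 * f.intt)
        ON CONFLICT (intt) DO UPDATE
        SET double_intt = EXCLUDED.double_intt;

        n_done := n_done + 1;
        PERFORM example.report_progress(started_at, n_done, f.n_total);
    END LOOP;
END;
$func$;

SELECT example.doloop_with_progress();
```

The loop body still needs one counter and one PERFORM call, so the timekeeping is not entirely free, but the work and the reporting are at least kept apart.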