I am using the CodeIgniter framework for PHP on Windows. I am trying to track the progress of a controller method running in the background. The controller extracts data from the database (MySQL), does some processing, and then stores the results back in the database. This whole process can be considered a single task. A new task can be assigned while another task is running; the newly assigned task is added to a queue. If I can track the progress of the controller, I can show a status for each of these tasks: “Pending” for tasks in the queue, “In Progress” for tasks that are running, and “Done” for tasks that are completed.
Main Issue:
The first thing I need is an algorithm to track how much of the controller method's execution has completed. For instance, this PHP script tracks the progress of an array being counted. There, the current state and the final state are both known, so it is possible to track progress. But I have not been able to devise anything analogous for my case.
Maybe what I am trying to achieve is programmatically not possible. If it's not possible, please suggest a workaround or a completely new approach. If any details are missing, let me know. Sorry for my ignorance; this is my first post here. Feel free to point out my mistakes.
EDIT:
Database outline:
The URL(s) and keyword(s) are first entered by the user and stored in database tables called link_master and keyword_master respectively. Keywords are then extracted from all the links present in link_master and compared with the keywords entered by the user, and their frequency is calculated, which is the final result. The results are stored in another table called link_result. Next, sub-links are extracted from the domain links and stored in a table called sub_link_master. Keywords are then extracted from these sub-links as well, and the corresponding results are stored in a table called sub_link_result.
The number of records cannot be determined beforehand, as the number of links on any web page can differ. Only the cardinality of the link_result table can be known: it equals the number of keyword(s) multiplied by the number of URL(s).
I insert multiple records at a time using this resource.
Controller outline:
The controller extracts keywords from a web page and also extracts keywords from all the links present on that page. There is a method called crawlLink. I used Rolling Curl to extract keywords and web page content. It has a callback function, which I used for extracting keywords along with generating results and extracting valid sub-links. There is an insertResult method which stores results for links and sub-links in the respective tables.
Yes, the processing depends on the number of records. The more the number of records, the more time it takes to execute:
Consider this scenario:
Number of Domain Links = 1
Number of Keywords = 3
Number of Domain Links Result generated = 3 (3 x 1 as described in the question)
Number of Sub Links generated = 41
Number of Sub Links Result = 117 (41 x 3 = 123 but some links are not valid or searchable)
Approximate time taken for above process to complete = 55 seconds.
The above result is for a single link. I want to track the progress of these results being stored in the database. When all results are stored, the task is complete. While results are being stored, the task is In Progress. I am not clear on how I can track this progress.
Although it would be near impossible to track % complete accurately due to an undetermined number of links and keywords, it is possible to show a rough status via depth. For example, the first depth would be the URL(s) processed from the top level.
(100/Total Pages) * Pages Processed = % current status
Total Pages = SELECT COUNT(*) FROM master_links
Pages Processed = SELECT COUNT(*) FROM master_links WHERE processed = true. Once you have processed a page, simply set the flag in the db.
(This could similarly be done by populating an array with your db values and using the index value as your pages processed)
Note: You can only get the status for each level. Do not start crawling your sub_links until all the master_links are crawled – this will also allow you to avoid duplicate url crawls and should have a minimal impact on the total time.
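As a rough illustration of the per-level formula above, here is a minimal PHP sketch. The function name `levelProgress` is hypothetical, and the counts passed in are illustrative stand-ins for the results of the two COUNT queries. It computes processed / total * 100, which is equivalent to (100 / total) * processed, and guards against an empty level.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: percentage complete for one crawl level.
function levelProgress(int $processed, int $total): float
{
    if ($total <= 0) {
        return 0.0; // nothing queued at this level yet
    }
    // Same value as (100 / $total) * $processed, rounded to one decimal.
    return round($processed / $total * 100, 1);
}

// Illustrative counts; in practice they would come from the two COUNT queries.
echo 'Master Links ', levelProgress(2, 5), "% complete\n";
```

Calling it once per level (master links first, then sub-links) would give the kind of per-level output described in this answer.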
The squares in the diagram below represent the pages which need to be processed. Inside each box is the percentage complete if you were processing them left to right. This is for illustrative purposes; the percentage would be based on this:
Your output would show percentage complete of that level:
e.g. Master Links 40% complete
or
e.g. Master Links 100%
Sub Links 49.8%
This should still give you enough info to indicate the progress, after all you cannot guess the actual density of keywords and links…
Am I just a lazy programmer for suggesting that as the background process knows where it is up to then it could report to the database once per loop cycle?
The reason I ask is that I once wrote a background process whose job was importing records from other databases. The database and table needed were noted in a table called command_queue, along with the row ID on the old system. The background task would start a new transaction, lock the row representing the command being acted upon, add its system-given ID number (which it got by registering on another table), and start working. It would then update the new table, adding the old ID to the new row data, drop the command_queue row as complete, and commit the transaction.
That way I could have multiple threads running at once, could cope with a power cut (should it ever happen) and recover from any level of catastrophe. Likewise I could monopolise the box with lots of threads if there was a lot of work to do and especially when we went home and the servers were idle.
In CodeIgniter I had a view that would output the number of commands left to run, how many were being processed, and the size of each database table mentioned in the command list. (There was also a stop command for each worker thread.)
Using some javascript goodness the view looked quite impressive with speedometers, pie-charts and so forth. Managers liked to see that sort of thing.
I didn’t have to figure out what they were doing as I had the process tell me in great detail. Although I could go look as I used (detached) screen instances for each process and could see the output. It helped that I wanted to be the first coder to write a project that even with thousands of users a second there would be a zero crisis, zero data loss, zero disruption ever.
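The idea of having the background process report to the database once per loop cycle could be sketched roughly as follows. This is not the command_queue code described above: the `task_progress` table, its columns, and the `runTask` function are assumptions made for illustration, and an in-memory SQLite database (which needs the pdo_sqlite extension) stands in for the real MySQL connection so the example runs on its own.

```php
<?php
declare(strict_types=1);

// Stand-in for the real MySQL connection (assumption for a runnable sketch).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema: one row per task with a status and two counters.
$db->exec('CREATE TABLE task_progress (
    task_id INTEGER PRIMARY KEY,
    status  TEXT    NOT NULL,
    done    INTEGER NOT NULL,
    total   INTEGER NOT NULL
)');

function runTask(PDO $db, int $taskId, array $links): void
{
    // Mark the task as started and record how much work is known so far.
    $db->prepare('INSERT INTO task_progress (task_id, status, done, total) VALUES (?, ?, 0, ?)')
       ->execute([$taskId, 'In Progress', count($links)]);

    $tick = $db->prepare('UPDATE task_progress SET done = done + 1 WHERE task_id = ?');
    foreach ($links as $link) {
        // ... crawl $link and store its results here ...
        $tick->execute([$taskId]); // report once per loop cycle
    }

    $db->prepare('UPDATE task_progress SET status = ? WHERE task_id = ?')
       ->execute(['Done', $taskId]);
}

runTask($db, 1, ['http://example.com/a', 'http://example.com/b', 'http://example.com/c']);
$row = $db->query('SELECT status, done, total FROM task_progress WHERE task_id = 1')
          ->fetch(PDO::FETCH_ASSOC);
```

A status view then only has to read the `task_progress` row. A task waiting in the queue could be inserted with a "Pending" status before it starts.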
Edit: If you don’t like using screen and you compiled PHP with --enable-pcntl, then you can use pcntl_fork to create the background processes and setproctitle to note progress. Then, when you view the process list, there is your process renaming itself to give you a clue.
There are three ways to monitor the progress on a PHP task running on the server.
Use WebSockets To Monitor In Real-Time
Likely overkill for a 55 second task.
Using Ratchet you can create a socket service that you can connect to via JavaScript, and have PHP send data in real time to the browser. The socket connection is kept alive for as long as the JavaScript socket is connected.
Timed AJAX Calls
You can poll the server at a regular interval to receive data about the progress of the current task (this is what I do all the time). The request would return a JSON response with the current progress.
$.ajax({url: "/monitor_task?task_id=3"}).done(function(data) { alert(data); });
Sharing Progress Data
There are two ways to have your task share its progress on the server side.
- Write the progress to a database table, and have your monitor_task action read that value.
- Use a shared memory object (just like a file), and have your monitor_task read that file.
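To make the JSON response concrete, here is a minimal sketch of the body a monitor_task action might return once it has read the stored progress. The function name `progressJson` and the field names are hypothetical; the values passed in are illustrative and would normally come from the database row or shared file.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: builds the JSON body for a monitor_task response.
function progressJson(int $taskId, string $status, int $done, int $total): string
{
    return json_encode([
        'task_id' => $taskId,
        'status'  => $status, // e.g. "Pending", "In Progress", "Done"
        'done'    => $done,   // units of work finished so far
        'total'   => $total,  // units of work known so far
    ]);
}

// Illustrative values only.
echo progressJson(3, 'In Progress', 41, 123), "\n";
```

The polling code on the browser side can then read `done` and `total` from the parsed response and update the page instead of showing an alert.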
Refreshing The Current Page
Create a view that shows the current progress, and place a meta refresh in it with an interval. This tells the browser to reload the page every time the interval (in seconds) elapses.
<meta http-equiv="refresh" content="5" />