It has been my experience, when building websites, that most of a system’s logic executes when user input is accepted, be it via POSTs, GETs, etc. I would like to know what processes or methodologies exist in Python (which I’m leaning towards), PHP, and Ruby that allow web applications to perform tasks automatically without user input: for instance, performing tasks at a certain time, or when a certain condition or event occurs. I have no experience with, and little understanding of, triggers, events, or cron, and all of the articles I found on Google assumed a high familiarity with those concepts. I simply want a description of the ways one could go about handling non-reactive processes in a web application.
You have pretty much answered the question yourself: cron jobs are the standard way of doing this.
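For a concrete picture of the cron approach: cron runs an ordinary script on a schedule, completely independently of web requests. The minimal Python sketch below assumes a hypothetical SQLite database (path given on the command line) with a hypothetical `sessions` table whose `expires_at` column holds Unix timestamps; the crontab line in the docstring (every 15 minutes) and the paths in it are only placeholders.

```python
"""Hypothetical cleanup job meant to be run by cron.

Example crontab entry (install with `crontab -e`; paths are placeholders):

    */15 * * * * /usr/bin/python3 /path/to/cleanup.py /path/to/app.db
"""
import sqlite3
import sys
import time


def delete_expired_sessions(conn: sqlite3.Connection, now: float) -> int:
    """Delete sessions that expired before `now`; return how many were removed."""
    cur = conn.execute("DELETE FROM sessions WHERE expires_at < ?", (now,))
    conn.commit()
    return cur.rowcount


def main(db_path: str) -> None:
    # The database is assumed to be shared with the web application.
    conn = sqlite3.connect(db_path)
    try:
        removed = delete_expired_sessions(conn, time.time())
        print(f"removed {removed} expired sessions")
    finally:
        conn.close()


# Only do real work when cron passes the database path as the single argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Each run starts, does its work, and exits, just like a PHP request. By default cron mails any output to the crontab’s owner, so redirecting it to a log file in the crontab line is common.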
In PHP, you don’t have much choice: the language is designed around an execution model in which every request starts with a clean slate, and it isn’t well suited to long-running processes. You can write them, but you have to be very careful not to leak memory.
In Python, it depends on how you hook your code into a web server. If your code runs as a long-running process with the web server integrated into it, you can trigger automatic processing directly from there, so you don’t need cron in those cases. If, however, you follow a fresh-process-for-each-request paradigm like PHP does (mod_python does this, IIRC), then you’re pretty much limited to cron jobs or similar schedulers.
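In a long-running Python process, one minimal way to trigger work without a request is a background thread that wakes up periodically. The sketch below uses only the standard library; `run_periodically` is a hypothetical name, and you would call it once at server startup. With a server that runs several worker processes, each worker would start its own copy of the thread, which may not be what you want.

```python
import threading
from typing import Callable


def run_periodically(interval: float, task: Callable[[], None],
                     stop: threading.Event) -> threading.Thread:
    """Start a daemon thread that calls `task` every `interval` seconds until `stop` is set."""
    def loop() -> None:
        # Event.wait returns True as soon as `stop` is set and False on timeout,
        # so shutdown does not have to wait out a full interval.
        while not stop.wait(interval):
            try:
                task()
            except Exception as exc:  # keep the scheduler alive if one run fails
                print(f"periodic task failed: {exc!r}")

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

For example, `run_periodically(3600, send_reminder_emails, threading.Event())` would run a hypothetical `send_reminder_emails` function roughly once an hour for as long as the server process lives.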
I don’t know enough about Ruby to make any qualified comments, but I assume that the situation is similar to that in Python.
And of course, you can always implement a long-running process separate from your regular website that uses the same persistence backend and shares some code; such a process could then implement its own scheduling and even act upon other events. For example, you could set up a process that wakes up when a particular file changes, does some processing, then goes back to sleep. This approach has two advantages:
- Since there is only ever one such process, you don’t have to worry about race conditions such as those caused by a cron job taking so long that individual iterations start overlapping – your process simply does one iteration at a time.
- You can act on events immediately without adding a lot of scheduling overhead. With cron jobs, you have to make a choice between frequent polling (shorter latency, more overhead) and less frequent polling (less overhead, longer latencies), but a process that just waits for a file to change has almost zero overhead while inactive.
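A minimal standard-library sketch of such a separate worker follows, assuming the web application signals new work by touching a hypothetical trigger file. Note that this version checks the file’s modification time once per poll interval, so it is still light polling; to truly sleep until the file changes you would use OS change notifications such as Linux’s inotify. Because the loop is single-threaded, each batch of processing finishes before the next check, so runs never overlap.

```python
import os
import time
from typing import Callable, Optional


def mtime_or_none(path: str) -> Optional[float]:
    """Return the file's last-modification time, or None if it doesn't exist."""
    try:
        return os.stat(path).st_mtime
    except FileNotFoundError:
        return None


def check_once(path: str, last_seen: Optional[float],
               process: Callable[[], None]) -> Optional[float]:
    """Run `process` if `path` changed since `last_seen`; return the mtime to remember."""
    current = mtime_or_none(path)
    if current is not None and current != last_seen:
        process()  # runs to completion before the next check
        return current
    return last_seen


def watch_file(path: str, process: Callable[[], None],
               poll_seconds: float = 1.0) -> None:
    """Hypothetical worker main loop: handle one change at a time, forever."""
    last_seen = mtime_or_none(path)  # ignore the file's state at startup
    while True:
        last_seen = check_once(path, last_seen, process)
        time.sleep(poll_seconds)
```

A worker script would then call something like `watch_file("/path/to/trigger", process_pending_jobs)`, where both the path and `process_pending_jobs` are placeholders, and the web application would update the trigger file’s timestamp whenever new work arrives.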
It’s harder to set up, though, and you’ll probably want some sort of watchdog mechanism that restarts the process if it dies.
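One simple form of watchdog, sketched here under the assumption that the worker runs as its own command, is a small supervisor loop that restarts it whenever it exits. `supervise` and `worker_command` are hypothetical names; in production an init system or process manager (for example a systemd service with `Restart=always`) usually plays this role instead.

```python
import subprocess
import sys
import time
from typing import List, Optional, Sequence


def worker_command(script: str) -> List[str]:
    """Command that runs `script` with the same Python interpreter as the supervisor."""
    return [sys.executable, script]


def supervise(cmd: Sequence[str], restart_delay: float = 5.0,
              max_restarts: Optional[int] = None) -> int:
    """Run `cmd`, restarting it each time it exits; return the number of restarts.

    `max_restarts` bounds the loop (None = restart forever). The delay keeps a
    worker that crashes immediately from being relaunched in a tight loop.
    """
    restarts = 0
    while True:
        proc = subprocess.Popen(cmd)
        code = proc.wait()  # block until the worker exits or dies
        print(f"worker exited with code {code}")
        if max_restarts is not None and restarts >= max_restarts:
            return restarts
        restarts += 1
        time.sleep(restart_delay)
```

Running `supervise(worker_command("worker.py"))` would keep a hypothetical `worker.py` alive indefinitely. The supervisor itself is so simple that it is unlikely to crash, which is the point of separating it from the worker.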