I’m working on a project and I have a question regarding the architecture:
- Say I have many Python scripts on my server, and there’s main.py, which contains all the classes. There’s also a script called copymain.py.
- A user named alex signs up and the URL of his site is stored in MariaDB. copymain.py checks every 5 minutes, via cron, whether a user has signed up. When copymain.py detects that alex has signed up, it creates a copy of main.py, renames it to alex.py, moves alex.py to scriptFolder, and writes the URL of alex’s site into alex.py (a sketch of this step follows the list).
- Cron or Celery (I haven’t decided yet) will run all the .py files inside scriptFolder every, let’s say, 15 minutes.
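For concreteness, here is a minimal sketch of what copymain.py’s provisioning step could look like. The table and column names (users, username, url, provisioned), the credentials, and the SITE_URL placeholder assumed to exist in main.py are illustrative assumptions, not part of the actual design:

```python
# copymain.py - sketch of the provisioning step, run by cron every 5 minutes.
import shutil
from pathlib import Path

import mysql.connector  # MariaDB speaks the MySQL protocol

TEMPLATE = Path("main.py")
SCRIPT_FOLDER = Path("scriptFolder")

def provision_new_users():
    conn = mysql.connector.connect(
        host="localhost", user="app", password="secret", database="app"
    )
    cur = conn.cursor()
    cur.execute("SELECT id, username, url FROM users WHERE provisioned = 0")
    for user_id, username, url in cur.fetchall():
        target = SCRIPT_FOLDER / f"{username}.py"
        shutil.copy(TEMPLATE, target)  # copy main.py -> alex.py
        # write the user's URL into the copy, assuming main.py contains
        # a SITE_URL = "" placeholder line
        text = target.read_text().replace('SITE_URL = ""', f'SITE_URL = "{url}"')
        target.write_text(text)
        cur.execute("UPDATE users SET provisioned = 1 WHERE id = %s", (user_id,))
    conn.commit()
    conn.close()

if __name__ == "__main__":
    provision_new_users()
```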
Why did I pick this design?
- It lets me get rid of many stacks, queues, and threading in my script, which makes it simpler; it is a simple solution, and it works perfectly. If I need to edit main.py, all I have to do is edit the modules it imports, so I don’t need to edit the individual copies.
- I think that by copying the files, I could easily deploy it on many servers: move some .py files to this server, others to that one, and I’m done. These are the two main reasons.
- Say you have to generate RSS for websites. For some reason there’s a user called fred who has a problem, an issue that needs a special script, because we all know that every website has its own design and many errors occur when scraping and dealing with HTML. You can go to fred.py and edit your script for that user.
Is it the most efficient architecture? Or should I use one file and get the users from the database? My script currently does that, but I prefer copying for the reasons stated above; a simple scheduling setup would do. I need to make sure that no matter how many users I have, every user’s website will be processed at exactly the time I promised when they signed up. When there are too many of them and I notice it’s getting slow, I’ll just upgrade or buy another server. I’m afraid of creating too many threads and having to worry about them as the system scales. Copying seems like the simplest solution; Linux uses cron all the time for entire directories.
Edit
Please forget about the scraping part. Say you have to do some task for each user in your database, like sending him a message, or whatever. Is my solution as good as queuing and threading, in terms of time spent on the process and in terms of being lightweight on the server? Is there a better solution that I have not considered?
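For comparison, here is a minimal sketch of what the queue-based alternative might look like with Celery. The Redis broker, the schedule, and the fetch_users/send-message helpers are placeholder assumptions:

```python
# tasks.py - sketch of the Celery alternative: one small queued task per user.
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

# celery beat triggers the fan-out every 15 minutes
app.conf.beat_schedule = {
    "dispatch-every-15-min": {"task": "tasks.dispatch_all", "schedule": 900.0},
}

def fetch_users():
    # placeholder: would query MariaDB for (user_id, url) pairs
    return [(1, "https://example.com")]

@app.task
def process_user(user_id, url):
    # placeholder for the real per-user work (send a message, build a feed, ...)
    print(f"processing user {user_id}: {url}")

@app.task
def dispatch_all():
    # workers pull these tasks in parallel; no per-user files needed
    for user_id, url in fetch_users():
        process_user.delay(user_id, url)
```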
9
It’s not uncommon to do what you described. In fact, when you create a new database for your user, you are basically creating a new file for that user. So this just adds one more file to the set of per-user files.
The choice of when to do the work begs explanation, though. Doing the sign-up process in batch periodically can create load spikes on the servers. It’s better and simpler to copy all the files for new users straight away. If the process is long enough (e.g., creating a VM for the user), you show them a “setup in progress” page.
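A minimal sketch of the idea, with placeholder helpers standing in for the database insert and the copy step described in the question:

```python
# sketch: provision at signup time instead of in a 5-minute cron batch
def save_user_to_db(username, url):
    pass  # placeholder for the MariaDB INSERT

def provision_user(username, url):
    pass  # placeholder: copy main.py -> <username>.py and write the URL in

def handle_signup(username, url):
    save_user_to_db(username, url)
    provision_user(username, url)  # do the work immediately, no batch spike
    return "setup in progress"     # or the finished page if provisioning is fast
```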
2
Generally, this is not a good setup, because it’s not DRY (“Don’t Repeat Yourself”). If it were me, I’d work very hard to figure out how to have one well-tested script which can generate the feeds needed by the individual users. Perhaps the script would get per-user information from the database.
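A rough sketch of that single-script approach, assuming a users table with username and url columns; a fred-style special case becomes a small override hook rather than a separate copy of the whole script:

```python
# generate_feeds.py - one script for all users, driven by the database.
import mysql.connector

def default_feed(url):
    pass  # the shared, well-tested feed-generation logic

def fred_feed(url):
    pass  # special handling for fred's site layout

# per-user quirks live here instead of in per-user copies of the script
OVERRIDES = {"fred": fred_feed}

def main():
    conn = mysql.connector.connect(
        host="localhost", user="app", password="secret", database="app"
    )
    cur = conn.cursor()
    cur.execute("SELECT username, url FROM users")
    for username, url in cur.fetchall():
        handler = OVERRIDES.get(username, default_feed)
        handler(url)
    conn.close()

if __name__ == "__main__":
    main()
```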
2
Disclaimer: I’m not entirely sure I understand what you’re actually trying to accomplish here, so my answer is going to be somewhat vague.
Cron or Celery (haven’t decided yet) will run all the .py files inside scriptFolder every, let’s say, 15 minutes
Let’s be optimistic and suppose you have millions of users some day. Will Cron/Celery be able to get through all those scripts in a reasonable period of time? How long does each script take to run?
If your scripts are sufficiently complex, you will probably have difficulty scaling to millions of users. If they are not sufficiently complex, you can probably get by with a simpler design, perhaps using multiprocessing.
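For example, a sketch with a fixed worker pool; the user list and the per-user work are placeholders:

```python
# sketch: one fixed pool of worker processes instead of one script per user
from multiprocessing import Pool

def fetch_users():
    return [("alex", "https://example.com")]  # placeholder DB query

def process_user(user):
    username, url = user
    print(f"processing {username}: {url}")  # placeholder per-user work

if __name__ == "__main__":
    with Pool(processes=8) as pool:
        pool.map(process_user, fetch_users())
```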
5