I’m creating a crawler app whose class constructor always starts by ensuring that the basic database structure exists.
Is this a bad practice? What is the advantage of creating the structure directly on the database instead?
import psycopg2
from datetime import timedelta


class DBManager:
    def __init__(self):
        self.connected = False
        self.conn = None
        self.cursor = None
        try:
            # Connection parameters are not shown here.
            self.conn = psycopg2.connect()
            self.cursor = self.conn.cursor()
        except psycopg2.Error:
            print("I am unable to connect to the database")
            return  # without a connection there is nothing to set up
        try:
            # psycopg2 opens a transaction on the first execute();
            # commit() ends it and rollback() undoes it on failure.
            self.cursor.execute("""
                CREATE TABLE IF NOT EXISTS proxy(
                    id SERIAL NOT NULL PRIMARY KEY,
                    ip VARCHAR(15) NOT NULL,
                    active BOOLEAN NOT NULL,
                    times_used SMALLINT DEFAULT 0,
                    last_use TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
                );
            """)
            self.cursor.execute('CREATE TABLE IF NOT EXISTS app_update (time TIMESTAMP NOT NULL);')
            # Insert the proxy if it is unknown, otherwise mark it active again.
            self.cursor.execute("""
                CREATE OR REPLACE FUNCTION merge_proxy(_proxy VARCHAR(15)) RETURNS void AS $$
                BEGIN
                    IF EXISTS( SELECT * FROM proxy WHERE ip = _proxy ) THEN
                        UPDATE proxy SET active = TRUE WHERE ip = _proxy;
                    ELSE
                        INSERT INTO proxy(ip,active,times_used,last_use) VALUES (_proxy,TRUE,0,CURRENT_TIMESTAMP);
                    END IF;
                END;
                $$ LANGUAGE plpgsql;
            """)
            self.conn.commit()
            self.connected = True
            print('finished creating tables')
        except psycopg2.Error:
            self.conn.rollback()
            print('unable to create table')
It depends – mainly on the life cycle of your database(s) and the usage scenario. Creating the DB structures automatically makes sense if
- the database is used exclusively (or at least primarily) by your application
- you expect to have not just one database, but many different db instances of this structure in different places
- the database is used as some kind of temporary storage, either archived or thrown away after use
- there are no security requirements forbidding your application to change db structures
For example, when you are building a big enterprise OLTP database, with lots of different applications using it, creating the DB structure inside one of the applications is typically a bad idea. It won’t actually make sense, since the DB is part of the environment you can expect to be available for the application. In such an environment, normal applications will typically not even get the access rights to change any DB structures.
On the other hand, when you are building a database application for logging or storing a series of measurements over a fixed time period (let’s say, for a day), things are different. Assuming you need a new db instance for each of these periods, creating this instance automatically when it is not there makes a lot of sense. In such scenarios it is common not to use a full-fledged client/server database system, but a lightweight single-file db system.
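A minimal sketch of that second scenario, assuming SQLite as the single-file system and one file per day. The `measurement` table, the file naming scheme and the `open_daily_db` helper are made up for illustration and are not part of the question:

```python
import sqlite3
from datetime import date
from pathlib import Path

# Hypothetical table for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS measurement (
    taken_at TEXT NOT NULL,
    value    REAL NOT NULL
);
"""


def open_daily_db(directory, day):
    """Open the single-file database for one day, creating it if missing."""
    path = Path(directory) / f"measurements-{day.isoformat()}.sqlite"
    conn = sqlite3.connect(str(path))  # creates the file when it does not exist
    conn.executescript(SCHEMA)         # no effect when the table already exists
    return conn
```

Here the application owns the file completely, so letting it create the structure on first use costs nothing and removes a manual setup step.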
I don’t know into which of these categories your “crawler app” falls, but check your scenario according to the points I wrote above and make a decision.
It’s generally considered a good idea to separate the environment setup from the main application. The usual practice for something like this is to have the database structures created by your installer or setup script. The application should not create its own structures, if it can help it.
One reason to separate them might be if you have slightly different versions of the schema for different database engines. If you have the database installer separate from the main application, the user can choose which database script is most appropriate for them, and then run your application, rather than letting your application try to guess what the user wants.
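As a rough sketch of that idea, a separate setup script could keep one DDL variant per engine and let the user name the engine explicitly. The `SCHEMA_SCRIPTS` mapping and `schema_for` helper are hypothetical, and the two variants below differ only in their auto-increment syntax:

```python
# Hypothetical per-engine DDL, kept outside the main application.
SCHEMA_SCRIPTS = {
    "postgresql": (
        "CREATE TABLE IF NOT EXISTS proxy ("
        "id SERIAL PRIMARY KEY, ip VARCHAR(15) NOT NULL);"
    ),
    "sqlite": (
        "CREATE TABLE IF NOT EXISTS proxy ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, ip VARCHAR(15) NOT NULL);"
    ),
}


def schema_for(engine):
    """Return the DDL for the engine the user chose; never guess a default."""
    try:
        return SCHEMA_SCRIPTS[engine]
    except KeyError:
        choices = ", ".join(sorted(SCHEMA_SCRIPTS))
        raise ValueError(
            f"no schema script for {engine!r}; choose one of: {choices}"
        ) from None
```

A setup entry point would take the engine name from the command line, call `schema_for`, and execute the result, while the crawler itself never touches DDL.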
Also, how will your application handle failures in the database setup? Sometimes it happens that the standard setup that you want people to run might not work 100% in everyone else’s database, so the DBA might want to tweak it for their own environment. Embedding the setup directly in the application prevents people from doing this, but might also prevent people from running your application, if they are unable to make necessary changes to the setup for their particular database.
…If you really want, you could have your application check for the database and, if it can’t connect, ask the user whether they want the setup to run. The main application would then run your setup script before carrying on.
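One way this could look, sketched with SQLite so it runs without a server. It checks for missing tables rather than a failed connection, which is an assumption on my part. The table names come from the question; `missing_tables`, `ensure_schema`, and the `run_setup` and `confirm` callables are hypothetical:

```python
import sqlite3

REQUIRED_TABLES = {"proxy", "app_update"}  # table names from the question


def missing_tables(conn):
    """Return the required tables that do not exist yet (SQLite catalog)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return REQUIRED_TABLES - {name for (name,) in rows}


def ensure_schema(conn, run_setup, confirm):
    """Return True once the schema is present.

    run_setup(conn) is the separate setup routine; confirm(prompt) asks the
    user and returns a bool. Setup runs only with the user's consent.
    """
    if not missing_tables(conn):
        return True
    if not confirm("Database structure is missing. Run setup now? [y/N] "):
        return False
    run_setup(conn)
    return not missing_tables(conn)
```

In an interactive program `confirm` could be something like `lambda prompt: input(prompt).strip().lower() == "y"`. Passing it in keeps the check usable from unattended runs, where the answer should simply be "no".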