When should a database be preferred over parsing data from a text file?

I am writing a Python program to measure the growth of codereview.SE. My approach is to get the “Site stats” shown on the front page and store them on my hard drive. I plan to do this once every day. So far I have written enough to fetch the stats and append them to a text file. The Python script can be viewed on GitHub. The format I am using is the following:

22-08-2013

questions 9073
answers 15326
answered 88
users 26102
visitors/day 7407

22-08-2013

questions 9073
answers 15326
answered 88
users 26102
visitors/day 7407

I just ran the script twice to see the format I would be using in the file. Initially this seemed good to me: I would be storing the data myself and the format would always be the same, so it would be easy to parse. Now I am not sure. It seems that a database would be better here because retrieving data would be easier. Just a note: I have never used any database and have no knowledge of SQL, MySQL or any other RDBMS.

So this brings me to the question. When should a database be preferred for storing the data over storing the data in a text file? Are there some pointers that I can look for when making decisions about whether I need a database or simple text files?

PS: If better tags can be added, please do so. I wasn't sure which tags to use.

When should a database be preferred for storing the data over storing the data in a text file?

Wikipedia tells us that a database is an organized collection of data. By that measure, your text file is a database. It goes on to say:

The data are typically organized to model relevant aspects of reality in a way that supports processes requiring this information. For example, modeling the availability of rooms in hotels in a way that supports finding a hotel with vacancies.

That part is subjective — it doesn’t tell us specifically how the data should be modeled or what operations need to be optimized. Your text file consists of a number of distinct records, one for each day, so you’re modeling an aspect of reality in a way that’s relevant to your problem.

I realize that when you say “database” you’re probably thinking of some sort of relational database management system, but thinking of your text file as a database changes your question from “when should I use a database?” to “what kind of database should I use?” Seeing things in that light makes the answer easier to see: use a better database when the one you’ve got no longer meets your requirements.

If your Python script and simple text file work well enough, there’s no need to change. With only one new record per day and computers getting faster each year, I suspect that your current solution could be viable for a long time. A decade’s worth of data would give you only 3650 records that, once parsed, would probably require less than 75 kilobytes.
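To make that concrete, here is a minimal sketch of a parser for the format shown in the question. It assumes every record starts with a DD-MM-YYYY date line followed by "name value" lines, with blank lines only as separators; `parse_stats` is a hypothetical helper, not the code from the question's GitHub script.

```python
from datetime import datetime


def parse_stats(text):
    """Parse the day-by-day stats format into a list of
    (date, {stat_name: value}) records.

    Assumes each record starts with a DD-MM-YYYY line followed by
    "name value" lines; blank lines are ignored.
    """
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate sections
        parts = line.split()
        if len(parts) == 1:
            # A single token is a date header that starts a new record.
            day = datetime.strptime(parts[0], "%d-%m-%Y").date()
            current = (day, {})
            records.append(current)
        else:
            if current is None:
                raise ValueError("stat line before any date: " + line)
            name, value = parts
            current[1][name] = int(value)
    return records


SAMPLE = """22-08-2013

questions 9073
answers 15326
answered 88
users 26102
visitors/day 7407
"""

records = parse_stats(SAMPLE)
print(records[0][0], records[0][1]["questions"])  # 2013-08-22 9073
```

Once parsed, ten years of records fit comfortably in memory, so questions like "how many questions were added last month?" become plain loops over `records`.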

Imagine that instead of one small record per day, you decided to record every question asked on CodeReview, who asked it, and when. Furthermore, you also collect all the answers and the relevant metadata. You could store all that in a text file, but a flat file would make it difficult to find information when you needed it. There’d be too much data to read the whole thing into memory, so whenever you wanted to find a question or answer, you’d have to scan through the file until you found what you were looking for. When you wanted to find all the questions asked by a given user, you’d have to scan through the entire file. If you wanted to find all the questions that have “bugs” as a tag, you’d have to scan through the file.
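A full scan of that kind might look like the following sketch. It assumes a hypothetical JSON-lines file (one question per line with `id`, `user` and `tags` fields), simulated here with `io.StringIO`. Every lookup has to read and decode every line, so its cost grows with the size of the file.

```python
import io
import json


def questions_by_user(f, user):
    """Full scan: read every line of a (hypothetical) JSON-lines file
    and keep the records whose "user" field matches."""
    matches = []
    for line in f:
        record = json.loads(line)
        if record["user"] == user:
            matches.append(record)
    return matches


# Hypothetical flat file with one question per line.
DATA = "\n".join(json.dumps(q) for q in [
    {"id": 1, "user": "alice", "tags": ["python", "bugs"]},
    {"id": 2, "user": "bob", "tags": ["java"]},
    {"id": 3, "user": "alice", "tags": ["sql"]},
]) + "\n"

found = questions_by_user(io.StringIO(DATA), "alice")
print([q["id"] for q in found])  # [1, 3]
```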

That’d be horribly slow, so you might decide to speed things up by building some indexes that tell you where to look in the file to find a given record. You could have an index for questions, another for users, a third for answers, and so on. When you wanted to find a question you’d search the (much smaller) question index, get the position of the question in the main data file, and jump quickly to the right spot in the file. That’d be a big performance improvement. Indeed, that’s pretty much what a database management system is.
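The index idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the data file is a hypothetical JSON-lines file, and the index is an in-memory dict mapping record id to byte offset. A real DBMS typically persists its indexes in structures such as B-trees rather than rebuilding a dict each time.

```python
import json
import os
import tempfile


def build_index(path):
    """Scan the data file once, mapping each record's id to the byte
    offset at which its line starts."""
    index = {}
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            index[json.loads(line)["id"]] = offset
    return index


def fetch(path, index, record_id):
    """Seek straight to one record using the index instead of scanning."""
    with open(path, "rb") as f:
        f.seek(index[record_id])
        return json.loads(f.readline())


# Hypothetical data file: one JSON object per line.
questions = [
    {"id": 101, "user": "alice", "title": "Parsing stats"},
    {"id": 102, "user": "bob", "title": "Faster loops"},
]
fd, path = tempfile.mkstemp(suffix=".jsonl")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    for q in questions:
        f.write(json.dumps(q) + "\n")

index = build_index(path)
print(fetch(path, index, 102)["title"])  # Faster loops
os.remove(path)
```

Building the index still costs one full scan, but every later lookup is a single `seek` plus one line read.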

So, use a DBMS when it’s what you need. Use it when you have a lot of data, and when you need to access that data quickly and perhaps in ways that you can’t entirely predict at the outset. If you have different kinds of data (different types of records) that are connected to each other, use an RDBMS so that you can relate the various records appropriately.
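For related records, a relational database lets you express the connection directly and query across it. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `users`/`questions` schema and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE questions (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        title   TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
conn.executemany(
    "INSERT INTO questions (id, user_id, title) VALUES (?, ?, ?)",
    [(10, 1, "Parsing stats"),
     (11, 2, "Faster loops"),
     (12, 1, "Text file or database?")],
)
conn.commit()

# Relate the two record types: all questions asked by one user.
rows = conn.execute("""
    SELECT q.title
    FROM questions AS q
    JOIN users AS u ON u.id = q.user_id
    WHERE u.name = ?
    ORDER BY q.id
""", ("alice",)).fetchall()
print(rows)  # [('Parsing stats',), ('Text file or database?',)]
```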

Databases have many advantages, but making access easier isn’t one of them. Faster, more standardized, interpretable as an embedded command sublanguage, safer: yes. Easier: no. No matter how much syntactic sugar your language and standard library provide, you have to have a database in the first place, open a connection to it, and route data from your program to something completely different and back. As long as what you’re doing works and ease of programming is your priority, never switch to a database just because you think it’s “good practice”.

My take on when to make the switch is to follow the historical development. After all, people stored data in files for a long time before the relational DB was invented, and in fact a whole bunch of inferior database models (hierarchical DB, network DB…) were invented before that. People built and adopted databases when it became clear that doing so would save major processing effort, increase reliability, etc., overall and in the long run. As long as that’s not the case for you, and you don’t foresee it becoming the case any time soon, switching would be over-engineering.

This will of course be a judgement call, but the three main criteria I would consider are: whether it needs to be ACID compliant, how complex the data is, and how many things need to read or write it. As long as you are simply reading and writing one line per day and your app is the only one doing either, you can probably skip the database. Once multiple apps are reading or writing, or your data structure becomes complex (particularly if there are relationships between separate lines), a DB starts looking really attractive.
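Atomicity, the “A” in ACID, is one thing that is awkward to get right with a plain text file: a crash halfway through appending a record can leave a partial entry behind. A minimal sketch with Python's built-in `sqlite3` module, where `save_day` and its `fail_after` parameter are hypothetical names used only to simulate a failure partway through a write:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (day TEXT, name TEXT, value INTEGER)")


def save_day(conn, day, stats, fail_after=None):
    """Insert all of one day's stats in a single transaction.

    `fail_after` is a hypothetical knob that raises after that many
    inserts, to simulate a crash in the middle of the write.
    """
    with conn:  # commits if the block succeeds, rolls back if it raises
        for i, (name, value) in enumerate(stats.items()):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("simulated crash")
            conn.execute("INSERT INTO stats VALUES (?, ?, ?)",
                         (day, name, value))


stats = {"questions": 9073, "answers": 15326, "users": 26102}
try:
    save_day(conn, "2013-08-22", stats, fail_after=2)
except RuntimeError:
    pass
count_after_failure = conn.execute("SELECT COUNT(*) FROM stats").fetchone()[0]

save_day(conn, "2013-08-22", stats)
count_after_success = conn.execute("SELECT COUNT(*) FROM stats").fetchone()[0]
print(count_after_failure, count_after_success)  # 0 3
```

Because the connection's context manager rolls back when the block raises, the two rows inserted before the simulated crash are discarded, so a day's stats are stored either completely or not at all.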

Databases are used not just for storing data but also for manipulating and querying it, so you have to make an educated decision:

A big factor is weighing the cost of installing a database on the machine against the functionality it brings.

Obviously, if you need to query and manipulate the data, you want access to be fast, and you might also use a database for other functions, then it might be a good idea. Database storage models allow data to be looked up by key very quickly, whereas parsing a file could be slow (depending on how you are doing it).
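As an illustration of key-based lookup, the sketch below stores one row per day in SQLite with the date as the primary key; SQLite maintains an index for such a key, which it can use for equality and range lookups instead of scanning every row. The table layout is an assumption, and only the 2013-08-22 row comes from the question; the other two rows are made-up values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE daily_stats (
        day       TEXT PRIMARY KEY,  -- ISO date, so text order = date order
        questions INTEGER NOT NULL,
        users     INTEGER NOT NULL
    )
""")
# First row from the question; the other two are made-up values.
conn.executemany(
    "INSERT INTO daily_stats (day, questions, users) VALUES (?, ?, ?)",
    [("2013-08-22", 9073, 26102),
     ("2013-08-23", 9100, 26150),
     ("2013-08-24", 9130, 26200)],
)
conn.commit()

# Point lookup by key.
one_day = conn.execute(
    "SELECT questions, users FROM daily_stats WHERE day = ?",
    ("2013-08-22",),
).fetchone()

# Range lookup by key: question growth between two dates.
rows = conn.execute(
    "SELECT questions FROM daily_stats WHERE day BETWEEN ? AND ? ORDER BY day",
    ("2013-08-22", "2013-08-24"),
).fetchall()
growth = rows[-1][0] - rows[0][0]
print(one_day, growth)  # (9073, 26102) 57
```

Storing dates as ISO `YYYY-MM-DD` text is a deliberate choice here: it makes plain string comparison agree with chronological order, so `BETWEEN` and `ORDER BY` work on the date column directly.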

If you want to play with SQL and see what it can do, SQLFiddle.com offers a few different RDBMSs that you can toy around with (run queries, create schemas, etc.).

As always, whether to use a database depends on what you need to do.
If you have a huge amount of data and need to perform many different queries on it, a database could probably help you.

In your case I would keep storing the data in a text file as long as the performance is acceptable. Reading a text file, even a big one, usually doesn’t take that long.
If you need more, you can always add a database later.

In my experience, if you are completely new to databases you may find it easier to use something like CouchDB: http://couchdb.apache.org/, which is NoSQL and lets you use JavaScript, Python, etc. directly for queries.
