I’m working on a project where we produce lots of data in the form of simple .txt files. These usually contain a set of columns that will later be loaded in to Matlab to be analysed. I’ve implemented an application that uses a database to store information about the person who uses the application and should be able to match that information to the results that are created in the .txt files. The files are long; nearing 10000 rows and with 3-5 columns.
My current application design simply aims to link a query to the database with the file using an Id which works fine provided the files are put in a specific folder and kept there on the server so that the database can always find them.
Should I have designed a way to store the rows and rows of data in the database even though it would just be 100000s of numbers linked to one or two users and session Ids? What is the best practice in this situation.
5
I think Jon Raynor makes a good point about the safety and consistency of your data being better in a database. I think a database could make application management easier in some ways (fewer data files to be concerned with, as well as easier disaster recovery or failover capability)
But, I don’t think a relational database (such as MySQL) is required for the data you are storing – I think it would add overhead which you don’t need.
Instead you might find a NoSQL database, which does not use a fixed schema more useful. In addition, it might provide additional capabilities to use the data.
A useful comparison of features is here: http://www.thegeekstuff.com/2014/01/sql-vs-nosql-db/
By storing the files on the file system there is a possibility of things becoming out of sync. For example, if some process removes the file, or if the data is not saved in the database, there could be mismatches. In this case it is hard(er) to make the transaction atomic because data is in two sources. This is not ideal.
If you store all the data (including the file) in the database the transaction will have ACID properties as it will either succeed or fail. You get this for free because RDMS systems are built this way. So, the best practice is to store the file in the database.
For the second point, if the users only need is to pull thier data, then just storing the file is acceptable. However, if thier is a need to query all the data inputted by the users, then I would take the data in the files, extract it and store it relationally in the database so it can be queried across all users. I am not exactly sure of your application’s requirements so either method would be acceptable depending on the application’s needs.