I was asked an interesting question: Should a database contains all data? Or heavy binary files should be stored in file system?
Example of heavy binary files : videos or heavy pdf files (+200 MB)
With an old aspx web app (1.1) I tried to open a 200MB pdf file stored as a blob in an Oracle 11g database, and it just run out of memory.
However, same asp.net web application had no problem to open same pdf file stored in file system of a server. It could be that maybe there is some proper way to open heavy blobs fields with asp.net.
For integrity reasons, I say that all data should be stored in database, but my described case showed me that maybe it’s not the way.
I read once that if your web application goes to cloud, then it would be very difficult to keep references of files stored in file system (file paths like: ../MyFolder001/MyFile.mpg), since you don’t know where those files will be distributed.
Should heavy binary files not be stored in database?
7
No, large binary files should generally not be stored in the database.
- Generally, your filesystem will provide better caching/buffering than your database will for large files
- Databases are generally designed and optimized around smaller bits of data – large binary files are often not optimized for in databases
- Database storage is generally more expensive (both in terms of CPU time and disk space) than filesystem storage
- You often lose opportunities to optimize sending files to a web client when storing them in a database (e.g. sendfile())
- You’re putting an unncessary load on the database by having to store/fetch large binary objects from the db
2
The most important reason not to use large blobs is that you cannot stream them – when you query a database, it’s the whole column value or nothing. With files, you can open a handle, then read the data as you go – you never need to have the entire file in memory at the same time.
At the same time, many of the advantages of databases, such as more efficient storage of lots of tiny bits of data and more efficient lookup of such values, ACID guarantees, etc., become less important when large files are concerned.
1
It’s a question of advantages and disadvantages. Binary files in the database will be stored with all the other data, but you have to pay the overhead including it in the database.
In your case, it sounds like the disadvantages include limited support in the tools you’re using. Does the value of having the binary data in the database justify the effort needed to work around those limitations?
SQL Server 2012 now has the FileTables. That combine the strategy of appearing to have the data in the database but utilizes the file structure behind the scene.
Your other concern:
I read once that if your web application goes to cloud, then it would
be very difficult to keep references of files stored in file system
(file paths like: ../MyFolder001/MyFile.mpg), since you don’t know
where those files will be distributed.
For an application that handles a large number of large files, you better find a cloud solution that will allow you to store them. Changing the file paths (which are probably stored in a database) can be handled by:
- storing the root folder in a single location in your app and hopefully maintaining the rest of the folder and file naming structure.
- updating any other sub folder structurs in the database with a script. This is the piece that databases handle well.
In every company I’ve worked at the general consensus has always been to store file locations within the database and the files elsewhere.
Firstly, you do not know what size of files people are uploading or how many times. There are ways to measure the size of a file before it goes in the database and there will be programmatic ways to stop people uploading multiple versions but the point I’m trying to make is that you don’t want your database’s performance being hindered by someone uploading a 2GB file, or 20 people trying to download that 2GB file at the same time. The chances are your database is also having to deal with lots of other smaller queries at the same time. Help it to help you!
This leads onto my second point. Separation of concerns. A file server can look after the files, a database can store the locations of those files. Problems with storage of files, detecting and preventing viruses in files can be handled separately.
I’ve worked mainly with websites and applications so the reasons I’m giving are influenced by that.