I’m using ASP.NET MVC 4 and SQL Server 2012. I have no problem going for either of these solutions but I want to know which one would fit better in the particular application I’m creating now.
I’m making a sort of a library system application. The only images the application will store are:
- Users’ profile pictures (1 per user)
- Libraries’ profile pictures (1 per library)
- Book authors’ pictures (1 per author)
- Book cover images (1 per book)
None of the images stored will need to be of high quality. So obviously the images are not a big part of the application and considering this, I am currently storing the images in my database. Though I noticed something that I think makes for a major con.
The images are retrieved using a Url.ActionLink
which leads to a Controller action method that accepts (for example) a bookId parameter before reaching to the database, retrieving that book’s image and sending it back. So if a user were to search for books and he ends up getting a list of books along with their images before him, there will be a new call to the action method that retrieves images and a new query to the database for each image listed.
Is there a better way of handling this? Should I reconsider keeping my images in the database or is it not a big issue?
EDIT: Wouldn’t consider this question a duplicate because it pertains to a specific type of an application that handles very little image data as as opposed to any type of an application (like in the linked question). Of course when you’re not sure how much image data you’re gonna be storing it’s always best to store images on the file system. This is not such a case. The nature of the answers here versus the answers there shows this.
3
Why not do both? Database is the backing, ultimate store for the image. The public side can be a simple read from db but you can easily extend that into a read-through disk cache and also take advantage of numerous infrastructure tricks to better enhance performance.
The wins here are:
- simpler data backup — database backups are fun and easy, no file system touching necessary
- simpler dev story — you don’t need to handle a bunch of files, just get the dev team a copy of the database
- easier to write the image — file systems are a PITA, from permissions, to changing locations from dev->qa->production, to contention to lack of transactions. If the writes are going to the db then you solve 90% of those problems.
- modern databases can handle it — presuming we are talking fairly typical image file sizes measuring under a few megabytes most modern database systems can handle them very well. It is not the database murdering issue that some older articles would have you believe.
2
There are valid concerns on either side of the debate, so always consider your requirements and constraints. How much data, how many images, how large, how secure, time to recovery?
Inline / BLOB storage
Upside
- Version control / transactional consistency. This is the key semantic difference. This is handled well by the DB, so allows for point-in-time recovery. Filesystems don’t provide this so you aren’t going to have the ability to rollback. (I’m not referring to the filesystem standalone, I am referring to the problem of transactional control from your code). Images and documents can get corrupted by user error, or out of synch, becoming a 2-phase commit problem. Some systems solve this by adding an external revision control system, but it requires integration.
- Simplifies implementation
- All app data self-contained – Simplifies backup and recovery or migration of the system as long as the blob data footprint isn’t unwieldy; just do a dump, backup, export (whatever the term for your flavor of DB) and move it to the new database. No external files to synchronize.
- Security / access control is cleaner; access to an image BLOB is intrinsic to access to the data row. If a document is outside the DB and we let the HTTP server fetch it up (outside of MVC app), it may be better for concurrency and scalability, but takes more care to ensure Authorization is applied to the external files as they are the DB data. I’ve audited designs where large files were stored out of line, under static, multiplexed URLs, and my first test scenario is to attempt direct access to the static URL without authentication. If you do house secure documents outside the DB, don’t serve them statically, and make sure your security policy covers access control of images between tenants. Your HTTP server authentication has to integrate with the overall system’s authentication. This is a big concern in multi-tenant databases. Less of a concern in single purpose, single-tenant systems, with simple authentication.
- Reduced Database Server Surface Area – In many systems, the DB is separate from the apps server, and there is a firewall in between. Everything can be served through a single DB port (1433 / MSSQL) or (1521 / Oracle) as an example. With files, if you move your DB to a separate tier, you have to use NFS or NAS or either replicate the files among app server if you scale out.
Downside
- Backup times – For really REALLY large databases (not applicable to you), the backup and recovery gets frustrating and starts to become problematic and costly; while you may have a small core dataset otherwise, you may have many GB or TB of image data. Treating it all as one consistent database is both good from integrity point of view, but bad for backup and recovery unless you use DBMSes with enterprise grade backup and recovery (example is Oracle RMAN and rolling backups, EMC BCVs or Netapp Snapshots). At one customer, our DB got so large, we just stopped backing it up and committed to geographically redundant replicas. Another well known case was Starbucks (the DBAs put the paper online, it is interesting read if you are into that).
- Time to Recovery (aka Recovery Time Objective or Recovery SLA) – This is often defined by your client. As a DBA you have to back into that if the customer has already declared so. If adding 90% to the size of the database to store non-secure images impacts the recovery SLA of the critical data, then don’t do it.
- Less flexibility. Not simple to access the documents with OS level utilities, shell commands, etc.
Personally, in this context, I think the scales are stacked in favor of inline DB storage. If your DB isn’t large, keep it all in the DB.
2
So if a user were to search for books and he ends up getting a list of books along with their images before him, there will be a new call to the action method that retrieves images and a new query to the database for each image listed.
I would suggest that’s a Good Thing. Images take much longer to transfer than text so, rather than making the user wait for both to load from the DB, you can show the text and then load the images separately.
Also, should your application ever become successful, you may want to make your web application run on multiple servers. Each server will, at that point, have access to the database, but not to the other server’s file system.
However, you should consider isolating images in their own image table. Don’t store an image in the same table as the book; simply store an image ID and use that to retrieve the image later. This is much more efficient, considering the way databases manage data on the disk. This statement is considerably more important if you plan to change the image associated with a book at any point.
Finally, my experience with these things is that you’ll very quickly hit a situation where you want to use a Content Delivery Network for image storage (see AWS CloudFront or Azure CDN or CloudFlare), which really is the best of both worlds — a secondary storage that can be accessed directly by URL, without even troubling your app — and not that much extra work. Moving to a CDN is a very easy change if you’ve isolated your images into their own table.
Or, you could just go for a CDN solution now and never worry about it again.
3
I advise against storing the images in the database.
It’s not what database are intended for – it isn’t relational data that relates to other data. It easily takes up a huge amount of space and is difficult to manipulate when within a database blob column.
I use databases to store the name, timestamps, meta data and file location. I then store and access the image at that file location/name
I went through this a few years ago in the real world and along with many others, came to the conclusion that storing the images in the database was a bad approach. For versioning and history just have the code with the references in source control and manage them as with any other. So change the name for each image, use date stamps and don’t re-use exactly the same filename for another image.
Many folks that start off during this have to main the switch after a while as their database grows too big due to all the images. This has a ripple effect on storage needs, servers, cpu and possibly network traffic. It also makes it harder to use cloud based filesystem storage that can take into account geographic location, traffic and the like. These techniques become critical as your company grows and having the images in a database can severely limit your options.
In conclusion, good for small projects with small amount of image data. Not suitable for larger shops or larger amounts of data.
If you are using sql server (2008 or beyond) you can store them in the database as blobs or using FILESTREAM. As other commentors have pointed out, I would not go with a propertiery store the file on disk approach as you will have to manage consistency in that scenario.
A good link on FILESTREAM can be found here:
FileStream Guidelines
Some general rules:
In SQL Server, BLOBs can be standard varbinary(max) data that stores
the data in tables, or FILESTREAM varbinary(max) objects that store
the data in the file system. The size and use of the data determines
whether you should use database storage or file system storage. So, in
a nutshell, these are the typical scenarios where this feature is best
suited:
- If you are storing BLOBs with an average size of 1MB or
more.- When fast read access is adamant.
- When you must access BLOBs from your application’s middle tier code.
Overall, the main advantages of using FILESTREAM over other options are:
- Storage and retrieval of BLOBs is made together with relational data in a single
data store.- Yes, the BLOBs are included in database backups and restores.
- Inserting, updating and deleting BLOBs and your relational data happens in the same database transaction.
- The 2 GB max size for a varbinary(max) column does not apply; you are only limited by the available space on the NTFS file system.
- The SQL Server buffer pool memory is not used to operate on BLOBs; in previous versions large BLOBs could consume a lot of this memory.
- All BLOB access can be performed from .NET code, allowing your middle tier code to easily work with these objects.
- The NTFS file system can save and retrieve large BLOBs more efficiently than SQL Server.
Still, a few restrictions apply to FILESTREAM implementations:
- FILESTREAM feature is not supported with Database Mirroring.
- FILESTREAM data containers cannot share the same directory or be nested.
- FILESTREAM data is not encrypted even when transparent data encryption (TDE) is enabled.