I am developing an application in which users have to upload images. I don't have much experience with file uploads, so I am wondering what the best way is to store user-uploaded images on the file system.
A good, basic starter strategy is like this:
- Make a table called ‘images’. Make sure it has an autoincrementing primary key (we’ll call it ID).
- When a user uploads an image, insert a row into that table, including the original name of the file and the file extension. You’ll probably also track uploading user, date/time, etc.
- Get the last insert ID from that query. Save the image to “your-image-path/(ID).(ext)” in the file system. For example, “/images/350.png”.
- (Recommended) You can use the ID as the direct link to the image by making a simple wrapper script that loads the image by ID (thus obscuring the actual file name, or even keeping the images outside of the web directories). For example, ‘image.php?id=500’ or ‘/image/500’. In that script, you can put the original file name from the table in the HTTP response headers (Content-Disposition) so that when users download the image, it is saved on the client under the original name.
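The steps above can be sketched roughly as follows. This is only an illustration in Python with SQLite, not a definitive implementation: the column names (`original_name`, `ext`, `uploaded_at`) and the helper names `create_schema`, `save_upload` and `load_image` are assumptions made for the example, and a real wrapper script would also send the original name back in the response headers.

```python
import sqlite3
from pathlib import Path


def create_schema(conn):
    # Hypothetical schema: an autoincrementing ID plus the original name and extension.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " original_name TEXT NOT NULL,"
        " ext TEXT NOT NULL,"
        " uploaded_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )


def save_upload(conn, image_dir, original_name, data):
    """Insert a row, then store the bytes as <image_dir>/<ID>.<ext>."""
    # Only the extension of the client-supplied name is used on disk.
    ext = Path(original_name).suffix.lstrip(".").lower()
    cur = conn.execute(
        "INSERT INTO images (original_name, ext) VALUES (?, ?)",
        (original_name, ext),
    )
    image_id = cur.lastrowid  # the "last insert ID" of the query above
    path = Path(image_dir) / f"{image_id}.{ext}"
    path.write_bytes(data)
    conn.commit()
    return image_id, path


def load_image(conn, image_dir, image_id):
    """What a wrapper script would do: look up the row by ID, then read the file."""
    row = conn.execute(
        "SELECT original_name, ext FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    if row is None:
        return None
    original_name, ext = row
    data = (Path(image_dir) / f"{image_id}.{ext}").read_bytes()
    # The caller can send original_name as the download file name.
    return original_name, data
```

Because the file on disk is named after the ID, the client-supplied name never becomes a file-system path; it is only stored in the table and handed back on download.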
Areas of enhancement:
- Thumbnail support. On upload, generate a thumbnail, and store it as “.thumbnail.jpg”. Use a wrapper script for thumbnail retrieval based on ID.
- Hashing directories. If you expect more than 1000 or so images, you’ll want to break them into multiple subdirectories so the file system doesn’t get sluggish (the exact point at which this happens varies by OS and other factors). So if you have an image ID of 6500, you might store it in “/images/6/6500.png”.
- Security. Validate that an uploaded image is an actual image before saving it to disk. This is especially important if you are allowing direct access to the images: you don’t want to accept a .php file that can get uploaded and then executed. Also make sure your web server config is set up to prevent that sort of thing by disallowing the execution of scripts from the images directories.
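Two of these enhancements are easy to sketch. The snippet below is a minimal Python illustration under stated assumptions: the bucket size of 1000 per directory, the short signature table, and the function names `sharded_path` and `sniff_image_type` are all made up for the example. Checking the leading bytes is only a first-line test; actually decoding the file with an image library is a stronger check.

```python
from pathlib import Path

# Leading "magic" bytes of a few common formats (not an exhaustive list).
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(data):
    """Return an extension if the bytes start like a known image format, else None."""
    for magic, ext in SIGNATURES.items():
        if data.startswith(magic):
            return ext
    return None


def sharded_path(image_dir, image_id, ext, per_dir=1000):
    """Spread files over subdirectories: IDs 6000-6999 land in <image_dir>/6/."""
    bucket = image_id // per_dir
    return Path(image_dir) / str(bucket) / f"{image_id}.{ext}"
```

With these assumptions, an ID of 6500 maps to “/images/6/6500.png” as in the example above, and an uploaded PHP script is rejected because it does not start with any of the known signatures.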
Images are just files, so you could store them as-is, as you would store any other file.
However, you may want to transform them a little. For example, most digital cameras these days produce absurdly high-resolution files, e.g. 2000 pixels across. Most users don’t know how to shrink them to something more manageable, and even if they did, they don’t have the time (or can’t be bothered). So you may want to allow people to upload large images, then shrink them.
Also, what will you do with the images? Display them on a web page? Only certain formats work on a web page, so if someone uploaded a TIFF image, for example, you might want to convert it to JPEG or PNG.
Finally, you might want to increase compression. If someone sends you an image that’s a reasonable resolution, but they’ve turned all the “quality” options up to 11, it will still be a big file. You might want to compress it more to make it a smaller file.
A final consideration: if you’re going to do any image manipulation, you might want to put all images into the same format, e.g. JPEG at a certain resolution and compression level, so your image manipulation only has one format to deal with.
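As a small illustration of the shrinking step, here is a sketch of the arithmetic only: computing the target size so that the longest side fits a limit while the aspect ratio is kept. The function name `fit_within` and the limit of 800 pixels in the usage note are assumptions for the example; the actual resampling and re-encoding would be done by whatever image library you use.

```python
def fit_within(width, height, max_side):
    """Return (new_width, new_height) with the longest side at most max_side.

    Images that already fit are returned unchanged (never upscaled).
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Round to whole pixels, but never let a side collapse to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For a 2000x1500 upload and a limit of 800, this gives 800x600; a 640x480 upload is left as it is.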
The storage strategy should, in most cases, follow from the conceptual design of the site that requires image storage, and the expected number of users. We’ll need more information, I think; are you creating a site like Photobucket or Flickr, designed primarily to be an image-hosting site? Or are the images a secondary concern, such as being attachments to messages? How many images do you foresee having to store? How many users will this system have?
Two popular strategies I know of and have used involve a SQL database, to provide better concurrency than a flat file system alone. You have a table representing the images in the DB. That table then has either a “BLOB” (binary large object) column that stores the actual byte data of the image, or a string column that holds the path to the image in a flat file system elsewhere. Either way, scanning for data about images is highly parallelizable. Storing images on disk is more straightforward and reduces the size of the DB, but in my experience lowers performance in high user-count situations. Storing images as DB data has the opposite trade-off: a larger DB and a less straightforward setup in exchange for better performance with many users.
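To make the two table designs concrete, here is a minimal sketch using Python with SQLite. The table names (`images_blob`, `images_path`), the column names and the helper functions are assumptions for illustration; it shows only the shape of each design and says nothing about how either performs.

```python
import sqlite3


def create_tables(conn):
    # Design 1: the image bytes live in the database itself.
    conn.execute(
        "CREATE TABLE images_blob ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " name TEXT NOT NULL,"
        " data BLOB NOT NULL)"
    )
    # Design 2: the database only records where the file is on disk.
    conn.execute(
        "CREATE TABLE images_path ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " name TEXT NOT NULL,"
        " path TEXT NOT NULL)"
    )


def store_as_blob(conn, name, data):
    cur = conn.execute(
        "INSERT INTO images_blob (name, data) VALUES (?, ?)", (name, data)
    )
    return cur.lastrowid


def store_as_path(conn, name, path):
    # The file itself is assumed to have been written to `path` separately.
    cur = conn.execute(
        "INSERT INTO images_path (name, path) VALUES (?, ?)", (name, path)
    )
    return cur.lastrowid


def read_blob(conn, image_id):
    row = conn.execute(
        "SELECT data FROM images_blob WHERE id = ?", (image_id,)
    ).fetchone()
    return None if row is None else bytes(row[0])
```

In both designs, queries about the images (who uploaded what, when) touch only the table; the difference is whether fetching the pixels is a second query or a file read.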