Suppose I have a form in my web application where users can upload a profile picture.
I've got a few requirements about file size, dimensions, etc., but when the user uploads the image, how should I name it on my system? I suppose the name would need to be consistent and also unique.
Maybe a GUID?
a5c627bedc3c44b7ae7c06a44fb3fcf8.jpg
A timestamp?
129899740140465735.jpg
A hash, e.g. MD5?
b1a9acaf295cf14ffbc5b6538294562c.jpg
Is there a standard or recommended way to do this?
You should try to meet two goals: uniqueness and usefulness.
Using a GUID guarantees uniqueness, but one day the files may become detached from their original source, and then you will be in trouble.
My typical solution is to embed crucial information into the filename, such as the userID (if it belongs to a user) or the date and time uploaded (if this is significant), or the filename used when uploading it.
This may really save your skin one day, when the information embedded in the filename allows you to, for example, recover from a bug, or the accidental deletion of records. If all you have is GUIDs, and you lose the catalogue, you will have a heck of a job cleaning that up.
For example, if a file “My Holiday: Florida 23.jpg” is uploaded by userID 98765 on 2013/04/04 at 12:51:23, I would name it something like this, adding a random string ad8a7dsf9:
20130404125123-ad8a7dsf9-98765-my-holiday-florida-23.jpg
- Uniqueness is ensured by the date and time and the random string (provided it is properly random, e.g. drawn from /dev/urandom or CryptGenRandom).
- If the file is ever detached, you can identify the user, the date and time, and the title.
- Everything is folded to lower case and anything non-alphanumeric is removed and replaced by dashes, which makes the filename easy to handle using simple tools (e.g. no spaces which can confuse badly written scripts, no colons or other characters which are forbidden on some filesystems, and so on).
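A minimal Java sketch of this naming scheme, assuming the caller already knows the upload time, the user ID, and an extension it has validated. The class and method names (UploadNames, slug, randomToken, name) are hypothetical, and Java's SecureRandom stands in for reading /dev/urandom or calling CryptGenRandom directly:

```java
import java.security.SecureRandom;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for the "timestamp-random-userID-title" naming scheme.
class UploadNames {
    // SecureRandom typically draws on the operating system's entropy source.
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Drops the extension, lower-cases the title, turns every run of
    // non-alphanumeric characters into one dash, and trims dashes at the ends.
    static String slug(String originalName) {
        int dot = originalName.lastIndexOf('.');
        String base = dot > 0 ? originalName.substring(0, dot) : originalName;
        return base.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")
                .replaceAll("^-+|-+$", "");
    }

    // Random lower-case alphanumeric token, like "ad8a7dsf9" in the example.
    static String randomToken(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // The extension is assumed to be validated by the caller, e.g. ".jpg".
    static String name(LocalDateTime uploadedAt, long userId, String originalName, String extension) {
        return STAMP.format(uploadedAt) + "-" + randomToken(9) + "-" + userId + "-"
                + slug(originalName) + extension;
    }
}
```

For the example above, name(LocalDateTime.of(2013, 4, 4, 12, 51, 23), 98765, "My Holiday: Florida 23.jpg", ".jpg") gives a name of the form 20130404125123-xxxxxxxxx-98765-my-holiday-florida-23.jpg, where the 9-character token differs on every call.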
You don't want to stress applications (such as Windows Explorer) and make them crash when you open the directory. While it is unlikely you will stress the file system itself, you need to take this into account if you're going to be storing thousands of files.
If you're expecting to store thousands of files, my suggestion is to partition them into folders, for example uploadsilo001, uploadsilo002, etc. You can either balance your files across the folders or wait until a folder hits a certain number of files and then create another.
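A sketch of the second option (fill a folder, then start the next one), assuming the silos are named silo001, silo002, ... under a single upload root. The Silos class and currentSilo method are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical sketch of the "fill a folder, then create another" strategy.
class Silos {
    // Returns the first folder silo001, silo002, ... under uploadRoot that holds
    // fewer than maxFiles entries, creating it if it does not exist yet.
    static Path currentSilo(Path uploadRoot, int maxFiles) {
        try {
            for (int i = 1; ; i++) {
                Path silo = uploadRoot.resolve(String.format("silo%03d", i));
                if (!Files.exists(silo)) {
                    return Files.createDirectories(silo);
                }
                try (Stream<Path> entries = Files.list(silo)) {
                    if (entries.count() < maxFiles) {
                        return silo;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Listing a folder on every upload is simple but gets slower as folders fill up, and two concurrent uploads can both see room in the same silo; on a busy site you would probably keep the current silo and its count in the database instead.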
With regard to naming, I always name a file with a GUID because it is globally unique. I take the extension from the uploaded file and give the stored file the same extension, but the name itself is a newly generated GUID.
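In Java, UUID.randomUUID() returns a random (version 4) UUID, the same kind of value as a GUID. A minimal sketch; the GuidNames class is hypothetical, and the extension check (short and alphanumeric only) is an added assumption so that a crafted upload name cannot smuggle path separators into the stored name:

```java
import java.util.Locale;
import java.util.UUID;

class GuidNames {
    // Keeps the uploaded extension only if it looks like a plain short
    // extension such as ".jpg" or ".png"; anything else is dropped.
    static String extensionOf(String uploadedName) {
        int dot = uploadedName.lastIndexOf('.');
        if (dot <= 0) {
            return ""; // no extension, or a dot-file such as ".htaccess"
        }
        String ext = uploadedName.substring(dot).toLowerCase(Locale.ROOT);
        return ext.matches("\\.[a-z0-9]{1,5}") ? ext : "";
    }

    // The name itself is a fresh random UUID with the hyphens removed.
    static String newName(String uploadedName) {
        return UUID.randomUUID().toString().replace("-", "") + extensionOf(uploadedName);
    }
}
```

GuidNames.newName("Photo.JPG") returns something like a5c627bedc3c44b7ae7c06a44fb3fcf8.jpg, the same shape as the GUID example in the question.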
If you're doing this in conjunction with an RDBMS and have several categories, e.g. products, categories, etc., you could have uploadproducts, uploadcategories, and so on, and use the row ID as the filename.
In terms of best practices, I too have looked in the past and not found anything. I came up with the above while discussing with some of my developers.
In one of the solutions I worked on years ago, we did this:
subfolders built from parts of the user ID, so if your user ID was 232950192, we would have the subfolders images/23/29/50/192/232950192
and in the final folder, subfolders for albums, profile images, etc.
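The folder path can be derived from the user ID. This is a hypothetical reconstruction of the layout above (three two-digit folders, a folder for the remaining digits, then the full ID); zero-padding IDs shorter than nine digits is an assumption, not something the original scheme specifies:

```java
import java.nio.file.Path;

class UserIdPaths {
    // 232950192 -> images/23/29/50/192/232950192
    static Path userFolder(Path imagesRoot, long userId) {
        // Assumption: pad to 9 digits so every user gets the same folder depth.
        String padded = String.format("%09d", userId);
        return imagesRoot
                .resolve(padded.substring(0, 2))
                .resolve(padded.substring(2, 4))
                .resolve(padded.substring(4, 6))
                .resolve(padded.substring(6))
                .resolve(Long.toString(userId));
    }
}
```

userFolder(Paths.get("images"), 232950192) gives images/23/29/50/192/232950192, matching the example.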
We saved everything in the database too, and kept it in the file system for quick web server access (which has caching as well).
The final image kept its original name. We did not need to keep versions, but if you do, you can add more subfolders under the album folders, or keep a version ID in the database. Think it through up front: once this goes to production, it is difficult to change without time-consuming and error-prone corrections to the existing structure.
It is very easy to create a subfolder in Java and create a file in it:
```java
import java.io.File;

File folder = new File(pathwithslashes); // like "images/23/29/50/192/232950192"
folder.mkdirs(); // also creates any missing parent folders
File imgFile = new File(folder, name);
// Now open an output stream on imgFile and write the upload
```
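To add date-stamped subfolders:
```java
import java.text.SimpleDateFormat;
import java.util.Date;
SimpleDateFormat sdf = new SimpleDateFormat("/yyyy/MM/dd/"); // '/' is not a pattern letter, so it is copied as-is
Date now = new Date(); // the current date and time
pathwithslashes = pathwithslashes + sdf.format(now); // e.g. ".../232950192/2013/04/04/"
File folder = new File(pathwithslashes);
```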
For .NET, see: https://stackoverflow.com/questions/5482230/c-sharp-equivalent-of-javas-mkdirs
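On Java 8 and later, the same steps (date-stamped folder, create it, write the file) can also be written with java.nio.file and java.time, which report failures as exceptions instead of the boolean that File.mkdirs() returns. A minimal sketch; the DatedStorage class and store method are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDate;

class DatedStorage {
    // Writes data to base/yyyy/MM/dd/name, creating any missing folders first.
    static Path store(Path base, LocalDate day, String name, byte[] data) {
        Path folder = base
                .resolve(String.format("%04d", day.getYear()))
                .resolve(String.format("%02d", day.getMonthValue()))
                .resolve(String.format("%02d", day.getDayOfMonth()));
        try {
            Files.createDirectories(folder); // like mkdirs(), but throws on failure
            // CREATE_NEW fails instead of silently overwriting an existing upload.
            return Files.write(folder.resolve(name), data, StandardOpenOption.CREATE_NEW);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Resolving each date component separately avoids relying on how a given platform parses "/" inside a single path string.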
I'd recommend using just MD5 or anything conceptually equivalent.
By naming files after a digest of their contents, you not only ensure uniqueness but also get easy caching: always cache images for as long as possible, and with proper content-based naming you can cache them practically forever, because a changed image gets a new name.
Also, though not a big deal, it is not purely hypothetical that different users upload exactly the same image. Out of the box, you get a small storage optimization, since identical uploads map to the same file.
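A minimal Java sketch of content-based naming. It uses SHA-256 through the standard MessageDigest API rather than MD5: both are conceptually equivalent for naming, but MD5 collisions can be crafted deliberately, so a malicious user could upload a different image that maps to an existing name. The ContentNames class is hypothetical:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ContentNames {
    // Names a file after the SHA-256 digest of its bytes, in lower-case hex.
    static String contentName(byte[] data, String extension) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // negative bytes print as unsigned
            }
            return hex + extension;
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

Because identical bytes always produce the same name, you can check whether the file already exists and skip writing the duplicate.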
As for the other suggestions: I am strongly opposed to keeping any kind of auxiliary information in a file name. When I was much younger (and a bit slimmer :), I was a Perl developer and had a dubious habit of storing as much auxiliary info in file names as common sense allowed, since Perl's string pattern features are awesome. I came to the conclusion that, in web development, it is always a better choice to keep the data associated with a file separate from the file name.
Keep in mind that nowadays, with mobile interfaces dominating, the actual file name is less important than it was 5 or 10 years ago. But even if it is crucial in the context of your application, you can always use some old-school magic: the Content-Disposition: attachment; filename="pretty_file_name.jpg" HTTP header lets you present any file name you wish. Modern browsers also support the HTML5 download attribute. I don't believe that showing a "human-readable" image name is something you need to think about in the majority of cases.
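A sketch of building that header value in Java. Besides the plain filename parameter, it adds the filename* parameter (RFC 6266, using the RFC 5987 encoding) so that non-ASCII names survive. The Disposition class and attachment method are hypothetical, and how you actually set the header depends on your web framework:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class Disposition {
    // Builds a Content-Disposition value: an ASCII-only filename for old
    // clients plus a UTF-8 percent-encoded filename* for everyone else.
    static String attachment(String prettyName) {
        StringBuilder ascii = new StringBuilder();
        for (char c : prettyName.toCharArray()) {
            boolean safe = c >= 0x20 && c < 0x7f && c != '"' && c != '\\';
            ascii.append(safe ? c : '_');
        }
        String encoded = URLEncoder.encode(prettyName, StandardCharsets.UTF_8)
                .replace("+", "%20")  // URLEncoder writes spaces as '+'
                .replace("*", "%2A"); // '*' is not allowed unencoded in filename*
        return "attachment; filename=\"" + ascii + "\"; filename*=UTF-8''" + encoded;
    }
}
```

Disposition.attachment("pretty_file_name.jpg") returns attachment; filename="pretty_file_name.jpg"; filename*=UTF-8''pretty_file_name.jpg, while the stored file keeps its hash-based name.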
UPD: To avoid having too many files in one directory, a modification can be made: just take the first 3 characters of the hash and create a directory named after them.
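A sketch of that directory split, assuming the stored name starts with a lower-case hex digest (for example, a content-based name); HashShards and shardedPath are hypothetical names:

```java
import java.nio.file.Path;

class HashShards {
    // Puts each content-named file in a folder named after the first three
    // characters of its hash, e.g. ba7/ba7816bf...jpg.
    static Path shardedPath(Path root, String hashedName) {
        return root.resolve(hashedName.substring(0, 3)).resolve(hashedName);
    }
}
```

With hex digests, three characters give 16^3 = 4096 folders, so a million files average about 244 per folder.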
The chances of collisions with something like SHA-256 are infinitesimal. If you combine the hash with the user ID or even a simple date, even less so.
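To put numbers on "infinitesimal": treating the hash as an ideal b-bit random function, the birthday bound for n stored files is approximately the following. This covers accidental collisions only; deliberate collisions are practical for MD5 and SHA-1.

```latex
P(\text{collision}) \approx 1 - e^{-n(n-1)/2^{b+1}} \le \frac{n^2}{2^{b+1}}
```

With n = 10^9 files and b = 256 (SHA-256) the bound is about 4.3 × 10^-60; with b = 128 (MD5) it is about 1.5 × 10^-21. Prefixing the user ID means only files from the same user can collide, so n becomes the number of uploads per user.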