Is it bad practice to store metadata information in file names? Better solutions?

I have noticed that, where I work, people are keen on storing information in file names and then parsing the file names to get it back.

To me this doesn’t seem to be especially good practice. I already see occasional issues with scripts globbing for a file and getting the wrong one because another file matches first. We are also discussing how to get around problems with separators between the fields.

Is it considered bad practice or not?

What are other accepted solutions for retrieving files from a file system based on some type of metadata?

Yes, I think it’s bad practice. It is subject to all sorts of problems, for example length limits, encoding issues, and conflicts due to duplicate data.
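To make the separator and globbing problems concrete, here is a small Python sketch. The `<project>_<date>_<author>.csv` convention and the file names are made up for illustration; the naive parser is deliberately the kind of code that tends to appear when names are split by hand.

```python
import fnmatch

# Hypothetical convention: <project>_<date>_<author>.csv, fields joined by "_".
def naive_parse(filename: str) -> dict:
    stem = filename.rsplit(".", 1)[0]
    project, day, author = stem.split("_")  # assumes no field ever contains "_"
    return {"project": project, "date": day, "author": author}

print(naive_parse("sales_2021-03-01_alice.csv"))

# A project called "sales_eu" yields four parts, so unpacking into three fails.
try:
    naive_parse("sales_eu_2021-03-01_alice.csv")
except ValueError as err:
    print("parse failed:", err)

# Prefix globbing has the same weakness: "sales_*" also matches the sales_eu file.
names = ["sales_2021-03-01_alice.csv", "sales_eu_2021-03-01_alice.csv"]
print(fnmatch.filter(names, "sales_*.csv"))
```

Which of the two files a script "finds first" then depends on directory order, which is exactly the kind of intermittent bug described in the question.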

Better is to use a “master file” (sometimes called a manifest or index) that contains the metadata and the paths to the files, or something similar in a database, a registry, or whatnot. Or put the metadata inside the actual files, at the top level of some data structure stored in the file, for example in JSON or XML.
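A minimal sketch of the manifest approach in Python, assuming a hypothetical JSON layout with a top-level `files` array (real manifests will differ). Lookups compare metadata values exactly, so a project called `sales` can no longer accidentally match `sales_eu`, and the file names themselves carry no meaning at all.

```python
import json
from pathlib import Path

# Hypothetical manifest: one JSON document listing every file with its metadata.
MANIFEST = """
{
  "files": [
    {"path": "data/a1.csv", "project": "sales_eu", "date": "2021-03-01", "author": "alice"},
    {"path": "data/a2.csv", "project": "sales", "date": "2021-03-01", "author": "bob"}
  ]
}
"""

def find_files(manifest_text: str, **criteria) -> list[Path]:
    """Return the paths of all entries whose metadata matches every criterion exactly."""
    entries = json.loads(manifest_text)["files"]
    return [
        Path(entry["path"])
        for entry in entries
        if all(entry.get(key) == value for key, value in criteria.items())
    ]

print(find_files(MANIFEST, project="sales"))  # only data/a2.csv
```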

This is somewhat analogous to putting information into keys, or namespacing keys, in key-value stores. I think this is OK as long as you use it only to namespace and to do quick lookups: the key components are not there to provide parsable information. If you need that information, duplicate it into the value (the file, in the case above).
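In Python terms, with a plain `dict` standing in for the key-value store (the `user:<id>` scheme and the fields are illustrative assumptions), the rule looks like this: the key is built only to look things up, and anything a reader needs is stored again in the value.

```python
# A plain dict stands in for a key-value store; data is hypothetical.
store: dict[str, dict] = {}

def put_user(user_id: int, name: str, country: str) -> None:
    # The namespaced key exists only for fast, collision-free lookups...
    key = f"user:{user_id}"
    # ...while every field we may need later is duplicated into the value.
    store[key] = {"id": user_id, "name": name, "country": country}

def get_user(user_id: int) -> dict:
    return store[f"user:{user_id}"]

put_user(42, "alice", "NO")
print(get_user(42)["id"])  # read the id from the value, never by parsing the key
```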

First, metadata is a blurry concept.

That said, many cases of metadata in file names already exist:

version numbers of libraries
date and time of images, or at least sequence index
file type, which triggers what application should open the file
name of your home directory, which is normally your login username

Nevertheless, that short list is not an argument in favor of the practice.

Alternatives are:

handle metadata at the file system level, like Apple’s old HFS, for instance
put metadata in the file itself, like Exif for images or ID3 for audio
put metadata in another file or in a database, as most media managers do
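As a sketch of the third option (metadata in another file), here is a per-file “sidecar” in Python. The `<name>.meta.json` naming scheme and the metadata fields are assumptions for illustration, not a standard; embedded formats such as Exif or ID3 would need format-specific libraries instead.

```python
import json
import tempfile
from pathlib import Path

def sidecar_path(data_file: Path) -> Path:
    # Hypothetical convention: metadata lives next to the file as "<name>.meta.json".
    return data_file.with_name(data_file.name + ".meta.json")

def write_metadata(data_file: Path, metadata: dict) -> None:
    sidecar_path(data_file).write_text(json.dumps(metadata, indent=2), encoding="utf-8")

def read_metadata(data_file: Path) -> dict:
    return json.loads(sidecar_path(data_file).read_text(encoding="utf-8"))

# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    photo = Path(tmp) / "IMG_0001.jpg"
    photo.write_bytes(b"")  # placeholder content
    write_metadata(photo, {"taken": "2021-03-01T12:00:00", "camera": "example"})
    print(read_metadata(photo)["taken"])
```

The file name stays a plain identifier, and the metadata can grow without renaming anything; the trade-off is that the two files must be copied and deleted together.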

It sounds like you need a database.

There are lots of security issues with putting user data in file names. Let’s say that you have a file for each user (“username.txt”). What happens when someone registers the username “../../../../etc/passwd”? That depends on how you are filtering user input.

Database frameworks will sometimes assist you with sanitizing user input.
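A defensive sketch in Python, assuming usernames are restricted to an allowlist of letters, digits, `_` and `-` (the policy and the `user_file` helper are hypothetical). It validates the name first and then, as a second line of defence, checks that the resolved path is still inside the base directory.

```python
import re
from pathlib import Path

# Allowlist policy (an assumption): 1-32 letters, digits, "_" or "-".
USERNAME_RE = re.compile(r"[A-Za-z0-9_-]{1,32}")

def user_file(base_dir: Path, username: str) -> Path:
    """Map a username to <base_dir>/<username>.txt, refusing anything unsafe."""
    if not USERNAME_RE.fullmatch(username):
        raise ValueError(f"invalid username: {username!r}")
    base = base_dir.resolve()
    candidate = (base / f"{username}.txt").resolve()
    # Defence in depth: the resolved path (symlinks followed) must stay inside base_dir.
    if base not in candidate.parents:
        raise ValueError(f"path escapes base directory: {candidate}")
    return candidate
```

With this, `user_file(Path("users"), "../../../../etc/passwd")` raises `ValueError` instead of returning a path outside `users/`.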

First, let us agree on what a file is. A file is a package of data with a name that can be transmitted, received, created, and deleted with (very nearly) atomic operations.

Many file systems (Mac OS, and more recent Linux file systems) implement “forks”, often used to store resources and metadata. This approach to storing metadata was problematic in that traditional network transfer methods, backup and restore methods and file copying methods were inconsistent, especially when the source and destination file systems understood file forks differently.

The file name is used to hold metadata because a) it is always there, b) metadata has always been present in the file name (at least in the use of file extensions), and c) the file name undergoes very little translation when moving between systems (case distinctions, character set limitations, character limitations aside).

So, the file name is visible, portable, and manageable. This is not a bad thing for storing some metadata.

Probably the best solution for general file metadata is a content repository, which can be configured with the metadata schema to be used for the files. In many cases this is overkill, but, IMHO, it is the way to go for serious metadata management.

No… well… not necessarily.

As long as you have a strict convention and common means of parsing and validating it (scripts, libraries, etc.) readily available, you are good to go.
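A minimal sketch of what such “common parsing and validation means” could look like in Python. The convention `<project>_<YYYY-MM-DD>_v<version>.csv` and the functions `build_name`/`parse_name` are hypothetical; the point is that one shared module both builds and parses names, so a field that would break the format is rejected when the file is created rather than discovered later.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<YYYY-MM-DD>_v<version>.csv
# "project" may use lowercase letters, digits and "-", but never the "_" separator.
PROJECT_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
NAME_RE = re.compile(
    r"(?P<project>[a-z0-9]+(?:-[a-z0-9]+)*)_(?P<day>\d{4}-\d{2}-\d{2})_v(?P<version>\d+)\.csv"
)

def build_name(project: str, day: date, version: int) -> str:
    """Produce a file name, refusing values the convention cannot represent."""
    if not PROJECT_RE.fullmatch(project):
        raise ValueError(f"project {project!r} violates the naming convention")
    if version < 0:
        raise ValueError("version must be non-negative")
    return f"{project}_{day.isoformat()}_v{version}.csv"

def parse_name(filename: str) -> dict:
    """Inverse of build_name; rejects anything that does not match exactly."""
    m = NAME_RE.fullmatch(filename)
    if m is None:
        raise ValueError(f"{filename!r} does not follow the naming convention")
    return {
        "project": m["project"],
        "day": date.fromisoformat(m["day"]),  # also rejects impossible dates
        "version": int(m["version"]),
    }

name = build_name("sales-eu", date(2021, 3, 1), 2)
print(name)              # sales-eu_2021-03-01_v2.csv
print(parse_name(name))  # round-trips to the original fields
```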

Take, for example, packaging and dependency management systems (Maven, NuGet, and the like). Though many use dedicated metadata files to store the more advanced information, basic information is often part of the file name itself. Relying on strict conventions, the file name can contain the most pertinent information about the package: its vendor, its name, its version, its type. Sometimes that is all you need: 4 or 5 short pieces of information.
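For illustration, a Python parser for a simplified `<name>-<version>.<type>` scheme, loosely modelled on Maven artifact file names such as `commons-lang3-3.12.0.jar`. This is an assumption, not Maven's full rule set: qualifiers like `-SNAPSHOT` or `-jre` are deliberately unsupported and get rejected. Because the name itself may contain hyphens, the regex takes the last hyphen that is followed by a purely numeric dotted version and the extension.

```python
import re

# Assumed scheme: <name>-<version>.<type>, e.g. commons-lang3-3.12.0.jar
# The greedy "name" group makes the version start at the LAST viable hyphen.
ARTIFACT_RE = re.compile(r"(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)\.(?P<type>[a-z]+)")

def parse_artifact(filename: str) -> dict:
    m = ARTIFACT_RE.fullmatch(filename)
    if m is None:
        raise ValueError(f"not an artifact file name: {filename!r}")
    return m.groupdict()

print(parse_artifact("commons-lang3-3.12.0.jar"))
# {'name': 'commons-lang3', 'version': '3.12.0', 'type': 'jar'}
```

Even in this tiny example, the convention only works because the version format is strict; relax it and the split between name and version becomes ambiguous.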

If the metadata is simple, then a file naming convention makes perfect sense and requires nothing to be put in place. It can be strengthened with very simple tools and scripts: no database, no specialised infrastructure, just a few scripts and a naming convention.

If nothing out there quite does what you need and your needs are simple, I’d start with this.
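One such script, sketched in Python under the same kind of hypothetical `<project>_<YYYY-MM-DD>_v<version>.csv` convention: it scans a single directory and lists every file whose name does not follow the convention, something that could run in CI or from a cron job.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical convention: <project>_<YYYY-MM-DD>_v<version>.csv
NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*_\d{4}-\d{2}-\d{2}_v\d+\.csv")

def lint_directory(directory: Path) -> list[str]:
    """Return, sorted, the names of regular files that break the convention."""
    return sorted(
        p.name
        for p in directory.iterdir()
        if p.is_file() and not NAME_RE.fullmatch(p.name)
    )

# Demo in a throwaway directory; in practice, point it at the real folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["sales-eu_2021-03-01_v1.csv", "sales final (2).csv"]:
        (Path(tmp) / name).touch()
    print(lint_directory(Path(tmp)))  # ['sales final (2).csv']
```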

Your requirements outgrow this convention? Extend it with a proper metadata file.
You later need better search? There are already good file-search solutions out there that can get you where you need to be.

It’s not that I dislike databases; quite the contrary, they are really powerful and useful, but they require some amount of overhead to get going. They need to be installed, backed up, and maintained, and you will need staff who, if not completely dedicated, will have to spend part of their time on this infrastructure. They are also more complex and cryptic to the layperson: lose the developer who set it up, and your system will be stuck in time until you find a replacement.

Never underestimate the power of low tech: with proper oversight, it can get you a long way.

And by the time you outgrow your low-tech solution, you will have gathered all the experience and requirements needed to implement the perfect system for your needs.

My take on this is that you may have seen some code somewhere that does sloppy or brittle things with file names, but that does not mean that “storing metadata in filenames” is bad in general.

File names are metadata: they are data about the data in the file, independent of the file data itself. In fact, filenames are so old that they are probably the canonical example of metadata.

If you consider that file extensions are just the end part of the filename, then the filename-as-metadata concept becomes even more unavoidable.

I would rank metadata in three tiers. Yes, a filename should contain metadata, because it actually is metadata; or do you refer to files by their inode number?

1. The filename: This is the first information you ever get about the file. Think of it like the title of a book. Thus, in addition to uniquely identifying the file, it should be a short summary of its content. For example, lenna.bmp indicates that the file is a picture of Lena Forsén, encoded as a Windows bitmap file. A better name would be lena_with_hat.bmp, because that is what the picture shows, but since lenna is an established name, we continue with that name.

2. The file headers: These should contain at least all the information required to process the data, in this case a picture. The header would tell you that the picture is 512×512 pixels and that each pixel occupies 3 bytes. Here, some information may be implicit from the fact that it is a Windows bitmap file. For example, you need to know which channel is red, green, and blue, or whether pixels are stored as separate planes. Since it is a Windows bitmap file, it is known that data is stored BGR, BGR, BGR, …. It should be noted that different applications may require different metadata. For example, to replicate the camera setup that was used to take the picture, the aperture, film size, and focal length would be needed.

3. The index file: Here, you store everything else that would not fit into the other two categories. It could be additional notes about the picture (location, copyright info), although these could go into (2) as well. It could also contain a list of related pictures.
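To make the second tier concrete, a Python sketch that reads the processing metadata out of a BMP's headers with `struct`. It assumes the common 14-byte file header followed by a BITMAPINFOHEADER (40 bytes) or a larger DIB header; the `read_bmp_header` helper is illustrative, not a complete BMP reader.

```python
import struct

def read_bmp_header(data: bytes) -> dict:
    """Extract processing metadata from the first bytes of a BMP file.

    Assumes a BITMAPINFOHEADER or a later, larger DIB header.
    """
    if len(data) < 30 or data[:2] != b"BM":
        raise ValueError("not a BMP file")
    (pixel_offset,) = struct.unpack_from("<I", data, 10)  # start of pixel data
    dib_size, width, height, _planes, bits_per_pixel = struct.unpack_from("<IiiHH", data, 14)
    if dib_size < 40:
        raise ValueError("unsupported (old-style) DIB header")
    return {
        "width": width,
        "height": abs(height),  # a negative height means rows are stored top-down
        "bytes_per_pixel": bits_per_pixel // 8,
        "pixel_data_offset": pixel_offset,
    }
```

For a real file, `read_bmp_header(Path("lenna.bmp").read_bytes()[:64])` should be enough, since only the first 30 bytes are inspected. Note that the channel order (BGR) is not stored anywhere: it is implied by the format, exactly as described above.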

Notice that sometimes the filename may contain information that is also found in the file headers. This is not ideal, but it can happen when there are multiple versions of the same file. In this particular example, there could be files like

`lenna_512x512.bmp`, `lenna_256x256.bmp`, and `lenna_128x128.bmp`

While these names contain information that is already in the file header, filenames have to be unique, and they should be descriptive. What makes these files unique? They have different resolutions. Thus, it is not a bad choice to include the pixel dimensions in the filename. As an alternative scheme, you could use the suffixes _hires, _midres, and _lowres, though it would be tricky to come up with names if you have many versions. The ultimate solution here is to store the different versions in the same file, but if for some reason you have to stick to a file format that does not support that, you have to do it like this.
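A small Python sketch of that naming scheme. The `variant_name`/`variant_size` helpers are hypothetical, and they assume the base name uses only lowercase letters, digits, and hyphens. Because the dimensions in the name duplicate the header, a careful tool would also cross-check them against the file itself.

```python
import re

# Assumed pattern: <base>_<width>x<height>.bmp, e.g. lenna_512x512.bmp
VARIANT_RE = re.compile(r"(?P<base>[a-z0-9-]+)_(?P<w>\d+)x(?P<h>\d+)\.bmp")

def variant_name(base: str, width: int, height: int) -> str:
    """Build a unique, descriptive name for one resolution of an image."""
    return f"{base}_{width}x{height}.bmp"

def variant_size(filename: str) -> tuple[int, int]:
    """Recover (width, height) from a name built by variant_name."""
    m = VARIANT_RE.fullmatch(filename)
    if m is None:
        raise ValueError(f"unexpected variant name: {filename!r}")
    return int(m["w"]), int(m["h"])

names = [variant_name("lenna", s, s) for s in (512, 256, 128)]
print(names)  # ['lenna_512x512.bmp', 'lenna_256x256.bmp', 'lenna_128x128.bmp']
# Pick the largest variant without opening any file:
print(max(names, key=lambda n: variant_size(n)[0] * variant_size(n)[1]))  # lenna_512x512.bmp
```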

