I’m building a proprietary file format for an application I wrote in C# .NET to store save information and, perhaps down the line, project assets. Is there any standard way to do this? I was simply going to serialize
my objects into binary and create a header that would tell me how to parse the file. Is this a bad approach?
The most straightforward method is probably to serialize your structure to XML using the XmlSerializer
class. You probably wouldn’t need to create a separate header and body structure; just serialize all assets into XML. This allows you to easily inspect and edit your file outside of your own program, and it is easy to maintain.
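A minimal sketch of this approach, assuming a hypothetical SaveData class (the class name and fields are made up for illustration):

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical save-data class; any public read/write properties work.
public class SaveData
{
    public string PlayerName { get; set; }
    public int Level { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var serializer = new XmlSerializer(typeof(SaveData));
        var save = new SaveData { PlayerName = "Alice", Level = 3 };

        // Write the object out as XML.
        using (var writer = new StreamWriter("save.xml"))
            serializer.Serialize(writer, save);

        // Read it back in.
        using (var reader = new StreamReader("save.xml"))
        {
            var loaded = (SaveData)serializer.Deserialize(reader);
            Console.WriteLine(loaded.PlayerName); // Alice
        }
    }
}
```

The resulting save.xml is plain text, so you can open it in any editor to inspect or tweak a save.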
However, if your file structure is really complex, containing many different assets of different types, such that serializing the entire structure to one XML document is too burdensome, you might look at serializing each asset separately and compiling them into a single package using the System.IO.Packaging
library in .NET. This is essentially how .docx, .xlsx, .pptx, and the other Office file formats are constructed.
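A sketch of the packaging approach, which writes one asset as a part inside a ZIP-based container (the file extension, part name, and content type below are made up; on .NET Framework this needs a reference to WindowsBase, and on modern .NET the System.IO.Packaging NuGet package):

```csharp
using System;
using System.IO;
using System.IO.Packaging;

public static class Program
{
    public static void Main()
    {
        // Create a ZIP-based package (the same container format .docx uses)
        // and store each asset as a separate part.
        using (var package = Package.Open("project.myapp", FileMode.Create))
        {
            // "/assets/scene.xml" is an illustrative part name.
            var partUri = PackUriHelper.CreatePartUri(
                new Uri("/assets/scene.xml", UriKind.Relative));
            var part = package.CreatePart(partUri, "application/xml");

            using (var stream = part.GetStream())
            using (var writer = new StreamWriter(stream))
                writer.Write("<scene name=\"intro\" />");
        }
    }
}
```

Because the container is a standard ZIP, you can rename the file to .zip and browse the parts with any archive tool, which is handy for debugging.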
From someone who has had to parse a lot of file formats, I have opinions on this from a different point of view than most:
- Make the magic number very unique, so that file-format detectors for other formats don’t misidentify their files as yours. If you use binary, allocate 8 or 16 randomly generated bytes at the start of the format for the magic number. If you use XML, allocate a proper namespace in your domain so that it can’t clash with anyone else’s. If you use JSON, god help you. Maybe someone has sorted out a solution for that abomination of a format by now.
- Plan for backwards compatibility. Store the version number of the format somehow, so that later versions of your software can deal with the differences.
- If the file can be large, or there are sections of it which people might want to skip over for some reason, make sure there is a nice way to do this. XML, JSON and most other text formats are particularly terrible here, because they force the reader to parse all the data between the start and end element even if they don’t care about it. EBML is somewhat better because it stores the length of each element, allowing you to skip straight to its end. If you make a custom binary format, there is a fairly common design where you store a chunk identifier and a length as the first thing in the chunk’s header, and then the reader can skip the entire chunk.
- Store all strings in UTF-8.
- If you care about long-term extensibility, store all integers in a variable-length form.
- Checksums are nice because they allow the reader to abort immediately on invalid data, instead of potentially stepping into sections of the file which could produce confusing results.
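Several of these points can be combined into one sketch of a chunked binary layout: a random magic number, a format version, then length-prefixed chunks that a reader can skip, with variable-length integers and UTF-8 strings. Everything here (the chunk IDs, the magic bytes, the file name) is illustrative, not a real specification; a per-chunk CRC after each payload would cover the checksum point as well.

```csharp
using System;
using System.IO;
using System.Text;

public static class ChunkFormat
{
    // 8 randomly generated bytes serving as the magic number.
    static readonly byte[] Magic = { 0x9B, 0x42, 0xE1, 0x07, 0x5C, 0xAA, 0x31, 0xF8 };
    const ushort FormatVersion = 1;

    // Write one chunk: 4-byte identifier, varint payload length, payload.
    public static void WriteChunk(BinaryWriter w, string id, byte[] payload)
    {
        w.Write(Encoding.ASCII.GetBytes(id)); // id must be exactly 4 ASCII chars
        WriteVarInt(w, payload.Length);
        w.Write(payload);
    }

    // Variable-length integer: 7 bits per byte, high bit means "more follows".
    public static void WriteVarInt(BinaryWriter w, int value)
    {
        uint v = (uint)value;
        while (v >= 0x80) { w.Write((byte)(v | 0x80)); v >>= 7; }
        w.Write((byte)v);
    }

    public static int ReadVarInt(BinaryReader r)
    {
        int result = 0, shift = 0;
        byte b;
        do { b = r.ReadByte(); result |= (b & 0x7F) << shift; shift += 7; }
        while ((b & 0x80) != 0);
        return result;
    }

    public static void Main()
    {
        using (var w = new BinaryWriter(File.Create("save.dat")))
        {
            w.Write(Magic);
            w.Write(FormatVersion);
            WriteChunk(w, "NAME", Encoding.UTF8.GetBytes("Alice")); // strings as UTF-8
            WriteChunk(w, "LEVL", BitConverter.GetBytes(3));
        }

        using (var r = new BinaryReader(File.OpenRead("save.dat")))
        {
            // A reader that only wants LEVL can skip NAME entirely.
            r.ReadBytes(Magic.Length);
            r.ReadUInt16();
            while (r.BaseStream.Position < r.BaseStream.Length)
            {
                string id = Encoding.ASCII.GetString(r.ReadBytes(4));
                int len = ReadVarInt(r);
                if (id == "LEVL")
                    Console.WriteLine(BitConverter.ToInt32(r.ReadBytes(len), 0));
                else
                    r.BaseStream.Seek(len, SeekOrigin.Current); // skip unwanted chunk
            }
        }
    }
}
```

Because unknown chunk IDs are skipped by length, a version-1 reader can walk a file containing chunks it has never heard of, which is the backwards-compatibility property the list above is after.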
Well, there are times when what you describe can be a very bad approach. This assumes that by ‘serialize’ you mean using a language or framework’s ability to simply take an object and output it directly to some sort of binary stream. The problem is that class structures change over the years. Will you be able to reload a file made in a previous version of your app if all your classes change in a newer one?
For the long-term stability of a file format, I’ve found it better to roll up your sleeves a little now and write your own ‘serializing’/‘streaming’ methods within your classes, i.e., manually handle the writing of values to a stream. Write a header, as you state, that describes the format version, and then the data you want saved, in the order you want it. On the reading side, handling different versions of the file format becomes a lot easier.
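A sketch of such hand-written streaming with a versioned header, assuming a hypothetical SaveData class whose Gold field was added in format version 2 (all names here are invented for illustration):

```csharp
using System;
using System.IO;

public class SaveData
{
    public string PlayerName = "";
    public int Level;
    public int Gold; // added in format version 2

    const int CurrentVersion = 2;

    // Field order is fixed by this method, not by the class layout,
    // so refactoring the class cannot silently change the file format.
    public void Write(BinaryWriter w)
    {
        w.Write(CurrentVersion);
        w.Write(PlayerName);
        w.Write(Level);
        w.Write(Gold);
    }

    public static SaveData Read(BinaryReader r)
    {
        int version = r.ReadInt32();
        return new SaveData
        {
            PlayerName = r.ReadString(),
            Level = r.ReadInt32(),
            // Older files have no Gold field; fall back to a default.
            Gold = version >= 2 ? r.ReadInt32() : 0,
        };
    }
}
```

The version branch in Read is where old files keep loading: each time the format grows, the reader gains one more conditional instead of the whole file becoming unreadable.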
The other option, of course, is XML or JSON. Neither is the greatest for binary-heavy content, but they are simple and human-readable, which is a big plus for long-term viability.
I was simply going to Serialize my objects into binary and create a header that would tell me how to parse the file. Is this a bad approach?
From someone who’s been on the receiving end of someone else doing this …
YES, it’s a Bad Idea.
We had a very old application, written in a now-obsolete technology, that did exactly this: it dumped the object out of memory and wrote it into a file. Easy to code, a nice quick solution for the developers. Two decades and some down the line, when that technology got trashed on security grounds, we were left with thousands of these binary nightmare files lying around, still used by the business, but with no way to edit them.
Picking the file “format” apart and interpreting it into a replacement application was … “Fun”.
I would also love to hear answers to this question from people with years more experience than myself.
I have personally implemented several file formats for my work, and I have moved over to using an XML file format. My requirements and the hardware I interact with change all the time, and there is no telling what I will need to add to the format in the future. One of XML’s primary advantages is that it is semi-structured. For this reason, I generally avoid the automatic XML serialization that .NET provides, because it expects the document to match an exact structure.
My goal was to create an XML format that allows new elements and attributes to be added in the future, and in which the order of the tags does not matter wherever possible. If you are sure that you can load your entire file into memory, then XPath is probably a good choice.
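A small sketch of why XPath suits this kind of tolerant format: a query for an element that is missing, or that has moved, simply returns null rather than failing validation (the project-file structure below is made up):

```csharp
using System;
using System.Xml;

public static class Program
{
    public static void Main()
    {
        // Illustrative project file; element order doesn't matter to the queries below.
        var doc = new XmlDocument();
        doc.LoadXml("<project><settings><grid size=\"16\" /></settings>" +
                    "<assets><asset name=\"tree\" /></assets></project>");

        // Query for a known element; returns the node if present.
        XmlNode grid = doc.SelectSingleNode("/project/settings/grid");
        Console.WriteLine(grid?.Attributes["size"].Value); // 16

        // Query for an element this file doesn't have: null, not an error.
        XmlNode missing = doc.SelectSingleNode("/project/audio");
        Console.WriteLine(missing == null); // True
    }
}
```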
If you are dealing with particularly large files, or for other reasons cannot load the file all at once, then you are probably left with using an XmlReader, scanning for known elements, recursing into those elements with ReadSubtree, and scanning again…
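The streaming scan described above might look like this sketch, which holds only the current element in memory (the element names are illustrative):

```csharp
using System;
using System.IO;
using System.Xml;

public static class Program
{
    public static void Main()
    {
        // Stand-in for a large file on disk; XmlReader.Create also accepts a file path.
        string xml = "<project><asset name=\"tree\" /><asset name=\"rock\" /></project>";
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "asset")
                {
                    // ReadSubtree returns a reader scoped to just this element,
                    // which can be handed to element-specific parsing code.
                    using (var sub = reader.ReadSubtree())
                    {
                        sub.Read();
                        Console.WriteLine(sub.GetAttribute("name"));
                    }
                }
            }
        }
    }
}
```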