Using XML as data storage

I was thinking about the XML format and the following quote:

“XML is not a database. It was never meant to be a database. It is never going to be a database. Relational databases are proven technology with more than 20 years of implementation experience. They are solid, stable, useful products. They are not going away. XML is a very useful technology for moving data between different databases or between databases and other programs. However, it is not itself a database. Don’t use it like one.” -Effective XML: 50 Specific Ways to Improve Your XML by Elliotte Rusty Harold (page 230, Part 4, Item 41, 2nd paragraph)

This seems to really stress that XML should not be used for data storage and should only be used for program to program interoperability.

Personally, I disagree: .NET’s app.config file, which is used to store a program’s settings, is an example of data storage in an XML file. However, for databases (as opposed to configurations, etc.) XML should not be used.

To develop my point, I will use two examples:
A) Data about customers, with fields that are all on one level, i.e. a number of fields all relating to one customer, with no children
B) Data about the configuration of an application, where nested fields and properties make a lot of sense

So my question is: is this still a valid statement, and is it now acceptable to store data using XML?

EDIT: I’ve sent an email to the author of that quote to ask for his input/extra context.


This quote is not about using XML as a storage format in general (for which it is fine, depending on the requirements), but for database-type storage.

When people talk about databases, they usually mean storage systems that store huge quantities of data, often in the gigabyte or terabyte range. A database is potentially much larger than the amount of available RAM on the server that stores it. Since nobody ever needs all the data in a database at once, databases should be optimized for fast retrieval of selective subsets of their data: this is what the SELECT statement is for, and relational databases as well as NoSQL solutions optimize their internal storage format for fast retrieval of such subsets.

XML, however, doesn’t really fit these requirements. Due to its nested tag structure, it is impossible to determine where in the file a certain value is stored (in terms of a byte offset into the file) without walking the entire document tree, at least up to the match. A relational database has indexes, and looking up a value in an index, even with a primitive binary-search implementation, is a single O(log n) lookup; getting to the actual values is then nothing but a file seek (e.g. fseek(data_file_handle, row_index * row_size, SEEK_SET)), which is O(1). In an XML file, the most efficient way is to run a SAX parser over your document, doing an awful lot of reads and seeks before you get to your actual data; you can hardly do any better than O(n), unless you use indexes, but then you’d have to rebuild the entire index for every insertion (see below).
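
To make the contrast concrete, here is a minimal Python sketch, assuming fixed-width binary rows sorted by an integer key. The row layout, the `rows`/`row` element names and the helper names are hypothetical, and real databases use B-trees and pages rather than one flat sorted file. The binary path does a binary search over an in-memory index followed by a single seek; the XML path has to stream through elements until it hits a match.

```python
import bisect
import io
import struct
import xml.etree.ElementTree as ET

# Hypothetical fixed-width row: 4-byte int id + 4-byte int value (little-endian).
ROW = struct.Struct("<ii")

def build_table(rows):
    """Pack (id, value) rows, sorted by id, into an in-memory 'data file'."""
    rows = sorted(rows)
    data = io.BytesIO(b"".join(ROW.pack(i, v) for i, v in rows))
    index = [i for i, _ in rows]              # sorted keys act as the index
    return data, index

def indexed_lookup(data, index, key):
    """O(log n) binary search on the index, then one O(1) seek."""
    pos = bisect.bisect_left(index, key)
    if pos == len(index) or index[pos] != key:
        return None
    data.seek(pos * ROW.size)                 # like fseek(f, row_index * row_size, SEEK_SET)
    _, value = ROW.unpack(data.read(ROW.size))
    return value

def xml_scan_lookup(xml_bytes, key):
    """O(n): stream the document element by element until a match appears."""
    for _, elem in ET.iterparse(io.BytesIO(xml_bytes)):
        if elem.tag == "row" and int(elem.get("id")) == key:
            return int(elem.get("value"))
    return None

rows = [(i, i * i) for i in range(1000)]
data, index = build_table(rows)
xml_bytes = ("<rows>" + "".join(
    f'<row id="{i}" value="{v}"/>' for i, v in rows) + "</rows>").encode()

print(indexed_lookup(data, index, 777))       # 603729
print(xml_scan_lookup(xml_bytes, 777))        # 603729
```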

Inserting is even worse. Relational databases do not guarantee row order, which means they can just append new rows, or overwrite any rows marked as ‘deleted’. This is extremely fast: the DB can just keep a pool of writable locations around; getting an entry from the pool is O(1) unless the pool is empty; in the worst case, the pool is empty and a new page has to be created, but this too is O(1). By contrast, an XML-based database would have to move everything after the insertion point to make room; this is O(n). When indexes come into play, things become even more interesting: typical relational-database indexes can be updated with relatively low complexity, say O(log n); but if you want to index your XML files, every insertion potentially changes the on-disk location of every value that follows it, so you have to rebuild the entire index. This also goes for updates, because updating, say, an element’s text content can change its size, which means all the XML after it has to shift. A relational database doesn’t have to touch the index at all if you update a non-indexed column; an XML database would have to rebuild the entire index for each update that changes the size of the updated XML node.
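
A minimal Python sketch of why a mid-document insert is O(n) for a flat XML file: every byte after the insertion point moves, so any byte offsets an index stored for later elements go stale. The `<customer>` elements and the `insert_after_element` helper are hypothetical, for illustration only.

```python
# Hypothetical flat "table" of customers stored as one XML string.
doc = "<customers>" + "".join(
    f'<customer id="{i}"/>' for i in range(5)) + "</customers>"

def insert_after_element(doc, marker, new_xml):
    """Insert new_xml right after the first occurrence of marker.
    Returns the new document and how many existing bytes had to move."""
    at = doc.index(marker) + len(marker)
    moved = len(doc) - at                     # everything after the insert point shifts
    return doc[:at] + new_xml + doc[at:], moved

# Byte offsets of each element, as an index over the original file might record them.
offsets = {i: doc.index(f'<customer id="{i}"/>') for i in range(5)}

new_doc, moved = insert_after_element(doc, '<customer id="1"/>', '<customer id="99"/>')
stale = [i for i, off in offsets.items()
         if new_doc.index(f'<customer id="{i}"/>') != off]

print(moved)   # bytes that had to be rewritten to make room
print(stale)   # [2, 3, 4]: every index entry after the insert point is now wrong
```

The same shifting happens when an update changes the length of an element's text, which is why the size of the edit, not just its position, matters.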

Those are the most important downsides, but there are more. XML is very verbose, which is good for server-to-server communication, because it adds safety (the receiving server can perform all sorts of integrity checks on the XML, and if anything went wrong in the transfer, the document is unlikely to validate). For mass storage, however, this is a killer: it is not uncommon to have 100% or more overhead for XML data (for things like SOAP messages, overhead ratios in the 1000% range are not unusual), while typical relational DB storage schemes have only a constant overhead for table metadata, plus a tiny bit per row; most of the overhead in relational databases comes from fixed column widths. If you have a terabyte of data, a 500% overhead is simply unacceptable, for many reasons.

XML is lousy for data storage. First, it is very verbose. Data stored in an XML file will take much more disk space than the same data stored in any reasonable database system. In an XML record, the name of a particular field is stored twice, along with the string representation of the data. So, for example, to store a single integer in a field called “foobar”, you end up with this 19-byte string:

<foobar>42</foobar>

On the other hand, a real database will store this as a single integer value, taking 4 bytes. If your database is small, that doesn’t mean much, but if you have 10,000 records, that’s a problem.
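
The arithmetic is easy to check with a short Python sketch. Here `struct.pack("<i", ...)` stands in for a database's 4-byte integer column; that is a simplifying assumption, since real engines also add per-row and per-page overhead.

```python
import struct

xml_field = "<foobar>42</foobar>"
binary_field = struct.pack("<i", 42)          # 4-byte little-endian signed int

print(len(xml_field.encode("utf-8")))         # 19
print(len(binary_field))                      # 4

# Cost of this one field across a hypothetical 10,000 records.
records = 10_000
print(records * len(xml_field), records * len(binary_field))  # 190000 40000
```

Note that the XML size also grows with the number of digits in the value, while the binary column stays at 4 bytes for any 32-bit integer.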

Second, an XML file has to be parsed from text every single time it is read. For the above field, a real database simply reads the binary data into memory from the offset at which it knows it stored the field “foobar”. If the data is stored as XML, the reader has to read the tag “foobar”, parse that text, determine what field it is, then parse the string “42” and convert it into the binary 42.
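
A Python sketch of the two read paths just described. The fixed offset `FOOBAR_OFFSET` and the `<record>` wrapper element are assumptions for illustration; the point is that the binary path is one unpack at a known position, while the XML path has to parse markup, match a tag name, and convert a string.

```python
import struct
import xml.etree.ElementTree as ET

# Binary record: the reader knows 'foobar' lives at a fixed offset (assumed to be 0 here).
FOOBAR_OFFSET = 0
binary_record = struct.pack("<i", 42)
value_from_binary, = struct.unpack_from("<i", binary_record, FOOBAR_OFFSET)

# XML record: parse the markup, find the element by name, then convert text to int.
xml_record = "<record><foobar>42</foobar></record>"
root = ET.fromstring(xml_record)                 # tokenize and build a tree
value_from_xml = int(root.findtext("foobar"))    # locate by tag name, then str -> int

print(value_from_binary, value_from_xml)         # 42 42
```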

Thus the performance penalties for using XML are huge. The benefits of XML are that it is somewhat human-readable and that it allows for easy transfer of data between completely separate systems. Neither of those advantages applies to a local database.

The one exception is configuration files, which are generally small, and generally need to be editable by humans.

An XML database absolutely will be larger and slower than any reasonable SQL system. Unless you can find a counterbalancing advantage in human readability or interoperability, there’s just no point in using it for data storage.


XML is viable depending on the context. If your data is pretty static and does not change much (sample data, for example), then yes, XML is a good fit.

Configuration settings, sample data (even if it’s millions of rows, but rarely changing), are all good uses of XML.

Hard disk reads and writes are expensive, far more so than accessing data from an Oracle/SQL stack.

“This seems to really stress that XML should not be used for data storage and should only be used for program to program interoperability.”

Your premise is flawed.

The paragraph you quote is actually saying that XML is not a replacement for a database, not that it shouldn’t be used for data storage.

It is clear that a settings file is not the same thing as a database, and so different technologies can (and should?) be used.

Correct me if I am wrong, but you seem to have more experience with markup languages than with databases. If you got a bit more experience with databases, you’d realise which domains the two different technologies are suited for.

This is really subjective. That quote is, like, someone’s opinion, man.

Honestly, I think XML is a viable alternative to a database, as it has multiple advantages over an RDBMS, including low overhead, which equals cheaper storage (especially when using a hosting service that charges for databases separately).

Take a look at dasBlog and BlogEngine. Both of those applications use XML for storage by default.

That said, it isn’t an RDBMS, and if you have high volatility (lots of updates, inserts, or deletes) in your data or require high availability, use a database. XML is fine for storing small things like configuration data and low-volatility data.


“My question is, is this still a valid statement and is it now acceptable to store data using XML?”

I see your point in your example about .NET configuration files. However, any other file format could have been used. In fact, in the old days, such settings used to be stored in plain text files called INI files.

I think the statement you quoted is valid and correct if you define a database as a software system.

The definition of XML in XML-Definition states that “(XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.”

This definition focuses on readability and language rather than on mechanisms to manage the data.

Compared to an RDBMS, XML does not provide a means to randomly insert and delete rows in an XML file. For example, if you have 1,000,000 rows and want to delete rows at random, an XML-based file would not be a good choice for a database, even in a single-user environment. Also, XML does not provide any native mechanism for locking data. In fact, since XML is not software, all the ACID (atomicity, consistency, isolation, durability) properties that guarantee that database transactions are processed reliably in a shared environment are left to the developer to build (with the exception of durability).
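
As one example of what the developer ends up building, here is a minimal Python sketch of atomic replacement for a single XML file: write the new document to a temporary file in the same directory, flush it to disk, then `os.replace` it over the old file, so readers see either the old or the new version, never a half-written one. The `save_xml_atomically` name is hypothetical, and this covers atomicity for one file only; it provides no locking, no isolation between concurrent writers, and no multi-file transactions.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def save_xml_atomically(tree: ET.ElementTree, path: str) -> None:
    """Write tree to path so a crash leaves either the old or the new file intact."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tree.write(tmp, encoding="utf-8", xml_declaration=True)
            tmp.flush()
            os.fsync(tmp.fileno())        # make sure the bytes reach the disk first
        os.replace(tmp_path, path)        # atomic rename on POSIX
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)           # don't leave a half-written temp file behind
        raise

# Usage: write a tiny document into a scratch directory and read it back.
root = ET.Element("customers")
ET.SubElement(root, "customer", id="1", name="Alice")
target = os.path.join(tempfile.mkdtemp(), "customers.xml")
save_xml_atomically(ET.ElementTree(root), target)
print(ET.parse(target).getroot()[0].get("name"))  # Alice
```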
XML does not have a robust specification to handle data integrity across XML files, let alone across different servers (e.g. a customer XML file and an orders XML file: there are no foreign keys to enforce integrity).
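
A minimal Python sketch of the foreign-key check an application would have to write by hand for the customers/orders example. The element and attribute names are hypothetical. (XML Schema's `xs:key`/`xs:keyref` can express such a constraint, but only within a single document, not across separate files.)

```python
import xml.etree.ElementTree as ET

# Two hypothetical "tables" kept in separate XML files (inlined here as strings).
customers_xml = """<customers>
  <customer id="1" name="Alice"/>
  <customer id="2" name="Bob"/>
</customers>"""

orders_xml = """<orders>
  <order id="100" customer="1"/>
  <order id="101" customer="3"/>
</orders>"""

def dangling_orders(customers_xml: str, orders_xml: str) -> list:
    """Return ids of orders whose customer attribute matches no customer id.
    Nothing in the XML files prevents such rows; the check is application code."""
    customer_ids = {c.get("id") for c in ET.fromstring(customers_xml).iter("customer")}
    return [o.get("id") for o in ET.fromstring(orders_xml).iter("order")
            if o.get("customer") not in customer_ids]

print(dangling_orders(customers_xml, orders_xml))  # ['101']
```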

The above is not an exhaustive list of what XML lacks; rather, it serves as a quick justification of the statement that XML is not database software.

XML was never meant to be a database or to replace one.

XML was mainly designed for Web documents and allows for the creation of customized tags for individual information fields. However, you would never achieve relational, centralized data management with it.

Why would you actually want to use XML for storing data in the first place? I mean, it’s a language after all…

While one could argue that it’s a flexible and easy-to-understand format, that only applies when you have to edit the files manually. When you actually interact with the database through a common interface (fetch data X which meets requirements Y and Z, store/update data X, …), those advantages become void.


Short answer:
It depends.

Long answer:
From my point of view, this strongly depends on the amount of data you want to store. E.g. if you have a couple of objects in your application at runtime and you want to store them after running the tool, an XML file is perfectly fine. However, if your webshop has 5000 customers and even more orders, a database would be a more appropriate data store.
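
For the small case, a Python sketch of saving a handful of runtime objects to XML and loading them back with the standard library's `xml.etree.ElementTree`. The `Item` class and its fields are hypothetical. At this scale the whole document is simply rebuilt and re-parsed each time, which is cheap.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Item:                                   # hypothetical runtime object
    name: str
    quantity: int

def to_xml(items: list) -> str:
    """Serialize all items into one <items> document."""
    root = ET.Element("items")
    for it in items:
        ET.SubElement(root, "item", name=it.name, quantity=str(it.quantity))
    return ET.tostring(root, encoding="unicode")

def from_xml(text: str) -> list:
    """Rebuild Item objects from the document produced by to_xml."""
    return [Item(e.get("name"), int(e.get("quantity")))
            for e in ET.fromstring(text).iter("item")]

saved = to_xml([Item("widget", 3), Item("gadget", 5)])
print(saved)
print(from_xml(saved))
```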

Additionally, I think storing settings in a database rather than in a file like app.config is in most cases not very useful, but I don’t think this example proves the quote wrong.

XML is an excellent choice for configuration settings. Not only are XML files easy to parse/highlight in an IDE, they’re very easy for non-programmers to edit. I find them incredibly useful in web development scenarios where maintenance tasks are being performed by designers and content managers.
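
To show how little code such a file needs, here is a Python sketch that reads key/value pairs laid out like the `<appSettings>` section of a .NET app.config (which .NET itself exposes through `ConfigurationManager.AppSettings`). The keys and values are hypothetical, and the XML declaration is omitted to keep the inline string simple.

```python
import xml.etree.ElementTree as ET

# A settings file in the shape of a .NET app.config <appSettings> section.
# The keys and values are hypothetical.
APP_CONFIG = """<configuration>
  <appSettings>
    <add key="SiteTitle" value="My Site" />
    <add key="PageSize" value="20" />
  </appSettings>
</configuration>"""

def read_app_settings(xml_text: str) -> dict:
    """Collect key/value pairs from configuration/appSettings/add elements."""
    root = ET.fromstring(xml_text)
    return {add.get("key"): add.get("value")
            for add in root.findall("./appSettings/add")}

settings = read_app_settings(APP_CONFIG)
print(settings["SiteTitle"], int(settings["PageSize"]))
```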

XML should typically not be used as a primary data source for any non-trivial applications. The serialization/deserialization overhead alone begs for a different solution.

The term database can refer to either the raw data only, or the database management system as well. This definition makes a big difference in the entire argument.

If we use the RDBMS definition, then XML has very little in that sense. You get very little in terms of ACID guarantees (you’d have to write your own code to accomplish those). If you need those (and most transactional systems do), you are already in major trouble. I could give a list of hundreds of features which are taken for granted with RDBMSes, which you’d have to reinvent and reimplement. Think security models, replication, backups, just to name a few basic ones.

In the above sense, no, XML is not a database, and you shouldn’t try to use it as one.

If we use the “raw data” definition, XML fares a lot better, but still not that great. As others have pointed out though, it is hugely verbose in general, typically lacking binary encoding, and having duplicate tags, etc. These are trade-offs made so that XML can be human-readable – basically, efficiency is the enemy of this requirement. XML is also not a particularly good fit for even the simplest situations where you are inserting records continuously. Assuming you want your XML file to be valid, you need a single closing tag, which means that appending a record means you need to shift up the tags at the end. This is pretty expensive (how do we know where that tag begins? what if there are multiple “tables”, do we just move up the entire file?), and if you want to work around it, you’ll reinvent a similar approach to many databases – spreading out tables over multiple files, and dynamically growing those files as needed.
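
A minimal Python sketch of the workaround for the simplest case, one root element per file with nothing after its closing tag: seek back over the closing tag, overwrite it with the new record, and write the tag again. The `</log>` root and the `append_record` helper are hypothetical. As soon as a second "table" lives in the same file, everything after the first table's closing tag would have to move instead.

```python
import os
import tempfile

CLOSING = b"</log>"

def append_record(path: str, record: bytes) -> None:
    """Append record inside the root element of a single-table XML file
    by overwriting the closing tag and writing it again after the record."""
    with open(path, "r+b") as f:
        f.seek(-len(CLOSING), os.SEEK_END)    # assumes the file ends exactly with CLOSING
        if f.read() != CLOSING:
            raise ValueError("file does not end with the expected closing tag")
        f.seek(-len(CLOSING), os.SEEK_END)
        f.write(record + CLOSING)             # re-emit the closing tag after the new record

path = os.path.join(tempfile.mkdtemp(), "log.xml")
with open(path, "wb") as f:
    f.write(b"<log></log>")
append_record(path, b'<entry n="1"/>')
append_record(path, b'<entry n="2"/>')
with open(path, "rb") as f:
    print(f.read())
```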

There are situations where XML is appropriate – config files are a great example, because they are typically small and human readability is an excellent feature to have. Having a database just for a config file may be overkill.

Databases, on the other hand, are excellent when you have thousands (or millions / billions) of records, and have many users concurrently updating them. So yes, XML is not a database, and you shouldn’t use it like one. Your example happens to be one of those situations where you didn’t need a DB in the first place, and XML is the better fit.

The way I see it is this: if you use XML as a DB (say, as a backing store for a transactional system), you will end up reinventing and rewriting an RDBMS. That’s a really poor way to spend your time and energy. I think this is what that quote was saying as well.

I agree that it’s not a relational database. I think the author is simply saying in the quote not to use it as one.

Having said that, you may or may not need one. If you don’t really need to do much querying on the data, and only intend to store it and then fetch it later based on some limited query criteria, then you need XML DOCUMENT storage and retrieval, not a relational database.

There are plenty of applications which simply need to store a document with data in it for retrieval as a whole later. If this is the case, then it’s useless to create an SQL-based schema, parse the XML, and then serialize it to the database only to do just the reverse later. There is potentially a lot of code overhead involved in doing that, though less if you do it right.
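
A minimal Python sketch of that document-storage pattern, assuming SQLite (not mentioned in the answer) as the store: each XML document is saved and fetched whole by a key, with no relational schema for its contents. The `documents` table and the `put`/`get` helpers are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Optional

# Store whole XML documents by key instead of shredding them into relational tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (doc_id TEXT PRIMARY KEY, body TEXT NOT NULL)")

def put(doc_id: str, root: ET.Element) -> None:
    """Serialize the document and store (or overwrite) it under doc_id."""
    body = ET.tostring(root, encoding="unicode")
    conn.execute("INSERT OR REPLACE INTO documents (doc_id, body) VALUES (?, ?)",
                 (doc_id, body))
    conn.commit()

def get(doc_id: str) -> Optional[ET.Element]:
    """Fetch the whole document back, or None if the key is unknown."""
    row = conn.execute("SELECT body FROM documents WHERE doc_id = ?",
                       (doc_id,)).fetchone()
    return ET.fromstring(row[0]) if row else None

invoice = ET.Element("invoice", number="A-17")
ET.SubElement(invoice, "line", sku="X1", qty="2")
put("invoice:A-17", invoice)
print(get("invoice:A-17").find("line").get("qty"))
```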

You can use ORM tools like Hibernate and tools like Apache Axis to autogenerate practically all the code you’d need to build a service which just handles simple CRU operations. You’d have to wrap that in authentication, of course, and might want to segregate the data based on the user, level of access, etc. You may even want to limit which operations a given user is allowed to perform, via a SOAP service for example.

In this sense you’re doing more like content management than anything else.
