What would be a good use case for a native XML database such as Apache Xindice or eXist-db? I have used the XML features of SQL Server in the past and they were of great value, but there it is possible to use XML for 5% of the application and traditional storage for the other 95%. Which types of application would benefit from 100% XML storage?
Suppose you have a program or system of programs that produces output in the form of XML documents, and the data in those documents does not fit well into, for example, a relational data model. Then an XML database may be the best tool for storing the documents and making them queryable.
This is not a hypothetical use case – our team develops a product where such XML documents are produced. Until now we did not use a database, only folders with collections of XML documents, and an increasing number of tools for batch processing the documents on a per-file basis. But I think the more tools we get, the more it will make sense to switch to a database.
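To illustrate the "folders plus per-file tools" approach described above, here is a minimal sketch in Python using only the standard library. The function name `query_folder`, the sample documents, and the element names are all assumptions made up for the example, not part of any real product. It shows the main cost of the approach: every query re-parses every file, which is the kind of work an XML database would take over by indexing the documents.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def query_folder(folder, xpath):
    """Return (file name, element text) for every match of xpath in each *.xml file."""
    results = []
    for path in sorted(Path(folder).glob("*.xml")):
        root = ET.parse(path).getroot()  # every file is re-parsed on every query
        for element in root.findall(xpath):
            results.append((path.name, element.text))
    return results


def demo():
    # Hypothetical sample documents, written to a temporary folder.
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "a.xml").write_text("<report><status>ok</status></report>")
        Path(folder, "b.xml").write_text(
            "<report><status>failed</status><retry>3</retry></report>"
        )
        return query_folder(folder, "./status")
```

Each new question about the documents tends to become another tool of this shape, which is roughly the point at which a database with a query language starts to pay off.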
The only use case that comes to mind is a small blog, but I’ve seen more than one of those converted to a database when they found out that using XML for storage doesn’t scale.
XML is often confused with databases by those unfamiliar with it. XML is designed to be a data exchange format, not a storage medium.
That said, the Apache Xindice project states the case for an XML database fairly eloquently. They say:
The benefit of a native solution is that you don’t have to worry about
mapping your XML to some other data structure. You just insert the
data as XML and retrieve it as XML. You also gain a lot of flexibility
through the semi-structured nature of XML and the schema independent
model used by Xindice. This is especially valuable when you have very
complex XML structures that would be difficult or impossible to map to
a more structured database.
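The "insert as XML, retrieve as XML" idea in that quote can be sketched with a toy in-memory collection. To be clear, this is not the Xindice API; `ToyCollection` and its methods are hypothetical names, and the sketch only uses Python's standard `xml.etree.ElementTree`. The point is that documents with different shapes go into the same collection with no schema and no mapping step.

```python
import xml.etree.ElementTree as ET


class ToyCollection:
    """Hypothetical in-memory stand-in for an XML collection (not the Xindice API)."""

    def __init__(self):
        self._docs = {}

    def insert(self, key, xml_text):
        # Parsing only checks well-formedness; there is no schema and no table mapping.
        self._docs[key] = ET.fromstring(xml_text)

    def retrieve(self, key):
        # The document comes back as XML text.
        return ET.tostring(self._docs[key], encoding="unicode")

    def query(self, xpath):
        # Keys of all documents in which the path matches at least one element.
        return sorted(
            key for key, root in self._docs.items() if root.find(xpath) is not None
        )


collection = ToyCollection()
collection.insert("order-1", '<order><item sku="A1"/><note>gift</note></order>')
collection.insert("order-2", '<order><item sku="B7"/></order>')
```

Here `collection.query("./note")` finds only the order that happens to have a note, even though nothing declared in advance that orders may carry one.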
The application characteristics that make an XML format attractive for storage (compared with an RDBMS) are:
- Processing of individual “root” items in isolation (as opposed to processing groups of individuals in a batch)
- A small number of individuals (“n=1”), or significant variability across individuals with respect to the presence or number of optional attributes (possibly extending to wide use of custom attributes).
- Highly restricted “ownership” of data, such that competition for locks to update individuals does not occur.
By “individual” here I do not mean (only) people, but whatever entity is central to the business logic of the application. E.g. an “individual” might be a residential property in a real estate sales application, or a calendar date in a day-planner application.
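The variability point can be made concrete with a small sketch based on the real estate example. The listings, element names, and the helper `optional_fields` are invented for illustration. Each property is processed in isolation and carries its own set of optional elements, something a relational schema would need sparse columns or extra tables to represent.

```python
import xml.etree.ElementTree as ET

# Hypothetical listings: each property carries a different set of optional elements.
LISTINGS = """
<listings>
  <property id="p1"><price>250000</price><pool heated="yes"/></property>
  <property id="p2"><price>180000</price><garage spaces="2"/><solar kw="4.5"/></property>
  <property id="p3"><price>320000</price></property>
</listings>
"""


def optional_fields(xml_text, required=("price",)):
    """Map each property id to the sorted names of its non-required child elements."""
    root = ET.fromstring(xml_text)
    return {
        prop.get("id"): sorted(
            child.tag for child in prop if child.tag not in required
        )
        for prop in root.findall("property")
    }
```

Calling `optional_fields(LISTINGS)` reports which extras each property has, with no up-front declaration of what extras may exist.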