As per this link, this is the XML that maintains “BookStore” data.
I see that the number of occurrences of author varies from book to book.
XML’s hierarchical representation of BookStore looks more meaningful and intuitive to me for visualising the data than relational table tuples. Mapping such hierarchical information to tables is more difficult (it feels like a tweak).
It seems strange that real-world hierarchical data (for example, SNMP MIBs of network devices, or the BookStore) is mapped to tables of records in so much commercial software. To me, converting such hierarchical data to tables is unintuitive (and a rather unnecessary skill), even though a relational DB (like MSSQL) offers large-scale, safe, multi-user, convenient, efficient and reliable storage that an XML file does not.
So we are trying to fit hierarchical, ordered data into table tuples, which is an overhead. Is there any commercial database that companies use to maintain a schema in hierarchical fashion?
Note: I am currently taking a database course.
2
If you choose to maintain a hierarchy in a relational database, you need to look into the Nested Set design pattern. (See Wikipedia)
This model involves some programming, and it involves some overhead at insert or update time. The benefit comes at retrieval time. Retrieving the path or the subtree for any given node is easy and fast, when compared to the traditional approach, called Adjacency List.
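To make the retrieval benefit concrete, here is a minimal sketch of a nested-set table using Python's built-in sqlite3 module. The table name `category`, the columns `lft` and `rgt`, and the sample tree are assumptions made up for illustration; they are not taken from the linked BookStore data.

```python
import sqlite3

# Hypothetical tree, numbered by a depth-first walk:
# Books(1,10) > Databases(2,7) > Relational(3,4), Document(5,6); Networking(8,9)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [
        ("Books", 1, 10),
        ("Databases", 2, 7),
        ("Relational", 3, 4),
        ("Document", 5, 6),
        ("Networking", 8, 9),
    ],
)


def subtree(name):
    """Return the node and all its descendants with one non-recursive query."""
    rows = conn.execute(
        """SELECT child.name
           FROM category AS parent
           JOIN category AS child
             ON child.lft BETWEEN parent.lft AND parent.rgt
           WHERE parent.name = ?
           ORDER BY child.lft""",
        (name,),
    )
    return [r[0] for r in rows]


def path(name):
    """Return the path from the root down to the node."""
    rows = conn.execute(
        """SELECT ancestor.name
           FROM category AS node
           JOIN category AS ancestor
             ON node.lft BETWEEN ancestor.lft AND ancestor.rgt
           WHERE node.name = ?
           ORDER BY ancestor.lft""",
        (name,),
    )
    return [r[0] for r in rows]


print(subtree("Databases"))
print(path("Document"))
```

Both lookups are a single range comparison on `lft` and `rgt`. The cost is paid on writes: inserting a node means renumbering every node to its right, which is the insert/update overhead mentioned above.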
1
Hierarchical databases used to be very popular, but they went out of fashion in the 80s because, I believe, they are not very good at supporting ad hoc querying and could be difficult to set up. The best known standard from that era was the CODASYL data model (see http://en.m.wikipedia.org/wiki/CODASYL for details of this), which was closely tied to the COBOL language; strictly speaking it is a network model, a generalisation of the hierarchical one. Commercial implementations are still available, but I don’t think anyone does new work in them any more.
A more modern equivalent is the document-store database, of which MongoDB is the most popular. MongoDB stores and queries documents in BSON, a data format roughly equivalent to JSON in capability. This means that while not all XML can be easily mapped to it, a lot can, including the examples you link. See http://bsonspec.org/ for more detail about BSON.
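As a rough illustration of how a bookstore with a varying number of authors per book fits a JSON-like document, here is a small sketch using only Python's standard json module. It does not talk to MongoDB; the titles, author names, prices and the helper `titles_by` are hypothetical placeholders.

```python
import json

# Each book is one document; "authors" is simply a list of any length.
bookstore = [
    {"title": "Book One", "price": 30.0, "authors": ["Author A"]},
    {"title": "Book Two", "price": 45.5,
     "authors": ["Author A", "Author B", "Author C"]},
]


def titles_by(author, books):
    """Return the titles of all books that list the given author."""
    return [b["title"] for b in books if author in b["authors"]]


# Round-trip through JSON text, roughly what a document store persists.
serialized = json.dumps(bookstore, indent=2)
restored = json.loads(serialized)

print(serialized)
print(titles_by("Author A", restored))
```

Note that the author names are repeated inside each book document. That duplication is the usual trade-off of the document approach compared with a relational design.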
3
Some (much?) data, such as the book example, isn’t inherently hierarchical. That the bookstore example uses hierarchical storage is a consequence of XML’s tree structure, not the inherent structure of the data. Consider that a book can have many authors, and an author can write many books, which means neither can strictly belong to the other. XML gets around this by using identity attributes by which one node can reference another; the same technique is used in other serializations of circular data structures.
A full relational model can deal with truly hierarchical, homological data using closure properties; specifically, the transitive closure of a parent-child relationship allows tree paths to be retrieved. The real problem is that SQL and most production RDBMSs don’t support closure properties in general. Transitive closures are available in SQL through recursive Common Table Expressions (the WITH RECURSIVE clause), but these are relatively new in implementation, aren’t supported by all RDBMSs, and don’t seem to get used as much. More typically, you see the full path stored in the table (Farey Fractions can be considered paths using their decimal expansions and special markers for repeated trailing digits, similar to quote notation).
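For readers who have not seen a recursive CTE, the following is a minimal sketch of computing the transitive closure of a parent-child relationship, run against SQLite through Python's sqlite3 module. The table `node`, its columns and the sample rows are assumptions for illustration, and the exact syntax may differ slightly on other RDBMSs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node ("
    "id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER REFERENCES node(id))"
)
# Adjacency list: each row only knows its direct parent.
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [
        (1, "Books", None),
        (2, "Databases", 1),
        (3, "Relational", 2),
        (4, "Networking", 1),
    ],
)


def descendants(root_id):
    """Return (name, depth) for the root and everything reachable below it."""
    rows = conn.execute(
        """WITH RECURSIVE sub(id, name, depth) AS (
               SELECT id, name, 0 FROM node WHERE id = ?
               UNION ALL
               SELECT n.id, n.name, sub.depth + 1
               FROM node AS n JOIN sub ON n.parent_id = sub.id
           )
           SELECT name, depth FROM sub ORDER BY depth, name""",
        (root_id,),
    )
    return rows.fetchall()


print(descendants(1))
```

The anchor query selects the starting row, and the recursive part repeatedly joins children onto the rows found so far until no new rows appear.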
Another data model you used to see often was the network model, where nodes are datums and edges are relationships. In the book model, there’d be an edge from each author node to each book node for a book that the author wrote. The database reports & specifications created by CODASYL used a network model. There are various issues with using a network model that the relational model specifically addresses; Codd’s seminal paper has more.
If the relational model seems non-intuitive, perhaps it’s because you have yet to grok the relational model. Don’t think of it as tables and rows (which are more what you find in a spreadsheet) but as declarative relationships:
Hector Garcia-Molina wrote "A First Course in Database Systems"
Jeffrey Ullman wrote "A First Course in Database Systems"
"Database Systems: The Complete Book" is a book with ISBN '0-13-815504-6' and price $85
...
From there, you write the statements using predicates:
Wrote(Hector Garcia-Molina, "A First Course in Database Systems")
Wrote(Jeffrey Ullman, "A First Course in Database Systems")
Book("Database Systems: The Complete Book", ISBN:0-13-815504-6, price:$85)
...
Note that the Book example isn’t a simple predicate because two of the datums are tagged with names; this is part of what distinguishes a relationship from predicates, relations and other similar mathematical objects. Predicates define sets, so you can use set operations to define new relationships. This brief overview is very informal and imprecise, but should give you a starting point.
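The idea that predicates define sets, and that set operations yield new relationships, can be sketched with plain Python sets of tuples. This is only an informal illustration, not how an RDBMS is implemented; the data and the function names `join_wrote_book` and `authors_of_books_over` are hypothetical.

```python
# Each relationship is the set of tuples for which its predicate is true.
wrote = {
    ("Author A", "Book One"),
    ("Author A", "Book Two"),
    ("Author B", "Book Two"),
}
book = {
    ("Book One", "isbn-1", 30),
    ("Book Two", "isbn-2", 45),
}


def join_wrote_book(wrote, book):
    """Join on title: (author, title, price) for every matching pair."""
    return {
        (author, title, price)
        for (author, title) in wrote
        for (book_title, _isbn, price) in book
        if title == book_title
    }


def authors_of_books_over(wrote, book, min_price):
    """Select rows above a price, then project onto the author column."""
    return {
        author
        for (author, _title, price) in join_wrote_book(wrote, book)
        if price > min_price
    }


print(sorted(join_wrote_book(wrote, book)))
print(sorted(authors_of_books_over(wrote, book, 40)))
```

Because a book with several authors is just several tuples in `wrote`, the many-to-many case needs no special structure.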
2