I am a very solid Relational Database guy and understand all the way to 3rd normal form, appreciate the algebraic set theory roots of SQL, and can probably relationalize a broken heart (or not).
I haven’t figured out a relational database structure FOR date nights with my wife, but I HAVE thought about relational database projects ON date nights with my wife.
Now I’m hearing about NoSQL, and researching it. Cutting to the chase, is there anything about NoSQL that is groundbreaking, mathematically novel, or a “hey you don’t even really need to organize your data relationally, this is so much easier” type of approach?
Is NoSQL like a super shell to the data structure? In my mind, data must ultimately have structure to be retrieved and the retrieval must be defined in a language of some sort.
9
NoSQL is more evolutionary than revolutionary. It essentially combines the existing ideas of “external database storage” with “using familiar data structures, not relational tables.”
There are more types of databases than relational, for example hierarchical databases. While archaic by today’s standards, they meshed really well with the data structures of the programs that used them (e.g. COBOL records). The point is, the data in the database was modeled closely on how records were laid out in the programming languages that used it.
Fast forward to the invention of relational databases, where finally the database separated concerns and, when properly normalized, is a great way to visualize most types of data and relationships between data. It is really easy to understand compared to other types of databases. What it utterly fails at, however, is storing data in a way that mirrors objects and classes in a program. Hence, the invention of object-relational mapping. In other words, the design of the database is actually a hindrance to the design of the program that uses it, which is why we need ORM libraries such as Hibernate. While clean and consistent, there is always that nagging doubt in the back of my mind that something is not quite right there.
This gave rise to two more types of databases, object databases and NoSQL.
Both attempt to solve the issues introduced by relational databases while not exposing us to the mind-bending horrors of hierarchical databases. Data is still laid out in repositories that vaguely resemble tables, but in actuality are more like programming data structures than relational tables. While object databases follow mostly well-defined rules, my understanding is that NoSQL is rather arbitrary. For example, a table might be visualized as a hash table or an array. There is not an easy, well-defined way to query them using an arbitrary tool analogous to Oracle SQL Developer or SQL Server Management Studio.
The idea is that one may define data structures that are easily searched in code, rather than piecing together SQL queries that are shaped more by what the SQL database engine handles well than by the query one actually wants to express. For example, fuzzy or partial matches are more difficult and perform worse in a relational database, while a NoSQL database may have a structure that is optimized for such a search and completes in a fraction of the time.
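To make the partial-match point concrete, here is a minimal sketch of a "table" kept as a hash table plus a trigram index tuned for substring search. The names `TrigramIndex`, `put`, and `search` are made up for illustration and do not correspond to any particular NoSQL product; real engines use far more elaborate structures.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of 3-character fragments of text, lowercased."""
    lowered = text.lower()
    return {lowered[i:i + 3] for i in range(len(lowered) - 2)}


class TrigramIndex:
    """Hypothetical store: a hash table of records plus a trigram lookup."""

    def __init__(self):
        self._docs = {}                  # key -> original text
        self._index = defaultdict(set)   # trigram -> keys containing it

    def put(self, key, text):
        self._docs[key] = text
        for gram in trigrams(text):
            self._index[gram].add(key)

    def search(self, fragment):
        """Return sorted keys whose text contains fragment (case-insensitive)."""
        grams = trigrams(fragment)
        if grams:
            # Only records sharing every trigram of the fragment can match.
            candidates = set.intersection(
                *(self._index.get(gram, set()) for gram in grams)
            )
        else:
            # Fragment shorter than 3 characters: fall back to a full scan.
            candidates = set(self._docs)
        needle = fragment.lower()
        return sorted(k for k in candidates if needle in self._docs[k].lower())


store = TrigramIndex()
store.put("a", "Relational database")
store.put("b", "Document store")
store.put("c", "Graph database")
matches = store.search("datab")
```

The search narrows the candidates through the index before confirming each one, instead of scanning every record the way an unindexed `LIKE '%datab%'` would.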
There are languages for querying NoSQL. However, there is no universal language such as what SQL is for relational databases.
Late Edit:
While I am familiar enough with NoSQL databases, this question was the impetus for me to buy a quality book on the topic and to start reading it with the eventual goal of being a real expert on the topic. The remaining comments are based on NoSQL Distilled: a Brief Guide to the Emerging World of Polyglot Persistence by Pramod Sadalage and Martin Fowler.
The authors state that relational databases do not scale well to clusters capable of serving the data needed for sites such as Amazon and Google. NoSQL was developed to fit this niche, relaxing the consistency and durability guarantees of ACID in order to serve a large number of queries that mostly read static data (hence, ACID transactions are not as important).
Furthermore, they posit that NoSQL databases operate without a schema (page 10) which allows NoSQL databases to modify the structure of data more easily. I am not sure that the presence or absence of a formal schema matters in this regard, since SQL databases allow modifying schemas as well. Regardless, the two renowned authors make the claim so it is worth examining.
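As a rough illustration of the schema question, the sketch below adds an `email` field in two ways: with DDL against an in-memory SQLite table, and by simply storing a differently shaped record in a plain dictionary that stands in for a schemaless store. The table, the field names, and the `documents` dictionary are all invented for the example.

```python
import sqlite3

# Relational side: the structure is declared up front and changed with DDL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ann')")
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
conn.execute(
    "INSERT INTO customer (id, name, email) VALUES (2, 'Bob', 'bob@example.com')"
)
rows = conn.execute(
    "SELECT id, name, email FROM customer ORDER BY id"
).fetchall()

# Schemaless side (a dict standing in for a document store): each record
# carries its own shape, so no migration step is needed.
documents = {}
documents[1] = {"name": "Ann"}
documents[2] = {"name": "Bob", "email": "bob@example.com"}

# The reading code now has to cope with the field being absent.
emails = {key: doc.get("email") for key, doc in documents.items()}
```

Both versions end up with the same data. The difference is where the knowledge of the structure lives: in the database for the SQL version, and in the code that reads the records for the schemaless one.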
I believe that both of these main points serve only to reinforce my primary point that NoSQL is evolutionary, not revolutionary. NoSQL databases still store data, and make incremental improvements to scale and modifiability. The authors also make the point that NoSQL does not seek to usurp relational databases as the king of data storage, only to provide an alternative means of data storage for the types of data that need to scale and morph in a way that (they believe) relational databases do not support well enough.
12
I think you’d definitely like to look at this paper by Erik Meijer & Gavin Bierman, titled “Contrary to popular belief, SQL and NoSQL are really just two sides of the same coin”. In short, it claims that mathematically speaking both approaches are based on the same theory, but with some differences.
A couple of interesting differences, in my opinion, are the following: the direction of cross-type dependencies (FKs in SQL) is opposite in SQL and NoSQL, and the collection type is not limited to sets in NoSQL (so some set-theoretical operations might no longer apply in the NoSQL world, while others remain valid). Yet another interesting point from the article is the single query language proposed for querying both SQL and NoSQL databases. It’s called LINQ, and if you think you might’ve heard this name before, you’re correct: that’s Microsoft’s query language from C#.
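A small sketch of the "direction of dependency" difference, using plain Python structures rather than any real database. The order and order-line data and the helper names are invented for the example.

```python
# Relational style: child rows point at their parent through a foreign key.
orders = [{"id": 1, "customer": "Ann"}]
order_lines = [
    {"order_id": 1, "product": "tea", "qty": 2},
    {"order_id": 1, "product": "mug", "qty": 1},
]


def lines_for_order_relational(order_id):
    """Find the children by searching for rows that reference the parent."""
    return [line for line in order_lines if line["order_id"] == order_id]


# Aggregate style: the parent holds its children directly, and the
# collection is a list, so order and duplicates are preserved.
order_doc = {
    "id": 1,
    "customer": "Ann",
    "lines": [
        {"product": "tea", "qty": 2},
        {"product": "mug", "qty": 1},
    ],
}


def lines_for_order_aggregate(doc):
    """Follow the reference from parent to children; no search is needed."""
    return doc["lines"]


relational_products = [l["product"] for l in lines_for_order_relational(1)]
aggregate_products = [l["product"] for l in lines_for_order_aggregate(order_doc)]
```

Both shapes answer the same question. The relational one navigates from child to parent, the aggregate one from parent to child.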
2
Snowman’s answer correctly describes how SQL and NoSQL differ in their data structures and how these are accessed. However, a probably even more important difference lies in their respective problem domains.
NoSQL is not a successor of SQL. Rather, the various branches of NoSQL sacrifice some qualities of SQL in order to be better at others. The CAP theorem states that it is impossible for any distributed database system to guarantee all three of the following properties at the same time:
- Consistency
- Availability
- Partition tolerance
Thus, some NoSQL variants follow the BASE principle instead, which relaxes the always-full-consistency constraint of ACID, the basis for classical SQL databases. By giving up some consistency guarantees, they gain the possibility of combining high availability and partition tolerance in widely distributed systems, such as websites with large amounts of data and user queries but little demand for perfect consistency. Thus, such NoSQL databases are at the heart of Google, Facebook, and Amazon. So, to answer your question: Yes, NoSQL is groundbreaking in that it pretty much enables such massive web services.
This is only one example, as NoSQL is a diverse field, and its variants cover pretty much all possible combinations of parameters within the CAP triangle.
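The trade described above can be sketched with a toy pair of replicas that stay available for reads and writes while disconnected and only converge later. This is a simplified last-write-wins model with invented names (`Replica`, `anti_entropy`), not how any specific product implements it.

```python
class Replica:
    """Toy replica: stores the highest-versioned value seen for each key."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange entries in both directions so the replicas converge."""
    for key in set(a.data) | set(b.data):
        for src, dst in ((a, b), (b, a)):
            if key in src.data:
                version, value = src.data[key]
                dst.write(key, value, version)


east, west = Replica(), Replica()
east.write("cart:42", ["tea"], version=1)
anti_entropy(east, west)

# Network partition: east accepts a write that west cannot see yet.
east.write("cart:42", ["tea", "mug"], version=2)
stale = west.read("cart:42")   # west still answers, but with old data

# Partition heals and the replicas exchange state.
anti_entropy(east, west)
fresh = west.read("cart:42")
```

During the partition both replicas keep serving requests, at the cost of `west` briefly returning an outdated cart.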
2
Common use cases of NoSQL are groundbreaking in their productivity gains as compared to common SQL-based databases. There are several factors in this.
One is housekeeping. Most NoSQL databases are open source, can be installed on a workstation or a VM with a few commands, and work with reasonable defaults out of the box. In my experience, even Postgres and MySQL are not that way; configuration is usually necessary to get started even on a workstation for development purposes.
Another is convenient development, as other answers have described in detail. The JSON indexing capabilities of Mongo, or the key/value semantics of Redis and Riak, may be all that certain webapps need to get the job done, and the APIs are easy. Some NoSQL databases provide their own RESTful APIs, whereas with SQL you typically have to write them yourself.
These factors make NoSQL databases attractive for small-scale projects. The pick-up time tends to be low. Sure, when you go to production you have to configure for security and scale, but the ability to start coding and collaborating quickly is powerful and, I argue, groundbreaking.
Also, related to the above, for small-scale applications (like company-internal or app-to-app services), a team may be able to stand up a production NoSQL database without involving their DBA teams, and without suffering performance or integrity issues as a result. Professional DBAs may not like this, but developers who see DBAs as a source of obstruction (right or wrong) sometimes see NoSQL as a way to bypass having to deal with them. I confess to this – I once changed a small-scale application from Postgres to SQLite to cut out the adversarial DBA, and I have chosen to implement on Mongo rather than Oracle to avoid DBA approval processes and access restrictions. With no adverse consequences in either case.