I recently team-taught an all-day course at my company on “Introduction to SQL and Relational Databases.” It was a great experience and I had a lot of fun — feedback so far has been good :).
We were talking about all sorts of SQL basics, like:
1. DML vs. DDL vs. DCL vs. Other (where Other are things like vendor-specific commands to start and stop the database)
2. Datatypes in SQL
3. WHERE clause and permutations thereof
4. Introduction to Joins, basic INNER, LEFT OUTER, RIGHT OUTER, etc.
5. I gave a quick demonstration of what “normalization” means and how it can be useful to avoid problems like data inconsistency and duplication. I did not actually teach how to normalize and model data, however, because that is a whole involved other discussion that we could talk about for days on its own.
Some questions came up that were fairly involved that I could not answer during the course. one of those questions was “What is JSON and why do we/I see it in some databases?”
JSON is a method to represent data in a text file. It is a kind of data serialization . Data serialization is a tool when you are working in a system and want to export data and send it to another system to import. The “old-school” method of data serialization is “flat text” files, like comma-separated or tab-delimited files. CSV or tab-delimited files can be useful but are problematic because if you don’t know what the fields mean in them beforehand and they also don’t allow any representation of hierarchical relationships in the file.
So then people dreamed up the idea of XML, which actually has tags in the file to represent the data and also allows hierarchies to be represented.It also has a bunch of other advantages, like allowing definitions of schemas to check variable types. Simple XML example:
<?xml version="1.0"?> <catalog> <book id="bk101"> <author>Gambardella, Matthew</author> <title>XML Developer's Guide</title> <genre>Computer</genre> <price>44.95</price> <publish_date>2000-10-01</publish_date> <description>An in-depth look at creating applications with XML.</description> </book> <book id="bk102"> <author>Ralls, Kim</author> <title>Midnight Rain</title> <genre>Fantasy</genre> <price>5.95</price> <publish_date>2000-12-16</publish_date> <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description> </book> </catalog>
So by “hierarchy” I mean how in this example we could have authors in books in a catalog. You can’t really do this accurately in this way in a flat-text file.
XML has problems though, like verbosity and complexity, as well as taking up a lot of space in a file to describe a given amount of data vs. comma-delimited or tab-delimited files. In the meantime, the IT community had been working on other data serialization formats, and came up with JSON.
Now — why does JSON sometimes show up in a database when it is a data serialization format and generally intended for sending between systems?
1. That anything that makes development faster and more efficient is useful and we should move in that direction (the school of thought that puts XML or JSON right into a database.)
2. That the most important thing is the quality and consistency of the data, and while developer effort is important to consider, it is not such a bad thing to make developers work harder up-front in their applications to force quality and consistency of data upon insertion or modification of data in the database rather than have to deal with data issues later. So just use JSON or XML for transmissions between systems, but not as permanent data storage formats. Model and store data that should be relational in a relational form, hopefully appropriately normalized. 1
I fall into camp #2 (angry DBA stereotype, right?) but there are lively debates going on all over the place about this kind of thing.
A comparison of data serialization formats here.
EDIT: 2/23/2015 1:17 PM PST
I was thinking a bit more about what bothers me about JSON in a database. And after reading this excellent article by Fabian Pascal about Hadoop vs. RDBMSes. To quote:
“…The core idea behind the database concept is native application neutrality–a general purpose data resource with multiple “uses” (applications) and data functions centralized in a DBMS…. ”
“…This requires moving data constraints out of the database and into the application…”
That requires you to trust the application developers/coders to truly understand data consistency, rules, etc., and 90% of the time I would not trust in this matter.
- I admit that there are certain kinds of data, like lots of BLOBS, that are not well-suited to relational databases. I also admit that I am glossing over the whole CAP vs. ACID/BASE vs. ACID discussion here and that in very large-scale systems you might have other considerations. But most of the time we are not building giant globe-spanning systems and we also do not have the staff and resources of an Amazon or a Google to build and maintain these kinds of systems.