Doug Wilson
3 min readMay 30, 2021

--

"Although relational databases usually provide a decent solution for storing data, speed and scalability might be an issue in some cases."

With 50 years of evolution, optimization, and trial by fire (in server, mobile, and desktop environments) under their belts, I'd say that relational databases can provide an *excellent* solution for capturing and storing data, IF they're understood and used well.

It's worth noting that relational systems are optimized for efficient data capture -- fast, normalized, and valid, according to the data model.

Speed and scalability as "an issue" is something I hear a lot ... from people who really don't understand relational databases well and haven't really used them at scale. It's not something I think I've ever seen *supported* by those who make this vague claim.

Please don't get me wrong. I like your article. I like the side-by-side examples. But the way the data in your examples is modeled appears to pull some data modeling habits that are extreme anti-patterns (bad practices) from the NoSQL world and apply them to the SQL world. This immediately puts SQL at a disadvantage (and perpetuates these anti-patterns as "the right way" or at least a valid way of doing things).

Properly designed relational data models would NEVER allow 'make' or 'color' values like "hyundai" or "white". Instead, these values would come from *related* tables (hence the name "relational") and would therefore be represented in your first "one car" data point as discrete 'makeid' and 'colorid' integer values, not string values.

Why is this important?

1. Integer-based joins are wildly more efficient and performant than joins based on string values.

2. Choosing from a predefined list of Make and Color values enforces canonicity (a known set of values for each), prevents "creative" misspellings like "hunday", "hundai", "Hundai", etc and colors that don't exist for 2017 Hyundais from entering the database. Also, each row in a relational table usually contains a unique integer id value or "key".

3. These relationships are automatically enforced by the database engine (known as "referential integrity"), i.e. if someone tries to send an integer Make value that doesn't exist in the Make table, the operation fails. This is code that has to be written and tested by hand and then deployed and maintained forever in many NoSQL systems. Worse, the need for this code doesn't become obvious until after the database is chock full of these variant values that then must be hunted down and fixed by hand. Super efficient, yeah?

4. I could go on at length, but I hope that you can see that the engineers at Postgres, MySQL, Microsoft, etc haven't been twiddling their thumbs for the last 50 years. They've been building tools that can significantly reduce the amount of code that project teams have to build and maintain ... IF those teams will take the time to understand what relational systems are best at and how they work. This means not hiding behind ORM (Object Relational Mapping) and actually learning and understanding why SQL is far more than just a language used to "store data in tabular form".

I'm a big advocate of choosing the right tool for the job, and the right tool for storing entity attribute data (e.g. cars, car buyers, orders, deliveries, etc) is tabular data in a relational database. It's the best fit, and (done correctly) results in the most efficient capture and storage of this kind of data.

As you point out, key-value pairs, wide column, graph, and document data is often (but not always) a better fit for NoSQL databases.

I would love to see you update your examples to make them more realistic in order to make them even more valuable to your audience. The use cases are actually much different than your current examples imply, and I'm concerned that this could unintentionally mislead people.

Maybe "relational-ize" your relational car entity data examples to add a 'carid' column and to reference separate Make and Color tables with integer foreign key values and "document-ize" your NoSQL data examples to reference the 2017 White Hyundai's 'carid' value in order to link its semi-structured marketing data in a MongoDB document record?

I'd also love to see you add the examples of the other use case for tabular data -- dimensional data, which is optimized for data retrieval rather than capture.

Thank you for sharing your article and for considering my suggestions, and please keep writing.

--

--

Doug Wilson
Doug Wilson

Written by Doug Wilson

Doug Wilson is an experienced software application architect, music lover, problem solver, former film/video editor, philologist, and father of four.

Responses (1)