Choosing between Relational database VS Document database (#NoSQL)

Hesham Yassin
3 min readJul 18, 2020

--

When I design a data-driven system, whether it is for an interview, home project, or work-related purpose, the database type is the last thing I think of. The wrapper design pattern gives that flexibility.

Sometimes, the answer is obvious as it could be Relational, Document, or a hybrid of both.

  • In this post, I will not discuss performance and scaling.
  • When it comes to text search, considering text-search engines such as Lucene, Solr or ElasticSearch is the right choice.
  • Graph databases and column databases are not a concern in this post.
  • This post doesn’t provide practices for in-memory databases such as MemCached or Redis.

Resolving the argument

data relationship and application code simplicity

The data relationship is an important factor for choosing a data model. The possible data relationships are:

  1. one-to-one: For example, each country id is mapped into a country name.
  2. one-to-many: A hierarchical data model. e.g. user record and its group of related info such as gender, age, school…
  3. many-to-many: When entity A has many children from entity B and vice-versa. A classical example for that is the Student-Class relationship. A student can enroll to many classes and any class may include many enrolled students.
  4. many-to-one: The opposite direction of one-to-many: e.g. All the users at certain age.

The document database is more efficient to use for one-to-many relationships because one query will retrieve all the entity’s related data. However, reading partial data is not efficient as the whole document will be loaded each time (See Data locality below).

Use-case: If you want to use the document database to store LinkedIn profiles, and you are interested in retrieving the available users names only, this function is less efficient in the document model.

With either many-to-many or many-to-one relationships, the Document model is less appealing as the complexity of the joins will move to the application code.

Data locality

With document databases, the document is stored as a continuous json or xml while in relational databases, the data is shredded over several tables. Retrieving the data in a document model is more efficient and easier for processing. The advantage comes in handy only when the user need large parts of the document at the same time.

Updating a specific document key requires rewriting the whole document. This is not the case with relational databases.

Declarative vs imperative query

The query language of the document model is imperative. i.e. the application code implements a logic to filter the desired query by iterating the documents and filter the desired ones. MapReduce comes in handy to eliminate such boilerplate code. However, it is not efficient as using a declarative SQL query language.

Imperative querying is less efficient as it doesn’t enable parallelism easily: It is hard to asynchronously iterate over documents to filter the desired ones. While it is less easier to run SQL query over SQL engine where the later has the flexibility to optimize it.

Schema flexibility

The document model places no restriction on the schema (and the JSON mode in the relational databases as in DB2/PostgresDb). As schemaless as it may seem, the application code still assumes a basic structure over the document.

Changing an attribute name in the document model doesn’t require down-time as in relational database.

With relational database, an UPDATE+ALTER query must execute over the database which requires downtime. while with document databases, the user applies the new schema for each record.

Data warehousing

Data warehousing are a practice used for sharpshooting the companies databases into dedicated databases for data analysis. Amazon Redshift and SAP HANA are a good example. Such databases are usually relational due to the fact that declarative query language are better for data analysis. \

Relational databases are sufficient solution when we are talking about millions of rows. However, when it comes to trillion of rows, Column based databases are more practical. I will not elaborate about column based databases (See Vertica) in this article.

I hope you enjoyed this post. Like, Comment and share.

Originally published at http://exhesham.com on July 18, 2020.

--

--

Hesham Yassin
Hesham Yassin

Written by Hesham Yassin

Software engineer. works for General Motors. was employed by IBM, Microsoft, Amazon and CheckPoint.

No responses yet