NoSQL databases: A detailed overview for beginners

By Alex Williams, Hosting Data UK researcher

NoSQL (‘Not Only SQL’) represents a philosophy for creating databases with design patterns that avoid relational data storage. The need for it arose in the 21st century, as databases had to handle a much wider range of data and workloads.

Disclaimer

Many Graphical User Interfaces (GUIs) exist for bridging the gaps between modern databases of different types. Today it may therefore be more important for application developers and database owners first to consider the operational capabilities their project requires, and only then to choose a specific SQL / NoSQL database. Hundreds of options exist, and broad myths about them still circulate. While some databases are better than others in particular scenarios, many work well as stalwart general-purpose solutions.

What Is NoSQL? A bit of history: Past & present

The term ‘NoSQL’ took off organically (as a Twitter hashtag) around 2009, among a community of developers working on non-relational and document models. Before this, in the early-to-mid 2000s, SQL databases were the standard. NoSQL became a popular nickname in the developer community for the emerging non-SQL databases.

As we mentioned in the disclaimer above, the landscape is trickier to make sense of now, which is why the term has since been refined, more formally, to “Not Only SQL”.

This Guide

With the above said, for the purposes of this beginner guide we are going to focus on the baseline function of NoSQL databases.

As NoSQL was a response to SQL's inflexible structure, we will compare the two in terms of scalability, schema, performance and support. We will also explain four types of NoSQL database: document databases, key-value stores, column-oriented databases and graph databases. (Time-series databases are a more recent addition and have seen explosive growth.)

NoSQL vs SQL

Here is how the two compare, point by point.

Scalability

Scalability is traditionally a central difference between SQL and NoSQL: NoSQL databases are generally considered more scalable than SQL databases. MongoDB, for instance, has native support for sharding (horizontal data partitioning) and replication to aid this. You can add these features to SQL databases to a degree, but doing so can require significant human and hardware effort, depending on the specific database you use. Either way, optimizing cloud storage at massive, competitive scale may increasingly call for machine-learning-driven tooling.
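To make sharding and replication more concrete, here is a minimal sketch of hash-based partitioning. It is a simplified illustration, not MongoDB's actual mechanism: the shard names, the use of MD5 as the hash, and the rule of placing replicas on the next shards in order are all assumptions chosen for clarity.

```python
import hashlib

# Hypothetical shard names; a real cluster would map these to servers.
SHARDS = ["shard-a", "shard-b", "shard-c"]


def shard_for(key: str, shards: list[str] = SHARDS) -> str:
    """Pick the primary shard by hashing the key (hash-mod-N partitioning)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def replicas_for(key: str, replication_factor: int = 2,
                 shards: list[str] = SHARDS) -> list[str]:
    """Primary shard first, then copies on the following shards (assumed placement rule)."""
    start = shards.index(shard_for(key, shards))
    return [shards[(start + i) % len(shards)] for i in range(replication_factor)]


if __name__ == "__main__":
    for user_id in ["user:1", "user:2", "user:3", "user:4"]:
        print(user_id, "->", replicas_for(user_id))
```

Because the same key always hashes to the same shard, any node can work out where a record lives without a central lookup. One drawback of plain hash-mod-N is that adding a shard moves most keys, which is why production systems tend to use range-based or consistent-hashing schemes instead.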

Schema

Schema is another key concept. First, let’s look at the difference between a normal schema structure and a schema-less structure.

[Figure: a schema-less document structure (left) compared with a relational schema joining two tables (right)]

On the left is a schema-less example. NoSQL allows a non-relational structure in which two documents need no fields in common, so different types of data can be stored side by side. The relational schema on the right, by contrast, stores data in predefined tables (here combined with a JOIN operator), so it can be retrieved with rigid predictability.
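A rough way to see the difference in code is below. This is a sketch under stated assumptions: plain Python dicts stand in for documents in a NoSQL collection, and Python's built-in `sqlite3` module stands in for a relational database; the table and field names are hypothetical.

```python
import sqlite3

# Schema-less: two "documents" in one collection with no fields in common.
collection = [
    {"name": "Naomi", "grade": "Junior"},
    {"product": "Lamp", "price_gbp": 25.0, "tags": ["home", "lighting"]},
]

# Relational: a fixed schema split across two tables, recombined with a JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE grades (student_id INTEGER REFERENCES students(id),
                         grade TEXT NOT NULL);
    INSERT INTO students VALUES (1, 'Naomi');
    INSERT INTO grades VALUES (1, 'Junior');
""")
rows = conn.execute(
    "SELECT s.name, g.grade FROM students s JOIN grades g ON g.student_id = s.id"
).fetchall()
print(rows)  # [('Naomi', 'Junior')]
```

The documents can take any shape, but nothing checks them; the tables reject anything that does not fit their columns, which is what makes their results predictable.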

Performance

We’ve already mentioned that NoSQL databases tend to scale more easily, but on the SQL side you can use Object-Relational Mapping (ORM) tools to run in-depth analytics, even across multiple data sources. This eats up system resources, yet SQL databases still perform robustly in high-end apps run by technology giants: MySQL, for example, has been used by YouTube, Google, Facebook, WordPress and Twitter, among many others.

Security

Stay secure as you build: protection against attacks such as SQL injection (SQLi) and cross-site scripting (XSS) should be simple and automated, with enough support to secure custom code, and able to identify threats well before users or systems are affected.
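The most basic defence against SQL injection is to pass user input as a bound parameter instead of pasting it into the query text. The sketch below uses Python's built-in `sqlite3` module; the `users` table and the two `find_user_*` functions are hypothetical, written only to contrast the unsafe and safe patterns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "admin"), ("bob", "user")])


def find_user_unsafe(name: str):
    # UNSAFE: the input becomes part of the SQL text, so it can change the query.
    return conn.execute(
        f"SELECT name, role FROM users WHERE name = '{name}'"
    ).fetchall()


def find_user_safe(name: str):
    # Safe: the driver sends `name` as a bound value, never as SQL.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()


malicious = "nobody' OR '1'='1"
print(find_user_unsafe(malicious))  # returns both rows
print(find_user_safe(malicious))    # []
```

Parameterized queries cover SQL injection only; XSS has to be handled separately, for example by escaping output in the web layer.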

Rules of thumb where SQL is superior:

Integrity of data is imperative
Data consists of highly logical, discrete, well-formed records
The team has good developer support and experience with proven, well-established technology
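The first rule of thumb, data integrity, is where relational constraints and transactions help most. Below is a minimal sketch using Python's built-in `sqlite3`; the `accounts` table and the `transfer` function are hypothetical. A `CHECK` constraint forbids negative balances, and the connection's context manager rolls back the whole transfer if any step violates it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move money atomically: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # Credit first, so a failed debit shows the rollback undoing it.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint was violated
        return False


def balances() -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts"))


print(transfer("A", "B", 150), balances())  # False {'A': 100, 'B': 0}
print(transfer("A", "B", 30), balances())   # True {'A': 70, 'B': 30}
```

Without the transaction, the failed transfer would have left B credited with money that never left A.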

Rules of thumb where NoSQL is superior:

Project goals are flexible or less complicated, and you want to start coding immediately
Data needs are growing, evolving and largely unrelated to one another
Scalability and speed are essential

Community support

SQL is older than NoSQL, so its community has had far more time to grow. SQL was created at IBM in the 1970s, whereas the name ‘NoSQL’ was first used by Carlo Strozzi in 1998, for a lightweight relational database that did not use SQL as its query language. For this reason, you’ll find many more communities and forums for SQL than for NoSQL.

Types of NoSQL databases

Choosing a database is one of a company’s first steps on a digital journey, such as a move into the Internet of Things (IoT). While we could cover more of the commonly used types of NoSQL database, a great place for beginners to start is with the following four key ones:

1) Key-value stores. TL;DR: Often built on distributed hash tables and inspired by Amazon’s Dynamo paper, these let you handle loose data in massive amounts without a schema.

Key-value stores are the simplest type of NoSQL database. Because of this, they are especially scalable, letting you scale horizontally to massive quantities of data. To retrieve an individual object you supply its key, and the store returns the value (the stored data) assigned to that key.

Unlike relational databases, key-value stores require no preset schema, so you can store data more flexibly and grow faster, with no need for placeholder values in empty fields.
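The key-value model can be sketched in a few lines. This is a minimal in-memory illustration built on a Python `dict`, assuming nothing about any real product: the `KeyValueStore` class and the key names are hypothetical, and real stores such as Redis add networking, persistence and distribution on top of the same basic idea.

```python
from typing import Any


class KeyValueStore:
    """Minimal in-memory key-value store: no schema, values are opaque."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        """Remove a key; report whether it existed."""
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
# Values of completely different shapes live side by side, with no schema.
store.put("user:42:name", "Naomi")
store.put("cart:42", ["lamp", "desk"])
store.put("settings:42", {"theme": "dark"})
print(store.get("cart:42"))  # ['lamp', 'desk']
```

The only way in is the key, which is exactly why these stores are simple to partition: each key can be routed to a server independently.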

When to use

Use a key-value store when you need a low-resource solution for large quantities of simple, loosely structured data. Key-value stores are typically used for caching, recommendations, ad services and handling user sessions.
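Caching and session handling usually rely on entries that expire after a set time (Redis, for example, supports per-key expiry). The sketch below shows the idea with a hypothetical `TTLCache` class; the lazy, evict-on-read strategy and the injectable clock are assumptions made to keep it short and testable.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Key-value cache whose entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (self.clock() + self.ttl, value)

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # evict lazily when an expired key is read
            return default
        return value


sessions = TTLCache(ttl_seconds=1800)  # e.g. 30-minute user sessions
sessions.set("session:abc123", {"user_id": 42})
print(sessions.get("session:abc123"))  # {'user_id': 42}
```

Passing a fake clock makes expiry easy to test without waiting; a real store would also sweep expired keys in the background so they do not pile up.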

Database examples: Redis, Riak and Project Voldemort

2) Document stores. TL;DR: Collections of documents that are independent of one another (each document being a set of key-value pairs) and simple to modify.

Also known as document databases, this type stores groups of key-value pairs inside documents. Each document is the fundamental unit of data you query, and documents can be grouped into collections by function.

No schema is needed to store data, which keeps things relatively simple: each object in your application model maps directly onto a document. The most commonly used formats are JSON, XML and BSON. Here is a quick look at a straightforward JSON-formatted document with three key-value pairs:

{"ID" : "002", "Name" : "Naomi", "Grade" : "Junior"}

Furthermore, these formats allow nested values, which can be queried directly. Keeping related data together in one document makes it easier to distribute data across many disks and can improve speed.

For example, a nested value could be added to the example above as follows:

{ "ID" : "002", "Name" : "Naomi", "Grade" : "Junior", "Classes" : { "Class01" : "French", "Class02" : "Mathematics" } }
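Here is one way to query that nested document, sketched in Python with the standard `json` module. The `get_path` and `find` helpers are hypothetical; they mimic the dotted-path style some document databases (MongoDB among them) use for embedded fields, but they are not any database's real API. The second student record is made-up sample data.

```python
import json

doc_text = ('{ "ID" : "002", "Name" : "Naomi", "Grade" : "Junior", '
            '"Classes" : { "Class01" : "French", "Class02" : "Mathematics" } }')
doc = json.loads(doc_text)
print(doc["Classes"]["Class02"])  # Mathematics


def get_path(document: dict, dotted_path: str):
    """Follow a path such as 'Classes.Class01' into nested dicts (None if absent)."""
    value = document
    for part in dotted_path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def find(collection: list, dotted_path: str, expected) -> list:
    """Return every document whose nested field equals `expected`."""
    return [d for d in collection if get_path(d, dotted_path) == expected]


students = [doc, {"ID": "003", "Name": "Sam", "Classes": {"Class01": "History"}}]
print([d["Name"] for d in find(students, "Classes.Class01", "French")])  # ['Naomi']
```

Note that the second document has no `Grade` field at all, and the query still works: missing fields simply do not match.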

When to use

Use a document store when you need fast performance on loosely structured data while your database is still evolving. For instance, document databases are used for managing user profiles whose fields differ from user to user; the lack of a rigid structure lets you add new attributes and values to each profile as needed.

Database examples: CouchDB, MongoDB, Elasticsearch, among others.

3) Column-oriented databases. TL;DR: Each column is treated separately, with its values stored contiguously, which boosts query performance because a query can read just the columns it needs.

Let’s talk specifically about wide-column databases (wide-column stores). These store data grouped into columns rather than rows, yet the data is organised and queried in a way comparable to relational databases, with tables, rows and columns.

These differ from relational databases, however, in being much more flexible: while you still have columns, their names are not fixed in advance. This lets different rows in a single table have different column names, so you can append additional columns as and when needed.
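A wide-column table can be pictured as a map from row key to a sparse map of columns. The sketch below is a deliberate simplification of the data model used by stores such as Cassandra and HBase: it ignores column families, timestamps and partitioning, and the `users` table, `put` function and sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical table: row key -> {column name: value}.
# Rows are sparse: each row stores only the columns it actually has.
users: defaultdict[str, dict[str, str]] = defaultdict(dict)


def put(row_key: str, column: str, value: str) -> None:
    """Write one cell; a new column appears on first use, no ALTER TABLE needed."""
    users[row_key][column] = value


put("user:1", "name", "Naomi")
put("user:1", "email", "naomi@example.com")
put("user:2", "name", "Sam")
put("user:2", "city", "Leeds")  # a column that exists only for this row

for row_key, cells in sorted(users.items()):
    print(row_key, sorted(cells))
# user:1 ['email', 'name']
# user:2 ['city', 'name']
```

In a relational table, adding `city` would mean altering the schema and leaving it empty for every other row; here it exists only where it is written.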

When to use

Column-oriented databases work best when large amounts of data are stored in, and queried by, individual columns. This reduces disk-resource demands and query times, and it is particularly good for data spread across multiple servers.
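The benefit of storing each column contiguously shows up in aggregate queries that touch only one field. The sketch below contrasts the two layouts in plain Python; it only models the layout, since in memory a dict lookup does not actually read the whole record. On disk, a row store would have to scan every complete record, while a column store reads (and can compress) just the one column. The sample data is made up.

```python
# Row-oriented layout: each record's fields are stored together.
rows = [
    {"id": 1, "city": "Leeds", "amount": 120},
    {"id": 2, "city": "York", "amount": 80},
    {"id": 3, "city": "Leeds", "amount": 45},
]

# Column-oriented layout: each column's values are stored contiguously.
columns = {
    "id": [r["id"] for r in rows],
    "city": [r["city"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

# Conceptually, the row layout visits every full record to reach one field...
total_from_rows = sum(r["amount"] for r in rows)
# ...while the column layout reads a single contiguous list.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)  # 245 245
```

Both layouts hold the same data and give the same answer; what changes is how much has to be read to get it.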

Database examples: Azure Cosmos DB, Apache Cassandra and Apache HBase.

4) Graph databases (graph networks). TL;DR: Store data in a graph structure, well suited to highly connected data whose relationships change frequently.

This final type organizes data using a graph model. As with all NoSQL databases, it is highly flexible. Each graph consists of two components: nodes (which store data entities) and edges (which store the relationships between entities), so related pieces of information are explicitly linked. Commonly, a task needs only a single, simple query.

When to use

This is the least common of the four NoSQL types, but it is particularly powerful when the relationships within your data matter as much as the data itself. Social networks, for example, frequently use graphs to manage user profiles and the relationships between users.
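A classic social-network query is "people you may know": friends of friends who are not yet your friends, ranked by mutual friends. The sketch below runs that two-hop traversal over a hypothetical friendship graph held in plain Python sets; a graph database would express the same idea as a single query in its own query language.

```python
from collections import Counter

# Hypothetical undirected friendship graph: user -> set of friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def suggest_friends(graph: dict, user: str) -> list[tuple[str, int]]:
    """Friends of friends, not already friends, ranked by mutual-friend count."""
    direct = graph.get(user, set())
    mutual = Counter()
    for friend in direct:                       # hop 1
        for candidate in graph.get(friend, set()):  # hop 2
            if candidate != user and candidate not in direct:
                mutual[candidate] += 1
    # Most mutual friends first; ties broken alphabetically.
    return sorted(mutual.items(), key=lambda kv: (-kv[1], kv[0]))


print(suggest_friends(friends, "alice"))  # [('dave', 2), ('erin', 1)]
```

Dave ranks first because he is connected to Alice through both Bob and Carol.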

Database examples: Neo4j, RedisGraph and OrientDB.

Conclusion

You should generally consider NoSQL when dealing with highly flexible, unstructured data models, or with a specific need that falls outside the classic relational model.

For lots of unstructured data, consider a document database such as CouchDB or MongoDB. For speedy querying of key-value data (without strong integrity guarantees), Redis can be great. And for advanced, flexible querying of massive amounts of data, consider Elasticsearch.

Once you reach an intermediate understanding of this topic, keep in mind that modern databases have largely bridged the gap between NoSQL and SQL. A general understanding of the landscape (the problems that app developers face) can be very handy in giving you perspective, so you can focus on solving your own particular problem rather than choosing between SQL and NoSQL camps, which are becoming less distinct.