The history of database development may sound like a fast trip to boredom, but it's actually much more interesting than you might think. The landmark paper "A Relational Model of Data for Large Shared Data Banks" was published in 1969 by E.F. Codd, proposing the relational model for databases that has remained the standard ever since. In a nutshell, the relational model separates the logical organization of the database from the physical storage method.
An introductory course in database development will most likely contain the following topics. Notice the prevalence of the word "relational" in these topics.
A good place to stay up to date on database developments is by visiting Database Journal. This website is current and popular.
The core of every application architecture is some kind of data store, because only through these data stores can we achieve persistence in our applications. This is the backbone of every website and application in software, whether it's a customer/orders database on an e-store, posts on a forum, or files in an operating system. It is the job of data architects, developers, and database administrators to work out the best ways to model and store information in these data stores.
Database systems can follow a number of different models. One such model is the relational model, and it is commonly found as the back end component on websites. This can be contrasted to the naïve concept of a ‘flat file’ database, which in its extreme could be represented in a simple spreadsheet. There are similarities between both: a table has columns and rows, much like a spreadsheet, where columns define headings and rows define entities within your database. At the intersection of each row and column, therefore, we find a single property defining that entity. In a relational model, a single entity has a single dedicated row. For instance, let’s imagine we run a web store. A ‘Customer’ table would have 1 row for each customer. The columns then would describe different attributes of those customers. For example:
ID | NAME | PHONE | STATE | EMAIL
------------------------------------------------------------
1 | Bob Smith | 123-4567 | NY | bob@smith.com
It is important to note that there is a unique identifier (the ID) on the row. This is because these attributes are not unique: a phone number might have multiple customers, and names are not unique either. For instance, let’s say Bob’s wife Alice lives at the same address and makes an order from our company. She too is a customer but has the same phone number as Bob. Similarly, another customer with a different address might also be called Bob Smith. It wouldn’t do if we could not store customers with identical attributes in our database due to these collisions. As a result, our ID is what’s known as a primary key. This allows our database to look as follows:
ID | NAME | PHONE | STATE | EMAIL
------------------------------------------------------------
1 | Bob Smith | 123-4567 | NY | bob@smith.com
2 | Alice Smith | 123-4567 | NY | alice@smith.com
3 | Bob Smith | 765-4321 | TX | bob.smith@email.com
Our database, however, would be useless if it could only contain customer information. We also need products to sell to our customers:
PRODUCT_ID | PRODUCT | PRICE
------------------------------------------
1 | Big Widget | $30.00
2 | Small Widget | $10.00
Again, our ID (here PRODUCT_ID) is unique. It is conceivable that we may have multiple products of the same name (perhaps to sell in different regions) or multiple products at the same price, and so we instead need to be able to generate these artificial, unique identifiers.
Finally we must consider how to actually implement the relationship between customers and products. Customers will be able to make orders of any of our products, and clearly we cannot model orders as part of customers or products, as they are another independent entity. Instead, we must build a table to model orders. This makes use of a new concept: Foreign keys. Foreign keys are references to primary keys from other tables. They have the same values but are no longer constrained by uniqueness. As a result, we can say that one customer has multiple products in an order, like so:
ORDER_ID | CUSTOMER_ID | PRODUCT_ID | QUANTITY
----------------------------------------------
1 | 1 | 1 | 3
1 | 1 | 2 | 1
Note that we have two rows. These indicate that customer id 1 (Bob) has placed an order for 3 large widgets (product id 1) and 1 small widget (product id 2). They share an order number, because it is a single order. We could just as easily represent multiple orders by the same person, because there isn’t a unique key for this table.
These entity relationships are often represented graphically in what’s known as an Entity Relationship Diagram. Below is a simple example which reflects our examples above. Pay attention to the "crow’s feet" that show which side of the relationship is 'one' and which is 'many' – for instance, one customer has many orders. Likewise, one order has many products.
Knowing how to model and represent data entities in such relationship diagrams is a core skill used in both academic and commercial development. Databases excel at the manipulation and storage of large volumes of data, and can quickly become impossibly complex. Only through understanding these core concepts can one succeed in the field of data architecture or database administration.
Being able to store data in databases is only half of the picture. Developers and Database Administrators (DBAs) must be able to pull data back out of databases efficiently and effectively. This is done by writing queries against databases. This article specifically deals with SQL queries (where SQL stands for Structured Query Language), but other database systems exist and have their own query languages (for instance NoSQL on something like Mongo DB). SQL queries are most commonly associated with relational database schemas, although exceptions exist.
The language of a query in SQL is driven by two concepts. The first is that of the reserved keywords which can vary between implementations of SQL (the previous link refers to Microsoft SQL Server’s implementation, Transact SQL), but are largely identical. These keywords are used to leverage certain operations on the database, and can be thought of as being like the function library if you were to compare databases to coded applications. The second element is the schema, which is to say the set of tables, views and stored procedures that exist on the database, and are akin to the variables that exist in your application.
By far the most common SQL keyword is SELECT. SELECT is used to retrieve data from a table, either by specifying specific columns and rows or simply requesting everything. SELECT is paired with the FROM keyword, in order to specify which table you are selecting your data from. The simplest version of select simply returns everything from a table (where TABLE is the name of a table in this example):
SELECT * FROM TABLE
However, we can begin to narrow down on either the columns or rows we are selecting from, as we rarely wish to extract the full contents of a table. We do this by specifying the column names, separated by commas, instead of using the asterisk:
SELECT COLUMN1, COLUMN2 FROM TABLE
It should be noticed that in some implementations you will see objects like column names and table names wrapped with square brackets. This is generally considered good practice as it allows flexibility in the naming of these objects.
Selecting the entire contents of two columns remains a relatively crude operation. Instead we will want only certain rows. As a result, we can introduce the WHERE keyword to our SELECT… FROM statement. WHERE allows us to pick certain criteria to filter our results on, which we supply with the keyword. As an example, therefore, our query would be expanded to look like this:
SELECT COLUMN1, COLUMN2 FROM TABLE
WHERE COLUMN1 = VALUE
The conditions and values used can vary. For instance, Text columns must be compared against other values in quotation marks, such as NAME = ‘Bob’, but can also use the ‘like’ keyword to do partial matches. For instance, NAME LIKE ‘%ob’ would match ‘Bob’ or ‘Rob’, but not ‘Robert’, and ‘Bob%’ would match ‘Bob’ or ‘Bobby’. Such pattern matching can quickly become complex and is a topic and of itself. If the values are numeric, we can also compare to conditions such as greater than (>) or less than (<) – but we do not use quotation marks for the value. For instance, ORDER_NUMBER > 3. We can then combine these with AND or OR to produce complex logic – like WHERE ORDER_NUMBER = 3 AND NAME LIKE ‘BOB%’.
Understanding how to perform these types of queries gives you the fundamental building block for any SQL-based database interrogation. This allows you to move on to advanced topics like joins, CTEs, Stored Procedures and inner queries. Indeed if you wish to undertake a career in software, whether as a developer, tester or even product manager/business analyst, you will either be required to know how to write SQL queries or find it extremely beneficial in performing related duties.
If you wish to practice what you have read here, websites like w3schools often provide live demos where you can run queries against a virtual database. This enables you to practice without having to set up and populate your own database, which can be a large task in and of itself. SQL queries can become very complex and perform poorly, so it is definitely worth learning the basics.
Please note this article assumes a familiarity with SQL databases. If you’d like to learn more, you can start by reading the above section, or by contacting our computer science tutors at 24HourAnswers for further help.
As the name implies, NoSQL Document databases eschew typical Structured Query Language conventions – i.e. tables with rows and columns of data – for a new paradigm. Instead, NoSQL attempts to embrace the reality that data is often, in fact, not all that structured after all. We can often have gaps in our data on certain records, or schema variations between subtypes of data, or indeed nothing in common at all. Imagine perhaps a medical system that contains patient records. For some patients we may have only a name (or not even that, in the case of an unknown patient rushed into hospital); for others receiving long term care, we may have a complete medical history alongside all sorts of notes, test results, and personal data. Were this a traditional SQL relational database, the first patient would have columns of NULL data associated with them; the database itself would need to maintain complex relationships (i.e. joins) between tables, to account for the various levels of multiplicity between tables; reading (and erasing or modifying) that data would be complex; and database normalization would need to be rigorously enforced to prevent the data from becoming unusable junk. So what else can be done?
Types of Document Database
Document databases instead focus on providing querying capability across collections of documents that need only vaguely resemble each other, typically using a specific markup – e.g. JSON or XML. Various platforms exist to accommodate different use cases – for instance, eXist focuses on XML, whereas MongoDB uses BSON behind the scenes, though users will interact with the database in terms of JSON. For the majority of this article we will use JSON in our examples and talk in general terms about a couple of database technologies both MongoDB and Microsoft’s Cosmos DB. In any case, whatever technology is driving your database DB, our expert tutors at 24 Hour Answers will be able to help answer your questions and rapidly solve any problems you encounter.
As a quick reminder, a JSON (JavaScript Object Notation) document is an easy to use, lightweight format commonly encountered in all tiers of software development, from configuration for or descriptions of front ends, to API responses, and even pipeline and container configuration. A simple example of a JSON object might be as follows:
{
"forename": "John",
"surname": "Doe",
"DateofBirth": "1982-04-23T18:25:43.511Z",
"history": null
}
Obviously, each field could instead be its own object, and those objects themselves could have varying degrees of completeness. So, how do our documents get organized within the database?
Collections
Collections in document databases are analogous to tables in SQL databases. Whereas a SQL table will enforce that all the rows within it adhere to the same schema, a collection merely describes and organizes the documents it contains. This is possible because document databases are schema-free. Collections will have restrictions on them – for example, in MongoDB, collection names cannot contain certain characters (such as $ or the NULL character); there are size limits to documents (16MB of BSON); and a limit to how deeply nested documents can be (100). In Mongo, collections need not be explicitly created; inserting a document into a collection that doesn’t exist will create a collection.
Let’s look at an example. In our hypothetical hospital system, it would make sense to have a collection containing all our patient records, but perhaps we might have a separate collection to contain information about the doctors working at our hospital, and yet another to contain other hospitals or practices with which our patients might have a history. Essentially, it makes sense to put entities that exist in their own right into their own collection. This means that if something changes within that entity (a doctor’s name, or another surgery’s address), we only have to update one record, instead of finding all the records that reference that entity and updating them.
So, once we have collected documents into collections within databases, how do we access them?
Queries in Mongo
Let’s see how things are done in MongoDB. As it happens, the Mongo query language is incredibly powerful and anyone looking to learn how to make the most of it should consider enrolling in Mongo University. However, if you have a specific query you need help with, or otherwise could benefit from some expert advice or tuition, our experts at 24HourAnswers will be able to help you achieve your objectives, from getting started to advanced problem solving.
We’ll assume that you’re using the Mongo Shell to query your database. Having connected, you’ll be able to retrieve all documents using the Mongo command:
db.[collection].find()
where ‘[collection]’ is the name of your collection. In SQL, this would be the same as doing
SELECT * FROM [COLLECTION]
Whilst an essential first step, it’s not a terribly practical way to interact with the database. Instead, we will want to be able to query for documents that fulfill a certain condition. Given that our documents consist of JSON, we will need to specify the field and value that we want to match on. We can do this by reusing our previously learned command with a small modification:
db.[collection].find({ surname: “Doe” })
We have inserted a JSON fragment into the find() method. As a result, this fragment will be matched against all documents in the collection and they’ll be returned in the response. This is the equivalent of a SQL WHERE clause. Where Mongo gets a bit more complicated is when we need to perform logical operators on our queries. For instance, we might want to look for a number of possible values, and so we have to use the $in operator, like so:
db.[collection].find({ surname: { $in: [“Doe”, “Smith”] } })
Indeed, Mongo provides a number of different operators to handle different querying conditions. For example, to find documents where a numeric value is greater than a certain target, we can use the following:
db.[collection].find({ stock: { $gt: 30 } })
While you can find extensive documentation provided by MongoDB themselves, our computer science tutors at 24HourAnswers are ready to help with any question you may have regarding Mongo queries, or indeed queries on any form of database technology.
Sharding
Because document databases don’t need to store data in indexed tables, they are easily scalable across multiple machines. This allows the database to scale horizontally over a number of nodes (instead of vertically by upgrading the host machine). This is vastly more flexible: there’s obviously a limit to how far a single machine can be upgraded, but we can keep adding more nodes to a cluster as the database grows.
In order to take advantage of this, we need to inform the cluster how it will be sharded. In Mongo, we do this by setting the shard key for the collection. We have a couple of ways of doing this. If we select a field to shard on, we can set up a number of data ranges to create our shards with. For example, we could use a field that represents a patient’s ‘month of birth’, and create 12 chunks (one for each month). This ensures that we’d have a fairly good distribution of data between the chunks – assuming every month of birth is equally probable. But we could also make a mistake. If we decided to shard on a key that’s not equally distributed – for the sake of argument, hair colour – we’d find an enormous distribution of documents in the most common colours (say black or brown), and few documents in the rarer chunks (like ginger). This would be inefficient, as we want to balance the documents throughout all the chunks evenly.
To get around this, we can take a field, or set of them, and compute the hash of it. Mongo will do this for you automatically. By hashing something like a name, we can take something that might be unevenly distributed (names starting with X are rarer than those starting with S, for example), and produce an even distribution instead. This will ensure the cluster stays healthy and that certain chunks don’t become overloaded.
Hopefully this has given you an insight into the various features of document databases and will give you the courage to explore them in addition to SQL databases, given the opportunity. If you run into any problems, whether at beginner or expert level, our tutors at 24HourAnswers are always available to help you.
Why students should use our service
As you learn to use document databases, our tutors are available to help every step along the way. We have highly qualified tutors who can help with any level of NoSQL and document databases. If you are just beginning working with Mongo, Cosmos, or any other platform, we offer introductory support and online homework help, where tutors can assist you with learning and understanding how to make the most of databases. We offer this either through live tutoring or through writeup deliverables, in which the tutor delivers you clear queries with explanations that demonstrate how to create, read and update documents in a document database, or deal with any other challenge in database development. For intermediate users of databases who are working through larger projects, our experts are available to help guide the design and development of systems, providing advice, feedback, and making changes and revisions to the schema as needed. And for advanced users of NoSQL stuck on the trickiest problems in computer science, we have experts who are seasoned veterans in database development. and computer science techniques, available to explain or solve the toughest challenges you may encounter.
24HourAnswers has been helping students as a US-based online tutoring business since 2005, and our tutors have worked tirelessly to provide students with the best support possible. We are proud to be A+ rated by the Better Business Bureau (BBB), a testament to the quality support delivered by our tutors every day. We have the highest quality experts, with tutors from academia and esteemed institutions such as the Massachusetts Institute of Technology (MIT).
Our expert online tutors in database technology are available anytime to offer help with NoSQL/Document databases or guide your understanding and mastery of your database coursework. For any data modeling or querying challenge—big or small—bring us the question, and we have the answers!
Alexander Sofras is a technical architect with over 20 years of programming experience and 10 years in industry. He currently works in e-commerce and specialises in product discovery and recommendations.
Since we have tutors in all Database Development related topics, we can provide a range of different services. Our online Database Development tutors will:
With these capabilities, our college Database Development tutors will give you the tools you need to gain a comprehensive knowledge of Database Development you can use in future courses.
Our tutors are just as dedicated to your success in class as you are, so they are available around the clock to assist you with questions, homework, exam preparation and any Database Development related assignments you need extra help completing.
In addition to gaining access to highly qualified tutors, you'll also strengthen your confidence level in the classroom when you work with us. This newfound confidence will allow you to apply your Database Development knowledge in future courses and keep your education progressing smoothly.
Because our college Database Development tutors are fully remote, seeking their help is easy. Rather than spend valuable time trying to find a local Database Development tutor you can trust, just call on our tutors whenever you need them without any conflicting schedules getting in the way.