INTRODUCTION TO SQL - A Beginner’s Guide to Structured Query Language

Learn about SQL concepts, commands, and their applications in data analysis in this beginner friendly introduction to one of the most important languages in Data Science and Analytics.
Aug 27, 2024
12 min read
Hand flipping a cube from SQL to NoSQL on a wooden surface, illustrating the choice between SQL and NoSQL databases.

Table of Contents:

1. What is SQL?

2. What is a database?

3. What is a DBMS?

4. Relational Database model Structure 

5. Functions of SQL

6. SQL Commands and their types

7. Excel vs SQL

8. Variants of SQL

What is SQL?

SQL (expanded as Structured Query Language) is a programming language designed with databases in mind. SQL was originally developed to operate on relational database management systems (RDBMS), however in recent years, the International SQL standard has also incorporated parts of Object-Oriented Database Management Systems (OODBMS), resulting in hybrid object-relational databases.

What is a Database?


In the simplest terms, a database is a structured way of storing information (data). In the digital age, computers have changed the way we have done many things, and data is no different. It could even be said that the digitalization of data has been the catalyst of digital advancement.

A database contains 2 components – Data and metadata. Metadata is the data that describes the structure of the database.

To visualize how data is stored in a database, think of a database as a set of interrelated tables – each which can be visualized with rows and columns. A record is a representation of a physical or conceptual object. For example, in a database of customers, each customer is assigned a record. Each record has multiple attributes, for example – name, address and phone number.

In a table, records are the rows, and attributes are the columns. The names, addresses etc together constitute the data.

What is a DBMS?


A Database Management System (DBMS) is software designed to store, manage, and facilitate access to databases.

Relational Database Model

Before Relationship Database Models existed, there were hierarchal and network databases. The first databases in the 1950s were hierarchal. Although structurally very simple, they suffered from structural inflexibility and redundancy issues, making modification difficult. Soon, Network databases were created to solve that problem, however these databases became structurally complex. 

Enter relational models. The relational database model was developed by Dr E.F Codd of IBM which featured minimum redundancy and an easy to understand structure. SQL was developed to operate on relational databases.

5 Key Concepts Of RDBMS & Major Differences From DBMS
Sample relational database schema for a student course registration system, demonstrating the structure and relationships between tables in a relational database.

Structure of a Relational Database

A Relational Database consists of a number of interconnected or related tables. In the above illustrated example, we see 3 tables. Each course and student is assigned a unique ID in tables assigned to them, often called “Masters”. The Student Master contains ID# as a unique identifier, or UID, and the Course Master contains ClassID. These UIDs are both referenced in the “Takes_Course” table which shows which student is taking which course, and links the Student and Courses table.

In the Course and Student tables, the UIDs cannot be repeated. These UIDs are called the Primary Key, as they can be referred to by many other tables to link them. In the “Takes Course” table, these values are no longer unique, as this table is used to show a relationship which isn’t necessarily 1:1. Here, each of the referred columns is called a “Foreign Key”, as they are linked to Primary Keys in other tables.

This is a microcosm of how an RDBMS works, often having hundreds if not thousands of tables linked to each other through primary keys and foreign keys, which provides the relationship, which gives RDBMS its name.

MySQL Sample Database
 Entity-relationship (ER) diagram showing the structure of a business database, with tables like products, customers, orders, and payments interconnected to illustrate data relationships.

Above is a slightly more complex relational database with more tables. In practice, the number of tables can often be in the hundreds or even thousands with potentially billions of records.

Functions of SQL

Data Retrieval

SQL is primarily used to access data stored in a database.

Storage 

SQL is used to store and manipulate data in databases 

Database Management

SQL is used to create databases and tables, alter their structure such as by adding or removing columns, or dropping databases and tables entirely.

Data Control

SQL is used to modify the amount of control different users can have over the database,

Data Analysis

Through various built-in functions, such as aggregate functions, SQL can perform calculations on data 

Data Integrity

SQL allows you to add constraints to data to enforce data integrity, such as ensuring a UID isn’t repeated accidentally.

Reporting

SQL can be used to generate reports by extracting and summarizing data which can be further formatted for presentation or analysis.

In summary, SQL is an essential tool for any task that involves managing, manipulating, or analyzing data in a relational database. Its versatility and power make it the standard language for database management across various industries.

Types of SQL Commands

SQL commands are divided into 5 types

DDL – Data definition language – This is used to alter the structure of the tables. This includes adding or deleting columns, also called fields or metadata, or modifying their names and characteristics. 

DML – Data manipulation language – These commands used to add, remove or modify the data entered into the table, rather than the table itself.

DCL – Data control language – Each database is accessible to a number of users who have varying degrees of access or control over the data. DCL is used by the database administrator to grant, or revoke specified permissions.

TCL – Transaction Control Language – SQL commands can either be “committed” automatically, or manually. Commitment simply means applying the changes to the database given in the commands. When manual commitment is used, TCL commands are used to either commit or undo the commands, or “transactions.”

DQL – Data Query Language – These commands are used to fetch data from the database.

Diagram showing SQL command categories such as DDL, DML, DCL, TCL, and DQL, each with corresponding SQL commands
Diagram showing SQL command categories such as DDL, DML, DCL, TCL, and DQL, each with corresponding SQL commands

For a more detailed guide to SQL commands and functions, please check out our <Basic SQL Queries> article 

EXCEL vs SQL

Excel is the most widely used data analytics tool, for good reason. It is simple, has a wide knowledge base, and is visually intuitive without requiring much use of coding, in a spreadsheet format. You can do almost anything on Excel that you can do with SQL. So why use SQL?

While Excel can handle thousands of rows, there comes a stage when that is not enough. Excel can only hold “only” about a million rows of data. You won’t even come close to this number before noticing a severe slowdown in calculations.

There are of course, other considerations:

  • SQL is designed to be a relational database management system. While Excel can also link tables and worksheets, this is not its strength.
  • Excel is prone to human error. You could easily type over a critical formula.
  • You cannot use the same spreadsheet as others and restrict them from seeing critical data.

Whether you’re considering learning SQL online or Excel, we offer free courses for <both>.

SQL Server, PostgreSQL, MySQL... what's the difference? Where do I start?

There are many dialects of the SQL language, each with slight differences in syntax. So, what differentiates them? 

Line chart comparing the popularity of SQL database tags on Stack Overflow over time, with trends for MySQL, SQL Server, PostgreSQL, and others.

Before we look at the differences, we must understand the distinction between RDBMS and a dialect.

A relational database management system (RDBMS) is a program used to create, update, and manage relational databases. Examples include: MS SQL Server Management Studio, Oracle DBMS, MySQL, PostgreSQL, Microsoft Access, MariaDB etc.


A dialect is a modification of the standard SQL language used by each of these RDBMS to perform their functions.

T-SQL is Microsoft’s proprietary extension to the SQL language. It is used in Microsoft’s own SQL Server. Traditionally, Microsoft had dominated the Business Intelligence market with products like Excel, PowerBI and Access. SQL Server was no different, however the rise of MySQL catapulted Oracle’s open source extension of the SQL language into the front seat. MS Access is also owned by Microsoft, and is used for businesses on a much smaller scale than MS Sql Server.


MySQL is a free and open source RDBMS, which uses the dialect of the same name. MySQL was released in 1995 by Swedish company MYSQL AB, which was subsequently taken over by Oracle.

While Oracle continues to support MySQL, they have also created a proprietary  RDBMS, also called Oracle, which is licensed for commercial purposes. However, Oracle does have an express edition, designed for students only.

MariaDB was created in 2009 as a fork of the MySQL database management system, to ensure that the MySQL codebase would be free  forever.

PostgreSQL is another open source SQL DBMS that has adopted a hybrid object-relational database management system. PostgreSQL supports a lot more data types and supports custom data types.

NoSQL

NoSQL databases, also known as “Not Only SQL” databases store data differently. NoSQL databases are designed to be flexible and can store large amounts of semi-structured and unstructured data. The best known example of a NoSQL platform is MongoDB.

Comparison table of popular database management systems (DBMS), showing differences in scalability, data types, and licensing for systems like MySQL, PostgreSQL, MongoDB, and Oracle.

Conclusion

SQL is a powerful and flexible language for managing and interacting with relational databases. It allows users to perform a wide range of operations, from simple data retrieval to complex transactions and database management. Understanding SQL is crucial for working with databases, whether for development, data analysis, or database administration.

SQL has become an integral part of the data industry today. In a time where the dataset sizes have dwarfed the optimum levels required for Excel, SQL opens many career opportunities such as Data analysts, server developers, business analysts, web designers and a whole lot more.

To get started, you can check out our <SQL Course>, Introduction to SQL: A Beginner’s Guide to Structured Query Language

SIMILAR BLOGS

Interested in Writing for Us?

Share your expertise, inspire others, and join a community of passionate writers. Submit your articles on topics that matter to our readers. Gain visibility, grow your portfolio, and make an impact.
Join Now