SQL Course

Structured Query Language

Structured Query Language, often abbreviated SQL is a normalized language to process queries on a relational database. It's based on a "model" that you defined. If you need something flexible, you should look at NoSQL databases.

Relational databases are provided by a Database management system DBMS (Système de gestion de base de données (SGBD)). You will need one to manage your databases among

  • Oracle
  • PostgreSQL
  • MariaDB
  • MySQL
  • SQLite
  • ...

You need to install one, to continue this course. Please, refer to your DBMS documentation 🤓. A temporary solution might be to use SQLiteOnline (🙄).

ENSIIE students got a some notes in the WEB section to use our school database.

SQL 89/92: Both SQL-89 and SQL-92 are quite used. There is one major difference: what we call "JOIN clauses" are not available in SQL-89.


Relational model

In the second screen, you can see the usual representation of your data in a database. This is literally an Excel table with rows, columns, and entries. In the first screen, this is the UML representation, that we use when creating a database from specifications (see relational modeling).

https://cdn.jsdelivr.net/gh/lgs-games/memorize/info/sql/./images/table_modeling.png

https://cdn.jsdelivr.net/gh/lgs-games/memorize/info/sql/./images/table_view.png

Some vocabulary

Element Name (UML) Name (DBMS) Note
Customer Class Table
id, name, age, gender Attributes Attributes/Variables (=columns)
(1, "Luna", 18, "Woman"), ... Objects Tuples/Records (=rows). We are saying "tuples" in French too.
id Identifiers
(Keys, but this is a mistake)
Keys A primary key is a set of one or more attributes.
We can find a row in a table, given its values in the primary key.
The primary key as a whole is unique and not-null.
This is used to reference a tuple in another table (Foreign key).
Usually, we are using an artificial key
which is incrementing (+=1) each time we are adding a row.
Here, given "id=1", we know we are talking of "(1, Luna, ...)".

Note: I know that the class's name "customer" should start with an uppercase, as every class should. I should also add something before/after id (as many classes may have an attribute id, so I won't have to prefix it), and I should add something before/after name (as name is a reserved keyword otherwise, I may have to escape it later), for instance: c_id and c_name.


Zest of knowledge

Request

This is a statement in which you are using clauses to do something on the database. A request must ends with a ; (unless this is a simple request).

Clause

This is the keywords SELECT, FROM, ... that are making a SQL request. Clauses are optional aside SELECT (even FROM is optional, when selecting a value/function). The clauses

  • are declared in THIS order: SELECT > FROM > WHERE > GROUP BY > HAVING > ORDER BY > LIMIT
  • executed in THIS order: FROM > WHERE > GROUP BY > HAVING > SELECT > ORDER BY > LIMIT

Schema

To make it simple, this is a table storing data. Some are saying this is a library of databases. To call a table in a schema, do schema.table.

  • For a user, this is the list of its tables, views, constraints
  • For a database, this is the table INFORMATION_SCHEMA (TABLES, USERS, COLUMNS/DOMAINS, CHECK_CONSTRAINTS)

This word is also used for requests 🧐: this is the selected tables, and the attributes.

As the ENSIIE, we got one schema per user (ex: prenom.nom) in which we got another schema called public. You will store your databases in it.

DML, DDL, DCL

We are using DML/DDL/DCL, to talk about a category of clauses. This is something quite used by pros 😎.

  • DML (Data Manipulation): Select, Insert, Update, Delete
  • DDL (Data Definition): Create, Alter, Drop
  • DCL (Data control): Grant, Revoke

☑✅ Primary and Foreign keys ☑✅

YOU MUST KNOW THIS. The primary key is explained in the previous section. This is usually one attribute called "id" which is an artificial key. A Foreign key is a set of attributes (=keys) that are taking values from another key.

For instance, a customer(c_id, c_name, ...) will purchase(p_id, date, description, ...) things. You will add in the table purchases an attribute c_id (the name can be different) taking values from the table "customer". You should see this as referencing a customer inside another table.

Domain

Well, for an attribute, this is the "range" of the values that your attribute can take. It's made of the type, but sometimes you will add constraints, and all of this is what we call domain.


SQL Syntax

Before digging into the clauses, here is a list of everything that you will be supposed to know in the next sections.

SELECT * 
FROM information_schema.TABLES T
WHERE T.TABLE_SCHEMA = 'information_schema';
  • Clauses names are not case-sensitive (uppercase, lowercase, mix of both)
  • Requests end with a ; (unless they are simple)
  • You can put everything on the same line

More specifically

Notion (General) In SQL
Assignment (PL/SQL only) a = 5, or à := 5
Comments -- comment or /* comment */
a % b MOD(a,b)
Reserved words Select date [...] ❌ (date, name, ... are reserved)
Select `date` [...]
convert/cast
  • Simple CAST
    CAST(value as new_type)
  • Extract something from a date
    EXTRACT(element from some_date)
    With element YEAR, MONTH, DAY, HOUR, ...
Notion (Strings) In SQL
String 'a' or "a" (the latter may not work)
String (escape) ex: use a quote in a quote '\''
Concatenate 'a' || 'b'
Extract chars LEFT(string, count) or RIGHT(string, count)
Others STRCMP(str1, str2), LENGTH('str'), LOWER('str'), UPPER('str')

Note: More functions at W3Schools - SQL Server Functions.
Test a function?: SELECT EXTRACT(DAY from '2020-03-25')
Test a function?: SELECT CAST(15 AS VARCHAR(11))

Types
Notion Type in SQL Description
Char varchar(1) A string of one character
String varchar(n)
varchar2(n) (Oracle)
A string of n character
String text A string taking a non-determined number of characters
Integer int, int(n), number An Integer. int is the same as int(11).
Boolean int(1) An int, but taking only 0 or 1
Real float or real To represents a price, you may use decimal(n,p)
date date To represents a date
Enumeration enum('v1', 'v2', ...) An attribute/value that can take a fixed number of values.

DML (Data Manipulation)

You can see a DML request, as a request returning a table. You will define in SELECT the attributes of your table, in FROM where the tuples are selected, and in WHERE some restrictions on the tuples selected.

Select (Projection)

If I'm using Select name FROM customer, I will get a table with a column "name" and the tuples will be the values for "name" in customer (I should escape name, otherwise this may not work in every DBMS).

-- Select every attribute of a table
SELECT * FROM customer;
SELECT id, name, age, gender FROM customer;

-- no duplicate results (each line once)
SELECT DISTINCT name, gender FROM customer;

-- name is renamed as 'Customer name'
SELECT name as 'Customer name', id FROM customer;
-- escape name
SELECT `name`, id FROM customer;

SELECT 5 FROM customer; -- 5 in every row
SELECT 5, name FROM customer; -- (5, name) in every row
SELECT age + 1 FROM customer; -- simple calculation
-- a bit complex as we are using both renaming
-- and escape (name)
SELECT id as 'c_id', `name`, age FROM customer;

Output: Select SQL output

-- a stupid request with
-- 5 in the first column
-- and the first letter of the name in the 2nd column
SELECT DISTINCT 5 as 'GroupID',
                LEFT(name, 1) as 'Letter'
FROM customer;

Output: Select SQL output

From

You are using columns in select. But they are coming from somewhere... That's where the FROM comes in handy. Pick the table (for more than one, see JOINT clauses) you want to select columns (and their values) from.

-- the basic 🤓
SELECT * FROM customer;
-- prefix (used later, ONLY WHEN NEEDED)
SELECT customer.* FROM customer;
SELECT c.* FROM customer c;
SELECT c.name FROM customer c;

-- this is the cartesian product, you will get
-- count_lines(c1) * count_lines(c2) records
-- LEARN ABOUT JOINTURES LATER
SELECT * FROM customer c1, customer c2;
Where (Restriction)

Most likely the most complex one. You can filter your result using this clause.

-- you can use > >= = <> (same as !=) < <=
SELECT name FROM customer WHERE name <> 'Luna'; -- Henry
SELECT name FROM customer WHERE name = 'Luna'; -- Luna

-- && and || are working too, but we are using AND and OR
Select name FROM customer WHERE name = 'Luna' OR gender <> 'Woman'; -- Both
Select name FROM customer WHERE name = 'Luna' AND gender <> 'Woman'; -- None

SELECT name FROM customer WHERE name IN ('Luna'); -- Luna
SELECT name FROM customer WHERE name IN ('Luna', 'Henry'); -- Both
SELECT name FROM customer WHERE name NOT IN ('Luna');

SELECT name FROM customer WHERE name IS NULL; -- none
SELECT name FROM customer WHERE name IS NOT NULL; -- Both

-- age in [18,24], same as age >= 18 AND age <= 24
SELECT name FROM customer WHERE age BETWEEN 18 AND 24;

You can use patterns too, with _ an unknown character, and % a string of unknown characters that may be empty. BEWARE, patterns aren't case-sensitive (=I is the same as i).

-- any character followed by una
SELECT name FROM customer WHERE name LIKE '_una';
-- Ending with una
SELECT name FROM customer WHERE name LIKE '%una';
-- Starting with Lun
SELECT name FROM customer WHERE name LIKE 'Lun%';

And you can use a nested request

-- you will use the variable c in the nested request.
-- if the request return one row, then EXISTS is true
SELECT name FROM customer c WHERE EXISTS (SELECT NULL);
SELECT name FROM customer c WHERE NOT EXISTS (SELECT NULL);

-- same as previously, but the set is generated with a request
SELECT name FROM customer WHERE name IN (SELECT NULL)
SELECT name FROM customer WHERE name NOT IN (SELECT NULL)

-- the attribute is greater than every value returned in the nested request
-- the nested request must return exactly one attribute
SELECT name FROM customer c WHERE name >= ALL (SELECT NULL, NULL);
-- at least one
SELECT name FROM customer c WHERE name >= ANY (SELECT NULL)

A summary is needed 🧐, here you go ✨🚀.

Select From Where
one attribute one table >, >=, <, <=, =, !=, <>
multiples attributes (,) multiples tables = cartesian product IN, NOT IN (set)
all (*) multiples tables = see JOINT (later) IS/IS NOT (NULL/...)
no duplicates (DISTINCT) prefix a column BETWEEN min and max
rename (... as ...) LIKE 'pattern'
a value EXISTS/NOT EXISTS (request)
a function IN/NOT IN (request)
ALL/ANY

SQL is pretty flexible (in a way). Every value can be replaced with a request returning one column and one row. A table, can be replaced with a request too. This example is simply showing that you can use requests almost everywhere 😎✨.

-- dumb example
SELECT (SELECT 5) as 'five' FROM (SELECT 5) s WHERE 5 = (SELECT 5) 

DML (Data Manipulation) - Useful stuff

LIMIT (Skip results, limit the number of results)

You will use LIMIT a lot, to skip the first k results (optionally), and define the max number of results n.

LIMIT n; -- return up to n records
LIMIT k, n; -- skip k records, returns up to n records
LIMIT 0, 1; -- up to one record
LIMIT 1, 1; -- up to one record, skip the first record
ORDER BY (sort results)

You can sort your results with ORDER BY, with ASC (default=optional, A -> Z), and DESC (Z -> A)

SELECT name FROM customer c ORDER BY name; -- (ASC) Henry, Luna
SELECT name FROM customer c ORDER BY name ASC; -- Henry, Luna
SELECT name FROM customer c ORDER BY name DESC; -- Luna, Henry
SELECT name FROM customer c ORDER BY id DESC; -- Henry (2), Luna (1)
-- THIS IS QUITE USEFUL when you selected a function (ex: AVG)
-- and you want to order your results using this function results
SELECT name FROM customer c ORDER BY 1 DESC; -- 1 = name = 1 rst param
UNION/INTERSECT/EXCEPT on two sets of results

You can make the union, the intersection, or the difference of two requests' results, but they must have the same number of attributes in SELECT. We usually use SELECT NULL to fill missing arguments (you could have used a number such as 20, if you wanted to fill the missing data of the second request with 20 instead of NULL).

SELECT name, age FROM customer c
UNION -- you could use INTERSECT or EXCEPT 
SELECT name, NULL FROM customer c;
-- 4 rows
-- Luna (18), Henry (24)
-- Luna (null), Henry (null)

Note: ORDER, or LIMIT may only be applied on the whole request.

Calculations (on all tuples, on groups of tuples)

You can only make calculations in SELECT and in a new clause HAVING (NOT in the WHERE, you may use a nested request).

-- the most used ones
SELECT SUM(age) FROM customer; -- 18+24
SELECT AVG(age) FROM customer; -- (18+24)/2
SELECT MIN(age) FROM customer; -- 18
SELECT MAX(age) FROM customer; -- 24
SELECT COUNT(*) FROM customer; -- 2 = number of tuples
-- you may add DISTINCT (no duplicates)
-- or ALL (default, allow duplicates)
SELECT COUNT(DISTINCT age) FROM customer; -- 2 (no duplicates ages)
SELECT SUM(DISTINCT age) FROM customer; -- 18 + 24
SELECT SUM(ALL age) FROM customer; -- same as SUM(age)

The clause HAVING is only working on groups (ex: group by name, and check the average age per record having this name). You may create groups of one element by making groups on the primary key as the primary key is unique.

-- this request is returning the sum of age PER GENDER
SELECT gender, SUM(age) FROM customer 
GROUP BY gender -- Woman(18), Not specified (24)

-- same but we are ONLY KEEPING GROUPS for which the SUM is greater (or equals) than 20
SELECT gender, SUM(age) FROM customer 
GROUP BY gender HAVING SUM(age) >= 20 -- Not specified (24)

-- better (' and ` are not needed if there are no spaces)
SELECT gender, SUM(age) as 'Sum of age' FROM customer 
GROUP BY gender HAVING `Sum of age` >= 20
Conditional select
SELECT
	-- concatenate Ms/Mr/none and the name
    CONCAT((CASE gender
                WHEN 'Woman' THEN 'Ms. '
                WHEN 'Man' THEN 'Mr. '
                ELSE '' -- default
        END), name) as 'Name'
FROM customer
-- Ms. Luna
-- Henry

An alternate form, useless here, but you can change the "=" with something else...

SELECT
    CONCAT((CASE WHEN gender = 'Woman' THEN 'Ms. '
                WHEN gender = 'Man' THEN 'Mr. '
                ELSE '' -- default
        END), name) as 'Name'
FROM customer

DML (Data Manipulation) - Joint clause

The last time we tried to use two tables, we got the cartesian product, which is what you will get if the joint clause fails. This clause will try to merge the tables in the FROM. Given the two following tables Customer2 ("A") and Purchase ("B")

Table A Table B

NATURAL JOIN (Jointure Naturelle, >=SQL-92)

Cartesian product based on the columns having the same name.

-- name of every customer that made of purchase
SELECT * FROM customer2 NATURAL JOIN purchase
-- result: [(1, 'Luna', ..., '1', '2021-10-23'), (1, 'Luna', ..., '2', '2021-10-23')]

Beware of Natural Join!!! You may have missed the fact that you got the column 'name' (example) in both tables making your result different from what you were expecting (it happens, really...).

NATURAL JOIN before? (<=SQL-89)

We were making a cartesian product, before filtering the results in the where.

SELECT * FROM customer2 c, purchase p
WHERE c.c_id = p.c_id
JOIN on an attribute (Jointure sur un attribut)

This is called Equi-join. This idea is the same as for NATURAL JOIN, but you are picking the columns this time.

  • Equi-join (attributes), >=SQL-92
SELECT * FROM customer2 JOIN purchase USING (c_id)
-- merge the column c_id available in both table
-- sort of better NATURAL JOIN as you are picking columns
-- you may use ',' to add conditions in 'USING'
  • Equi-join (criteria)
SELECT c.*, p_id, `date` FROM customer2 c JOIN purchase p ON c.c_id = p.c_id
-- same result, we should use using in this case so this is more wordy than usual
-- you may use 'AND' to add conditions in 'JOIN ON'
OUTER JOIN (Jointure externe)

Until now, the row "(2, Henry, ...)" was never shown, because there was no "c_id=2" in purchases. You can show such records using OUTER JOIN.

  • LEFT OUTER JOIN
SELECT * FROM customer2 c LEFT OUTER JOIN purchase p
	ON c.c_id = p.c_id -- p.c_id can be null 
SELECT * FROM customer2 c RIGHT OUTER JOIN purchase p
	ON c.c_id = p.c_id -- c.c_id can be null 
SELECT * FROM customer2 c FULL OUTER JOIN purchase p
	ON c.c_id = p.c_id -- either c.c_id or p.c_id can be null 

You may note that you can chain jointures, for instance

 /* ... */ FROM A NATURAL JOIN B NATURAL JOIN B /* ... */
-- not tested recently 😖
FROM (TABLE_A NATURAL JOIN TABLE_B) b JOIN TABLE_C ON /* ... */

DDL (Data Definition)

...


DCL (Data control)

...


Advanced concepts

...


CheatSheet and conventions

...

  • Conventions

Avoid using reserved keywords, and spaces, use lowercase and snake case, when naming something. Naming is quite important in a DBMS, more than ever, as names will last long.

Do not prefix/postfix attributes such as "id", prefix it in complex requests. Prefix foreign keys with the whole name of the referenced table.

As always, give relevant names to indexes, views, and constraints.


Sources

French