Structured Query Language
Structured Query Language, often abbreviated SQL is a normalized language to process queries on a relational database. It's based on a "model" that you defined. If you need something flexible, you should look at NoSQL databases.
Relational databases are provided by a Database management system DBMS (
Système de gestion de base de données (SGBD)). You will need one to manage your databases among
You need to install one, to continue this course. Please, refer to your DBMS documentation 🤓. A temporary solution might be to use SQLiteOnline (🙄).
ENSIIE students got a some notes in the WEB section to use our school database.
SQL 89/92: Both SQL-89 and SQL-92 are quite used. There is one major difference: what we call "JOIN clauses" are not available in SQL-89.
In the second screen, you can see the usual representation of your data in a database. This is literally an Excel table with rows, columns, and entries. In the first screen, this is the UML representation, that we use when creating a database from specifications (see relational modeling).
✨ Some vocabulary ✨
|Element||Name (UML)||Name (DBMS)||Note|
|id, name, age, gender||Attributes||Attributes/Variables||(=columns)|
||Objects||Tuples/Records||(=rows). We are saying "tuples" in French too.|
(Keys, but this is a mistake)
|Keys||A primary key is a set of one or more attributes.
We can find a row in a table, given its values in the primary key.
The primary key as a whole is unique and not-null.
This is used to reference a tuple in another table (Foreign key).
Usually, we are using an artificial key
which is incrementing (+=1) each time we are adding a row.
Here, given "id=1", we know we are talking of "(1, Luna, ...)".
Note: I know that the class's name "customer" should start with an uppercase, as every class should. I should also add something before/after id (as many classes may have an attribute id, so I won't have to prefix it), and I should add something before/after name (as name is a reserved keyword otherwise, I may have to escape it later), for instance: c_id and c_name.
Zest of knowledge
This is a statement in which you are using clauses to do something on the database. A request must ends with a
;(unless this is a simple request).
This is the keywords SELECT, FROM, ... that are making a SQL request. Clauses are optional aside SELECT (even FROM is optional, when selecting a value/function). The clauses
- are declared in THIS order:
- executed in THIS order:
To make it simple, this is a table storing data. Some are saying this is a library of databases. To call a table in a schema, do
- For a user, this is the list of its tables, views, constraints
- For a database, this is the table INFORMATION_SCHEMA (TABLES, USERS, COLUMNS/DOMAINS, CHECK_CONSTRAINTS)
This word is also used for requests 🧐: this is the selected tables, and the attributes.
As the ENSIIE, we got one schema per user (ex:
prenom.nom) in which we got another schema called
public. You will store your databases in it.
DML, DDL, DCL
We are using DML/DDL/DCL, to talk about a category of clauses. This is something quite used by pros 😎.
- DML (Data Manipulation): Select, Insert, Update, Delete
- DDL (Data Definition): Create, Alter, Drop
- DCL (Data control): Grant, Revoke
☑✅ Primary and Foreign keys ☑✅
YOU MUST KNOW THIS. The primary key is explained in the previous section. This is usually one attribute called "id" which is an artificial key. A Foreign key is a set of attributes (=keys) that are taking values from another key.
For instance, a
customer(c_id, c_name, ...)will
purchase(p_id, date, description, ...)things. You will add in the table purchases an attribute
c_id(the name can be different) taking values from the table "customer". You should see this as referencing a customer inside another table.
Well, for an attribute, this is the "range" of the values that your attribute can take. It's made of the type, but sometimes you will add constraints, and all of this is what we call domain.
Before digging into the clauses, here is a list of everything that you will be supposed to know in the next sections.
SELECT * FROM information_schema.TABLES T WHERE T.TABLE_SCHEMA = 'information_schema';
- Clauses names are not case-sensitive (uppercase, lowercase, mix of both)
- Requests end with a ; (unless they are simple)
- You can put everything on the same line
|Notion (General)||In SQL|
|Assignment (PL/SQL only)||
|a % b||
|Notion (Strings)||In SQL|
|String (escape)||ex: use a quote in a quote
Note: More functions at W3Schools - SQL Server Functions.
Test a function?:
SELECT EXTRACT(DAY from '2020-03-25')
Test a function?:
SELECT CAST(15 AS VARCHAR(11))
|Notion||Type in SQL||Description|
||A string of one character|
||A string of n character|
||A string taking a non-determined number of characters|
||An int, but taking only 0 or 1|
||To represents a price, you may use
||To represents a date|
||An attribute/value that can take a fixed number of values.|
DML (Data Manipulation)
You can see a DML request, as a request returning a table. You will define in SELECT the attributes of your table, in FROM where the tuples are selected, and in WHERE some restrictions on the tuples selected.
If I'm using
Select name FROM customer, I will get a table with a column "name" and the tuples will be the values for "name" in customer (I should escape name, otherwise this may not work in every DBMS).
-- Select every attribute of a table SELECT * FROM customer; SELECT id, name, age, gender FROM customer; -- no duplicate results (each line once) SELECT DISTINCT name, gender FROM customer; -- name is renamed as 'Customer name' SELECT name as 'Customer name', id FROM customer; -- escape name SELECT `name`, id FROM customer; SELECT 5 FROM customer; -- 5 in every row SELECT 5, name FROM customer; -- (5, name) in every row SELECT age + 1 FROM customer; -- simple calculation
-- a bit complex as we are using both renaming -- and escape (name) SELECT id as 'c_id', `name`, age FROM customer;
-- a stupid request with -- 5 in the first column -- and the first letter of the name in the 2nd column SELECT DISTINCT 5 as 'GroupID', LEFT(name, 1) as 'Letter' FROM customer;
You are using columns in select. But they are coming from somewhere... That's where the
FROM comes in handy. Pick the table (for more than one, see JOINT clauses) you want to select columns (and their values) from.
-- the basic 🤓 SELECT * FROM customer; -- prefix (used later, ONLY WHEN NEEDED) SELECT customer.* FROM customer; SELECT c.* FROM customer c; SELECT c.name FROM customer c; -- this is the cartesian product, you will get -- count_lines(c1) * count_lines(c2) records -- LEARN ABOUT JOINTURES LATER SELECT * FROM customer c1, customer c2;
Most likely the most complex one. You can filter your result using this clause.
-- you can use > >= = <> (same as !=) < <= SELECT name FROM customer WHERE name <> 'Luna'; -- Henry SELECT name FROM customer WHERE name = 'Luna'; -- Luna -- && and || are working too, but we are using AND and OR Select name FROM customer WHERE name = 'Luna' OR gender <> 'Woman'; -- Both Select name FROM customer WHERE name = 'Luna' AND gender <> 'Woman'; -- None SELECT name FROM customer WHERE name IN ('Luna'); -- Luna SELECT name FROM customer WHERE name IN ('Luna', 'Henry'); -- Both SELECT name FROM customer WHERE name NOT IN ('Luna'); SELECT name FROM customer WHERE name IS NULL; -- none SELECT name FROM customer WHERE name IS NOT NULL; -- Both -- age in [18,24], same as age >= 18 AND age <= 24 SELECT name FROM customer WHERE age BETWEEN 18 AND 24;
You can use patterns too, with
_ an unknown character, and
% a string of unknown characters that may be empty. BEWARE, patterns aren't case-sensitive (=
I is the same as
-- any character followed by una SELECT name FROM customer WHERE name LIKE '_una'; -- Ending with una SELECT name FROM customer WHERE name LIKE '%una'; -- Starting with Lun SELECT name FROM customer WHERE name LIKE 'Lun%';
And you can use a nested request
-- you will use the variable c in the nested request. -- if the request return one row, then EXISTS is true SELECT name FROM customer c WHERE EXISTS (SELECT NULL); SELECT name FROM customer c WHERE NOT EXISTS (SELECT NULL); -- same as previously, but the set is generated with a request SELECT name FROM customer WHERE name IN (SELECT NULL) SELECT name FROM customer WHERE name NOT IN (SELECT NULL) -- the attribute is greater than every value returned in the nested request -- the nested request must return exactly one attribute SELECT name FROM customer c WHERE name >= ALL (SELECT NULL, NULL); -- at least one SELECT name FROM customer c WHERE name >= ANY (SELECT NULL)
A summary is needed 🧐, here you go ✨🚀.
|one attribute||one table||>, >=, <, <=, =, !=, <>|
|multiples attributes (,)||multiples tables = cartesian product||IN, NOT IN (set)|
|all (*)||multiples tables = see JOINT (later)||IS/IS NOT (NULL/...)|
|no duplicates (DISTINCT)||prefix a column||BETWEEN min and max|
|rename (... as ...)||LIKE 'pattern'|
|a value||EXISTS/NOT EXISTS (request)|
|a function||IN/NOT IN (request)|
SQL is pretty flexible (in a way). Every value can be replaced with a request returning one column and one row. A table, can be replaced with a request too. This example is simply showing that you can use requests almost everywhere 😎✨.
-- dumb example SELECT (SELECT 5) as 'five' FROM (SELECT 5) s WHERE 5 = (SELECT 5)
DML (Data Manipulation) - Useful stuff
LIMIT (Skip results, limit the number of results)
You will use LIMIT a lot, to skip the first k results (optionally), and define the max number of results n.
LIMIT n; -- return up to n records LIMIT k, n; -- skip k records, returns up to n records LIMIT 0, 1; -- up to one record LIMIT 1, 1; -- up to one record, skip the first record
ORDER BY (sort results)
You can sort your results with ORDER BY, with ASC (default=optional, A -> Z), and DESC (Z -> A)
SELECT name FROM customer c ORDER BY name; -- (ASC) Henry, Luna SELECT name FROM customer c ORDER BY name ASC; -- Henry, Luna SELECT name FROM customer c ORDER BY name DESC; -- Luna, Henry SELECT name FROM customer c ORDER BY id DESC; -- Henry (2), Luna (1) -- THIS IS QUITE USEFUL when you selected a function (ex: AVG) -- and you want to order your results using this function results SELECT name FROM customer c ORDER BY 1 DESC; -- 1 = name = 1 rst param
UNION/INTERSECT/EXCEPT on two sets of results
You can make the union, the intersection, or the difference of two requests' results, but they must have the same number of attributes in SELECT. We usually use
SELECT NULL to fill missing arguments (you could have used a number such as 20, if you wanted to fill the missing data of the second request with 20 instead of NULL).
SELECT name, age FROM customer c UNION -- you could use INTERSECT or EXCEPT SELECT name, NULL FROM customer c; -- 4 rows -- Luna (18), Henry (24) -- Luna (null), Henry (null)
Note: ORDER, or LIMIT may only be applied on the whole request.
Calculations (on all tuples, on groups of tuples)
You can only make calculations in SELECT and in a new clause HAVING (NOT in the WHERE, you may use a nested request).
-- the most used ones SELECT SUM(age) FROM customer; -- 18+24 SELECT AVG(age) FROM customer; -- (18+24)/2 SELECT MIN(age) FROM customer; -- 18 SELECT MAX(age) FROM customer; -- 24 SELECT COUNT(*) FROM customer; -- 2 = number of tuples -- you may add DISTINCT (no duplicates) -- or ALL (default, allow duplicates) SELECT COUNT(DISTINCT age) FROM customer; -- 2 (no duplicates ages) SELECT SUM(DISTINCT age) FROM customer; -- 18 + 24 SELECT SUM(ALL age) FROM customer; -- same as SUM(age)
The clause HAVING is only working on groups (ex: group by name, and check the average age per record having this name). You may create groups of one element by making groups on the primary key as the primary key is unique.
-- this request is returning the sum of age PER GENDER SELECT gender, SUM(age) FROM customer GROUP BY gender -- Woman(18), Not specified (24) -- same but we are ONLY KEEPING GROUPS for which the SUM is greater (or equals) than 20 SELECT gender, SUM(age) FROM customer GROUP BY gender HAVING SUM(age) >= 20 -- Not specified (24) -- better (' and ` are not needed if there are no spaces) SELECT gender, SUM(age) as 'Sum of age' FROM customer GROUP BY gender HAVING `Sum of age` >= 20
SELECT -- concatenate Ms/Mr/none and the name CONCAT((CASE gender WHEN 'Woman' THEN 'Ms. ' WHEN 'Man' THEN 'Mr. ' ELSE '' -- default END), name) as 'Name' FROM customer -- Ms. Luna -- Henry
An alternate form, useless here, but you can change the "=" with something else...
SELECT CONCAT((CASE WHEN gender = 'Woman' THEN 'Ms. ' WHEN gender = 'Man' THEN 'Mr. ' ELSE '' -- default END), name) as 'Name' FROM customer
DML (Data Manipulation) - Joint clause
The last time we tried to use two tables, we got the cartesian product, which is what you will get if the joint clause fails. This clause will try to merge the tables in the FROM. Given the two following tables Customer2 ("A") and Purchase ("B")
NATURAL JOIN (
Jointure Naturelle, >=SQL-92)
Cartesian product based on the columns having the same name.
-- name of every customer that made of purchase SELECT * FROM customer2 NATURAL JOIN purchase -- result: [(1, 'Luna', ..., '1', '2021-10-23'), (1, 'Luna', ..., '2', '2021-10-23')]
Beware of Natural Join!!! You may have missed the fact that you got the column 'name' (example) in both tables making your result different from what you were expecting (it happens, really...).
NATURAL JOIN before? (<=SQL-89)
We were making a cartesian product, before filtering the results in the where.
SELECT * FROM customer2 c, purchase p WHERE c.c_id = p.c_id
JOIN on an attribute (
Jointure sur un attribut)
This is called Equi-join. This idea is the same as for NATURAL JOIN, but you are picking the columns this time.
- Equi-join (attributes), >=SQL-92
SELECT * FROM customer2 JOIN purchase USING (c_id) -- merge the column c_id available in both table -- sort of better NATURAL JOIN as you are picking columns -- you may use ',' to add conditions in 'USING'
- Equi-join (criteria)
SELECT c.*, p_id, `date` FROM customer2 c JOIN purchase p ON c.c_id = p.c_id -- same result, we should use using in this case so this is more wordy than usual -- you may use 'AND' to add conditions in 'JOIN ON'
OUTER JOIN (
Until now, the row "(2, Henry, ...)" was never shown, because there was no "c_id=2" in purchases. You can show such records using OUTER JOIN.
- LEFT OUTER JOIN
SELECT * FROM customer2 c LEFT OUTER JOIN purchase p ON c.c_id = p.c_id -- p.c_id can be null SELECT * FROM customer2 c RIGHT OUTER JOIN purchase p ON c.c_id = p.c_id -- c.c_id can be null SELECT * FROM customer2 c FULL OUTER JOIN purchase p ON c.c_id = p.c_id -- either c.c_id or p.c_id can be null
You may note that you can chain jointures, for instance
/* ... */ FROM A NATURAL JOIN B NATURAL JOIN B /* ... */ -- not tested recently 😖 FROM (TABLE_A NATURAL JOIN TABLE_B) b JOIN TABLE_C ON /* ... */
DDL (Data Definition)
DCL (Data control)
CheatSheet and conventions
Avoid using reserved keywords, and spaces, use lowercase and snake case, when naming something. Naming is quite important in a DBMS, more than ever, as names will last long.
Do not prefix/postfix attributes such as "id", prefix it in complex requests. Prefix foreign keys with the whole name of the referenced table.
As always, give relevant names to indexes, views, and constraints.