Introduction to SQL
SQL is a very important language we used in Relational Database management system. Writing efficient SQL query can help us do lots of things.
Relational model
In the beginning, I would like to talk about relational model. In short, it is a table with different columns. In a more specific definition, it is an approach to managing data using a structure and language consistent with first-order predicate logic. A relation is an unordered set and it contains the relationship of attributes. The primary key in each relation is unique and we could use it to find any relation. A tuple is a set of attribute values in the relation.
SQL language and Relational algebra
There are several important operations in the Relational algebra and I will introduce them with SQL operations.
Select
This operation acts a filter, which can help us get a subset of the tuples we want. We can express the condition to filter the data with a selection predicate. We can also combine multiple predicates using AND.
In SQL, we can use the following command to execute SELECT. We put the predicates after where clause.
SELECT * FROM table WHERE x>10;
Projection
This operation can generate a relation with tuples that contains only the specified attributes. It acts as a subset operation. In is also a very common operation in the SQL
In SQL, we can use the following command to execute Projection. We put attributes after SELECT clause.
SELECT a1,a2 FROM table;
UNION/INTERSECTION/DIFFERENCE
These three operations are not very common in SQL. It can get results from two relations.
Union generates a relation that contains all tuples that appear in either only one or both of the input relations. Intersection generates a relation that contains only the tuples that appear in both of the input relations. Difference generates a relation that contains only the tuples that appear in the first and not the second of the input relations.
(SELECT * FROM R) UNION (SELECT * FROM S);
(SELECT * FROM R) INTERSECT (SELECT * FROM S);
(SELECT * FROM R) EXCEPT (SELECT * FROM S);
PRODUCT/JOIN
The product will generate a relation that contains all possible combinations of tuples from the input relations. If we had M relations in table R and N relations in table S, we will have M*N results after the PRODUCT.
JOIN is to generate a relation that contains all tuples that are a combination of two tuples. There are also some other various kind of JOIN. Inner JOIN is the same as JOIN.
LEFT JOIN returns all records from the left table (table1), and the matched records from the right table (table2).
OUTER JOIN return all records when there is a match in either left (table1) or right (table2) table records.
SELECT * from table1 INNER/LEFT/OUTER JOIN table2 on table1.attr1 = table2.attr2
Advanced SQL
Aggregations + Group By
There are many aggregate functions, including the following operations:
- AVG(col) Return the average col value.
- MIN(col) Return minimum col value.
- MAX(col) Return maximum col value.
- SUM(col) Return sum of values in col.
- COUNT(col) Return # of values for col.
We could also use DISTINCT keyword for COUNT, SUM, AVG.
If we want to get aggregate results based on different groups, we can use group by
to calculate aggregates against each subset.
If we want to filter the result based on aggregation computation, we can use HAVING attr > x
after group by
.
OUTPUT control
For output redirection, we can use INTO table
to store query result to another table. Or CREATE TABLE A (select ....)
can do the same thing.
In order to get ORDER result, we can use ORDER BY COL
and use ASC|DESC
to control the order.
We can also use LIMIT
keyword to control the offset for output.
Nested Queries
We can also use a query inside a clause. For example, we can put a select query after WHERE. select a from b where x in (select x from...)
. Here there are 4 operators we can use.
- ALL→ Must satisfy expression for all rows in subquery
- ANY→ Must satisfy expression for at least one row in sub-query.
- IN→ Equivalent to ‘=ANY()’ .
- EXISTS→ At least one row is returned.
Common Table Expressions
We can use this as a temporary table.
WITH cteSource (maxId) AS ( SELECT MAX(sid) FROM enrolled
),
X(Y) AS (SELECT ....)
SELECT name FROM student, cteSource WHERE student.sid = cteSource.maxId