Tables
Queries
Relations
A table is kind of like a single simple spreadsheet.
It consists of some numbers of rows each consisting of the same columns but with different values.
If you squint, the set of columns in a table is kind of like a class (defines a structure of related values) and the rows are like instances of that class.
How we find the data in tables. Expressed in SQL, the language you’ve been learning in Select Star SQL.
Simple queries query a single table and can slice things vertically, selecting certain columns, and horizontally, selecting certain rows.
More complex queries can do almost anything such as grouping.
The key thing about relational databases is that tables are related to each other.
Table | Columns |
---|---|
students |
student_id , name , grade |
courses |
course_id , title , subject |
enrollments |
student_id , course_id , pass_fail |
Conceptually a join is created by making the Cartesian product of all the relevant tables (pair every row int table A with every row in table B) and then filter down to only the rows in the product that we care about.
In practice the database engine will use any of a number of strategies to achieve the same end more efficiently. (Remember the records database assignment from last semester?)
sqlite> select * from students;
┌────────────┬────────────┬───────┐
│ student_id │ name │ grade │
├────────────┼────────────┼───────┤
│ 1 │ Joe Schmoe │ 12 │
│ 2 │ Sally Sue │ 11 │
└────────────┴────────────┴───────┘
sqlite> select * from courses;
┌───────────┬──────────────────────┬──────────────────┐
│ course_id │ title │ subject │
├───────────┼──────────────────────┼──────────────────┤
│ 1 │ Software Engineering │ Computer Science │
│ 2 │ AP Physics C │ Physics │
│ 3 │ American Literature │ English │
└───────────┴──────────────────────┴──────────────────┘
sqlite> select * from enrollments;
┌────────────┬───────────┬───────────┐
│ student_id │ course_id │ pass_fail │
├────────────┼───────────┼───────────┤
│ 1 │ 1 │ 1 │
│ 2 │ 1 │ 0 │
│ 1 │ 3 │ 0 │
│ 2 │ 2 │ 0 │
└────────────┴───────────┴───────────┘
sqlite> select students.name, courses.title, pass_fail
from students
join courses
join enrollments using (student_id, course_id);
┌────────────┬──────────────────────┬───────────┐
│ name │ title │ pass_fail │
├────────────┼──────────────────────┼───────────┤
│ Joe Schmoe │ Software Engineering │ 1 │
│ Joe Schmoe │ American Literature │ 0 │
│ Sally Sue │ Software Engineering │ 0 │
│ Sally Sue │ AP Physics C │ 0 │
└────────────┴──────────────────────┴───────────┘
Cross all three tables but only keep those rows where the student_id
from students
and enrollments
match and also the course_id
from courses
and enrollments
.
Basic principle: don’t store the same data in more than one place.
For instance in the tables from the preceeding slide, we don’t store student names or course titles in the enrollment table.
That way if a student changes their name or we retitle a course, we only have to change it in one place.
Normalization is a complex topic with many “levels” of normalization.
Relational Database Management System
A piece of software that lets you create and use a database.
E.g. SQLite, MySQL, Postgres, Oracle, Microsoft SQL Server.