Relational databases

Most basic concepts

  • Tables

  • Queries

  • Relations

Tables

A table is kind of like a single simple spreadsheet.

It consists of some numbers of rows each consisting of the same columns but with different values.

If you squint, the set of columns in a table is kind of like a class (defines a structure of related values) and the rows are like instances of that class.

Queries

How we find the data in tables. Expressed in SQL, the language you’ve been learning in Select Star SQL.

Simple queries query a single table and can slice things vertically, selecting certain columns, and horizontally, selecting certain rows.

More complex queries can do almost anything such as grouping.

Relations

The key thing about relational databases is that tables are related to each other.

Some related tables

Table Columns
students student_id, name, grade
courses course_id, title, subject
enrollments student_id, course_id, pass_fail

Joins

Conceptually a join is created by making the Cartesian product of all the relevant tables (pair every row int table A with every row in table B) and then filter down to only the rows in the product that we care about.

In practice the database engine will use any of a number of strategies to achieve the same end more efficiently. (Remember the records database assignment from last semester?)

Join example

Students

sqlite> select * from students;
┌────────────┬────────────┬───────┐
│ student_id │    name    │ grade │
├────────────┼────────────┼───────┤
│ 1          │ Joe Schmoe │ 12    │
│ 2          │ Sally Sue  │ 11    │
└────────────┴────────────┴───────┘

Courses

sqlite> select * from courses;
┌───────────┬──────────────────────┬──────────────────┐
│ course_id │        title         │     subject      │
├───────────┼──────────────────────┼──────────────────┤
│ 1         │ Software Engineering │ Computer Science │
│ 2         │ AP Physics C         │ Physics          │
│ 3         │ American Literature  │ English          │
└───────────┴──────────────────────┴──────────────────┘

Enrollments

sqlite> select * from enrollments;
┌────────────┬───────────┬───────────┐
│ student_id │ course_id │ pass_fail │
├────────────┼───────────┼───────────┤
│ 1          │ 1         │ 1         │
│ 2          │ 1         │ 0         │
│ 1          │ 3         │ 0         │
│ 2          │ 2         │ 0         │
└────────────┴───────────┴───────────┘

Join query

sqlite> select students.name, courses.title, pass_fail
   from students
   join courses
   join enrollments using (student_id, course_id);
┌────────────┬──────────────────────┬───────────┐
│    name    │        title         │ pass_fail │
├────────────┼──────────────────────┼───────────┤
│ Joe Schmoe │ Software Engineering │ 1         │
│ Joe Schmoe │ American Literature  │ 0         │
│ Sally Sue  │ Software Engineering │ 0         │
│ Sally Sue  │ AP Physics C         │ 0         │
└────────────┴──────────────────────┴───────────┘

Cross all three tables but only keep those rows where the student_id from students and enrollments match and also the course_id from courses and enrollments.

Normalization

Basic principle: don’t store the same data in more than one place.

For instance in the tables from the preceeding slide, we don’t store student names or course titles in the enrollment table.

That way if a student changes their name or we retitle a course, we only have to change it in one place.

Normalization is a complex topic with many “levels” of normalization.

RDBMS

Relational Database Management System

A piece of software that lets you create and use a database.

E.g. SQLite, MySQL, Postgres, Oracle, Microsoft SQL Server.