Craig Kerstiens@craigkerstiens
For as powerful as it is, SQL isn't exactly a user friendly language to work in. Debugging SQL can be slow and painful. But, you can write more legible SQL by taking advantage of one of the least used functionalities: CTEs (Common Table Expressions).
CTEs are essentially a temporary view within a query.
CTEs are named, and then can be reference later.
CTEs can build upon each other, essentially you get to create your own smaller foundational building blocks for a query just like you would in code.
There are also recursive CTEs, but we'll save that one for another day.
Lets start to take a look at them.
I'm going to start with a query for a project management tracking app that will give me all users current tasks:
WITH users_tasks AS (
SELECT
users.email,
array_agg(tasks.name) as task_list,
projects.title
FROM
users,
tasks,
project
WHERE
users.id = tasks.user_id
projects.title = tasks.project_id
GROUP BY
users.email,
projects.title
)
Using the above CTE I could query it with:
SELECT *
FROM users_tasks;
But where it gets more interesting is when you want to chain various pieces together. So while I have all tasks assigned to each user here, perhaps I want to then find which users are responsible for more than 50% of the tasks on a given project, thus being the bottleneck. To oversimplify this we could do it a couple of ways, total up the tasks for each project, and then total up the tasks for each user per project:
total_tasks_per_project AS (
SELECT
project_id,
count(*) as task_count
FROM tasks
GROUP BY project_id
),
tasks_per_project_per_user AS (
SELECT
user_id,
project_id,
count(*) as task_count
FROM tasks
GROUP BY user_id, project_id
),
Then we would want to combine and find the users that are now over that 50%:
overloaded_users AS (
SELECT tasks_per_project_per_user.user_id,
FROM tasks_per_project_per_user,
total_tasks_per_project
WHERE tasks_per_project_per_user.task_count > (total_tasks_per_project / 2)
)
Now let me combing all of it to get a comma separated list of tasks from all over loaded people so I can then go and figure out how to reprioritize.
--- Initial query to grab project title and tasks per user
WITH users_tasks AS (
SELECT
users.id as user_id,
users.email,
array_agg(tasks.name) as task_list,
projects.title
FROM
users,
tasks,
project
WHERE
users.id = tasks.user_id
projects.title = tasks.project_id
GROUP BY
users.email,
projects.title
),
--- Calculates the total tasks per each project
total_tasks_per_project AS (
SELECT
project_id,
count(*) as task_count
FROM tasks
GROUP BY project_id
),
--- Calculates the projects per each user
tasks_per_project_per_user AS (
SELECT
user_id,
project_id,
count(*) as task_count
FROM tasks
GROUP BY user_id, project_id
),
--- Gets user ids that have over 50% of tasks assigned
overloaded_users AS (
SELECT tasks_per_project_per_user.user_id,
FROM tasks_per_project_per_user,
total_tasks_per_project
WHERE tasks_per_project_per_user.task_count > (total_tasks_per_project / 2)
)
SELECT
email,
task_list,
title
FROM
users_tasks,
overloaded_users
WHERE
users_tasks.user_id = overloaded_users.user_id
CTEs won’t always be quite as performant as optimizing your SQL to be as concise as possible. But generally what they will be is more readable and thus easier to debug and collaborate with others on.
While the formatting may not look amazing on twitter, give them a shot the next time you have to write some SQL report.