5 August 2024

Some notes on the SQL GROUP BY statement

by Jacob Dichter

August 5, 2024

I was reviewing some StrataScratch SQL questions and found I still lacked a good intuition for writing SQL even though logically parsing someone else’s written SQL statements seemed straightforward.

SELECT 
    user_id,
    ROUND(SUM(listen_duration) / 60) AS total_listen_duration,
    COUNT(DISTINCT song_id) AS unique_song_count
FROM 
    listening_habits
GROUP BY 
    user_id;

The blog post will continue with notes on GROUP BY relatedness to a for-loop in programming, and the importance of understanding the SQL order of execution.

11/23/24

I wanted to add a few more notes for this same idea arose when querying my US Census trade data. First, I want to count the records for a given state:

SELECT COUNT(*)
FROM state_exports
WHERE state = 'AR';

Simple - this returns a single value representing the numbers of rows that satify the WHERE condition, regardless of null values. Next I wanted to know how many unique commodities each state had. I could get the value for a single state:

SELECT COUNT(DISTINCT(export_commodity))
FROM state_exports
WHERE state = 'AR'

But how do I get this result for each state?

We use the GROUP BY clause. We add the state variable to the SELECT clause and to the GROUP BY clause.

SELECT state, COUNT(DISTINCT(export_commodity))
FROM state_exports
GROUP BY state
tags: