SQL window functions OVER PARTITION
As I wrote earlier in the post about SQL filter clause I like SQL. It is so compact, with a small amount of code, great results can be achieved. Once for some time, I have a revelation and I can write a query using so-called modern techniques. Recently I used OVER PARTITION BY
.
This SQL feature is related to Window Functions. It allows you to perform a calculation on other rows that are somehow connected to the current row. Seems cryptic, right?
So imagine that you have purchases
and those purchases have a price
, name
of the sold product and date of transaction.
name | price | created_at |
---|---|---|
ball | 9.99 | 2020-03-30 14:00 |
hat | 19.99 | 2020-03-30 15:10 |
ball | 9.99 | 2020-03-31 04:14 |
ball | 9.99 | 2020-03-31 10:45 |
hat | 9.99 | 2020-03-31 16:08 |
cup | 4.99 | 2020-03-31 19:36 |
When you need to know how much you earn each day, you will execute the SQL query with simple aggregation.
SELECT created_at::date AS day, SUM(price)
FROM purchases
GROUP BY 1;
day | sum |
---|---|
2020-03-30 | 29.98 |
2020-03-31 | 34.96 |
IF you need to know how much balls, hats and mugs you sold and how much you earn each day?
SELECT created_at::date AS day, name, COUNT(*), SUM(price)
FROM purchases
GROUP BY 1, 2;
day | name | count | sum |
---|---|---|---|
2020-03-30 | ball | 1 | 9.99 |
2020-03-30 | hat | 1 | 19.99 |
2020-03-31 | ball | 2 | 19.98 |
2020-03-31 | hat | 1 | 19.99 |
2020-03-31 | cup | 1 | 4.99 |
Cool, that was easy. What a great data analyst you are! ;-)
Now lets add one more column to above result - sum of daily all earnings, no matter which product was sold.
SELECT
created_at::date AS day,
name,
COUNT(*),
SUM(price),
SUM(SUM(price)) OVER (PARTITION BY created_at::date) AS daily_sum
FROM purchases
GROUP BY 1, 2;
Result will look like:
day | name | count | sum | daily_sum |
---|---|---|---|---|
2020-03-30 | ball | 1 | 9.99 | 29.98 |
2020-03-30 | hat | 1 | 19.99 | 29.98 |
2020-03-31 | ball | 2 | 19.98 | 44.96 |
2020-03-31 | hat | 1 | 19.99 | 44.96 |
2020-03-31 | cup | 1 | 4.99 | 44.96 |
It it is not ideal:
daily_sum
is duplicated acros rows with the sameday
- this
SUM(SUM(price))
looks ugly
For sure there is some more elegant solution for such use case. But this one suits my needs perfectly.
- I got the sum & count grouped by day and product name
- I got the daily sum grouped by day
- all is in one result set
OVER
can be also used without GROUP BY
. Over internet you can find plenty of examples with employees salary. If you need choose then go with GROUP BY
as it is probably better optimized in your database.
I found it really handy when you need to perform some kind of aggregation but you don’t want to loose any rows from your dataset.