As I wrote earlier in the post about SQL filter clause I like SQL. It is so compact, with a small amount of code, great results can be achieved. Once for some time, I have a revelation and I can write a query using so-called modern techniques. Recently I used OVER PARTITION BY.

This SQL feature is related to Window Functions. It allows you to perform a calculation on other rows that are somehow connected to the current row. Seems cryptic, right?

So imagine that you have purchases and those purchases have a price, name of the sold product and date of transaction.

name price created_at
ball 9.99 2020-03-30 14:00
hat 19.99 2020-03-30 15:10
ball 9.99 2020-03-31 04:14
ball 9.99 2020-03-31 10:45
hat 9.99 2020-03-31 16:08
cup 4.99 2020-03-31 19:36

When you need to know how much you earn each day, you will execute the SQL query with simple aggregation.

SELECT created_at::date AS day, SUM(price)
FROM purchases
day sum
2020-03-30 29.98
2020-03-31 34.96

IF you need to know how much balls, hats and mugs you sold and how much you earn each day?

SELECT created_at::date AS day, name, COUNT(*), SUM(price)
FROM purchases
GROUP BY 1, 2;
day name count sum
2020-03-30 ball 1 9.99
2020-03-30 hat 1 19.99
2020-03-31 ball 2 19.98
2020-03-31 hat 1 19.99
2020-03-31 cup 1 4.99

Cool, that was easy. What a great data analyst you are! ;-)

Now lets add one more column to above result - sum of daily all earnings, no matter which product was sold.

  created_at::date AS day,
  SUM(SUM(price)) OVER (PARTITION BY created_at::date) AS daily_sum
FROM purchases
GROUP BY 1, 2;

Result will look like:

day name count sum daily_sum
2020-03-30 ball 1 9.99 29.98
2020-03-30 hat 1 19.99 29.98
2020-03-31 ball 2 19.98 44.96
2020-03-31 hat 1 19.99 44.96
2020-03-31 cup 1 4.99 44.96

It it is not ideal:

  • daily_sum is duplicated acros rows with the same day
  • this SUM(SUM(price)) looks ugly

For sure there is some more elegant solution for such use case. But this one suits my needs perfectly.

  • I got the sum & count grouped by day and product name
  • I got the daily sum grouped by day
  • all is in one result set

OVER can be also used without GROUP BY. Over internet you can find plenty of examples with employees salary. If you need choose then go with GROUP BY as it is probably better optimized in your database.

I found it really handy when you need to perform some kind of aggregation but you don’t want to loose any rows from your dataset.