As I wrote earlier in the post about SQL filter clause I like SQL. It is so compact, with a small amount of code, great results can be achieved. Once for some time, I have a revelation and I can write a query using so-called modern techniques. Recently I used
OVER PARTITION BY.
This SQL feature is related to Window Functions. It allows you to perform a calculation on other rows that are somehow connected to the current row. Seems cryptic, right?
So imagine that you have
purchases and those purchases have a
name of the sold product and date of transaction.
When you need to know how much you earn each day, you will execute the SQL query with simple aggregation.
SELECT created_at::date AS day, SUM(price) FROM purchases GROUP BY 1;
IF you need to know how much balls, hats and mugs you sold and how much you earn each day?
SELECT created_at::date AS day, name, COUNT(*), SUM(price) FROM purchases GROUP BY 1, 2;
Cool, that was easy. What a great data analyst you are! ;-)
Now lets add one more column to above result - sum of daily all earnings, no matter which product was sold.
SELECT created_at::date AS day, name, COUNT(*), SUM(price), SUM(SUM(price)) OVER (PARTITION BY created_at::date) AS daily_sum FROM purchases GROUP BY 1, 2;
Result will look like:
It it is not ideal:
daily_sumis duplicated acros rows with the same
For sure there is some more elegant solution for such use case. But this one suits my needs perfectly.
- I got the sum & count grouped by day and product name
- I got the daily sum grouped by day
- all is in one result set
OVER can be also used without
GROUP BY. Over internet you can find plenty of examples with employees salary. If you need choose then go with
GROUP BY as it is probably better optimized in your database.
I found it really handy when you need to perform some kind of aggregation but you don’t want to loose any rows from your dataset.