A couple of weeks ago, the production database of a service I develop and maintain started to behave badly. There is a specific hour when traffic to the site increases, because users are starting to use the product. There was a big spike in IO operations, and Amazon RDS was consuming free disk space very fast. There had been no PostgreSQL update and no significant application update. It just started on a Monday morning.

The first thing we (Jakub and I) did was restart the database on Amazon RDS. Yeah, I know … lame as hell, but it helped and bought us time to figure out why Postgres was eating all the disk space.

After checking the running queries with

SELECT datname, query_start, state, pid, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start;

we spotted a couple of queries that seemed easy and light at first glance: SELECT COUNT(DISTINCT ...) FROM .... EXPLAIN showed that they were not as easy as we thought.
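You can see this yourself on any database. The plan for such a query is just an Aggregate over a scan, but with the BUFFERS option EXPLAIN reveals the temp file traffic. A sketch, with data generated on the fly and assuming the default work_mem (4MB) is small enough for the sort to spill:

-- The DISTINCT aggregate sorts its input internally; when the sort does
-- not fit in work_mem, it spills to disk, which shows up as temp buffers.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(DISTINCT i % 1000)
FROM generate_series(1, 1000000) AS s(i);
-- Look for a line like: Buffers: temp read=NNNN written=NNNN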

What I found on the pgsql-performance mailing list:

count(distinct …) always sorts, rather than using a hash, to do its work.

And this was probably why the query with COUNT(DISTINCT ...) was creating big temporary files on disk: when the sort does not fit into work_mem, Postgres spills it to a temp file.

Of course PostgreSQL can tell you everything if you ask in the proper way. Below is how to ask how many temporary files have been created so far, and how many bytes they took, per database.

SELECT datname, temp_files, temp_bytes FROM pg_stat_database;
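These counters are cumulative since the last stats reset, and temp_bytes is raw bytes, so a slightly friendlier variant (a sketch) sorts by the biggest offender and pretty-prints the sizes:

-- Same counters, readable sizes, worst database first.
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_written
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY temp_bytes DESC;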

After a quick fix to the query and a release to production, we noticed that the database was still creating temp files. To be honest, even more temp files, but small ones. PostgreSQL copes well with such small files.
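I won't reproduce the production query here, but the usual rewrite for this pattern pushes the DISTINCT into a subquery, which the planner can execute as a hash aggregate instead of a sort. A sketch with made-up table and column names:

-- events / user_id are made-up names, not the real schema.
-- Before: the DISTINCT aggregate sorts every row to deduplicate.
-- SELECT COUNT(DISTINCT user_id) FROM events;

-- After: the inner DISTINCT can be planned as a HashAggregate,
-- and the outer COUNT(*) is then trivial.
SELECT COUNT(*)
FROM (SELECT DISTINCT user_id FROM events) AS du;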

If you want to check locally how your database behaves when a lot of temp files are created, you can decrease the work_mem setting for your session with SET work_mem = '64kB'.
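A minimal session-level experiment, reusing the generate_series trick from earlier, could look like this:

SET work_mem = '64kB';  -- session-only: sorts bigger than this now spill to disk
SELECT COUNT(DISTINCT i % 1000)
FROM generate_series(1, 100000) AS s(i);
-- The per-database counters should have grown (they can lag a moment):
SELECT temp_files, pg_size_pretty(temp_bytes) AS temp_written
FROM pg_stat_database
WHERE datname = current_database();
RESET work_mem;  -- back to the server default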

What I can advise is to monitor, or at least know how to check, the number of temp files created by Postgres.
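Beyond polling pg_stat_database, Postgres can also log every temp file as it is created via the log_temp_files setting, which makes the offending queries show up in the server log. A sketch for a self-hosted server; on RDS you would set the parameter in the DB parameter group instead:

-- Log the creation of every temp file bigger than 1MB (0 would log all of them).
-- ALTER SYSTEM needs Postgres 9.4+ and is not available on RDS.
ALTER SYSTEM SET log_temp_files = '1MB';
SELECT pg_reload_conf();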

Big kudos to Jakub Tokaj, who helped me with this struggle.