A couple of weeks ago the production database of a service I develop and maintain started to behave badly. There is a specific hour when traffic increases, as users start using the product. There was a big peak of IO operations and Amazon RDS was consuming free disk space very fast. There had been no PostgreSQL update and no significant application update. It just started on a Monday morning.
The first thing we (Jakub and I) did was to restart the database on
Amazon RDS. Yeah, I know … lame as hell, but
it helped and gave us time to figure out what was
eating all the disk space.
After checking the running queries with:
SELECT datname, query_start, state, pid, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start;
we spotted a couple of queries that seemed easy and light at the beginning:
SELECT COUNT(DISTINCT ...) FROM ....
EXPLAIN told us that
this was not as easy as we thought.
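If you want to see that for yourself, something along these lines works (the table and column names here are made up for illustration):
-- ANALYZE actually executes the query; BUFFERS additionally reports
-- temporary blocks read/written, which exposes sorts spilling to disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(DISTINCT user_id) FROM events;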
What I found on the
pgsql-performance mailing list:
count(distinct …) always sorts, rather than using a hash, to do its work.
And this was probably why the query with
COUNT(DISTINCT ...) was creating
big temporary files on disk.
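A common workaround, sketched below with hypothetical table and column names, is to push the DISTINCT into a subquery; counting the rows of a DISTINCT subquery can be handled by a hash aggregate instead of a sort:
-- COUNT(DISTINCT user_id) forces a sort of all rows;
-- the subquery form lets the planner pick a hash aggregate.
SELECT COUNT(*)
FROM (SELECT DISTINCT user_id FROM events) AS unique_users;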
Of course PostgreSQL can tell you everything if
you ask in the proper way. Below is how to ask how many temporary files have
been created so far by each database:
SELECT datname, temp_files, temp_bytes FROM pg_stat_database;
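Note that temp_files and temp_bytes are cumulative since the last statistics reset, so to observe only the current behaviour you can reset them first (this wipes all collected statistics for the current database, so use it consciously):
-- Resets statistics counters (including temp_files and temp_bytes)
-- for the current database.
SELECT pg_stat_reset();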
After quickly fixing the query and releasing it to production, we spotted that the database still created temp files. To be honest, even more temp files, but small ones. PostgreSQL copes well with such small files.
If you want to check locally how your database behaves when a lot of temp files are
created, you can decrease the
work_mem setting, as shown below.
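A minimal sketch for a single session (the value below is deliberately tiny, just to force sorts and hashes to spill to disk):
-- Lower work_mem only for the current session.
SET work_mem = '64kB';
-- ... run the suspicious query, check EXPLAIN (ANALYZE, BUFFERS) and pg_stat_database ...
RESET work_mem;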
What I can advise is to monitor, or at least know how to check, the number of temporary files created by Postgres; one option is sketched below.
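One way to do that continuously, assuming you can change server settings, is the log_temp_files parameter, which logs every temporary file above a given size in kilobytes (on Amazon RDS this is changed through a parameter group rather than ALTER SYSTEM):
-- 0 means: log the creation of every temporary file, regardless of size.
ALTER SYSTEM SET log_temp_files = 0;
SELECT pg_reload_conf();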
Big kudos to Jakub Tokaj, who helped me with this struggle.