Find tables on which Postgresql doing Sequential Scan

First let’s understand why we should find out the tables for Sequential Scan. Basically what happens is when we design the schema and create indexes, we want that almost in every case the system should use indexes for getting the data, but later somebody writes a query which dosen’t use indexes and do sequential scan and slows down your system at scale.

To solve this part we should first find out on what tables Postgresql is doing sequential scan. For this postgresql provides stats.

Lets do a query on the stats to find tables for sequential scan:

SELECT schemaname, relname,seq_scan, seq_tup_read,seq_tup_read / seq_scan as avg_seq_tup_read FROM pg_stat_all_tables WHERE seq_scan > 0 ORDER BY 5 DESC LIMIT 5;

schemaname | relname  | seq_scan | seq_tup_read | avg_seq_tup_read 
------------+----------+----------+--------------+------------------
 public     | tickets  |        3 |     24000000 |          8000000
 public     | orders5  |       11 |     50000000 |          4545454
 public     | student  |       18 |      5600010 |           311111
 pg_catalog | pg_class |    89061 |     27452103 |              308
 pg_catalog | pg_index |       35 |         1174 |               33
(5 rows)

The above query tells on which tables sequential scan is done , seq_scan provides the count of sequential scan , seq_tup_read tells the total no. of rows read by the system in all the sequential scan and avg_seq_tup_read tells on average how much rows is read by the query.

Now lets see how to figure out whether this is fine or not:

  • First list all the tables on which sequential scan should not be running – in our case lets take tickets tables
  • Now get the total no. of records in tickets table. – in our case it is – 8000000
  • Now check whether system is scanning the entire table or large portion of the table every time
    • as we can see that avg_seq_tup_read is also 8000000 which means system is scanning entire table every time
  • If the system scanning entire table and you know this table would be huge in future that means some queries on this table are actually not using the index – which is not good
  • check from postgresql logs – the queries on this table which take time (there may be a case you need to enable all query logs for postgresql for some time — generally do not enable this for long , might cause downtime in production , a ok setting would be log any query which is greater than 1 sec)
  • Then check using explain analyze whether this query is doing sequential scan – (if in doubt do not run in production system or run in very low load hours)
  • After you discovered the query try to create indexes – we have a blogs on understanding indexes and why system not using your index , you can check out these if you are unable to figure out.
Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s