~emersion/soju#234: Potential Postgres database speed-up: most recent message

I've been running Soju for a few months, storing messages in a Postgres database. My install has ~4.6 million messages:

soju=# select count(*) from "Message";
   count
---------
  4629046

I went through some of the code for retrieving messages from Postgres and found a potential optimization for retrieving the most recent message of a given target.

To be honest, I'm not sure how often the most recent message of a target is retrieved. I haven't gotten around to building a custom soju yet to see how much of an impact this makes, but it takes one query from 5 seconds down to well under a second, so it may add up. At some point I'll try patching my copy and see how noticeable the performance improvement is.

The current query for finding the most recent message for a given target is:

	SELECT m.id FROM "Message" AS m, "MessageTarget" as t
	WHERE t.network = $1 AND t.target = $2 AND m.target = t.id
	ORDER BY m.time DESC LIMIT 1

This results in a big hash join once "Message" gets large. Right now on my install this query takes about 5 seconds. Here's the output of EXPLAIN ANALYZE, with the target channel replaced by "(my-target)":

  Limit  (cost=195466.63..195466.75 rows=1 width=12) (actual time=4798.091..4849.794 rows=1 loops=1)
    ->  Gather Merge  (cost=195466.63..195842.09 rows=3218 width=12) (actual time=4753.783..4805.483 rows=1 loops=1)
          Workers Planned: 2
          Workers Launched: 2
          ->  Sort  (cost=194466.61..194470.63 rows=1609 width=12) (actual time=4566.813..4566.829 rows=1 loops=3)
                Sort Key: m."time" DESC
                Sort Method: top-N heapsort  Memory: 25kB
                Worker 0:  Sort Method: top-N heapsort  Memory: 25kB
                Worker 1:  Sort Method: top-N heapsort  Memory: 25kB
                ->  Hash Join  (cost=8.18..194458.56 rows=1609 width=12) (actual time=31.911..4490.601 rows=181417 loops=3)
                      Hash Cond: (m.target = t.id)
                      ->  Parallel Seq Scan on "Message" m  (cost=0.00..189365.14 rows=1930414 width=16) (actual time=0.710..4029.037 rows=1543032 loops=3)
                      ->  Hash  (cost=8.17..8.17 rows=1 width=4) (actual time=28.123..28.129 rows=1 loops=3)
                            Buckets: 1024  Batches: 1  Memory Usage: 9kB
                            ->  Index Scan using "MessageTarget_network_target_key" on "MessageTarget" t  (cost=0.15..8.17 rows=1 width=4) (actual time=28.068..28.078 rows=1 loops=3)
                                  Index Cond: ((network = 1) AND (target = '(my-target)'::text))
  Planning Time: 1.594 ms
  JIT:
    Functions: 37
    Options: Inlining false, Optimization false, Expressions true, Deforming true
    Timing: Generation 5.493 ms, Inlining 0.000 ms, Optimization 10.794 ms, Emission 117.567 ms, Total 133.854 ms
  Execution Time: 4852.882 ms

I noticed that if I supply the target id manually, the query is incredibly fast:

explain analyze select m.id from "Message" as m where m.target=396097 order by m.time limit 1;

  Limit  (cost=0.43..1.63 rows=1 width=12) (actual time=0.380..0.381 rows=1 loops=1)
    ->  Index Scan using "MessageIndex" on "Message" m  (cost=0.43..682062.37 rows=569093 width=12) (actual time=0.371..0.371 rows=1 loops=1)
          Index Cond: (target = 396097)
  Planning Time: 0.753 ms
  Execution Time: 0.448 ms

So I crafted a query that forces that plan: it looks up the MessageTarget id in a subquery, turns it into an array, and takes the first element:

explain analyze select m.id from "Message" as m where m.target=(array(select t.id from "MessageTarget" as t where t.network=1 and t.target='(my-target)'))[1] order by m.time desc limit 1;

  Limit  (cost=8.60..11.09 rows=1 width=12) (actual time=0.396..0.398 rows=1 loops=1)
    InitPlan 1 (returns $0)
      ->  Index Scan using "MessageTarget_network_target_key" on "MessageTarget" t  (cost=0.15..8.17 rows=1 width=4) (actual time=0.188..0.191 rows=1 loops=1)
            Index Cond: ((network = 1) AND (target = '(my-target)'::text))
    ->  Index Scan Backward using "MessageIndex" on "Message" m  (cost=0.43..499757.62 rows=201438 width=12) (actual time=0.395..0.395 rows=1 loops=1)
          Index Cond: (target = ($0)[1])
  Planning Time: 0.698 ms
  Execution Time: 0.527 ms

This should give the same result as the original but be a lot faster with larger message stores:

	SELECT m.id FROM "Message" AS m
	WHERE m.target = (ARRAY(
		SELECT t.id
	FROM "MessageTarget" AS t
		WHERE t.network = $1 AND t.target = $2
	))[1]
	ORDER BY m.time DESC LIMIT 1
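Since soju is written in Go, here's a minimal sketch of how the rewritten query could be run through the standard database/sql package. The helper name and signature are my own invention, not soju's actual storage API, and a real caller would first open a registered Postgres driver:

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// lastMsgQuery is the rewritten query from above: the MessageTarget lookup
// runs once as an InitPlan, so "Message" is probed through its index
// instead of being seq-scanned and hash-joined in full.
const lastMsgQuery = `
SELECT m.id FROM "Message" AS m
WHERE m.target = (ARRAY(
	SELECT t.id
	FROM "MessageTarget" AS t
	WHERE t.network = $1 AND t.target = $2
))[1]
ORDER BY m.time DESC LIMIT 1`

// lastMessageID is a hypothetical helper (not soju's real API): it returns
// the id of the most recent message for the given network/target pair, or
// sql.ErrNoRows if there is none.
func lastMessageID(ctx context.Context, db *sql.DB, network int64, target string) (int64, error) {
	var id int64
	err := db.QueryRowContext(ctx, lastMsgQuery, network, target).Scan(&id)
	return id, err
}

func main() {
	// A real caller would sql.Open a registered Postgres driver first.
	fmt.Println(lastMsgQuery)
}
```

One nice property of the ARRAY form: if no MessageTarget row matches, the array is empty, `(ARRAY(...))[1]` evaluates to NULL, the comparison matches nothing, and the caller simply gets sql.ErrNoRows.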
Status: REPORTED
Submitter: ~jprjr
Assigned to: No-one
Submitted: a month ago
Updated: a month ago
Labels: No labels applied.

~jprjr a month ago

Hi there, another potential speedup.

The core of the Postgres version of ListMessageLastPerTarget is essentially:

SELECT t.target, MAX(m.time) AS latest
FROM "Message" m, "MessageTarget" t
WHERE m.target = t.id and t.network = $1
GROUP BY t.target;

Additional constraints get added depending on parameters, but that seems to be the main part of the query, and again it takes some time on my system:

  GroupAggregate  (cost=0.58..163824.50 rows=6 width=40) (actual time=872.914..6008.324 rows=24 loops=1)
    Group Key: t.target
    ->  Nested Loop  (cost=0.58..163708.67 rows=23154 width=40) (actual time=13.871..5188.214 rows=4462643 loops=1)
          ->  Index Scan using "MessageTarget_network_target_key" on "MessageTarget" t  (cost=0.15..24.26 rows=6 width=36) (actual time=0.070..0.198 rows=24 loops=1)
                Index Cond: (network = 1)
          ->  Index Only Scan using "MessageIndex" on "Message" m  (cost=0.43..25267.39 rows=201335 width=12) (actual time=0.076..185.870 rows=185943 loops=24)
                Index Cond: (target = t.id)
                Heap Fetches: 772903
  Planning Time: 0.933 ms
  JIT:
    Functions: 9
    Options: Inlining false, Optimization false, Expressions true, Deforming true
    Timing: Generation 2.175 ms, Inlining 0.000 ms, Optimization 0.977 ms, Emission 12.658 ms, Total 15.811 ms
  Execution Time: 6012.178 ms
(14 rows)

If instead, a left lateral join is used:

SELECT t.target, l.latest
FROM "MessageTarget" t LEFT JOIN LATERAL (
	SELECT m.target, m.time as latest
	FROM "Message" m
	WHERE m.target = t.id
	ORDER BY m.time DESC LIMIT 1
) AS l ON t.id = l.target
WHERE t.network = $1;

You get something like this:

  Sort  (cost=19.17..19.19 rows=6 width=40) (actual time=1.483..1.487 rows=24 loops=1)
    Sort Key: l.latest DESC
    Sort Method: quicksort  Memory: 26kB
    ->  Nested Loop Left Join  (cost=4.63..19.10 rows=6 width=40) (actual time=0.235..1.387 rows=24 loops=1)
          ->  Bitmap Heap Scan on "MessageTarget" t  (cost=4.20..13.67 rows=6 width=36) (actual time=0.132..0.147 rows=24 loops=1)
                Recheck Cond: (network = 1)
                Heap Blocks: exact=1
                ->  Bitmap Index Scan on "MessageTarget_network_target_key"  (cost=0.00..4.20 rows=6 width=0) (actual time=0.102..0.103 rows=24 loops=1)
                      Index Cond: (network = 1)
          ->  Subquery Scan on l  (cost=0.43..0.89 rows=1 width=12) (actual time=0.049..0.049 rows=1 loops=24)
                Filter: (t.id = l.target)
                ->  Limit  (cost=0.43..0.88 rows=1 width=12) (actual time=0.047..0.048 rows=1 loops=24)
                      ->  Index Only Scan Backward using "MessageIndex" on "Message" m  (cost=0.43..90555.82 rows=201337 width=12) (actual time=0.046..0.046 rows=1 loops=24)
                            Index Cond: (target = t.id)
                            Heap Fetches: 19
  Planning Time: 0.686 ms
  Execution Time: 1.648 ms

That cuts the time down from around 5-6 seconds to under 2 ms.
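A hedged Go sketch of running the lateral-join version through database/sql (again, the helper name and return shape are my own assumptions, not soju's real API). One wrinkle worth noting: because of the LEFT JOIN, a target with no messages comes back with a NULL latest, so the scan needs sql.NullTime:

```go
package main

import (
	"context"
	"database/sql"
	"time"
)

// lastPerTargetQuery is the LEFT JOIN LATERAL query from above: for each
// MessageTarget row, the lateral subquery fetches only the newest message
// via a backward scan of MessageIndex.
const lastPerTargetQuery = `
SELECT t.target, l.latest
FROM "MessageTarget" t LEFT JOIN LATERAL (
	SELECT m.target, m.time AS latest
	FROM "Message" m
	WHERE m.target = t.id
	ORDER BY m.time DESC LIMIT 1
) AS l ON t.id = l.target
WHERE t.network = $1`

// listLastPerTarget is a hypothetical helper (not soju's real API): it maps
// each target of a network to the time of its most recent message, skipping
// targets that have no messages at all.
func listLastPerTarget(ctx context.Context, db *sql.DB, network int64) (map[string]time.Time, error) {
	rows, err := db.QueryContext(ctx, lastPerTargetQuery, network)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	latest := make(map[string]time.Time)
	for rows.Next() {
		var target string
		var t sql.NullTime // NULL when the target has no messages (LEFT JOIN)
		if err := rows.Scan(&target, &t); err != nil {
			return nil, err
		}
		if t.Valid {
			latest[target] = t.Time
		}
	}
	return latest, rows.Err()
}

func main() {}
```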

Like before, I still need to create a soju build with these changes and verify it actually improves performance.

~jprjr a month ago

I implemented these queries and opened a pull request, figured I should link to it from here: https://codeberg.org/emersion/soju/pulls/7
