~mil/mobroute-tickets#96: 
Mobroute: Switch to SQLite RTree for calculating generated transfers table (_vtransfersgen)

Currently the slowest part of the load/compute cycle for most feeds (especially on mobile & resource constrained devices) is creating the generated transfers table. This table is essentially a self-join of stops and filtering based on 3km max distance. The self join component itself is very expensive. This was partially optimized with the 0.5 release in which the core perf was dramatically improved by using a 'coarse' heuristic to determine which stops to consider as potential transfers rather than using haversine distance for all stop combinations. That said, further work can be done based on using RTree for range queries which is included in SQLite by default (https://www.sqlite.org/rtree.html) and is essentially built for such as usecase as range queries for 'nearby' stops. A common piece of user-feedback is compute is slow (#79, #80, #82, #19) and switching to rtree would likely solve most of these issues. Also OOM (#69) likely has to do with this as well & rtree would likely solve this.

In initial benchmarking, using rtree would result in a dramatic perf improvement. For benchmarking, use .timer on in sqlite CLI and see the comparison of the below queries.

Current SQL Logic - e.g. basis of current _vtransfersgen view (constant 0.03 used for simplicity / comparison):

select count(*) from (
with rough_stop_bounds as (
  select
    feed_id,
    rowid as rid,
    stop_id,
    stop_lat,
    stop_lon,
    stop_lat - 0.03 as min_lat,
    stop_lat + 0.03 as max_lat,
    stop_lon - 0.03 as min_lon,
    stop_lon + 0.03 as max_lon
  from
    stops
)
select *
from rough_stop_bounds sb
  join stops s
    on (s.stop_lat between sb.min_lat and sb.max_lat)
    and (s.stop_lon between sb.min_lon and sb.max_lon)
    and (s.feed_id < sb.feed_id or (s.feed_id = sb.feed_id))
)

POC using rtree:

create virtual table stops_ridx using rtree(
   id int,
   lon_min real, lon_max real,
   lat_min real, lat_max real,
   +stop_id text
);

insert into stops_ridx (stop_id, lat_min, lat_max, lon_min, lon_max)
select stop_id, stop_lat, stop_lat, stop_lon, stop_lon from stops;

select count(*)
  from stops_ridx r1
  join stops_ridx r2
on 
  r1.lon_min between r2.lon_min - 0.03 and r2.lon_max + 0.03 and
  r1.lat_min between r2.lat_min - 0.03 and r2.lat_max + 0.03

Benchmarking (feed id 1830, 11k stops):

-- Current implementation
7325326
Run Time: real 2.763 user 2.759060 sys 0.003309

-- Rtree
7326455
Run Time: real 0.579 user 0.366429 sys 0.184305

Benchmarking (feed id 510, 3k stops):

-- Current implementation
1129560
Run Time: real 0.397 user 0.396043 sys 0.000194

-- Rtree
1129853
Run Time: real 0.060 user 0.049771 sys 0.009884

Benchmarking (feed id 1077, 70k stops):

-- Current implementation
47033816
Run Time: real 23.182 user 23.133878 sys 0.037290

-- Rtree
47034481
Run Time: real 2.121 user 1.884138 sys 0.219073
Status
REPORTED
Submitter
~mil
Assigned to
No-one
Submitted
17 days ago
Updated
15 days ago
Labels
high-priority mobroute pending-release performance release:0.8