~mil/mobroute-tickets#19: 
Mobroute: Excessive-sized GTFS Archives Performance

Problem diagnosis:

Performance on very large GTFS archives (typically aggregates from multiple providers) should be addressed. A typical 50-100MB stop_times table isn't a problem; in excess of 500MB, the problems are very apparent.

An MDBID to test is 1078, Norway's aggregate (500MB zip, 1GB stop_times table extracted). The load process through Mobsql goes OK, but the compute cycle for creating _cvconnd takes >1hr on a quad-core i7. There are three main bottlenecks in the compute cycle: _vstoptoparent (now resolved), _vconnd, and _vtransfersgen.

Current potential approaches / solutions:

  • Optional generated transfers (#43)
  • Use gtfstidy prior to import (#83)
  • Further optimize _vconnd view
  • Further optimize _vtransfersgen view
  • Further optimize CSV loading via mobsql

Test Feeds:

  • 1078: Norway Aggregate (500MB Zip, 1GB stop_times)
  • 1077: Netherlands Aggregate (200MB Zip, 800MB stop_times)
  • 1090: Germany Aggregate (200MB Zip, 1.3GB stop_times)

Status
REPORTED
Submitter
~mil
Assigned to
No-one
Submitted
8 months ago
Updated
a month ago
Labels
mobroute performance

~mil 8 months ago

One idea is to figure out some way to automatically segment larger GTFS feeds; for example, if a route lies entirely within one small locale of a larger region, it makes sense to load only {stops,connections,routes} for that smaller locale. This very much increases routing-logic complexity, so the architecture would need to be fully thought out before implementation.
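As a rough illustration of the segmentation idea (a sketch only: it assumes a SQLite database with a standard GTFS stops table; the function name is hypothetical, not mobroute code), loading only the stops inside a locale's bounding box might look like:

```python
import sqlite3

def stops_in_bbox(con, min_lat, max_lat, min_lon, max_lon):
    """Return stop_ids whose coordinates fall inside the bounding box.

    Sketch only: a real segmentation scheme would also need to restrict
    connections and routes to trips touching the selected stops.
    """
    rows = con.execute(
        "SELECT stop_id FROM stops "
        "WHERE stop_lat BETWEEN ? AND ? AND stop_lon BETWEEN ? AND ?",
        (min_lat, max_lat, min_lon, max_lon),
    ).fetchall()
    return [r[0] for r in rows]
```

The hard part, as noted above, is not the spatial filter itself but making the routing layer aware that only a segment of the network is loaded.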

~mil 7 months ago*

Bare-minimum euclidean distance query for benchmarking the rework of _vtransfersgen:

select count(*) as total_distances_less_than_5km
from (
  select v1.stop_id as from_stop, v2.stop_id as to_stop,
  sqrt(power(v1.stop_lat - v2.stop_lat, 2) + power(v1.stop_lon - v2.stop_lon, 2)) as euclidean_distance
  from stops v1
  cross join stops v2
) as dists where euclidean_distance < 5;
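Note that the query above compares raw degree deltas, so the `< 5` threshold is in degrees rather than kilometres; that is fine for a bare-minimum benchmark, but an actual transfers cutoff would want a real distance. For reference, a haversine sketch in Python (the function name is illustrative, not mobroute code):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    rlat1, rlon1, rlat2, rlon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = rlat2 - rlat1, rlon2 - rlon1
    a = sin(dlat / 2) ** 2 + cos(rlat1) * cos(rlat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km = mean Earth radius
```

At high latitudes (e.g. Norway) the degree-based euclidean distance overestimates east-west distances considerably, since one degree of longitude shrinks with cos(latitude).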

~mil closed duplicate ticket #39 7 months ago

~whynothugo closed duplicate ticket #39 7 months ago

~mil 7 months ago*

Sample routes to test for 1077 (functional once generated transfers are disabled wholesale):

./mobroute route -p '{load: {filter: {mdbid: [1077]}}, from: {lat: 52.3791, lon: 4.8999}, to: {lat: 52.0894, lon: 5.1130}, output_formats: [legs, diagnostics, mapurl]}'
./mobroute route -p '{load: {filter: {mdbid: [1077]}}, from: {lat: 52.07272, lon: 4.3227}, to: {lat: 52.2281, lon: 5.1692}, output_formats: [legs, diagnostics, mapurl]}'

Example diagnostics info:

"diagnostics": {
    "perf_sloadcompute": "3.364085342s",
    "perf_smemload_connections": "2.697538645s",
    "perf_smemload_transfers": "723.211966ms",
    "perf_smemload_walksfrom": "325.92529ms",
    "perf_smemload_walksto": "325.593969ms",
    "perf_smemload_connsverb": "740.238µs",
    "perf_sroute": "65.283806ms",
    "perf_nmemtransfers": 67198,
    "perf_nmemwalksfrom": 34482,
    "perf_nmemwalksfromnonnil": 27863,
    "perf_nmemwalksto": 34482,
    "perf_nmemwalkstononnil": 27863,
    "perf_nmemconns": 288313
  },

Large sources & aggregates (at least 1077) seem to work after import (the generated transfers table _cvtransfersgen needs to be disabled; _vconnd works as-is, though it is space-inefficient).

Once #43 is complete to at least fix import, this would greatly benefit from #9. I have not tested on mobile, but the above diagnostics are from a somewhat recent quad-core i5 machine; on mobile (a decently recent phone) I'd expect on the order of 30s to several minutes to 'prep' data (the total of sloadcompute and smemload_{connections,transfers}). The walksfrom/walksto calculations would need to stay out of the first stage (routing-runtime / memory-context prep) to have reusability value. The actual CSA algorithm benchmark above is acceptable (65ms) and is currently the cheapest piece.


~mil closed duplicate ticket #39 6 months ago

~mil referenced this from #64 a month ago
