Problem diagnosis:
Performance on very large GTFS archives (typically aggregates from multiple providers) needs to be addressed. A typical 50-100MB stop_times file isn't a problem; above roughly 500MB the problems become very apparent.
An MDBID to test with is 1078, which is Norway's aggregate (500MB zip, 1GB stop_times table once extracted). Loading through Mobsql goes OK, but the compute cycle for creating _cvconnd takes >1hr on an i7 quad-core. There are mainly 3 bottlenecks in the compute cycle: _vstoptoparent (now resolved), _vconnd, and _vtransfersgen.
Current potential approaches / solutions
Test Feeds:
One idea is to figure out some way to automatically segment larger GTFS feeds. For example, if a route lies entirely within one small locale of a larger feed, it makes sense to load only the {stops,connections,routes} for that smaller locale. This does considerably increase the complexity of the routing logic, so the architecture would need to be fully thought out before implementation.
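To make the segmentation idea a bit more concrete, here is a minimal sketch that keeps only the stops inside a padded bounding box around the request's from/to points, then keeps the connections and routes that stay inside it. Everything here is hypothetical (the `BBox` type, the tuple layouts, the `margin_deg` parameter); it is not how Mobroute or Mobsql load data today, just one possible shape for the filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in decimal degrees (hypothetical helper)."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat, lon):
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def bbox_around(points, margin_deg):
    """Box covering all (lat, lon) points, padded by margin_deg on each side."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return BBox(min(lats) - margin_deg, max(lats) + margin_deg,
                min(lons) - margin_deg, max(lons) + margin_deg)


def segment_feed(stops, connections, box):
    """Filter a feed down to one locale.

    stops:       dict of stop_id -> (lat, lon)
    connections: list of (from_stop_id, to_stop_id, route_id)
    Returns (kept_stops, kept_connections, kept_route_ids).
    """
    kept_stops = {sid: pos for sid, pos in stops.items() if box.contains(*pos)}
    # Keep a connection only if both of its endpoints survived the stop filter.
    kept_conns = [c for c in connections
                  if c[0] in kept_stops and c[1] in kept_stops]
    kept_routes = {c[2] for c in kept_conns}
    return kept_stops, kept_conns, kept_routes
```

A trip that leaves the box and re-enters it would be cut in two by this filter, which is one example of the extra routing complexity mentioned above.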
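Bare minimum Euclidean distance query for benchmarking when reworking _vtransfersgen. Note that stop_lat / stop_lon are in decimal degrees, so euclidean_distance is in degrees too and the `< 5` threshold (and the "5km" in the column alias) is only nominal; the point of the query is to time the full stops-by-stops cross join:
    select count(*) as total_distances_less_than_5km
    from (
      select
        v1.source,
        v2.stop_id,
        sqrt(power(v1.stop_lat - v2.stop_lat, 2) + power(v1.stop_lon - v2.stop_lon, 2)) as euclidean_distance
      from stops v1
      join stops v2
    ) as dists
    where euclidean_distance < 5;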
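The query above compares every stop with every other stop, so the work grows with the square of the stop count. As a point of comparison for the rework, here is a small sketch of a different technique: sort stops by latitude and only compare stops whose latitudes are within the radius of each other, using haversine distance in kilometres rather than degrees. This is a swapped-in approach for illustration, not the current _vtransfersgen logic; the function names and the `(stop_id, lat, lon)` tuple layout are assumptions.

```python
import math
from bisect import bisect_right

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def count_pairs_within(stops, radius_km):
    """Count unordered stop pairs closer than radius_km.

    stops: list of (stop_id, lat, lon) tuples.
    """
    ordered = sorted(stops, key=lambda s: s[1])  # sort by latitude
    lats = [s[1] for s in ordered]
    # Two points within radius_km can differ in latitude by at most this much,
    # because great-circle distance is never less than the latitude gap.
    band_deg = math.degrees(radius_km / EARTH_RADIUS_KM)
    total = 0
    for i, (_, lat, lon) in enumerate(ordered):
        hi = bisect_right(lats, lat + band_deg)
        # Only stops in the latitude band above this one are candidates.
        for _, lat2, lon2 in ordered[i + 1:hi]:
            if haversine_km(lat, lon, lat2, lon2) < radius_km:
                total += 1
    return total
```

Unlike the SQL, this counts each pair once and skips self-pairs, so the two numbers are not directly comparable without adjusting for that and for the different distance metric.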
Sample routes to test for
1077 (functional once generated transfers are wholesale disabled):
./mobroute route -p '{load: {filter: {mdbid: [1077]}}, from: {lat: 52.3791, lon: 4.8999}, to: {lat: 52.0894, lon: 5.1130}, output_formats: [legs, diagnostics, mapurl]}'
./mobroute route -p '{load: {filter: {mdbid: [1077]}}, from: {lat: 52.07272, lon: 4.3227}, to: {lat: 52.2281, lon: 5.1692}, output_formats: [legs, diagnostics, mapurl]}'
Example diagnostics info:
    "diagnostics": {
      "perf_sloadcompute": "3.364085342s",
      "perf_smemload_connections": "2.697538645s",
      "perf_smemload_transfers": "723.211966ms",
      "perf_smemload_walksfrom": "325.92529ms",
      "perf_smemload_walksto": "325.593969ms",
      "perf_smemload_connsverb": "740.238µs",
      "perf_sroute": "65.283806ms",
      "perf_nmemtransfers": 67198,
      "perf_nmemwalksfrom": 34482,
      "perf_nmemwalksfromnonnil": 27863,
      "perf_nmemwalksto": 34482,
      "perf_nmemwalkstononnil": 27863,
      "perf_nmemconns": 288313
    },
Large sources & aggregates (at least 1077) seem to work after import (the transfers table _cvtransfersgen needs to be disabled; _vconnd works as is, though it is space inefficient).
Once #43 is complete, to at least fix import, this would greatly benefit from #9. I have not tested on mobile; the above diagnostics are from a somewhat recent i5 quad-core machine. On mobile (a decently recent phone) I'd expect on the order of 30s to minutes to 'prep' data, for the total of (sloadcompute, smemload_{connections,transfers}). The walksfrom/to calculations would need to stay out of the first stage (routing runtime / memory context prep) for that stage to have re-usability value. The actual CSA algorithm benchmark from above is acceptable (65ms) and is currently the cheapest piece.
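For reference, the first-stage prep total (sloadcompute, smemload_{connections,transfers}) from the diagnostics above adds up to about 6.78s on the i5, roughly 100 times the 65ms routing step. The sketch below shows that arithmetic. It assumes the perf_* strings follow Go-style duration formatting (which is what they look like); `parse_duration_ns`, `prep_total_ns`, and `PREP_KEYS` are illustrative names, not part of Mobroute.

```python
import re
from decimal import Decimal

# Nanoseconds per unit suffix, covering the suffixes seen in the diagnostics
# plus the other Go-style ones (both common encodings of the micro sign).
_UNIT_NS = {
    "ns": 1, "us": 10**3, "\u00b5s": 10**3, "\u03bcs": 10**3,
    "ms": 10**6, "s": 10**9, "m": 60 * 10**9, "h": 3600 * 10**9,
}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|\u00b5s|\u03bcs|ms|s|m|h)")


def parse_duration_ns(text):
    """Parse a string such as '723.211966ms' or '1m30s' into nanoseconds."""
    pos, total = 0, Decimal(0)
    for match in _PART.finditer(text):
        if match.start() != pos:  # reject gaps or unknown characters
            raise ValueError(f"bad duration: {text!r}")
        total += Decimal(match.group(1)) * _UNIT_NS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"bad duration: {text!r}")
    return int(total)


# The fields that make up the first (reusable) prep stage.
PREP_KEYS = (
    "perf_sloadcompute",
    "perf_smemload_connections",
    "perf_smemload_transfers",
)


def prep_total_ns(diagnostics):
    """Sum the prep-stage durations of one diagnostics object."""
    return sum(parse_duration_ns(diagnostics[k]) for k in PREP_KEYS)


# Values copied from the example diagnostics above.
DIAGNOSTICS = {
    "perf_sloadcompute": "3.364085342s",
    "perf_smemload_connections": "2.697538645s",
    "perf_smemload_transfers": "723.211966ms",
    "perf_sroute": "65.283806ms",
}

if __name__ == "__main__":
    prep = prep_total_ns(DIAGNOSTICS)
    route = parse_duration_ns(DIAGNOSTICS["perf_sroute"])
    print(f"prep: {prep / 1e9:.3f}s, route: {route / 1e6:.1f}ms")
```

The same helper could be pointed at diagnostics captured on a phone to check the 30s-to-minutes estimate once that has been measured.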