The gtfstidy project could be used as a drop-in library in front of the mobsql load process. On certain archives, gtfstidy can reduce load size and improve load speed by quite a large factor. This would likely make using a large number of archives much more feasible, and once implemented would solve many issues with respect to aggregate feeds / #19.
The only problem currently is that, afaict, gtfstidy loads each necessary CSV file fully into memory (https://github.com/patrickbr/gtfstidy/issues/17). As a workaround, gtfstidy could be made optional in the load process (on Transito there could be a checkbox to enable/disable it on load). The alternative would be to fork gtfstidy and implement mmap or something similar.
This would also solve issues such as #68 (duplicate rows in stop_times), where the mobsql import fails because we validate against the GTFS spec in the SQL schema.