~mil/mobroute-tickets#68: 
Mobsql: GTFS 1084 / insert error duplicate entries (violates stop_times trip_id/stop_sequence unique)

GTFS 1084 error computing downloaded file sql insert failed

Status
REPORTED
Submitter
~sldfjklaskdf
Assigned to
No-one
Submitted
a month ago
Updated
a month ago
Labels
mobsql

~mil a month ago*

Raw error:

2024/09/12 15:27:44 Failed to load DB via mobsql: Error 1084 loading downloaded zip to DB: Import CSV (stop_times.txt) to DB from directory failed: Problem inserting: Error with insert statement: insert into stop_times ('feed_id','feed_id','trip_id','arrival_time','departure_time','stop_id','stop_sequence','stop_headsign','pickup_type','drop_off_type') values(:feed_id,:feed_id,:trip_id,:arrival_time,:departure_time,:stop_id,:stop_sequence,:stop_headsign,:pickup_type,:drop_off_type)
  with map: map[arrival_time:22800 departure_time:22800 drop_off_type:0 feed_id:1084 pickup_type:0 shape_dist_traveled:0.00 stop_headsign:<nil> stop_id:ch:23005:27:3:5 stop_sequence:1 trip_id:1.T0.99-55-E-j23-2.1.R]
  error: UNIQUE constraint failed: stop_times.feed_id, stop_times.trip_id, stop_times.stop_sequence

This is essentially an error with the feed itself, there's a duplicate entry for the trip_id/stop_sequence combination:

cat stop_times.txt | grep '1.T0.99-55-E-j23-2.1.R' | grep ch:23005:27:3:5
"1.T0.99-55-E-j23-2.1.R","06:20:00","06:20:00","ch:23005:27:3:5","1","","0","0","0.00"
"1.T0.99-55-E-j23-2.1.R","06:20:00","06:20:00","ch:23005:27:3:5","1","","0","0","0.00"
"21.T0.99-55-E-j23-2.1.R","11:20:00","11:20:00","ch:23005:27:3:5","1","","0","0","0.00"
"21.T0.99-55-E-j23-2.1.R","11:20:00","11:20:00","ch:23005:27:3:5","1","","0","0","0.00"

These are duplicates; but what would happen if the data differed for two rows with the same trip_id/stop_sequence entries? Which stop_times entry would be valid to take according to GTFS spec? I believe unique on trip_id / stop_sequence as is currently implemented is correct and this is a feed error.

Only way to resolve currently would be to remove the trip_id/stop_sequence unique constraint and handle with upsert but not really sure that's correct, probably something to report to the feed issuer that they have duplicate data.

~sldfjklaskdf a month ago

if its exact duplicates, one can be dropped.

A report to the issuer is fine, but the app should still work, until the issuer fixes it. At the moment there is no way to use it at all. The app could output a warning instead of an error

~mil referenced this from #83 a month ago

~mil a month ago

Potentially can be addressed by #83

Register here or Log in to comment, or comment via email.