GTFS-RT and SIRI feeds contain different data

Given the data quality issues with the GTFS-RT feed, we wanted to see if SIRI was better.

Our aim was to gain an initial understanding of how similar these two data feeds are. For example, to what extent do they report the same locations for the same bus? Do the two feeds contain the same non-location information about each bus?

Starting at 4pm on 10th December 2024, we downloaded the live locations of every bus in England from the BODS feed in both GTFS-RT and SIRI formats, every 10 seconds for approximately one hour.

Unfortunately, the two formats don’t have many fields in common, so we relied on coordinates, vehicle_id and timestamps to match the data.

To make the two comparable, we converted the data we collected into flat CSVs, removed duplicate data, and rounded the coordinates to 5 decimal places (1.1m precision at the equator).
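As a rough illustration of that normalisation step, here is a minimal Python sketch. The column names (vehicle_id, timestamp, latitude, longitude) and the sample rows are assumptions made for the example, not the actual layout of our CSVs.

```python
def normalise(rows):
    """Return de-duplicated match keys with coordinates rounded to 5 decimal places."""
    keys = set()  # a set drops exact duplicates automatically
    for row in rows:
        keys.add((
            row["vehicle_id"],                    # hypothetical column name
            int(row["timestamp"]),                # hypothetical column name
            round(float(row["latitude"]), 5),     # roughly 1.1 m at the equator
            round(float(row["longitude"]), 5),
        ))
    return keys


# Illustrative rows only: the first two differ beyond the 5th decimal place.
sample = [
    {"vehicle_id": "BUS-1", "timestamp": "1733846400",
     "latitude": "51.5007291", "longitude": "-0.1246254"},
    {"vehicle_id": "BUS-1", "timestamp": "1733846400",
     "latitude": "51.5007312", "longitude": "-0.1246261"},
    {"vehicle_id": "BUS-1", "timestamp": "1733846410",
     "latitude": "51.5009000", "longitude": "-0.1246000"},
]

records = normalise(sample)
print(len(records))
```

Applying the same function to both feeds gives two sets of keys that can be compared directly.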

For data downloaded every 10s:

  • SIRI reported 2,172,071 bus locations in the hour.
  • GTFS-RT reported 1,642,849 locations in the hour.
  • 1,218,178 were common to both.
  • 957,833 appeared only in SIRI.
  • 424,671 appeared only in GTFS-RT.
  • Number of unique vehicle_ids in GTFS-RT: 29,479
  • Number of unique vehicle_ids in SIRI: 23,031
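Counts like the ones above can be produced with plain set operations once each feed is reduced to a set of match keys. The sketch below assumes keys of the form (vehicle_id, timestamp, lat, lon); the function name and sample values are made up for illustration and are not our real data.

```python
def compare_feeds(siri, gtfs_rt):
    """Summarise the overlap between two sets of (vehicle_id, timestamp, lat, lon) keys."""
    return {
        "common": len(siri & gtfs_rt),           # reported by both feeds
        "siri_only": len(siri - gtfs_rt),        # reported only by SIRI
        "gtfs_rt_only": len(gtfs_rt - siri),     # reported only by GTFS-RT
        "siri_vehicles": len({key[0] for key in siri}),
        "gtfs_rt_vehicles": len({key[0] for key in gtfs_rt}),
    }


# Tiny made-up example: one shared record, two SIRI-only, one GTFS-RT-only.
siri = {
    ("A", 100, 51.50073, -0.12463),
    ("A", 110, 51.50090, -0.12460),
    ("B", 100, 53.40000, -2.90000),
}
gtfs_rt = {
    ("A", 100, 51.50073, -0.12463),
    ("C", 100, 52.00000, -1.00000),
}

summary = compare_feeds(siri, gtfs_rt)
print(summary)
```

Because a record only counts as common when every part of the key matches exactly, small differences in timestamps or coordinates between the feeds will show up as "only in" rows.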

It appears that GTFS-RT covers a larger number of buses, but with lower frequency of location updates.

To investigate this further, we randomly picked a bus and looked at the data for it in each feed over an hour, sampled at 10s intervals. We looked at the location, bearing and timestamp of each record, and did not alter the data in any way.

GTFS-RT (top) vs SIRI (bottom) reported locations for the same bus vehicle_id.

The SIRI feed has far more points in it, but the GTFS-RT feed has points that don’t exist in the SIRI feed. It feels as if both feeds are taking separate snapshots of a much higher-resolution dataset, with neither quite capturing everything. It also looked as though certain timestamps showed up in GTFS-RT roughly 50 seconds later than they did in SIRI. If that delay holds more generally, SIRI would be the better choice for “live” apps that show you where a bus is.
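One way to put a number on that kind of delay is to record, for each vehicle timestamp, the first poll at which it appeared in each feed, then subtract. This is a sketch under assumptions rather than the exact method we used: it presumes each observation is stored as a (poll_time, vehicle_timestamp) pair in seconds, and the function names and sample values are hypothetical.

```python
def first_seen(observations):
    """Map each vehicle timestamp to the earliest poll time at which it appeared."""
    seen = {}
    for poll_time, vehicle_timestamp in observations:
        if vehicle_timestamp not in seen or poll_time < seen[vehicle_timestamp]:
            seen[vehicle_timestamp] = poll_time
    return seen


def appearance_lag(siri_obs, gtfs_rt_obs):
    """Seconds by which GTFS-RT trails SIRI, for timestamps present in both feeds."""
    siri_first = first_seen(siri_obs)
    gtfs_first = first_seen(gtfs_rt_obs)
    shared = siri_first.keys() & gtfs_first.keys()
    return {ts: gtfs_first[ts] - siri_first[ts] for ts in shared}


# Made-up observations for one bus: (poll_time, vehicle_timestamp), in seconds.
siri_obs = [(0, 1000), (10, 1000), (10, 1010), (20, 1020)]
gtfs_rt_obs = [(50, 1000), (60, 1000), (60, 1010), (70, 1030)]

lag = appearance_lag(siri_obs, gtfs_rt_obs)
print(lag)
```

With a 10-second polling interval, any lag measured this way is only accurate to within about 10 seconds.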

Why are there more unique buses in the GTFS-RT feed, but fewer reported locations? Shouldn’t the two feeds contain the same location information?