From July 8, TfL will track every Wi-Fi-enabled device that travels across the tube network. Want to opt out? Turn your Wi-Fi off.
In the autumn of 2016, Transport for London (TfL) started tracking its passengers. A month-long trial at 54 stations used Wi-Fi signals from phones and other devices to harvest depersonalised data about where, when and how people were using the tube.
TfL has now decided to roll out the scheme across the whole Underground network. Starting on July 8, every device will be tracked as people complete their journeys. The data harvested could lead directly to improvements in how the tube operates, by giving the transport body a much more detailed insight into customer behaviour than has previously been available.
“The benefits this new depersonalised dataset could unlock across our network – from providing customers with better alerts about overcrowding to helping station staff have a better understanding of the network in near-real time – are enormous,” says Lauren Sager Weinstein, TfL’s chief data officer. “By better understanding overall patterns and flows, we can provide better information to our customers and help us plan and operate our transport network more effectively for all.”
The system will use Wi-Fi beacons that are already present in 260 TfL-managed tube stations. As well as serving up internet access, the beacons will log the unique hardware addresses – or MAC addresses – of every device they sniff out, whether those devices are connected to Wi-Fi or not. The same technology is already in use in many shopping malls, museums and other public spaces, which use the data for similar purposes.
TfL’s use of Wi-Fi data is particularly interesting, however, because of its sheer scale. The 2016 trial collected 509 million pieces of data from 5.6m mobile devices on 42m journeys. Until now, all TfL has known about your journey is where you tapped in and out, if you were using an Oyster Card or contactless payments. Wi-Fi can fill in the gaps. Transport planners will be able to see exactly which route between two stations was taken by customers, and how they move around each station.
The trial data contained some intriguing insights, including the convoluted paths that some customers take. While the majority of those travelling between Liverpool Street and Victoria changed at Oxford Circus, two per cent of travellers inexplicably took the Central line to Holborn, then the Piccadilly line to Green Park, then the Victoria line to Victoria. It also revealed that passengers have 18 different ways to get between King’s Cross and Waterloo, and that it takes 86 seconds to get from the ticket hall to the platforms at Victoria.
TfL plans to use the data to model passenger behaviour and squeeze more capacity out of the existing tube network. It can, for example, show how passengers react to problems on the network. When the Waterloo and City line was suspended in December 2016, TfL was able to use Wi-Fi data to see exactly what alternatives people took.
Apps that use TfL data, such as Google Maps and CityMapper, will also be able to use it to incorporate information about delays and congestion. If Wi-Fi beacons detect queues forming in a ticket hall, apps could suggest alternative routes for subsequent travellers.
There’s also a clear commercial incentive – which may be particularly important to TfL given the dual blows of the Crossrail delay and the loss of its central government grant. Documents about the 2016 trial released under the Freedom of Information Act revealed that the data could also be used to inform the placement of advertising inside stations. TfL will know which areas have the most footfall and the longest dwell times. The documents suggest that it could lead to advertising being sold based on journey patterns, so the same adverts will appear at multiple points along a given route.
The spectre of advertising following you to work is a grim one, and there are obvious privacy concerns with this system. TfL is keen to stress that all data is ‘depersonalised,’ and says it worked closely with the Information Commissioner’s Office in deciding how it will be used. The launch press release promises that “individual customer data will never be shared and customers will not be personally identified,” and afterwards TfL confirmed that depersonalised data will be held for two years.
Signs will be placed around the network, similar to those for CCTV, warning passengers that their device may be tracked. The good news is that it is possible to opt out. All you have to do is switch off the Wi-Fi on your device, and it will stop broadcasting your MAC address for TfL to pick up. It’s effective, but not exactly a convenient way to avoid being tracked.
To protect privacy during the 2016 pilot, collected MAC addresses were first hashed, a type of one-way encryption that obscures the actual unique ID of the phone. Then, to tighten the protection, the result was hashed again with a “salt”: a string of arbitrary characters that is thrown into the hashing process to make it harder to crack.
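To make the two-step process concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not TfL's implementation: the article does not say which hash function or salt was used, so SHA-256 and the salt value below are stand-ins, and the function names are hypothetical.

```python
import hashlib


def hash_mac(mac: str) -> str:
    """First pass: one-way hash of the normalised MAC address."""
    normalised = mac.lower().replace("-", ":")
    return hashlib.sha256(normalised.encode("ascii")).hexdigest()


def pseudonymise(mac: str, salt: bytes) -> str:
    """Second pass: hash the first digest again, mixed with a secret salt."""
    first = hash_mac(mac)
    return hashlib.sha256(salt + first.encode("ascii")).hexdigest()


# Assumption: an illustrative salt. A real one would be long, random and secret.
SALT = b"hypothetical-secret-salt"

a = pseudonymise("AA:BB:CC:12:34:56", SALT)
b = pseudonymise("aa:bb:cc:12:34:56", SALT)  # same device, different spelling
c = pseudonymise("AA:BB:CC:12:34:57", SALT)  # a different device

print(a == b)  # the same device always maps to the same pseudonym
print(a == c)  # a different device maps to a different pseudonym
```

The property that makes the data useful is also the one that matters for privacy: the same device always produces the same output, so its sightings can be linked into a journey without the raw MAC address being stored.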
Crucially, what this means is that while the data was depersonalised, it wasn’t completely anonymised. Though there are in principle well over a trillion possible MAC addresses, manufacturers are allocated blocks of numbers, so in practice there are many fewer to check. It’s akin to how guessing a specific mobile phone number is made much easier by knowing that all UK mobile phone numbers start with “07,” which reduces the number of possible options by several orders of magnitude.
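The arithmetic can be sketched in a few lines. A MAC address is 48 bits long, and the first 24 bits are a prefix assigned to a manufacturer. The number of prefixes in real-world use is not given here, so the figure of 30,000 below is purely a hypothetical chosen to show the scale of the reduction.

```python
TOTAL_BITS = 48   # length of a MAC address
PREFIX_BITS = 24  # manufacturer prefix at the start of the address

total = 2 ** TOTAL_BITS                        # every possible MAC address
per_prefix = 2 ** (TOTAL_BITS - PREFIX_BITS)   # addresses under one prefix

# Assumption: a hypothetical count of prefixes actually in use.
ASSUMED_PREFIXES = 30_000
candidates = ASSUMED_PREFIXES * per_prefix

print(f"All possible addresses: {total:,}")
print(f"Addresses per prefix:   {per_prefix:,}")
print(f"Candidates to check:    {candidates:,}")
print(f"Share of full space:    {candidates / total:.2%}")
```

Under that assumption, an attacker would need to consider well under one per cent of the theoretical address space.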
If you know the salt, it could conceivably be possible to create a lookup table that would reveal the real MAC address hiding inside every hash. In other words, at some point in the future, it’s possible to imagine an organisation like the police, for example, asking TfL for the salt and being able to track everyone on the tube.
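A toy version of such a lookup table might look like the sketch below. Everything in it is an assumption made for illustration: SHA-256 stands in for the unnamed hash function, the salt and the `aa:bb:cc` prefix are invented, and only 4,096 addresses are enumerated so that it runs instantly. A real attempt would loop over every address under every prefix of interest.

```python
import hashlib


def pseudonymise(mac: str, salt: bytes) -> str:
    """Hash the MAC, then hash the digest again with the salt (assumed scheme)."""
    first = hashlib.sha256(mac.encode("ascii")).hexdigest()
    return hashlib.sha256(salt + first.encode("ascii")).hexdigest()


def build_lookup(prefix: str, salt: bytes, suffix_count: int) -> dict:
    """Map each pseudonym back to the MAC that produced it, for one prefix."""
    table = {}
    for n in range(suffix_count):
        suffix = f"{n:06x}"  # six hex digits for the device-specific part
        mac = f"{prefix}:{suffix[0:2]}:{suffix[2:4]}:{suffix[4:6]}"
        table[pseudonymise(mac, salt)] = mac
    return table


# Assumptions: an invented salt that has been disclosed, and an invented prefix.
LEAKED_SALT = b"hypothetical-secret-salt"
PREFIX = "aa:bb:cc"

table = build_lookup(PREFIX, LEAKED_SALT, suffix_count=4096)

# A pseudonym as it might appear in the stored dataset.
observed = pseudonymise("aa:bb:cc:00:0a:bc", LEAKED_SALT)
recovered = table.get(observed)
print(recovered)
```

The salt is the only thing standing between the stored hashes and the original addresses, which is why who holds it, and who can ask for it, matters so much.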
For the full rollout, hashing has been replaced with a system of tokenisation, which fully replaces the MAC address with an identifier that cannot be tied back to any personal information. TfL says this is a “more sophisticated mechanism” than what was used before, and that this solution has been fully approved by both the Information Commissioner and the organisation’s own internal cybersecurity team.
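TfL has not published how its tokenisation works, so the following is only a sketch of the general idea, with a hypothetical `Tokeniser` class. The difference from hashing is that the token is drawn at random rather than computed from the MAC address, so there is nothing to reverse with a lookup table. The mapping held in memory becomes the sensitive part, and the sketch assumes it can be discarded.

```python
import secrets


class Tokeniser:
    """Hands out random tokens in place of MAC addresses (illustrative only)."""

    def __init__(self):
        # Assumption: the MAC-to-token mapping is kept in memory only.
        self._vault = {}

    def token_for(self, mac: str) -> str:
        """Return the token for this device, creating a random one if needed."""
        key = mac.lower()
        if key not in self._vault:
            self._vault[key] = secrets.token_hex(16)  # 32 random hex characters
        return self._vault[key]

    def forget_all(self) -> None:
        """Discard the mapping so old tokens can no longer be linked to a MAC."""
        self._vault.clear()


tokeniser = Tokeniser()
first = tokeniser.token_for("AA:BB:CC:12:34:56")
again = tokeniser.token_for("aa:bb:cc:12:34:56")
print(first == again)  # the same device keeps its token while the mapping exists

tokeniser.forget_all()
later = tokeniser.token_for("AA:BB:CC:12:34:56")
print(first == later)  # a fresh random token after the mapping is discarded
```

How long such a mapping is kept, and who can access it, would determine how strong the protection is in practice.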
Wi-Fi tracking clearly has benefits. The tube is already maxed out in many places, and even Crossrail is expected to be at full capacity from day one (whenever that is), so using data to optimise journeys and inform planning could make a real difference. But robust security measures are essential, or we could end up trading our privacy for a slightly smoother commute.
All Rights Reserved for James O’Malley