Spotting circling helicopters

When a helicopter circles overhead, I always wonder what’s up.

It’s a sign, a trigger. The longer it circles, the more likely I’ll check Twitter or a local news site to find out what’s going on.

Take that a step further, and that sign – a circling helicopter – could be a good tip for local reporters. Something worth checking out.

So we wondered: Could we train a computer to spot circling helicopters?

It turns out, we can.

Getting the data

Nearly all aircraft broadcast their identity and location via a system called ADS-B, helping other pilots and air traffic control keep track of them. Those signals are not encrypted and are relatively easy to detect on the ground with an antenna, a hobby computer, and a little code.
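To give a flavor of that "little code": here's a hedged sketch using the open-source pyModeS library to decode a single raw ADS-B message. (This is purely illustrative — it's not necessarily the receiver stack we use; the sample message comes from the pyModeS documentation.)

```python
# Sketch: decoding one raw ADS-B message with the open-source pyModeS
# library. Illustrative only -- not necessarily the stack we run.
import pyModeS as pms

# A sample 112-bit ADS-B message, as hex, from the pyModeS docs.
msg = "8D4840D6202CC371C32CE0576098"

print(pms.icao(msg))           # the aircraft's unique 24-bit address
print(pms.adsb.typecode(msg))  # message type (typecodes 1-4 carry the callsign)
print(pms.adsb.callsign(msg))  # the flight's callsign
```

Point a software-defined radio dongle at the sky, feed the raw messages through code like this, and you have your own feed of aircraft positions.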

It’s thousands of these receivers that power real-time maps like FlightAware and Flightradar24. Those sites are a great source for flight data, and Flightradar24 is where BuzzFeed got data for its great work detecting hidden spy planes.

But you can collect your own data, too. With a couple of these receivers, we now track all aircraft flying around New York City – including NYPD helicopters. Quartz’s Jeremy B. Merrill even set up a nifty bot that posts a message in our Slack every time one of those choppers is aloft.

Two maps of helicopter trails, in a screenshot from the Slack application.
Our #bot-preschool Slack channel.

But is it circling?

Even better would be knowing not just when a ‘copter is in the air, but when one is flying in circles. That’s the sign.

Merrill has been working on a machine learning model to spot those moments based on rows and rows of data we have about each aircraft’s speed, direction, altitude, and so on. It’s all part of a larger project we hope will be useful for any newsroom (so more to come).
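Merrill's model is the real machine-learning approach; but to see the signal it's hunting for, consider that circling shows up as cumulative heading change. A helicopter that turns through a full 360° without unwinding is probably looping. Here's a minimal heuristic sketch of that idea (my own illustration, not Merrill's model — the threshold is arbitrary):

```python
# Heuristic sketch: flag "circling" when the net heading change along a
# track exceeds a full turn. An illustration of the signal a model
# looks for, not the actual machine-learning approach.

def is_circling(headings, threshold_deg=360):
    """headings: sequence of compass headings (degrees) over time."""
    total = 0.0
    for prev, cur in zip(headings, headings[1:]):
        delta = (cur - prev + 180) % 360 - 180  # signed turn, -180..180
        total += delta
    return abs(total) >= threshold_deg

# A chopper flying loops: its heading sweeps through two full circles.
loop = [h % 360 for h in range(0, 720, 15)]
print(is_circling(loop))      # True

# Mostly straight flight, with small wobbles.
straight = [90, 92, 88, 91, 90, 89]
print(is_circling(straight))  # False
```

A rule like this is brittle — real tracks have gaps, hover-turns, and wind corrections — which is exactly why a learned model over speed, direction, and altitude is more promising.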

Meanwhile, I was taking a free online class called Practical Deep Learning for Coders from fast.ai, a machine-learning organization (and code library) I like. Their motto is “making neural nets uncool again.”

The first lessons were about image recognition, and I quickly learned how to teach a computer to identify dog and cat breeds(!) and to distinguish real bears from teddy bears. At one point instructor Jeremy Howard mentioned that someone had turned computer-mouse movements into data pictures so that an image-recognition algorithm could detect patterns, such as fraud.

Maps as “data pictures”

That got me thinking: Those maps Merrill’s bot is posting in Slack are basically data pictures. I can see when a helicopter is circling. Could I train a computer to see the same thing?

I went to the directory where we store the helicopter maps and grabbed some of the most recent – 183 in all. I sorted them into two folders: 1 for circling, and 0 for not circling. I then pointed the fast.ai code at those two folders.

Six maps of helicopter trails over New York City. Those showing circle patterns are tagged with 1; those without are tagged with 0.
Teaching the model: 1 is circling, 0 is not circling.

Following the lesson steps, in my pajamas while home sick, I quickly got a reliable accuracy of 89% – which is pretty great. The computer guessed correctly almost 9 out of 10 times! While preparing this blog post and running through the steps anew, with some additional tweaks I learned, I hit 94% accuracy.

That is pretty dang amazing.
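For reference, the whole pipeline is only a handful of lines. Here's a sketch using the current fastai API (the course used an earlier version of the library; the folder name, image size, and epoch count are my guesses, and my notebook linked below is the authoritative version):

```python
# Sketch of the fast.ai workflow, assuming map images sorted into
# subfolders named "1" (circling) and "0" (not circling).
from fastai.vision.all import *

# Label each image by its parent folder name; hold out 20% for validation.
dls = ImageDataLoaders.from_folder(
    "helicopter-maps", valid_pct=0.2, seed=42,
    item_tfms=Resize(224),
)

# Start from resnet34, pretrained on ImageNet, and fine-tune briefly.
learn = vision_learner(dls, resnet34, metrics=accuracy)
learn.fine_tune(4)
```

That's it — the library handles the data augmentation, transfer learning, and training loop.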

Not only that, I could apparently see where the computer was getting distracted.

Nine pictures showing actual and predicted scores for each map.
Images the model was most wrong about. First digit is its guess, second digit is the reality. Where they match, the model was less confident, even though it guessed correctly.
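That “most wrong” grid comes from fastai's built-in interpretation tools. A sketch, assuming a trained fastai `Learner` named `learn` like the one the course steps produce:

```python
# Sketch: inspecting the model's worst mistakes, given a trained
# fastai Learner named `learn` (an assumption -- see the notebook).
from fastai.vision.all import ClassificationInterpretation

interp = ClassificationInterpretation.from_learner(learn)

interp.plot_top_losses(9)       # the nine images it was most wrong (or least sure) about
interp.plot_confusion_matrix()  # circling vs. not-circling error counts
```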

With a little more code, I could ask the model about fresh, single images:

Map of a single helicopter route, which includes several loops.
“Category 1” is a correct guess!
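Asking about a single image looks roughly like this — a sketch assuming the trained model was exported first; both filenames are placeholders:

```python
# Sketch: classifying one fresh map image with a saved model.
# Assumes the model was exported earlier with learn.export();
# the filenames here are placeholders.
from fastai.vision.all import load_learner

learn = load_learner("circling-model.pkl")
category, index, probs = learn.predict("fresh-map.png")
print(category, probs[index])  # predicted folder label and the model's confidence
```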

This could run on an everyday server, so we can pretty easily add this detection to our Slack bot.
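Wiring a detection into Slack could be as simple as posting to an incoming webhook. A standard-library sketch — the webhook URL and message wording are placeholders, not our bot's actual code:

```python
import json
import urllib.request

def build_alert(n_helicopters, circling):
    """Build a Slack webhook payload for a detection (placeholder wording)."""
    status = "circling" if circling else "in the air"
    return {"text": f"{n_helicopters} NYPD helicopter(s) {status} right now"}

def post_to_slack(payload, webhook_url):
    """POST the payload to a Slack incoming webhook (placeholder URL)."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

payload = build_alert(1, circling=True)
print(payload["text"])  # 1 NYPD helicopter(s) circling right now
```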

Quick takeaways

Here are some of the things I learned:

  • This is not hard. I encourage you to check out the fast.ai course. With the images I had available to me, I got a lot done quickly.

  • I learned that I could get good results with very few images. One hundred eighty-three images is not “big data.” This works mainly because I’m building on a pretrained network architecture, called resnet34, whose weights were trained on the famous ImageNet data set. (ImageNet, you should be aware, has a US and European bias.) With more map images, I bet it would be even better.

  • I’m pretty sure that if we generated maps without the base map, just the helicopter paths over blank “ground,” the computer would get even better.
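To test that last hunch, one could render each track as bare pixels on blank “ground.” Here's a standard-library-only sketch that rasterizes a path into a plain-text PGM image; the grid size, coordinates, and filename are all arbitrary choices of mine:

```python
import math

def rasterize_track(points, width=64, height=64):
    """Render (lat, lon) points onto a blank grid: 0 = path, 255 = empty ground."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    lat_min, lat_span = min(lats), (max(lats) - min(lats)) or 1.0
    lon_min, lon_span = min(lons), (max(lons) - min(lons)) or 1.0
    grid = [[255] * width for _ in range(height)]
    for lat, lon in points:
        x = int((lon - lon_min) / lon_span * (width - 1))
        y = int((lat - lat_min) / lat_span * (height - 1))
        grid[height - 1 - y][x] = 0  # flip vertically so north is up
    return grid

def save_pgm(grid, path):
    """Write the grid as a plain-text PGM image file."""
    with open(path, "w") as f:
        f.write(f"P2\n{len(grid[0])} {len(grid)}\n255\n")
        for row in grid:
            f.write(" ".join(map(str, row)) + "\n")

# A tiny circular track (coordinates are illustrative, not a real flight).
track = [(40.70 + 0.02 * math.sin(t / 3), -74.00 + 0.02 * math.cos(t / 3))
         for t in range(20)]
save_pgm(rasterize_track(track), "track.pgm")
```

Images like these strip away every distraction the base map adds — streets, parks, labels — leaving only the shape the model needs to judge.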

This also got me thinking anew about visualizing data. While on the WNYC Data News Team, I thought hard about how to make data visualizations readable by the general public. Now I’m imagining all the ways one might make data visualizations easier for a computer to read.

You can do this

We’re working to make all of our code easy to use and to try online, but we’re still experimenting with the best way to do that. In the meantime, you can see my code in this Jupyter notebook. And if you’re intrepid, you can spin up a powerful computer for just a few bucks, clone the repository, and try it yourself.

And if you’re a journalist with a project where you think this kind of sorting could really help, please let us know at bots@qz.com. Finally, if you’d like to know when our copter-spotting project is ready for your newsroom, add your name to our email list below.