Recognizing you need help is the first step to getting it.
It’s clear from our conversations with journalists that most folks just don’t know when a machine-learning algorithm might help them with a story.
We’re hoping to change that. Then, once you recognize an algorithm might help you, we’ll give you tools to use, steps to try, and advice on where to get more help.
So to kick it off, here are some situations and feelings you might have where machine learning could help:
- We’ll never be able to read all of these documents.
- What’s unique about this text compared to all the rest?
- My eyes sting from searching these images for the same thing.
- We need to find more records like these in a huge pile of data.
- I could really use a heads-up before this happens again. (Post to come.)
My working theory is that, with some exceptions, machine learning can’t do anything you couldn’t – at least if you had a lot of time and an awesome attention span. But what machine learning can do, it does faster than you.
Machine learning might be able to find emails similar to one you already have. It might be able to help you find frames of a video that contain a senator.
The computer, like a new-to-the-beat colleague, must be explicitly taught some basic knowledge about the question you’re trying to answer. What discussions are typical in this industry? What do senators look like?
With that base knowledge, you can get a computer to solve a few common “shapes” of problems: flagging for you things that might match from among a set of documents, searching through a pile of pictures, filtering a complicated spreadsheet of data points, or sorting a cache of reader tips.
Whether or not the documents or images or data points it finds are actually interesting or newsworthy is still up to you. But machine learning could get you from an unmanageable pile of information to a manageable one. And, like any source, the computer could be confused or mistaken, or it might not have understood your question. So your journalism continues.
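To make the "find more like this one" shape a little more concrete, here is a minimal sketch in Python. It is not a method we've used on a story, and it isn't a trained model: it swaps in a classic text-similarity technique (TF-IDF word weights compared with cosine similarity) as a simple stand-in. The emails are invented for illustration, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_similar`) are our own hypothetical helpers, not part of any library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(documents):
    """Turn each document into a dict mapping word -> TF-IDF weight."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(documents)
    # Count how many documents each word appears in.
    doc_freq = Counter(word for tokens in token_lists for word in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Words that show up in fewer documents get a larger weight.
        vectors.append({
            word: count * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse word-weight dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(known, pile):
    """Return (score, text) pairs from the pile, most similar first."""
    vectors = tfidf_vectors([known] + pile)
    known_vec, pile_vecs = vectors[0], vectors[1:]
    scored = [(cosine(known_vec, vec), text) for vec, text in zip(pile_vecs, pile)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Invented example emails (assumptions for illustration only).
known_email = "Please delete the inspection reports before the audit next week."
pile = [
    "Lunch on Friday? The new place downtown looks good.",
    "Make sure the inspection reports are deleted before the auditors arrive.",
    "Quarterly parking passes are available at the front desk.",
]

ranked = rank_similar(known_email, pile)
for score, text in ranked:
    print(f"{score:.2f}  {text}")
```

The output is a ranked list, not an answer. The top of the list is where you'd start reading, and deciding whether anything there is newsworthy is still the reporter's job.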
The year ahead
We’re doing experiments to see how modern machine learning can answer questions reporters have about their data. We’ll show our work, with the aim of writing guides that help newsrooms solve problems, either on their own or in collaboration with local technologists.
We’ll discuss a variety of situations, possible approaches, and examples of how other journalists have addressed those problems. The list of instances of journalists using machine learning is short (and we may have missed something), but we hope it’ll get longer this year.
We and our partners will do a lot of background research on a topic, gather data, and interview sources before we apply a machine-learning algorithm. Once we get the best result we can from the computer, we’ll still have a lot of journalism to do: reporting the rest of the story, the who, the why, the how – and making sure the algorithm’s results make sense.
We’ll document all the steps we took, the dead-ends, and our eventual success (or failure).
So stay tuned. If you recognize the feelings we mentioned and have a project we might tackle together, drop us a note at firstname.lastname@example.org. And if you’d like an email whenever we have new things to share, give us your address in the box below.