The Rise Of Machine Learning In The News And Publishing World

Technology has accelerated production and efficiency across most industries today. Business use of artificial intelligence (AI) and machine learning (ML) is reducing dependence on manual intervention. The result is increased efficiency, dependability, and predictability compared to processes that rely heavily on human effort. Machine learning has proven useful across use cases, from identifying the error rate on a production line to diagnosing ailments during a pandemic.

It's a common misconception that ML cannot establish its place in a creative world. Now I may not be the best mind to comment on creativity, but since I’m deeply invested in the news, media, and publishing world, I wanted to describe just how critical ML is to this space.

Understanding audiences

Given the fast-paced nature of digital newsrooms, and the need for quick decision-making, editors in these environments can benefit significantly from the use of machine learning.

The London School of Economics and Political Science (LSE), with support from the Google News Initiative (GNI), aims to help newsrooms do just that. JournalismAI is a project by Polis, LSE's journalism think tank, supported by GNI, and includes a course on the use of AI and machine learning in journalism.

For digital editors, the most commonly used indicators of audience preference are dashboards such as Google Trends, Google News, and general social media trends that align with the interests of the publication's readers. Since these are mainstream indicators that most publications follow, most outlets end up taking the same approach to a topic.

Image: A heatmap of data around Covid hotspots, hospitalization, and vaccination progress across the United States. Source: New York Times

Combining ML with big data sets, however, opens up a new opportunity to offer deeper insights and explain trends more effectively. For example, during the pandemic over the past two years, interactive data storytelling by publications such as the New York Times offered far more clarity and context than the simple reporting of facts. Some of these interactive stories have fascinated me. It’s common to think these are the result of great UI and interfaces.

Image: Interactive display of cases per million across rural areas in the United States. Source: New York Times

What many miss, though, is that these stories would not have been possible without sifting through a vast data set. ML has enabled the real-time processing and interpretation of large data sets. It enables editors to present reams of data and lets users dissect and interpret it as they wish. As a result, readers are not only more informed about a prevalent situation but also increasingly likely to revisit the publication.

Editorial decision making

In the existing setup, digital editors rely on conventional analytical tools such as Google Analytics to understand which pieces of content work well with the publication’s audience. Investigative work, however, can involve data at a very different scale. In October 2021, the International Consortium of Investigative Journalists (ICIJ) published the Pandora Papers, a collection of 11.9 million leaked documents that comprised 2.9 TB of data. The leak implicated 35 global leaders and over 100 billionaires across the globe. Before this, the Panama Papers, at a marginally smaller scale, included 11.5 million documents and 2.6 TB of data.

Such large journalistic efforts involve looking into big data sets. According to the ICIJ, “it used machine learning and tools such as the Fonduer and Scikit-learn software to identify and separate specific forms from longer documents.” ICIJ explains that in cases where the information was present in spreadsheets, “ICIJ removed duplicates and combined it into a master spreadsheet. For PDF or document files, ICIJ used programming languages such as Python to automate data extraction and structuring as much as possible.” In certain cases, “it used machine learning to tag files in Datashare, enabling reporters to exclude them from their searches.”
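
To make the "removed duplicates and combined it into a master spreadsheet" step more concrete, here is a minimal sketch in Python. It is not ICIJ's actual pipeline: the function name `merge_without_duplicates`, the column names, and the sample rows are all hypothetical, and it assumes two records are duplicates when their key fields match after trimming whitespace and ignoring case.

```python
import csv
import io


def merge_without_duplicates(sheets, key_fields):
    """Combine rows from several CSV sheets, keeping the first copy of each record."""
    seen = set()
    master = []
    for sheet in sheets:
        for row in csv.DictReader(io.StringIO(sheet)):
            # Normalize case and whitespace so trivially different copies match.
            key = tuple(row[field].strip().lower() for field in key_fields)
            if key not in seen:
                seen.add(key)
                master.append(row)
    return master


# Hypothetical sample data, invented for illustration only.
sheet_a = "name,jurisdiction\nAcme Holdings,BVI\nBlue Sky Ltd,Panama\n"
sheet_b = "name,jurisdiction\nacme holdings ,BVI\nCoral Trust,Belize\n"

master = merge_without_duplicates([sheet_a, sheet_b], ["name", "jurisdiction"])
for row in master:
    print(row["name"], "|", row["jurisdiction"])
```

Here the second sheet's "acme holdings " row is treated as a copy of "Acme Holdings" and dropped, so the master list holds three records. A real investigation would need far more careful matching rules, since names in leaked records are rarely this clean.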

The use of machine learning and Python can enhance the data processing capabilities of the newsroom. Moreover, these tools can go a long way in ensuring the objective processing of data to yield deeper stories of societal impact.

Audience engagement

Jennifer Brandel is Founder and CEO of Hearken, a tech-enabled company that works with newsrooms. In her Medium post, she explains how newsrooms can effectively use machine learning to unearth audience engagement insights. I find the Philadelphia Inquirer example particularly fascinating. The publication rolled out a feature called Curious Philly to understand what its audience wanted to know. What began as an exercise to become more aware of the local population’s interests brought enough clarity to eventually alter the news product itself.

Hearken’s auto-categorization feature has been used by newsrooms to tweak the category listing in WBEZ’s newsroom. In 2012, the local population that read the Philadelphia Inquirer didn’t regard the economy as an important topic. At least that’s what the newsroom believed. The economy was lower down the order of priority and visibility on the home page. However, an analysis of interest patterns showed that the economy was the second most important category in 2018!
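
As a rough illustration of how sorting audience questions into categories can reveal what readers care about, here is a small Python sketch. It is not Hearken's feature and it is not machine learning: it uses a plain keyword lookup as a stand-in, and the category names, keyword lists, and sample questions are all made up for the example.

```python
from collections import Counter

# Hypothetical keyword lists; a real system would learn categories from data.
CATEGORY_KEYWORDS = {
    "economy": {"jobs", "taxes", "rent", "wages"},
    "transport": {"bus", "train", "traffic"},
    "history": {"founded", "historic", "named"},
}


def categorize(question):
    """Return the first category whose keywords appear in the question."""
    words = set(question.lower().replace("?", "").split())
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words & keywords:
            return category
    return "other"


def rank_categories(questions):
    """Order categories from most to least asked about."""
    counts = Counter(categorize(q) for q in questions)
    return [category for category, _ in counts.most_common()]


# Invented reader questions, for illustration only.
questions = [
    "Why are taxes so high downtown?",
    "Where did the jobs go?",
    "Why is the bus always late?",
    "How was the city named?",
    "Is rent going up again?",
    "Why is traffic so bad?",
]

print(rank_categories(questions))
```

With these sample questions, the economy comes out on top, which is the kind of signal that could prompt a newsroom to rethink where a category sits on its home page.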

These are great examples of listening closely to the interests and priorities of readers and tweaking the approach of a publication to be user-first.

Preparing for the evolution of publishing

A clear takeaway from the examples above is that machine learning offers an element of scalability that is not possible with manual intervention. Moreover, the approach is more data-driven and objective rather than a subjective point of view. Whether it’s looking at facts, sorting, filtering, or optimizing newsroom priorities, machine learning has the potential to transform the industry and prepare it for growth.

Conventional use cases of machine learning involve automating many decisions based on regular patterns in repetitive tasks. However, in the news industry, machine learning can play a pivotal role in presenting the true picture of a story. It offers the biggest opportunity to present “both sides of the story” within the same news piece. Power then lies in the hands of the reader who is able to interpret and understand information that is objectively presented.

Stories based on data are far more objective than editorialized pieces, which tend to be influenced by the human and environmental biases that have come to plague the news industry.

I’d sum up the role of ML in the news industry as having the potential to turn journalism into what it was always meant to be: the honest and objective presentation of facts. That makes me hopeful for the future of news!

Last updated on Mar 28, 2024