OpenAI to train LLMs on Financial Times content — with permission

It marks the latest in a string of new deals between OpenAI and publishers


The Financial Times (full disclosure: the owner of The Next Web) has inked a deal with OpenAI. The American firm will use the British publisher’s content to train its generative AI models.

The deal is the latest in a string of new partnerships between OpenAI and global news publishers like Axel Springer, Associated Press, and Le Monde. The company did not disclose the financial terms of any of the contracts.

In 2023 alone, hundreds of pages of litigation and countless articles accused tech firms of stealing artists’ and publishers’ work to train their AI models.

OpenAI has come under fire for training its GPT models on content scraped from the web without consent. Last year, The New York Times even sued OpenAI and Microsoft for copyright infringement.

OpenAI’s recent tie-ups with publishers will allow it to continue to train its algorithms on web content. But, this time, it will have permission.

Strategic partnership

The FT called the deal with OpenAI a “strategic partnership.”

The 100 million-plus users of ChatGPT will have direct access to summaries, quotes, and links to the publisher’s articles. This content is usually hidden behind a paywall. OpenAI will attribute all information from the FT to the publication.

In exchange, OpenAI will help the news organisation develop new AI tools. We can confirm that the FT already uses OpenAI products, including ChatGPT Enterprise.

FT Group CEO John Ridding said the publisher was still committed to “human journalism.”

“This is an important agreement in a number of respects,” said Ridding. “It recognises the value of our award-winning journalism and will give us early insights into how content is surfaced through AI.”

“Apart from the benefits to the FT, there are broader implications for the industry. It’s right, of course, that AI platforms pay publishers for the use of their material,” Ridding continued. “OpenAI understands the importance of transparency, attribution, and compensation – all essential for us. At the same time, it’s clearly in the interests of users that these products contain reliable sources.”

Fair use or unfair?

However, just because OpenAI is cosying up to publishers doesn’t mean it has stopped scraping information from the web without permission.

Earlier this month, The New York Times reported that OpenAI was using YouTube transcripts to train its models. According to the publication, this contravenes copyright law, since YouTube creators retain the copyright to the videos they upload to the platform.

OpenAI, however, insists its use of online material constitutes “fair use.” The firm, and many other tech companies, claim their large language models (LLMs) transform information gathered online into something entirely new.

Yet, as we’ve previously reported in-depth, studies have shown that LLMs consistently regurgitate large chunks of their original training text verbatim.

Agreements with publishers could mark a step forward in resolving AI copyright disputes. However, they are likely to remain the exception rather than the rule.

 
