Bad Girls, Good Guys, and Two-Fisted Action: artificial intelligence

Showing posts with label artificial intelligence.

Thursday, September 11, 2025

[Link] What Authors Need to Know About the $1.5 Billion Anthropic Settlement

Today, Anthropic agreed to pay $1.5 billion to settle claims that it downloaded pirated books to train its AI systems—the largest U.S. copyright settlement in history. The parties in Bartz v. Anthropic, one of the major copyright lawsuits brought by authors against an AI company for using pirated books to train its large language models, filed a proposed settlement agreement with the court that would settle the claims regarding the company’s mass piracy in downloading millions of books from notorious pirate sources Library Genesis (LibGen) and PiLiMi and then retaining them in a central library.

The settlement provides that Anthropic will pay $1.5 billion plus interest in cash into a settlement fund, representing the largest U.S. copyright infringement settlement ever and greater than any copyright damages award ever secured. The amount of the award sends a signal to all AI companies that downloading illegal copies of books to train AI comes with a heavy cost and, we expect, will foster further licensing, given the potential enormous liability AI companies risk when they help themselves to books for free from illegal channels.

“This historic settlement is a vital step in acknowledging that AI companies cannot simply steal authors’ creative work to build their AI just because they need books to develop quality LLMs,” said Authors Guild CEO Mary Rasenberger. “It is truly shocking that Anthropic and the other major LLM owners engaged in criminal-level piracy schemes to torrent millions of books knowingly from infamous foreign ebook piracy sites that the publishing industry has actively been trying to take down for years. Imagine the outrage if Anthropic and others had illegally siphoned off electricity to build their AI, claiming it was too expensive to pay for it? These vastly rich companies, worth billions, stole from those earning a median income of barely $20,000 a year. This settlement sends a clear message that AI companies must pay for the books they use just as they pay for the other essential components of their LLMs. This settlement lays down an anchor that it is not okay. We expect that the settlement will lead to more licensing that gives authors both compensation and control over the use of their work by AI companies, as should be the case in a functioning free market society.”

Read the full article: https://authorsguild.org/news/what-authors-need-to-know-about-the-anthropic-settlement/

Saturday, December 7, 2024

[Link] Penguin Random House books now explicitly say ‘no’ to AI training

The copyright page on new books and reprints now says they can’t be used or reproduced ‘for the purpose of training artificial intelligence.’

By Emma Roth

Book publisher Penguin Random House is putting its stance on AI training in print. The standard copyright page on both new and reprinted books will now say, “No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems,” according to a report from The Bookseller spotted by Gizmodo.

The clause also notes that Penguin Random House “expressly reserves this work from the text and data mining exception” in line with the European Union’s laws. The Bookseller says that Penguin Random House appears to be the first major publisher to account for AI on its copyright page.

Read the full article: https://www.theverge.com/2024/10/18/24273895/penguin-random-house-books-copyright-ai

Saturday, November 23, 2024

[Link] AI Audiobook Narrators in OverDrive and the Issue of Library AI Circulation Policy

by SB Sarah

OverDrive is the company that provides a lot of digital content to libraries. If you’ve borrowed an ebook or an audiobook in Libby, or read a magazine in Kanopy, that’s OverDrive.

It seems there is some AI weirdness with audiobook narration on OverDrive, and the narrator is only part of the story.

On Monday, October 14, librarian Robin Bradford posted on Bluesky that she’d purchased an AI audiobook for her library system and she was really upset about it:

Robin Bradford on Bluesky, saying: Good Morning, BlueSky. I'm annoyed at myself today because I bought an AI audiobook that sucks. Clearly, I need to pay more attention to who the narrator is, instead of just buying the title because someone wants to listen to it. Hope your Monday is going better than mine!
Also, when I go to the book author's webpage it is....incredibly bare. I wonder if the whole thing is AI now. Books, audiobooks, everything. Well, the good news is we only have 9 audiobooks, by 3 authors, with that AI voice. The surprising news is one of the authors. The sketch news is all 3 author websites look eerily similar, and now I have more questions than answers. And I'm hungry. My list has grown to 101 titles by a group of "narrators" going by different names. I guess, on the bright side, they are at least labeling them. But I wish they didn't have them at all. And I'm going to be so much more careful about what I purchase even if people want it.

Over 100 titles by AI “narrators” were in their catalog, and Robin was having trouble finding indications that the authors themselves are real?

Interesting.

The authors with AI audiobooks in our catalog: Blake Pierce, Molly Black, Fiona Grace, Rylie Dark, Kate Bold, Ava Strong, Jack Mars, Taylor Stark, Mia Gold, Laura Rise, Audrey Shine, Sophie Love, Ella Swift, Vin Strong, Katie Rush.

Not only is that A LOT of audiobooks, but similar to my casual foray into contemporary romance cover art, I noticed that there was an odd pattern to those author names:

Mostly one or two-syllable first names
One-syllable last names
All very basic nouns and adjectives as surnames

That homogeneity is a little strange, right? Good thing I’m really nosy.

I reached out to Robin for an interview to learn more. What had brought this to her attention? What was her next step? While we were corresponding, we both searched names and author websites for more information.

This is really weird, y’all.

Her investigation started when she received a message from a patron of her library system that there was something wrong with an audiobook they had borrowed.

Read the full article: https://smartbitchestrashybooks.com/2024/10/ai-audiobook-narrators-in-overdrive-and-the-issue-of-library-ai-circulation-policy/

Saturday, September 30, 2023

[Link] You Just Found Out Your Book Was Used to Train AI. Now What?

This week, many authors discovered that their books were used without permission to train AI systems. Here’s what you need to know if your books are in the Books3 dataset, as well as actions you can take now to speak out in defense of your rights.

If you’re an author, you may have recently discovered that your published book was included in a dataset of books used to train artificial intelligence systems without your permission. This can be an unsettling revelation, raising concerns about copyright, compensation, and the future implications of AI. Here’s what you need to know if your work has been used to “train” AI without permission:

Books3 Is One of Several Books Datasets Used to Train AI Systems

The Books3 dataset contains 183,000 books, downloaded from pirate sources. We know that companies like Meta (creators of LLaMA), EleutherAI, and Bloomberg have used it to train their language models. OpenAI has not disclosed training information about GPT 3.5 or GPT 4—the models underlying ChatGPT—so we don’t know whether it also used Books3. Regardless of whether GPT was trained on Books3, the class action lawsuits against OpenAI should uncover more information on the datasets used by OpenAI, which we believe also include books obtained from pirate sources.

You Don’t Have to Be a Named Plaintiff in the Lawsuits to Benefit From the Outcome

In addition to the recent lawsuit in which the Authors Guild is a named plaintiff, there are other author class action suits pending against OpenAI, Meta, and Google. You don’t need to be a named plaintiff in any of these lawsuits to participate because the respective named plaintiffs represent their entire class. Even if you don’t fall within one or more classes, an outcome in favor of authors should benefit you by clarifying that books need to be licensed when used to “train” generative AI.

Read the full article: https://authorsguild.org/news/you-just-found-out-your-book-was-used-to-train-ai-now-what/
