This week, many authors discovered that their books were used without permission to train AI systems. Here’s what you need to know if your books are in the Books3 dataset, as well as actions you can take now to speak out in defense of your rights.
If you’re an author, you may have recently discovered that your published book was included in a dataset of books used to train artificial intelligence systems without your permission. (Search the dataset here.) This can be an unsettling revelation, raising concerns about copyright, compensation, and the future implications of AI. Here’s what you need to know if your work has been used to “train” AI without permission:

Books3 Is One of Several Books Datasets Used to Train AI Systems
The Books3 dataset contains 183,000 books downloaded from pirate sources. Companies including Meta (creator of LLaMA), EleutherAI, and Bloomberg have used it to train their language models. OpenAI has not disclosed training information about GPT-3.5 or GPT-4, the models underlying ChatGPT, so we don’t know whether it also used Books3. Regardless of whether GPT was trained on Books3, the class action lawsuits against OpenAI should uncover more information about the datasets it used, which we believe also include books obtained from pirate sources.
You Don’t Have to Be a Named Plaintiff in the Lawsuits to Benefit From the Outcome
In addition to the recent lawsuit in which the Authors Guild is a named plaintiff, other author class action suits are pending against OpenAI, Meta, and Google. You don’t need to be a named plaintiff in any of these lawsuits to participate, because the named plaintiffs represent their entire class. Even if you don’t fall within one of the classes, an outcome in favor of authors should still benefit you by clarifying that books must be licensed before being used to “train” generative AI.
Read the full article: https://authorsguild.org/news/you-just-found-out-your-book-was-used-to-train-ai-now-what/