New York Examiner News

    Science

    Meta’s AI memorised books verbatim – that could cost it billions

    By Admin, June 11, 2025

    In April, book authors and publishers protested Meta’s use of copyrighted books to train AI

    Vuk Valcic/Alamy Live News

    Billions of dollars are at stake as courts in the US and UK decide whether tech companies can legally train their artificial intelligence models on copyrighted books. Authors and publishers have filed multiple lawsuits over this issue, and in a new twist, researchers have shown that at least one AI model has not only used popular books in its training data, but also memorised their contents verbatim.

    Many of the ongoing disputes revolve around whether AI developers have the legal right to use copyrighted works without first asking permission. Previous research found many of the large language models (LLMs) behind popular AI chatbots and other generative AI programs were trained on the “Books3” dataset, which contains nearly 200,000 copyrighted books, including many pirated ones. The AI developers who trained their models on this material have argued that they did not violate the law because an LLM puts out fresh combinations of words based on its training, transforming rather than replicating the copyrighted work.

    But now, researchers have tested multiple models to see how much of that training data they can spit back out verbatim. They found that many models do not retain the exact text of the books in their training data – but one of Meta’s models has memorised almost the entirety of certain books. If judges rule against the company, the researchers estimate that this could make Meta liable for at least $1 billion in damages.

    “That means, on the one hand, that AI models are not just ‘plagiarism machines’, as some have alleged, but it also means that they do more than just learn general relationships between words,” says Mark Lemley at Stanford University in California. “And the fact that the answer differs model to model and book to book means that it is very hard to set a clear legal rule that will work across all cases.”

    Lemley previously defended Meta in a generative AI copyright case called Kadrey v Meta Platforms. Authors whose books had been used to train Meta’s AI models filed a class-action suit against the tech giant for breach of copyright. The case is still being heard in the Northern District of California.

    In January 2025, Lemley announced he had dropped Meta as a client, although he said he still believed the company should win the case. Emil Vazquez, a Meta spokesperson, says “fair use of copyrighted materials is vital” to developing the company’s AI models. “We disagree with Plaintiffs’ assertions, and the full record tells a different story,” he says.

    In this latest research, Lemley and his colleagues tested AI memorisation of books by splitting small book excerpts into two parts – a prefix and a suffix section – and seeing whether a model prompted with the prefix would respond with the suffix. For example, they split one quote from F. Scott Fitzgerald’s The Great Gatsby into the prefix “They were careless people, Tom and Daisy – they smashed up things and creatures and then retreated” and the suffix “back into their money or their vast carelessness, or whatever it was that kept them together, and let other people clean up the mess they had made.”
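
    The prefix-and-suffix test described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the helper names `split_excerpt` and `completes_verbatim` are hypothetical, and the two "models" are toy stand-ins for a real text-generation call.

```python
from typing import Callable

def split_excerpt(excerpt: str, prefix_words: int) -> tuple[str, str]:
    """Split an excerpt into a prefix and a suffix at a word boundary."""
    words = excerpt.split()
    prefix = " ".join(words[:prefix_words])
    suffix = " ".join(words[prefix_words:])
    return prefix, suffix

def completes_verbatim(model: Callable[[str], str], prefix: str, suffix: str) -> bool:
    """Return True if the model's continuation of `prefix` starts with `suffix`."""
    continuation = " ".join(model(prefix).split())  # normalise whitespace
    return continuation.startswith(suffix)

# The Great Gatsby excerpt quoted in the article.
EXCERPT = (
    "They were careless people, Tom and Daisy – they smashed up things "
    "and creatures and then retreated back into their money or their vast "
    "carelessness, or whatever it was that kept them together, and let "
    "other people clean up the mess they had made."
)

# 17 whitespace-separated words reproduce the prefix given in the article.
prefix, suffix = split_excerpt(EXCERPT, prefix_words=17)

def memorising_model(prompt: str) -> str:
    """Toy stand-in (hypothetical): returns the rest of the excerpt word for word."""
    return EXCERPT[len(prompt):].strip() if EXCERPT.startswith(prompt) else ""

def paraphrasing_model(prompt: str) -> str:
    """Toy stand-in (hypothetical): continues with different wording."""
    return "into their wealth and let others tidy up after them."

print(completes_verbatim(memorising_model, prefix, suffix))
print(completes_verbatim(paraphrasing_model, prefix, suffix))
```

    With a real LLM, `model` would wrap whatever generation interface that model exposes. The check itself stays the same: does the continuation begin with the held-back suffix?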

    Based on their findings, the researchers estimated the probability that each AI model would complete the excerpts verbatim. Then they compared those probabilities with the odds of models doing so by random chance.
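
    The article does not say how these probabilities were calculated. One standard approach for language models, assumed here, is to multiply the probability the model assigns to each successive token of the suffix. The sketch below compares that figure with a chance baseline of uniform guessing over the vocabulary. The baseline, the token probabilities, the suffix length and the vocabulary size are all illustrative assumptions, not figures from the study.

```python
import math

def sequence_log_prob(token_probs: list[float]) -> float:
    """Log-probability of a whole suffix: the sum of per-token log-probabilities."""
    return sum(math.log(p) for p in token_probs)

def chance_log_prob(num_tokens: int, vocab_size: int) -> float:
    """Log-probability of guessing every token uniformly at random (assumed baseline)."""
    return -num_tokens * math.log(vocab_size)

# Hypothetical numbers, chosen only to show the scale of the difference.
SUFFIX_TOKENS = 50
VOCAB_SIZE = 100_000
memorised = [0.98] * SUFFIX_TOKENS      # model is near-certain of every next token
not_memorised = [0.05] * SUFFIX_TOKENS  # model finds each token plausible, not certain

for label, probs in [("memorised", memorised), ("not memorised", not_memorised)]:
    log_p = sequence_log_prob(probs)
    print(f"{label}: log10(probability) = {log_p / math.log(10):.1f}")

baseline = chance_log_prob(SUFFIX_TOKENS, VOCAB_SIZE)
print(f"random chance: log10(probability) = {baseline / math.log(10):.1f}")
```

    Working in logarithms avoids numerical underflow, because the product of dozens of small probabilities quickly becomes too small to represent directly.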

    The excerpts included chunks of text from 36 copyrighted books, including popular titles such as George R. R. Martin’s A Game of Thrones and Sheryl Sandberg’s Lean In. The researchers also tested excerpts from books written by plaintiffs in the Kadrey v Meta Platforms case.

    The researchers ran these experiments on 13 open-source AI models, including models developed and released by Meta, Google, DeepSeek, EleutherAI and Microsoft. Microsoft declined to comment, and most of the other companies besides Meta did not respond to requests for comment.

    Such testing revealed that Meta’s Llama 3.1 70B model has memorised most of the first book in J. K. Rowling’s Harry Potter series, as well as The Great Gatsby and George Orwell’s dystopian novel 1984. Most of the other models had memorised very little of the books, including sample books written by the lawsuit plaintiffs. Meta declined to comment on these results.

    The researchers estimate that an AI model found to have infringed on the copyright of just 3 per cent of the Books3 dataset could lead to a statutory damages award of nearly $1 billion – and possibly even larger awards based on AI developers’ profits related to that infringement.
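
    The arithmetic behind a figure of that size can be reconstructed roughly. The sketch below rounds Books3 to 200,000 titles (the article says "nearly 200,000") and applies the per-work statutory damages range in US copyright law (17 U.S.C. § 504(c)): $750 to $30,000 per work, rising to as much as $150,000 per work for wilful infringement. That the researchers' estimate used the wilful maximum is an assumption, not something the article states.

```python
# Assumed inputs: the dataset size is rounded up from the article's "nearly 200,000".
BOOKS3_TITLES = 200_000
INFRINGED_PERCENT = 3

# US statutory damages per infringed work, 17 U.S.C. § 504(c).
STATUTORY_MIN = 750
STATUTORY_MAX = 30_000
WILFUL_MAX = 150_000

def statutory_award(works: int, per_work: int) -> int:
    """Total award if every infringed work attracts the same per-work amount."""
    return works * per_work

infringed_works = BOOKS3_TITLES * INFRINGED_PERCENT // 100

for label, per_work in [
    ("minimum", STATUTORY_MIN),
    ("ordinary maximum", STATUTORY_MAX),
    ("wilful maximum", WILFUL_MAX),
]:
    total = statutory_award(infringed_works, per_work)
    print(f"{label}: {infringed_works:,} works x ${per_work:,} = ${total:,}")
```

    Under these assumptions the wilful maximum gives $900 million, in line with the "nearly $1 billion" figure. Awards based on profits would be calculated separately.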

    This technique could be a “good forensic tool” for identifying the extent of AI memorisation, says Randy McCarthy at the Hall Estill law firm in Oklahoma. But it doesn’t resolve whether companies can legally train their AI models on copyrighted works through the US “fair use” rule, a legal doctrine permitting unlicensed use of copyrighted works in some circumstances.

    McCarthy notes that AI companies usually acknowledge training their models on copyrighted materials. “The question is, did they have the right to do it?” he asks.

    In the UK, on the other hand, the memorisation finding could be “very significant from a copyright perspective”, says Robert Lands at the Howard Kennedy law firm in London. UK copyright law follows the “fair dealing” concept, which provides a much narrower exception to copyright infringement than the US fair use doctrine. So AI models that memorised pirated books are unlikely to qualify for that exception, he says.

    Topics:

    • artificial intelligence
    • law


