Close Menu
New York Examiner News

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Brent Smith Ranks Every Shinedown Album (Even the New One)

    April 17, 2026

    Trump says Iran war will end ‘pretty soon’ as uranium deal is in sight

    April 17, 2026

    Donald Trump Has Lost His Power To Gaslight America

    April 17, 2026
    Facebook X (Twitter) Instagram
    New York Examiner News
    • Home
    • US News
    • Politics
    • Business
    • Science
    • Technology
    • Lifestyle
    • Music
    • Television
    • Film
    • Books
    • Contact
      • About
      • Amazon Disclaimer
      • DMCA / Copyrights Disclaimer
      • Terms and Conditions
      • Privacy Policy
    New York Examiner News
    Home»Technology»Many safety evaluations for AI models have significant limitations
    Technology

    Many safety evaluations for AI models have significant limitations

    By August 5, 2024
    Facebook Twitter Pinterest LinkedIn WhatsApp Email Reddit Telegram
    Many safety evaluations for AI models have significant limitations


    Despite increasing demand for AI safety and accountability, today’s tests and benchmarks may fall short, according to a new report.

    Generative AI models — models that can analyze and output text, images, music, videos and so on — are coming under increased scrutiny for their tendency to make mistakes and generally behave unpredictably. Now, organizations from public sector agencies to big tech firms are proposing new benchmarks to test these models’ safety.

    Toward the end of last year, startup Scale AI formed a lab dedicated to evaluating how well models align with safety guidelines. This month, NIST and the U.K. AI Safety Institute released tools designed to assess model risk.

    But these model-probing tests and methods may be inadequate.

    The Ada Lovelace Institute (ALI), a U.K.-based nonprofit AI research organization, conducted a study that interviewed experts from academic labs, civil society, and who are producing vendors models, as well as audited recent research into AI safety evaluations. The co-authors found that while current evaluations can be useful, they’re non-exhaustive, can be gamed easily, and don’t necessarily give an indication of how models will behave in real-world scenarios.

    “Whether a smartphone, a prescription drug or a car, we expect the products we use to be safe and reliable; in these sectors, products are rigorously tested to ensure they are safe before they are deployed,” Elliot Jones, senior researcher at the ALI and co-author of the report, told TechCrunch. “Our research aimed to examine the limitations of current approaches to AI safety evaluation, assess how evaluations are currently being used and explore their use as a tool for policymakers and regulators.”

    Benchmarks and red teaming

    The study’s co-authors first surveyed academic literature to establish an overview of the harms and risks models pose today, and the state of existing AI model evaluations. They then interviewed 16 experts, including four employees at unnamed tech companies developing generative AI systems.

    The study found sharp disagreement within the AI industry on the best set of methods and taxonomy for evaluating models.

    Some evaluations only tested how models aligned with benchmarks in the lab, not how models might impact real-world users. Others drew on tests developed for research purposes, not evaluating production models — yet vendors insisted on using these in production.

    We’ve written about the problems with AI benchmarks before, and the study highlights all these problems and more.

    The experts quoted in the study noted that it’s tough to extrapolate a model’s performance from benchmark results and unclear whether benchmarks can even show that a model possesses a specific capability. For example, while a model may perform well on a state bar exam, that doesn’t mean it’ll be able to solve more open-ended legal challenges.

    The experts also pointed to the issue of data contamination, where benchmark results can overestimate a model’s performance if the model has been trained on the same data that it’s being tested on. Benchmarks, in many cases, are being chosen by organizations not because they’re the best tools for evaluation, but for the sake of convenience and ease of use, the experts said.

    “Benchmarks risk being manipulated by developers who may train models on the same data set that will be used to assess the model, equivalent to seeing the exam paper before the exam, or by strategically choosing which evaluations to use,” Mahi Hardalupas, researcher at the ALI and a study co-author, told TechCrunch. “It also matters which version of a model is being evaluated. Small changes can cause unpredictable changes in behaviour and may override built-in safety features.”

    The ALI study also found problems with “red-teaming,” the practice of tasking individuals or groups with “attacking” a model to identify vulnerabilities and flaws. A number of companies use red-teaming to evaluate models, including AI startups OpenAI and Anthropic, but there are few agreed-upon standards for red teaming, making it difficult to assess a given effort’s effectiveness.

    Experts told the study’s co-authors that it can be difficult to find people with the necessary skills and expertise to red-team, and that the manual nature of red teaming makes it costly and laborious — presenting barriers for smaller organizations without the necessary resources.

    Possible solutions

    Pressure to release models faster and a reluctance to conduct tests that could raise issues before a release are the main reasons AI evaluations haven’t gotten better.

    “A person we spoke with working for a company developing foundation models felt there was more pressure within companies to release models quickly, making it harder to push back and take conducting evaluations seriously,” Jones said. “Major AI labs are releasing models at a speed that outpaces their or society’s ability to ensure they are safe and reliable.”

    One interviewee in the ALI study called evaluating models for safety an “intractable” problem. So what hope does the industry — and those regulating it — have for solutions?

    Mahi Hardalupas, researcher at the ALI, believes that there’s a path forward, but that it’ll require more engagement from public-sector bodies.

    “Regulators and policymakers must clearly articulate what it is that they want from evaluations,” he said. “Simultaneously, the evaluation community must be transparent about the current limitations and potential of evaluations.”

    Hardalupas suggests that governments mandate more public participation in the development of evaluations and implement measures to support an “ecosystem” of third-party tests, including programs to ensure regular access to any required models and data sets.

    Jones thinks that it may be necessary to develop “context-specific” evaluations that go beyond simply testing how a model responds to a prompt, and instead look at the types of users a model might impact (e.g. people of a particular background, gender or ethnicity) and the ways in which attacks on models could defeat safeguards.

    “This will require investment in the underlying science of evaluations to develop more robust and repeatable evaluations that are based on an understanding of how an AI model operates,” she added.

    But there may never be a guarantee that a model’s safe.

    “As others have noted, ‘safety’ is not a property of models,” Hardalupas said. “Determining if a model is ‘safe’ requires understanding the contexts in which it is used, who it is sold or made accessible to, and whether the safeguards that are in place are adequate and robust to reduce those risks. Evaluations of a foundation model can serve an exploratory purpose to identify potential risks, but they cannot guarantee a model is safe, let alone ‘perfectly safe.’ Many of our interviewees agreed that evaluations cannot prove a model is safe and can only indicate a model is unsafe.”



    Original Source Link

    Share. Facebook Twitter Pinterest LinkedIn WhatsApp Email Reddit Telegram
    Previous Article‘Gem’ of a Proof Breaks 80-Year-Old Record, Offers New Insights Into Prime Numbers
    Next Article Secret Service failure at Trump rally exposes culture rot, staffing woes

    RELATED POSTS

    New leaders, new fund: Sequoia has raised $7B to expand its AI bets

    April 17, 2026

    Dark Matter May Be Made of Black Holes From Another Universe

    April 16, 2026

    DeepL, known for text translation, now wants to translate your voice

    April 16, 2026

    NASA Wants to Put Nuclear Reactors on the Moon

    April 15, 2026

    Spotify launches the ability to purchase physical books in the US and UK

    April 15, 2026

    In the Wake of Anthropic’s Mythos, OpenAI Has a New Cybersecurity Model—and Strategy

    April 14, 2026
    latest posts

    Brent Smith Ranks Every Shinedown Album (Even the New One)

    With a new album underway, Brent Smith ranked every Shinedown album — even the new…

    Trump says Iran war will end ‘pretty soon’ as uranium deal is in sight

    April 17, 2026

    Donald Trump Has Lost His Power To Gaslight America

    April 17, 2026

    Trump nominates former deputy surgeon general Erica Schwartz for CDC director

    April 17, 2026

    New leaders, new fund: Sequoia has raised $7B to expand its AI bets

    April 17, 2026

    Former deputy surgeon general Erica Schwartz nominated as new CDC chief

    April 17, 2026

    Paramount Skydance Confirming New Star Trek Movie After 10-Year Hiatus Is A Familiar Refrain

    April 17, 2026
    Categories
    • Books (1,188)
    • Business (6,091)
    • Events (44)
    • Film (6,028)
    • Lifestyle (4,130)
    • Music (6,140)
    • Politics (6,090)
    • Science (5,445)
    • Technology (6,022)
    • Television (5,710)
    • Uncategorized (6)
    • US News (6,080)
    popular posts

    Astronomers Discover ‘Gold Standard’ Star in the Milky Way Galaxy, Composed of 65 Elements

    University of Michigan astronomers, led by Ian Roederer, have discovered the ‘gold standard’ star, also…

    Maduro capture mission’s biggest challenge revealed by Delta Force veteran

    January 4, 2026

    Rowdy Rebel Releases Debut Album Rebel vs. Rowdy: Listen

    July 15, 2022

    USC Scripter Awards: 2024 Winners Revealed

    March 3, 2024
    Archives
    Browse By Category
    • Books (1,188)
    • Business (6,091)
    • Events (44)
    • Film (6,028)
    • Lifestyle (4,130)
    • Music (6,140)
    • Politics (6,090)
    • Science (5,445)
    • Technology (6,022)
    • Television (5,710)
    • Uncategorized (6)
    • US News (6,080)
    About Us

    We are a creativity led international team with a digital soul. Our work is a custom built by the storytellers and strategists with a flair for exploiting the latest advancements in media and technology.

    Most of all, we stand behind our ideas and believe in creativity as the most powerful force in business.

    What makes us Different

    We care. We collaborate. We do great work. And we do it with a smile, because we’re pretty damn excited to do what we do. If you would like details on what else we can do visit out Contact page.

    Our Picks

    Former deputy surgeon general Erica Schwartz nominated as new CDC chief

    April 17, 2026

    Paramount Skydance Confirming New Star Trek Movie After 10-Year Hiatus Is A Familiar Refrain

    April 17, 2026

    Stars on Robby and Abbot’s Trauma Talk, Mohan’s Future, More (Exclusive)

    April 17, 2026
    © 2026 New York Examiner News. All rights reserved. All articles, images, product names, logos, and brands are property of their respective owners. All company, product and service names used in this website are for identification purposes only. Use of these names, logos, and brands does not imply endorsement unless specified. By using this site, you agree to the Terms & Conditions and Privacy Policy.

    Type above and press Enter to search. Press Esc to cancel.

    We use cookies on our website to give you the most relevant experience by remembering your preferences and repeat visits. By clicking “Accept All”, you consent to the use of ALL the cookies. However, you may visit "Cookie Settings" to provide a controlled consent.
    Cookie SettingsAccept All
    Manage consent

    Privacy Overview

    This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.
    Necessary
    Always Enabled
    Necessary cookies are absolutely essential for the website to function properly. These cookies ensure basic functionalities and security features of the website, anonymously.
    CookieDurationDescription
    cookielawinfo-checkbox-analytics11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
    cookielawinfo-checkbox-functional11 monthsThe cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
    cookielawinfo-checkbox-necessary11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
    cookielawinfo-checkbox-others11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
    cookielawinfo-checkbox-performance11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
    viewed_cookie_policy11 monthsThe cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.
    Functional
    Functional cookies help to perform certain functionalities like sharing the content of the website on social media platforms, collect feedbacks, and other third-party features.
    Performance
    Performance cookies are used to understand and analyze the key performance indexes of the website which helps in delivering a better user experience for the visitors.
    Analytics
    Analytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics the number of visitors, bounce rate, traffic source, etc.
    Advertisement
    Advertisement cookies are used to provide visitors with relevant ads and marketing campaigns. These cookies track visitors across websites and collect information to provide customized ads.
    Others
    Other uncategorized cookies are those that are being analyzed and have not been classified into a category as yet.
    SAVE & ACCEPT