New York Examiner News


    Did xAI lie about Grok 3’s benchmarks?

    By Admin, February 23, 2025


    Debates over AI benchmarks — and how they’re reported by AI labs — are spilling out into public view.

    This week, an OpenAI employee accused Elon Musk’s AI company, xAI, of publishing misleading benchmark results for its latest AI model, Grok 3. One of the co-founders of xAI, Igor Babuschkin, insisted that the company was in the right.

    The truth lies somewhere in between.

    In a post on xAI’s blog, the company published a graph showing Grok 3’s performance on AIME 2025, a collection of challenging math questions from a recent invitational mathematics exam. Some experts have questioned AIME’s validity as an AI benchmark. Nevertheless, AIME 2025 and older versions of the test are commonly used to probe a model’s math ability.

    xAI’s graph showed two variants of Grok 3, Grok 3 Reasoning Beta and Grok 3 mini Reasoning, beating OpenAI’s best-performing available model, o3-mini-high, on AIME 2025. But OpenAI employees on X were quick to point out that xAI’s graph didn’t include o3-mini-high’s AIME 2025 score at “cons@64.”

    What is cons@64, you might ask? It’s short for “consensus@64”: a model gets 64 tries at each problem in a benchmark, and the answer it generates most frequently for a problem is taken as its final answer. As you can imagine, cons@64 tends to boost models’ benchmark scores quite a bit, and omitting it from a graph might make it appear as though one model surpasses another when, in reality, that isn’t the case.
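
    To make the difference concrete, here is a minimal sketch of how an “@1” score and a consensus score could be computed from the same set of sampled answers. The function names and the toy data are made up for illustration (five samples per problem instead of 64); this is not xAI’s or OpenAI’s actual evaluation code.

```python
from collections import Counter
from typing import Sequence


def consensus_answer(samples: Sequence[str]) -> str:
    """Return the most frequently generated answer (ties go to the earliest seen)."""
    return Counter(samples).most_common(1)[0][0]


def score_at_1(samples_per_problem: Sequence[Sequence[str]],
               references: Sequence[str]) -> float:
    """Fraction of problems where the first sampled answer is correct."""
    correct = sum(s[0] == ref for s, ref in zip(samples_per_problem, references))
    return correct / len(references)


def score_cons_at_k(samples_per_problem: Sequence[Sequence[str]],
                    references: Sequence[str], k: int = 64) -> float:
    """Fraction of problems where the majority answer over the first k samples is correct."""
    correct = sum(
        consensus_answer(s[:k]) == ref
        for s, ref in zip(samples_per_problem, references)
    )
    return correct / len(references)


# Hypothetical data: three problems, five sampled answers each.
toy_samples = [
    ["204", "113", "204", "204", "70"],  # first try right, majority right
    ["16", "25", "25", "25", "16"],      # first try wrong, majority right
    ["7", "7", "9", "7", "9"],           # first try wrong, majority wrong
]
toy_refs = ["204", "25", "9"]

print(f"@1:     {score_at_1(toy_samples, toy_refs):.2f}")          # 0.33
print(f"cons@5: {score_cons_at_k(toy_samples, toy_refs, k=5):.2f}")  # 0.67
```

    On this toy data the same model scores 0.33 at @1 and 0.67 at cons@5, which is why a chart that puts one model’s consensus score next to another model’s @1 score is not an apples-to-apples comparison.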

    Grok 3 Reasoning Beta and Grok 3 mini Reasoning’s scores for AIME 2025 at “@1” — meaning the first score the models got on the benchmark — fall below o3-mini-high’s score. Grok 3 Reasoning Beta also trails ever-so-slightly behind OpenAI’s o1 model set to “medium” computing. Yet xAI is advertising Grok 3 as the “world’s smartest AI.”

    Babuschkin argued on X that OpenAI has published similarly misleading benchmark charts in the past, albeit charts comparing the performance of its own models. A more neutral party in the debate put together a more “accurate” graph showing nearly every model’s performance at cons@64:

    Hilarious how some people see my plot as attack on OpenAI and others as attack on Grok while in reality it’s DeepSeek propaganda
    (I actually believe Grok looks good there, and openAI’s TTC chicanery behind o3-mini-*high*-pass@"""1""" deserves more scrutiny.) https://t.co/dJqlJpcJh8 pic.twitter.com/3WH8FOUfic

    — Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxesTex) February 20, 2025

    But as AI researcher Nathan Lambert pointed out in a post, perhaps the most important metric remains a mystery: the computational (and monetary) cost it took for each model to achieve its best score. That just goes to show how little most AI benchmarks communicate about models’ limitations — and their strengths.





