New York Examiner News

    OpenAI’s o3 model aced a test of AI reasoning – but it’s still not AGI

    By Admin, December 21, 2024

    OpenAI announced a breakthrough achievement for its new o3 AI model

    Rokas Tenys / Alamy

    OpenAI’s new o3 artificial intelligence model has achieved a breakthrough high score on a prestigious AI reasoning test called the ARC Challenge, inspiring some AI fans to speculate that o3 has achieved artificial general intelligence (AGI). But even as ARC Challenge organisers described o3’s achievement as a major milestone, they also cautioned that it has not won the competition’s grand prize – and it is only one step on the path towards AGI, a term for hypothetical future AI with human-like intelligence.

    The o3 model is the latest in a line of AI releases that follow on from the large language models powering ChatGPT. “This is a surprising and important step-function increase in AI capabilities, showing novel task adaptation ability never seen before in the GPT-family models,” said François Chollet, an engineer at Google and the main creator of the ARC Challenge, in a blog post.

    What did OpenAI’s o3 model actually do?

    Chollet designed the Abstraction and Reasoning Corpus (ARC) Challenge in 2019 to test how well AIs can find correct patterns linking pairs of coloured grids. Such visual puzzles are intended to make AIs demonstrate a form of general intelligence with basic reasoning capabilities. But throwing enough computing power at the puzzles could let even a non-reasoning program simply solve them through brute force. To prevent this, the competition also requires official score submissions to meet certain limits on computing power.

    OpenAI’s newly announced o3 model – which is scheduled for release in early 2025 – achieved its official breakthrough score of 75.7 per cent on the ARC Challenge’s “semi-private” test, which is used for ranking competitors on a public leaderboard. The computing cost of its achievement was approximately $20 for each visual puzzle task, meeting the competition’s limit of less than $10,000 total. However, the harder “private” test that is used to determine grand prize winners has an even more stringent computing power limit, equivalent to spending just 10 cents on each task, which OpenAI did not meet.

    The o3 model also achieved an unofficial score of 87.5 per cent by applying approximately 172 times more computing power than it did on the official score. For comparison, the typical human score is 84 per cent, and an 85 per cent score is enough to win the ARC Challenge’s $600,000 grand prize – if the model can also keep its computing costs within the required limits.

    But to reach its unofficial score, o3’s cost soared to thousands of dollars spent solving each task. OpenAI requested that the challenge organisers not publish the exact computing costs.
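    The cost figures above can be checked with some rough arithmetic. The sketch below uses only the numbers quoted in this article and assumes, purely for illustration, that cost scales linearly with computing power; the exact unofficial costs were not published. The function names are hypothetical. It also applies the grand prize's score and cost limits to o3's reported runs, even though those runs were on the semi-private test rather than the private one.

```python
# Rough arithmetic on the figures quoted in the article.
# Assumption (not from the source): cost scales linearly with compute.
# Amounts are kept in whole cents to avoid floating-point rounding.

OFFICIAL_COST_CENTS = 2_000        # "approximately $20" per task, official run
COMPUTE_MULTIPLIER = 172           # unofficial run used ~172x the compute
PRIVATE_LIMIT_CENTS = 10           # grand-prize limit: "10 cents on each task"
GRAND_PRIZE_SCORE = 85.0           # per cent needed for the grand prize


def estimated_unofficial_cost_cents() -> int:
    """Estimate per-task cost of the unofficial run under linear scaling."""
    return OFFICIAL_COST_CENTS * COMPUTE_MULTIPLIER


def private_limit_overshoot() -> int:
    """How many times the official per-task cost exceeds the private limit."""
    return OFFICIAL_COST_CENTS // PRIVATE_LIMIT_CENTS


def clears_grand_prize(score_percent: float, cost_cents: int) -> bool:
    """True only if both the score threshold and the cost limit are met."""
    return score_percent >= GRAND_PRIZE_SCORE and cost_cents <= PRIVATE_LIMIT_CENTS


if __name__ == "__main__":
    unofficial = estimated_unofficial_cost_cents()
    print(f"Estimated unofficial cost per task: ${unofficial / 100:,.0f}")
    print(f"Official cost vs private limit: {private_limit_overshoot()}x")
    print("Official run qualifies:", clears_grand_prize(75.7, OFFICIAL_COST_CENTS))
    print("Unofficial run qualifies:", clears_grand_prize(87.5, unofficial))
```

    Under that linear assumption, the unofficial run works out to roughly $3,440 per task, consistent with the "thousands of dollars" described above, and even the official run costs about 200 times the private test's per-task limit.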

    Does this o3 achievement show that AGI has been reached?

    No. The ARC Challenge organisers have specifically said they do not consider beating this competition benchmark to be an indicator of having achieved AGI.

    The o3 model also failed to solve more than 100 visual puzzle tasks, even when OpenAI applied a very large amount of computing power toward the unofficial score, said Mike Knoop, an ARC Challenge organiser at software company Zapier, in a social media post on X.

    In a social media post on Bluesky, Melanie Mitchell at the Santa Fe Institute in New Mexico wrote of o3’s progress on the ARC benchmark: “I think solving these tasks by brute-force compute defeats the original purpose”.

    “While the new model is very impressive and represents a big milestone on the way towards AGI, I don’t believe this is AGI – there’s still a fair number of very easy [ARC Challenge] tasks that o3 can’t solve,” said Chollet in another X post.

    However, Chollet described how we might know when human-level intelligence has been demonstrated by some form of AGI. “You’ll know AGI is here when the exercise of creating tasks that are easy for regular humans but hard for AI becomes simply impossible,” he said in the blog post.

    Thomas Dietterich at Oregon State University suggests another way to recognise AGI. “Those architectures claim to include all of the functional components required for human cognition,” he says. “By this measure, the commercial AI systems are missing episodic memory, planning, logical reasoning and, most importantly, meta-cognition.”

    So what does o3’s high score really mean?

    The o3 model’s high score comes as the tech industry and AI researchers have been reckoning with a slower pace of progress in the latest AI models for 2024, compared with the initial explosive developments of 2023.

    Although it did not win the ARC Challenge, o3’s high score indicates that AI models could beat the competition benchmark in the near future. Beyond its unofficial high score, Chollet says many official low-compute submissions have already scored above 81 per cent on the private evaluation test set.

    Dietterich also thinks that “this is a very impressive leap in performance”. However, he cautions that, without knowing more about how OpenAI’s o1 and o3 models work, it is impossible to evaluate just how impressive the high score is. For instance, if o3 was able to practise the ARC problems in advance, then that would make its achievement easier. “We will need to await an open-source replication to understand the full significance of this,” says Dietterich.

    The ARC Challenge organisers are already looking to launch a second and more difficult set of benchmark tests sometime in 2025. They will also keep the ARC Prize 2025 challenge running until someone achieves the grand prize and open-sources their solution.
