New York Examiner News

    Gemini’s data-analyzing abilities aren’t as good as Google claims

    June 30, 2024


    One of the selling points of Google’s flagship generative AI models, Gemini 1.5 Pro and 1.5 Flash, is the amount of data they can supposedly process and analyze. In press briefings and demos, Google has repeatedly claimed that the models can accomplish previously impossible tasks thanks to their “long context,” like summarizing multiple hundred-page documents or searching across scenes in film footage.

    But new research suggests that the models aren’t, in fact, very good at those things.

    Two separate studies investigated how well Google’s Gemini models and others make sense of an enormous amount of data (think “War and Peace”-length works). Both find that Gemini 1.5 Pro and 1.5 Flash struggle to answer questions about large datasets correctly; in one series of document-based tests, the models gave the right answer only 40% to 50% of the time.

    “While models like Gemini 1.5 Pro can technically process long contexts, we have seen many cases indicating that the models don’t actually ‘understand’ the content,” Marzena Karpinska, a postdoc at UMass Amherst and a co-author on one of the studies, told TechCrunch.

    Gemini’s context window is lacking

    A model’s context, or context window, refers to input data (e.g., text) that the model considers before generating output (e.g., additional text). A simple question — “Who won the 2020 U.S. presidential election?” — can serve as context, as can a movie script, show or audio clip. And as context windows grow, so does the size of the documents being fit into them.

    The newest versions of Gemini can take in upward of 2 million tokens as context. (“Tokens” are subdivided bits of raw data, like the syllables “fan,” “tas” and “tic” in the word “fantastic.”) That’s equivalent to roughly 1.4 million words, two hours of video or 22 hours of audio — the largest context of any commercially available model.
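To make the scale concrete, here is a rough back-of-the-envelope sketch that uses only the ratio implied by the figures above (2 million tokens for roughly 1.4 million words). The function names are hypothetical, and real tokenizers vary by language and content, so treat the result as an estimate rather than a count any Gemini tokenizer would return.

```python
# Ratio implied by the article's figures: 2 million tokens ~ 1.4 million words.
# This is an assumption for illustration, not a property of a real tokenizer.
TOKENS = 2_000_000
WORDS = 1_400_000


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate from a word count, using the ratio above."""
    return round(word_count * TOKENS / WORDS)


def fits_in_context(word_count: int, context_tokens: int) -> bool:
    """True if the estimated token count fits in the given context window."""
    return estimate_tokens(word_count) <= context_tokens


# A 260,000-word book, the size of one book used in the study discussed below.
book_words = 260_000
print(estimate_tokens(book_words))             # estimated tokens for the book
print(fits_in_context(book_words, 1_000_000))  # fits a 1-million-token window?
```

Under this estimate, a single 260,000-word book takes up well under half of a 1-million-token window, so the difficulty the studies describe is not about fitting the text in.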

    In a briefing earlier this year, Google showed several pre-recorded demos meant to illustrate the potential of Gemini’s long-context capabilities. One had Gemini 1.5 Pro search the transcript of the Apollo 11 moon landing telecast — around 402 pages — for quotes containing jokes, and then find a scene in the telecast that looked similar to a pencil sketch.

    VP of research at Google DeepMind Oriol Vinyals, who led the briefing, described the model as “magical.”

    “[1.5 Pro] performs these sorts of reasoning tasks across every single page, every single word,” he said.

    That might have been an exaggeration.

    In one of the aforementioned studies benchmarking these capabilities, Karpinska, along with researchers from the Allen Institute for AI and Princeton, asked the models to evaluate true/false statements about fiction books written in English. The researchers chose recent works so that the models couldn’t “cheat” by relying on foreknowledge, and they peppered the statements with references to specific details and plot points that’d be impossible to comprehend without reading the books in their entirety.

    Given a statement like “By using her skills as an Apoth, Nusis is able to reverse engineer the type of portal opened by the reagents key found in Rona’s wooden chest,” Gemini 1.5 Pro and 1.5 Flash — having ingested the relevant book — had to say whether the statement was true or false and explain their reasoning.

    Image Credits: UMass Amherst

    In a test on one book around 260,000 words (~520 pages) in length, the researchers found that 1.5 Pro answered the true/false statements correctly 46.7% of the time while Flash answered correctly only 20% of the time. That means a coin flip, which is right about 50% of the time on true/false statements, is better at answering questions about the book than Google’s latest machine learning model. Averaging all the benchmark results, neither model managed to achieve higher than random chance in terms of question-answering accuracy.
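The comparison with a coin flip can be sketched in a few lines. The two reported accuracies come from the paragraph above; the `accuracy` helper and the ten sample answers are hypothetical and exist only to show how such a score is computed, not to reproduce the study's data.

```python
CHANCE = 0.5  # expected accuracy of a coin flip on true/false statements


def accuracy(predictions: list[bool], labels: list[bool]) -> float:
    """Fraction of true/false predictions that match the correct labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Accuracies reported in the article for the 260,000-word book.
reported = {"Gemini 1.5 Pro": 0.467, "Gemini 1.5 Flash": 0.20}
for model, score in reported.items():
    verdict = "above" if score > CHANCE else "at or below"
    print(f"{model}: {score:.1%} is {verdict} chance")

# Hypothetical answers, invented purely to exercise the helper.
labels = [True, False, True, True, False, False, True, False, True, False]
preds = [True, True, False, True, True, False, False, True, False, False]
print(accuracy(preds, labels))
```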

    “We’ve noticed that the models have more difficulty verifying claims that require considering larger portions of the book, or even the entire book, compared to claims that can be solved by retrieving sentence-level evidence,” Karpinska said. “Qualitatively, we also observed that the models struggle with verifying claims about implicit information that is clear to a human reader but not explicitly stated in the text.”

    The second of the two studies, co-authored by researchers at UC Santa Barbara, tested the ability of Gemini 1.5 Flash (but not 1.5 Pro) to “reason over” videos — that is, search through and answer questions about the content in them.

    The co-authors created a dataset of images (e.g., a photo of a birthday cake) paired with questions for the model to answer about the objects depicted in the images (e.g., “What cartoon character is on this cake?”). To evaluate the models, they picked one of the images at random and inserted “distractor” images before and after it to create slideshow-like footage.

    Flash didn’t perform all that well. In a test that had the model transcribe six handwritten digits from a “slideshow” of 25 images, Flash got around 50% of the transcriptions right. The accuracy dropped to around 30% with eight digits.
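The slideshow setup described above, one target image placed at a random position among distractors, might look roughly like the following. This is a minimal sketch, not the researchers' code: the function name, the file names, and the use of a seeded random generator are all assumptions.

```python
import random


def build_slideshow(target, distractors, length, rng):
    """Place `target` at a random position among `length - 1` distractors.

    Returns the list of frames and the index where the target ended up.
    """
    if length < 1 or len(distractors) < length - 1:
        raise ValueError("not enough distractors for the requested length")
    frames = rng.sample(distractors, length - 1)  # distinct distractor frames
    index = rng.randrange(length)                 # random slot for the target
    frames.insert(index, target)
    return frames, index


# Hypothetical file names; a 25-frame slideshow as in the digit test.
rng = random.Random(0)
pool = [f"distractor_{i}.png" for i in range(40)]
frames, index = build_slideshow("target_cake.png", pool, 25, rng)
print(len(frames), frames[index])
```

The model would then be asked the question tied to the target image, so a correct answer requires both finding the right frame and reading it.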

    “On real question-answering tasks over images, it appears to be particularly hard for all the models we tested,” Michael Saxon, a PhD student at UC Santa Barbara and one of the study’s co-authors, told TechCrunch. “That small amount of reasoning — recognizing that a number is in a frame and reading it — might be what is breaking the model.”

    Google is overpromising with Gemini

    Neither study has been peer-reviewed, and neither probes the releases of Gemini 1.5 Pro and 1.5 Flash with 2-million-token contexts. (Both tested the 1-million-token context releases.) And Flash isn’t meant to be as capable as Pro in terms of performance; Google advertises it as a low-cost alternative.

    Nevertheless, both add fuel to the fire that Google’s been overpromising — and under-delivering — with Gemini from the beginning. None of the models the researchers tested, including OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet, performed well. But Google’s the only model provider that’s given context window top billing in its advertisements.

    “There’s nothing wrong with the simple claim, ‘Our model can take X number of tokens’ based on the objective technical details,” Saxon said. “But the question is, what useful thing can you do with it?”

    Generative AI, broadly speaking, is coming under increased scrutiny as businesses (and investors) grow frustrated with the technology’s limitations.

    In a pair of recent surveys from Boston Consulting Group, about half of the respondents — all C-suite executives — said that they don’t expect generative AI to bring about substantial productivity gains and that they’re worried about the potential for mistakes and data compromises arising from generative AI-powered tools. PitchBook recently reported that, for two consecutive quarters, generative AI dealmaking at the earliest stages has declined, plummeting 76% from its Q3 2023 peak.

    Faced with meeting-summarizing chatbots that conjure up fictional details about people and AI search platforms that basically amount to plagiarism generators, customers are on the hunt for promising differentiators. Google — which has raced, at times clumsily, to catch up to its generative AI rivals — was desperate to make Gemini’s context one of those differentiators.

    But the bet was premature, it seems.

    “We haven’t settled on a way to really show that ‘reasoning’ or ‘understanding’ over long documents is taking place, and basically every group releasing these models is cobbling together their own ad hoc evals to make these claims,” Karpinska said. “Without the knowledge of how long context processing is implemented — and companies do not share these details — it is hard to say how realistic these claims are.”

    Google didn’t respond to a request for comment.

    Both Saxon and Karpinska believe the antidotes to hyped-up claims around generative AI are better benchmarks and, in the same vein, greater emphasis on third-party critique. Saxon notes that one of the more common tests for long context (liberally cited by Google in its marketing materials), “needle in the haystack,” only measures a model’s ability to retrieve particular info, like names and numbers, from datasets, not its ability to answer complex questions about that info.

    “All scientists and most engineers using these models are essentially in agreement that our existing benchmark culture is broken,” Saxon said, “so it’s important that the public understands to take these giant reports containing numbers like ‘general intelligence across benchmarks’ with a massive grain of salt.”
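For readers unfamiliar with the “needle in the haystack” test Saxon mentions, the sketch below shows the general idea: hide one fact in a long run of filler text, ask for it back, and check the answer. Everything here is illustrative. The filler, the needle sentence, and `stub_model` (a regex standing in for a real model call) are assumptions, and actual evaluations differ in their details.

```python
import random
import re


def build_haystack(filler, needle, rng):
    """Insert `needle` at a random position among the filler sentences."""
    position = rng.randrange(len(filler) + 1)
    sentences = filler[:position] + [needle] + filler[position:]
    return " ".join(sentences), position


def retrieved(answer, expected):
    """Retrieval-only scoring: did the answer contain the hidden value?"""
    return expected in answer


def stub_model(prompt):
    """Stand-in for a real model call (an assumption for this sketch)."""
    match = re.search(r"magic number is (\d+)", prompt)
    return match.group(1) if match else ""


rng = random.Random(0)
filler = [f"Filler sentence number {i}." for i in range(1000)]
prompt, position = build_haystack(filler, "The magic number is 7421.", rng)
print(retrieved(stub_model(prompt), "7421"))
```

A simple pattern match passes this check, which illustrates Saxon's point: the test rewards lookup and says little about reasoning over the surrounding text.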


