The Google search rollercoaster

In May last year (2024), Google Search began to leverage a customised version of Gemini to deliver AI-enhanced search results, or AI Overviews, a re-branded Search Generative Experience (SGE) that had been announced on the company's product blog a year earlier. The sources the model drew on to generate its responses were varied, including social platforms, satirical news sites and, presumably, most of the rest of the internet. Incorrect and potentially harmful answers to innocuous questions were downplayed by Google as stemming from "extremely rare queries" and not "representative of most people's experiences". From advising the use of glue on pizza, to recommending eating rocks, to identifying Barack Obama as America's first Muslim president, it quickly became apparent that the model needed adjustments.

By mid-June, Google had reduced the proportion of search results featuring AI Overviews from 84% to under 15%, while running tests on the feature in a handful of other countries. AI Overviews were then formally rolled out from mid-August to Brazil, India, Indonesia, Japan, Mexico and the UK. Over the last few months, frequent users of Google will have noticed that the AI overlay has changed in size, scope and focus. The same is true of the model itself, which now appears to draw on traditional ranking factors when selecting a source for the AI Overview. In practice, this means the AI portion of the result is likely to come from one of the top ten search results (based on research conducted in Q3 2024).

Image of Asia and Australia by NASA is licensed under CC-CC0 1.0

Fine-tuning a product this complicated is always going to take time, but in its current state it can't be relied on to deliver correct answers to search queries. One of Tenon's clients is a private-equity-backed digital engineering business, and as such, specific financial information is not freely available. A simple search for the company's turnover returns a figure, listed at the top of the AI Overview, that is wildly incorrect: approximately 4.5x greater than the real number. The figure is quoted from another website that appears to scrape data from other sources, with no apparent human verification or analysis. An AI answer based on AI content. This is probably closer to error compounding than model collapse, but worrying nonetheless, because many users may well assume the results are true.

A study by AWS researchers published in June last year suggests that more than 57% of the content available online is AI-generated or machine-translated. With the proliferation of LLMs and free AI tools, it's hard to envisage this situation improving in the medium term: it's easier than ever to generate generic content with minimal effort, and all of it simply adds to the pool of unhelpful nonsense.

Our approach now, heeding part of Euripides' advice, is to "question everything", and until the functionality can be relied upon, we're using Google the old-fashioned way: scrolling past the sponsored content and ferreting through the search results one page at a time until we're satisfied.

If you’d like to discuss any of this, particularly talent in the data and AI space and how we might be able to help, please get in touch.