Introduction to Google’s AI Overviews
A sharp-eyed search marketer recently identified a likely reason why Google’s AI Overviews sometimes show spammy web pages. The clue comes from a passage in the recent Memorandum Opinion in the Google antitrust case. The passage suggests why this happens and invites speculation about whether it reflects Google’s move away from links as a prominent ranking factor.
Grounding Generative AI Answers
The passage occurs in a section about grounding answers with search data. Ordinarily, it’s fair to assume that links play a role in ranking the web pages that an AI model retrieves when it sends a query to an internal search engine. According to the Memorandum, that is not how grounding works at Google. Google uses a separate algorithm, called FastSearch, that retrieves fewer web documents and does so faster. FastSearch is based on RankEmbed signals, a set of search ranking signals, and generates abbreviated, ranked web results that a model can use to produce a grounded response.
How FastSearch Works
FastSearch delivers results more quickly than Search because it retrieves fewer documents, but the resulting quality is lower than that of Search’s fully ranked web results. Ryan Jones, the founder of SERPrecon, called the passage interesting and said it confirms both what many search marketers suspected and what they were seeing in early tests. In his view, Google doesn’t use the same search algorithm for grounding: it needs the retrieval to be faster, and it doesn’t care about as many signals. It just needs text that backs up what the model is saying.
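The trade-off described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the `Doc` fields, the weights, the URLs, and the functions `full_rank` and `fast_rank` are hypothetical and do not come from the Memorandum or from Google. The sketch only shows how a ranker that relies on a single semantic signal and returns fewer results could surface a page that a multi-signal ranker would push down.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    url: str
    semantic_score: float  # hypothetical stand-in for a RankEmbed-style signal
    link_score: float      # hypothetical stand-in for link-based signals
    quality_score: float   # hypothetical stand-in for other quality signals


def full_rank(docs, top_k=10):
    """Toy 'full' ranking: blend several signals (weights are invented)."""
    def score(d):
        return 0.5 * d.semantic_score + 0.3 * d.link_score + 0.2 * d.quality_score
    return sorted(docs, key=score, reverse=True)[:top_k]


def fast_rank(docs, top_k=2):
    """Toy 'fast' ranking: one signal only, and a shorter result list."""
    return sorted(docs, key=lambda d: d.semantic_score, reverse=True)[:top_k]


# Made-up documents; the second one matches the query text well
# but has weak link and quality signals.
docs = [
    Doc("example.com/guide", 0.80, 0.90, 0.90),
    Doc("spam.example/keyword-page", 0.95, 0.05, 0.10),
    Doc("example.org/reference", 0.70, 0.80, 0.85),
    Doc("example.net/forum", 0.60, 0.40, 0.50),
]

fast_results = [d.url for d in fast_rank(docs)]
full_results = [d.url for d in full_rank(docs)]
print("fast:", fast_results)
print("full:", full_results)
```

In this made-up data set, the keyword-heavy page ranks first in the fast list and last in the full list, which mirrors the kind of behavior Jones describes without claiming to reproduce Google’s actual scoring.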
The Role of RankEmbed
The RankEmbed model is a deep-learning model that identifies patterns in massive datasets and can identify semantic meanings and relationships. It does not understand anything in the same way that a human does; it is essentially identifying patterns and correlations. The Memorandum explains that RankEmbed is one of Google’s top-level signals, which are inputs to producing the final score for a web page. RankEmbed uses "user-side" data, which includes search logs and scores generated by human raters.
User-Side Data
RankEmbed and its later iteration, RankEmbedBERT, are ranking models that rely on two main sources of data: click-and-query data from search logs and scores that human raters assign to web pages. The RankEmbed model itself is an AI-based, deep-learning system that has strong natural-language understanding. This allows the model to more efficiently identify the best documents to retrieve, even if a query lacks certain terms.
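To show how a model can retrieve a relevant document even when the query shares no terms with it, here is a minimal sketch of embedding-based matching. The three-dimensional vectors are hand-made toy values, not the output of RankEmbed or any real model, and the names `DOCS`, `QUERY_VEC`, `keyword_overlap`, and `best_by_embedding` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hand-made toy embeddings. The dimensions loosely stand for
# "vehicles", "repair", and "cooking"; a real model learns its own.
DOCS = {
    "How to fix a flat tire on your automobile": [0.9, 0.8, 0.0],
    "Easy weeknight pasta recipes": [0.0, 0.1, 0.9],
}
QUERY_TEXT = "car puncture help"
QUERY_VEC = [0.8, 0.9, 0.0]  # assumed embedding of QUERY_TEXT


def keyword_overlap(query, title):
    """Count the words that the query and the title share exactly."""
    return len(set(query.lower().split()) & set(title.lower().split()))


def best_by_embedding(query_vec, docs):
    """Return the title whose vector is closest to the query vector."""
    return max(docs, key=lambda title: cosine(query_vec, docs[title]))


overlaps = {title: keyword_overlap(QUERY_TEXT, title) for title in DOCS}
best = best_by_embedding(QUERY_VEC, DOCS)
print(overlaps)
print(best)
```

No word in the query appears in either title, so exact keyword matching finds nothing, while the vector comparison still selects the tire-repair page. That is the general idea behind semantic retrieval, shown here under heavily simplified assumptions.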
A New Perspective On AI Search
Is it true that links do not play a role in selecting web pages for AI Overviews? Google’s FastSearch prioritizes speed, and Ryan Jones theorizes that Google may use multiple indexes, with one specific to FastSearch made up of sites that tend to get visits. That may reflect the RankEmbed part of FastSearch, which is said to be built on a combination of "click-and-query data" and human rater data. As for the human rater data, an index holds billions or trillions of pages, so raters could manually rate no more than a tiny fraction of them. It is therefore more plausible that the rater data provides quality-labeled examples for training than that it scores pages directly.
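The reasoning about rater data can be sketched in a few lines: a small set of rated pages acts as labeled examples, and unrated pages receive the label of the most similar rated page. This is only an illustration of the general idea of learning from labeled examples. The features, the data, and the nearest-neighbor method are all assumptions, and nothing here describes how Google actually trains RankEmbed.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_quality(features, rated_examples):
    """Return the rating of the closest rated example (1-nearest-neighbor)."""
    nearest = min(rated_examples, key=lambda ex: squared_distance(features, ex[0]))
    return nearest[1]


# Hypothetical features per page: (share of original text, ad density).
# Only these four pages were "rated by humans" in this toy setup.
RATED = [
    ((0.9, 0.1), "high"),
    ((0.8, 0.2), "high"),
    ((0.2, 0.9), "low"),
    ((0.1, 0.8), "low"),
]

# Pages that no rater has seen.
UNRATED = {
    "page-a": (0.88, 0.12),
    "page-b": (0.12, 0.82),
}

predictions = {name: predict_quality(f, RATED) for name, f in UNRATED.items()}
print(predictions)
```

Four rated pages are enough to label any number of unrated ones in this toy setup, which is why a small amount of rater data can still influence results across a very large index.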
Conclusion
Google’s AI Overviews use a separate algorithm called FastSearch, which is based on RankEmbed signals. FastSearch prioritizes speed and retrieves fewer web documents, which results in lower-quality results than Search’s fully ranked web results. The RankEmbed model uses user-side data, including search logs and scores generated by human raters. Taken together, this suggests that links may not play a role in selecting web pages for AI Overviews and that Google may instead use multiple indexes, with one specific to FastSearch. The passage offers insight into how Google’s AI Overviews work and why they appear to trade quality for speed.