Introduction to Google’s Trial Documents
Google has appealed the ruling that requires it to share proprietary information with competitors. The latest document in the DOJ vs. Google trial reveals some interesting details about how Google’s search engine works.
Key Takeaways
- Google has been ordered to share information with competitors so that it no longer operates as an illegal monopoly. Google does not want to give away its extensive user-side data.
- Google’s data on page quality and freshness is proprietary, and Google does not want to give it away.
- Pages that are indexed are marked up with annotations, including signals that identify spam pages.
- If spammers got hold of those spam signals, stopping spam would become difficult.
- User data is important to Google’s Glue system, which stores information on every query searched, what the user saw, and how they interacted with the search results.
- User data is important for training RankEmbed BERT – one of the deep learning systems behind Search.
Google’s Proprietary Page Quality and Freshness Signals
It really isn’t a surprise that Google wants to keep these signals to itself. Freshness signals are at the heart of Google’s proprietary secrets.
Pages Marked Up with Proprietary Page Understanding Annotations
Every page in Google’s index is marked up with annotations to help it understand the page. These include signals to identify spam and duplicate pages. Google argues that giving competitors a list of indexed URLs will enable them to “forgo crawling and analyzing the larger web, and to instead focus their efforts on crawling only the fraction of pages Google has included in its index.”
The Role of User Data in Google’s Ranking Systems
User data is used to train, build, and operate RankEmbed models. Much of that data lives in Google Glue, a huge table of user activity. It collects the text of the queries searched, the user’s language, location and device type, and information on what appeared on the SERP, what the user clicked on or hovered over, how long they stayed on a SERP, and more. RankEmbed BERT is one of the deep learning systems that underpin Search. It is used to rerank the results returned by traditional ranking systems, and it is trained on click and query data from actual users.
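To make the reranking step more concrete, here is a minimal Python sketch. It is not Google’s implementation: the `rerank` function, the toy two-dimensional vectors, and the choice of cosine similarity are all assumptions made for illustration. It only shows the general shape of the idea, where a traditional ranker produces an ordered candidate list and an embedding-based model reorders the top of that list.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical candidate format: (url, document embedding).
Candidate = Tuple[str, List[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(query_vec: Sequence[float],
           candidates: List[Candidate],
           top_k: int = 10) -> List[Candidate]:
    """Reorder the first top_k candidates by similarity to the query embedding.

    `candidates` is assumed to arrive already ordered by a traditional ranker.
    Anything below top_k keeps its original position.
    """
    head = sorted(candidates[:top_k],
                  key=lambda c: cosine(query_vec, c[1]),
                  reverse=True)
    return head + list(candidates[top_k:])


# Toy data: three pages in the order a traditional ranker returned them.
candidates = [
    ("a.example", [1.0, 0.0]),
    ("b.example", [0.0, 1.0]),
    ("c.example", [1.0, 1.0]),
]
reranked = rerank([0.0, 1.0], candidates, top_k=2)
print([url for url, _ in reranked])  # ['b.example', 'a.example', 'c.example']
```

In this toy run only the top two results are reranked, so the third page keeps its place. A real system would get its embeddings from a trained model rather than hand-written vectors.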
User Interactions and Google’s Success
The AI systems behind Search are continually learning to get better at presenting searchers with satisfying results. Google looks at what searchers click on and whether they return to the SERP or not. Google also runs live experiments that look at which results searchers choose to click on and stay on. Those actions help train RankEmbed BERT. The take-home point is that user satisfaction is by far the most important thing we should be optimizing for.
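As a rough illustration of how clicks and returns to the SERP could become training signals, here is a small Python sketch. Everything in it is hypothetical: the `Click` and `SerpInteraction` records, the `training_labels` function, and the 30-second dwell threshold are assumptions for illustration, not details from the trial documents. The sketch treats a result as satisfying if the searcher clicked it and either did not come back to the results page or stayed on the page for a while.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Click:
    """One click on a search result (hypothetical record)."""
    url: str
    dwell_seconds: float     # time spent on the clicked page
    returned_to_serp: bool   # did the searcher come back to the results page?


@dataclass
class SerpInteraction:
    """One query and what happened on its results page (hypothetical record)."""
    query: str
    results_shown: List[str]
    clicks: List[Click] = field(default_factory=list)


def training_labels(interaction: SerpInteraction,
                    min_dwell: float = 30.0) -> List[Tuple[str, str, int]]:
    """Return (query, url, label) triples, with label 1 for a result that looks satisfying.

    The 30-second default is an arbitrary illustrative threshold.
    """
    clicked = {c.url: c for c in interaction.clicks}
    labels = []
    for url in interaction.results_shown:
        click = clicked.get(url)
        satisfied = click is not None and (
            not click.returned_to_serp or click.dwell_seconds >= min_dwell
        )
        labels.append((interaction.query, url, int(satisfied)))
    return labels


# Toy example: a quick bounce from the first result, then a long stay on the second.
interaction = SerpInteraction(
    query="how to prune roses",
    results_shown=["a.example", "b.example", "c.example"],
    clicks=[
        Click("a.example", dwell_seconds=5.0, returned_to_serp=True),
        Click("b.example", dwell_seconds=120.0, returned_to_serp=False),
    ],
)
for row in training_labels(interaction):
    print(row)
```

Here the first result gets a label of 0 because the searcher bounced straight back, the second gets a 1, and the unclicked third result gets a 0. Labeled triples like these are the kind of click and query data a model could be trained on.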
Conclusion
Google’s trial documents reveal a lot about how its search engine works, including the importance of user data and of proprietary page quality and freshness signals. User data is crucial to Google’s success, and Google is not willing to share it with competitors. As Google continues to evolve and improve its search engine, it’s likely that user data will play an even bigger role in the future.

