Introduction to Centerpiece Content
Google’s Gary Illyes discussed the concept of "centerpiece content" at the recent Google Search Central Deep Dive event in Asia. According to Illyes, Google goes to great lengths to identify the main content of a web page, which is crucial for ranking and retrieval. The phrase "main content" is familiar to those who have read Google’s Search Quality Rater Guidelines, which define main content as any part of the page that directly helps the page achieve its purpose.
What is Centerpiece Content?
Centerpiece content, also known as main content, includes text, images, videos, page features, and user-generated content. It is the content that has the greatest weight in ranking and retrieval, and it is located in the main body of the page, rather than in the header, footer, or navigation areas. Illyes emphasized that words and phrases located in the main content area carry significantly more weight than those in other areas of the page.
How Google Identifies Main Content
Google analyzes the rendered web page to locate the content and assign an importance score to the words on the page. This is about where content sits within the page, not about where keywords sit within the content. Illyes noted that moving a term from a low-importance area to the main content area will directly increase its weight and potential to rank. Using semantic HTML (elements such as main, article, nav, header, and footer) can help Google distinguish the main content from less important areas, making web pages less ambiguous.
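To make the idea of region-based weighting concrete, here is a minimal Python sketch that scores each word according to the semantic HTML element it appears in. It is not Google's algorithm: the weights, the RegionWeigher class, and the score_terms function are all hypothetical, chosen only to show why a term inside main can count for more than the same term inside nav or footer.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights for illustration only; Google's real values are not public.
REGION_WEIGHTS = {
    "main": 1.0, "article": 1.0, "aside": 0.3,
    "header": 0.2, "nav": 0.1, "footer": 0.1,
}
DEFAULT_WEIGHT = 0.5  # text outside any recognized semantic region


class RegionWeigher(HTMLParser):
    """Accumulates a score per word based on the enclosing semantic element."""

    def __init__(self):
        super().__init__()
        self.stack = []  # open semantic elements, innermost last
        self.scores = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in REGION_WEIGHTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Use the innermost semantic region, or the default if there is none.
        weight = REGION_WEIGHTS[self.stack[-1]] if self.stack else DEFAULT_WEIGHT
        for word in data.lower().split():
            self.scores[word] += weight


def score_terms(html):
    """Return a Counter mapping each word to its accumulated region weight."""
    parser = RegionWeigher()
    parser.feed(html)
    parser.close()
    return parser.scores


page = (
    "<nav>widgets home</nav>"
    "<main><h1>Blue widgets</h1><p>Our widgets ship fast</p></main>"
    "<footer>contact</footer>"
)
scores = score_terms(page)  # "widgets" outscores "contact" because it appears in <main>
```

In this toy model, a page built from unlabeled div elements would give every word the same default weight, which is one way to picture the ambiguity that semantic HTML removes.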
Tokenization and Indexing
Google uses tokenization to convert words and phrases into a representation that can be indexed. Tokenization is the foundation of Google’s index, and it enables semantic understanding of queries and content. For this reason, publishers and SEOs should write about topics in terms of how they help users, rather than focusing only on keywords.
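As a rough illustration of how tokens feed an index, the following Python sketch splits text into lowercase tokens and builds a small inverted index from them. It is a purely lexical toy, not a description of Google's tokenizer, and it does not capture the semantic representation mentioned above; the function names tokenize, build_index, and search are hypothetical.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def build_index(docs):
    """Map each token to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}


def search(index, query):
    """Return the ids of documents that contain every token of the query."""
    token_sets = [set(index.get(token, ())) for token in tokenize(query)]
    return sorted(set.intersection(*token_sets)) if token_sets else []


# Hypothetical two-document corpus.
docs = {1: "Blue widgets ship fast", 2: "Red widgets, slow shipping"}
index = build_index(docs)
```

Note that this sketch treats "ship" and "shipping" as unrelated tokens, which is exactly the kind of gap that a semantic representation is meant to close.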
Soft 404s: A Critical Error
Soft 404s are pages that should return a 404 response but instead return a 200 OK response. This can happen when an SEO or publisher redirects a missing web page to the home page or to an error page that itself returns 200 OK. Illyes emphasized that soft 404s are a critical error that can negatively impact crawl budget and provide a poor user experience. Google actively identifies and de-prioritizes these pages, and Illyes shared that even Google’s own documentation page about soft 404s was flagged as a soft 404 by its own systems and couldn’t be indexed.
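The difference between a real 404 and a soft 404 can be sketched in a few lines of Python. The respond function below returns an actual 404 status for a missing path instead of redirecting or serving an error message with 200 OK, and looks_like_soft_404 is a crude check for the soft 404 pattern. The page table, the phrase list, and both function names are hypothetical, and the check is not how Google detects soft 404s.

```python
from http import HTTPStatus

# Hypothetical site content, keyed by URL path.
PAGES = {"/": "<h1>Home</h1>", "/widgets": "<h1>Blue widgets</h1>"}

# Hypothetical phrases; real detection systems are not public.
ERROR_PHRASES = ("page not found", "no longer available", "nothing was found")


def respond(path, pages=PAGES):
    """Return (status, body); a missing path gets a real 404, not a redirect or a 200."""
    if path in pages:
        return HTTPStatus.OK, pages[path]
    return HTTPStatus.NOT_FOUND, "<h1>Page not found</h1>"


def looks_like_soft_404(status, body):
    """Flag responses that report success but whose body reads like an error page."""
    text = body.lower()
    return status == HTTPStatus.OK and any(phrase in text for phrase in ERROR_PHRASES)
```

A phrase-based check like this would also flag a legitimate article that merely discusses error pages, which is consistent with the anecdote about Google's own soft 404 documentation being flagged.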
Takeaways
The key takeaways from Illyes’ discussion are:
- Main content is prioritized by Google for ranking and retrieval
- Using semantic HTML can help Google identify main content
- Tokenization enables semantic understanding of queries and content
- Soft 404s are a critical error that can negatively impact crawl budget and user experience
Conclusion
Understanding centerpiece content and how Google identifies it is important for publishers and SEOs. By putting key information in the main content, using semantic HTML, and avoiding soft 404s, websites can improve their ranking and retrieval and provide a better user experience. As Google continues to update its algorithms, staying current with its guidelines and best practices remains essential.