Introduction to Google’s AI Content Policy
Google’s Gary Illyes has confirmed that AI-generated content is acceptable as long as it meets high quality standards. In an exclusive interview with Kenichi Suzuki, Illyes clarified that Google’s guidance is better described as calling for "human-curated" content than "human-created" content. In other words, the quality and accuracy of the content matter more than how it was generated.
AI Models Used by Google
Illyes revealed that Google uses a custom Gemini model for AI Overviews and AI Mode. He suggested that the model may have been trained differently, but gave no details of its training. Illyes stated, "So as you noted, the model that we use for AIO (for AI Overviews) and for AI mode is a custom Gemini model and that might mean that it was trained differently."
Grounding and Indexes
Illyes explained that AI Overviews and AI Mode use Google Search for grounding, which means tying generated answers to a database or search index so that they are more reliable and accurate. He said, "As far as I know, Gemini, AI Overview and AI Mode all use Google search for grounding. So basically they issue multiple queries to Google Search and then Google Search returns results for those particular queries."
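The "multiple queries" pattern Illyes describes can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the in-memory `TOY_INDEX`, the `search` function, and `gather_grounding` are hypothetical stand-ins, not Google's API or implementation. The sketch only shows the general shape: several queries are issued to a search backend, and the de-duplicated results become the source material an answer could be tied to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    url: str
    snippet: str


# Hypothetical stand-in for a search index (not real data).
TOY_INDEX = [
    Result("https://example.com/a", "robots.txt controls crawler access"),
    Result("https://example.com/b", "grounding ties answers to search results"),
    Result("https://example.com/c", "crawler access and grounding overview"),
]


def search(query: str) -> list[Result]:
    """Return index entries whose snippet contains every query term."""
    terms = query.lower().split()
    return [r for r in TOY_INDEX if all(t in r.snippet.lower() for t in terms)]


def gather_grounding(sub_queries: list[str]) -> list[Result]:
    """Issue each sub-query and merge the results, keeping the first hit per URL."""
    seen: dict[str, Result] = {}
    for query in sub_queries:
        for result in search(query):
            seen.setdefault(result.url, result)
    return list(seen.values())


sources = gather_grounding(["crawler access", "grounding"])
for source in sources:
    print(source.url)
```

Note that no model is involved in this retrieval step; a generation step would only come afterwards and would cite the collected sources.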
Training Data and Google-Extended
Illyes discussed how Google-Extended relates to AI Overviews and AI Mode. (Google's crawler documentation describes Google-Extended as a robots.txt control token rather than a separate crawler with its own user-agent string.) He said that grounding itself involves no AI, and that what Google-Extended affects is the generation step. He added that if a website disallows Google-Extended, Gemini will not ground on that site.
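Disallowing Google-Extended is done in a site's robots.txt file. The sketch below, which assumes a hypothetical robots.txt for example.com, uses Python's standard `urllib.robotparser` to check what such a file would permit. The standard-library parser only approximates how Google interprets robots.txt, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Google-Extended, allow everything else.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://example.com/post"
print("Google-Extended:", is_allowed(ROBOTS_TXT, "Google-Extended", page))
print("Googlebot:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

With this file, the Google-Extended token is blocked for the whole site while ordinary crawling by Googlebot remains allowed.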
AI Content and Search Index
Illyes addressed the concern that AI-generated content could pollute the training data of large language models (LLMs). He said that this is not a problem for the search index, but that it is an issue model training has to deal with. Illyes stated, "I’m not worried about the search index, but model training definitely needs to figure out how to exclude content that was generated by AI."
Content Quality and AI-Generated Content
Illyes emphasized that content quality is a leading consideration for LLM training data, regardless of how the content was generated. He named two factors in particular: factual accuracy, and how similar the content is to what already exists. Illyes stated, "Sure, but if you can maintain the quality of the content and the accuracy of the content and ensure that it’s of high quality, then technically it doesn’t really matter."
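Illyes did not say how similarity is measured. As one illustration of what a publisher's own pre-publication check might look like, the sketch below computes Jaccard similarity over word shingles, a common near-duplicate heuristic. The function names and the shingle size of three words are assumptions chosen for the example, and nothing here reflects Google's actual method.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word sequences (shingles) in text, lowercased."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        # Text shorter than k words is treated as a single shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    set_a, set_b = shingles(a, k), shingles(b, k)
    if not set_a and not set_b:
        return 1.0  # two empty texts are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)


draft = "the quick brown fox jumps"
existing = "the quick brown fox sleeps"
print(jaccard_similarity(draft, existing))
```

A score near 1.0 suggests the draft closely repeats the existing text and may deserve a closer editorial look; any threshold would have to be chosen by the publisher.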
Human-Reviewed AI-Generated Content
Illyes highlighted the importance of human review for AI-generated content. He said that publishers should review and validate the accuracy of their content before publishing it. Illyes stated, "I don’t think that we are going to change our guidance any time soon about whether you need to review it or not. So basically when we say that it’s human, I think the word human created is wrong. Basically, it should be human curated."
Conclusion
In summary, Google’s policy on AI-generated content focuses on quality and accuracy rather than on how the content was produced. According to Illyes, content that is factually accurate, original, and reviewed by humans is acceptable for search and for model training. Publishers should therefore apply editorial oversight: validate the accuracy of what they publish and make sure it is not a near-duplicate of existing content.