Introduction to Large Language Models
AI systems built on large language models, such as OpenAI's ChatGPT, Google's Gemini, and Perplexity, are changing how we interact with information online. These systems can generate human-like text and answer complex questions, but they cite sources differently than traditional search engines such as Google.
How Large Language Models Cite Sources
A recent study by Search Atlas, an SEO software company, compared citations from these three models against Google search results. The analysis of 18,377 matched queries found a significant gap between traditional search visibility and AI platform citations. The study revealed that Perplexity, which performs live web retrieval, has a higher overlap with Google search results compared to ChatGPT and Gemini.
Perplexity: The Closest to Traditional Search
Perplexity showed a median domain overlap of around 25-30% with Google results, and a median URL overlap of close to 20%. In total, Perplexity shared 18,549 domains with Google, representing about 43% of the domains it cited.
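To make the two metrics concrete: domain overlap asks whether an AI platform cites pages from the same websites Google ranks, while URL overlap asks whether it cites the exact same pages. The sketch below is not the study's code, just a minimal illustration of how such overlap fractions might be computed for one query's results.

```python
from urllib.parse import urlparse


def overlap_metrics(ai_urls, google_urls):
    """Compare URLs cited by an AI platform against Google result URLs.

    Returns (domain_overlap, url_overlap), each the fraction of the
    AI citation set that also appears in the Google results.
    """
    def domains(urls):
        # Reduce each URL to its hostname, e.g. "https://a.com/x" -> "a.com"
        return {urlparse(u).netloc for u in urls}

    ai_domains, g_domains = domains(ai_urls), domains(google_urls)
    domain_overlap = (
        len(ai_domains & g_domains) / len(ai_domains) if ai_domains else 0.0
    )
    url_overlap = (
        len(set(ai_urls) & set(google_urls)) / len(set(ai_urls)) if ai_urls else 0.0
    )
    return domain_overlap, url_overlap
```

Note that domain overlap is always at least as high as URL overlap: two sources can sit on the same site without pointing to the same page, which is why the study reports both numbers.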
ChatGPT and Gemini: More Selective Citation
ChatGPT and Gemini, on the other hand, are more selective in their citations. ChatGPT showed a median domain overlap of around 10-15% with Google results, and its URL matches typically remained below 10%. Gemini’s behavior was less consistent, with some responses having almost no overlap with search results, while others lined up more closely. Overall, Gemini shared just 160 domains with Google, representing about 4% of the domains that appeared in Google’s results.
Implications for Visibility
The study's findings have significant implications for online visibility. Ranking in Google does not guarantee citations in large language models. Perplexity's architecture, which actively searches the web, makes it more likely to cite sources that already rank well in Google. ChatGPT and Gemini, by contrast, rely more on pre-trained knowledge and selective retrieval, making them less tied to current rankings.
Study Limitations
The study had some limitations, including a dataset that heavily favored Perplexity, which accounted for 89% of matched queries. The researchers also used semantic similarity scoring to match queries, which may not perfectly reflect real-world user searches. Additionally, the two-month window of the study provides only a recent snapshot, and longer timeframes would be needed to see whether the same overlap patterns hold over time.
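The query-matching step is worth unpacking: pairing an AI platform's query with a Google query by semantic similarity means two differently worded queries can be treated as equivalent if their similarity score clears a threshold. The study likely used embedding-based similarity; as a simplified, self-contained stand-in, the sketch below scores similarity with a bag-of-words cosine and applies a hypothetical threshold. Both the scoring method and the 0.8 cutoff are assumptions for illustration only.

```python
from collections import Counter
import math


def cosine_similarity(query_a, query_b):
    """Toy bag-of-words cosine similarity between two query strings.

    A simplified stand-in for the embedding-based semantic scoring
    the study describes; real systems would use dense vectors.
    """
    va = Counter(query_a.lower().split())
    vb = Counter(query_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def queries_match(query_a, query_b, threshold=0.8):
    # Treat two queries as "the same" if similarity clears the threshold.
    # The 0.8 value here is an arbitrary illustration, not the study's.
    return cosine_similarity(query_a, query_b) >= threshold
```

The limitation the study notes follows directly from this design: a threshold-based match can pair queries that a real user would consider different intents, so the matched set only approximates real-world search behavior.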
Looking Ahead
For retrieval-based systems like Perplexity, traditional SEO signals and overall domain strength are likely to matter more for visibility. However, for reasoning-focused models like ChatGPT and Gemini, these signals may have less direct influence on which sources appear in answers. As large language models continue to evolve, it’s essential to understand how they cite sources and how this affects online visibility.
Conclusion
Large language models cite sources differently than traditional search engines, and understanding these differences is crucial for online visibility. Perplexity's citations track Google search results fairly closely, while ChatGPT and Gemini draw on a much narrower, more selective set of sources. As the online landscape continues to shift, publishers and SEO practitioners will need to track how each platform selects its citations, not just how pages rank in Google.

