AI and Global Knowledge Collapse
Highlighting the Deep, Hidden Impacts of AI on Global Knowledge
As GenAI becomes the primary way to find information, local and traditional wisdom is being lost. And we are only beginning to realise what we’re missing.
This article was originally published as ‘Holes in the web’ on Aeon.co
Structures of Power
To understand how certain ways of knowing rise to global dominance, often at the expense of Indigenous knowledge, it helps to consider the idea of cultural hegemony developed by the Italian philosopher Antonio Gramsci.
Gramsci argued that power is maintained not solely through force or economic control, but also through the shaping of cultural norms and everyday beliefs. Over time, epistemological approaches rooted in western traditions have come to be seen as objective and universal. This has normalised western knowledge as the standard, obscuring the historical and political forces that enabled its rise. Institutions such as schools, scientific bodies and international development organisations have helped entrench this dominance.
In her book Decolonizing Methodologies (1999), the Māori scholar Linda Tuhiwai Smith emphasises that colonialism profoundly disrupted local knowledge systems – and the cultural and intellectual foundations on which they were built – by severing ties to land, language, history and social structures. Smith’s insights reveal how these processes are not confined to a single region but form part of a broader legacy that continues to shape how knowledge is produced and valued. It is on this distorted foundation that today’s digital and GenAI systems are built.
LLM Structures
I recently worked with Microsoft Research, examining several GenAI deployments built for non-western populations. Observing how these AI models often miss cultural contexts, overlook local knowledge and frequently misalign with their target community has brought home to me just how much they encode existing biases and exclude marginalised knowledge.
The work has also brought me closer to understanding the technical reasons why such inequalities develop inside the models. The problem is far deeper than gaps in training data. By design, LLMs also tend to reproduce and reinforce the most statistically prevalent ideas, creating a feedback loop that narrows the scope of accessible human knowledge.
Why so? The internal representation of knowledge in an LLM is not uniform. Concepts that appear more frequently, more prominently or across a wider range of contexts in the training data tend to be more strongly encoded. For example, if pizza is commonly mentioned as a favourite food across a broad set of training texts, when asked “what’s your favourite food?”, the model is more likely to respond with “pizza” because that association is more statistically prominent.
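To make this concrete, here is a deliberately simplified sketch in Python. It is not how any real model is built; the “corpus” and its counts are invented purely to illustrate how statistical prominence in training text becomes the answer a model is most likely to give.

```python
# A toy illustration (not any production model) of how statistical prominence
# in training text translates into a model's most likely answer.
# The corpus counts below are invented purely for illustration.
from collections import Counter

corpus_mentions = [
    "pizza", "pizza", "pizza", "pizza", "pizza",
    "biryani", "biryani",
    "ragi mudde",          # a local staple mentioned far less often online
]

counts = Counter(corpus_mentions)
total = sum(counts.values())

# A language model's output distribution loosely tracks these proportions:
# the answer it is "most likely" to give is simply the most frequent one.
distribution = {food: n / total for food, n in counts.items()}
print(distribution)                              # {'pizza': 0.625, 'biryani': 0.25, 'ragi mudde': 0.125}
print(max(distribution, key=distribution.get))   # 'pizza'
```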
More subtly, the model’s output distribution does not directly reflect the frequency of ideas in the training data. Instead, LLMs often amplify dominant patterns or ideas in a way that distorts their original proportions. This phenomenon can be referred to as “mode amplification”.
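A rough sense of why outputs can exaggerate rather than mirror the data: common decoding choices, such as sampling with a temperature below 1, sharpen the model’s output distribution. The sketch below reuses the toy shares from the previous example; the numbers and the exact mechanism are illustrative assumptions, not a claim about any specific system.

```python
# A toy illustration of "mode amplification": low-temperature sampling sharpens
# the output distribution, so the dominant answer appears even more often in
# generations than it did in the training data. Numbers are invented.
training_share = {"pizza": 0.625, "biryani": 0.25, "ragi mudde": 0.125}

def sharpen(dist, temperature):
    """Re-weight a distribution as p^(1/T), renormalised - equivalent to
    applying temperature scaling to the underlying logits."""
    weights = {k: v ** (1.0 / temperature) for k, v in dist.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}

print(sharpen(training_share, temperature=1.0))   # mirrors the data
print(sharpen(training_share, temperature=0.5))   # pizza ~0.83: amplified
print(sharpen(training_share, temperature=0.5)["ragi mudde"])  # niche idea shrinks to ~0.03
```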
This uneven encoding gets further skewed through reinforcement learning from human feedback (RLHF), where GenAI models are fine-tuned based on human preferences. This inevitably embeds the values and worldviews of their creators into the models themselves. Ask ChatGPT about a controversial topic and you’ll get a diplomatic response that sounds like it was crafted by a panel of lawyers and HR professionals who are overly eager to please you. Ask Grok, X’s AI chatbot, the same question and you might get a sarcastic quip followed by a politically charged take that would fit right in at a certain tech billionaire’s dinner party.
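The mechanism can be sketched in miniature. Everything below is a hypothetical toy: the candidate answers, the “annotator” preferences and the reward-weighted update are stand-ins for the general idea that preference fine-tuning shifts probability mass toward whatever the people giving feedback happen to value.

```python
# A minimal sketch of how preference fine-tuning (RLHF-style) embeds the
# values of whoever provides the feedback. All names and numbers are invented.

candidates = {
    "hedged, corporate-sounding answer": 0.3,   # base model's probability
    "blunt, locally grounded answer":    0.7,
}

# Suppose the hired annotators consistently prefer the hedged style.
annotator_reward = {
    "hedged, corporate-sounding answer": 1.0,
    "blunt, locally grounded answer":    0.2,
}

# Preference optimisation pushes probability mass toward high-reward outputs
# (shown here via simple reward-weighted renormalisation).
weighted = {a: p * annotator_reward[a] for a, p in candidates.items()}
z = sum(weighted.values())
tuned = {a: w / z for a, w in weighted.items()}
print(tuned)  # the hedged answer now dominates, reflecting annotator values
```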
Commercial pressures add another layer entirely. The most lucrative users – English-speaking professionals willing to pay $20-200 monthly for premium AI subscriptions – become the implicit template for “superintelligence”. These models excel at generating quarterly reports, coding in Silicon Valley’s preferred languages and crafting emails that sound appropriately deferential to western corporate hierarchies. Meanwhile, they stumble over cultural contexts that don’t translate to quarterly earnings.
It should not come as a surprise that a growing body of studies shows how LLMs predominantly reflect western cultural values and epistemologies. They overrepresent certain dominant groups in their outputs, reinforce and amplify the biases held by these groups, and are more factually accurate on topics associated with North America and Europe. Even in domains such as travel recommendations or storytelling, LLMs tend to generate richer and more detailed content for wealthier countries compared with poorer ones.
And beyond merely reflecting existing knowledge hierarchies, GenAI has the capacity to amplify them, as human behaviour changes alongside it. The integration of AI overviews in search engines, along with the growing popularity of AI-powered search engines such as Perplexity, underscores this shift.
As AI-generated content has started to fill the internet, it adds another layer of amplification to ideas that are already popular online. The internet, as the primary source of knowledge for AI models, becomes recursively influenced by the very outputs those models generate. With each training cycle, new models increasingly rely on AI-generated content. This risks creating a feedback loop where dominant ideas are continuously amplified while long-tail or niche knowledge fades from view.
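The feedback loop itself is easy to simulate. In the sketch below, each model “generation” is trained on a web increasingly filled with the previous generation’s outputs, which already over-represent the dominant idea. The starting shares and the amplification factor are invented; the point is only the direction of the dynamic.

```python
# A toy simulation of the recursive loop described above: each training cycle
# inherits a "web" skewed by the previous model's outputs. Numbers are invented.

def amplify(dist, factor=2.0):
    """Stand-in for one train-then-generate cycle: popular ideas are
    over-produced relative to their share in the training data."""
    weights = {k: v ** factor for k, v in dist.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}

shares = {"dominant idea": 0.60, "regional practice": 0.30, "oral tradition": 0.10}

for generation in range(4):
    print(generation, {k: round(v, 3) for k, v in shares.items()})
    shares = amplify(shares)  # the web the next model trains on

# After a few cycles the long tail has effectively vanished from the "web".
```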
Knowledge Collapse
The AI researcher Andrew Peterson describes this phenomenon as “knowledge collapse”: a gradual narrowing of the information humans can access, along with a declining awareness of alternative or obscure viewpoints. As LLMs are trained on data shaped by previous AI outputs, underrepresented knowledge can become less visible – not because it lacks merit, but because it is less frequently retrieved or cited. Peterson also warns of the “streetlight effect”, named after the joke where a person searches for lost keys under a streetlight at night because that’s where the light is brightest. In the context of AI, this would be people searching where it’s easiest rather than where it’s most meaningful. Over time, this would result in a degenerative narrowing of the public knowledge base.
Across the globe, GenAI is also becoming part of formal education, used to generate learning content and support self-paced education through AI tutors. For example, the state government of Karnataka, home to the city of Bengaluru, has partnered with the US-based nonprofit Khan Academy to deploy Khanmigo, an AI-powered learning assistant, in schools and colleges. I would be surprised if Khanmigo holds the insights of elder Neerugantis – the community’s traditional water managers, grounded in local knowledge and practices – needed to teach school students in Karnataka how to care for their water ecologies.
All this means that, in a world where AI increasingly mediates access to knowledge, future generations may lose connection with vast bodies of experience, insight and wisdom. AI developers could argue that this is simply a data problem, solvable by incorporating more diverse sources into training datasets. While that might be technically possible, the challenges of data sourcing, prioritisation and representation are far more complex than such a solution implies.
The Loss of Indigenous Knowledge
Epistemologies are not just abstract and cognitive. They are all around us, with a direct impact on our bodies and lived experiences. To understand how, let’s consider an example that contrasts sharply with the kind of Indigenous construction practices that Dharan seeks to revive: high-rise buildings with glass facades in the tropics.
Experts have realised that the key to saving Bengaluru from its water crisis lies in bringing the city’s traditional lake systems back to life. A social worker I spoke with, who has been involved in several of these restoration projects, said they often turn to elders from the Neeruganti community for advice. Their insights are valuable, but their local knowledge is not written down, and their role as community water managers has long been delegitimised. Their knowledge exists only in their native language, passed on orally, and is mostly absent from digital spaces – let alone AI systems.
Education / Farming
This was brought into focus by a conversation I had with a senior leader involved in developing an AI chatbot that serves more than 8 million farmers across Asia and Africa. The system provides agricultural advice drawn mostly from databases of government advisories and international development organisations, which tend to rely on research literature. The leader acknowledged that many local practices that could be effective are still excluded from the chatbot’s responses because they are not documented in that literature.
The rationale isn’t that research-backed advice is always right or risk-free. It’s that it offers a defensible position if something goes wrong. In a system this large, leaning on recognised sources is seen as the safer bet, protecting an organisation from liability while sidelining knowledge that hasn’t been vetted through institutional channels. So the decision is more than just technical. It’s a compromise shaped by the structural context, not based on what is most useful or true.
This structural context doesn’t just shape institutional choices. It also shapes the kinds of challenges I heard about in my conversation with Perumal Vivekanandan, founder of the nonprofit organisation Sustainable-agriculture and Environmental Voluntary Action (Seva). His experiences highlight the uphill battle faced by those working to legitimise Indigenous knowledge.
Formed in 1992, Seva focuses on preserving and disseminating Indigenous knowledge in agriculture, animal husbandry and the conservation of agricultural biodiversity in India. Over the years, Vivekanandan has documented more than 8,600 local practices and adaptations, travelling from village to village.
Still, the work constantly runs into systemic roadblocks. Potential funders often withhold support, questioning the scientific legitimacy of the knowledge Seva seeks to promote. When Seva turns to universities and research institutions to help validate this knowledge, those institutions often signal a lack of incentive to engage. Some even suggest that Seva should fund the validation studies itself. This creates a catch-22: without validation, Seva struggles to gain support; but without support, it can’t afford validation. The process reveals a deeper challenge: finding ways to validate Indigenous knowledge within systems that have historically undervalued it.
Seva’s story shows that while GenAI may be accelerating the erasure of local knowledge, it is not the root cause. The marginalisation of local and Indigenous knowledge has long been driven by entrenched power structures. GenAI simply puts this process on steroids.
We often frame the loss of Indigenous knowledge as a tragedy only for the local communities who hold it. But ultimately, the loss is not just theirs to bear, but belongs to the world at large.
The disappearance of local knowledge is not a trivial loss. It is a disruption to the larger web of understanding that sustains both human and ecological wellbeing. Just as biological species have evolved to thrive in specific local environments, human knowledge systems are adapted to the particularities of place. When these systems are disrupted, the consequences can ripple far beyond their point of origin.
Wildfire smoke doesn’t respect postcodes. Polluted water doesn’t pause at state lines. Rising temperatures ignore national borders. Infectious germs don’t have visa waiting periods. Whether we acknowledge it or not, we are enmeshed in shared ecological systems where local wounds inevitably become global aches.
The climate crisis is revealing cracks in our dominant knowledge paradigms. Yet at the same time, AI developers are convinced that their technology will accelerate scientific progress and solve our greatest challenges. I really want to believe they’re right. But several questions remain: are we capable of moving towards this technological future while authentically engaging with the knowledge systems we’ve dismissed, with genuine curiosity beyond tokenism? Or will we keep erasing forms of understanding through the hierarchies we’ve built, and find ourselves scrambling to colonise Mars because we never learned to listen to those who knew how to live sustainably on Earth?
Maybe the intelligence we most need is the capacity to see beyond the hierarchies that determine which knowledge counts. Without that foundation, regardless of the hundreds of billions we pour into developing superintelligence, we’ll keep erasing knowledge systems that took generations to develop.