“When did we start trusting strangers?” has become a famous line, coined in reaction to how strongly user ratings came to shape consumer behaviour. Only a few years earlier the prevailing wisdom was still: “On the Internet, nobody knows you’re a dog.” But as it now turns out, we have become far more gullible since then.
In his newsletter, Martyn Redstone takes a closer look at our new level of gullibility:
Modern AI models often employ a framework known as Retrieval-Augmented Generation (RAG), where the system first retrieves existing information from a vast corpus of data before augmenting it and generating a final answer. The choice of which data to retrieve is therefore a critical, formative step.
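The retrieve-then-generate flow described in the quote can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not how any of the products mentioned here actually work: the corpus, the domain names, and the word-overlap scoring are all hypothetical, and the generation step is left out entirely. It only shows that whatever the retriever picks is all the model gets to see.

```python
from collections import Counter

# Hypothetical toy corpus of (source domain, text) pairs. Not real data.
CORPUS = [
    ("reddit.com", "best budget laptop is the one I bought last year, trust me"),
    ("reddit.com", "budget laptop thread: everyone here recommends brand X"),
    ("example-reviews.com", "lab tests of budget laptop battery life and screen quality"),
    ("example-news.com", "laptop prices rose this quarter"),
]


def tokenize(text):
    """Lowercase the text and split it into a set of whitespace-separated words."""
    return set(text.lower().split())


def retrieve(query, corpus, k=2):
    """Rank documents by word overlap with the query and keep the top k.

    Word overlap is an assumed stand-in for a real relevance score.
    """
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc[1])), reverse=True)
    return ranked[:k]


def build_prompt(query, docs):
    """Augment the question with the retrieved passages, tagged by source."""
    context = "\n".join(f"[{domain}] {text}" for domain, text in docs)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"


query = "best budget laptop"
docs = retrieve(query, CORPUS)
prompt = build_prompt(query, docs)

# Which domains ended up in the context the model would see?
source_counts = Counter(domain for domain, _ in docs)
print(prompt)
print(source_counts)
```

In this made-up corpus both retrieved passages come from the same domain, so the lab-test source never reaches the prompt, however good it is.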
And as it turns out, these are the sources LLMs choose to draw on:
Figure: The most frequently appearing domains across the four LLMs (AI Mode, AI Overviews, ChatGPT, and Perplexity)
Reddit stands head and shoulders above every other source. Redstone:
The findings were unambiguous. The study revealed that “Reddit specifically dominated” the citation landscape. The core statistic is stark: Reddit had a 40.11% citation frequency across all platforms studied. This means that when a user receives an answer with a source, there is a 2 in 5 chance that the information was drawn directly from a Reddit discussion. This establishes Reddit not merely as a source, but arguably as the single most influential source of conversational, human-generated data for the current generation of AI.
But what is the demographic profile of the average Reddit user? Drawing on a demographic report (2025), Redstone gives the following figures:
- Gender Disparity: The platform’s content is generated from a predominantly male perspective. Globally, Reddit’s user base consists of 59.8% males and 30.2% females. The effect is even more pronounced in the United States, where the platform has its largest user base; 27% of the entire US male population uses Reddit, compared to just 17% of the US female population.
- Generational Skew: The voice of Reddit is overwhelmingly young. In the US, 44% of users are aged 18 to 29. A separate analysis identified the average user age as just 23 years old. This youth-centricity means the experiences, cultural references, and knowledge of older generations are significantly underrepresented. For context, only 11% of Americans aged 50-64 and a mere 3% of those aged 65 and over use the platform.
- Geographic Concentration: The Reddit community, and therefore its data, is heavily American. Over half (58%) of all Reddit users are based in the US. This geographic concentration creates an American-centric lens through which information is filtered. The traffic data confirms this imbalance, with the US generating 804.9 million monthly visits—nearly ten times the 85.7 million visits from the second-ranked country, the UK.
That leads Redstone to the following assessment:
When an AI model’s primary data source is so demographically specific, the consequences are profound. This concentration does not simply risk bias; it makes it a mathematical certainty. An AI learning from this dataset will naturally develop a model of the world that reflects the interests, values, and cultural touchstones of young, American men. This “demographic echo” manifests in several ways.
First, it creates a distinct cultural and linguistic tone. The informality and entertainment-driven nature of the source material will likely lead to AI adopting colloquialisms, meme-based references, and a generally less formal tone than one might expect from an authoritative information utility.
Second, […] An AI’s “knowledge” is being constructed upon a foundation of opinion, personal anecdote, and content designed for engagement rather than factual accuracy. Reddit’s upvote system prioritises popularity, humour, and emotional resonance—not necessarily veracity. There is a tangible risk that popular but incorrect information can be laundered through the sophisticated veneer of an AI and presented as objective fact.
Most importantly, this dynamic effectively trains the AI on a “Default Human” who is a 23-year-old American male. This has significant consequences for the vast majority of global users who do not fit this profile. Their queries, cultural contexts, and lived experiences may be misunderstood or answered from a perspective that feels alien, biased, or simply incorrect.
Technology into which hundreds of billions of dollars and euros have been poured, and into which hundreds of billions more will be poured, rests on unverified content from American men who are 23 years ‘old’. What could possibly go wrong?
You can subscribe to Martyn Redstone’s newsletter via this link: Subscribe now
