AI as Democracy

Large Language Models as a Reflection of Humanity

Large language models (LLMs) do not originate knowledge. They compile and reconfigure it. Each output is constructed from fragments of recorded human behavior, speech, and documentation. The model's logic rests on statistical association, not verified fact. Its outputs follow the gravitational pull of prevailing language use, and the result isn't new thought. It's structured consensus.

This process mirrors how language travels through society. Influence accumulates where repetition is highest and access is easiest. LLMs ingest books, online discussions, technical papers, and public records. These sources do not represent the global population equally. They represent those with publication access, stable internet infrastructure, and dominant linguistic capital.

The model becomes an artificial consensus mechanism. Its formation is driven by weighted aggregation, absent any process of debate or deliberation. Presence alone becomes the metric by which the model evaluates significance. In this structure, prevalence becomes power. The words of those most recorded are given the most predictive weight.
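To make that mechanism concrete, the toy sketch below scores phrases from a hypothetical four-line corpus purely by frequency; it's an illustration of prevalence becoming predictive weight, not how any production model is actually trained.

```python
from collections import Counter

# Toy sketch: score each phrase purely by how often it appears in the
# corpus. Prevalence alone sets the weight; no notion of merit,
# accuracy, or consent is involved.
corpus = [
    "the market decides",
    "the market decides",
    "the market decides",
    "the community decides",
]

counts = Counter(corpus)
total = sum(counts.values())

for phrase, n in counts.most_common():
    print(f"{phrase!r}: weight {n / total:.2f}")

# The majority phrasing dominates (0.75 vs 0.25) simply because it was
# recorded more often, not because it is more true or more just.
```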

This aggregation creates the appearance of neutrality. In practice, it reinforces existing imbalances. If a marginalized dialect appears infrequently in the training data, its patterns are marked as deviations rather than as valid alternatives. If a cultural tradition is poorly documented or locked behind paywalls, it's excluded from the model's memory. The omission isn't an accident; it reflects how the system was structured.

The LLM does not know which knowledge is contested or sacred. It does not distinguish between historical accuracy and social media repetition. It ranks based on volume, not merit. In doing so, it encodes historical biases as present-day defaults. A system trained on centuries of skewed documentation will reproduce that skew unless directly corrected.

The model reflects the contours of human expression, but it cannot correct for who was never given the chance to speak. Its predictive strength is greatest where the data is densest. That density isn't evenly distributed. It follows geopolitical lines, linguistic borders, and the pathways of global capital.

What emerges is a portrait of collective humanity filtered through structural inequality. The model's an aggregator of power, built on the uneven scaffolding of the recorded world.

Democracy Distorted

The metaphor of AI as democratic begins to unravel under scrutiny. Large language models are trained on data that isn't universally representative. Their corpora consist primarily of internet-accessible texts, licensed publications, and digitized records. These sources are dominated by material produced in the Global North, written in dominant languages, and often shielded by paywalls or available only in formats that require reliable broadband access and modern hardware.

This structure embeds a systemic bias. Communities without access to publishing platforms, stable internet, or computational resources are not merely left out. They are rendered invisible within the statistical logic that defines the model. Their absence isn't just a gap in the record; it becomes an algorithmic decision to prioritize one kind of voice over another.

The issue isn’t confined to geography. Within wealthy nations, working-class voices, oral traditions, undocumented histories, and non-standard dialects are similarly excluded. These groups leave behind limited data traces, often fragmented or unindexed within mainstream information systems. As a result, they fall below the model’s threshold for inclusion. The predictive power of an LLM relies on pattern frequency, not social value. If a perspective is rare in the data, it becomes noise rather than signal.

The imbalance emerges from structural inequality embedded in access and visibility. The availability of content is shaped by who has the tools to create, host, and distribute it. The weight given to that content within AI systems reflects that same asymmetry. Models trained disproportionately on Western, academic, and commercial material adopt those perspectives as baseline.

When such models are deployed globally, their outputs carry the authority of software while reflecting only a fraction of global thought. Despite the system’s appearance of neutrality, its foundation is selective. It encodes an ideology of access. Those with the most bandwidth, storage, and institutional presence are granted the most algorithmic legitimacy.

This creates a feedback loop. AI outputs are reused in educational tools, content moderation, automated translation, and public discourse systems. When these outputs are based on unbalanced data, they reinforce existing silences. Minority languages remain unsupported. Local knowledge systems remain unindexed. Marginalized communities remain defined by the absence of their own voice.

The supposed democracy of AI is, in practice, a census of those who were already heard. And even that census is filtered through commercial licensing agreements, data cleaning procedures, and editorial decisions made behind closed doors. The system reflects power as it’s structured across technological infrastructure, not as it ought to be distributed across society.

The Ethics of Representation

A system built from human-generated data may resemble public agreement, but resemblance does not imply legitimacy. There’s no ethical framework in which statistical probability equates to political representation. To say that a model reflects discourse isn’t to say that it understands or serves those it reflects. It does not know who has been misrepresented. It does not know who was never included.

Large language models do not ask permission to replicate a voice. They do not verify the context in which an idea was first expressed. They cannot distinguish between a consensus reached through deliberation and one produced by amplification. The system’s designed to predict what’s likely to be said, not to interrogate what ought to be heard.

This distinction matters when AI outputs are positioned as summaries of public knowledge. There’s a growing trend of treating language models as neutral participants in decision-making processes, particularly in areas involving civic or social impact. Whether in education, content moderation, policy drafting, or automated journalism, these systems are increasingly used to mediate information that affects people directly. The idea that they can act as objective intermediaries ignores how they are built.

LLMs are shaped by institutions, corporations, and developers with specific goals. Their training data’s selected, filtered, and refined through choices that are rarely disclosed. Their alignment is fine-tuned through feedback from select demographics. Their deployment is often governed by product strategy, not public interest. There’s no civic process that governs their structure. There’s no electorate to hold them accountable.

Framing these models as representative agents creates a dangerous illusion of consent and legitimacy. A delegate implies a relationship of trust and responsibility to constituents: a mandate. A model has none of these. It cannot represent a community it does not know. It cannot claim to speak on behalf of those whose lives and contexts remain outside its training scope.

The risk isn’t only that these models might misrepresent someone; it’s that society might accept the misrepresentation as valid. Once an AI system becomes a proxy for public voice, the distinction between simulation and participation collapses. This erodes the civic safeguards that are fundamental to democratic life.

The ethical issue encompasses both inclusion and power. Who decides what data is used? Who decides what’s removed? Who decides how the outputs are deployed? These decisions shape who gets to be heard, whose language is seen as authoritative, and whose experience is deemed marginal.

To use AI systems as if they carry public voice without granting the public any role in their governance is to strip democracy of meaning. It turns conformity into computation and consent into product design.

Can AI Be a Commons?

The concept of a shared knowledge infrastructure, collectively governed and equitably accessible, is fundamentally incompatible with closed-source architectures, proprietary training data, and commercially controlled deployment. A commons exists to serve the public: built through cooperation and maintained through accountability. Current AI architectures do not meet these standards.

To move toward democratic infrastructure in artificial intelligence, foundational shifts are required. These include participatory oversight, localized data governance, linguistic equity, and systems for addressing harms. A model that learns from society must also answer to it. Some technical mechanisms offer partial pathways forward. These methods alone are not sufficient, but they create the conditions under which more inclusive and accountable systems could emerge.

Reinforcement Learning from Human Feedback (RLHF)

RLHF is often presented as a solution for aligning models with human preferences. In current practice, it relies on feedback loops from contracted annotators or users of proprietary platforms. These contributors are rarely representative of global linguistic or cultural diversity. Without broader inclusion, RLHF reinforces the norms of the majority or the preferences of the platform’s user base.

An ethical RLHF process would begin with regionalized feedback networks. Feedback collection should involve linguistic specialists, cultural practitioners, and local governance bodies. Weighting mechanisms must be applied to correct for historical exclusion. Resolving conflicting preferences requires negotiated consensus grounded in human dignity and contextual understanding, rather than defaulting to majority voting.
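As a rough illustration of such a weighting mechanism, the hedged sketch below gives each region equal total influence over a preference tally rather than letting raw annotator volume decide; the region labels and feedback records are hypothetical.

```python
from collections import Counter

# Hedged sketch: reweight preference feedback so that regions supplying
# fewer annotators are not drowned out by sheer volume.
feedback = [
    {"region": "north_america", "preferred": "A"},
    {"region": "north_america", "preferred": "A"},
    {"region": "north_america", "preferred": "A"},
    {"region": "west_africa",   "preferred": "B"},
]

region_counts = Counter(item["region"] for item in feedback)
n_regions = len(region_counts)

# Each region contributes equal total weight; within a region, its
# annotators share that weight.
scores = Counter()
for item in feedback:
    weight = 1.0 / (n_regions * region_counts[item["region"]])
    scores[item["preferred"]] += weight

print({option: round(score, 3) for option, score in scores.items()})
# {'A': 0.5, 'B': 0.5} instead of a 3-to-1 majority for A.
```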

Models should also have the ability to refuse alignment requests that contradict core public values or infringe on minority rights. Not all preferences are ethically equal. The system must include boundaries.

Data Curation and Accessibility Equity

Current models are constructed on the principle of volume. The more data, the better the performance. This principle favors widely available, digitized, and frequently cited material. It marginalizes oral traditions, non-Western epistemologies, and under-documented languages.

In a commons-based framework, data sourcing must shift from scale to representation. Models should include local knowledge systems, indigenous languages, and nontraditional formats. This requires partnerships with community institutions, ethical data licensing, and tools for respectful integration.
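One partial mechanism for this shift is temperature-scaled sampling over language sub-corpora, which flattens the raw volume skew so that low-resource languages are drawn more often than their token counts alone would dictate. The sketch below uses made-up corpus sizes purely for illustration.

```python
# Hedged sketch: temperature-scaled sampling weights over hypothetical
# language sub-corpora. An exponent below 1 flattens the volume skew.
corpus_tokens = {"english": 1_000_000_000, "yoruba": 5_000_000, "quechua": 1_000_000}

def sampling_weights(sizes, alpha=0.3):
    """Raise each language's raw share to the power alpha and renormalize."""
    total = sum(sizes.values())
    scaled = {lang: (n / total) ** alpha for lang, n in sizes.items()}
    norm = sum(scaled.values())
    return {lang: round(w / norm, 3) for lang, w in scaled.items()}

print(sampling_weights(corpus_tokens))
# English still dominates, but Yoruba and Quechua are sampled far more
# often than their raw share of the tokens would allow.
```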

Accessibility must also be addressed at the infrastructure level. Low-bandwidth optimization, translation interfaces, and offline tools are not add-ons. They are prerequisites for equitable use. If a model cannot be accessed or contributed to by communities in under-connected regions, then it’s not a democratic tool.

Neural Pruning and Model Stewardship

Neural pruning allows model maintainers to reduce complexity and retrain systems more efficiently by removing parameters deemed low impact. While often framed as a technical optimization, pruning’s also a political decision. Every parameter removed alters the model’s memory. Decisions about what to prune reflect judgments about relevance and value.
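A minimal sketch of one common criterion, magnitude pruning over a randomly generated weight matrix, makes that judgment visible: a chosen threshold, not a fact of nature, decides which parameters survive.

```python
import numpy as np

# Hedged sketch: magnitude pruning, one common way "low impact" is
# operationalized. Weights whose absolute value falls below a chosen
# threshold are zeroed. The matrix here is random and purely illustrative.
rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 4))

threshold = np.quantile(np.abs(weights), 0.5)   # drop the smallest 50%
mask = np.abs(weights) >= threshold
pruned = weights * mask

print(f"removed {int((~mask).sum())} of {weights.size} parameters")
# Whatever those parameters encoded is what the model forgets.
```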

To treat pruning as a neutral act is to disregard the ethical weight of deciding what knowledge is retained or removed. What’s removed may represent minority voices, rare dialects, or contested knowledge. Pruning must not be conducted in isolation. It must involve review processes that include independent oversight, community consultation, and publication of pruning rationales.

Stewardship of AI models must shift from private product teams to public institutions. This does not require nationalization. It requires accountability. Model updates should be published, versioned, and documented. Changes to behavior must be traceable. Bias audits must be routine and public.
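The hedged sketch below shows one form a published change record could take; the schema and field names are hypothetical, not an existing standard.

```python
from dataclasses import dataclass, field, asdict
import json

# Hypothetical schema for a public, versioned model-change record of the
# kind described above. Field names are illustrative.
@dataclass
class ModelChangeRecord:
    version: str
    release_date: str
    summary: str
    data_sources_added: list = field(default_factory=list)
    data_sources_removed: list = field(default_factory=list)
    pruning_rationale: str = ""
    bias_audit_report: str = ""   # link to a published audit

record = ModelChangeRecord(
    version="2.4.0",
    release_date="2025-01-15",
    summary="Pruned 12% of parameters; added West African news corpora.",
    data_sources_added=["example-west-african-news-archive"],
    pruning_rationale="Magnitude pruning below the 0.4 quantile of |weight|.",
    bias_audit_report="https://example.org/audits/2.4.0",
)

print(json.dumps(asdict(record), indent=2))   # published alongside the release
```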

Without these mechanisms, the language of “alignment” conceals private manipulation under the appearance of social consensus. A commons cannot be curated by those with no obligation to the public it serves.

AI Intermediaries and the Question of Authority

Artificial intelligence is increasingly positioned as an intermediary between individuals and the systems that govern their lives. In education, health care, information access, and civic infrastructure, algorithmic agents are being integrated to filter, interpret, and deliver content on behalf of institutions. These agents are frequently framed as assistants. In practice, their role extends beyond convenience to include curation, redirection, and control.

There’s a quiet shift taking place. The language of support is giving way to the language of representation. These systems do not simply assist. They speak on behalf of users in negotiations with opaque systems. They translate needs into optimized requests. They preempt questions by offering pre-filtered responses. They interpret legal structures, medical data, and social dynamics without ever being formally appointed to do so.

This shift raises urgent questions. Who do these agents represent? Whose interests are prioritized when recommendations are made or options are ranked? What happens when the goals of the user and the goals of the developer diverge?

Any system that acts on behalf of the public must be accountable. It must operate within a framework of public oversight, ethical clarity, and transparent governance. The public must know how these systems are trained, how their decisions are shaped, and how to challenge outcomes that cause harm or exclusion. This includes the right to opt out of automated mediation and the right to demand human review.

If these systems are allowed to function as synthetic bureaucrats, filtering access to rights and resources based on proprietary logic, they will reshape the balance of power in ways that are difficult to reverse. The convenience of delegation can quickly become a substitute for participation. Once decisions are outsourced to opaque agents, the ability to contest or revise those decisions is significantly weakened.

AI systems are already influencing hiring outcomes, parole decisions, immigration cases, and housing eligibility. In many cases, the people affected do not understand how the decision was made or who has the authority to reverse it. The risk isn’t just that these systems may make mistakes; it’s that their decisions will become immune to correction.

Public institutions are governed by rules, oversight bodies, and legal remedies. AI systems must meet the same standards if they are to act within those institutions. This includes regulatory frameworks that define what these systems can and cannot do, mandatory impact assessments, and legally binding mechanisms for appeal.

The central issue lies in how these tools are shaped, the forces that govern their operation, and the interests they ultimately serve, even when their outcomes appear beneficial. If artificial intelligence is to become a voice within democratic systems, it must be subject to democratic constraint. Otherwise, it becomes an instrument of institutional capture, of control engineered by design.

Toward a Real AI Democracy

Artificial intelligence will not become democratic through metaphor. It will not become representative through scale. It will not become ethical through alignment alone. A model trained on the fragments of recorded human behavior cannot be trusted to embody collective will without public authority over how it’s built, used, and revised.

The foundational structure of current AI systems reflects the asymmetries of global access. These systems amplify the voices already overrepresented in the digital record and reproduce patterns of exclusion at scale. Without deliberate intervention, the technologies that claim to connect us will reinforce the divides that already define us.

Democratic institutions are defined by participation, transparency, and accountability. For AI systems to support these values, they must be governed by them. This includes public oversight of training data sources, open documentation of model behavior, and regulatory frameworks that require auditability and redress. It also includes the right to know how outputs are generated and the ability to challenge decisions that affect rights or well-being.

Technical innovations such as reinforcement learning from human feedback, neural pruning, and multilingual data integration offer important tools. Nevertheless, tools alone do not produce justice. They require intentional design and informed consent, grounded in a public empowered to engage with artificial intelligence and to influence how it develops.

The idea of AI as democracy will remain hollow unless the public’s granted ownership over the systems that speak in its name. That ownership must be legal, procedural, and material. It cannot be symbolic.

A true AI commons would regard language as a shared civic resource rather than a commodity. Underrepresented voices would be woven into its foundation by design, not added as an afterthought. It would recognize that data isn’t neutral and that every training set is a political act.

Democracy cannot govern what its citizens cannot reach. The burden of restraint belongs to those who build. Artificial intelligence will serve the public interest only when its architects are legally bound to it and held to account by the full weight of democratic obligation.
