Metadata for the Written Word

Cataloging as Exclusion

This note is part of a series that critiques the cult of literacy in libraries — exposing how reading, writing, and the book have been crowned as the only valid forms of knowledge, while everything else is silenced, excluded, or reshaped to fit the page. Check all the notes in this section's index.

Introduction

Metadata is often described as the invisible infrastructure of libraries: a neutral scaffolding that ensures discoverability, interoperability, and long-term access.

Yet neutrality is a myth.

Systems such as MARC (Machine-Readable Cataloging), the Dewey Decimal Classification (DDC), and the Library of Congress Classification (LCC) were designed within, and for, a world where the book was the central unit of knowledge. They did not emerge in a vacuum. DDC and LCC took shape in late-19th and early-20th century contexts that assumed literacy as both cultural norm and intellectual ideal, and MARC, although created in the 1960s, inherited the cataloging conventions of that tradition. The result is a literocentric bias embedded deep within the architecture of library description.

This bias has consequences. When metadata structures presuppose paginated, authored, and published objects, they inevitably misrepresent, marginalize, or erase forms of knowledge that do not conform to print culture. Oral traditions, performative practices, ephemeral or multisensory expressions — these appear in catalogs only as shadows of themselves, often mediated through written surrogates.

Literate Data Structures

MARC, created in the 1960s at the Library of Congress, illustrates the privileging of the book. Its record structure is optimized for bibliographic entities with authors, titles, publishers, and page counts. Even the "non-book" formats incorporated later (maps, sound recordings, videos) are forced into the same template. A vinyl record is described by track lists and durations; a film is captured through production credits and length in minutes. The data model does not account for the experiential qualities of listening, viewing, or participating.
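To make the template concrete, here is a minimal sketch in Python of how a vinyl record might be flattened into MARC-style fields. The tags (100, 245, 300, 505, 511) are real MARC 21 bibliographic fields, but the record values, the list of experiential aspects, and the helper name `describable_aspects` are all invented for illustration and do not come from any actual catalog.

```python
# Illustrative only: a hand-written, MARC-like record for a hypothetical LP.
# Tag meanings follow MARC 21 bibliographic conventions; the values are invented.
MARC_TAG_LABELS = {
    "100": "Main entry, personal name",
    "245": "Title statement",
    "300": "Physical description",
    "505": "Formatted contents note",
    "511": "Participant or performer note",
}

record = {
    "100": "Example, Performer",
    "245": "Example songs of the harvest",
    "300": "1 audio disc (42 min) ; 12 in.",
    "505": "Side A: Song one (3:10) -- Song two (4:05)",
    "511": "Sung by Performer Example.",
}

# Aspects a listener or practitioner might consider essential (hypothetical list).
experiential_aspects = ["tonality", "silences", "audience response", "occasion"]


def describable_aspects(rec: dict[str, str], aspects: list[str]) -> list[str]:
    """Return the aspects that are mentioned anywhere in the record's field values."""
    text = " ".join(rec.values()).lower()
    return [aspect for aspect in aspects if aspect in text]


for tag, value in record.items():
    print(f"{tag} {MARC_TAG_LABELS[tag]}: {value}")

print("Experiential aspects described:", describable_aspects(record, experiential_aspects))
```

Every field in this sketch answers a bibliographic question (who, what title, how long, which tracks). None of the hypothetical experiential aspects finds a place in the record, which is the gap the paragraph above describes.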

Classification systems reproduce the same literate logics. Dewey places oral traditions under narrow literary categories (often in "folk literature"), and LCC relegates Indigenous oral works to areas adjacent to written literature, thereby subsuming them into literary genres rather than recognizing them as autonomous epistemic forms. The act of fitting knowledge into preexisting classes reveals the bookish DNA of these systems.

Classifying the Unclassifiable

When confronted with non-literate knowledge, cataloging resorts to translation into textual surrogates. Consider oral epic traditions: the Homeric poems or the West African griot repertoires are represented in catalogs not as living performances but as editions, critical commentaries, or transcriptions. Similarly, ritual dances preserved in audiovisual archives are cataloged through textual metadata (choreographer names, dates, locations) rather than by movement vocabularies meaningful to practitioners.

Sound archives provide stark examples. The Alan Lomax Collection at the Library of Congress contains thousands of field recordings of songs and narratives. In the MARC catalog, these are discoverable primarily through the transcribed title of the piece, performer's name, or recording date. The catalog rarely captures the tonalities, silences, or performative contexts — the very features that constitute the knowledge embodied in the recording. The metadata points the user back toward textual access points, not sonic or experiential ones.

Another case: the Sámi Archives in Norway maintain collections of yoik, a vocal tradition central to Sámi culture. While digital preservation projects have safeguarded recordings, cataloging remains oriented toward bibliographic description, with metadata fields for "song title" and "composer" — concepts foreign to yoik's relational and contextual identity. Here, literocentrism does not just misdescribe; it alters the ontology of the practice.

Consequences of Literocentric Cataloging

The literate orientation of cataloging produces two key consequences. First, discovery bias: if a knowledge form cannot be translated into bibliographic descriptors, it becomes invisible in search interfaces. Oral performances that have not been transcribed are practically irretrievable. Second, epistemic exclusion: cataloging communicates not just what exists, but what is recognized as legitimate knowledge. By encoding only literate surrogates, libraries perpetuate a hierarchy in which textual artifacts are considered worthy of preservation and description, while embodied, oral, and ephemeral practices are relegated to marginal status.
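The discovery bias can be illustrated with a toy keyword index. This is a sketch under simplifying assumptions, not a model of any real library system: the holdings, field names, and the functions `build_index` and `search` are hypothetical, and actual discovery layers are far more elaborate. The point is only that an item with no textual descriptors never enters the index at all.

```python
from collections import defaultdict

# Hypothetical holdings: only the first item has a textual surrogate.
holdings = [
    {"id": "rec-001", "title": "Harvest song", "performer": "A. Singer"},
    {"id": "rec-002", "title": "", "performer": ""},  # untranscribed performance
]


def build_index(items):
    """Map each lowercase word in the textual fields to the ids that contain it."""
    index = defaultdict(set)
    for item in items:
        for field in ("title", "performer"):
            for word in item[field].lower().split():
                index[word.strip(".,")].add(item["id"])
    return index


def search(index, query):
    """Return the ids that match every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))


index = build_index(holdings)

# Items that no query could ever return, because no word points to them.
indexed_ids = set().union(*index.values())
unreachable = [h["id"] for h in holdings if h["id"] not in indexed_ids]

print("Search 'harvest song':", search(index, "harvest song"))
print("Unreachable by any query:", unreachable)
```

In this sketch the second recording is held and preserved, yet no combination of search terms will surface it. It exists in the collection but not in the catalog's field of view.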

This reproduces what Christine Borgman (2000, From Gutenberg to the Global Information Infrastructure) identified as the dominance of textual epistemologies in library science. Even in digital libraries, where metadata schemas could be radically reimagined, we see continuity: Dublin Core, MODS, and BIBFRAME retain strong bibliographic orientations, ensuring the persistence of literocentric defaults.

Toward Alternatives

There are, however, experiments that push against this bias.

Sonic archives such as the British Library's Sounds project have begun incorporating descriptors beyond bibliographic fields, including genre tags, contextual notes, and community-sourced metadata. While still limited, these moves hint at possibilities for sonic-centered description.

Oral history projects like the Columbia Center for Oral History increasingly use thematic indexing rather than bibliographic surrogacy, allowing narratives to be accessed through keywords, emotions, or topics rather than textual publication data.

Indigenous data frameworks such as the CARE Principles for Indigenous Data Governance emphasize contextual, relational, and community-authorized descriptions of knowledge. Here, metadata is not about fitting into universal standards but about respecting epistemic sovereignty.

Semantic web approaches also open possibilities. Ontology-based models allow for the description of sounds, movements, or gestures on their own terms, without collapsing them into "titles" or "pages." Projects like Europeana's Linked Open Data have shown the feasibility of multimodal description, although uptake remains uneven.
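As a rough illustration of what ontology-based description could look like, the sketch below represents a performance as subject-predicate-object triples using plain Python tuples. The `ex:` vocabulary (`ex:SungPerformance`, `ex:hasSegment`, `ex:Silence`, and so on) is invented for this example; it is not a published ontology and does not reflect Europeana's data model. The helper names `objects` and `performances_with` are likewise hypothetical.

```python
# Subject-predicate-object triples using an invented "ex:" vocabulary.
triples = [
    ("ex:performance42", "rdf:type", "ex:SungPerformance"),
    ("ex:performance42", "ex:performedBy", "ex:singerA"),
    ("ex:performance42", "ex:addressedTo", "ex:riverValley"),
    ("ex:performance42", "ex:hasSegment", "ex:segment1"),
    ("ex:segment1", "rdf:type", "ex:Silence"),
    ("ex:segment1", "ex:durationSeconds", 4.5),
]


def objects(graph, subject, predicate):
    """Return every object linked to the subject by the given predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def performances_with(graph, segment_type):
    """Return the subjects that contain at least one segment of the given type."""
    typed_segments = {s for s, p, o in graph if p == "rdf:type" and o == segment_type}
    return sorted({s for s, p, o in graph if p == "ex:hasSegment" and o in typed_segments})


# The performance is retrievable through a silence, not through a title.
print(performances_with(triples, "ex:Silence"))
print(objects(triples, "ex:performance42", "dc:title"))
```

Here the performance carries no title at all, yet it can still be found through one of its own features. Whether such a vocabulary is appropriate for a given tradition is a question for the community concerned, not something the data structure can settle.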

Conclusion

Cataloging is not a neutral technical exercise but a political act. When MARC, Dewey, or LCC insist on literate structures, they reinforce the assumption that only written knowledge "counts." Oral traditions, performances, silences, and multisensory practices are not absent because they do not exist, but because library infrastructures fail to represent them.

The challenge is not simply to "add fields" for songs or dances, but to rethink metadata from the ground up: to ask what it would mean to start not with the book, but with the voice, the gesture, the rhythm, or the pause. Only then can libraries begin to loosen the hold of literocentrism and move toward more inclusive architectures of memory.