Metadata as Revolt (07 of 10)
Micro-Thesauri and Relational Logics
Building Semantic Worlds That Smell Like Corn and Sing Like Wind
This post is part of a series that explores how metadata can be used as a site of resistance, refusal, and poetic subversion. From classification to linked data, the series investigates how cataloging practices can encode oppression, and how they can be reimagined to challenge dominant systems and speak from the margins. Check all the posts in this section's index.
From Resistance to Infrastructure
The earlier essays in this series dissected metadata's ideological core — its pretense of neutrality, its colonial inheritance, and its disciplinary grammar. They showed that standards such as Dublin Core, SKOS, and RDF are not descriptive languages but infrastructures of authority. Subversion was possible: tricking SKOS's label hierarchies, bending RDF triples, turning minimal schemas against themselves. However, critique and tactical sabotage reach a limit. Systems do not change by exposure alone. They change when new infrastructures emerge.
This text moves from resistance to construction. It asks what a post-universal metadata ecology might look like: one where description is not harmonized under a single vocabulary but distributed across small, interlinked, and accountable systems. The proposal is pragmatic: build micro-thesauri governed by relational logics rather than hierarchical inheritance.
The Limits of Universality
Contemporary knowledge organization still depends on large-scale vocabularies (LCSH, AAT, AGROVOC, the UNESCO Thesaurus, and TDWG standards such as Darwin Core) that claim universality through "semantic harmonization." The goal is interoperability; the result is semantic homogenization.
Under these regimes, linguistic, ecological, and disciplinary nuance is flattened into standardized descriptors. LCSH can recognize "Indigenous art" but not the internal distinctions that matter to its makers. Darwin Core captures species names and coordinates but not ecological relations or cultural contexts. The macro-thesaurus functions as a filtering machine: what does not fit its syntax vanishes.
This is not a moral failure: it is an architectural one. The scale of the database itself enforces centralization. A universal vocabulary assumes that the world can be consistently segmented, labeled, and reconciled. The assumption is false. Knowledge domains evolve at different speeds, in different directions, and under different epistemic logics. Forcing them into a single semantic lattice is not efficiency; it is epistemic compression.
The Case for Micro-Thesauri
A micro-thesaurus is a bounded, domain-specific vocabulary built for a particular context: a research group, a museum collection, a community archive, a field station, or a dataset. It is small enough to be comprehensible, flexible enough to evolve, and open enough to interconnect. The term parallels "microservices" in software engineering: modular components that communicate through defined interfaces without sharing a monolithic architecture.
Micro-thesauri do not aspire to global coverage. Their strength lies in precision and accountability. Each term has a known provenance; each relation reflects local consensus. They support semantic subsidiarity: decisions made at the smallest competent scale. Instead of universal categories, we get a federation of situated vocabularies whose boundaries are explicit and negotiable.
Such systems are not antithetical to standards: they extend them horizontally. A micro-thesaurus can be expressed in SKOS, OWL, or JSON-LD. What changes is the governance model: from centralized maintenance to distributed authorship, and from fixed hierarchies to adaptive relations.
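To make this concrete, here is a minimal sketch of a micro-thesaurus serialized as JSON-LD with SKOS and Dublin Core terms, using only the Python standard library. The base IRI, the concept slugs, the definitions, and the contributor name are all illustrative assumptions, not part of any published vocabulary.

```python
import json

# Hypothetical base IRI for a community seed bank vocabulary (an assumption).
BASE = "https://example.org/seedbank/"

CONTEXT = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dct": "http://purl.org/dc/terms/",
}


def make_concept(slug, label, lang, definition, contributor, related=()):
    """Build one SKOS concept as a JSON-LD node with explicit provenance."""
    return {
        "@id": BASE + slug,
        "@type": "skos:Concept",
        "skos:inScheme": {"@id": BASE + "scheme"},
        "skos:prefLabel": {"@value": label, "@language": lang},
        "skos:definition": {"@value": definition, "@language": "en"},
        "dct:contributor": contributor,
        # Lateral links only: no skos:broader or skos:narrower is required.
        "skos:related": [{"@id": BASE + other} for other in related],
    }


thesaurus = {
    "@context": CONTEXT,
    "@graph": [
        {
            "@id": BASE + "scheme",
            "@type": "skos:ConceptScheme",
            "dct:title": "Seed bank micro-thesaurus (illustrative)",
        },
        make_concept("landrace-maize", "maíz criollo", "es",
                     "Maize varieties selected and saved by local growers.",
                     "Seed keepers' assembly", related=["milpa"]),
        make_concept("milpa", "milpa", "es",
                     "Intercropped field of maize, beans, and squash.",
                     "Seed keepers' assembly", related=["landrace-maize"]),
    ],
}

print(json.dumps(thesaurus, indent=2, ensure_ascii=False))
```

The standard carries the syntax; the governance lives in fields such as dct:contributor, which record who stands behind each definition.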
Relational Logic as Design Principle
Conventional thesauri are hierarchical trees. Concepts descend from broader to narrower terms, with lateral links as afterthoughts. This design assumes that knowledge can be arranged as inheritance. But modern information ecosystems — biological networks, environmental data, linked open science — operate through relation, not lineage.
Relational logic treats concepts as nodes connected by typed predicates rather than as children of a single parent class. In SKOS, this means prioritizing skos:related and the mapping properties (skos:exactMatch, skos:closeMatch, skos:broadMatch, skos:narrowMatch, skos:relatedMatch), and, when necessary, defining custom properties. In OWL, it means building classes through property restrictions and domain-specific relationships instead of deep taxonomic trees.
A concept in a micro-thesaurus is thus defined by what it connects to, not by what it inherits from. The shift may look like a matter of wording, but it is philosophical and operational. Hierarchy presumes control; relation presumes negotiation. The former optimizes for authority; the latter for adaptability.
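A small sketch can show what "defined by what it connects to" might mean in practice. It stores a vocabulary as plain (subject, predicate, object) triples and describes a concept purely through its incoming and outgoing relations. The concept names and the ex: predicates are hypothetical, and the "^" prefix for incoming edges is only a local convention of this sketch.

```python
from collections import defaultdict

# Illustrative triples; every name and ex: predicate here is an assumption.
TRIPLES = [
    ("maize", "skos:related", "milpa"),
    ("maize", "ex:isUsedIn", "tortilla-making"),
    ("maize", "ex:coOccursWith", "beans"),
    ("beans", "ex:coOccursWith", "squash"),
    ("milpa", "ex:isProducedBy", "intercropping"),
]


def describe(concept, triples):
    """Group a concept's relations by predicate, with no parent class involved."""
    profile = defaultdict(set)
    for subject, predicate, obj in triples:
        if subject == concept:
            profile[predicate].add(obj)
        elif obj == concept:
            # "^" marks an incoming edge (the concept is the object).
            profile["^" + predicate].add(subject)
    return {predicate: sorted(values) for predicate, values in profile.items()}


print(describe("beans", TRIPLES))
```

Nothing in this structure says what beans "is a kind of." The concept is legible only through its neighbors, which is the point.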
How to Build a Micro-Thesaurus
Constructing a micro-thesaurus begins with a clear definition of scope. Every semantic environment — a marine-biology station, an oral-history corpus, a local instrument collection — demands a specific conceptual granularity and disciplinary vocabulary. Scoping determines the boundaries of relevance: what the thesaurus will represent, and what it will deliberately leave out.
From there, the process advances through term elicitation. Terminology is gathered from fieldwork, disciplinary literature, existing metadata, and the natural language of practitioners. Synonyms and variants are documented as separate expressions until the community or domain experts negotiate a shared meaning. This initial heterogeneity is valuable: it exposes the diversity of actual usage before the vocabulary is formalized.
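One way to keep that heterogeneity visible is to record every expression separately, with its source, and attach a concept only once a decision has been negotiated. The sketch below assumes a very simple record shape; the field names, the sample expressions, and the sources are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    expression: str                 # the term exactly as it was said or written
    language: str
    source: str                     # where it was gathered
    concept: Optional[str] = None   # stays empty until meaning is negotiated


# Illustrative elicitation results (assumptions, not real survey data).
CANDIDATES = [
    Candidate("maíz criollo", "es", "field interview"),
    Candidate("native corn", "en", "existing catalog record"),
    Candidate("landrace maize", "en", "agronomy literature"),
]


def pending(candidates):
    """Expressions that have not yet been assigned to any concept."""
    return [c for c in candidates if c.concept is None]


def agree(candidates, expressions, concept):
    """Record a negotiated decision: the listed expressions name one concept."""
    for candidate in candidates:
        if candidate.expression in expressions:
            candidate.concept = concept


agree(CANDIDATES, {"maíz criollo", "landrace maize"}, "landrace-maize")
print([c.expression for c in pending(CANDIDATES)])  # ['native corn']
```

Leaving "native corn" unresolved is a legitimate outcome here: the record shows that no agreement exists yet, instead of forcing a synonym.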
Once the lexicon is stabilized, relations among terms are modeled. Instead of relying exclusively on hierarchical broader–narrower structures, micro-thesauri emphasize equivalence, association, and processual relations — connections such as "is used in," "is produced by," "co-occurs with," or "causes." These can be expressed through SKOS, OWL, or custom RDF predicates, depending on the desired level of semantic precision. Implicit hierarchies are avoided unless a clear conceptual dependency justifies them.
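The processual relations named above can be sketched as custom RDF predicates written out in Turtle. Declaring each one as a sub-property of skos:related is one possible design choice, not something the method requires: it lets plain SKOS tools still see an associative link, at the cost of inheriting the symmetry of skos:related. The ex: namespace and the concept names are hypothetical.

```python
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix ex: <https://example.org/seedbank/> .\n"
)

# Hypothetical custom predicates, keyed by local name, with English labels.
CUSTOM_PREDICATES = {
    "isUsedIn": "is used in",
    "isProducedBy": "is produced by",
    "coOccursWith": "co-occurs with",
    "causes": "causes",
}


def declare(name, label):
    """Turtle declaration of one custom predicate (assumed design: under skos:related)."""
    return (f"ex:{name} rdfs:subPropertyOf skos:related ;\n"
            f'    rdfs:label "{label}"@en .\n')


def relate(subject, predicate, obj):
    """One Turtle statement; undeclared predicates are rejected, not silently accepted."""
    if predicate not in CUSTOM_PREDICATES:
        raise ValueError(f"undeclared predicate: {predicate}")
    return f"ex:{subject} ex:{predicate} ex:{obj} .\n"


def to_turtle(relations):
    """Assemble prefixes, predicate declarations, and relation statements."""
    parts = [PREFIXES, "\n"]
    parts += [declare(name, label) for name, label in CUSTOM_PREDICATES.items()]
    parts.append("\n")
    parts += [relate(*relation) for relation in relations]
    return "".join(parts)


print(to_turtle([
    ("nixtamal", "isProducedBy", "nixtamalization"),
    ("nixtamal", "isUsedIn", "tortilla"),
]))
```

Rejecting undeclared predicates keeps implicit hierarchies out: a broader or narrower link has to be argued for and added deliberately.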
Validation follows as a critical social phase. Definitions and relations are reviewed by domain experts, researchers, or community stewards. Changes are tracked, version-controlled, and accompanied by brief rationales to ensure transparency over time.
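A minimal sketch of what tracked changes with rationales could look like, assuming definitions live in a simple dictionary. In practice a Git history might play this role; the record fields, the reviewer name, and the date below are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Change:
    concept: str
    old_definition: str
    new_definition: str
    rationale: str       # the brief "why" that accompanies the change
    reviewer: str        # who validated it
    decided_on: date


def apply_change(definitions, log, change):
    """Apply a reviewed change and append it to the log, or refuse it."""
    if not change.rationale.strip():
        raise ValueError("every change needs a rationale")
    if definitions.get(change.concept, "") != change.old_definition:
        raise ValueError("change was drafted against an outdated definition")
    definitions[change.concept] = change.new_definition
    log.append(change)


definitions = {"milpa": "Maize field."}
log = []

apply_change(definitions, log, Change(
    concept="milpa",
    old_definition="Maize field.",
    new_definition="Intercropped field of maize, beans, and squash.",
    rationale="Growers objected that the old wording erased the companion crops.",
    reviewer="Seed keepers' assembly",   # hypothetical reviewing body
    decided_on=date(2024, 1, 15),        # illustrative date
))

print(definitions["milpa"], len(log))
```

The log is append-only by convention, so a reader can reconstruct both the current meaning and the path that led to it.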
Publication then transforms the micro-thesaurus into a usable artifact. It can be exposed as Linked Data through Git repositories, SPARQL endpoints, or lightweight JSON-LD exports. The key principle is that interoperability is achieved through explicit mappings and alignments, not by surrendering to a larger ontology.
Finally, maintenance is treated as a continuous, iterative activity. A living thesaurus must evolve with its users and context. The objective is not stability but transparency: documenting how meaning shifts and why. Tools such as VocBench, Protégé, PoolParty, or even minimal Markdown-to-RDF workflows can support this cycle. What matters more than software sophistication is governance — knowing who decides, who reviews, and who assumes responsibility for the semantic fabric being created.
Advantages of Distributed Semantics
The rationale for distributed semantics lies in its operational and ethical advantages. First is resilience: failure in a micro-thesaurus remains local. An inconsistent relation or deprecated term affects only its immediate domain instead of destabilizing an entire global framework.
Equally crucial is accountability. In small-scale vocabularies, provenance and authorship are explicit. Each decision carries a signature; each definition a traceable rationale. This visibility contrasts sharply with the opacity of institutional taxonomies, where authority is dispersed and anonymous.
Adaptability follows as a practical virtue. Because a micro-thesaurus is small and self-governed, it can evolve as soon as new realities emerge. A new technology, a novel species, or an unforeseen social category can be incorporated without waiting for consensus from a distant standards body.
Interoperability, too, takes on a different character. Instead of enforcing one master vocabulary, mappings between autonomous thesauri operate as translation layers. They allow systems to converse without demanding assimilation, preserving difference while enabling connection.
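A translation layer of this kind can be sketched as a lookup over explicit SKOS mapping statements between two autonomous vocabularies. The seedbank: and herbarium: prefixes and all concept names are hypothetical. Ranking the mapping properties by "strength" is an assumption of this sketch, not something SKOS defines, and the lookup follows each mapping only in the direction it was stated.

```python
# Assumed ranking: lower numbers mean a closer match. SKOS defines no such order.
STRENGTH = {
    "skos:exactMatch": 0,
    "skos:closeMatch": 1,
    "skos:broadMatch": 2,
    "skos:narrowMatch": 2,
    "skos:relatedMatch": 3,
}

# Illustrative mappings between two hypothetical micro-thesauri.
MAPPINGS = [
    ("seedbank:landrace-maize", "skos:broadMatch", "herbarium:zea-mays"),
    ("seedbank:landrace-maize", "skos:closeMatch", "herbarium:zea-mays-landrace"),
    ("seedbank:milpa", "skos:relatedMatch", "herbarium:intercropping"),
]


def translate(term, target_prefix, mappings=MAPPINGS):
    """Return (target, relation) pairs, closest first; an empty list if unmapped."""
    hits = [
        (STRENGTH[relation], target, relation)
        for source, relation, target in mappings
        if source == term and target.startswith(target_prefix + ":")
    ]
    return [(target, relation) for _, target, relation in sorted(hits)]


print(translate("seedbank:landrace-maize", "herbarium"))
print(translate("seedbank:seed-exchange", "herbarium"))  # unmapped term: []
```

Returning the relation alongside the target keeps the loss visible: a caller can see that a term arrived through a broadMatch and not an exactMatch. An empty result means the term simply has no counterpart, and nothing forces one.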
Finally, distributed semantics enhances cognitive accessibility. Practitioners can read and understand the structure they work with. The thesaurus stops being invisible infrastructure and becomes a tangible interface: a knowledge map that mirrors practice rather than obscuring it.
Together, these qualities define an epistemic model where scale, transparency, and relation converge: a federated architecture of meaning that can change without collapsing, speak without dominating, and endure without hardening into dogma.
Ethics and Politics of Scale
Scale is not neutral. A global vocabulary centralizes not only data but decision-making. Every inclusion and exclusion, every definition, becomes an act of governance. Micro-thesauri invert that dynamic. They locate control where expertise and consequence coexist — at the edge of practice.
This is the semantic equivalent of federated infrastructure: multiple independent servers that communicate through open protocols but retain autonomy. The analogy is instructive. Just as federated social networks resist platform monopolies, federated vocabularies resist ontological monopolies. They distribute the cost of meaning-making across nodes instead of concentrating it in an invisible center.
Ethically, this model aligns with the CARE Principles for Indigenous Data Governance (Collective Benefit, Authority to Control, Responsibility, Ethics), without resorting to token inclusion. The principle is structural: authority over description should coincide with responsibility for its consequences.
Relational Architectures in Practice
Examples of relational logic already exist. In biodiversity informatics, Bioschemas extends Schema.org through community-defined metadata profiles that describe biological and environmental resources. These modular extensions complement, rather than replace, the base vocabulary. In cultural heritage, PeriodO represents temporal definitions as linked assertions (multiple scholarly statements about when a period begins and ends) rather than as a single fixed interval. In citizen science and biodiversity knowledge graphs, projects like OpenBiodiv and Plinian Core provide domain-scoped vocabularies that interconnect without subsuming one another, aligning around shared identifiers and predicates instead of hierarchical inheritance. Each functions, in practice, as a micro-thesaurus: small, modular, transparent, and contextually governed.
A similar approach can guide libraries, archives, and research infrastructures. Instead of revising LCSH, institutions can build modular extensions — localized micro-thesauri that interact through explicit crosswalks. The global graph becomes a federation of local schemas, each preserving its logic while remaining linkable.
The long project of cataloging assumed that universality was the condition of order. The next phase of metadata work assumes the opposite: that plurality is the condition of coherence.
Relational logics make it possible to design infrastructures that connect without homogenizing. SKOS and OWL, when stripped of their bureaucratic pretensions, become lightweight instruments for distributed semantics — as effective for a local seed bank as for a national archive.
A metadata system built from micro-thesauri is not chaotic: it is polycentric. Each vocabulary defines a semantic neighborhood, where mappings form the roads between them. The resulting landscape resembles an ecological network more than a hierarchy: dense, differentiated, and alive to change.
The Quiet Revolution of Scale
Metadata revolts not only through critique or refusal but through scale. Building small, autonomous vocabularies that interconnect laterally rather than vertically constitutes a form of infrastructural dissent. It replaces the fantasy of the universal catalog with the practice of federated description.
The work is incremental, unglamorous, and technical — yet it redefines the political economy of knowledge. Each micro-thesaurus becomes a site where meaning is negotiated rather than imposed, where structure arises from relation, and where interoperability is a dialogue, not a decree.
The job, then, is producing new standards — quietly, locally, and precisely — until the architecture of universality collapses under the weight of its own obsolescence.