The Hidden Gatekeepers: How Library History Shaped the Modern arXiv Taxonomy
Abstract

This historical analysis traces the evolution of preprint classification in physics from private circulation to institutionalized library systems at laboratories such as CERN and DESY. It examines how these manual categorization practices laid the foundational logic for modern platforms like arXiv, shaping how scientific "insiders" and "outsiders" are defined.

TL;DR

Before the digital age of arXiv, the classification of physics preprints was a manual, politically charged process managed by librarians at CERN and DESY. This article explores how those early "pragmatic" decisions created the social boundaries of physics today, influencing who is considered a core researcher and whose work is relegated to the "uninteresting" fringes of the general category.

Background: Sorting the Flood of Information

We often view the category labels on arXiv (like hep-ph or cond-mat) as objective scientific bins. However, as historian Phillip Roth argues, these systems are not just technical tools but the byproduct of institutional needs from the 1960s. Understanding this history is crucial to recognizing how modern AI-driven moderation still enforces "insider-outsider" dynamics in the scientific community.

The Shift from Private Networks to Public Gatekeeping

In the post-WWII era, waiting for journal publication was too slow for high-energy physics. Physicists shared "preprints" via private mailing lists. This changed at CERN in the late 1950s when librarian Luisella Goldschmidt-Clermont began centralizing these documents to help visiting researchers stay informed.

The Birth of "Selectivity"

By the mid-1960s, the volume of preprints at CERN surged by nearly 300%. This "flood" forced libraries to stop being mere collectors and start being editors.

  • The Problem: A "heterogeneous mass" of incoming preprints could choke out the "relevant" items.
  • The Strategy: Libraries employed "scientific information officers"—hybrids of physicists and librarians—to decide what was "in" and what was "out."

Figure: 1960s library classification logic. Early preprint handling transitioned from private mailing lists to centralized library reading rooms, establishing the first formal exchange infrastructure.

Not All Categories Are Created Equal

The classification systems developed at CERN and DESY, including the High-Energy Physics Index, were not neutral. They were designed around the specific research interests of those laboratories.

  1. Mainstream Bias: At DESY, the more "mainstream" a paper was, the more keywords it received, making it easier to find.
  2. The "Border" Material: Papers that didn't fit neatly into established bins were often rejected or labeled as "border" material—a process the 1965 CERN manual admitted was "inevitably somewhat arbitrary."

From Human Librarians to the gen-ph "Dump"

This historical arbitrariness persists in the digital age. Roth points out that arXiv’s gen-ph (General Physics) category often acts as a socio-technical "buffer."

  • The Perception: While it looks like a standard category, it is often used for papers that have no specific audience and are deemed "generally uninteresting."
  • The Modern Spin: Today, machine learning tools recommend reclassifications, masking social sorting behind a "veneer of technicality."

Figure: Modern arXiv performance metrics. Research indicates that being classified into gen-ph rather than a specialized subfield significantly affects a paper's visibility and community acceptance.

Critical Insight & Conclusion

The "correct" categorization of science is a myth. Every taxonomy is a choice that reflects a human vision of what information is valuable. As we move toward more automated classification tools, we must remain critical of the "naturalized" boundaries they enforce.

Key Takeaways:

  • Social Sorting: Classification is a form of social incentive; inclusion in a "top-tier" category acts as a badge of membership.
  • Technological Continuity: Modern AI classifiers on preprint servers do not introduce new logic; they automate the pragmatic, sometimes biased, selectivity of 1960s librarians.
  • Call to Action: Researchers should be aware that the metadata of their submissions is as much a political statement as a technical one.

References

  • Roth, P. H. (2025). "Formalizing Informal Communication: An Archaeology of the Pre-Web Preprint Infrastructure at CERN."
  • Bowker, G. C., & Star, S. L. (1999). "Sorting Things Out: Classification and Its Consequences."
