1968 in Declassified CIA Documents: A VŠE–UT Austin–UC Berkeley Team Develops an Agentic AI System for Analyzing Digital Archives

Prague / Austin / Berkeley – The year 1968 is one of the most pivotal in modern Czech history. The Prague Spring, hopes for reform, and the subsequent invasion by Warsaw Pact troops shaped entire generations. An international research team from the Prague University of Economics and Business (VŠE), The University of Texas at Austin, and the UC Berkeley Library (University of California, Berkeley) has now introduced a multi-stage agentic artificial intelligence system that automatically extracts structured information from large collections of declassified archives and generates a chronologically ordered overview of events. The results have been published in the scholarly journal The Electronic Library.

The research focused on a collection of documents released under the U.S. Freedom of Information Act (FOIA), specifically the President’s Daily Briefs, intelligence summaries prepared for the President of the United States. The authors processed 201 documents from the period January 1968 to January 1969, comprising a total of 2,122 pages, and examined how the U.S. intelligence community informed the White House about developments in Czechoslovakia before, during, and after the invasion.

A key challenge of historical archives lies in their format: many materials exist only as scans or unstructured documents that are not machine-readable and typically require weeks of manual processing. The proposed approach employs agentic AI across the entire pipeline—from document retrieval and text conversion using OCR, through filtering relevant passages, to summarization, named entity extraction (people, places, institutions), and thematic analysis. The outputs include monthly summaries and structured datasets suitable for further scholarly research.
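To make the staged structure concrete, the flow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Page` type, the `KEYWORDS` list, and every function name are hypothetical, OCR is assumed to have already produced plain text, and the `summarize` step is a simple stand-in for what would be a language-model call in the published system.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical keyword list; the study's actual filtering criteria are not given here.
KEYWORDS = ("czechoslovak", "prague", "dubcek")


@dataclass
class Page:
    doc_date: date
    text: str  # assumed to be OCR output already


def filter_relevant(pages):
    """Keep only pages that mention at least one keyword (case-insensitive)."""
    return [p for p in pages if any(k in p.text.lower() for k in KEYWORDS)]


def group_by_month(pages):
    """Bucket pages by (year, month), with keys in chronological order."""
    buckets = defaultdict(list)
    for p in sorted(pages, key=lambda p: p.doc_date):
        buckets[(p.doc_date.year, p.doc_date.month)].append(p)
    return dict(buckets)


def summarize(pages):
    """Stand-in for an LLM summarizer: joins the first sentence of each page."""
    return " ".join(p.text.split(".")[0].strip() + "." for p in pages)


def run_pipeline(pages):
    """Filter, group by month, then summarize each month."""
    relevant = filter_relevant(pages)
    return {month: summarize(ps) for month, ps in group_by_month(relevant).items()}
```

Each stage takes the previous stage's output, so any single step (for example, the summarizer) can be swapped out or measured on its own.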

The study also includes a comparison of four large language models (GPT-5, Claude Sonnet 4.5, Grok 4, and Magistral Medium) in terms of output quality, speed, cost, and stability. The results demonstrate that there is no universally best model in practice: some deliver higher-quality outputs, while others are significantly faster or more cost-effective. The authors therefore emphasize that responsible AI deployment in public institutions requires systematic measurement of these parameters and decision-making based on specific use cases.
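One way to picture use-case-based selection is a weighted score over measured parameters. The sketch below is only an illustration under stated assumptions: the function `pick_model`, the names `model_a` and `model_b`, and all numbers are invented placeholders, not results from the study, and each metric is assumed to be normalized to the range 0 to 1 with higher meaning better.

```python
def pick_model(metrics, weights):
    """Return the name of the model with the highest weighted score.

    metrics: {model_name: {metric_name: value in [0, 1], higher is better}}
    weights: {metric_name: importance for the given use case}
    """
    def score(values):
        return sum(weights[name] * values[name] for name in weights)

    return max(metrics, key=lambda model: score(metrics[model]))


# Placeholder values for illustration only; not measurements from the paper.
metrics = {
    "model_a": {"quality": 0.9, "speed": 0.4, "cost": 0.3},
    "model_b": {"quality": 0.7, "speed": 0.9, "cost": 0.8},
}

quality_first = {"quality": 1.0, "speed": 0.0, "cost": 0.0}
budget_first = {"quality": 0.2, "speed": 0.4, "cost": 0.4}
```

With these placeholder numbers, `pick_model(metrics, quality_first)` selects `model_a`, while `pick_model(metrics, budget_first)` selects `model_b`, which mirrors the point that the preferred model depends on what a given institution prioritizes.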

The research is particularly relevant for libraries, archives, and memory institutions managing large digital collections, as well as for academia and analytical practice more broadly. The study illustrates how modern AI tools can be used in a transparent and measurable way—and how extensive, hard-to-access archives can be transformed into structured data and analytical overviews that enable further work.

Publication:
Černý, J., Avramov, K., & Pendse, L. R. (2025). A multi-stage agentic AI system for extracting information from large digital archives: case study on the Czechoslovak year 1968 in CIA’s FOIA collection. The Electronic Library.
DOI: 10.1108/EL-06-2025-0272

  • Author: Jan Černý