Just-in-Time Accessibility for Massive Collections: Why Accessible Archive From Pneuma Solutions Matters
As a sighted engineer and access technology publisher, I spend a lot of time in two very different worlds:
- The world of beautifully digitized archives, institutional repositories, special collections, legislative records, historic newspapers, scanned theses, and more.
- The world of people who can’t use any of it because the content is locked in inaccessible PDFs and page images.
If you work in a university library, a public archive, a government records office, or a large enterprise with decades of documents, you already know the tension. You’ve done the hard work of digitization and preservation… but for many blind and print-disabled users, those collections might as well not exist.
Most organizations respond with the tools they have:
- Ad hoc accommodations when someone requests a specific document.
- A few small, expensive remediation projects on “high-priority” collections.
- More policies and good intentions than actual accessible content.
It’s not because you don’t care. It’s because the traditional model of document remediation simply doesn’t scale to millions of pages.
That’s the problem Accessible Archive from Pneuma Solutions is designed to solve.
This article is aimed at the people who own this challenge: library directors, archives and records managers, CIOs and digital strategy leaders, accessibility and compliance officers, and anyone else sitting on a mountain of PDFs with a mandate to “make them accessible.”
Why Legacy Archives Are the Hardest Accessibility Problem You Have
Web pages and new documents are hard enough. But archives bring a special mix of constraints that make accessibility feel impossible.
1. The Sheer Volume
Most large collections are measured in:
- Millions of documents.
- Tens or hundreds of millions of pages.
Even if manual remediation cost you “only” $10 per page (and it’s often higher), you can see the problem immediately. Doing everything is financially out of reach.
2. The Long Tail
Only a small percentage of your documents are heavily used. But you don’t actually know, in advance, which items a given student, researcher, or citizen is going to need:
- Yesterday’s obscure thesis might be tomorrow’s key source.
- A niche environmental report might suddenly matter in litigation or policy debates.
- A decades-old local ordinance could become central to a real estate dispute.
If you only remediate the obvious “top 1%,” you are guaranteeing that most blind users will hit a wall at the exact moment something niche becomes important to them.
3. The Changing Legal Landscape
If you hold public records or serve students and the public, you’re probably navigating:
- ADA and Section 504/508 in the US.
- EU accessibility directives.
- National disability laws and human rights codes.
Regulators and courts are increasingly clear: “we have too much content” is not a valid excuse for never making it accessible. You may not have to finish overnight, but you do need a credible, systematic approach.
4. The Limits Of “Big Remediation Projects”
Big RFPs and one-time contracts sound appealing: fix everything in a few years and be done with it.
In practice:
- They only ever cover a subset of your holdings.
- Once they’re done, they’re done. If standards change, tools improve, or errors are discovered, you’re back to square one.
- You spend heavily on documents that may never be accessed, while requests for un-remediated content keep arriving.
- You end up with a few islands of accessible content in a sea of inaccessible PDFs.
Rethinking the Model: From Pre-Emptive to On-Demand
The core idea behind Accessible Archive is deceptively simple:
Stop trying to fix everything up front.
Fix what people actually use, and do it in a way that gets better over time.
Instead of a single, massive, pre-emptive remediation project, you treat accessibility as a just-in-time service that sits in front of your existing repository.
When a user requests a document:
- You detect that request.
- You send that document to Accessible Archive.
- Accessible Archive converts it into accessible formats using Pneuma’s Augmented Document Remediation engine (the same technology behind Scribe).
- You deliver the accessible version back to the user, and keep a copy in cache for the next request.
As the remediation engine improves, that same document can be reprocessed automatically to a higher standard without starting another “project.”
This lets you:
- Align cost with actual usage, not with hypothetical future interest.
- Expand coverage continuously, driven by real user demand.
- Capture evidence about what you did and when, for compliance and risk management.
That’s Accessible Archive in one sentence:
Just-in-time, evidence-backed accessibility for large document collections.
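In engineering terms, the request flow described above is a cache-aside pattern: remediate a document the first time someone asks for it, then serve the stored result on later requests. Below is a minimal Python sketch. The `remediate` callable is a hypothetical stand-in for the remediation engine, not Pneuma's actual API.

```python
from typing import Callable, Dict


class JustInTimeService:
    """Cache-aside wrapper: remediate on first request, reuse the result afterwards."""

    def __init__(self, remediate: Callable[[str], str]) -> None:
        # `remediate` is a hypothetical stand-in for the remediation engine.
        self._remediate = remediate
        self._cache: Dict[str, str] = {}
        self.engine_calls = 0  # number of documents that actually incurred processing

    def get_accessible(self, doc_id: str) -> str:
        # Cache hit: return the stored accessible version immediately.
        if doc_id in self._cache:
            return self._cache[doc_id]
        # Cache miss: remediate now and keep the result for the next requester.
        self.engine_calls += 1
        result = self._remediate(doc_id)
        self._cache[doc_id] = result
        return result


service = JustInTimeService(lambda doc_id: f"<html>accessible {doc_id}</html>")
service.get_accessible("thesis-2005-042")
service.get_accessible("thesis-2005-042")  # second request is served from the cache
print(service.engine_calls)  # 1
```

Processing cost tracks the number of distinct documents requested, not the size of the archive. Documents that nobody asks for are never processed.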
What Accessible Archive Actually Is
Accessible Archive is not a new repository or a separate digital library. It’s an accessibility layer that plugs into the discovery and access tools you already use.
Think of it as a specialized service that:
- Sits behind your catalog, discovery layer, institutional repository, DMS, or portal.
- Remediates documents on demand when a user needs them.
- Returns accessible versions in formats that work for blind and print-disabled users.
- Caches the results and records what was done for future reference.
It’s built for organizations like:
- Universities and academic libraries with institutional repositories, theses, and digitized collections.
- Public libraries and state archives with scanned newspapers, local history, and government documents.
- Museums and cultural institutions with digitized exhibits and research collections.
- Government agencies with legislative records, planning documents, and public reports.
- Enterprises with large document management systems and regulatory archives.
If you’re responsible for legacy content and accessibility (or the lawsuits that come with it), Accessible Archive is aimed squarely at your world.
How Accessible Archive Works Under the Hood
Let’s walk through a typical access scenario.
Step 1: A User Requests A Document
A user finds an item in your catalog, repository, or portal: maybe a 1993 zoning report, a 2005 PhD thesis, or the minutes of a 1970s city council meeting.
On that record, you’ve added an option like “Accessible version”.
When the user clicks it, your system sends a request to Accessible Archive with:
- An identifier or link for the original file (PDF, TIFF, etc.).
- Any relevant metadata.
- The user’s requested format, if they’ve chosen one (e.g., HTML, tagged PDF, audio).
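The article does not publish the request schema, so the following is only a sketch of how an integration might package those three pieces of information. The class name, field names, format codes, and example URL are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional

# Hypothetical format codes; use whatever your integration actually supports.
SUPPORTED_FORMATS = {"html", "tagged-pdf", "mp3", "brf", "large-print-pdf"}


@dataclass
class AccessibleVersionRequest:
    source_uri: str                                 # identifier or link to the original PDF/TIFF
    metadata: Dict[str, str] = field(default_factory=dict)
    requested_format: Optional[str] = None          # None means "use the institution's default"

    def __post_init__(self) -> None:
        # Reject unknown formats before anything is sent to the remediation service.
        if self.requested_format is not None and self.requested_format not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {self.requested_format}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


req = AccessibleVersionRequest(
    source_uri="https://repository.example.edu/items/1234/zoning-report-1993.pdf",
    metadata={"title": "1993 Zoning Report", "collection": "planning"},
    requested_format="html",
)
print(req.to_json())
```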
Step 2: Accessible Archive Ingests And Analyzes The File
Accessible Archive retrieves the file from your storage and runs it through Pneuma’s Augmented Document Remediation (ADR) pipeline:
- OCR is applied if needed.
- The engine identifies headings, paragraphs, lists, tables, and reading order.
- Noise like repeated headers, page numbers, and decorative elements is handled appropriately.
- A semantic model of the document is created.
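The article does not say how the ADR engine detects repeated headers and page numbers. One common, generic heuristic (not Pneuma's documented method) treats text that recurs at the top or bottom of many pages as noise. The sketch below assumes that OCR output arrives as a list of text lines per page. The function name and the 50% threshold are illustrative choices.

```python
import re
from collections import Counter
from typing import List

# Matches bare page markers such as "12", "Page 12" or "Page 12 of 40".
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+(\s+of\s+\d+)?\s*$", re.IGNORECASE)


def strip_running_noise(pages: List[List[str]], min_share: float = 0.5) -> List[List[str]]:
    """Remove running headers/footers and page numbers from the edges of each page.

    A line is treated as a running header or footer when the same text appears as
    the first or last line on at least `min_share` of the pages (and on at least 2).
    """
    edge_counts = Counter()
    for lines in pages:
        if lines:
            # A set, so a one-line page is not counted twice.
            edge_counts.update({lines[0].strip(), lines[-1].strip()})
    threshold = max(2, int(min_share * len(pages)))
    running = {text for text, n in edge_counts.items() if n >= threshold}

    def is_noise(line: str) -> bool:
        return line.strip() in running or PAGE_NUMBER.match(line) is not None

    cleaned = []
    for lines in pages:
        kept = list(lines)
        # Only trim from the page edges, so matching text in the body is preserved.
        while kept and is_noise(kept[0]):
            kept.pop(0)
        while kept and is_noise(kept[-1]):
            kept.pop()
        cleaned.append(kept)
    return cleaned


pages = [
    ["City Council Minutes", "Meeting called to order.", "1"],
    ["City Council Minutes", "Motion carried.", "2"],
    ["City Council Minutes", "Meeting adjourned.", "Page 3 of 3"],
]
print(strip_running_noise(pages))
# [['Meeting called to order.'], ['Motion carried.'], ['Meeting adjourned.']]
```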
Step 3: Accessible Formats Are Generated
From that model, Accessible Archive generates one or more accessible outputs, such as:
- Accessible HTML.
- Tagged PDF aligned with current best practices.
- MP3 audio using high-quality TTS voices.
- Braille-ready files (e.g., BRF).
- Large print PDFs.
You control which formats you offer by default.
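Every output format renders the same semantic model. The engine's internal model is not described here, so the `Block` structure below is an assumption. The sketch shows how such a model could be turned into accessible HTML that uses real heading and list elements and declares a document language.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Tuple


@dataclass
class Block:
    # Hypothetical semantic-model node; the engine's real model is not described here.
    kind: str                     # "heading", "paragraph", or "list"
    text: str = ""
    level: int = 1                # heading level (1-6)
    items: Tuple[str, ...] = ()   # list items when kind == "list"


def to_html(title: str, blocks: List[Block], lang: str = "en") -> str:
    """Render the model as HTML with real heading and list elements and a page language."""
    parts = [
        f'<!DOCTYPE html><html lang="{escape(lang)}"><head><meta charset="utf-8">',
        f"<title>{escape(title)}</title></head><body><main>",
    ]
    for b in blocks:
        if b.kind == "heading":
            lvl = min(max(b.level, 1), 6)
            parts.append(f"<h{lvl}>{escape(b.text)}</h{lvl}>")
        elif b.kind == "paragraph":
            parts.append(f"<p>{escape(b.text)}</p>")
        elif b.kind == "list":
            parts.append("<ul>" + "".join(f"<li>{escape(i)}</li>" for i in b.items) + "</ul>")
        else:
            raise ValueError(f"unknown block kind: {b.kind}")
    parts.append("</main></body></html>")
    return "".join(parts)


doc = [
    Block("heading", "Zoning Report", level=1),
    Block("paragraph", "Parcels < 2 acres are exempt."),
    Block("list", items=("Residential", "Commercial")),
]
html_out = to_html("1993 Zoning Report", doc)
print(html_out)
```

The other formats, such as tagged PDF, BRF, and audio, would be additional renderers over the same model. They are not shown here.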
Step 4: Results Are Cached And Evidence Is Recorded
The accessible outputs, and associated metadata, are stored in a cache along with an evidence packet that can include:
- Timestamp of remediation.
- Version of the remediation engine.
- Hashes of the input and output files (to prove what was converted into what).
- Automated accessibility checks and their results.
- Any additional QA notes (if you add human review for certain classes of documents).
This gives you an audit trail you can reference later.
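Below is a minimal sketch of assembling such an evidence packet. The article does not name a hash algorithm or a record layout, so SHA-256 and every field name are assumptions here.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_evidence_packet(source: bytes, outputs: Dict[str, bytes], engine_version: str,
                          checks: Dict[str, bool], qa_notes: Optional[List[str]] = None) -> dict:
    """Assemble an audit record for one remediation run (all field names are hypothetical)."""
    return {
        "remediated_at": datetime.now(timezone.utc).isoformat(),
        "engine_version": engine_version,
        "input_sha256": sha256_hex(source),
        # One hash per generated format, so each output can be traced to this input.
        "output_sha256": {fmt: sha256_hex(blob) for fmt, blob in outputs.items()},
        "automated_checks": checks,
        "qa_notes": list(qa_notes or []),
    }


packet = build_evidence_packet(
    source=b"%PDF-1.4 original scan bytes",
    outputs={"html": b"<html>accessible version</html>"},
    engine_version="2.3.0",
    checks={"document_language_set": True, "heading_levels_sequential": True},
)
print(json.dumps(packet, indent=2))
```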
Step 5: Subsequent Requests Are Faster And Better
When another user requests the same item:
- If nothing has changed and the cached version is still valid, you return it immediately.
- If the remediation pipeline has improved significantly, the system can reprocess the original file, generate a more accurate accessible version, and update the cache.
You get the best of both worlds: fast responses for popular content, and continuous quality improvement as the technology advances.
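The article does not define when an improvement counts as "significant." As an assumed policy, the sketch below reprocesses a document when the original file has changed or when the engine's major version has increased. The dotted version scheme and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    # "2.10.1" -> (2, 10, 1); assumes purely numeric dotted versions.
    return tuple(int(part) for part in version.split("."))


@dataclass
class CachedResult:
    input_sha256: str      # hash of the original file when it was remediated
    engine_version: str    # engine version that produced the cached output


def should_reprocess(cached: CachedResult, current_input_sha256: str, current_engine: str) -> bool:
    # The source file changed: the cached output no longer corresponds to it.
    if cached.input_sha256 != current_input_sha256:
        return True
    # Assumed policy: only a new major engine version counts as "significant".
    return parse_version(current_engine)[0] > parse_version(cached.engine_version)[0]


entry = CachedResult(input_sha256="ab12", engine_version="2.3.0")
print(should_reprocess(entry, "ab12", "2.4.1"))  # False: minor update, serve cache
print(should_reprocess(entry, "ab12", "3.0.0"))  # True: major engine upgrade
print(should_reprocess(entry, "cd34", "2.3.0"))  # True: original file changed
```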
Integration: Working With, Not Against, Your Existing Systems
From an engineering perspective, one of the key design decisions in Accessible Archive is that it doesn’t try to replace your repository or DMS. It layers on top of what you already have.
There are a few common integration patterns:
1. Discovery And Catalog Layers
- Add an “Accessible version” button or link in your discovery UI.
- When clicked, your application calls an Accessible Archive API endpoint for that item.
- The user is either immediately presented with available accessible formats, or notified when the conversion is complete (if the document is particularly complex).
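The choice between "show formats now" and "notify later" can be handled with a short wait and a fallback. In the sketch below, `convert` and `notify` are hypothetical callables standing in for the Accessible Archive call and your own notification system. A thread pool keeps the illustration short, but a production integration might use a job queue instead.

```python
import concurrent.futures as cf
import time
from typing import Callable, Dict

executor = cf.ThreadPoolExecutor(max_workers=4)


def handle_accessible_click(doc_id: str,
                            convert: Callable[[str], Dict[str, str]],
                            notify: Callable[[str, Dict[str, str]], None],
                            wait_seconds: float = 2.0) -> dict:
    """Show formats at once if conversion is quick; otherwise notify the user when done."""
    future = executor.submit(convert, doc_id)
    try:
        return {"status": "ready", "formats": future.result(timeout=wait_seconds)}
    except cf.TimeoutError:
        # Complex document: deliver the links through the notifier once the job finishes.
        future.add_done_callback(lambda f: notify(doc_id, f.result()))
        return {"status": "pending"}


def quick_convert(doc_id: str) -> Dict[str, str]:
    return {"html": f"/accessible/{doc_id}.html"}


def slow_convert(doc_id: str) -> Dict[str, str]:
    time.sleep(0.3)  # simulates a long, complex document
    return {"html": f"/accessible/{doc_id}.html"}


notified = []
print(handle_accessible_click("thesis-42", quick_convert, lambda d, f: notified.append(d)))
print(handle_accessible_click("minutes-1974", slow_convert, lambda d, f: notified.append(d),
                              wait_seconds=0.05))
```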
2. Institutional Repositories And DMS
- Embed Accessible Archive into the download flow of your repository.
- The first time a document is requested by a user who needs an accessible version, the system invokes Accessible Archive.
- Optionally, you can store the accessible version back into your system as a separate file or version.
3. Public Records And FOIA Portals
- For FOIA/public records requests, associate an Accessible Archive job with the underlying documents.
- Provide accessible formats alongside the standard downloads as part of your response.
- Build evidence from the remediation process into your compliance and reporting workflows.
In all these cases, your users stay in the interface they already know. Accessible Archive runs in the background as a specialized service.
Compliance, Evidence, and Risk Management
From a legal and risk standpoint, one of the most valuable aspects of Accessible Archive is not just what it produces, but what it records.
Most organizations today struggle to answer questions like:
- When did we make this document accessible?
- What standard did we apply?
- Can we prove that we didn’t change the substantive content?
Accessible Archive’s evidence packets give you a way to respond:
- Timestamps and engine versions let you show when and how a file was remediated.
- Input/output hashes prove the relationship between the original and accessible versions.
- Automated check results show that you’re aligning with standards like WCAG and PDF/UA, not just guessing.
This doesn’t eliminate all legal risk (nothing can), but it turns accessibility from “we react when someone complains” into “here is our documented, systematic approach, and here is proof of what we’ve done so far.”
For oversight bodies, disability offices, and courts, that distinction matters.
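Checking a file against its recorded hash takes only a few lines. The sketch below assumes a hypothetical packet layout with `input_sha256` and `output_sha256` fields. A matching hash proves only that the files are byte-for-byte the ones recorded. Whether the content was rendered faithfully still depends on the automated checks and any human QA.

```python
import hashlib


def verify_against_evidence(original: bytes, accessible: bytes, packet: dict, fmt: str) -> bool:
    """True if both files are byte-for-byte the ones whose hashes the packet recorded."""
    input_ok = hashlib.sha256(original).hexdigest() == packet["input_sha256"]
    output_ok = hashlib.sha256(accessible).hexdigest() == packet["output_sha256"][fmt]
    return input_ok and output_ok


original = b"%PDF-1.4 scanned council minutes"
accessible = b"<html>council minutes</html>"
packet = {  # hypothetical packet layout
    "input_sha256": hashlib.sha256(original).hexdigest(),
    "output_sha256": {"html": hashlib.sha256(accessible).hexdigest()},
}
print(verify_against_evidence(original, accessible, packet, "html"))                # True
print(verify_against_evidence(original, accessible + b" edited", packet, "html"))  # False
```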
Deployment and Data Protection
Every institution has its own constraints around privacy, confidentiality, and data sovereignty. Accessible Archive is designed to work within those boundaries.
Cloud
For many academic, cultural, and public institutions, a secure cloud deployment is acceptable and cost-effective:
- Documents are transmitted securely to the service, processed, cached according to agreed policies, and purged when no longer needed.
- No long-term storage happens outside the caching and evidence you intentionally retain.
- You get elasticity and scalability without managing additional infrastructure.
On-Premises / Private Deployments
For agencies and enterprises with stricter requirements (healthcare, justice, finance, national archives):
- Accessible Archive can be deployed as an on-premises appliance or in a private cloud/VPC that you control.
- The remediation engine runs inside your network, connected to your own storage systems.
- No document ever has to leave your infrastructure.
In both models:
- All traffic is encrypted in transit.
- Access controls reflect your roles and security policies.
- You decide how long caches and evidence are kept.
From my perspective as an engineer, the goal is simple: you shouldn’t have to choose between data protection and accessibility. You can, and should, have both.
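Caches and evidence can be kept for different periods, so a retention policy usually has separate windows for each. The sketch below shows how such a policy might be evaluated. The field names and periods are hypothetical and are not actual Accessible Archive settings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class RetentionPolicy:
    cache_days: int      # how long accessible outputs stay cached
    evidence_days: int   # how long audit records are kept (often much longer)


@dataclass
class StoredItem:
    item_id: str
    kind: str            # "cache" or "evidence"
    created: datetime


def due_for_purge(items: List[StoredItem], policy: RetentionPolicy, now: datetime) -> List[StoredItem]:
    limits = {"cache": timedelta(days=policy.cache_days),
              "evidence": timedelta(days=policy.evidence_days)}
    return [item for item in items if now - item.created > limits[item.kind]]


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
policy = RetentionPolicy(cache_days=90, evidence_days=7 * 365)
items = [
    StoredItem("thesis-1.html", "cache", now - timedelta(days=120)),
    StoredItem("thesis-1.evidence", "evidence", now - timedelta(days=120)),
    StoredItem("report-7.html", "cache", now - timedelta(days=10)),
]
print([item.item_id for item in due_for_purge(items, policy, now)])  # ['thesis-1.html']
```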
Changing the Economics: From “We Can’t Afford It” to “We Can’t Afford Not To”
Let’s talk about cost in concrete terms.
Imagine you have 10 million pages in your archives.
At $20 per page for manual remediation, you’re looking at $200 million, not including project management, QA, rework, and the opportunity cost of spending that money elsewhere.
Even if you only target 10% of your holdings, that’s still $20 million, and you’re leaving 90% of the archive inaccessible.
Now consider a model like Accessible Archive where:
- Automated remediation costs drop to cents per page at scale.
- You only process documents that someone actually requests.
- Popular documents are processed once and reused, while never-accessed documents never incur cost.
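The arithmetic behind the two models fits in a few lines. The $20 rate and the 10 million pages come from the figures above. The $0.05 automated rate, the toy archive, and the request log are hypothetical, chosen only to illustrate "cents per page."

```python
from typing import Dict, List


def upfront_cost(total_pages: int, per_page: float, share: float = 1.0) -> float:
    """Pre-emptive remediation: pay for every page in scope, used or not."""
    return total_pages * share * per_page


def on_demand_cost(requests: List[str], pages_per_doc: Dict[str, int], per_page: float) -> float:
    """Just-in-time remediation: pay once per distinct document actually requested."""
    processed = set()
    total = 0.0
    for doc_id in requests:
        if doc_id not in processed:   # first request: remediate and cache
            processed.add(doc_id)
            total += pages_per_doc[doc_id] * per_page
        # repeat requests are served from cache at no additional remediation cost
    return total


print(upfront_cost(10_000_000, 20.0))        # 200000000.0, i.e. $200 million
print(upfront_cost(10_000_000, 20.0, 0.10))  # 20000000.0, i.e. $20 million for 10%

# Hypothetical toy archive and an assumed automated rate of $0.05 per page.
pages = {"thesis-1": 240, "report-7": 60, "never-requested": 500}
requests = ["thesis-1", "thesis-1", "report-7", "thesis-1"]
print(round(on_demand_cost(requests, pages, 0.05), 2))  # 15.0
print(upfront_cost(sum(pages.values()), 20.0))         # 16000.0
```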
You’re not going to remediate everything overnight. But you change the slope of the curve:
- Year by year, more and more of your frequently used content becomes accessible.
- The “long tail” of rarely accessed documents still has a path to accessibility when needed.
- You spend your budget making real, measurable progress instead of re-remediating the same small sets of documents through one-off projects.
It’s not that cost stops mattering. It’s that accessibility goes from “financially impossible” to “another operating cost we can plan for and defend.”
Where Accessible Archive Delivers the Biggest Impact
In my experience, Accessible Archive tends to move the needle most in a few specific contexts.
Universities And Academic Libraries
- Institutional repositories with thousands of theses and dissertations.
- Digitized collections used in research and teaching.
- Course reserves and subject-specific archives.
Accessible Archive lets libraries support blind and print-disabled researchers without manually remediating entire collections up front, and without making every access request an emergency ticket.
Public Libraries And State Archives
- Historic newspapers and local history materials.
- Government documents and agency records.
- Community collections and oral histories.
Here, Accessible Archive turns “we’ll try to help if you call us” into “there is a clear, predictable way for you to get accessible versions of our holdings.”
Government Agencies And Records Offices
- Planning and zoning documents.
- Legislative histories and committee reports.
- Public comments, environmental assessments, and more.
Instead of treating accessibility as an afterthought in FOIA or open records processes, Accessible Archive lets you build it into the core workflow.
Regulated Enterprises
- Banks, insurers, and utilities with document backfiles.
- Pharmaceutical and manufacturing companies with regulatory submissions and safety documents.
- Corporations with long-lived compliance archives.
For these organizations, the combination of on-prem deployment, evidence packets, and just-in-time remediation is especially powerful.
How to Start: A Pragmatic Roadmap
If Accessible Archive sounds promising, here’s how I’d suggest approaching it.
1. Identify A High-Value, Constrained Pilot
Good candidates include:
- A specific collection (e.g., all theses after a certain year).
- A set of public records you’re frequently asked for.
- A portion of your digital archive that’s important for teaching or research.
You want something big enough to matter, small enough to manage.
2. Map Your Current Process
For that scope, answer:
- How does a blind or print-disabled user get an accessible version today?
- How long does it realistically take?
- Who does the work, and at what cost (time, dollars, reputation)?
- What happens when they need something outside the “high priority” list?
This gives you a baseline for comparison.
3. Integrate Accessible Archive Into One Access Path
Don’t try to redesign everything at once. Instead:
- Add an “Accessible version” button to your catalog or portal for that collection.
- Wire it up to Accessible Archive.
- Decide which formats you’ll offer and what caching policy makes sense.
Then let real users try it.
4. Measure Outcomes
Track:
- Time from request to accessible document.
- Number of unique documents remediated.
- Number of repeat accesses served from cache.
- Feedback from disabled users and frontline staff.
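If each request is logged, the first three measures can be computed directly from the log. The sketch below assumes a hypothetical event record with these fields. User and staff feedback is qualitative, so it is not covered here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List


@dataclass
class AccessEvent:
    doc_id: str
    requested: datetime
    delivered: datetime
    from_cache: bool


def pilot_metrics(events: List[AccessEvent]) -> dict:
    waits = [(e.delivered - e.requested).total_seconds() for e in events]
    return {
        "median_seconds_to_accessible": median(waits) if waits else None,
        "unique_documents_remediated": len({e.doc_id for e in events if not e.from_cache}),
        "served_from_cache": sum(e.from_cache for e in events),
    }


t0 = datetime(2025, 3, 3, 9, 0, 0)
events = [
    AccessEvent("thesis-1", t0, t0 + timedelta(seconds=40), from_cache=False),
    AccessEvent("thesis-1", t0, t0 + timedelta(seconds=1), from_cache=True),
    AccessEvent("report-7", t0, t0 + timedelta(seconds=90), from_cache=False),
]
print(pilot_metrics(events))
# {'median_seconds_to_accessible': 40.0, 'unique_documents_remediated': 2, 'served_from_cache': 1}
```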
The goal is not perfection from day one; it’s to demonstrate that the direction of travel has changed. Accessibility becomes a service the archive provides, not a one-off favor.
5. Plan For Scale
If the pilot shows value:
- Expand to additional collections or repositories.
- Formalize policies: when to use Accessible Archive, how long to keep evidence, how to combine automated and human remediation.
- Align your accessibility statements and public guidance with this new capability.
Over time, your accessibility posture evolves from “we’ll try to help if you ask” to “this is how our archive serves everyone.”
Closing Thoughts: Accessibility as a Core Part of Stewardship
As technologists and stewards of information, we’ve spent decades solving hard problems around storage, preservation, and discovery. We know how to keep bits safe. We know how to make them searchable.
What we haven’t done nearly as well is make those bits usable by everyone who has a right to them.
Accessible Archive is one answer to that gap: a way to make accessibility a systemic, scalable service rather than a series of emergency responses.
If you’re responsible for a large archive, I’d encourage you to ask yourself two questions:
- If a blind researcher, student, or citizen explored our collections today, how much of it could they actually use?
- What story can we tell about how that answer will improve in the next year, three years, five years?
If the honest answers make you uneasy, you’re not alone, and you’re not stuck.
Tools like Accessible Archive exist precisely so that you can start changing those answers, in a way that respects both your mission and your budget.
And in my view as an engineer and access technology specialist, that shift, from well-intentioned exceptions to accessible infrastructure, is one of the most important changes we can make in how we treat the knowledge entrusted to us.
“The greatest barrier to accessibility is indifference.”
Aaron Di Blasi, PMP
Engineer, Educator, Advocate, Publisher and Journalist, President & Sr. PMP, Mind Vault Solutions, Ltd., PR Director: AT-Newswire, Publisher: AI-Weekly, Top Tech Tidbits, Access Information News, Title II Today
Mind Vault Solutions, Ltd.
President, Sr. Project Management Professional (2006 — Present)
Innovative ideas. Solutions that perform.
Top Tech Tidbits
Publisher (2020 — Present)
The Week’s News in Access Technology
Access Information News
Publisher (2022 — Present)
The Week’s News in Access Information
AI-Weekly
Publisher (2024 — Present)
The Week’s News in Artificial Intelligence
AT-Newswire.com
PR Director (2024 — Present)
Access Technology’s Digital Newswire
Title II Today
Publisher (2025 — Present)
The Month’s News in Title II Compliance
Connect With Me:
🌍 Website: https://toptechtidbits.com
📧 Email: publisher@toptechtidbits.com
📞 Phone: +1 (855) 578-6660
📧 Subscribe: https://toptechtidbits.com/subscribe
💬 Facebook: https://toptechtidbits.com/facebook
💬 LinkedIn (Individual): https://www.linkedin.com/in/aarondiblasi/
💬 LinkedIn (Publication): https://toptechtidbits.com/linkedin
💬 Mastodon: https://toptechtidbits.com/mastodon
🛜 RSS: https://toptechtidbits.com/feed
💬 X (Formerly Twitter): https://toptechtidbits.com/x
📽️ YouTube: https://toptechtidbits.com/youtube
📍 Address: 1284 SOM Center Road, PMB 194, Mayfield Heights, Ohio 44124-2048, USA


