How I found 80K missing regulatory documents and averted a $10+ million fine

There is a category of problem that organizations do not know they have.

Not a problem that is being ignored. Not a problem that is understood but deprioritized. A problem that is entirely invisible — to management, to compliance teams, to the people whose job it is to make sure it does not exist.

This is a story about one of those problems.

The Environment

A global investment bank. A document generation process that had been running in production for years. Its job was straightforward: generate regulatory compliance documents on a defined schedule and store them in a designated library.

The process ran every day. It reported success. Nobody questioned it.

I was not brought in to look at this process. I was not brought in to audit compliance. I had capacity between assigned tasks and I used it the way I always do — by looking at things nobody had asked me to look at.

What I Found

The process was exiting with a 100% success rate, yet 10% of the documents were missing. In most monitoring environments, a success code is taken to mean the job ran and completed. Alerts are configured to fire on failure codes. No failure code, no alert, no investigation.

But a success code only means the process completed without crashing. It does not mean the process did what it was supposed to do.

I looked at the output. The documents were not there.

I checked the logs more carefully. The process had been failing silently for five years — completing without error, reporting success, and generating nothing. Every scheduled run. For five years.

The repository was missing approximately 80,000 regulatory compliance documents.
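A gap like this can be measured by comparing what the schedule says should exist against what the library actually holds. The sketch below is illustrative only, not the check used at the bank. It assumes one document per calendar day, and the function names and sample dates are hypothetical.

```python
from datetime import date, timedelta


def expected_run_dates(start: date, end: date) -> list[date]:
    """Every calendar day from start to end inclusive (assumes a daily schedule)."""
    days = (end - start).days
    return [start + timedelta(days=offset) for offset in range(days + 1)]


def missing_run_dates(start: date, end: date, present: set[date]) -> list[date]:
    """Dates on which a document was expected but none exists in the library."""
    return [d for d in expected_run_dates(start, end) if d not in present]


# Hypothetical example: five scheduled days, documents found for only two.
found_in_library = {date(2024, 1, 1), date(2024, 1, 4)}
gaps = missing_run_dates(date(2024, 1, 1), date(2024, 1, 5), found_in_library)
print(gaps)
```

In a real environment the set of present dates would come from a listing of the document library, and the schedule would need to account for holidays or other non-run days.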

The Fix

The documents were not gone. The underlying data they were generated from still existed. The generation process itself was recoverable.

The problem was recreating five years of documents without triggering a larger incident. The repository was in active use. The bank was in normal operation. A bulk recreation job running during off-hours against a live production system carries its own risks.

I wrote a script to recreate the documents incrementally during business hours, running against the live production environment without disruption to normal operations. It worked. The repository was restored.
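The general shape of an incremental backfill is small batches with a pause between them, so the live system never sees a sustained burst of load. The following is a minimal sketch of that idea and not the script described above. The `regenerate` callback, the batch size, and the pause length are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Iterator, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def backfill(
    missing_ids: Sequence[str],
    regenerate: Callable[[str], object],  # hypothetical: recreates one document
    batch_size: int = 50,
    pause_seconds: float = 5.0,
    sleep: Callable[[float], object] = time.sleep,
) -> int:
    """Recreate documents in small batches, pausing after each batch.

    Returns the number of documents regenerated.
    """
    restored = 0
    for batch in batches(missing_ids, batch_size):
        for doc_id in batch:
            regenerate(doc_id)
            restored += 1
        # Give the production system room to breathe before the next batch.
        sleep(pause_seconds)
    return restored


# Hypothetical usage with a stand-in regenerate function and no real pause.
recreated: list[str] = []
count = backfill(["doc-1", "doc-2", "doc-3"], recreated.append,
                 batch_size=2, pause_seconds=0.0)
print(count, recreated)
```

Passing `sleep` as a parameter keeps the throttle testable. A production version would also want to record progress so an interrupted run can resume where it stopped.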

No outage. No data loss. No visible disruption.

What It Meant

Regulatory document retention requirements in financial services are not suggestions. They are legal obligations enforced by bodies with significant penalty authority. A five-year gap in a compliance document repository at a global investment bank is not a minor administrative oversight.

The gap was closed. The process was fixed. The monitoring was updated to verify output rather than just exit codes — the actual fix that should have been in place from the beginning.
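Verifying output rather than exit codes can be as simple as refusing to call a run healthy unless the expected artifact exists. Here is a minimal sketch, assuming the job writes a single file to a known path. The function name, the size threshold, and the demo commands are hypothetical and are not the bank's actual monitoring.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_verify(command: list[str], expected_output: Path, min_bytes: int = 1) -> bool:
    """Healthy only if the job exits 0 AND its output file exists with content."""
    result = subprocess.run(command, check=False)
    if result.returncode != 0:
        return False  # the failure conventional monitoring already catches
    if not expected_output.is_file():
        return False  # the silent failure: exit code 0, nothing produced
    return expected_output.stat().st_size >= min_bytes


# Demo: one job that exits cleanly without writing anything, one that writes.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "report.txt"
    silent_job = [sys.executable, "-c", "pass"]
    silent_ok = run_and_verify(silent_job, target)
    writer_job = [sys.executable, "-c",
                  f"open({str(target)!r}, 'w').write('report body')"]
    writer_ok = run_and_verify(writer_job, target)

print(silent_ok, writer_ok)
```

The silent job exits with code 0 and would pass an exit-code check, yet it fails this one because no file appears. A stricter version might also validate the document's contents or compare counts against the schedule.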

I received no acknowledgment that anything had happened.

The Pattern

I want to be careful about how I frame this because the point is not the institution or the compliance failure. The point is a category of problem that standard monitoring cannot detect.

Exit code monitoring tells you whether a process ran. It does not tell you whether the process did what it was supposed to do. In most production environments these two things are treated as equivalent. They are not.

A process that exits successfully while producing nothing is invisible to conventional monitoring. It will run indefinitely, reporting health, generating nothing, until someone looks at the output directly.

In this case nobody had looked at the output directly for five years.

The Diagnostic Principle

The most dangerous production failures are the ones that look like successes.

Systems that crash loudly get fixed. Systems that fail silently accumulate consequences. The monitoring gap between “did the process run” and “did the process produce the correct output” is one of the most consistently underestimated risks in enterprise IT.

Every organization has processes running right now that exit with success codes and produce incorrect or incomplete output. In most cases nobody is checking. In most cases nobody will check until something downstream fails visibly enough to trigger an investigation.

By then the gap may be months or years deep.

The question worth asking today, before that happens: how do you know your critical processes are doing what they are supposed to do — not just running, but actually producing correct output?

Russ Profant is a solutions architect and independent consultant with 30 years of experience across financial services, investment banking, healthcare, and government. He runs PC4IT, offering cloud cost diagnostics and architecture advisory to mid-market organizations. pc4it.com
