Case Study

Intelligent Document Sort & Store for a Regulated Firm

Professional Services · Regulated · Multi-Tenant Build · OCR + Classification · Build · Multi-Phase

A mid-market regulated firm was losing about twenty minutes every time someone needed to find a document. Documents were scattered across email, SharePoint, OneDrive and shared mailboxes, with no single source of truth, no audit trail on approvals, and a real compliance gap whenever something needed reconstructing. We built a multi-tenant document management platform with AI at the core: OCR and classification with confidence scoring, an admin override that feeds back into taxonomy refinement, and an immutable audit trail behind every action. Retrieval now takes under ten seconds.

Dashboard · live document overview

Library · 30 documents, multi-status

Document review · approve, reject, override

Configuration · companies, functional areas, document types

Integrations · Microsoft 365 + IMAP

At a glance

Industry
Mid-market professional services · regulated · multi-entity
Team size
~10 across the pilot group
Mode
Build · multi-phase
Status
Live, in active use · phase-two roadmap scoped
Friction type
Document chaos across email, SharePoint, OneDrive and spreadsheets. Reactive compliance, no audit trail on approvals, retrieval taking ~20 minutes per document, and knowledge trapped in silos.

The problem

On the surface: document chaos. Invoices, contracts, compliance certificates, supplier agreements arriving by every channel with no consistent naming and no consistent filing. Underneath: reactive compliance, no audit trail on email-chain approvals, knowledge that didn't compound across teams, and personal-and-business mixing on OneDrive that meant blanket scanning wasn't an option. They'd tried rigid folder structures, SharePoint and tracking spreadsheets. All of it degraded under real-world use.

The solution

A multi-tenant platform that handles the full document lifecycle: ingestion, classification, review, filing, retrieval, action, audit. AI sits at the centre, but never as a black box.

  • Ingestion · three channels

    Manual upload, email mailbox polling every 30 seconds (shared collector inbox + per-user mailboxes with encrypted credentials), Microsoft 365 via Graph API polled every 5 minutes.

  • Processing · per-document pipeline

    Type and size validation, SHA-256 hash for duplicate detection, tenant-partitioned object storage, OCR + extraction via GPT-4o, classification against the firm's taxonomy with a 1-10 confidence score, canonical filename generation. Low-confidence classifications are flagged to an admin.

  • Review and approval

    Admin review page: accept the AI suggestion, override, or reject with a reason. Every decision recorded in the audit trail. A finalisation task runs every minute, validates completeness, moves docs into the library when ready.

  • Retrieval

    Full-text search across extracted OCR, filters by company / functional area / document type / status / date, card and table views. Time-limited presigned URLs for downloads. Retrieval dropped from ~20 minutes to under 10 seconds.

  • Role-based access enforced at the API

    Users (upload, search, view), Managers (plus reports, review, configuration view, team management of users), Admins (full). Tenant isolation via auth token, never user input.

  • Compliance layer

    Immutable audit trail for every action. Soft deletes everywhere. Fernet encryption for sensitive credentials, bcrypt for passwords, TLS / SSL throughout. Domain-controlled registration. Unique request ID per API call.

  • Stays actionable, not just filed

    Server-sent events push real-time notifications. Daily summary emails to admins and managers. Automatic escalation at 3, 7 and 14 days for overdue approvals. On-demand CSV reports.
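
To make the processing step concrete, here is a minimal sketch of three of its parts: SHA-256 duplicate detection, canonical filename generation, and routing low-confidence results to an admin. It is illustrative only. The review threshold of 7, the filename scheme and every name in the code are assumptions, since the case study states only that a 1-10 score exists and that filenames are generated canonically.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import date

# Assumed cut-off on the 1-10 scale; the real threshold is not stated.
REVIEW_THRESHOLD = 7


@dataclass
class Classification:
    company: str
    functional_area: str
    document_type: str
    confidence: int  # 1-10, as returned by the classifier


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def is_duplicate(data: bytes, seen_hashes: set) -> bool:
    """Return True if these bytes were already ingested; otherwise record them.

    seen_hashes stands in for a per-tenant store of known digests.
    """
    digest = content_hash(data)
    if digest in seen_hashes:
        return True
    seen_hashes.add(digest)
    return False


def slug(text: str) -> str:
    """Lowercase, with runs of non-alphanumerics collapsed to single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def canonical_filename(c: Classification, doc_date: date, ext: str) -> str:
    """Hypothetical naming scheme: date_company_area_type.ext."""
    parts = [
        doc_date.isoformat(),
        slug(c.company),
        slug(c.functional_area),
        slug(c.document_type),
    ]
    return "_".join(parts) + "." + ext.lower().lstrip(".")


def needs_admin_review(c: Classification) -> bool:
    """Flag classifications below the assumed threshold for an admin."""
    return c.confidence < REVIEW_THRESHOLD
```

Hashing the raw bytes means a renamed copy of the same file is still caught as a duplicate, while any edit to the content produces a new hash.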
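
The access rules can be sketched in a few lines. This is a hedged illustration, not the platform's code: the permission names, the `AuthContext` type and the `list_documents` helper are hypothetical, and the admin-only `config_edit` action is an assumption standing in for "full" access. The point it shows is that the tenant comes from the verified token and a tenant supplied by the caller is ignored.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    USER = "user"
    MANAGER = "manager"
    ADMIN = "admin"


USER_PERMS = frozenset({"upload", "search", "view"})
MANAGER_PERMS = USER_PERMS | {"reports", "review", "config_view", "manage_users"}
# "Full" access is modelled as manager rights plus an assumed admin-only action.
ADMIN_PERMS = MANAGER_PERMS | {"config_edit"}

PERMISSIONS = {
    Role.USER: USER_PERMS,
    Role.MANAGER: MANAGER_PERMS,
    Role.ADMIN: ADMIN_PERMS,
}


@dataclass(frozen=True)
class AuthContext:
    """Claims taken from an already-verified auth token."""
    user_id: str
    tenant_id: str
    role: Role


def authorize(ctx: AuthContext, action: str) -> None:
    """Raise PermissionError unless the caller's role allows the action."""
    if action not in PERMISSIONS[ctx.role]:
        raise PermissionError(f"{ctx.role.value} may not {action}")


def list_documents(ctx: AuthContext, documents: list,
                   requested_tenant: Optional[str] = None) -> list:
    """Return only the caller's tenant's documents.

    requested_tenant models user input and is deliberately ignored:
    the filter always uses the tenant carried by the token.
    """
    authorize(ctx, "view")
    return [d for d in documents if d["tenant_id"] == ctx.tenant_id]
```

Because the check runs in the API layer, hiding a button in the interface is never the only thing standing between a user and an action their role does not allow.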
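
The 3, 7 and 14 day escalation for overdue approvals reduces to a small calculation. The sketch below assumes the clock starts when a document enters review and that each tier is notified once; neither detail is stated in the case study, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Escalation thresholds in days, as described in the case study.
ESCALATION_DAYS = (3, 7, 14)


def escalation_level(submitted_at: datetime, now: datetime) -> int:
    """Count the thresholds passed: 0 = none, 1 = 3 days, 2 = 7 days, 3 = 14 days."""
    age = now - submitted_at
    return sum(1 for days in ESCALATION_DAYS if age >= timedelta(days=days))


def due_escalation(submitted_at: datetime, now: datetime,
                   last_notified_level: int) -> Optional[int]:
    """Return the new level to notify, or None if that tier was already sent."""
    level = escalation_level(submitted_at, now)
    return level if level > last_notified_level else None


# Usage: an approval waiting 8 days, with only the 3-day notice sent so far.
submitted = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(due_escalation(submitted, submitted + timedelta(days=8), last_notified_level=1))
```

Storing the last notified level per document keeps a scheduled job from sending the same escalation every time it runs.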

What it deliberately isn't

A black-box AI system. Classification confidence is always visible. Admins can override anything. Every override feeds back into taxonomy refinement. The firm trusts it because they can see how it got there.

The outcome

  • Retrieval time, from ~20 minutes to under 10 seconds.
  • Filing effort, eliminated as a task. Admin intervention is a ~30-second approval check.
  • Naming consistency, automatic, enforced by canonical filename generation.
  • Audit readiness, complete immutable trail producible in minutes, not reconstructed from scratch.
  • Compliance visibility, mandatory documents tracked at functional-area level. Expiring docs trigger escalating alerts.
  • AI accuracy, confidence distribution and override rates visible to admins, feeding taxonomy refinement.
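
As a rough illustration of the AI accuracy figures mentioned above, the confidence distribution and override rate can be derived from the review decisions already held in the audit trail. The `Decision` record and the outcome labels are assumptions made for this sketch.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Decision:
    confidence: int  # classifier score, 1-10
    outcome: str     # assumed labels: "accepted", "overridden" or "rejected"


def confidence_distribution(decisions: list) -> dict:
    """Count of documents at each score from 1 to 10, including empty buckets."""
    counts = Counter(d.confidence for d in decisions)
    return {score: counts.get(score, 0) for score in range(1, 11)}


def override_rate(decisions: list) -> float:
    """Share of reviewed documents where the admin overrode the AI suggestion."""
    if not decisions:
        return 0.0
    overridden = sum(1 for d in decisions if d.outcome == "overridden")
    return overridden / len(decisions)
```

A rising override rate for one document type is a signal that the taxonomy entry for that type needs refining.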

Phase-two roadmap (scoped and costed)

  • In-platform Office editing via WOPI protocol
  • Microsoft Entra ID SSO
  • Action layer (“information only”, “action completed”, notes on documents)
  • OneDrive scanning controls for personal-folder exclusion
  • Targeted compliance certifications (ISO 27001, Cyber Essentials, SOC 2, GDPR data residency)
  • Custom report builder