Federal Records Platform: Modernising a Paper-Based Process
Arxium
November 28, 2024

We led the modernisation of a records management platform reliant on outdated file servers and manual processes. The new system included secure cloud storage, role-based access, and metadata tagging.

Project Background

Our client had managed its archival workflow on a pair of HP-UX servers running an Oracle 9i database and a thick-client PowerBuilder application last touched in 2007. The system stored 2.4 billion metadata rows, ingested roughly 30 million TIFF images, and served 15 000 public servants spread across thirty-six agencies. Tape rotation, XML file drops and sneaker-net disks were still part of the weekly routine.

The brief to Arxium was deceptively simple: “Give us a web platform that scales, stays PROTECTED-classified, and keeps every accession number exactly where it should be.” The regulatory overlay included the Archives Act, the Commonwealth Records Series (CRS) model, IRAP compliance, and an immovable deadline.

Assessment and Target Architecture

We began by profiling workloads for three weeks, piping AWR dumps from Oracle into PerfInsights and drawing dependency graphs with Neo4j. The legacy stack was a monolith, but business capabilities mapped cleanly onto four domains: ingestion, description, access-control, and public discovery. Given the security tier and the multi-agency user base, we chose a hybrid architecture:

  • Immutable object storage (AWS S3 in the Melbourne region with S3 Object Lock, replicated to Sydney) for images and PDFs; a write-path sketch follows this list
  • A PostgreSQL 13 cluster on Amazon RDS Multi-AZ for structured metadata
  • A sealed AWS OpenSearch domain for free-text and hierarchical series search
  • A Go service layer deployed on EKS Fargate; each domain became an independent service with its own schema
  • A React front-end delivered via CloudFront with Origin Shield to satisfy edge caching requirements for public researchers
  • All infrastructure definitions live in a Terraform mono-repo; every merge to the main branch produces a plan that security and compliance bots inspect before apply.
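
The Object Lock guarantee in the first bullet only holds if retention is applied at write time. Below is a simplified sketch, not our production ingestion code, of that write using the AWS SDK for Go v2; the bucket name, key layout, KMS alias, and seven-year retention period are illustrative assumptions.

    package main

    import (
        "context"
        "log"
        "os"
        "time"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/s3"
        "github.com/aws/aws-sdk-go-v2/service/s3/types"
    )

    func main() {
        ctx := context.Background()

        // Melbourne region; bucket, key and KMS alias below are placeholders.
        cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion("ap-southeast-4"))
        if err != nil {
            log.Fatal(err)
        }
        client := s3.NewFromConfig(cfg)

        f, err := os.Open("item-0001.tiff")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        // Compliance-mode Object Lock: the object cannot be overwritten or
        // deleted before the retain-until date, even by privileged users.
        _, err = client.PutObject(ctx, &s3.PutObjectInput{
            Bucket:                    aws.String("records-archive-protected"),
            Key:                       aws.String("series/A0001/item-0001.tiff"),
            Body:                      f,
            ObjectLockMode:            types.ObjectLockModeCompliance,
            ObjectLockRetainUntilDate: aws.Time(time.Now().AddDate(7, 0, 0)),
            ServerSideEncryption:      types.ServerSideEncryptionAwsKms,
            SSEKMSKeyId:               aws.String("alias/records-cmk"),
        })
        if err != nil {
            log.Fatal(err)
        }
    }

Compliance mode, rather than governance mode, is the setting that stops even privileged principals from shortening the retention window.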

Data Migration Strategy

Record integrity trumped development velocity, so we built a double-entry ledger approach rather than a classic ETL. Oracle Data Pump produced nightly incremental dumps; an in-house Go tool (chronolog) replayed REDO archives into logical change events that landed on Kinesis. Downstream, a Flink cluster enriched events, wrote them into Postgres and OpenSearch, then published SHA-256 hashes to a QLDB instance for tamper-evidence.
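
To make the double-entry idea concrete, the sketch below isolates the hashing step: a logical change event is canonicalised and digested with SHA-256 before the digest is recorded. The ChangeEvent fields are illustrative rather than the real chronolog schema, and the ledger write itself is omitted.

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "encoding/json"
        "fmt"
    )

    // ChangeEvent is an illustrative logical change record emitted by the
    // redo-log replayer; the real schema carries considerably more detail.
    type ChangeEvent struct {
        Table     string `json:"table"`
        Operation string `json:"operation"` // INSERT, UPDATE or DELETE
        RecordID  string `json:"record_id"`
        SCN       int64  `json:"scn"` // Oracle system change number
        Payload   string `json:"payload"`
    }

    // digest canonicalises the event as JSON and returns its SHA-256 hash.
    func digest(ev ChangeEvent) (string, error) {
        b, err := json.Marshal(ev) // fixed struct field order gives a stable encoding
        if err != nil {
            return "", err
        }
        sum := sha256.Sum256(b)
        return hex.EncodeToString(sum[:]), nil
    }

    func main() {
        ev := ChangeEvent{
            Table: "ITEM", Operation: "UPDATE", RecordID: "A0001/42",
            SCN: 918273645, Payload: `{"title":"Cabinet minute"}`,
        }
        h, err := digest(ev)
        if err != nil {
            panic(err)
        }
        fmt.Println(h) // this digest is what lands in the tamper-evident ledger
    }

Because the same digest function runs against both the source replay and the migrated rows, any divergence surfaces as a hash mismatch rather than silent drift.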

During the final cut-over weekend we executed a rolling delta replay down to a 90-second replication gap, paused ingestion on the old system, completed the final cycle, reconciled QLDB hashes, and flipped agency DNS records to the CloudFront distribution. The public catalogue sustained a 99.7% cache hit rate in the first week post-migration, keeping origin load negligible.
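
Reconciliation at cut-over then reduces to a set comparison of digests. A minimal sketch, assuming the per-record digests have already been read back into maps keyed by record identifier:

    package main

    import "fmt"

    // reconcile compares per-record digests captured from the source and target
    // systems and returns the identifiers that are missing or mismatched.
    func reconcile(source, target map[string]string) (missing, mismatched []string) {
        for id, srcHash := range source {
            tgtHash, ok := target[id]
            switch {
            case !ok:
                missing = append(missing, id)
            case tgtHash != srcHash:
                mismatched = append(mismatched, id)
            }
        }
        return missing, mismatched
    }

    func main() {
        src := map[string]string{"A0001/42": "9f2c", "A0001/43": "11ab"}
        tgt := map[string]string{"A0001/42": "9f2c"}
        missing, mismatched := reconcile(src, tgt)
        fmt.Println("missing:", missing, "mismatched:", mismatched)
    }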

Security and Compliance

IRAP at PROTECTED level imposed strict controls: encrypted transport everywhere (TLS 1.2 only with modern ciphers), customer-managed KMS keys, VPC-centric service endpoints, and CIS Level 2 AMIs. We integrated AWS Macie to alert on anomalous PII exposure in newly ingested content and configured GuardDuty with custom threat lists derived from the Australian Cyber Security Centre.
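
At the service layer the transport control is enforced in code as well as at the load balancer. The following is a minimal Go sketch of the server-side TLS posture; the specific cipher list is an assumption standing in for “modern ciphers”, and the certificate paths are placeholders.

    package main

    import (
        "crypto/tls"
        "log"
        "net/http"
    )

    func main() {
        // Restrict transport to TLS 1.2+ with AEAD cipher suites only.
        // TLS 1.3 suites are not configurable in Go and are always modern.
        tlsCfg := &tls.Config{
            MinVersion: tls.VersionTLS12,
            CipherSuites: []uint16{
                tls.TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
                tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
                tls.TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,
                tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256,
            },
        }

        srv := &http.Server{
            Addr:      ":8443",
            TLSConfig: tlsCfg,
            Handler:   http.NotFoundHandler(), // placeholder handler
        }
        log.Fatal(srv.ListenAndServeTLS("server.crt", "server.key"))
    }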

Every service publishes audit events to an EventBridge bus; a dedicated Lambda forks these to CloudWatch Logs, S3 Glacier (seven-year retention), and the QLDB chain. The QLDB digests are anchored weekly to the Bitcoin testnet using an open-source hasher — an experiment that satisfied the Directorate's “provable non-repudiation” requirement.
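
The fan-out itself is routine plumbing, but the chaining step is worth illustrating. Below is a simplified sketch of a Lambda handler that extends a hash chain over incoming audit events; reading the previous digest back from the ledger and the downstream writes are stubbed out, so treat it as a shape rather than the deployed code.

    package main

    import (
        "context"
        "crypto/sha256"
        "encoding/hex"

        "github.com/aws/aws-lambda-go/events"
        "github.com/aws/aws-lambda-go/lambda"
    )

    // prevDigest would normally be read back from the ledger on each invocation;
    // a package-level variable stands in for it in this sketch.
    var prevDigest string

    // handle chains each audit event onto the previous digest, so altering or
    // removing any historical event invalidates every digest that follows it.
    func handle(ctx context.Context, ev events.CloudWatchEvent) error {
        h := sha256.New()
        h.Write([]byte(prevDigest))
        h.Write([]byte(ev.Source))
        h.Write([]byte(ev.DetailType))
        h.Write(ev.Detail) // raw JSON payload of the audit event
        prevDigest = hex.EncodeToString(h.Sum(nil))

        // In the real service the new digest is appended to the ledger and the
        // event body is forwarded to CloudWatch Logs and Glacier (omitted here).
        return nil
    }

    func main() {
        lambda.Start(handle)
    }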

Operational Outcomes

Six months after go-live, throughput averages 18 GB per hour and has been burst-tested to 200 GB/h without manual intervention. Search latency fell from a 4.2 s median on the PowerBuilder thick client to 110 ms at the 95th percentile on the React front-end. Batch ingest jobs that previously took eight days now complete in under four hours, thanks to parallelised uploads and direct-to-S3 pre-signed URLs.
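
The direct-to-S3 path works by having the ingest API hand back a short-lived pre-signed PUT URL, so bulk image bytes never pass through the service layer. A sketch of that step with the AWS SDK for Go v2 presigner; the bucket, key, and expiry values are illustrative.

    package main

    import (
        "context"
        "fmt"
        "log"
        "time"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/s3"
    )

    func main() {
        ctx := context.Background()
        cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion("ap-southeast-4"))
        if err != nil {
            log.Fatal(err)
        }

        presigner := s3.NewPresignClient(s3.NewFromConfig(cfg))

        // The agency client PUTs the TIFF straight to this URL; the service layer
        // only ever handles the small JSON ingest request.
        req, err := presigner.PresignPutObject(ctx, &s3.PutObjectInput{
            Bucket: aws.String("records-archive-protected"),
            Key:    aws.String("ingest/batch-2024-11/item-0001.tiff"),
        }, s3.WithPresignExpires(15*time.Minute))
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(req.URL)
    }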

Two unexpected wins surfaced:

  1. Because every asset lands in S3 with Object Lock and a QLDB hash, agencies began decommissioning their own ad-hoc backups, consolidating archival policy through NRD.
  2. OpenSearch lifecycle policies let us tier indices to UltraWarm after three years, slashing storage costs by roughly 47%.

Lessons for Practitioners

  • Don't lift-and-shift a compliance nightmare. Re-platform the data, redesign the application; the extra rewrite cost is dwarfed by future agility and audit simplicity.
  • Ledger-style reconciliation beats one-time checksums when stakeholders need continuous proof of integrity.
  • PROTECTED doesn't preclude cloud agility; with an IaC pipeline, security baselines enforce themselves and developers can still ship daily.

Next Steps

We are currently piloting document OCR with Amazon Textract routed via a secure VPC endpoint, exploring server-side encryption with customer-provided keys (SSE-C) to extend tenant isolation for certain classified agencies, and evaluating Cedar policies for fine-grained authorisation at the API layer.

About Us

Arxium is a software consultancy focused on helping government agencies, banks, and enterprises build systems that matter. We specialise in modernising legacy platforms, designing digital services, and delivering scalable, cloud-native architectures.

Leaders across Australia trust us to solve complex technical problems with clarity, pragmatism, and care. Whether it's migrating infrastructure, integrating systems, or launching a public-facing portal—we make software work in the real world.

Contact us to start a conversation about your next project.

Arxium © 2025