Our client had managed its archival workflow on a pair of HP-UX servers running an Oracle 9i database and a thick-client PowerBuilder application last touched in 2007. The system stored 2.4 billion metadata rows, ingested roughly 30 million TIFF images, and served 15 000 public servants spread across thirty-six agencies. Tape rotation, XML file drops and sneaker-net disks were still part of the weekly routine.
The brief to Arxium was deceptively simple: “Give us a web platform that scales, stays PROTECTED-classified, and keeps every accession number exactly where it should be.” Regulatory overlay included the Archives Act, the Commonwealth Records Series (CRS) model, IRAP compliance, and an immovable deadline.
We began by profiling workloads for three weeks, piping AWR dumps from Oracle into PerfInsights and drawing dependency graphs with Neo4j. The legacy stack was a monolith, but its business capabilities mapped cleanly onto four domains: ingestion, description, access control, and public discovery. Given the security tier and the multi-agency user base, we chose a hybrid architecture.
Record integrity trumped development velocity, so we built a double-entry ledger approach rather than a classic ETL. Oracle Data Pump produced nightly incremental dumps; an in-house Go tool (chronolog) replayed REDO archives into logical change events that landed on Kinesis. Downstream, a Flink cluster enriched events, wrote them into Postgres and OpenSearch, then published SHA-256 hashes to a QLDB instance for tamper-evidence.
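The tamper-evidence step depends on every logical change event hashing to the same digest no matter how it was serialised. A minimal sketch of that canonicalise-then-hash step, with purely illustrative field names (the real chronolog event shape is not shown here):

```python
import hashlib
import json

def event_digest(event: dict) -> str:
    """Canonicalise a change event and return its SHA-256 hex digest.

    Sorting keys and using compact separators makes the JSON
    serialisation deterministic, so the same logical event always
    produces the same digest regardless of field order.
    """
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical change-event shape -- field names are illustrative only.
change = {
    "table": "ACCESSION",
    "op": "UPDATE",
    "scn": 48201993,
    "key": {"accession_no": "A12345"},
    "after": {"title": "Cabinet minutes, 1968"},
}

digest = event_digest(change)
```

A digest computed this way can be published alongside the event and later re-derived from the Postgres row to prove nothing changed in flight.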
During the final cut-over weekend we executed a rolling delta replay down to a 90-second replication gap, paused ingestion on the old system, completed the final cycle, reconciled QLDB hashes, and flipped agency DNS records to the CloudFront distribution. The public catalogue sustained a 99.7% cache hit rate in its first week post-migration, keeping origin load negligible.
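The hash reconciliation step reduces to a set comparison over per-record digests. A sketch of that comparison under assumed inputs (two in-memory maps of record ID to digest, not the production tooling):

```python
def reconcile(source: dict[str, str], target: dict[str, str]) -> dict[str, list[str]]:
    """Compare per-record digests between the legacy dump and the new store.

    Returns record IDs missing from the target, unexpected in the target,
    or present in both but with differing digests.
    """
    return {
        "missing": sorted(k for k in source if k not in target),
        "extra": sorted(k for k in target if k not in source),
        "mismatched": sorted(
            k for k in source if k in target and source[k] != target[k]
        ),
    }

# Illustrative digests only.
report = reconcile(
    {"A1": "aaa", "A2": "bbb", "A3": "ccc"},
    {"A1": "aaa", "A2": "XXX", "A4": "ddd"},
)
```

An empty report across all three lists is the go/no-go signal before the DNS flip.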
IRAP at PROTECTED level imposed strict controls: encrypted transport everywhere (TLS 1.2 only with modern ciphers), customer-managed KMS keys, VPC-centric service endpoints, and CIS Level 2 AMIs. We integrated AWS Macie to alert on anomalous PII exposure in newly ingested content and configured GuardDuty with custom threat lists derived from the Australian Cyber Security Centre.
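The encrypted-transport control is typically enforced at the bucket as well as the client. A sketch of an S3 bucket policy that denies non-TLS requests via the `aws:SecureTransport` condition key; the bucket name and statement Sid are placeholders, not the production policy:

```python
import json

def tls_only_policy(bucket: str) -> str:
    """Bucket policy denying any S3 request made without TLS.

    The aws:SecureTransport condition key evaluates to false for
    plain-HTTP requests, so this Deny statement blocks them for the
    bucket itself and every object in it.
    """
    return json.dumps(
        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "DenyInsecureTransport",
                    "Effect": "Deny",
                    "Principal": "*",
                    "Action": "s3:*",
                    "Resource": [
                        f"arn:aws:s3:::{bucket}",
                        f"arn:aws:s3:::{bucket}/*",
                    ],
                    "Condition": {"Bool": {"aws:SecureTransport": "false"}},
                }
            ],
        },
        indent=2,
    )

policy = json.loads(tls_only_policy("example-archive-bucket"))
```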
Every service publishes audit events to an EventBridge bus; a dedicated Lambda forks these to CloudWatch Logs, S3 Glacier (seven-year retention), and the QLDB chain. The QLDB digests are anchored weekly to the Bitcoin testnet using an open-source hasher — an experiment that satisfied the Directorate's “provable non-repudiation” requirement.
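The weekly anchoring amounts to folding the accumulated ledger digests into a single value whose change would reveal tampering. A minimal hash-chain sketch of that idea (an assumption about the approach, not the open-source hasher itself):

```python
import hashlib

def chain_digests(digests: list[bytes]) -> bytes:
    """Fold a sequence of ledger digests into one 32-byte anchor value.

    Each step hashes the running anchor together with the next digest,
    so altering, dropping, or reordering any input changes the final
    anchor -- the property the external anchoring relies on.
    """
    anchor = b"\x00" * 32  # fixed genesis value
    for d in digests:
        anchor = hashlib.sha256(anchor + d).digest()
    return anchor

# Illustrative weekly digests.
weekly = [hashlib.sha256(bytes([i])).digest() for i in range(4)]
anchor = chain_digests(weekly)
```

Only the final anchor needs to be written to the external chain; the full digest history stays in QLDB.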
Six months after go-live, ingest throughput averages 18 GB per hour and has been burst-tested to 200 GB/h without manual intervention. Search latency fell from a 4.2 s median on the PowerBuilder thick client to 110 ms at P95 on the React front-end. Batch ingest jobs that previously took eight days now complete in under four hours thanks to parallelised uploads and direct-to-S3 pre-signed URLs.
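The parallelised uploads work by splitting each object into independent byte ranges that can be sent concurrently as multipart-upload parts. A sketch of the range computation (the 64 MiB part size is an assumption, not the production setting):

```python
def part_ranges(size: int, part_size: int = 64 * 1024 * 1024) -> list[tuple[int, int]]:
    """Byte ranges [start, end) covering an object of `size` bytes.

    Each range maps to one multipart-upload part, and the parts can be
    uploaded in parallel against pre-signed URLs.
    """
    return [(off, min(off + part_size, size)) for off in range(0, size, part_size)]

ranges = part_ranges(150 * 1024 * 1024)  # a 150 MiB object -> 3 parts
```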
The migration also surfaced unexpected wins along the way.
We are currently piloting document OCR with Amazon Textract routed via a secure VPC endpoint, exploring server-side encryption with customer-provided keys (SSE-C) to extend tenant isolation for certain classified agencies, and evaluating Cedar policies for fine-grained authorisation at the API layer.
Arxium is a software consultancy focused on helping government agencies, banks, and enterprises build systems that matter. We specialise in modernising legacy platforms, designing digital services, and delivering scalable, cloud-native architectures.
Leaders across Australia trust us to solve complex technical problems with clarity, pragmatism, and care. Whether it's migrating infrastructure, integrating systems, or launching a public-facing portal—we make software work in the real world.
Contact us to start a conversation about your next project.