When the client — a national critical-infrastructure provider — first invited us in, their estate spanned three data-centres, 200+ Windows and RHEL VMs, an Oracle RAC cluster, two monolithic .NET 4.6 applications, a handful of Java services, and a sprawling collection of SSIS jobs that moved data around on cron-like schedules. Change windows were measured in weekends. Disaster-recovery tests required four weeks of preparation. Every new feature request began with the same grim question: “How much downtime can we negotiate?”
Cost was not the core problem; agility and reliability were. The CIO's mandate was therefore unambiguous: consolidate the estate, raise availability to “four nines”, and shorten release cycles from quarterly to on-demand — without breaking regulatory alignment (IRAP-aligned controls, Essential Eight Maturity Level Three, and PCI DSS for a niche payments module).
We settled on Azure because the organisation already consumed O365 and AD FS, which gave us identity primitives and an existing Enterprise Agreement (EA). Our architecture had three guiding principles:
We began with a paired landing-zone model using Terraform and the CAF enterprise-scale module. Each zone deployed:
The Oracle RAC workload was the largest anchor keeping them on-prem. We evaluated three options: RAC on Azure VMs, Oracle Database@Azure, and a logical migration to PostgreSQL. Regulatory deadlines ruled out a wholesale RDBMS shift, so we containerised the existing RAC nodes on AKS with FlashGrid, giving us synchronous Data Guard replicas across the two regions. For telemetry and analytics we ingested CDC streams into a Cosmos DB account replicated across the paired regions and exposed query surfaces through Synapse serverless SQL pools.
The .NET monoliths were decomposed along obvious bounded contexts (billing, customer-profile, and reporting) into six services running on AKS, with Dapr sidecars providing service discovery and pub/sub. We translated the WebForms UIs to Razor Pages, introduced structured logging with Serilog, and standardised health endpoints so Kubernetes liveness and readiness probes could gate rollouts. Legacy Java WAR files were packaged into OpenShift-compatible images and deployed unchanged; their build pipelines simply swapped out the target context.
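To make the service shape concrete, here is a minimal sketch of how one of the decomposed services could be declared on AKS, showing the Dapr sidecar annotations and the standardised probe endpoints. The "billing" name comes from the bounded contexts above; the image, port, and probe paths are illustrative assumptions, not the client's actual manifests.

```yaml
# Illustrative Deployment fragment for the hypothetical "billing" service.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing
spec:
  replicas: 3
  selector:
    matchLabels:
      app: billing
  template:
    metadata:
      labels:
        app: billing
      annotations:
        dapr.io/enabled: "true"    # inject the Dapr sidecar (service discovery, pub/sub)
        dapr.io/app-id: "billing"  # logical app id used for service invocation
        dapr.io/app-port: "8080"   # port the sidecar forwards requests to
    spec:
      containers:
        - name: billing
          image: ghcr.io/example/billing:1.0.0   # placeholder registry and tag
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /healthz/live    # assumed liveness endpoint path
              port: 8080
          readinessProbe:
            httpGet:
              path: /healthz/ready   # assumed readiness endpoint path
              port: 8080
```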
Every repo gained a GitHub Actions workflow: a PR-triggered build, Snyk and Trivy scans, unit tests run in parallel, a Docker build, and a call to Atlantis to plan the corresponding Terraform change set. Once a release candidate was tagged, Argo CD (running in a management cluster) pulled the updated Helm charts and promoted them through dev → test → prod with automated smoke tests in k6 and Postman. Rollback is near-instant: we pin the Helm release to the previous version and Argo CD reconciles the clusters back.
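For flavour, here is a trimmed sketch of that PR workflow. It assumes the Snyk CLI is installed at build time and a GitHub Container Registry image name; the repo, image, and secret names are placeholders, and the parallel test matrix and Atlantis plan step are omitted for brevity.

```yaml
# Sketch of the per-repo PR pipeline; names and thresholds are illustrative.
name: pr-build
on:
  pull_request:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Unit tests
        run: dotnet test --configuration Release

      - name: Snyk dependency scan
        run: |
          npm install -g snyk
          snyk test --severity-threshold=high
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}

      - name: Docker build
        run: docker build -t ghcr.io/example/billing:${{ github.sha }} .

      - name: Trivy image scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ghcr.io/example/billing:${{ github.sha }}
          exit-code: "1"             # fail the build on findings
          severity: CRITICAL,HIGH
```

From a tagged release candidate onwards, Argo CD takes over: it reconciles each cluster to the pinned chart version, which is also what makes the rollback described above so quick.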
Because Data Guard replication lag never exceeded 30 s and the Kubernetes deployments used blue-green semantics, we executed the final cut-over during a Tuesday 02:00 maintenance slot. We prepared by dropping DNS TTLs to 60 s the week prior, then shifted traffic by flipping the Front Door backend pools from the data-centre VIPs to Azure public IP prefixes. No dropped transactions were observed, and synthetic monitoring reported <150 ms P95 end-to-end latency throughout.
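One common way to realise the blue-green semantics mentioned above (not necessarily how the client's charts implement it) is plain label selection: two Deployments labelled by slot and a Service whose selector is flipped at cut-over. Names, labels, and ports below are placeholders.

```yaml
# The "billing" Service routes to whichever slot is live; the idle slot
# stays deployed and warm, so rolling back is a single selector change.
apiVersion: v1
kind: Service
metadata:
  name: billing
spec:
  selector:
    app: billing
    slot: green        # flip between "blue" and "green" to move traffic
  ports:
    - port: 80
      targetPort: 8080
```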
We continue to refine the platform: introducing GKE on Google Cloud as a secondary platform, exploring serverless containers with Azure Container Apps, and piloting Chaos Mesh for failure-injection drills. If you’re facing similar constraints — legacy sprawl, high-availability mandates, or audit pressure — I’d be happy to share the Terraform modules, Helm charts, and run-books we developed along the way.
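As a taste of what such a Chaos Mesh pilot might start with, here is a small failure-injection experiment; the namespace and labels are placeholders rather than the production workloads.

```yaml
# Chaos Mesh experiment: kill one pod of the target service and observe
# whether probes, replica counts, and Argo CD bring it back cleanly.
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: billing-pod-kill
  namespace: chaos-testing
spec:
  action: pod-kill     # terminate the pod outright
  mode: one            # affect a single pod chosen from the selector
  selector:
    namespaces:
      - billing        # placeholder namespace
    labelSelectors:
      app: billing     # placeholder label
```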
Arxium is a software consultancy focused on helping government agencies, banks, and enterprises build systems that matter. We specialise in modernising legacy platforms, designing digital services, and delivering scalable, cloud-native architectures.
Leaders across Australia trust us to solve complex technical problems with clarity, pragmatism, and care. Whether it's migrating infrastructure, integrating systems, or launching a public-facing portal—we make software work in the real world.
Contact us to start a conversation about your next project.