Research
Prototyping Future Services: Designing and Testing Digital Experiences
Arxium
August 3, 2024


As part of our internal research and prototyping practice, we've developed reference designs and component systems tailored to public-facing government platforms.

Why We Built an R&D Prototyping Track

Late last year the consulting backlog told a clear story: nearly every engagement—government portals, banking self-service flows, enterprise data products—shared two frustrations. First, business owners struggled to articulate requirements until they could touch something. Second, engineers lost weeks hardening disposable POCs that were never meant for production.

To break the cycle we launched an internal initiative called Future Services: a standing squad tasked with building throw-away prototypes that feel like real products but cost next-to-nothing to discard. Our charter is simple: validate value, de-risk technology choices, and hand teams a reference implementation they can either adopt or ignore without regret.

Operating Model

Future Services runs as a continuous discovery engine. Every quarter the consulting guild nominates two or three recurring problem spaces. We dedicate a six-week sprint to each theme:

  1. Week 1 is immersion: domain interviews, shadowing call-centre workflows, reading legislative constraints.
  2. Weeks 2 to 4 are for building “steel-thread” prototypes—minimum viable verticals that span UI, service layer, and deployment.
  3. Week 5 is test and instrumentation: synthetic load, accessibility audits, threat modelling, cost profiling.
  4. Week 6 produces the artefacts: a public demo video, an ADR collection, and a git tag labelled “throw-away-ready”.

The cadence is fast because tooling is opinionated. The front-end is always React + TypeScript with shadcn/ui; the back-end is either Go or Node + Fastify, containerised with BuildKit, pushed to a shared ECR, and deployed through a single-tenant Argo CD instance that lives in an R&D AWS account. Terraform Cloud seeds every environment; feature flags ride LaunchDarkly so that we can yank experiments without rebuilds.
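The flag-gated kill switch is worth making concrete. The sketch below uses a hypothetical in-memory flag client standing in for the LaunchDarkly SDK; the flag name and defaults are illustrative, not real keys:

```python
# Minimal in-memory stand-in for a feature-flag client.
# Flag names and defaults here are illustrative, not real LaunchDarkly keys.

class FlagClient:
    def __init__(self, flags):
        self._flags = dict(flags)

    def variation(self, key, default=False):
        # Unknown flags fall back to a safe default, so a yanked
        # experiment degrades gracefully instead of crashing.
        return self._flags.get(key, default)


flags = FlagClient({"ai-triage-enabled": True})

def handle_submission(doc_id, client):
    # Flip the flag to kill the experiment -- no rebuild or redeploy needed.
    if not client.variation("ai-triage-enabled", default=False):
        return {"doc": doc_id, "route": "manual-review"}
    return {"doc": doc_id, "route": "llm-triage"}
```

The point of the pattern is the safe default: when a flag is deleted after an experiment ends, every call site quietly reverts to the conservative path.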

Case Example - AI-Assisted Document Triage

One of our clients receives thousands of PDF submissions each month — approval requests, compliance reports, incident notifications. Turnaround time had ballooned from five to eighteen days. We hypothesised that Large Language Models could classify urgency and route outliers to human reviewers while letting trivial cases sail through automated checks.

Instead of promising an end-to-end product we set a 42-day goal: “Demonstrate a service that ingests a submission, scores criticality within five seconds, and exposes an audit trail.”

We embedded LangChain orchestration around a self-hosted Mistral-7B model served on NVIDIA A10G instances via SageMaker. A lightweight FastAPI gateway handled file uploads, hit an S3 pre-signed URL, triggered a Textract asynchronous job, and piped the parsed text into the LLM chain. Outputs (classification, reasoning chain, JSON evidence) were persisted in DynamoDB and surfaced through a React admin console.
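The steel-thread flow above can be sketched as a small orchestrator. The step functions here are deliberately stubs — keyword heuristics and an in-memory dict standing in for the Textract job, the Mistral-7B chain, and the DynamoDB table:

```python
import time

def parse_pdf(blob):
    # Stand-in for the asynchronous Textract job.
    return blob.decode("utf-8", errors="ignore")

def classify(text):
    # Stand-in for the Mistral-7B LangChain pipeline: a keyword
    # heuristic plays the role of the model's urgency call.
    urgent = "incident" in text.lower()
    return {"label": "urgent" if urgent else "routine",
            "reasoning": "keyword heuristic standing in for the LLM"}

def triage(doc_id, blob, store):
    start = time.monotonic()
    result = classify(parse_pdf(blob))
    # Persist the full audit trail, as the DynamoDB table did:
    # classification, reasoning, and end-to-end latency.
    store[doc_id] = {"doc": doc_id,
                     "classification": result["label"],
                     "reasoning": result["reasoning"],
                     "latency_s": round(time.monotonic() - start, 3)}
    return store[doc_id]

audit = {}
record = triage("sub-001", b"Incident notification: regional outage", audit)
```

The shape of the audit record — every score stored alongside the reasoning that produced it — is what made the five-second criticality goal verifiable rather than anecdotal.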

The entire stack spun up in under twenty minutes from zero via Terraform workspaces. Synthetic workloads of 10,000 PDFs per hour cost ~AUD 35 thanks to spot GPU fleets. P95 latency was 4.1 s; accuracy, measured against a gold-label set, hit 92% precision on "urgent" tags, well above our client's 80% threshold for pilot acceptance.
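Precision on the "urgent" tag is simply true positives over predicted positives. A minimal check against a gold-label set looks like this (the four sample labels are made up for illustration):

```python
def precision(predicted, gold, positive="urgent"):
    # Of everything the model tagged `positive`, what fraction was right?
    tp = sum(1 for p, g in zip(predicted, gold)
             if p == positive and g == positive)
    fp = sum(1 for p, g in zip(predicted, gold)
             if p == positive and g != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0

gold = ["urgent", "routine", "urgent", "routine"]
pred = ["urgent", "urgent", "urgent", "routine"]
# 3 predicted "urgent", 2 of them correct -> precision of 2/3
```

Precision was the right acceptance metric here because the cost of the failure mode is asymmetric: a false "urgent" wastes a reviewer's time, while recall misses are caught by the existing manual queue.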

Engineering Lessons

  • Prototype code is production risk if left unguarded. We keep the git root read-only inside mainline projects; engineers must fork and consciously promote.
  • Observability starts on day one. Each prototype pushes OpenTelemetry traces to a Grafana Cloud sandbox so that future teams inherit metrics, not mysteries.
  • Design systems pay dividends even in throw-aways. Because UI elements come from the same shadcn library, screen-reader compliance and colour-contrast ratios are solved up-front, not retro-fitted.

Hand-off and After-Life

When the six weeks expire we hold an open demo—slides banned, live environment mandatory. Stakeholders decide:

  • Adopt - spin a fresh repo, cherry-pick modules, and move it into the client's pipeline.
  • Adapt - salvage the reference architecture but rebuild components.
  • Archive - take the lessons, delete the cluster, and move on.

In the AI triage example our client chose Adapt. Their internal devs replaced our FastAPI edge with Java Spring because of existing skill sets, swapped SageMaker for on-prem GPUs, but retained the Textract preprocessing and the LaunchDarkly flag strategy.

Impact

Future Services has produced nine prototypes; four graduated into paid follow-on projects, three informed RFP responses, and two were intentionally binned. Average discovery-to-decision shrank from four months to six weeks, saving clients budget and sparing delivery teams from death-march pivots.

More subtly, the initiative hardened Arxium's internal platform. The shared Terraform modules, Argo patterns, and observability scaffolding born in prototypes are now the default baseline for every billable engagement, proof that disciplined throw-aways can raise engineering quality across the board.

About Us

Arxium is a software consultancy focused on helping government agencies, banks, and enterprises build systems that matter. We specialise in modernising legacy platforms, designing digital services, and delivering scalable, cloud-native architectures.

Leaders across Australia trust us to solve complex technical problems with clarity, pragmatism, and care. Whether it's migrating infrastructure, integrating systems, or launching a public-facing portal—we make software work in the real world.

Contact us to start a conversation about your next project.

© Arxium 2025