Goli Bhargav

Reflections · 2012 – 2017

Patterns from the prototyping era

Five years of hands-on prototyping on constrained hardware (Raspberry Pi, Arduino, NodeMCU) for clients across logistics, retail, and SMB. The architectural intuition that came out of it still shapes how I design production systems today.

Working on small computers with bounded memory, an intermittent network, and limited compute is a different discipline from designing on a cloud whiteboard. The constraints force decisions that the cloud lets you defer. You cannot simply add another node when the message queue overflows; you cannot retry forever when the battery is finite; you cannot trust the network to be there when the next event fires. The decisions you make under those constraints either compound into clean systems or collapse into spaghetti within a week.

The patterns below are the ones that survived the era. They are not unique to embedded work — every one applies to large distributed systems too. But they were forged on hardware where the cost of getting them wrong was visible by the next morning, which is its own kind of teacher.

Pattern

Event-driven by default, sync only when forced

On a constrained device, every synchronous call is an opportunity for the device to hang. The radio dropped out, the upstream service is slow, the battery sagged — any of them can stall a thread that has nowhere else to go. The pattern that survives: events flow asynchronously through queues; the device produces messages and moves on; the consumer side handles ordering, retries, and idempotency. Synchronous calls exist only where the device cannot proceed without an immediate answer, and even those are wrapped in tight timeouts and fallback paths.
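As a rough illustration of that shape, here is a minimal Python sketch. The names `EventOutbox` and `call_with_timeout` are hypothetical and invented for this example; they do not come from any particular project or library. The sketch assumes a bounded in-memory buffer and a separate sender that drains it when the link is up. Ordering, retries, and idempotency stay on the consumer side, as described above.

```python
import queue
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


class EventOutbox:
    """Hypothetical bounded outbox: the producer never blocks on the network."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0  # events rejected because the buffer was full

    def emit(self, event):
        """Enqueue and return immediately; never wait for the radio."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            # Bounded memory: count the drop instead of blocking the device.
            self.dropped += 1
            return False

    def drain(self, send):
        """Called by a separate sender when the link is up.

        Returns the number of events handed to `send`. Handling a failing
        `send` is deliberately left out of this sketch.
        """
        sent = 0
        while True:
            try:
                event = self._queue.get_nowait()
            except queue.Empty:
                return sent
            send(event)
            sent += 1


def call_with_timeout(fn, timeout_s, fallback):
    """Run a synchronous call under a tight timeout, with a fallback value.

    The timed-out call keeps running on its worker thread; the caller
    simply stops waiting for it and proceeds with the fallback.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return fallback
    finally:
        pool.shutdown(wait=False)
```

A full buffer here drops the newest event and counts it. Whether to drop the newest, drop the oldest, or spill to flash is a per-device decision; the point of the sketch is only that `emit` returns immediately either way.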

The same discipline applies in production systems at scale. Tight RPC coupling between domains scales until it doesn’t; events flowing through a shared bus give every consumer room to evolve and fail independently. The intuition came from devices; it generalized to platforms.

Pattern

Idempotency on every state-changing operation

Networks fail mid-request. The device sent the message, it cannot tell whether the upstream received it, and so it retries. If the upstream is not idempotent, the action happens twice, and now your inventory records two units sold when the user sold only one. On constrained hardware where reconnects are routine, idempotency is not optional; it is the only way to keep the system correct under realistic conditions.

The same is true at platform scale. Every state-changing operation has an idempotency key; consumers can retry safely; replays of the same message do not accumulate side effects. The pattern is invisible when it works, which is exactly the point.
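A minimal sketch of the idea in Python, using the inventory example from above. `InventoryService`, `record_sale`, and the key format are hypothetical names chosen for illustration, and the in-memory dictionary stands in for whatever durable store a real consumer would use to remember processed keys.

```python
class InventoryService:
    """Hypothetical consumer that applies each sale at most once."""

    def __init__(self, stock):
        self.stock = dict(stock)
        # Maps idempotency key -> result of the first successful attempt.
        # Assumption: a real system would persist this, not hold it in memory.
        self._results = {}

    def record_sale(self, idempotency_key, sku, quantity):
        """Decrement stock once per key; replays return the stored result."""
        if idempotency_key in self._results:
            # Retry or replay of a message already applied: no side effect.
            return self._results[idempotency_key]
        self.stock[sku] -= quantity
        remaining = self.stock[sku]
        self._results[idempotency_key] = remaining
        return remaining


service = InventoryService({"widget": 10})
first = service.record_sale("sale-001", "widget", 1)
retry = service.record_sale("sale-001", "widget", 1)  # device retried after a dropped ack
```

In this sketch the retry returns the same remaining count as the first attempt, and the stock is decremented only once. The key has to be generated by the sender before the first attempt, so that every retry of the same logical operation carries the same key.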

Pattern

Explicit latency budgets, not "as fast as possible"

On a low-power device with a slow link, "as fast as possible" is meaningless. The question is: how much time can this operation take before the user notices, the battery drops below a threshold, or the next event arrives? You declare the budget — say, 800ms end-to-end for a sensor reading round-trip — and then every component in the chain has a slice of that budget. If a component exceeds its slice, it is the component that has to be fixed; the budget is the constraint, not a wish.
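One way to make the budget concrete is to write it down as data and measure against it. The Python sketch below is illustrative only: `LatencyBudget` is a hypothetical helper, and the stage names and the split of the 800 ms are assumptions made up for the example, not measurements from a real device.

```python
import time
from contextlib import contextmanager


class LatencyBudget:
    """Hypothetical helper: an end-to-end budget divided into named slices."""

    def __init__(self, total_ms, slices):
        if sum(slices.values()) > total_ms:
            raise ValueError("slices exceed the end-to-end budget")
        self.total_ms = total_ms
        self.slices = dict(slices)
        self.spent_ms = {}

    def record(self, name, elapsed_ms):
        """Record how long a stage actually took."""
        if name not in self.slices:
            raise KeyError("no slice declared for stage: " + name)
        self.spent_ms[name] = elapsed_ms

    @contextmanager
    def stage(self, name):
        """Time a block of code and record it against the named slice."""
        start = time.monotonic()
        try:
            yield
        finally:
            self.record(name, (time.monotonic() - start) * 1000.0)

    def over_budget(self):
        """Names of the stages that exceeded their own slice."""
        return sorted(
            name for name, spent in self.spent_ms.items()
            if spent > self.slices[name]
        )

    def total_spent_ms(self):
        return sum(self.spent_ms.values())


# Assumed split of the 800 ms round trip; the stage names are invented.
budget = LatencyBudget(800, {"sensor_read": 100, "radio_uplink": 500, "server": 200})
budget.record("sensor_read", 40)
budget.record("radio_uplink", 620)
budget.record("server", 90)
offenders = budget.over_budget()
```

With these made-up numbers the round trip totals 750 ms, inside the overall budget, yet `radio_uplink` has overrun its slice by borrowing headroom from the other stages. Tracking slices rather than only the total is what points at the component that needs fixing.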

Production systems benefit from the same discipline. SLOs are latency budgets stated formally. Without one, every component believes its own performance is acceptable and the user-facing experience drifts unmeasured. With one, the slow component is the one that gets attention, instead of the loudest one.

Pattern

Graceful degradation as a first-class design choice

A device that loses network and goes dark is a broken device. A device that loses network and continues to do its job (buffering events, falling back to cached state, reconnecting when the link returns) is a working device. Designing for the failure modes is not a polish task; it is a primary design choice that informs every other decision. And because the link drops routinely, the degraded path is not a rare branch that rots untested: it is exercised every day.
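The buffering and cached-state behaviour can be sketched in a few lines of Python. `DegradingClient` and its `fetch` and `send` callables are hypothetical stand-ins for whatever the device uses to talk upstream. The sketch assumes link failures surface as `OSError` (which includes `ConnectionError` and, on Python 3.10 and later, `socket.timeout`).

```python
from collections import deque


class DegradingClient:
    """Hypothetical client that keeps working while the link is down."""

    def __init__(self, fetch, send):
        self._fetch = fetch      # callable(key) -> value, may raise OSError
        self._send = send        # callable(event), may raise OSError
        self._cache = {}         # last known good value per key
        self._pending = deque()  # events waiting for the link to return

    @property
    def pending(self):
        return len(self._pending)

    def read(self, key):
        """Return (value, is_fresh); serve the cached value if unreachable."""
        try:
            value = self._fetch(key)
        except OSError:
            if key in self._cache:
                return self._cache[key], False
            raise  # nothing cached: there is no degraded answer to give
        self._cache[key] = value
        return value, True

    def write(self, event):
        """Buffer first, then try to flush; the caller never sees a link error."""
        self._pending.append(event)
        self.flush()

    def flush(self):
        """Send buffered events in order; stop at the first failure."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except OSError:
                break  # link still down: keep the event and try again later
            self._pending.popleft()
            sent += 1
        return sent
```

An event is removed from the buffer only after `send` returns, so a failure partway through a send can deliver the same event twice on the next flush. That is at-least-once delivery, and it is safe only when the receiving side is idempotent. The unbounded `deque` is also a simplification; a real device would cap it.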

Cloud systems get to pretend that infrastructure is always available, until the day a region degrades or a dependency rolls out a bad release. The systems that handle that day well are the ones that designed for degradation from the start: cached responses when the source is unreachable, queued writes when the database is slow, partial-result UIs when one upstream is down. The discipline is the same as the embedded one; the surface is different.

Closing

The through-line

The prototyping era taught me that architecture is not about the technology stack on the whiteboard; it is about the constraints the system has to survive. The cloud lets you defer many of those constraints by spending money. The hardware era did not let you do that — every decision had to land correctly the first time, or the system stopped working.

Carrying that discipline into platform work has been the single most useful through-line of my career. The tools change every few years; the architectural reflex of "what happens when this fails" does not.