Backend Systems · Distributed Systems · Reliability · Mentorship · IIT Hyderabad

I build backend systems that keep complex products reliable at scale.

I’m Vaibhav Garg. I work on distributed backend systems where correctness, reliability, and operational clarity matter. I care about simple contracts, visible failure modes, pragmatic technical decisions, and teams that can confidently own what they build.

Java · JVM / Backend
Go · Systems
Python · Scripting
Kafka · Messaging
SQL · Databases
Redis · Caching
gRPC · RPC
Bash · Shell

About Me

Engineering taste shaped by failure modes.

I’m drawn to systems where correctness, failure handling, and human operability matter. Across commerce, customer engagement, healthcare, and resilient networks, the lesson has stayed the same: the hard part is rarely the code path that works.

My default approach is to turn ambiguity into clear contracts, observable behavior, defensible tradeoffs, and systems the next engineer can reason about without heroics.

Focus Areas

Where I add leverage.

I’m most useful when correctness, scale, reliability, product constraints, and cross-team coordination need to move together.

01 / SYSTEMS

Correctness-first architecture

Service boundaries, APIs, data flow, idempotency, and state transitions that keep high-value product flows predictable.

02 / RELIABILITY

Production reliability

Failure-aware design, observability, rollout safety, debugging paths, and reducing ambiguity before incidents become customer pain.

03 / LEADERSHIP

Systems judgment

Pragmatic tradeoffs, clear written reasoning, architecture reviews, and quality bars that help teams move faster with less rework.

04 / MENTORSHIP

Growing engineers

Mentoring engineers through design reviews, debugging habits, ownership, communication, and the next step in their career trajectory.

Engineering Principles

Reliability is a product feature.

The best systems are understandable before incidents, observable during incidents, and calm after they scale.

01 / FAILURE

Make failure visible

Strong systems expose failure modes, recovery paths, and ownership boundaries before the incident channel fills up.

02 / CONTRACTS

Keep contracts small

Narrow interfaces reduce coordination cost and make migrations possible without dragging every team through the same decision.

03 / OPERATIONS

Optimize for operators

Debuggability, observability, backfills, and rollout behavior are part of product quality, not cleanup work.

04 / TRADEOFFS

Write the tradeoff down

Good technical direction makes constraints, risks, and decisions easy for product, engineering, and operations teams to evaluate.

Experience

Experience across reliability-critical product systems.

Roles spanning distributed workflows, data contracts, customer-facing systems, and operational ownership, all in products where reliability is critical.

2022–Present · Uber
Staff Software Engineer · Order Platform · Commerce Processing Platform

Backend ownership in Uber’s order platform and commerce-processing layer, where the core responsibility is storing, managing, and serving order lifecycles across lines of business. The platform keeps order state and commerce context available so downstream consumers can process orders reliably and on time.

Uber-wide orders · Order lifecycle · Order data access · Commerce context · Downstream consumers · Production mitigation
  • Worked on backend systems that manage order creation, updates, finalization, adjustments, and reads across high-scale order flows.
  • Designed reliability and consistency safeguards around order state, lifecycle updates, and data contracts consumed by downstream systems.
  • Drove alignment across product, platform, data, and partner teams for order-data migrations, access patterns, and lifecycle changes.
  • Handled high-severity production mitigations involving cross-functional order-processing and downstream-consumer flows.
  • Mentored engineers, raised the quality bar for incident postmortems, and contributed to engineering hiring through technical interviews.
  • Presented architecture decisions in leadership reviews, technical deep dives, and broader engineering forums.
2019–2022 · MoEngage
Technical Lead · Backend for auto-trigger and in-app campaigns

Backend leadership for the streaming and orchestration systems behind personalized mobile and web campaigns, where predictable latency and cost efficiency mattered as event volume grew.

Auto-trigger campaigns · In-app campaigns · Event pipelines · Latency reduction · Team leadership
  • Led a 10-engineer team building B2B customer engagement backend systems for mobile and web channels.
  • Reduced API latency by ~80% and infrastructure costs by ~50% while scaling streaming pipelines for high-throughput event processing.
  • Built workflow orchestration, A/B testing, in-app HTML campaigns, and monitoring capabilities across the engagement platform.
  • Recognized with Customer Obsession and Rockstar awards across multiple quarters.
2018–2019 · MFine
Senior Software Engineer · Backend for MFine app/web

Backend systems for healthcare workflows where reliability affected clinics, doctors, and patients directly. Focused on storage efficiency, workflow resilience, and offline-capable product behavior.

Healthcare backend · Storage efficiency · Offline care · Workflow reliability · Product systems
  • Reduced primary database storage by 50%+ through an archive library, enabling clinic data to stay online without hardware upgrades.
  • Built offline-capable healthcare workflows for clinic case registration that handle connectivity loss gracefully, in settings where the alternative is paper.
  • Strengthened reliability across patient-facing flows where downtime directly affects care delivery.
2015–2018 · Strand Life Sciences
Software Engineer · Backend for Operations Management System

Operations platform for genomics and healthcare workflows. Built search, filtering, notification, and workflow systems used by hospital operations teams, and helped migrate the product experience to a single-page application (SPA) architecture.

Operations workflows · Search dashboards · Notifications · SPA migration · Reusable filters
  • Migrated the operations platform to an SPA architecture, improving responsiveness for hospital operations teams working across complex multi-step workflows.
  • Built search, notification, and reusable filter systems across users, cases, payments, and hospital workflows.
  • Developed a reusable filter persistence framework that saved and restored search state across sessions, reducing repeated manual filtering by ops teams.
2012–2015 · IIT Hyderabad
Master of Technology · Computer Science and Engineering · Research Assistant

Graduate research in post-disaster ICT systems: networks and software designed for environments with limited infrastructure, unreliable connectivity, and real operational constraints.

Systems foundations · Research discipline · Distributed thinking · Tradeoff analysis · Resilient infrastructure
  • Focused on resilient systems design: building ad-hoc communication networks that operated without fixed infrastructure in post-disaster environments.
  • Published 3 peer-reviewed papers on wireless mesh networks, ICT data pipelines, and self-contained software for disaster recovery.
  • The research question (how do you build a system that works when everything around it has failed?) became the lens for every engineering decision since.

Research

Research foundation in resilient systems.

Graduate research on communication and information systems for post-disaster environments. Building networks that work without infrastructure sharpens how you think about failure.

2015 / WiMob

Performance Evaluation of Wireless Ad-hoc Network for Post-Disaster Recovery using Linux Live USB Nodes

IEEE WiMob · Wireless and mobile computing · Oct 2015 · pp. 125-131.

2015 / ICNS

A Self-Contained Software Suite for Post-Disaster ICT Environment Using Linux Live USB

ICNS · Networking and services · May 2015 · pp. 17-23.

2014 / R10 HTC

Acquisition, storage, retrieval and dissemination of disaster related data

IEEE Region 10 Humanitarian Technology Conference · Aug 2014 · pp. 58-63.


Contact

Get in touch.

LinkedIn is the best starting point. GitHub has code. Codementor is where I mentor engineers on backend systems and architecture.