DoorDash Interview — Project Write-up
1. Problem Statement
- Customer: Accounts Payable and Finance Ops teams at Jio Infocomm, processing ~20,000 vendor invoices per day
- Scale: ~120 employees dedicated solely to manual invoice validation
- Existing system: OCR → text file → regex rules + if/else conditions extracting 6 fields per invoice:
  - Invoice Number
  - PO Number
  - Invoice Date
  - PO Date
  - Invoice Amount (numeric)
  - Invoice Amount in Words
- Why it was broken: The regex file was a growing monolith: every new vendor required new rules and new conditions, so the system was rigid and unmaintainable. All-6 exact-match accuracy (all six fields correct on the same invoice) across all vendors: ~5%
- Why critical: Invoice volume was growing. There were three options: keep hiring people indefinitely, buy an external solution, or replace the internal system with a better one. The Finance team preferred the buy option for its lower delivery risk, but we proposed an in-house build with a six-month stop-gate to demonstrate improved accuracy and lower operating cost.
- My role: Lead. Proposed the architecture, set staged stop-gates for delivery risk, coordinated across Finance, Infra/DevOps, and PM. Team of 4. Timeline: 6 months to demonstrate improved accuracy
- What made it uniquely hard:
  - Labelled data existed from an early text-only NER prototype, but that approach underperformed because it flattened the OCR output and discarded the spatial layout signal; adding more labels to the wrong formulation would not have fixed the core issue
  - No off-the-shelf open-source model handled all 3 modalities (image + text + bounding boxes). Microsoft LayoutLM was not publicly licensed at the time
  - ~1 crore (10 million) historical invoices were available, but all of them were unlabelled
- Cross-team leadership moments:
  - Engineering leadership initially pushed for direct supervised modeling with immediate business proof. I proposed a staged stop-gated plan (pilot on poor-quality vendors first, then scale only if it beat baseline).
  - Finance stakeholders pushed to buy an external solution. We aligned on a six-month in-house gate: >=50% pass-through in a 2-hour burst window with lower operating cost.
- Intended outcome: A generalizable extraction system that works across all vendors, including unseen ones, with zero per-vendor configuration
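
For reference, the "all-6 exact-match" metric quoted above could be computed roughly as follows. This is a minimal sketch, not the evaluation code used on the project: the field keys, the `normalize` rule (trim, collapse whitespace, ignore case), and the function names are all illustrative assumptions.

```python
from typing import Dict, List

# The six fields named in the write-up; the key spellings are hypothetical.
FIELDS = [
    "invoice_number",
    "po_number",
    "invoice_date",
    "po_date",
    "invoice_amount",
    "invoice_amount_words",
]


def normalize(value: str) -> str:
    """Assumed normalization: trim, collapse whitespace, ignore case."""
    return " ".join(value.split()).lower()


def all_fields_match(pred: Dict[str, str], gold: Dict[str, str]) -> bool:
    """True only if every one of the six fields matches after normalization."""
    return all(
        normalize(pred.get(field, "")) == normalize(gold.get(field, ""))
        for field in FIELDS
    )


def exact_match_accuracy(
    preds: List[Dict[str, str]], golds: List[Dict[str, str]]
) -> float:
    """Fraction of invoices on which all six fields are correct."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not golds:
        return 0.0
    hits = sum(all_fields_match(p, g) for p, g in zip(preds, golds))
    return hits / len(golds)
```

Because a single wrong field fails the whole invoice, this metric is much stricter than per-field accuracy, which is one reason a rule-based extractor can score as low as the ~5% cited above.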
2. Solution Summary
- Unsupervised multimodal pretraining on the company's historical invoice corpus.
- Followed by supervised fine-tuning on a small labelled set.
- Deployed on Seldon + Triton with Kubernetes autoscaling to meet bursty load.
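
To make the layout signal concrete, the sketch below contrasts the flattened text a text-only model sees with the token-plus-box input a multimodal model can use. It is illustrative only: the `OcrWord` type, the function names, and the 0 to 1000 coordinate grid (a convention used by some layout-aware models) are assumptions, not details taken from the project.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels


@dataclass
class OcrWord:
    """One OCR token with its bounding box (hypothetical record shape)."""
    text: str
    box: Box


def flatten_text(words: List[OcrWord]) -> str:
    """What a text-only model sees: tokens joined, positions thrown away."""
    return " ".join(word.text for word in words)


def normalize_box(box: Box, page_width: int, page_height: int, grid: int = 1000) -> Box:
    """Scale pixel coordinates to a fixed grid so page size does not matter."""
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")
    x0, y0, x1, y1 = box
    return (
        x0 * grid // page_width,
        y0 * grid // page_height,
        x1 * grid // page_width,
        y1 * grid // page_height,
    )


def layout_features(
    words: List[OcrWord], page_width: int, page_height: int
) -> List[Tuple[str, Box]]:
    """What a layout-aware model can use: each token paired with its position."""
    return [
        (word.text, normalize_box(word.box, page_width, page_height))
        for word in words
    ]
```

Two invoices with identical flattened text can place "Invoice Date" and "PO Date" values in different table cells; only the second representation keeps the information needed to tell them apart.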