DoorDash Interview — Project Write-up
Learning 1 — Synthetic data is a last resort, not a starting point
Spent ~2 months on synthetic-supervised training before pivoting. The failure signal was persistent: high training accuracy, poor validation on real data, and no statistical metric that reliably explained the gap. We kept building more complex analyses instead of changing strategy. The correct trigger to pivot was available much earlier: if N training cycles show no stable improvement on real data, the approach is wrong, not the measurement. Now: in data-scarce document domains, I default to self-supervised pretraining on real unlabelled data first.
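The pivot trigger above can be sketched as a simple check. This is a minimal sketch, not code from the project; the function name, window size, and threshold are all illustrative assumptions.

```python
def should_pivot(real_val_scores, n_cycles=5, min_delta=0.005):
    """Return True when the last n_cycles of real-data validation scores
    show no stable improvement (every per-cycle gain below min_delta),
    i.e. the signal that the approach, not the measurement, is wrong."""
    if len(real_val_scores) < n_cycles + 1:
        return False  # not enough history to judge yet
    recent = real_val_scores[-(n_cycles + 1):]
    gains = [b - a for a, b in zip(recent, recent[1:])]
    return all(g < min_delta for g in gains)

# Five cycles of flat real-validation scores -> time to change strategy
print(should_pivot([0.610, 0.612, 0.611, 0.613, 0.612, 0.611]))  # True
```

The point of writing it down as a rule is that it fires automatically: the decision to pivot no longer depends on how much analysis effort has already been sunk into the current approach.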
Learning 2 — De-risk research bets with explicit business stop-gates
When leadership is skeptical of longer-horizon ML directions, architecture quality alone is not persuasive. What worked was converting the approach into a staged business commitment: a limited pilot scope, a pre-agreed success threshold, and explicit kill criteria. I now use this pattern by default for contentious initiatives: define the gate first, then ask for runway.
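The stop-gate pattern reduces to a small amount of pre-agreed state plus one decision function. A sketch under assumed names and thresholds; the metric and numbers here are hypothetical, not the actual pilot's terms.

```python
from dataclasses import dataclass

@dataclass
class StopGate:
    """Pre-agreed business gate: defined before asking for runway."""
    metric_name: str
    success_threshold: float  # scale up if the pilot metric meets this
    kill_threshold: float     # stop immediately if it falls below this

    def decide(self, observed: float) -> str:
        if observed < self.kill_threshold:
            return "kill"
        if observed >= self.success_threshold:
            return "scale"
        return "extend-pilot"

# Hypothetical gate for a document-extraction pilot
gate = StopGate("doc_extraction_f1", success_threshold=0.90, kill_threshold=0.70)
print(gate.decide(0.93))  # "scale"
```

Because the thresholds are fixed up front, the post-pilot conversation is about which branch fired, not about renegotiating what counts as success.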