WORK STREAM
Dunning
When a recurring payment fails and the customer isn't present, dunning takes over. One engine sees everything. Graduated consequences replace blunt flags. Two phases remain: Shore Up fixes the leaking joints, and One Code Path replaces the wrong pipes entirely.
A failed charge enters the retry loop. Each retry either succeeds (resolving the invoice) or fails again. When retries are exhausted, dunning escalates to account-level consequences. Bad-debt-from-upgrades grew from 3/day (late 2024) to 48/day (Nov peak), and sits at 39/day today — ~10,800 accounts per year incorrectly in bad debt from upgrades.
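The retry loop described above can be sketched roughly as follows. This is an illustration only: the `Invoice` type, the `charge` callback, the status strings, and the `max_retries` default are all assumptions, not names or values from the actual system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Invoice:
    # Hypothetical shape; the real invoice model is not shown in this document.
    invoice_id: str
    status: str = "open"  # "open", then "paid" or "escalated"
    attempts: int = 0


def run_dunning(
    invoice: Invoice,
    charge: Callable[[Invoice], bool],
    max_retries: int = 4,  # assumed limit, for illustration
) -> Invoice:
    """Retry the charge until it succeeds or retries are exhausted."""
    while invoice.attempts < max_retries:
        invoice.attempts += 1
        if charge(invoice):
            # A successful retry resolves the invoice.
            invoice.status = "paid"
            return invoice
    # Retries exhausted: hand off to account-level consequences.
    invoice.status = "escalated"
    return invoice
```

In this sketch the loop has exactly two exits, which mirrors the text: a success resolves the invoice, and exhaustion escalates.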
Dunning's escalation produces binary outcomes — bad_debt flag or not, ban or not. A customer whose card expired last week and one who hasn't paid in six months receive the same treatment.
When dunning flags an account, entitlement revocation is a separate process. Dunning decides an account is delinquent — whether that actually affects what the customer can use is someone else's problem.
The A3→A1 HTTP call is synchronous. If Account Management is down or slow, dunning's escalation path is blocked.
An expired card (permanent) gets the same retry cadence as insufficient funds (transient). The recompute engine can see the difference, but the retry logic doesn't use it.
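One way retry logic could use the permanent/transient distinction is sketched below. The decline-code strings, the backoff schedule, and the choice to treat unknown codes as transient are assumptions made for illustration; the document does not specify them.

```python
from datetime import timedelta
from enum import Enum
from typing import Optional


class DeclineKind(Enum):
    TRANSIENT = "transient"  # may succeed later, e.g. insufficient funds
    PERMANENT = "permanent"  # will not succeed without customer action


# Hypothetical code names; a real processor's decline codes may differ.
DECLINE_KINDS = {
    "insufficient_funds": DeclineKind.TRANSIENT,
    "expired_card": DeclineKind.PERMANENT,
}


def next_retry_delay(decline_code: str, attempt: int) -> Optional[timedelta]:
    """Return how long to wait before retry number `attempt` (0-based).

    Returns None when retrying is pointless and the customer should be
    asked to update their payment method instead.
    """
    kind = DECLINE_KINDS.get(decline_code, DeclineKind.TRANSIENT)
    if kind is DeclineKind.PERMANENT:
        return None
    # Assumed schedule: 1, 2, 4 days, then capped at 7 days.
    return timedelta(days=min(2 ** attempt, 7))
```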
Fix the leaking joints — the problems solvable without rearchitecting. In progress, target March 28, 2026.
Show "Resolved" vs "Paid" correctly. Customers and support can see what actually happened.
Time-based guard plus stuck-in-lifting gauge. Prevents drift remediation from clobbering in-flight payments.
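A time-based guard of this kind might look like the sketch below. The 15-minute window and the function and parameter names are assumptions; the stuck-in-lifting gauge is not modeled here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed quiet period; the real threshold is not stated in this document.
IN_FLIGHT_WINDOW = timedelta(minutes=15)


def safe_to_remediate(
    last_payment_activity: Optional[datetime],
    now: datetime,
    window: timedelta = IN_FLIGHT_WINDOW,
) -> bool:
    """Allow drift remediation only if no payment activity is recent.

    Recent activity suggests a payment may still be in flight, so the
    remediation is skipped rather than risk overwriting its result.
    """
    if last_payment_activity is None:
        return True
    return now - last_payment_activity >= window
```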
Staged gate rollout: individual → 5% → 10% → 25% → 50% → 100%. Currently disabled after false positives.
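A staged gate like this is often built on deterministic bucketing, so that an account admitted at 5% stays admitted at 10% and beyond. The sketch below assumes that approach; the hashing scheme, the allowlist used for the "individual" stage, and all names are illustrative rather than the actual gate.

```python
import hashlib

# Percent stages from the rollout plan; 0 stands for the allowlist-only
# "individual" stage (an assumption about how that stage is modeled).
STAGES = [0, 5, 10, 25, 50, 100]


def in_rollout(
    account_id: str,
    percent: int,
    allowlist: frozenset = frozenset(),
) -> bool:
    """Return True if the account is inside the current rollout stage."""
    if account_id in allowlist:
        return True
    # Stable bucket in [0, 100): the same account always lands in the same
    # bucket, so raising `percent` only ever adds accounts.
    digest = hashlib.sha256(account_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent
```

Setting `percent` to 0 with an empty allowlist disables the gate for everyone, which corresponds to the currently disabled state.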
Delegate to existing subscription lifecycle workflow. Accounts that exhausted dunning get cancelled, not left in limbo.
Own endpoint (/pay-bad-debt) with guards. Separate from the regular collection path. No cross-contamination.
Entry/exit metrics for bad debt. Who entered, why, what products, how long they stayed. The instrumentation that makes everything else measurable.
Replace the wrong pipes — blind per-event handlers replaced by a single recompute engine. Mid-April → mid-May 2026.
Read all 5 stores → compute what the dunning status should be → apply the minimal diff. Idempotent by construction. Disagreement becomes structurally impossible because one engine sees everything.
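The read, compute, apply-minimal-diff shape can be illustrated with the minimal sketch below. The snapshot keys and status values are hypothetical stand-ins, since the five stores and the real status model are not named in this document.

```python
from typing import Dict

State = Dict[str, str]


def compute_desired(snapshot: dict) -> State:
    """Derive the status the account should have from a full snapshot.

    `unpaid_invoices` and `retries_exhausted` are assumed fields that
    stand in for whatever the five stores actually provide.
    """
    if not snapshot["unpaid_invoices"]:
        return {"dunning_status": "clear"}
    if snapshot["retries_exhausted"]:
        return {"dunning_status": "delinquent"}
    return {"dunning_status": "retrying"}


def recompute(snapshot: dict, current: State) -> State:
    """Apply only the fields that differ and return what was changed."""
    desired = compute_desired(snapshot)
    diff = {k: v for k, v in desired.items() if current.get(k) != v}
    current.update(diff)
    return diff
```

Because the output depends only on the snapshot and the current state, running `recompute` a second time with the same inputs yields an empty diff, which is the idempotence property the text describes.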
One handler, all events. 5 event types: PAYMENT_FAILED, MARKED_UNCOLLECTIBLE, BAD_DEBT_PAID, plus two new triggers — DRIFT_DETECTED and NINJAPANEL_BUTTON.
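As a rough sketch of "one handler, all events": every event type is treated purely as a trigger for the same account-level recompute, rather than carrying its own handling logic. The five event names come from the text; the function signature and the `recompute_account` callback are assumptions.

```python
from enum import Enum
from typing import Callable


class EventType(Enum):
    PAYMENT_FAILED = "PAYMENT_FAILED"
    MARKED_UNCOLLECTIBLE = "MARKED_UNCOLLECTIBLE"
    BAD_DEBT_PAID = "BAD_DEBT_PAID"
    DRIFT_DETECTED = "DRIFT_DETECTED"
    NINJAPANEL_BUTTON = "NINJAPANEL_BUTTON"


def handle_event(
    event_type: str,
    account_id: str,
    recompute_account: Callable[[str], None],  # hypothetical callback
) -> EventType:
    """Validate the event type, then run the same recompute for all of them."""
    parsed = EventType(event_type)  # raises ValueError for unknown types
    # No per-event branching: the event only says "look at this account".
    recompute_account(account_id)
    return parsed
```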
Visible failures, self-healing retries. No more silent drops into subs_stuck_events. Every failure is a workflow state, not a lost event.
Support self-service — trigger a full account recompute from NinjaPanel. Replaces manual SQL investigation.
Instead of setting a bad_debt flag, dunning produces payment outcome events. Entitlement consequences scale with severity and duration. Early failures may restrict premium features while preserving core access. The blunt ban-or-don't binary is replaced with a spectrum.
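To make the spectrum concrete, the sketch below maps how long an account has been past due to an access level. The tier names and day thresholds are invented for illustration and are not the planned policy.

```python
def entitlement_tier(days_past_due: int) -> str:
    """Map time past due to an access level (all thresholds are assumed)."""
    if days_past_due <= 0:
        return "full_access"
    if days_past_due < 14:
        # Early failure: restrict premium features, keep core access.
        return "premium_restricted"
    if days_past_due < 60:
        return "core_only"
    return "suspended"
```

Under these assumed thresholds, a customer whose card expired last week lands in `premium_restricted`, while one who has not paid in six months lands in `suspended`, so the two no longer receive the same treatment.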
Dunning owns the off-session long tail — what happens when payment fails and the user isn't there. The recompute engine gave it complete visibility. The overhaul connects that visibility to entitlement consequences through payments-driven entitlements.