Dunning flow: Invoice Created → Charge Attempted → Retry Loop → Retries Exhausted → Consequences

A failed charge enters the retry loop. Each retry either succeeds (resolving the invoice) or fails again. When retries are exhausted, dunning escalates to account-level consequences. Bad debt from upgrades grew from 3/day (late 2024) to 48/day (November peak) and sits at 39/day today: roughly 10,800 accounts per year incorrectly placed in bad debt from upgrades.
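The loop can be sketched as a small state machine. This is a minimal Python sketch under stated assumptions: `Invoice`, `run_dunning`, the `charge` callback, and the default budget of three retries are hypothetical names and values, and real retries are spaced out over days rather than run back to back.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class InvoiceState(Enum):
    OPEN = "open"
    PAID = "paid"
    ESCALATED = "escalated"  # retries exhausted: hand off to dunning consequences


@dataclass
class Invoice:
    invoice_id: str
    state: InvoiceState = InvoiceState.OPEN
    attempts: List[bool] = field(default_factory=list)  # outcome of each charge attempt


def run_dunning(
    invoice: Invoice,
    charge: Callable[[str], bool],
    max_retries: int = 3,  # hypothetical retry budget
) -> InvoiceState:
    """Initial charge plus up to max_retries retries; escalate if every attempt fails."""
    for _ in range(1 + max_retries):
        succeeded = charge(invoice.invoice_id)
        invoice.attempts.append(succeeded)
        if succeeded:
            invoice.state = InvoiceState.PAID  # any success resolves the invoice
            return invoice.state
    invoice.state = InvoiceState.ESCALATED
    return invoice.state
```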

Blunt Consequences

Dunning's escalation produces binary outcomes: bad_debt flag or not, ban or not. A customer whose card expired last week and one who hasn't paid in six months receive the same treatment.

Disconnected from Entitlements

When dunning flags an account, entitlement revocation is a separate process. Dunning decides an account is delinquent; whether that actually changes what the customer can use is someone else's problem.

Synchronous Seam

The A3→A1 HTTP call is synchronous. If Account Management is down or slow, dunning's escalation path is blocked.

No Failure Distinction

An expired card (permanent) gets the same retry cadence as insufficient funds (transient). The recompute engine can see the difference, but the retry logic doesn't use it.
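A minimal sketch of a retry policy that uses the permanent/transient distinction, assuming decline reasons arrive as string codes. The code strings, the `classify` and `retry_schedule` names, and the 1/3/5/7-day cadence are illustrative assumptions, not the current retry configuration.

```python
from datetime import timedelta
from enum import Enum
from typing import Dict, List


class FailureKind(Enum):
    PERMANENT = "permanent"  # e.g. expired card: retrying the same card cannot succeed
    TRANSIENT = "transient"  # e.g. insufficient funds: a later retry may succeed


# Hypothetical decline codes; a real mapping would cover the processor's full code list.
DECLINE_KINDS: Dict[str, FailureKind] = {
    "expired_card": FailureKind.PERMANENT,
    "insufficient_funds": FailureKind.TRANSIENT,
}

# Hypothetical cadence for transient failures: retry 1, 3, 5 and 7 days after the failure.
TRANSIENT_CADENCE_DAYS = (1, 3, 5, 7)


def classify(decline_code: str) -> FailureKind:
    # Unknown codes fall back to TRANSIENT, preserving today's retry behaviour.
    return DECLINE_KINDS.get(decline_code, FailureKind.TRANSIENT)


def retry_schedule(decline_code: str) -> List[timedelta]:
    """Delays (from the failed charge) at which to retry, based on the failure kind."""
    if classify(decline_code) is FailureKind.PERMANENT:
        return []  # no automatic retries: the same card will keep failing
    return [timedelta(days=d) for d in TRANSIENT_CADENCE_DAYS]
```

Defaulting unknown codes to transient means anything not yet classified keeps the existing cadence, so the change only removes retries that cannot succeed.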

Fix the leaking joints: the problems solvable without rearchitecting. In progress, targeting March 28, 2026.

Fix Billing History

Distinguish "Resolved" from "Paid" correctly, so customers and support can see what actually happened.

Protect Payment Flow

Time-based guard plus stuck-in-lifting gauge. Prevents drift remediation from clobbering in-flight payments.
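One way to express the guard and the gauge, sketched under assumptions: "lifting" is read here as the state an account is in while a restriction is being removed after payment, and the 30-minute guard window, the 1-hour stuck threshold, and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

GUARD_WINDOW = timedelta(minutes=30)  # hypothetical: leave fresh payments alone this long
STUCK_THRESHOLD = timedelta(hours=1)  # hypothetical: lifting longer than this counts as stuck


@dataclass
class AccountSnapshot:
    account_id: str
    last_payment_started_at: Optional[datetime]
    lifting_since: Optional[datetime]  # assumption: set while a restriction is being lifted


def remediation_allowed(acct: AccountSnapshot, now: datetime) -> bool:
    """Time-based guard: skip drift remediation while a payment may still be in flight."""
    if acct.last_payment_started_at is None:
        return True
    return now - acct.last_payment_started_at >= GUARD_WINDOW


def stuck_in_lifting(accounts: Iterable[AccountSnapshot], now: datetime) -> int:
    """Gauge value: number of accounts in the lifting state longer than the threshold."""
    return sum(
        1
        for a in accounts
        if a.lifting_since is not None and now - a.lifting_since > STUCK_THRESHOLD
    )
```

The guard keeps remediation from racing a payment, and the gauge makes the opposite failure visible: a payment flow that started lifting and never finished.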

Enable Full Remediation

Staged gate rollout: individual → 5% → 10% → 25% → 50% → 100%. Currently disabled after false positives.
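Staged percentage gates are commonly implemented by hashing a stable ID into 100 buckets. The sketch below assumes that approach, with an allowlist standing in for the individual stage; `bucket` and `remediation_enabled` are hypothetical names, and this is not necessarily how the existing gate works.

```python
import hashlib
from typing import FrozenSet


def bucket(account_id: str) -> int:
    """Stable bucket in [0, 100) derived from the account ID."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def remediation_enabled(account_id: str, percent: int, allowlist: FrozenSet[str]) -> bool:
    """Individual stage via allowlist; percentage stages via the hash bucket."""
    if account_id in allowlist:
        return True
    return bucket(account_id) < percent  # percent=0 disables, percent=100 enables all
```

Because an account's bucket never changes, every account enabled at 5% stays enabled at 10%, 25%, and beyond, so each stage strictly widens the previous one and can be rolled back by lowering `percent`.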

Cancel What Should Be Cancelled

Delegate to existing subscription lifecycle workflow. Accounts that exhausted dunning get cancelled, not left in limbo.

Isolate Bad Debt Payment

Own endpoint (/pay-bad-debt) with guards. Separate from the regular collection path. No cross-contamination.
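A sketch of what "with guards" could look like, assuming two hypothetical preconditions: the account must actually be in bad debt, and the amount must match the outstanding bad-debt balance. The `Account` fields and the `pay_bad_debt` name are illustrative; the handler returns `BAD_DEBT_PAID` so the flag can be cleared by whatever consumes that event.

```python
from dataclasses import dataclass


class GuardError(Exception):
    """Raised when a /pay-bad-debt request fails a precondition."""


@dataclass
class Account:
    account_id: str
    bad_debt: bool
    bad_debt_balance_cents: int


def pay_bad_debt(account: Account, amount_cents: int) -> str:
    """Handler body for a dedicated bad-debt payment path (hypothetical guards)."""
    # Guard 1: this path only serves accounts that are actually in bad debt.
    if not account.bad_debt:
        raise GuardError("account is not in bad debt; use the regular collection path")
    # Guard 2: accept only a payment matching the outstanding bad-debt balance.
    if amount_cents != account.bad_debt_balance_cents:
        raise GuardError("amount does not match outstanding bad-debt balance")
    # The actual charge would happen here; this sketch only clears the balance.
    account.bad_debt_balance_cents = 0
    # The bad_debt flag itself is left for the consumer of this event to clear.
    return "BAD_DEBT_PAID"
```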

Add Counters

Entry/exit metrics for bad debt. Who entered, why, what products, how long they stayed. The instrumentation that makes everything else measurable.
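A minimal in-process sketch of entry/exit instrumentation, assuming each bad-debt entry carries a reason label and a product label. `BadDebtMetrics` and its methods are hypothetical; a production version would export these values to the team's metrics system rather than hold them in memory.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class BadDebtMetrics:
    entries: Counter = field(default_factory=Counter)  # entries per (reason, product)
    exits: Counter = field(default_factory=Counter)  # exits per (reason, product)
    durations_s: List[float] = field(default_factory=list)  # seconds spent in bad debt
    # Who is currently in bad debt: account_id -> (entered_at, reason, product)
    open_since: Dict[str, Tuple[datetime, str, str]] = field(default_factory=dict)

    def record_entry(self, account_id: str, reason: str, product: str, at: datetime) -> None:
        self.entries[(reason, product)] += 1
        self.open_since[account_id] = (at, reason, product)

    def record_exit(self, account_id: str, at: datetime) -> None:
        opened = self.open_since.pop(account_id, None)
        if opened is None:
            # Exit without a recorded entry, e.g. the account entered before instrumentation.
            self.exits[("unknown", "unknown")] += 1
            return
        entered_at, reason, product = opened
        self.exits[(reason, product)] += 1
        self.durations_s.append((at - entered_at).total_seconds())
```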

Replace the wrong pipes: blind per-event handlers give way to a single recompute engine. Mid-April to mid-May 2026.

Recompute Engine

Read all five stores → compute what the dunning status should be → apply the minimal diff. Idempotent by construction. The stores can no longer disagree, because a single engine reads all of them and writes one answer to each.
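A minimal sketch of the read, compute, diff, apply cycle, assuming each store can be modelled as a map from account ID to dunning status and that the target status derives from two hypothetical facts (counts of failed and uncollectible invoices). Store names, status strings, and the derivation rules are illustrative.

```python
from typing import Dict, List, Tuple

Store = Dict[str, str]  # account_id -> dunning status as this store records it
Write = Tuple[str, str, str]  # (store name, old status, new status)


def desired_status(facts: Dict[str, int]) -> str:
    """Derive the status once, from source facts (hypothetical rules)."""
    if facts["uncollectible_invoices"] > 0:
        return "bad_debt"
    if facts["failed_invoices"] > 0:
        return "past_due"
    return "ok"


def recompute(account_id: str, facts: Dict[str, int], stores: Dict[str, Store]) -> List[Write]:
    """Read every store, compute the target once, and write only where a store disagrees."""
    target = desired_status(facts)
    diff: List[Write] = []
    for name, store in stores.items():
        current = store.get(account_id, "ok")  # a missing record is treated as "ok"
        if current != target:
            diff.append((name, current, target))
    for name, _old, new in diff:
        stores[name][account_id] = new  # apply the minimal diff
    return diff
```

Running `recompute` twice with the same facts returns an empty diff the second time, which is the idempotence property the design relies on.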

Unified Handler

One handler for all events. Five event types: PAYMENT_FAILED, MARKED_UNCOLLECTIBLE, BAD_DEBT_PAID, plus two new triggers, DRIFT_DETECTED and NINJAPANEL_BUTTON.
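A sketch of a single entry point for all five event types. The event names come from the list above; the `handle` signature, the audit list, and the injected `recompute` callable are hypothetical. The event type is recorded for tracing but does not select a code path: every event triggers the same full recompute.

```python
from enum import Enum
from typing import Callable, List


class DunningEvent(Enum):
    PAYMENT_FAILED = "payment_failed"
    MARKED_UNCOLLECTIBLE = "marked_uncollectible"
    BAD_DEBT_PAID = "bad_debt_paid"
    DRIFT_DETECTED = "drift_detected"  # new trigger
    NINJAPANEL_BUTTON = "ninjapanel_button"  # new trigger


def handle(
    event_type: str,
    account_id: str,
    recompute: Callable[[str], None],
    audit: List[str],
) -> DunningEvent:
    """One handler for every event type; unknown types fail loudly instead of being dropped."""
    event = DunningEvent[event_type]  # raises KeyError for an unrecognised event type
    audit.append(f"{account_id}:{event.name}")  # record why the recompute ran
    recompute(account_id)  # same full recompute regardless of which event arrived
    return event
```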

Temporal Orchestration

Visible failures, self-healing retries. No more silent drops into subs_stuck_events. Every failure is a workflow state, not a lost event.

NinjaPanel Recompute

Support self-service: trigger a full account recompute from NinjaPanel, replacing manual SQL investigation.

Blunt Flags → Payment Outcomes → Graduated Consequences

Instead of setting a bad_debt flag, dunning produces payment outcome events. Entitlement consequences scale with severity and duration. Early failures may restrict premium features while preserving core access. The blunt ban-or-not binary becomes a spectrum.
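A sketch of how graduated consequences might be computed, assuming three hypothetical outcome kinds and four access levels. The 7/30/60-day thresholds are placeholders, not agreed policy. Under these assumptions, the two customers from the Blunt Consequences example no longer share a fate: a card that expired last week yields core-only access, while six months of non-payment yields suspension.

```python
from enum import Enum, IntEnum
from typing import Optional


class Outcome(Enum):
    TRANSIENT_FAILURE = "transient_failure"  # e.g. insufficient funds
    PERMANENT_FAILURE = "permanent_failure"  # e.g. expired card
    UNCOLLECTIBLE = "uncollectible"


class Access(IntEnum):
    FULL = 0
    CORE_ONLY = 1  # premium features restricted, core access preserved
    READ_ONLY = 2
    SUSPENDED = 3


def entitlement_for(outcome: Optional[Outcome], days_delinquent: int) -> Access:
    """Map the latest payment outcome and its age to an access level (placeholder thresholds)."""
    if outcome is None:
        return Access.FULL  # nothing outstanding
    if outcome is Outcome.UNCOLLECTIBLE or days_delinquent > 60:
        return Access.SUSPENDED
    if days_delinquent > 30:
        return Access.READ_ONLY
    if outcome is Outcome.PERMANENT_FAILURE or days_delinquent > 7:
        return Access.CORE_ONLY
    return Access.FULL  # early transient failure: no restriction yet
```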