Tiered Billing Automation Service (Windows Service + Dashboard)
A Clean Architecture .NET service that moved billing from crib-count logic to tier-based subscriptions, automatically tracking SSO/API participation and generating auditable billing history.
Executive Summary
I built a tiered billing automation service that continuously reconciles account capabilities (SSO, API access, inventory mode) into a billing tier and writes an append-only billing history for auditability. It runs as a 24/7 Windows Service, schedules scans via Hangfire, and serves a lightweight internal dashboard for finance/ops to review changes.
What problem this solved
The legacy billing model depended heavily on “billable cribs” and didn’t reliably reflect new monetized features like:
- SSO participation
- API enablement
- Tier-based pricing (e.g., Lite vs Plus vs Pro)
Accounting needed something that:
- Accurately models the hierarchy (Distributors → End Users → Sites/DBs)
- Continuously detects changes
- Records changes immutably so billing disputes can be resolved with evidence
- Does not sleep or rely on an IIS pool staying warm
My role
- Designed the architecture (layering + dependency boundaries)
- Implemented the core “assemble + enrich” domain model pipeline
- Implemented polling + background execution and change tracking
- Implemented correlation + structured logs to support root cause analysis
- Integrated external “capability signals” (SSO and API participation)
Constraints
- Must run unattended and recover gracefully
- Must be safe to run repeatedly (idempotent scanning)
- Must avoid expensive DB re-fetching when users navigate the dashboard
- Must surface enough diagnostics to debug production issues quickly
Domain Model
The service builds a single enriched in-memory model that represents the business hierarchy (Organizations, Distributors, End Users, Databases) and attaches billing-relevant state to each node. This becomes the “truth” for deciding tier and detecting changes.
Why modeling mattered
Tier logic is only easy when your model is accurate. The system has multiple relationship tables and mixed concepts:
- Distributor vs “Direct customer” behavior differences
- End users that exist under a distributor vs standalone
- Multiple sites/DBs per end user
- A mix of feature flags and external participation signals
How the model is constructed
A dedicated orchestration service loads raw DTOs and turns them into domain entities:
- Pull “flat” DTOs for Organizations, Distributors, EndUsers, Databases
- Pull BillingHistory DTOs and group them by OrgId and DbId
- Build dictionaries/lookups once to avoid repeated traversals
- Hydrate Database entities with current + historical billing events
- Build EndUser entities with their database sets + billing history
- Build Distributor entities and attach end users
- Compute derived summaries like “arcturus composition” (counts by type; see the sketch after this list)
- Cache intermediate products to avoid rebuilding on every UI navigation
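As an illustration of those derived summaries, the “arcturus composition” can be computed in one pass over a hydrated end user (a sketch; ArcturusType is the tier field shown in the entity diagram below):
// Sketch: one-pass composition summary (counts of databases by billing tier)
var arcturusComposition = endUser.Databases
    .GroupBy(db => db.ArcturusType)
    .ToDictionary(group => group.Key, group => group.Count());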
The entity hierarchy follows a strict inheritance pattern:
// Domain layer entity relationships
Organization (abstract)
├── OrganizationID, Name
├── LastBillDate / NextBillDate
└── SSOOrganizationID / SSOEnabled
Distributor : Organization
├── Direct (bool)
├── EndUsers (collection)
└── Bills (history)
EndUser : Organization
├── ParentOrganizationID
├── Databases (collection)
└── Bills (history)
Database
├── DbId, Company, DbSchema
├── ArcturusType (billing tier)
└── SsoAccess / RestApiAccess
The key performance optimization is pre-building lookup dictionaries to avoid O(n²) traversals:
// Build lookup dictionaries for efficient hydration
var billingHistoryByDbId = allBillingHistoryDtos
.Where(bill => bill.DbId != 0)
.OrderByDescending(dto => dto.RecordID)
.GroupBy(entry => entry.DbId)
.ToDictionary(group => group.Key, group => group.ToList());
// Then hydrate entities efficiently via dictionary lookup
var enrichedDb = new Database(
dbId: db.DbId,
bill: billingHistoryByDbId.TryGetValue(db.DbId, out var history)
? history.First()
: null,
billHistory: history ?? new()
);
Cache and rebuild strategy
- The service caches the “assembled” list and key intermediate lists in-memory
- Each scheduled scan can request rebuildCache=true to force a fresh rebuild
- UI navigation uses cached results, so it does not trigger a full DB pull (see the sketch after this list)
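A plausible shape for that accessor, assuming an IMemoryCache-backed cache and a hypothetical AssembleAndEnrichAsync method (not the project’s exact code):
// Sketch: cached accessor with an explicit rebuild escape hatch
public async Task<IReadOnlyList<Distributor>> GetAssembledModelAsync(
    bool rebuildCache = false,
    CancellationToken ct = default)
{
    if (!rebuildCache &&
        _cache.TryGetValue("assembled-model", out IReadOnlyList<Distributor>? cached))
    {
        return cached!; // UI navigation takes this path; no DB round-trip
    }

    var assembled = await AssembleAndEnrichAsync(ct); // full DB pull + hydration
    _cache.Set("assembled-model", assembled);
    return assembled;
}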
Architecture & Stack
This is a Clean Architecture solution with a UI host that runs as a Windows Service, schedules periodic jobs via Hangfire, and uses MediatR to keep orchestration clean and testable.
High-level layers
- Domain: entities such as Distributor, EndUser, Database, billing event structures
- Application: use cases and orchestration (query/command handlers), task queue, polling entry points
- Infrastructure: external integrations (SSO/API signals), caching, persistence repositories, Hangfire wiring, structured logging
- Presentation/UI Host: internal dashboard + background workers
The layer dependencies enforce strict boundaries:
┌─────────────────────────────────────────────────────┐
│ Presentation.UI (Blazor) │
│ MediatR, Fluent UI, StreamRendering │
└───────────────────────┬─────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Infrastructure │
│ EF Core, Hangfire, WorkOS, Serilog │
└───────────────────────┬─────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Application │
│ MediatR, Use Cases, Interfaces, Services │
└───────────────────────┬─────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Domain │
│ Entities, DTOs (zero dependencies) │
└─────────────────────────────────────────────────────┘
Key libraries and why they’re here
- Hangfire: reliable recurring job scheduler + dashboard for scan visibility
- MediatR: keeps “scan workflows” decoupled from controllers/workers
- WorkOS SDK: pulls SSO connection participation state
- EF Core: repository implementations / persistence access patterns
- Fluent UI: internal dashboard UX and data grid patterns
Windows Service hosting (why it matters)
The host is built and deployed as a self-contained executable configured for Windows Service hosting. The goal is “always-on” execution without relying on a web server staying warm.
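A minimal sketch of that host wiring, assuming the Microsoft.Extensions.Hosting.WindowsServices package and the ScanWorker hosted service sketched later in this document (the service name is hypothetical):
// Sketch: generic host configured for Windows Service hosting
var builder = Host.CreateApplicationBuilder(args);

// Run under the Service Control Manager when installed as a service;
// falls back to console hosting during local development
builder.Services.AddWindowsService(options =>
    options.ServiceName = "TieredBillingAutomation");

builder.Services.AddHostedService<ScanWorker>();

builder.Build().Run();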
Scanning & Job Execution
A scheduled job enqueues work; a background worker consumes tasks and executes scan commands. This prevents the scheduler thread from doing heavy work directly and keeps execution resilient.
Execution pipeline (conceptual)
- Hangfire triggers a scan on a cron schedule (example: hourly; registration sketched after this list)
- The scan endpoint calls a PollingService that enqueues a task
- A BackgroundService worker dequeues tasks and executes the correct workflow:
  - “API scan” → updates API-driven tier changes
  - “SSO scan” → updates SSO participation and org-level flags
- The worker uses correlation context to produce a cohesive trace per scan
- The UI is updated via a change tracking service (tree/list diffs)
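The recurring registration might look like the following sketch (the job IDs and PollingService method names are assumptions):
// Sketch: register hourly recurring scans with Hangfire
RecurringJob.AddOrUpdate<PollingService>(
    "billing-api-scan",
    svc => svc.EnqueueApiScan(),
    Cron.Hourly());

RecurringJob.AddOrUpdate<PollingService>(
    "billing-sso-scan",
    svc => svc.EnqueueSsoScan(),
    Cron.Hourly());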
Why a queue exists inside the service
Even with Hangfire, I intentionally separated “trigger” from “work”:
- Scheduling should be fast and predictable
- Work can overlap or be throttled
- A queue creates a single place to apply concurrency rules later (see the sketch below)
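One common shape for such an in-process queue uses System.Threading.Channels; the following is a sketch under that assumption (the task record and command names are hypothetical):
// Sketch: channel-backed task queue plus a BackgroundService consumer
public sealed record ScanTask(string Kind); // "api" or "sso"
public sealed record ApiScanCommand : IRequest;
public sealed record SsoScanCommand : IRequest;

public sealed class ScanTaskQueue
{
    private readonly Channel<ScanTask> _channel = Channel.CreateUnbounded<ScanTask>();

    public ValueTask EnqueueAsync(ScanTask task, CancellationToken ct = default)
        => _channel.Writer.WriteAsync(task, ct);

    public IAsyncEnumerable<ScanTask> DequeueAllAsync(CancellationToken ct)
        => _channel.Reader.ReadAllAsync(ct);
}

public sealed class ScanWorker : BackgroundService
{
    private readonly ScanTaskQueue _queue;
    private readonly IMediator _mediator;

    public ScanWorker(ScanTaskQueue queue, IMediator mediator)
        => (_queue, _mediator) = (queue, mediator);

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Draining one task at a time gives a single choke point where
        // concurrency/throttling rules can be applied later
        await foreach (var task in _queue.DequeueAllAsync(stoppingToken))
        {
            IRequest command = task.Kind == "sso"
                ? new SsoScanCommand()
                : new ApiScanCommand();
            await _mediator.Send(command, stoppingToken);
        }
    }
}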
The MediatR pipeline wraps every request with correlation and logging:
// Pipeline behavior for cross-cutting concerns
// (_corr and _log stand in for the service's own correlation/logging abstractions)
public sealed class CorrelationBehavior<TRequest, TResponse>
    : IPipelineBehavior<TRequest, TResponse>
    where TRequest : notnull
{
    private readonly ICorrelationContext _corr;
    private readonly IAppLogger _log;

    public CorrelationBehavior(ICorrelationContext corr, IAppLogger log)
        => (_corr, _log) = (corr, log);

    public async Task<TResponse> Handle(
        TRequest request,
        RequestHandlerDelegate<TResponse> next,
        CancellationToken ct)
    {
        // Continue the ambient trace if one exists; otherwise start fresh
        var correlationId = Activity.Current?.TraceId.ToString()
            ?? Guid.NewGuid().ToString();
        var opName = $"request.{typeof(TRequest).Name}";
        using var op = _corr.StartOperation(opName, correlationId);
        _log.Info($"{opName}.start");
        var response = await next();
        _log.Info($"{opName}.end");
        return response;
    }
}
Tier Determination Logic
Tier is derived from local configuration/state (inventory mode, billable features) plus external participation signals (SSO connections, API clients). Changes write billing events to an append-only history.
“Local” tier signals
- Database/account types reflect inventory mode and enabled features.
- DB-level tier state is updated by stored logic and verified by scans.
External signals
- SSO: connections/participation reflect whether an organization should be considered “Pro”
- API: presence of active API clients upgrades eligible accounts to “Plus” (or higher)
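Putting the two signal classes together, the derivation might look like this sketch (the precedence rules and the enum ordering are assumptions; only the signal-to-tier mapping comes from the description above):
// Sketch: derive a tier from local state plus external participation signals
static ArcturusType DetermineTier(
    Database db, bool hasSsoConnection, bool hasActiveApiClient)
{
    // SSO participation marks the account as Pro
    if (hasSsoConnection)
        return ArcturusType.Pro;

    // Active API clients upgrade eligible accounts to at least Plus
    if (hasActiveApiClient && db.ArcturusType < ArcturusType.Plus)
        return ArcturusType.Plus;

    // Otherwise the locally configured tier stands
    return db.ArcturusType;
}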
Event history as an audit log
Instead of overwriting meaning, changes append a new billing history record so you can answer:
- “When did this account become Pro?”
- “What triggered this tier change?”
- “What did the state look like before and after?”
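In code, that means a detected change inserts a new record rather than mutating the old one; a sketch (the repository and any field names beyond the entity diagram are assumptions):
// Sketch: append a billing event instead of overwriting state
if (newTier != db.ArcturusType)
{
    await _billingHistoryRepository.AddAsync(new BillingHistoryDto
    {
        DbId = db.DbId,
        PreviousType = db.ArcturusType,      // state before
        NewType = newTier,                   // state after
        Reason = "SSO connection detected",  // what triggered the change
        RecordedAtUtc = DateTime.UtcNow      // when the tier changed
    }, ct);
}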
Observability: Correlation + Structured Logging
Every scan/work item is wrapped in a correlation scope and emits structured JSON logs so failures can be traced end-to-end.
Correlation strategy
- Uses an operation scope that starts an Activity
- Supports continuing correlation IDs across queued work items
- Makes it possible to trace:
- Hangfire trigger → enqueue → worker → use-case execution
The Hangfire filter preserves correlation across background job boundaries:
// Preserve trace context across job execution
public class CorrelationJobFilter : IClientFilter, IServerFilter
{
    // Capture correlation when enqueuing
    public void OnCreating(CreatingContext ctx)
    {
        var correlationId = Activity.Current?.TraceId.ToString()
            ?? Guid.NewGuid().ToString();
        ctx.SetJobParameter("CorrelationId", correlationId);
    }

    public void OnCreated(CreatedContext ctx) { } // no-op; required by IClientFilter

    // Restore correlation when executing
    public void OnPerforming(PerformingContext ctx)
    {
        var correlationId = ctx.GetJobParameter<string>("CorrelationId");
        ctx.Items["CorrelationId"] = correlationId;
    }

    public void OnPerformed(PerformedContext ctx) { } // no-op; required by IServerFilter
}
Log strategy
Logs are emitted as JSON payloads suitable for ingestion and search (including fields like correlationId, elapsed time, route metadata where applicable). This is designed to support “find the scan that broke billing for account X” in minutes, not hours.
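For illustration, a Serilog configuration that emits this kind of payload might look like the following (the compact JSON formatter and file sink are assumptions; the service may route logs elsewhere):
// Sketch: structured JSON logging via Serilog's compact JSON formatter
Log.Logger = new LoggerConfiguration()
    .Enrich.FromLogContext() // carries correlationId into each event
    .WriteTo.File(
        new CompactJsonFormatter(),
        "logs/billing-scan-.json",
        rollingInterval: RollingInterval.Day)
    .CreateLogger();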
Testing & Hardening Notes
The architecture is intentionally “test-shaped” (use cases + services behind interfaces). Remaining work is mainly around expanding unit/integration coverage and surfacing more runtime metrics.
What I would harden next (if productizing further)
- Unit tests around tier calculation and “change detection” rules
- Integration tests for:
- WorkOS connection fetch behavior
- API client lookup behavior
- Repository queries and expected DTO shapes
- Configurable schedules (scan frequency per environment)
- Metrics:
- scan duration percentiles
- counts of tier changes per scan
- cache hit/miss rates
- Authorization in the dashboard if it ever becomes multi-tenant
Technologies Used
- .NET 8
- Clean Architecture (Domain/Application/Infrastructure/Host)
- Hangfire
- MediatR
- EF Core
- WorkOS integration
- Blazor + Fluent UI
- Structured logging + correlation IDs