Tiered Billing Automation Service (Windows Service + Dashboard)
A Clean Architecture .NET service that moved billing from crib-count logic to tier-based subscriptions, automatically tracking SSO/API participation and generating auditable billing history.
Executive Summary
I built a tiered billing automation service that continuously reconciles account capabilities (SSO, API access, inventory mode) into a billing tier and writes an append-only billing history for auditability. It runs as a 24/7 Windows Service, schedules scans via Hangfire, and serves a lightweight internal dashboard for finance/ops to review changes.
What problem this solved
The legacy billing model depended heavily on “billable cribs” and didn’t reliably reflect new monetized features like:
- SSO participation
- API enablement
- Tier-based pricing (e.g., Lite vs Plus vs Pro)
Accounting needed something that:
- Accurately models the hierarchy (Distributors → End Users → Sites/DBs)
- Continuously detects changes
- Records changes immutably so billing disputes can be resolved with evidence
- Does not sleep or rely on an IIS pool staying warm
My role
- Designed the architecture (layering + dependency boundaries)
- Implemented the core “assemble + enrich” domain model pipeline
- Implemented polling + background execution and change tracking
- Implemented correlation + structured logs to support root cause analysis
- Integrated external “capability signals” (SSO and API participation)
Constraints
- Must run unattended and recover gracefully
- Must be safe to run repeatedly (idempotent scanning)
- Must avoid expensive DB re-fetching when users navigate the dashboard
- Must surface enough diagnostics to debug production issues quickly
Domain Model
The service builds a single enriched in-memory model that represents the business hierarchy (Organizations, Distributors, End Users, Databases) and attaches billing-relevant state to each node. This becomes the “truth” for deciding tier and detecting changes.
Why modeling mattered
Tier logic is only easy when your model is accurate. The system has multiple relationship tables and mixed concepts:
- Distributor vs “Direct customer” behavior differences
- End users that exist under a distributor vs standalone
- Multiple sites/DBs per end user
- A mix of feature flags and external participation signals
How the model is constructed
A dedicated orchestration service loads raw DTOs and turns them into domain entities:
- Pull “flat” DTOs for Organizations, Distributors, EndUsers, Databases
- Pull BillingHistory DTOs and group them by OrgId and DbId
- Build dictionaries/lookups once to avoid repeated traversals
- Hydrate Database entities with current + historical billing events
- Build EndUser entities with their database sets + billing history
- Build Distributor entities and attach end users
- Compute derived summaries like “arcturus composition” (counts by type; see the sketch after this list)
- Cache intermediate products to avoid rebuilding on every UI navigation
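As an illustration of those derived summaries, the “arcturus composition” can be computed in one pass over a hydrated end user (a sketch; ArcturusType is the tier field shown in the entity diagram below):
// Sketch: one-pass composition summary (counts of databases by billing tier)
var arcturusComposition = endUser.Databases
    .GroupBy(db => db.ArcturusType)
    .ToDictionary(group => group.Key, group => group.Count());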
The entity hierarchy follows a strict inheritance pattern:
// Domain layer entity relationships
Organization (abstract)
├── OrganizationID, Name
├── LastBillDate / NextBillDate
└── SSOOrganizationID / SSOEnabled
Distributor : Organization
├── Direct (bool)
├── EndUsers (collection)
└── Bills (history)
EndUser : Organization
├── ParentOrganizationID
├── Databases (collection)
└── Bills (history)
Database
├── DbId, Company, DbSchema
├── ArcturusType (billing tier)
└── SsoAccess / RestApiAccess
The key performance optimization is pre-building lookup dictionaries to avoid O(n²) traversals:
// Build lookup dictionaries for efficient hydration
var billingHistoryByDbId = allBillingHistoryDtos
.Where(bill => bill.DbId != 0)
.OrderByDescending(dto => dto.RecordID)
.GroupBy(entry => entry.DbId)
.ToDictionary(group => group.Key, group => group.ToList());
// Then hydrate entities efficiently via dictionary lookup
var enrichedDb = new Database(
dbId: db.DbId,
bill: billingHistoryByDbId.TryGetValue(db.DbId, out var history)
? history.First()
: null,
billHistory: history ?? new()
);
Cache and rebuild strategy
- The service caches the “assembled” list and key intermediate lists in-memory
- Each scheduled scan can request rebuildCache=true to force a fresh rebuild
- UI navigation uses cached results, so it does not trigger a full DB pull (see the sketch after this list)
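A plausible shape for that accessor, assuming an IMemoryCache-backed cache and a hypothetical AssembleAndEnrichAsync method (not the project’s exact code):
// Sketch: cached accessor with an explicit rebuild escape hatch
public async Task<IReadOnlyList<Distributor>> GetAssembledModelAsync(
    bool rebuildCache = false,
    CancellationToken ct = default)
{
    if (!rebuildCache &&
        _cache.TryGetValue("assembled-model", out IReadOnlyList<Distributor>? cached))
    {
        return cached!; // UI navigation takes this path; no DB round-trip
    }

    var assembled = await AssembleAndEnrichAsync(ct); // full DB pull + hydration
    _cache.Set("assembled-model", assembled);
    return assembled;
}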
Architecture & Stack
This is a Clean Architecture solution with a UI host that runs as a Windows Service, schedules periodic jobs via Hangfire, and uses MediatR to keep orchestration clean and testable.
High-level layers
- Domain: entities such as Distributor, EndUser, Database, billing event structures
- Application: use cases and orchestration (query/command handlers), task queue, polling entry points
- Infrastructure: external integrations (SSO/API signals), caching, persistence repositories, Hangfire wiring, structured logging
- Presentation/UI Host: internal dashboard + background workers
The layer dependencies enforce strict boundaries:
┌─────────────────────────────────────────────────────┐
│ Presentation.UI (Blazor) │
│ MediatR, Fluent UI, StreamRendering │
└───────────────────────┬─────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Infrastructure │
│ EF Core, Hangfire, WorkOS, Serilog │
└───────────────────────┬─────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Application │
│ MediatR, Use Cases, Interfaces, Services │
└───────────────────────┬─────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Domain │
│ Entities, DTOs (zero dependencies) │
└─────────────────────────────────────────────────────┘
Key libraries and why they’re here
- Hangfire: reliable recurring job scheduler + dashboard for scan visibility
- MediatR: keeps “scan workflows” decoupled from controllers/workers
- WorkOS SDK: pulls SSO connection participation state
- EF Core: repository implementations / persistence access patterns
- Fluent UI: internal dashboard UX and data grid patterns
Windows Service hosting (why it matters)
The host is built and deployed as a self-contained executable configured for Windows Service hosting. The goal is “always-on” execution without relying on a web server staying warm.
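A minimal sketch of that host wiring, assuming the Microsoft.Extensions.Hosting.WindowsServices package and the ScanWorker hosted service sketched later in this document (the service name is hypothetical):
// Sketch: generic host configured for Windows Service hosting
var builder = Host.CreateApplicationBuilder(args);

// Run under the Service Control Manager when installed as a service;
// falls back to console hosting during local development
builder.Services.AddWindowsService(options =>
    options.ServiceName = "TieredBillingAutomation");

builder.Services.AddHostedService<ScanWorker>();

builder.Build().Run();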
Scanning & Job Execution
A scheduled job enqueues work; a background worker consumes tasks and executes scan commands. This prevents the scheduler thread from doing heavy work directly and keeps execution resilient.
Execution pipeline (conceptual)
- Hangfire triggers a scan on a cron schedule (example: hourly; registration sketched after this list)
- The scan endpoint calls a PollingService that enqueues a task
- A BackgroundService worker dequeues tasks and executes the correct workflow:
  - “API scan” → updates API-driven tier changes
  - “SSO scan” → updates SSO participation and org-level flags
- The worker uses correlation context to produce a cohesive trace per scan
- The UI is updated via a change tracking service (tree/list diffs)
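The recurring registration might look like the following sketch (the job IDs and PollingService method names are assumptions):
// Sketch: register hourly recurring scans with Hangfire
RecurringJob.AddOrUpdate<PollingService>(
    "billing-api-scan",
    svc => svc.EnqueueApiScan(),
    Cron.Hourly());

RecurringJob.AddOrUpdate<PollingService>(
    "billing-sso-scan",
    svc => svc.EnqueueSsoScan(),
    Cron.Hourly());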
Why a queue exists inside the service
Even with Hangfire, I intentionally separated “trigger” from “work”:
- Scheduling should be fast and predictable
- Work can overlap or be throttled
- A queue creates a single place to apply concurrency rules later (see the sketch below)
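One common shape for such an in-process queue uses System.Threading.Channels; the following is a sketch under that assumption (the task record and command names are hypothetical):
// Sketch: channel-backed task queue plus a BackgroundService consumer
public sealed record ScanTask(string Kind); // "api" or "sso"
public sealed record ApiScanCommand : IRequest;
public sealed record SsoScanCommand : IRequest;

public sealed class ScanTaskQueue
{
    private readonly Channel<ScanTask> _channel = Channel.CreateUnbounded<ScanTask>();

    public ValueTask EnqueueAsync(ScanTask task, CancellationToken ct = default)
        => _channel.Writer.WriteAsync(task, ct);

    public IAsyncEnumerable<ScanTask> DequeueAllAsync(CancellationToken ct)
        => _channel.Reader.ReadAllAsync(ct);
}

public sealed class ScanWorker : BackgroundService
{
    private readonly ScanTaskQueue _queue;
    private readonly IMediator _mediator;

    public ScanWorker(ScanTaskQueue queue, IMediator mediator)
        => (_queue, _mediator) = (queue, mediator);

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Draining one task at a time gives a single choke point where
        // concurrency/throttling rules can be applied later
        await foreach (var task in _queue.DequeueAllAsync(stoppingToken))
        {
            IRequest command = task.Kind == "sso"
                ? new SsoScanCommand()
                : new ApiScanCommand();
            await _mediator.Send(command, stoppingToken);
        }
    }
}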
The MediatR pipeline wraps every request with correlation and logging:
// Pipeline behavior for cross-cutting concerns
// (_corr and _log stand in for the service's own correlation/logging abstractions)
public sealed class CorrelationBehavior<TRequest, TResponse>
    : IPipelineBehavior<TRequest, TResponse>
    where TRequest : notnull
{
    private readonly ICorrelationContext _corr;
    private readonly IAppLogger _log;

    public CorrelationBehavior(ICorrelationContext corr, IAppLogger log)
        => (_corr, _log) = (corr, log);

    public async Task<TResponse> Handle(
        TRequest request,
        RequestHandlerDelegate<TResponse> next,
        CancellationToken ct)
    {
        // Continue the ambient trace if one exists; otherwise start fresh
        var correlationId = Activity.Current?.TraceId.ToString()
            ?? Guid.NewGuid().ToString();
        var opName = $"request.{typeof(TRequest).Name}";
        using var op = _corr.StartOperation(opName, correlationId);
        _log.Info($"{opName}.start");
        var response = await next();
        _log.Info($"{opName}.end");
        return response;
    }
}
Tier Determination Logic
Tier is derived from local configuration/state (inventory mode, billable features) plus external participation signals (SSO connections, API clients). Changes write billing events to an append-only history.
“Local” tier signals
- Database/account types reflect inventory mode and enabled features.
- DB-level tier state is updated by stored logic and verified by scans.
External signals
- SSO: connections/participation reflect whether an organization should be considered “Pro”
- API: presence of active API clients upgrades eligible accounts to “Plus” (or higher)
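Putting the two signal classes together, the derivation might look like this sketch (the precedence rules and the enum ordering are assumptions; only the signal-to-tier mapping comes from the description above):
// Sketch: derive a tier from local state plus external participation signals
static ArcturusType DetermineTier(
    Database db, bool hasSsoConnection, bool hasActiveApiClient)
{
    // SSO participation marks the account as Pro
    if (hasSsoConnection)
        return ArcturusType.Pro;

    // Active API clients upgrade eligible accounts to at least Plus
    if (hasActiveApiClient && db.ArcturusType < ArcturusType.Plus)
        return ArcturusType.Plus;

    // Otherwise the locally configured tier stands
    return db.ArcturusType;
}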
Event history as an audit log
Instead of overwriting meaning, changes append a new billing history record so you can answer:
- “When did this account become Pro?”
- “What triggered this tier change?”
- “What did the state look like before and after?”
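In code, that means a detected change inserts a new record rather than mutating the old one; a sketch (the repository and any field names beyond the entity diagram are assumptions):
// Sketch: append a billing event instead of overwriting state
if (newTier != db.ArcturusType)
{
    await _billingHistoryRepository.AddAsync(new BillingHistoryDto
    {
        DbId = db.DbId,
        PreviousType = db.ArcturusType,      // state before
        NewType = newTier,                   // state after
        Reason = "SSO connection detected",  // what triggered the change
        RecordedAtUtc = DateTime.UtcNow      // when the tier changed
    }, ct);
}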
Observability: Correlation + Structured Logging
Every scan/work item is wrapped in a correlation scope and emits structured JSON logs so failures can be traced end-to-end.
Correlation strategy
- Uses an operation scope that starts an Activity
- Supports continuing correlation IDs across queued work items
- Makes it possible to trace:
- Hangfire trigger → enqueue → worker → use-case execution
The Hangfire filter preserves correlation across background job boundaries:
// Preserve trace context across job execution
public class CorrelationJobFilter : IClientFilter, IServerFilter
{
    // Capture correlation when enqueuing
    public void OnCreating(CreatingContext ctx)
    {
        var correlationId = Activity.Current?.TraceId.ToString()
            ?? Guid.NewGuid().ToString();
        ctx.SetJobParameter("CorrelationId", correlationId);
    }

    public void OnCreated(CreatedContext ctx) { } // no-op; required by IClientFilter

    // Restore correlation when executing
    public void OnPerforming(PerformingContext ctx)
    {
        var correlationId = ctx.GetJobParameter<string>("CorrelationId");
        ctx.Items["CorrelationId"] = correlationId;
    }

    public void OnPerformed(PerformedContext ctx) { } // no-op; required by IServerFilter
}
Log strategy
Logs are emitted as JSON payloads suitable for ingestion and search (including fields like correlationId, elapsed time, route metadata where applicable). This is designed to support “find the scan that broke billing for account X” in minutes, not hours.
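For illustration, a Serilog configuration that emits this kind of payload might look like the following (the compact JSON formatter and file sink are assumptions; the service may route logs elsewhere):
// Sketch: structured JSON logging via Serilog's compact JSON formatter
Log.Logger = new LoggerConfiguration()
    .Enrich.FromLogContext() // carries correlationId into each event
    .WriteTo.File(
        new CompactJsonFormatter(),
        "logs/billing-scan-.json",
        rollingInterval: RollingInterval.Day)
    .CreateLogger();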
Testing & Hardening Notes
The architecture is intentionally “test-shaped” (use cases + services behind interfaces). Remaining work is mainly around expanding unit/integration coverage and surfacing more runtime metrics.
What I would harden next (if productizing further)
- Unit tests around tier calculation and “change detection” rules
- Integration tests for:
- WorkOS connection fetch behavior
- API client lookup behavior
- Repository queries and expected DTO shapes
- Configurable schedules (scan frequency per environment)
- Metrics:
- scan duration percentiles
- counts of tier changes per scan
- cache hit/miss rates
- Authorization in the dashboard if it ever becomes multi-tenant
Technologies Used
- .NET 8
- Clean Architecture (Domain/Application/Infrastructure/Host)
- Hangfire
- MediatR
- EF Core
- WorkOS integration
- Blazor + Fluent UI
- Structured logging + correlation IDs