
LogStudio

A lightweight log ingestion + exploration tool: a CLI ingests mixed-format logs into SQLite with FTS5, and a Blazor UI enables fast search and correlation by trace, request, or session ID.

Tags: dotnet · blazor · sqlite · fts5 · observability · devtools · ef-core

Overview

I built LogStudio to solve a practical debugging problem: quickly explore large log sets with both structured JSON logs and plaintext logs, and correlate events by trace ID / request ID / session ID.

Highlights:

  • CLI ingestion from folders/patterns into a local SQLite DB
  • Automatic parsing of JSON logs and plaintext logs with trailing JSON payloads
  • SQLite FTS5 full-text search for fast “message contains” queries
  • Blazor UI (MudBlazor) with quick filtering, virtualization, and detail views
  • Minimal API layer for query + log detail retrieval

The problem I was solving

When debugging backend systems, raw log files create friction:

  • logs are large and slow to search
  • formats vary by source (JSON vs plaintext)
  • correlating across request boundaries is manual
  • “grep” gets you a line, not the context

I wanted a tool that feels like a local “mini observability console”:

  • ingest quickly
  • search instantly
  • click into an event and see context + correlation threads

Architecture

LogStudio is split into clear layers:

  • Core
    • Domain model: LogEvent
  • Infrastructure
    • EF Core + SQLite storage
    • Ingestion pipeline
    • FTS5 initialization and indexing
  • Host
    • Web app (Blazor interactive server rendering)
    • Minimal API endpoints for querying
    • UI components for exploration
  • Tools
    • CLI that ingests logs from a glob/pattern

This separation keeps parsing/storage logic testable and reusable outside the UI.


Storage & indexing strategy

Why SQLite

  • Local-first workflows: no infra required
  • Great fit for “I want to inspect logs now”
  • Easy sharing of a single .db file as an artifact

Key schema objects

  • Events table (EF Core entity LogEvent)

    • Timestamp, level, logger, message, source file, line number
    • Optional correlation IDs: trace/request/session
    • JSON properties (PropsJson)
  • LogEventFts virtual table (FTS5)

    • Message, PropsJson
    • Linked to Events via rowid = Events.Id

-- Enable write-ahead logging before creating tables
PRAGMA journal_mode=WAL;

-- Contentless FTS5 table: indexes the text without storing a second copy
CREATE VIRTUAL TABLE IF NOT EXISTS LogEventFts
USING fts5(Message, PropsJson, content='');

The content='' option creates a “contentless” FTS5 table: searches run against the token index, while the actual text lives only in the Events table (joined back via rowid = Events.Id). This saves significant storage on large log datasets.
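
For orientation, here is a hedged sketch of what the LogEvent entity could look like given the columns above; the property names are my assumptions, not the project's actual code.

// Hypothetical sketch of the LogEvent entity based on the columns above;
// property names are illustrative, not the actual implementation.
public sealed class LogEvent
{
    public long Id { get; set; }                 // rowid; shared with LogEventFts
    public DateTime TimestampUtc { get; set; }
    public string Level { get; set; } = "Information";
    public string? Logger { get; set; }
    public string Message { get; set; } = "";
    public string? SourceFile { get; set; }
    public int? LineNumber { get; set; }
    public string? TraceId { get; set; }         // optional correlation IDs
    public string? RequestId { get; set; }
    public string? SessionId { get; set; }
    public string? PropsJson { get; set; }       // structured properties as JSON
}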

WAL mode

The database initializes in WAL mode, which improves write throughput during batch ingestion and lets readers proceed while a write transaction is open.
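
In EF Core terms, applying the pragma at startup can be as simple as a raw command (a sketch; the actual initialization code may differ):

// Sketch: enable WAL once at startup; the setting persists in the .db file
db.Database.ExecuteSqlRaw("PRAGMA journal_mode=WAL;");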


Ingestion pipeline

Input support

The ingestor supports two common patterns:

  1. Pure JSON logs
  • If a line begins with {, the ingestor attempts a JSON parse and extracts:
    • timestamp (timestamp, ts, or @t)
    • level (level, loglevel, or @l)
    • message (message or @m)
    • message template (messageTemplate or @mt)
    • logger (logger, sourceContext, or category)
    • trace/request/session IDs (including correlationId → requestId)
    • property payload (props) or any remaining fields as “extras”
  2. Plaintext logs with optional trailing JSON
  • Attempts to parse a leading timestamp
  • Extracts a trailing JSON payload if one appears after a {
  • Pulls trace/request/session IDs out of that JSON when present
  • Stores message text and props cleanly
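
To make the JSON branch (item 1 above) concrete, here is a minimal sketch using System.Text.Json; TryParseJsonLine and its alias handling are illustrative assumptions, not the actual parser.

// Illustrative sketch of the JSON-line branch; LogEvent is the entity
// sketched earlier, and the alias lists follow the description above.
private static LogEvent? TryParseJsonLine(string line)
{
    if (!line.StartsWith('{')) return null;
    try
    {
        using var doc = JsonDocument.Parse(line);
        var root = doc.RootElement;

        // Case-insensitive lookup, trying each alias in priority order
        string? First(params string[] names)
        {
            foreach (var name in names)
                foreach (var prop in root.EnumerateObject())
                    if (string.Equals(prop.Name, name, StringComparison.OrdinalIgnoreCase))
                        return prop.Value.ValueKind == JsonValueKind.String
                            ? prop.Value.GetString()
                            : prop.Value.GetRawText();
            return null;
        }

        return new LogEvent
        {
            TimestampUtc = DateTime.TryParse(First("timestamp", "ts", "@t"),
                System.Globalization.CultureInfo.InvariantCulture,
                System.Globalization.DateTimeStyles.AdjustToUniversal,
                out var t) ? t : DateTime.UtcNow,
            Level     = First("level", "loglevel", "@l") ?? "Information",
            Message   = First("message", "@m") ?? First("messageTemplate", "@mt") ?? line,
            Logger    = First("logger", "sourceContext", "category"),
            TraceId   = First("traceId"),
            RequestId = First("requestId", "correlationId"),  // correlationId → requestId
            SessionId = First("sessionId"),
            PropsJson = First("props")
        };
    }
    catch (JsonException)
    {
        return null;  // fall through to the plaintext branch
    }
}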

Batch behavior and transactions

  • Ingestion runs inside a DB transaction per file
  • Records flush in batches (e.g., 2000 events) to balance memory and throughput
  • After inserting events via EF Core, the pipeline also inserts into FTS5 with the same IDs

// Memory-efficient streaming with IAsyncEnumerable
private static async IAsyncEnumerable<string> ReadLinesAsync(string path,
    [EnumeratorCancellation] CancellationToken ct = default)
{
    await using var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
        FileShare.ReadWrite | FileShare.Delete);  // Read active log files
    using var sr = new StreamReader(stream);

    // Loop on ReadLineAsync (null at end-of-stream) rather than checking
    // StreamReader.EndOfStream, which can block on a synchronous read
    while (await sr.ReadLineAsync() is { } s)
    {
        ct.ThrowIfCancellationRequested();
        yield return s;
    }
}

The FileShare.ReadWrite flag allows reading logs that are still being written to—essential for tailing active application logs.

This approach keeps ingestion fast without compromising query performance later.
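
Putting the pieces together, a sketch of the per-file loop under the assumptions above; AppDb, db.Events, and TryParseJsonLine are illustrative names, and the 2000-row batch size comes from the description above.

// Sketch: stream lines, parse, and flush in batches inside one transaction
const int BatchSize = 2000;
await using var tx = await db.Database.BeginTransactionAsync(ct);
var batch = new List<LogEvent>(BatchSize);

async Task FlushAsync()
{
    db.Events.AddRange(batch);
    await db.SaveChangesAsync(ct);  // EF assigns Ids on insert
    foreach (var e in batch)        // mirror into the contentless FTS index
        await db.Database.ExecuteSqlInterpolatedAsync(
            $"INSERT INTO LogEventFts(rowid, Message, PropsJson) VALUES ({e.Id}, {e.Message}, {e.PropsJson})",
            ct);
    batch.Clear();
}

await foreach (var line in ReadLinesAsync(path, ct))
{
    if (TryParseJsonLine(line) is { } evt) batch.Add(evt);
    if (batch.Count >= BatchSize) await FlushAsync();
}
if (batch.Count > 0) await FlushAsync();
await tx.CommitAsync(ct);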


Query model (API)

Query endpoint

The minimal API accepts a query payload like:

  • date range (FromUtc, ToUtc)
  • levels filter (Levels) and/or minimum level (MinLevel)
  • sorting (SortBy, SortDirection)
  • paging (Offset, Limit)
  • text filter (Text)
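
Concretely, the payload might bind to a record like this; only the property names come from the list above, while the defaults, route, and QueryService helper are assumptions.

// Illustrative request shape for the query endpoint
public sealed record LogQueryRequest(
    DateTime? FromUtc,
    DateTime? ToUtc,
    string[]? Levels,
    string? MinLevel,
    string? SortBy = "Timestamp",
    string? SortDirection = "desc",
    int Offset = 0,
    int Limit = 200,
    string? Text = null);

// Hypothetical minimal API mapping (route and service names assumed)
app.MapPost("/api/logs/query", async (LogQueryRequest req, AppDb db, CancellationToken ct)
    => Results.Ok(await QueryService.RunAsync(req, db, ct)));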

Query language ergonomics

To make it feel natural, the text field supports:

  • trace:<guid> or trace=<guid>
  • request:<guid> / req:<guid> / corr:<guid> etc.
  • session:<string>
  • id:<guid> (matches trace or request)
  • a raw GUID (auto-detected)

This avoids the need for a separate UI mode just for correlation IDs.

Two search modes

  1. FTS mode
  • If text starts with message ~ ..., the query uses FTS5:
    • Id IN (SELECT rowid FROM LogEventFts WHERE LogEventFts MATCH $q)
  2. LIKE mode
  • Otherwise, it searches:
    • Message LIKE $like OR PropsJson LIKE $like
// Correlation ID prefix parsing for ergonomic queries
if (lowered.StartsWith("trace:") || lowered.StartsWith("trace="))
{
    target = IdTarget.Trace;
}
else if (lowered.StartsWith("request:") || lowered.StartsWith("req:") ||
         lowered.StartsWith("correlation:") || lowered.StartsWith("corr:"))
{
    target = IdTarget.Request;
}
else if (lowered.StartsWith("session:"))
{
    target = IdTarget.Session;
}
else if (Guid.TryParse(text, out _))
{
    target = IdTarget.Any;  // Auto-detect raw GUIDs
}

This query DSL lets you type trace:abc123 or paste a raw GUID and get instant correlation filtering without switching UI modes.

Parameterized SQL for safety and speed

Where it matters, the query path uses parameterized SQL to avoid injection and to keep the query planner stable.
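
A sketch of the two modes as parameterized commands with Microsoft.Data.Sqlite; conn, ftsMode, match, and text are assumed locals, and the exact SQL shape is illustrative.

// Sketch: both search modes as parameterized SQL (Microsoft.Data.Sqlite)
using var cmd = conn.CreateCommand();
if (ftsMode)  // "message ~ ..." prefix was detected
{
    cmd.CommandText =
        "SELECT Id FROM Events WHERE Id IN " +
        "(SELECT rowid FROM LogEventFts WHERE LogEventFts MATCH $q)";
    cmd.Parameters.AddWithValue("$q", match);
}
else          // fallback: LIKE over message and props
{
    cmd.CommandText =
        "SELECT Id FROM Events WHERE Message LIKE $like OR PropsJson LIKE $like";
    cmd.Parameters.AddWithValue("$like", "%" + text + "%");
}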


Log detail experience

The log detail endpoint returns:

  • the selected event
  • structured properties (parsed from JSON)
  • “pretty printed” JSON for readability
  • context:
    • 5 events before
    • 5 events after
  • correlation sets:
    • related by trace ID
    • related by request ID
    • related by session ID

In practice, this reduces debugging time because you stop “guessing what happened around it.”
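
The context window is cheap to compute; a sketch in EF Core, assuming the LogEvent entity above and a ±5 window keyed by Id:

// Sketch: fetch the ±5 context window around a selected event
var before = await db.Events.Where(e => e.Id < id)
    .OrderByDescending(e => e.Id).Take(5).ToListAsync(ct);
before.Reverse();  // restore chronological order

var after = await db.Events.Where(e => e.Id > id)
    .OrderBy(e => e.Id).Take(5).ToListAsync(ct);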


UI design

The UI is Blazor (interactive server rendering) with MudBlazor components.

Key UX choices:

  • Virtualized grids so large datasets remain responsive
  • Quick toggles for detail visibility
  • A log detail dialog that emphasizes:
    • the message
    • the properties (with kind/type)
    • the context and correlation sets

Performance notes

What makes it fast:

  • SQLite indexing on timestamp + correlation IDs
  • FTS5 for full-text search
  • batch ingestion + WAL mode
  • lightweight API payloads (query rows are DTOs)
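
In EF Core terms, the index set might be declared like this (a sketch; the real model configuration may differ):

// Sketch: indexes that back the hot query paths (EF Core OnModelCreating)
protected override void OnModelCreating(ModelBuilder b)
{
    var e = b.Entity<LogEvent>();
    e.HasIndex(x => x.TimestampUtc);   // date-range filters and sorting
    e.HasIndex(x => x.TraceId);        // correlation lookups
    e.HasIndex(x => x.RequestId);
    e.HasIndex(x => x.SessionId);
}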

What I’d add next

  • Pluggable parsers (Serilog/ELK formats, NLog layouts)
  • Saved searches + bookmarks
  • Export correlated threads (trace/request) to a single shareable view
  • Retention & DB compaction controls
  • Import from a streaming source (HTTP tailing)

Results


  • Dramatically reduced “log spelunking” time by making search and correlation interactive.
  • Converted messy log files into a durable, queryable dataset with strong UX affordances.
  • Produced a reusable pattern: ingest → index → query → correlate → explain.

Technologies Used

  • .NET (targeting modern runtime)
  • Blazor (interactive server rendering)
  • MudBlazor UI components
  • EF Core + SQLite
  • SQLite FTS5 full-text search
  • Minimal APIs