# Observability

> Traces, metrics and logs through OpenTelemetry into Grafana, and how to instrument your own slices.

## The pipeline

The API emits all three OpenTelemetry signals to an **OpenTelemetry Collector**, which fans them out
to dedicated backends, all explored through Grafana:

| Signal  | Backend     |
| ------- | ----------- |
| Traces  | Tempo       |
| Metrics | Prometheus  |
| Logs    | Loki        |

The collector, the backends and Grafana are all in `docker-compose.yml`, with Grafana dashboards
provisioned from `deploy/`. Open Grafana at [localhost:3010](http://localhost:3010) once the stack is
up.

## Logs and audit

Application logs go through **Serilog** and out over OTLP to Loki. Audit events use the same pipeline
rather than a separate table, so audit retention is controlled by Loki's retention settings and the
API needs no purge jobs of its own.
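
To illustrate the shape this takes, the sketch below configures Serilog with the OTLP sink from the `Serilog.Sinks.OpenTelemetry` package and writes one audit event. The endpoint, the `service.name` value, the `EventKind` property and the message template are assumptions made for the example, not the project's actual configuration.

```csharp
using System;
using System.Collections.Generic;
using Serilog;
using Serilog.Sinks.OpenTelemetry;

Log.Logger = new LoggerConfiguration()
    .Enrich.FromLogContext()
    .WriteTo.OpenTelemetry(options =>
    {
        // Assumed collector address; use whatever docker-compose.yml exposes.
        options.Endpoint = "http://localhost:4317";
        options.Protocol = OtlpProtocol.Grpc;
        options.ResourceAttributes = new Dictionary<string, object>
        {
            ["service.name"] = "slicekit-api", // hypothetical service name
        };
    })
    .CreateLogger();

// Hypothetical audit event: an ordinary structured log entry carrying a
// marker property, so it travels the same pipeline as every other log.
var projectId = Guid.NewGuid();
var userId = "user-123";

Log.ForContext("EventKind", "audit")
   .Information("Project {ProjectId} archived by {UserId}", projectId, userId);

// Flush buffered events before the process exits.
Log.CloseAndFlush();
```

Because the audit event is just a log record with extra properties, no audit-specific storage or cleanup code is needed in the API.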

## Tracing a request

Because the whole request path is instrumented, a single browser action produces one trace spanning
the endpoint, the Wolverine handler, the database call and any published messages. When something is
slow, open the trace, find the expensive span, and jump to its correlated logs.

## Instrumenting your own slice

Most of the time the built-in instrumentation is enough. When a slice does something worth measuring,
add a span or a metric explicitly:

```csharp
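// Requires: using System.Diagnostics; using System.Diagnostics.Metrics;

// Fields on the slice's class: create once and reuse for the process lifetime.
// Named Source (not Activity) to avoid hiding the System.Diagnostics.Activity type.
private static readonly ActivitySource Source = new("Slicekit.Search");
private static readonly Counter<long> Rebuilds =
    new Meter("Slicekit.Search").CreateCounter<long>("search.index.rebuilds");

// Inside the handler. StartActivity returns null when nothing listens to the
// source, hence the null-conditional call on the next line.
using var span = Source.StartActivity("rebuild-search-index");
span?.SetTag("project.id", projectId);
Rebuilds.Add(1);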
```
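
Custom sources and meters are only exported when the OpenTelemetry SDK has been told about them. The following is a minimal sketch of that registration, assuming the standard `OpenTelemetry.Extensions.Hosting` and OTLP exporter packages. The project's startup code may already register these names, possibly with a wildcard such as `Slicekit.*`, so check there before adding anything.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        // Must match the name passed to new ActivitySource(...).
        .AddSource("Slicekit.Search")
        .AddOtlpExporter())
    .WithMetrics(metrics => metrics
        // Must match the name passed to new Meter(...).
        .AddMeter("Slicekit.Search")
        .AddOtlpExporter());

var app = builder.Build();
app.Run();
```

Without a matching `AddSource` call, `StartActivity` returns `null` and the span is silently skipped. Without a matching `AddMeter` call, the counter is incremented but never exported.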

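Spans you start nest automatically under the request trace, because a new activity takes
`Activity.Current` as its parent, so custom work shows up in context. This applies only to sources the
OpenTelemetry SDK is listening to; for any other source `StartActivity` returns `null`.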
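
The nesting behaviour can be observed with the base class library alone. In the sketch below an `ActivityListener` stands in for the OpenTelemetry SDK, and `Demo.Request` is a hypothetical source playing the role of the incoming HTTP request. It assumes .NET 5 or later, where W3C trace IDs are the default.

```csharp
using System;
using System.Diagnostics;

public static class NestingDemo
{
    // Hypothetical stand-in for the span created by the web framework.
    private static readonly ActivitySource RequestSource = new("Demo.Request");
    private static readonly ActivitySource SearchSource = new("Slicekit.Search");

    public static (string ParentSpanId, string ChildParentSpanId, bool SameTrace) Run()
    {
        // Without a listener, StartActivity returns null. The OpenTelemetry SDK
        // normally registers one for every source it has been configured with.
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name is "Demo.Request" or "Slicekit.Search",
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllDataAndRecorded,
        };
        ActivitySource.AddActivityListener(listener);

        // The outer activity becomes Activity.Current ...
        using var request = RequestSource.StartActivity("GET /projects");
        // ... so the inner one picks it up as its parent with no extra wiring.
        using var rebuild = SearchSource.StartActivity("rebuild-search-index");

        return (
            request!.SpanId.ToString(),
            rebuild!.ParentSpanId.ToString(),
            request.TraceId == rebuild.TraceId);
    }
}
```

Calling `NestingDemo.Run()` returns the request span's ID, the parent ID recorded on the custom span, and whether both share a trace ID. The first two match, which is why the custom span is drawn under the request in Tempo.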

## Alerts

Alertmanager is included in the stack. Define alert rules against the Prometheus metrics your product
cares about (error rate, queue depth, latency) and route notifications where your team will see
them.
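
As a starting point, an error-rate rule might look like the sketch below. It assumes the ASP.NET Core `http.server.request.duration` histogram reaches Prometheus under its translated name `http_server_request_duration_seconds`, with the status code in the `http_response_status_code` label. The group name, threshold, duration and severity label are placeholders; check the metric names in Prometheus before relying on it.

```yaml
groups:
  - name: slicekit-api            # hypothetical group name
    rules:
      - alert: HighErrorRate
        # Share of requests answered with a 5xx status over the last 5 minutes.
        expr: |
          sum(rate(http_server_request_duration_seconds_count{http_response_status_code=~"5.."}[5m]))
            /
          sum(rate(http_server_request_duration_seconds_count[5m]))
            > 0.05
        for: 10m                  # placeholder: must hold this long before firing
        labels:
          severity: page          # placeholder label for Alertmanager routing
        annotations:
          summary: "More than 5% of API requests are failing"
```

Alertmanager can then match on the `severity` label to decide which receiver gets the notification.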
