Distributed Tracing

Grant uses OpenTelemetry for distributed tracing. The API is auto-instrumented for HTTP, Express, GraphQL, and Redis; you choose a backend (Jaeger or OTLP) via configuration and can add custom spans for business operations. Bootstrap: apps/api/src/lib/tracing/index.ts. Config: TRACING_CONFIG in env.config.ts.

What you get

Distributed tracing records a trace per request: a tree of spans (units of work) with timing and attributes. That gives you:

  • Request timelines — See where time is spent (middleware, resolvers, DB, cache, external calls).
  • Cross-service correlation — Same trace ID across logs and downstream services when propagation is enabled.
  • Debugging — Filter by user.id, error=true, or duration in your trace backend.

How it's wired

Tracing is initialized at server startup before other services. Request IDs and optional user IDs are set on the active span in request-logging middleware. Redis is instrumented via @opentelemetry/instrumentation-ioredis. PostgreSQL is not auto-instrumented (app uses postgres.js). Shutdown runs during graceful server shutdown (shutdownTracing() before DB/cache close).
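
The shutdown ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `runShutdown`, `closeDatabase`, and `closeCache` are hypothetical names, and only `shutdownTracing()` is named on this page. Running the tracing step first presumably gives buffered spans a chance to be exported before the other resources go away.

```typescript
// Hypothetical sketch: run shutdown steps strictly in order, tracing first.
type ShutdownStep = { name: string; run: () => Promise<void> };

async function runShutdown(steps: ShutdownStep[]): Promise<string[]> {
  const failed: string[] = [];
  for (const step of steps) {
    try {
      // Wait for each step to finish before starting the next one.
      await step.run();
    } catch {
      // Remember the failure but keep going so later resources still get closed.
      failed.push(step.name);
    }
  }
  return failed;
}

// Assumed wiring (closeDatabase and closeCache are placeholder names):
//
//   await runShutdown([
//     { name: 'tracing', run: shutdownTracing },
//     { name: 'database', run: closeDatabase },
//     { name: 'cache', run: closeCache },
//   ]);
```

The function returns the names of the steps that threw, so the caller can log them or choose a non-zero exit code.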

Request flow example

A trace is a tree of spans over time. Conceptually, one request produces a root HTTP span with child spans for middleware, resolvers, cache calls, and external calls.

Backends

| Backend | Use case | Key config |
| --- | --- | --- |
| Jaeger | Local dev, self-hosted | TRACING_BACKEND=jaeger, JAEGER_ENDPOINT |
| OTLP | Cloud / vendor backends | TRACING_BACKEND=otlp, OTLP_ENDPOINT |

OTLP is the standard export format; most vendors (Datadog, New Relic, Honeycomb, etc.) accept OTLP. Set OTLP_ENDPOINT to your collector or vendor endpoint.

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| TRACING_ENABLED | false | Enable tracing |
| TRACING_BACKEND | jaeger | jaeger or otlp |
| JAEGER_ENDPOINT | http://localhost:14268/api/traces | Jaeger collector (when backend is jaeger) |
| OTLP_ENDPOINT | http://localhost:4318/v1/traces | OTLP endpoint (when backend is otlp) |
| TRACING_SAMPLING_RATE | 1.0 | Sampling rate, 0.0–1.0 (e.g. 0.1 = 10%) |
| TRACING_SERVICE_NAME | grant-api | Service name in traces |

Minimal local setup:

bash
TRACING_ENABLED=true
TRACING_BACKEND=jaeger
JAEGER_ENDPOINT=http://localhost:14268/api/traces
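
To make the defaults and validation rules from the configuration table concrete, here is a minimal sketch of how these variables could be parsed. It is an illustration only: the real TRACING_CONFIG in env.config.ts may be shaped differently, and `parseTracingConfig`, `TracingConfig`, and `Env` are hypothetical names.

```typescript
// Sketch only: the real TRACING_CONFIG in env.config.ts may look different.
type TracingBackend = 'jaeger' | 'otlp';

interface TracingConfig {
  enabled: boolean;
  backend: TracingBackend;
  endpoint: string;
  samplingRate: number;
  serviceName: string;
}

type Env = Record<string, string | undefined>;

function parseTracingConfig(env: Env): TracingConfig {
  const backend = env.TRACING_BACKEND ?? 'jaeger';
  if (backend !== 'jaeger' && backend !== 'otlp') {
    throw new Error(`TRACING_BACKEND must be "jaeger" or "otlp", got "${backend}"`);
  }

  const samplingRate = Number(env.TRACING_SAMPLING_RATE ?? '1.0');
  if (!Number.isFinite(samplingRate) || samplingRate < 0 || samplingRate > 1) {
    throw new Error('TRACING_SAMPLING_RATE must be a number between 0.0 and 1.0');
  }

  // Only the endpoint that matches the selected backend is used.
  const endpoint =
    backend === 'jaeger'
      ? env.JAEGER_ENDPOINT ?? 'http://localhost:14268/api/traces'
      : env.OTLP_ENDPOINT ?? 'http://localhost:4318/v1/traces';

  return {
    enabled: env.TRACING_ENABLED === 'true', // anything else counts as disabled
    backend,
    endpoint,
    samplingRate,
    serviceName: env.TRACING_SERVICE_NAME ?? 'grant-api',
  };
}
```

In a Node.js process this would be called as `parseTracingConfig(process.env)`. Failing fast on an invalid backend or sampling rate surfaces typos at startup instead of silently exporting nothing.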

Custom spans

Add spans for business operations so they show up in the trace tree. Use active spans so context propagates.

Creating a span

typescript
import { getTracer } from '@/lib/telemetry/tracing';
import { SpanStatusCode } from '@opentelemetry/api';

const tracer = getTracer();

return tracer.startActiveSpan('OrganizationService.createOrganization', async (span) => {
  try {
    span.setAttribute('organization.name', data.name);
    const organization = await this.repository.create(data);
    span.setAttribute('organization.id', organization.id);
    span.setStatus({ code: SpanStatusCode.OK });
    return organization;
  } catch (error) {
    // `error` is `unknown` in strict TypeScript, so narrow it before recording.
    span.recordException(error as Error);
    span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
    throw error;
  } finally {
    span.end();
  }
});

TIP

Use startActiveSpan, not startSpan, so the span is attached to the current trace context. Get the tracer once (e.g. in the service constructor) via getTracer() from @/lib/telemetry/tracing.
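
Because every custom span repeats the same try/catch/finally shape, it can be factored into a small wrapper. The sketch below is one possible way to do that and is not part of the codebase as far as this page shows: `withSpan`, `SpanLike`, and `TracerLike` are hypothetical names. The two interfaces cover only the subset of the OpenTelemetry API used here so the example compiles on its own; in the app you would use `Span`, `Tracer`, and `SpanStatusCode` from `@opentelemetry/api` instead.

```typescript
// Minimal structural stand-ins for the parts of the OpenTelemetry API used below.
interface SpanLike {
  setStatus(status: { code: number; message?: string }): unknown;
  recordException(exception: Error | string): void;
  end(): void;
}

interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => T): T;
}

// Same numeric values as SpanStatusCode.OK and SpanStatusCode.ERROR.
const STATUS_OK = 1;
const STATUS_ERROR = 2;

// Hypothetical helper: runs `fn` inside an active span, records the outcome,
// and always ends the span.
async function withSpan<T>(
  tracer: TracerLike,
  name: string,
  fn: (span: SpanLike) => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      const result = await fn(span);
      span.setStatus({ code: STATUS_OK });
      return result;
    } catch (error) {
      // Normalize non-Error throwables so the span always gets a message.
      const err = error instanceof Error ? error : new Error(String(error));
      span.recordException(err);
      span.setStatus({ code: STATUS_ERROR, message: err.message });
      throw error; // rethrow the original value so callers see the same failure
    } finally {
      span.end();
    }
  });
}
```

A service method could then shrink to something like `withSpan(this.tracer, 'OrganizationService.createOrganization', async (span) => this.repository.create(data))`, with attributes still set on `span` inside the callback.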

Useful attributes

| Context | Example attributes |
| --- | --- |
| User / tenant | user.id, user.accountId, tenant.id |
| Domain | organization.id, project.id |
| Operation | operation.type, operation.entity |
| Counts | records.count, batch.size |
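
Attribute keys are easiest to filter on when every service spells them the same way. One way to enforce that is a small mapping function; the sketch below assumes a hypothetical `TraceContext` shape and `toSpanAttributes` helper, neither of which comes from the codebase.

```typescript
// Hypothetical helper: maps a typed context object to the attribute keys
// listed above and drops values that are not known.
type AttributeValue = string | number | boolean;

interface TraceContext {
  userId?: string;
  tenantId?: string;
  organizationId?: string;
  projectId?: string;
  recordsCount?: number;
}

function toSpanAttributes(ctx: TraceContext): Record<string, AttributeValue> {
  const candidates: Record<string, AttributeValue | undefined> = {
    'user.id': ctx.userId,
    'tenant.id': ctx.tenantId,
    'organization.id': ctx.organizationId,
    'project.id': ctx.projectId,
    'records.count': ctx.recordsCount,
  };

  const attributes: Record<string, AttributeValue> = {};
  for (const key of Object.keys(candidates)) {
    const value = candidates[key];
    // Compare against undefined so legitimate values such as 0 are kept.
    if (value !== undefined) attributes[key] = value;
  }
  return attributes;
}
```

The result can be passed to `span.setAttributes(...)` from the OpenTelemetry API in a single call.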

Nested spans

Create child spans for sub-operations so the trace shows a clear hierarchy:

typescript
return this.tracer.startActiveSpan('processImport', async (parentSpan) => {
  try {
    // Started inside the parent's callback, so parseCSV nests under processImport.
    const records = await this.tracer.startActiveSpan('parseCSV', async (span) => {
      try {
        const result = await parseCSV(data.file);
        span.setAttribute('records.count', result.length);
        return result;
      } finally {
        // End the child span even if parsing throws.
        span.end();
      }
    });
    // ... validateRecords, insertRecords as separate startActiveSpan calls
    parentSpan.setStatus({ code: SpanStatusCode.OK });
    return records;
  } catch (error) {
    parentSpan.recordException(error as Error);
    parentSpan.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
    throw error;
  } finally {
    parentSpan.end();
  }
});

Local development with Jaeger

Run Jaeger (all-in-one) with OTLP enabled:

bash
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 -p 14268:14268 -p 4318:4318 \
  jaegertracing/all-in-one:latest

| Port | Purpose |
| --- | --- |
| 16686 | Jaeger UI |
| 14268 | Jaeger collector (HTTP) |
| 4318 | OTLP HTTP |

Open http://localhost:16686, select service grant-api, and run "Find Traces".

Cloud / vendor backends (OTLP)

Use TRACING_BACKEND=otlp and set OTLP_ENDPOINT, plus any API key environment variable your vendor requires. Standard OTLP over HTTP needs no extra packages.

| Vendor | Endpoint (typical) | Notes |
| --- | --- | --- |
| Datadog | http://localhost:8126 or agent OTLP | Set DD_API_KEY if required |
| New Relic | https://otlp.nr-data.net:4318/v1/traces | Set NEW_RELIC_API_KEY |
| Honeycomb | https://api.honeycomb.io/v1/traces | Set HONEYCOMB_API_KEY |

Refer to each vendor’s docs for exact endpoint and headers.
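
Vendors usually expect the API key as an HTTP header on each OTLP request. The sketch below shows how the environment variables from the table could be turned into headers. It is an assumption that the bootstrap passes such headers to its exporter, and `otlpHeadersFor`, `Vendor`, and `Env` are hypothetical names. The header names are the ones Honeycomb (`x-honeycomb-team`) and New Relic (`api-key`) document for OTLP ingest; confirm them against the vendor docs before relying on them.

```typescript
// Hypothetical helper: builds OTLP request headers from vendor API key variables.
type Vendor = 'datadog' | 'newrelic' | 'honeycomb';
type Env = Record<string, string | undefined>;

function otlpHeadersFor(vendor: Vendor, env: Env): Record<string, string> {
  switch (vendor) {
    case 'honeycomb': {
      const key = env.HONEYCOMB_API_KEY;
      if (!key) throw new Error('HONEYCOMB_API_KEY is not set');
      return { 'x-honeycomb-team': key };
    }
    case 'newrelic': {
      const key = env.NEW_RELIC_API_KEY;
      if (!key) throw new Error('NEW_RELIC_API_KEY is not set');
      return { 'api-key': key };
    }
    case 'datadog':
      // Assumes export to a local Datadog Agent, which typically holds the API key
      // itself, so the OTLP request carries no auth header.
      return {};
  }
}
```

Throwing when a key is missing makes a misconfigured deployment fail at startup rather than export spans that the vendor rejects.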

Best practices

  • Add business context — Set tenant.id, user.id, and organization.id so you can filter traces by customer or feature.
  • Record errors — In catch, call span.recordException(error) and span.setStatus({ code: SpanStatusCode.ERROR, message }).
  • Sample in production — Set TRACING_SAMPLING_RATE=0.1 (or 0.2–0.5) to limit cost and overhead.
  • Keep spans focused — One span per logical operation; use child spans for sub-steps, not one span around a long loop.
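
To illustrate what a sampling rate such as 0.1 means in practice, the sketch below derives the keep-or-drop decision from the trace ID, so every span in a trace gets the same answer. This is an illustration of the idea only: it is not the exact algorithm of OpenTelemetry's ratio-based sampler, and `shouldSample` is a hypothetical function that does not appear in the codebase.

```typescript
// Illustrative only: NOT the exact algorithm of OpenTelemetry's ratio-based sampler.
// The decision depends only on the trace ID, so it is the same for every span
// in a trace and for every service that sees that trace ID.
function shouldSample(traceId: string, rate: number): boolean {
  if (rate >= 1) return true;
  if (rate <= 0) return false;
  // Read the last 8 hex characters as an unsigned 32-bit integer and scale
  // it into the range [0, 1).
  const bucket = parseInt(traceId.slice(-8), 16) / 0x100000000;
  return bucket < rate;
}
```

With randomly generated trace IDs, a rate of 0.1 keeps roughly one trace in ten. Errors and slow requests are dropped at the same ratio, which is worth remembering when a specific failing request cannot be found in the backend.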

Querying traces (Jaeger UI)

| Goal | Steps |
| --- | --- |
| Slow requests | Service grant-api → Min duration e.g. 500ms → Find Traces |
| Errors | Service grant-api → Tag error=true → Find Traces |
| By user | Service grant-api → Tag http.user_id=<id> (or user.id if you set it) → Find Traces |

Performance

| Aspect | Typical impact |
| --- | --- |
| CPU | < 5% |
| Memory | < 50 MB |
| Latency | < 1 ms per span |
| Network | Batched exports (e.g. 5 s interval) |

Use sampling and limit attributes per span in high-traffic environments.

Troubleshooting

| Issue | What to check |
| --- | --- |
| No traces | TRACING_ENABLED=true; correct JAEGER_ENDPOINT or OTLP_ENDPOINT; logs for "tracing" / "OpenTelemetry" errors |
| High overhead | Lower TRACING_SAMPLING_RATE; ensure noisy instrumentations (e.g. fs, dns) are disabled in bootstrap |
| Span not in trace | Use startActiveSpan (not startSpan) so the span is attached to the current context |

Released under the MIT License.