Logging and Monitoring

stdapi.ai provides production observability. It emits structured JSON logs for every request, stream, and background task, and integrates with OpenTelemetry (OTel) for traces and metrics. This guide shows how to enable observability, read the logs, and correlate signals across systems.

At a glance
JSON logs to STDOUT (perfect for AWS CloudWatch Logs). One event per line.
Correlation
All events for a request share the same id, which is also returned to the client in the x-request-id response header.
ECS friendly
ECS forwards container STDOUT to CloudWatch Logs automatically.
Traces (optional)
Set OTEL_ENABLED=true to export spans to X‑Ray, Jaeger, Tempo, etc.
Payload logging (optional)
Set LOG_REQUEST_PARAMS=true only for targeted debugging.

Quick start (2 minutes)

Set these environment variables, then restart the service (see the Configuration Guide for details):

# Set minimum log level (optional, defaults to "info")
# Options: info, warning, error, critical, disabled
export LOG_LEVEL=warning

# Enable OpenTelemetry tracing
export OTEL_ENABLED=true
export OTEL_SERVICE_NAME=stdapi
# 0.0–1.0 (10% example)
export OTEL_SAMPLE_RATE=0.1

# Include request/response payloads in logs (for debugging ONLY)
export LOG_REQUEST_PARAMS=true

# Log client IP addresses (requires ENABLE_PROXY_HEADERS for real client IPs)
export LOG_CLIENT_IP=true
export ENABLE_PROXY_HEADERS=true  # When behind ALB/CloudFront

Sensitive data and cost impact

Enabling LOG_REQUEST_PARAMS may expose sensitive content in logs. Use only in development or during targeted troubleshooting. Redact secrets before sharing logs externally.

Additionally, logging full request/response payloads can dramatically increase log volume and costs, especially for large LLM prompts, tool calls, and generated outputs. In AWS CloudWatch Logs, ingestion and storage costs scale with log size. Prefer short retention, targeted sampling, and temporary enablement only when needed.
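
As an illustration of redacting before sharing, the following minimal Python sketch replaces the two payload fields with a marker. The helper name `strip_payloads` and the `[REDACTED]` marker are hypothetical and not part of stdapi.ai. Other fields, such as error_detail, can also contain sensitive text and still need manual review.

```python
import json

# Fields that may carry prompt/response content when LOG_REQUEST_PARAMS=true.
PAYLOAD_FIELDS = ("request_params", "request_response")


def strip_payloads(line: str) -> str:
    """Return the JSON log line with payload fields replaced by a marker."""
    event = json.loads(line)
    for field in PAYLOAD_FIELDS:
        if field in event:
            event[field] = "[REDACTED]"
    return json.dumps(event)


line = '{"type": "request", "id": "a1b2c3d4", "request_params": {"messages": []}}'
print(strip_payloads(line))
```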

Client IP Logging

When LOG_CLIENT_IP=true:

The client_ip field is added to request logs
The client IP is added as client.address attribute to OpenTelemetry spans (when OTEL_ENABLED=true)

To log the real client IP address (instead of the proxy IP), also enable ENABLE_PROXY_HEADERS=true when running behind AWS ALB, CloudFront, or other reverse proxies. See the Configuration Guide for details.

CloudWatch best practice

JSON to STDOUT is optimal for CloudWatch Logs Insights. In AWS ECS, the task’s log driver forwards container STDOUT to CloudWatch Logs automatically.

Event types

stdapi.ai emits five kinds of JSON events (one per line):

Event	Description
`start`	Emitted once at server startup. Includes startup metadata and warnings.
`stop`	Emitted on graceful shutdown. Includes uptime.
`request`	One per HTTP request. Method, path, status, timings, and optional request/response.
`request_stream`	Streaming segments (SSE/audio). Indicates streaming activity and duration.
`background`	Background tasks correlated to the parent request.

Common fields

Each event shares core fields and may add type‑specific ones.

Field	Applies to	Description
`type`	all	One of `start`, `stop`, `request`, `request_stream`, `background`
`level`	all	`info`, `warning`, `error`, `critical` (controlled by `LOG_LEVEL`)
`date`	all	RFC3339, timezone‑aware timestamp
`server_id`	all	Instance identifier
`error_detail`	all	Optional list of formatted exception strings
`id`	request, request_stream, background	Correlation ID (also returned as `x-request-id`)
`execution_time_ms`	request, request_stream, background	Duration of the handled block
`method`	request	HTTP method
`path`	request	Request path
`status_code`	request	Final HTTP status code
`client_ip`	request	Client IP address (if `LOG_CLIENT_IP=true`)
`client_user_agent`	request	When provided by client
`model_id`	request	Targeted model (if applicable)
`voice_id`	request	TTS voice (if applicable)
`request_user_id`, `request_org_id`	request	Propagated identifiers (if applicable)
`request_params`	request	Sanitized request payload (if `LOG_REQUEST_PARAMS=true`)
`request_response`	request	Sanitized response payload (if `LOG_REQUEST_PARAMS=true`)
`event`	background	Background operation name
`server_start_time_ms`, `server_warnings`	start	Startup metrics and warnings
`server_uptime_ms`	stop	Uptime at shutdown
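
Because each event is a single JSON object on its own line, the stream can be consumed with any JSON parser. The sketch below is one possible way to read the lines in Python and count events by type. The function name `parse_events` is hypothetical, and skipping non-JSON lines is an assumption about how mixed output might be handled, not documented stdapi.ai behavior.

```python
import json
from collections import Counter

# The five event types listed in the table above.
KNOWN_TYPES = {"start", "stop", "request", "request_stream", "background"}


def parse_events(lines):
    """Yield one dict per valid JSON event line, skipping anything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON line (for example, output from another process)
        if isinstance(event, dict) and event.get("type") in KNOWN_TYPES:
            yield event


sample = [
    '{"type": "start", "level": "info", "server_id": "stdapi-1"}',
    '{"type": "request", "level": "info", "id": "a1b2c3d4", "status_code": 200}',
    "not json",
]
counts = Counter(event["type"] for event in parse_events(sample))
print(dict(counts))
```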

Understanding warnings and errors

For request events, the default log level is derived from the final HTTP status: 4xx → warning, 5xx → error. Unexpected server failures (for example, an unhandled exception that results in HTTP 500) may be logged as critical instead.
Authentication/authorization: For security, client responses for 401 and 403 include only generic messages. Full diagnostic details are captured in server logs under error_detail and can be correlated via id (see x-request-id).
server_warnings (on the start event) often highlights missing configuration and features that have been disabled as a result (for example, no S3 bucket configured disables certain image/audio features).
error_detail (on any event) contains formatted exception traces and diagnostic hints, which frequently point to missing configuration, unavailable dependencies, or disabled features.
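
The status-to-level rule can be approximated as below, which may help when writing alerts or tests against the logs. This is a sketch of the documented defaults only: the function name `default_level` is hypothetical, mapping all other statuses to info is an assumption based on the example events, and the critical case for unexpected failures is not modeled.

```python
def default_level(status_code: int) -> str:
    """Approximate the default level of a request event from its HTTP status."""
    if 500 <= status_code <= 599:
        return "error"
    if 400 <= status_code <= 499:
        return "warning"
    # Assumption: every other status is logged at info.
    return "info"


for status in (200, 404, 503):
    print(status, default_level(status))
```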

Correlating logs and traces

Group events by id to reconstruct a full request lifecycle (request → stream(s) → background).
The x-request-id response header exposes the same value so external systems can propagate correlation.
With OTel enabled, a root span named like POST /v1/... is created and carries attributes: http.method, http.url, http.user_agent, request.id, server.id, http.status_code, and duration_ms.
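
Grouping by id can be sketched in a few lines of Python. The function name `group_by_request` is hypothetical, and ordering by the date string assumes all timestamps use the same RFC3339 format and timezone.

```python
import json
from collections import defaultdict


def group_by_request(lines):
    """Map each correlation id to its events, ordered by the date field."""
    groups = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        if "id" in event:  # start and stop events carry no correlation id
            groups[event["id"]].append(event)
    for events in groups.values():
        # RFC3339 strings in the same format and timezone sort chronologically.
        events.sort(key=lambda e: e.get("date", ""))
    return dict(groups)


lines = [
    '{"type": "background", "id": "a1b2c3d4", "date": "2025-01-01T12:00:02Z"}',
    '{"type": "request", "id": "a1b2c3d4", "date": "2025-01-01T12:00:00Z"}',
    '{"type": "request_stream", "id": "a1b2c3d4", "date": "2025-01-01T12:00:01Z"}',
    '{"type": "start", "date": "2025-01-01T11:59:00Z"}',
]
for request_id, events in group_by_request(lines).items():
    print(request_id, [e["type"] for e in events])
# prints: a1b2c3d4 ['request', 'request_stream', 'background']
```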

Do and Don’t for correlation

Do propagate x-request-id across client → service → downstreams when possible.
Do use request_stream durations to account for total user‑perceived latency.
Don’t generate your own request IDs for the same hop; prefer the provided one.

Reading the logs (what to look for)

High latency: Inspect execution_time_ms on the request event. If the response was streamed, also sum request_stream durations. Combine with OTel spans to locate downstream delays (model provider, S3, etc.).
Errors: Look for level=error or level=critical together with error_detail (formatted exceptions). With OTel, the span is marked as an error and carries the attributes error=true and error.message.

When to open a GitHub issue

If you encounter level=critical events, capture representative JSON log lines (redacting sensitive data) and open an issue at https://github.com/stdapi-ai/stdapi.ai/issues. Include information about the failing request to help reproduce the issue.

Payload issues: Temporarily enable LOG_REQUEST_PARAMS=true to validate requests/responses, then disable.
Client identification: client_user_agent and optional request_user_id / request_org_id help tie requests to users.
Routing confirmation: model_id and voice_id confirm which provider/model/voice handled the request.
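
For the high-latency check, the durations of one request can be added up as in the sketch below. The function name `latency_breakdown` is hypothetical. Whether request and request_stream durations overlap is not stated here, so the sketch reports them separately as well as summed, and it ignores background events.

```python
def latency_breakdown(events):
    """Sum execution_time_ms for the request and request_stream events of one id."""
    request_ms = sum(
        e.get("execution_time_ms", 0) for e in events if e.get("type") == "request"
    )
    stream_ms = sum(
        e.get("execution_time_ms", 0) for e in events if e.get("type") == "request_stream"
    )
    return {
        "request_ms": request_ms,
        "stream_ms": stream_ms,
        "total_ms": request_ms + stream_ms,
    }


events = [
    {"type": "request", "id": "a1b2c3d4", "execution_time_ms": 842},
    {"type": "request_stream", "id": "a1b2c3d4", "execution_time_ms": 1234},
    {"type": "background", "id": "a1b2c3d4", "execution_time_ms": 97},
]
print(latency_breakdown(events))
# prints: {'request_ms': 842, 'stream_ms': 1234, 'total_ms': 2076}
```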

Controlling log verbosity

The LOG_LEVEL environment variable controls which log events are written to STDOUT. Set it to filter out lower-severity events. For detailed configuration options, see the Logging Level section in the Configuration Guide.

info (default): All events are logged (info, warning, error, critical)
warning: Only warnings and higher severity (warning, error, critical) - recommended for production
error: Only errors and critical events
critical: Only critical events
disabled: No log output (not recommended)

# Production example: reduce log volume while maintaining visibility
export LOG_LEVEL=warning
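
The threshold semantics listed above can be expressed as a small predicate, which is handy for predicting which events survive a given setting. This is an illustrative sketch, not the service's implementation; the names `SEVERITY` and `is_emitted` are hypothetical.

```python
# Severity order as described in the list above.
SEVERITY = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def is_emitted(event_level: str, log_level: str = "info") -> bool:
    """Return True if an event at event_level is written under LOG_LEVEL=log_level."""
    if log_level == "disabled":
        return False
    return SEVERITY[event_level] >= SEVERITY[log_level]


for level in ("info", "warning", "error", "critical"):
    print(level, is_emitted(level, "warning"))
```

With LOG_LEVEL=warning, the loop reports that info events are dropped while warning, error, and critical events are kept.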

Reducing CloudWatch Costs

In high-traffic production environments, setting LOG_LEVEL=warning or LOG_LEVEL=error can significantly reduce CloudWatch Logs ingestion and storage costs by filtering out routine info-level events. This is especially effective when combined with appropriate retention policies.

Additionally, infrastructure routes are automatically excluded from logging to reduce noise: /docs, /favicon.ico, /health, /openapi.json, /redoc.

OpenTelemetry integration

When OTEL_ENABLED=true:

A span is created per request and for streaming/background blocks.
Spans carry request.id and server.id for correlation.
A 4xx or 5xx status_code marks the span with an error status.
Sampling is controlled via OTEL_SAMPLE_RATE.

For exporters and advanced setup, rely on standard OTel environment variables supported by your exporter/backend.

Example events

Example — Request with payload logging enabled

{
  "type": "request",
  "level": "info",
  "date": "2025-01-01T12:00:00Z",
  "server_id": "stdapi-1",
  "id": "a1b2c3d4",
  "method": "POST",
  "path": "/v1/chat/completions",
  "status_code": 200,
  "model_id": "anthropic.claude-sonnet-4-5-20250929-v1:0",
  "execution_time_ms": 842,
  "request_params": {"messages": [{"role": "user", "content": "..."}]},
  "request_response": {"id": "cmpl_...", "choices": [...], "usage": {...}}
}

Example — Streaming segment (SSE/audio)

{
  "type": "request_stream",
  "level": "info",
  "date": "2025-01-01T12:00:01Z",
  "server_id": "stdapi-1",
  "id": "a1b2c3d4",
  "execution_time_ms": 1234
}

Example — Background work correlated to a request

{
  "type": "background",
  "level": "info",
  "date": "2025-01-01T12:00:02Z",
  "server_id": "stdapi-1",
  "id": "a1b2c3d4",
  "event": "image-upload-s3",
  "execution_time_ms": 97
}

Example — Error with captured details

{
  "type": "request",
  "level": "critical",
  "date": "2025-01-01T12:00:05Z",
  "server_id": "stdapi-1",
  "id": "e9f0a1b2",
  "method": "POST",
  "path": "/v1/images/edits",
  "status_code": 500,
  "error_detail": ["Traceback (most recent call last): ..."],
  "execution_time_ms": 12
}

CloudWatch Logs Insights: ready‑to‑use queries

These examples assume JSON logs in CloudWatch Logs (default with ECS awslogs/awsfirelens). Adjust the log group and time range.

1) Follow a specific request across request/stream/background

fields @timestamp, type, level, path, event, status_code, execution_time_ms
| filter id = "<paste-request-id>"
| sort @timestamp asc

Tip

Copy the request ID from the x-request-id response header or any request log line. Expect one request, optional request_stream entries, and background entries.

2) Find recent errors with context

fields @timestamp, level, type, path, status_code, id, error_detail
| filter level in ["error", "critical"]
| sort @timestamp desc
| limit 100

3) High-latency endpoints (P95/P99)

fields path, execution_time_ms
| filter type = "request" and ispresent(execution_time_ms)
| stats pct(execution_time_ms, 95) as p95_ms, pct(execution_time_ms, 99) as p99_ms, avg(execution_time_ms) as avg_ms by path
| sort p95_ms desc

AWS service-level logs and metrics

Beyond stdapi.ai logs and OTel traces, use AWS-native signals from the underlying AI services to validate provider behavior, monitor throttling/latency, and audit access. Enable only what you need: some options can capture content and increase costs. For full, up-to-date details, refer to the official AWS documentation.

CloudWatch Metrics: Throughput, latency, throttling, and error rates per service/region.
CloudTrail: Control-plane auditing of API calls (who did what, when, from where).
Content/Invocation logging: Optional features that may record inputs/outputs. Use with caution and encryption/retention controls.
Correlation: Service logs won’t include the stdapi.ai x-request-id. Correlate by time window, region, model/voice/job identifiers, and volume. Use the stdapi.ai model_id, voice_id, and execution_time_ms fields to narrow windows.
Amazon Bedrock invocation logging (optional): Export invocation metadata and, if enabled, content to CloudWatch Logs/S3/Firehose. Treat prompts/completions as sensitive; manage retention and KMS encryption.

Troubleshooting checklist

No logs visible: Ensure you are reading container STDOUT. On ECS/Kubernetes, verify the log driver and retention.
Missing request_params: Confirm LOG_REQUEST_PARAMS=true and restart after changing environment variables.
No traces: Verify OTEL_ENABLED=true and that exporters are configured and reachable.
Correlation missed: Ensure clients read and propagate x-request-id for multi‑hop requests.