2026-06-01 10:52:06 -05:00


BlueLaminate observability stack (standalone, Proxmox LXC)

A self-contained Grafana LGTM-style stack, with Prometheus standing in for Mimir: Loki (logs), Grafana (dashboards), Tempo (traces), and Prometheus (metrics), fronted by Grafana Alloy as a single OTLP ingress. It runs as native systemd services on its own Proxmox LXC, decoupled from the app's docker-compose.yml. The C2 and Python workers push OpenTelemetry data to Alloy, which fans the three signals out to the backends; Grafana ties them together.

  C2 / workers  ──OTLP(4317 grpc / 4318 http)──►  Alloy  ──┬─► Loki        (logs,    :3100)
  (other host)                                             ├─► Prometheus  (metrics, :9090, remote-write)
                                                           └─► Tempo       (traces,  :4319 OTLP → store)
                                                                                │
                                                                          Grafana (:3000)
                                                                   datasources: Loki + Prometheus + Tempo

Only Alloy's OTLP ports (4317/4318) and Grafana (3000) need to be reachable from the LAN. Loki and Tempo bind localhost; their only clients are Alloy (writes) and Grafana (queries), both running on the same LXC.
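To check that this exposure actually holds, a small standard-library Python sketch like the one below can be run from another LAN machine (for example the app host). It attempts plain TCP connects and reports any port that is open or closed contrary to the layout above. The port lists are transcribed from this README; the script name `check_exposure.py` is hypothetical, and leaving Prometheus's :9090 out of both lists is an assumption, since the README doesn't say whether it is LAN-facing.

```python
import socket
import sys

# Transcribed from this README: ports that should answer from the LAN...
LAN_PORTS = {4317: "Alloy OTLP gRPC", 4318: "Alloy OTLP HTTP", 3000: "Grafana"}
# ...and ports bound to localhost on the LXC, which should NOT answer from the LAN.
LOCAL_ONLY_PORTS = {3100: "Loki", 3200: "Tempo"}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable all count as "not open".
        return False


def audit(host: str) -> list[str]:
    """Return a list of mismatches; an empty list means the layout holds."""
    problems = []
    for port, name in LAN_PORTS.items():
        if not port_open(host, port):
            problems.append(f"{name} (:{port}) should be reachable but is not")
    for port, name in LOCAL_ONLY_PORTS.items():
        if port_open(host, port):
            problems.append(f"{name} (:{port}) is reachable but should be localhost-only")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_exposure.py <lxc-ip>   (from a LAN machine, not the LXC itself)
    issues = audit(sys.argv[1])
    print("\n".join(issues) if issues else "port layout matches the README")
```

Running it inside the LXC would be misleading, because Loki and Tempo are deliberately reachable on localhost there.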

Layout

monitoring/
  install.sh                        # idempotent provisioner — run as root in the LXC
  alloy/config.alloy                # OTLP receiver → batch → Loki / Prometheus / Tempo
  prometheus/prometheus.yml         # self-monitoring scrapes (app metrics arrive via remote-write)
  prometheus/prometheus.service     # systemd unit: remote-write + OTLP receivers, 15d retention
  loki/loki.yml                     # single-binary, filesystem store, 15d retention
  tempo/tempo.yml                   # OTLP on :4319, local store, metrics_generator → Prometheus
  grafana/datasources.yml           # Loki + Prometheus (default) + Tempo, correlated
  grafana/dashboards.yml            # file-based dashboard provider
  grafana/dashboards/overview.json  # starter dashboard (target health, span rates, logs)

1. Create the LXC (run on the Proxmox host)

Reference only: adjust the storage, bridge, and template names to your node. An unprivileged Debian 13 container with ~2 vCPU / 2-4 GB RAM / 20-40 GB disk is plenty; the example below uses 2 cores, 4 GB RAM, and a 32 GB root disk.

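# Make sure a Debian 13 template is present (once):
#   pveam update && pveam available | grep debian-13
#   pveam download local debian-13-standard_<version>_amd64.tar.zst   # exact name from 'pveam available'; no globs
# Use the same file name in the pct create line below.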

pct create 910 local:vztmpl/debian-13-standard_13.0-1_amd64.tar.zst \
  --hostname grafana-lxc \
  --cores 2 --memory 4096 --swap 1024 \
  --rootfs local-lvm:32 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp \
  --unprivileged 1 --features nesting=0 \
  --onboot 1 --start 1

# (Optional) give it a static IP instead of dhcp, e.g.
#   --net0 name=eth0,bridge=vmbr0,ip=192.168.1.50/24,gw=192.168.1.1

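nesting=0 is fine: there's no Docker here, just native binaries. (If Proxmox logs a warning at container start that this systemd version may need nesting, switching to --features nesting=1 is the usual remedy.)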

2. Deploy the stack (inside the LXC)

pct enter 910            # or: ssh root@<lxc-ip>
apt-get update && apt-get install -y git
git clone <this-repo-url> /opt/bluelaminate
cd /opt/bluelaminate/monitoring
bash install.sh          # already root; the Debian LXC template doesn't ship sudo

No git on the LXC? Copy just this folder over instead: scp -r monitoring root@<lxc-ip>:/opt/monitoring && ssh root@<lxc-ip> 'cd /opt/monitoring && bash install.sh'

The script adds the Grafana apt repo, installs grafana/loki/tempo/alloy, drops the Prometheus release binary into /opt/prometheus, lays our configs over the packaged defaults, and enables all five services. It prints the URLs and the OTLP endpoint when done.

3. Verify

systemctl is-active grafana-server loki tempo prometheus alloy   # all → active
curl -s localhost:3100/ready      # Loki  → ready
curl -s localhost:3200/ready      # Tempo → ready
curl -s localhost:9090/-/ready    # Prometheus → Ready
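The same readiness checks can be scripted, which helps right after install.sh while the services are still warming up (Loki and Tempo answer 503 until they're ready). A minimal standard-library sketch, assuming it runs inside the LXC against the ports above; the attempt count and delay are arbitrary choices, not values from install.sh:

```python
import time
import urllib.request

# Readiness endpoints from the Verify step above (run inside the LXC).
CHECKS = {
    "Loki": "http://localhost:3100/ready",
    "Tempo": "http://localhost:3200/ready",
    "Prometheus": "http://localhost:9090/-/ready",
}

# These are local endpoints, so ignore any http_proxy settings in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 3.0) -> bool:
    """True if `url` answers HTTP 200 within `timeout` seconds."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # HTTPError (e.g. a 503 while starting), refused connections,
        # and timeouts are all OSError subclasses.
        return False


def wait_ready(checks: dict[str, str], attempts: int = 20, delay: float = 3.0) -> dict[str, bool]:
    """Poll every endpoint until all are ready or `attempts` rounds have passed."""
    status = {name: False for name in checks}
    for attempt in range(attempts):
        for name, url in checks.items():
            if not status[name]:
                status[name] = probe(url)
        if all(status.values()) or attempt == attempts - 1:
            break
        time.sleep(delay)
    return status
```

Calling `print(wait_ready(CHECKS))` gives one pass/fail per backend after at most about a minute with these defaults.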

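Open Grafana at http://<lxc-ip>:3000 (first login admin / admin; change it). The three datasources and the BlueLaminate → Stack Overview dashboard are provisioned automatically. Alloy's pipeline graph is served on port 12345: at http://<lxc-ip>:12345 if install.sh binds Alloy's UI to all interfaces, otherwise only on 127.0.0.1:12345 (Alloy's own default), reachable through an SSH tunnel such as ssh -L 12345:localhost:12345 root@<lxc-ip>.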

End-to-end OTLP smoke test (no app changes needed)

Send synthetic telemetry from any machine that can reach the LXC, using the OpenTelemetry telemetrygen tool (go install github.com/open-telemetry/opentelemetry-collector-contrib/cmd/telemetrygen@latest):

telemetrygen traces  --otlp-endpoint <lxc-ip>:4317 --otlp-insecure --traces 5
telemetrygen metrics --otlp-endpoint <lxc-ip>:4317 --otlp-insecure --duration 10s
telemetrygen logs    --otlp-endpoint <lxc-ip>:4317 --otlp-insecure --logs 5

Then in Grafana Explore: pick Tempo (search recent traces), Prometheus (query gen), and Loki ({service_name=~".+"}) — seeing data in all three confirms the full fan-out before any app is wired up.
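If you'd rather confirm the fan-out without clicking through Explore, the backends' HTTP query APIs answer the same questions: Prometheus /api/v1/query, Loki /loki/api/v1/query_range (which defaults to the last hour), and Tempo /api/search. A rough sketch, assuming it runs inside the LXC (Loki and Tempo bind localhost) and that telemetrygen's metric lands in Prometheus as gen, as above:

```python
import json
import urllib.parse
import urllib.request

# Assumption: run inside the LXC, because Loki and Tempo bind localhost only.
QUERIES = {
    "Prometheus": "http://localhost:9090/api/v1/query?"
    + urllib.parse.urlencode({"query": "gen"}),
    "Loki": "http://localhost:3100/loki/api/v1/query_range?"
    + urllib.parse.urlencode({"query": '{service_name=~".+"}', "limit": 5}),
    "Tempo": "http://localhost:3200/api/search?" + urllib.parse.urlencode({"limit": 5}),
}

# Local endpoints: ignore any http_proxy settings in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def count_results(payload: dict) -> int:
    """Count series (Prometheus), streams (Loki), or traces (Tempo) in a response."""
    if "traces" in payload:                    # Tempo search response
        return len(payload["traces"] or [])
    if payload.get("status") != "success":     # Prometheus/Loki error envelope
        return 0
    return len(payload.get("data", {}).get("result") or [])


def smoke_test() -> dict[str, int]:
    """Query each backend once; -1 marks a request that failed outright."""
    counts = {}
    for name, url in QUERIES.items():
        try:
            with _OPENER.open(url, timeout=5) as resp:
                counts[name] = count_results(json.load(resp))
        except (OSError, ValueError):
            # Connection/HTTP errors are OSError; a non-JSON body raises ValueError.
            counts[name] = -1
    return counts


if __name__ == "__main__":
    for backend, n in smoke_test().items():
        print(f"{backend:<11} {n} result(s)")
```

A positive count for all three backends is the scripted equivalent of seeing data in each Explore tab.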

4. Wiring the apps later (the OTLP contract)

This deployment is stack-only; the C2 and workers aren't instrumented yet. When you do, point them at this LXC — nothing here changes. The drop-in:

.NET C2 (BlueLaminate.C2): add the packages OpenTelemetry.Extensions.Hosting and OpenTelemetry.Exporter.OpenTelemetryProtocol, plus the OpenTelemetry.Instrumentation.AspNetCore, OpenTelemetry.Instrumentation.Http, and OpenTelemetry.Instrumentation.Runtime instrumentations, then call builder.Services.AddOpenTelemetry().WithTracing(...).WithMetrics(...) and builder.Logging.AddOpenTelemetry(...). Configure via env:

OTEL_EXPORTER_OTLP_ENDPOINT=http://<lxc-ip>:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_SERVICE_NAME=bluelaminate-c2

Python workers (worker/csmoney_worker.py, skinland_worker.py): add opentelemetry-distro and opentelemetry-exporter-otlp to worker/requirements.txt (plus the instrumentation packages opentelemetry-bootstrap -a install selects for the worker's libraries), and run under opentelemetry-instrument python csmoney_worker.py with the same env vars and OTEL_SERVICE_NAME=csmoney-worker / skinland-worker. (Today the workers emit structured JSON logs to stdout; LOG_JSON=1 is set by default in the image. An interim option, instead of instrumenting in-process, is to ship their Docker stdout to Loki with an Alloy loki.source.docker component on the app host, forwarding to a loki.process component whose stage.json block parses those JSON fields.)

Add those env vars to the matching docker-compose.yml services when the instrumentation lands.
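For reference, the contract above boils down to three variables per service. The sketch below (helper names hypothetical) renders them for each service and shows why the endpoint carries no path: with http/protobuf, the OTLP exporter spec has SDKs append /v1/traces, /v1/metrics, and /v1/logs to OTEL_EXPORTER_OTLP_ENDPOINT. One hedge for the Python workers: opentelemetry-instrument generally only exports log records when OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true is also set, so check that against the SDK version you pin.

```python
import sys

# From this README: the service names and the OTLP/HTTP port on the LXC.
SERVICES = ("bluelaminate-c2", "csmoney-worker", "skinland-worker")
SIGNALS = ("traces", "metrics", "logs")


def otel_env(lxc_ip: str, service_name: str) -> dict[str, str]:
    """The three variables each app needs (hypothetical helper)."""
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"http://{lxc_ip}:4318",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
        "OTEL_SERVICE_NAME": service_name,
    }


def signal_urls(endpoint: str) -> dict[str, str]:
    """Per-signal URLs an OTLP/HTTP exporter derives from the base endpoint.

    The exporter spec appends /v1/<signal> to OTEL_EXPORTER_OTLP_ENDPOINT
    (but not to the per-signal OTEL_EXPORTER_OTLP_<SIGNAL>_ENDPOINT variants).
    """
    base = endpoint.rstrip("/")
    return {signal: f"{base}/v1/{signal}" for signal in SIGNALS}


if __name__ == "__main__":
    # 192.168.1.50 is the example static IP from the LXC section.
    ip = sys.argv[1] if len(sys.argv) > 1 else "192.168.1.50"
    for service in SERVICES:
        print(f"# {service}")
        for key, value in otel_env(ip, service).items():
            print(f"{key}={value}")
```

The printed KEY=value lines match the environment-list format docker-compose.yml accepts.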

Hardening

  • Firewall the OTLP ports. 4317/4318 are bound to 0.0.0.0. Restrict them to the app host, e.g. ufw allow from <app-host-ip> to any port 4317,4318 proto tcp.
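  • Auth on ingest (optional). Add an otelcol.auth.bearer component to alloy/config.alloy, reference its handler from otelcol.receiver.otlp (auth = otelcol.auth.bearer.<name>.handler in the grpc and http blocks), and have the apps send a matching OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer%20<token> (the OTel spec percent-encodes header values, hence %20 for the space). Over plain HTTP the token travels in cleartext, so keep the firewall rule above as well.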
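  • Grafana password. Change admin on first login. To preseed it instead, set admin_password under [security] in /etc/grafana/grafana.ini (or GF_SECURITY_ADMIN_PASSWORD in /etc/default/grafana-server) before Grafana first starts; it only applies when the admin user is created. Afterwards, use grafana cli admin reset-admin-password <new-password>.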
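For the optional bearer-token setup, the token and the matching header value can be generated with a short sketch like this. The 32-byte token length is an arbitrary choice, and the same token would go into the otelcol.auth.bearer component's token argument in alloy/config.alloy.

```python
import secrets
import urllib.parse


def new_ingest_token(nbytes: int = 32) -> str:
    """Random URL-safe token shared by Alloy and the apps."""
    return secrets.token_urlsafe(nbytes)


def otlp_headers_value(token: str) -> str:
    """Value for OTEL_EXPORTER_OTLP_HEADERS carrying a bearer token.

    The OTel spec reads this variable as comma-separated key=value pairs with
    percent-encoded values, so the space after "Bearer" becomes %20.
    """
    return "Authorization=" + urllib.parse.quote(f"Bearer {token}", safe="")


if __name__ == "__main__":
    token = new_ingest_token()
    print(f"token for otelcol.auth.bearer in alloy/config.alloy: {token}")
    print(f"OTEL_EXPORTER_OTLP_HEADERS={otlp_headers_value(token)}")
```

token_urlsafe only emits letters, digits, "-" and "_", so the token itself never needs escaping; only the space does.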

Retention / sizing

Defaults are LXC-friendly: Prometheus 15d, Loki 15d, Tempo 7d. If you have the disk, raise the --storage.tsdb.retention.time flag (prometheus.service), limits_config.retention_period (loki.yml; Loki only deletes expired data when compactor.retention_enabled is true), and compactor.compaction.block_retention (tempo.yml). Re-run install.sh to apply config edits.
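For Prometheus, the upstream storage docs give a sizing rule of thumb: disk needed ≈ retention in seconds × ingested samples per second × bytes per sample, with samples averaging roughly 1 to 2 bytes after compression. A quick calculator, using the pessimistic 2-byte end of that range as its default (Loki and Tempo usage depends on log and trace volume and isn't covered):

```python
def prometheus_disk_bytes(retention_days: float, samples_per_second: float,
                          bytes_per_sample: float = 2.0) -> float:
    """Rule of thumb from the Prometheus storage docs:
    retention_seconds * ingested_samples_per_second * bytes_per_sample."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Example: the 15d default at a hypothetical 1,000 samples/s.
    gb = prometheus_disk_bytes(15, 1_000) / 1e9
    print(f"~{gb:.1f} GB")  # ~2.6 GB
```

The real ingest rate can be read from Prometheus itself with rate(prometheus_tsdb_head_samples_appended_total[5m]).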