# BlueLaminate observability stack (standalone, Proxmox LXC)

A self-contained Grafana **LGTM** stack: **L**oki (logs), **G**rafana (dashboards), **T**empo (traces), and Prometheus (**M**etrics), fronted by **Grafana Alloy** as a single OTLP ingress. It runs as native systemd services on its own Proxmox LXC, decoupled from the app's `docker-compose.yml`. The C2 and Python workers push OpenTelemetry data to Alloy, which fans the three signals out to the backends; Grafana ties them together.

```
C2 / workers ──OTLP(4317 grpc / 4318 http)──► Alloy ──┬─► Loki (logs, :3100)
(other host)                                          ├─► Prometheus (metrics, :9090, remote-write)
                                                      └─► Tempo (traces, :4319 OTLP → store)
                                                          │
                                                      Grafana (:3000)  datasources: Loki + Prometheus + Tempo
```

Only Alloy's OTLP ports (`4317`/`4318`) and Grafana (`3000`) need to be reachable from the LAN. Loki and Tempo bind localhost; Alloy is the only client that talks to them.

## Layout

```
monitoring/
  install.sh                        # idempotent provisioner; run as root in the LXC
  alloy/config.alloy                # OTLP receiver → batch → Loki / Prometheus / Tempo
  prometheus/prometheus.yml         # self-monitoring scrapes (app metrics arrive via remote-write)
  prometheus/prometheus.service     # systemd unit: remote-write + OTLP receivers, 15d retention
  loki/loki.yml                     # single-binary, filesystem store, 15d retention
  tempo/tempo.yml                   # OTLP on :4319, local store, metrics_generator → Prometheus
  grafana/datasources.yml           # Loki + Prometheus(default) + Tempo, correlated
  grafana/dashboards.yml            # file-based dashboard provider
  grafana/dashboards/overview.json  # starter dashboard (target health, span rates, logs)
```

## 1. Create the LXC (run on the Proxmox host)

Reference only: adjust the storage, bridge, and template names to your node. An unprivileged Debian 13 container with ~2 vCPU / 2–4 GB RAM / 20–40 GB disk is plenty.

```bash
# Make sure a Debian 13 template is present (once):
#   pveam update && pveam available | grep debian-13
#   pveam download local debian-13-standard_*_amd64.tar.zst

pct create 910 local:vztmpl/debian-13-standard_13.0-1_amd64.tar.zst \
  --hostname grafana-lxc \
  --cores 2 --memory 4096 --swap 1024 \
  --rootfs local-lvm:32 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp \
  --unprivileged 1 --features nesting=0 \
  --onboot 1 --start 1

# (Optional) give it a static IP instead of dhcp, e.g.
#   --net0 name=eth0,bridge=vmbr0,ip=192.168.1.50/24,gw=192.168.1.1
```

`nesting=0` is fine: there's no Docker here, just native binaries.

## 2. Deploy the stack (inside the LXC)

```bash
pct enter 910                         # or: ssh root@
apt-get update && apt-get install -y git
git clone /opt/bluelaminate
cd /opt/bluelaminate/monitoring
bash install.sh                       # already root inside the container, so no sudo needed
```

No git on the LXC? Copy just this folder over instead: `scp -r monitoring root@:/opt/monitoring && ssh root@ 'cd /opt/monitoring && bash install.sh'`

The script adds the Grafana apt repo, installs grafana/loki/tempo/alloy, drops the Prometheus release binary into `/opt/prometheus`, lays our configs over the packaged defaults, and enables all five services. It prints the URLs and the OTLP endpoint when done.

## 3. Verify

```bash
systemctl is-active grafana-server loki tempo prometheus alloy   # all → active
curl -s localhost:3100/ready          # Loki       → ready
curl -s localhost:3200/ready          # Tempo      → ready
curl -s localhost:9090/-/ready        # Prometheus → Ready
```

Open Grafana at `http://:3000` (first login `admin` / `admin`; change it). The three datasources and the **BlueLaminate → Stack Overview** dashboard are provisioned automatically. Alloy's pipeline graph is at `http://:12345`.

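
The backends can take a little while to report ready after `install.sh` finishes, so a one-shot `curl` may fail even on a healthy install. The following is a minimal sketch, not part of `install.sh`, that polls the same three readiness endpoints with retries. The function names `wait_ready` and `check_stack`, the default of 30 tries, and the 2-second delay are all assumptions made for illustration.

```bash
# Sketch only: wait_ready and check_stack are hypothetical helpers,
# not something install.sh provides.

# wait_ready NAME URL [TRIES] [DELAY_SECONDS]
# Returns 0 as soon as URL answers with a 2xx status, 1 after TRIES failures.
wait_ready() {
  local name="$1" url="$2" tries="${3:-30}" delay="${4:-2}"
  local i=1
  while [ "$i" -le "$tries" ]; do
    # -f makes curl exit non-zero on HTTP errors such as a 503 during startup.
    if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
      echo "$name ready"
      return 0
    fi
    # No pause after the final failed attempt.
    if [ "$i" -lt "$tries" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  echo "$name not ready after $tries attempts" >&2
  return 1
}

# check_stack [TRIES] [DELAY_SECONDS]
# Polls the three readiness endpoints from the Verify step; non-zero if any fails.
check_stack() {
  local failed=0
  wait_ready Loki       http://localhost:3100/ready   "$@" || failed=1
  wait_ready Tempo      http://localhost:3200/ready   "$@" || failed=1
  wait_ready Prometheus http://localhost:9090/-/ready "$@" || failed=1
  return "$failed"
}
```

Run inside the LXC, `check_stack` uses the defaults, while `check_stack 5 1` gives up after five one-second attempts per backend.
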
### End-to-end OTLP smoke test (no app changes needed)

Send synthetic telemetry from any machine that can reach the LXC, using the OpenTelemetry `telemetrygen` tool (`go install github.com/open-telemetry/opentelemetry-collector-contrib/cmd/telemetrygen@latest`):

```bash
telemetrygen traces  --otlp-endpoint :4317 --otlp-insecure --traces 5
telemetrygen metrics --otlp-endpoint :4317 --otlp-insecure --duration 10s
telemetrygen logs    --otlp-endpoint :4317 --otlp-insecure --logs 5
```

Then in Grafana **Explore**: pick **Tempo** (search recent traces), **Prometheus** (query `gen`), and **Loki** (`{service_name=~".+"}`). Seeing data in all three confirms the full fan-out before any app is wired up.

## 4. Wiring the apps later (the OTLP contract)

This deployment is **stack-only**; the C2 and workers aren't instrumented yet. When you instrument them, point them at this LXC; nothing here changes. The drop-in:

**.NET C2** (`BlueLaminate.C2`): add packages `OpenTelemetry.Extensions.Hosting`, `OpenTelemetry.Exporter.OpenTelemetryProtocol`, and the `OpenTelemetry.Instrumentation.AspNetCore` / `.Http` / runtime instrumentations, then `builder.Services.AddOpenTelemetry().WithTracing(...).WithMetrics(...)` plus `builder.Logging.AddOpenTelemetry(...)`. Configure via env:

```
OTEL_EXPORTER_OTLP_ENDPOINT=http://:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_SERVICE_NAME=bluelaminate-c2
```

**Python workers** (`worker/csmoney_worker.py`, `skinland_worker.py`): add `opentelemetry-distro` and `opentelemetry-exporter-otlp` to `worker/requirements.txt`, run under `opentelemetry-instrument python csmoney_worker.py`, and use the same env vars with `OTEL_SERVICE_NAME=csmoney-worker` / `skinland-worker`. Today the workers emit structured JSON logs to stdout (`LOG_JSON=1`, set by default in the image). As an interim option, instead of instrumenting in-process, ship their Docker stdout to Loki with an Alloy `loki.source.docker` component on the app host; the JSON fields can then be parsed directly.

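
Since the C2 and both workers share the same three settings and differ only in service name, the env block can be generated rather than copied by hand. This is a small sketch under assumptions: `otel_env` is a hypothetical helper, and the host passed to it stands in for the LXC address, which this README leaves to you.

```bash
# Sketch only: otel_env is a hypothetical helper, not part of this repo.

# otel_env HOST SERVICE_NAME
# Prints the three OTLP settings from the contract above as KEY=VALUE lines.
otel_env() {
  local host="$1" service="$2"
  if [ -z "$host" ] || [ -z "$service" ]; then
    echo "usage: otel_env HOST SERVICE_NAME" >&2
    return 2
  fi
  # 4318 is Alloy's OTLP/HTTP port, matching the http/protobuf protocol below.
  printf 'OTEL_EXPORTER_OTLP_ENDPOINT=http://%s:4318\n' "$host"
  printf 'OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf\n'
  printf 'OTEL_SERVICE_NAME=%s\n' "$service"
}
```

For example, `otel_env 192.0.2.10 csmoney-worker > csmoney-worker.otel.env` writes a file that a Compose service could load through `env_file:`. Both `192.0.2.10` (a documentation-only address) and the file name are placeholders.
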

Add those env vars to the matching `docker-compose.yml` services when the instrumentation lands.

## Hardening

- **Firewall the OTLP ports.** `4317`/`4318` are bound to `0.0.0.0`. Restrict them to the app host, e.g. `ufw allow from to any port 4317,4318 proto tcp`.
- **Auth on ingest (optional).** Add an `otelcol.auth.bearer` handler to `otelcol.receiver.otlp` in `alloy/config.alloy` and send a matching `OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer ` from the apps.
- **Grafana password.** Change `admin` on first login, or set `admin_password` under `[security]` in `/etc/grafana/grafana.ini` before Grafana's first start. `GF_SECURITY_ADMIN_PASSWORD` is the equivalent environment variable override; it belongs in the service environment, not in the ini file.

## Retention / sizing

Defaults are LXC-friendly: Prometheus **15d**, Loki **15d**, Tempo **7d**. If you have the disk, bump the `--storage.tsdb.retention.time` flag (`prometheus.service`), `limits_config.retention_period` (`loki.yml`), and `compactor.compaction.block_retention` (`tempo.yml`). Re-run `install.sh` to apply config edits.
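
To judge whether a longer retention fits the root filesystem, a rough upper bound is daily on-disk growth multiplied by the retention window. The sketch below only does that arithmetic. `disk_needed_gib` is a hypothetical helper, and the daily growth figure is something you would measure yourself (for example by comparing `du` output on a store's data directory a day apart). It ignores compaction and any write-ahead-log overhead.

```bash
# Sketch only: disk_needed_gib is a hypothetical helper for back-of-envelope sizing.

# disk_needed_gib DAILY_MIB RETENTION_DAYS
# Prints daily growth (MiB) times retention (days), rounded up to whole GiB.
disk_needed_gib() {
  local daily_mib="$1" days="$2"
  # Adding 1023 before the integer division rounds up instead of truncating.
  echo $(( (daily_mib * days + 1023) / 1024 ))
}
```

With an assumed 200 MiB/day of Loki growth, `disk_needed_gib 200 15` prints `3`. Summing the results for all three stores and comparing against the 32 GB rootfs from step 1 shows how much headroom is left.
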