bob 94177f9a8c Fix worker proxy relay leak and enable noVNC under --scale
_relay waited for both pipe directions (gather), leaking a task holding two sockets on every half-closed tunnel — visible as a flood of pending-task lines under load. Tear the tunnel down when either side closes (FIRST_COMPLETED + close both writers), matching the .NET LocalForwardingProxy's WhenAny. Also move the worker's noVNC to an ephemeral host port so replicas don't collide under 'docker compose up --scale worker=N'.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 15:12:51 -05:00

cs.money worker (Python)

The browser/Cloudflare layer for the cs.money scraper. .NET stays the C2 (orchestration, proxy/IP allocation, DB, the sweep loop); this worker is the only component that drives a browser and defeats Cloudflare, because the effective anti-detection tooling (nodriver/undetected-chromedriver, TLS impersonation) only exists in Python/Go, not .NET.

Why nodriver

.NET Selenium got insta-challenged by Cloudflare's managed challenge because msedgedriver controls the browser via the WebDriver protocol, leaving navigator.webdriver and chromedriver cdc_ artifacts that Cloudflare keys on. nodriver drives a normal Chromium directly over CDP (no chromedriver) and patches those tells, so it passes where Selenium loops.
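To check the navigator.webdriver tell directly, a small probe along these lines may help. It is a sketch, not code from this repo: probe_webdriver_flag is a made-up name, and it assumes nodriver is installed and can find a local Chromium.

```python
# JavaScript evaluated in the page: the automation flag named above as one of
# the signals Cloudflare keys on.
WEBDRIVER_PROBE_JS = "navigator.webdriver"


async def probe_webdriver_flag(url: str):
    """Open url with nodriver and return the page's navigator.webdriver value.

    Hypothetical helper for illustration; it is not part of poc.py.
    """
    # Third-party dependency, imported lazily so this module loads without it.
    import nodriver as uc

    browser = await uc.start()        # launches a regular Chromium, no chromedriver
    try:
        tab = await browser.get(url)  # navigate the first tab over CDP
        return await tab.evaluate(WEBDRIVER_PROBE_JS)
    finally:
        browser.stop()


# Usage (needs nodriver and a local browser):
#   import nodriver as uc
#   print(uc.loop().run_until_complete(probe_webdriver_flag("about:blank")))
```

Running the same expression in the Selenium-driven browser gives the other half of the comparison.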

Step 1: prove it (current)

poc.py proves nodriver can clear cs.money's Cloudflare and fetch the listings API before we build the full pull-based fleet.

cd worker
py -m venv .venv
.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python poc.py

A Chromium window opens on the market. Solve the Cloudflare check if shown; the script waits, then pages deep into sell-orders (PAGES), reporting how far the warm session survives before any re-challenge and confirming full float precision. Output lands in worker/captures/.
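The "how far does the warm session survive" measurement can be sketched as a paging loop that stops at the first failed page. This is an illustration under assumptions, not poc.py's code: fetch_page is a hypothetical callable returning (HTTP status, items), and a non-200 status stands in for a Cloudflare re-challenge, which may surface differently in practice.

```python
from typing import Callable, List, Tuple

# Hypothetical page result: (HTTP status, list of items).
PageResult = Tuple[int, List[dict]]


def page_until_challenged(
    fetch_page: Callable[[int], PageResult], pages: int
) -> Tuple[int, List[dict]]:
    """Fetch pages 0..pages-1, stopping at the first non-200 response.

    Returns (pages_fetched_ok, items) so the caller can report how deep the
    warm session got before it was challenged again.
    """
    items: List[dict] = []
    for page in range(pages):
        status, batch = fetch_page(page)
        if status != 200:      # assumed re-challenge signal
            return page, items
        items.extend(batch)
    return pages, items
```

With a fake fetch_page that returns 200 for pages 0 to 2 and 403 afterwards, page_until_challenged(fake, 10) reports three surviving pages.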

Targeted skin+wear search. cs.money search is free-text on the page (?search=cyber+security+ft). Set SEARCH and the PoC navigates there, captures the actual filtered sell-orders API request the page fires (so we learn the real filter params instead of guessing), prints it, then pages that filtered API:

$env:SEARCH="cyber security ft"; python poc.py   # FT M4A4 Cyber Security only

The >>> DISCOVERED sell-orders API call line shows how the search maps to API params — that's how the C2 will build targeted jobs.
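As a rough illustration of that mapping, the sketch below builds the ?search= page URL and splits a captured request URL into its parameters. The function names and the example.invalid URLs are placeholders, and the name and limit parameters are invented for the example; the real parameters are whatever the DISCOVERED line prints.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def search_url(market_url: str, search: str) -> str:
    """Append the free-text search as a ?search= query (spaces become '+')."""
    return f"{market_url}?{urlencode({'search': search})}"


def api_params(request_url: str) -> dict:
    """Split a captured request URL into its query parameters.

    Repeated keys collapse to the last value, which is fine for a quick look.
    """
    return dict(parse_qsl(urlsplit(request_url).query))


# Placeholder URLs, not the real cs.money endpoints.
page = search_url("https://example.invalid/market", "cyber security ft")
params = api_params("https://example.invalid/api?name=cyber+security&limit=60")
```

Here page ends in ?search=cyber+security+ft, matching the query form shown above, and params is a plain dict the C2 could reuse when composing a targeted job.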

Run on your own IP first (no proxy): that's the clean A/B against the Selenium run. If auto-detect can't find a browser, set BROWSER_PATH to the Chrome or Edge executable (e.g. C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe).
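One way a BROWSER_PATH override with a fallback could look is sketched below; poc.py may resolve the browser differently. resolve_browser_path is a hypothetical helper, and the Chrome path is an assumed default install location (only the Edge path comes from this README).

```python
import os
from typing import Callable, Iterable, Mapping, Optional

# Candidate install locations. The Chrome entry is an assumption.
DEFAULT_CANDIDATES = (
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    r"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe",
)


def resolve_browser_path(
    env: Mapping[str, str] = os.environ,
    candidates: Iterable[str] = DEFAULT_CANDIDATES,
    exists: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return BROWSER_PATH if set, else the first candidate that exists."""
    override = env.get("BROWSER_PATH")
    if override:
        return override            # an explicit setting always wins
    for path in candidates:
        if exists(path):
            return path
    return None                    # caller decides how to report "no browser"
```

Passing env, candidates and exists as parameters keeps the lookup testable without touching the real filesystem.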

Step 2: the pull fleet

worker.py holds one warm nodriver session and loops: poll the .NET C2 for a job (a skin+wear search), scrape that search's sell-orders via in-page fetch, and post the items back. The C2 (BlueLaminate.C2) picks the stalest skin+wear from the catalogue and, when results come back, persists them to cs_money_listings + price_history (Source = "csmoney"), stamping SkinCondition.ListingsSweptAt.
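The poll, scrape, post cycle and the stalest-first pick can be sketched with an in-memory stand-in for the C2. Everything here is illustrative: FakeC2, run_worker and fake_scrape are made-up names, the real C2 is the .NET service reached over HTTP, and the real scrape runs inside the warm browser session.

```python
import asyncio
from datetime import datetime, timezone
from typing import Dict, List, Optional


class FakeC2:
    """In-memory stand-in for the C2: hands out the stalest search first."""

    def __init__(self, searches: Dict[str, Optional[datetime]]):
        self.swept_at = dict(searches)   # search -> last sweep (None = never)
        self.listings: Dict[str, List[dict]] = {}

    async def next_job(self) -> Optional[str]:
        if not self.swept_at:
            return None
        never = datetime.min.replace(tzinfo=timezone.utc)
        # Never-swept entries sort before every real timestamp.
        return min(self.swept_at, key=lambda s: self.swept_at[s] or never)

    async def post_result(self, search: str, items: List[dict]) -> None:
        self.listings[search] = items
        self.swept_at[search] = datetime.now(timezone.utc)  # stamp the sweep


async def fake_scrape(search: str) -> List[dict]:
    """Placeholder for the in-page fetch of a search's sell-orders."""
    return [{"search": search}]


async def run_worker(c2, scrape, max_jobs: int, idle_seconds: float = 1.0) -> int:
    """Poll for a job, scrape it, post the items back; stop after max_jobs."""
    done = 0
    while done < max_jobs:
        job = await c2.next_job()
        if job is None:                  # nothing to do yet: wait and re-poll
            await asyncio.sleep(idle_seconds)
            continue
        await c2.post_result(job, await scrape(job))
        done += 1
    return done
```

Because posting a result refreshes the sweep timestamp, the just-scraped search moves to the back of the queue, and the next poll returns whichever search is now stalest.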

Run the C2 (needs Postgres migrated), then the worker:

# terminal 1 — the C2 (from repo root)
dotnet run --project BlueLaminate\BlueLaminate.C2          # serves http://localhost:5080

# terminal 2 — the worker
cd worker; .venv\Scripts\Activate.ps1
$env:WORKER_TOKEN="dev-worker-token"    # must match the C2's WorkerToken
python worker.py

The worker warms the session (you clear Cloudflare once), then runs continuously. Scale out by starting more workers (each with its own PROXY).
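Starting several workers from one script might look like the sketch below. worker_env and launch_workers are hypothetical helpers, and the proxy URL format is an assumption; only the PROXY and WORKER_TOKEN variable names come from this README. Each launched worker still needs its Cloudflare check cleared once.

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping


def worker_env(
    proxy: str, token: str, base: Mapping[str, str] = os.environ
) -> Dict[str, str]:
    """Copy the base environment and set this worker's PROXY and token."""
    env = dict(base)
    env["PROXY"] = proxy
    env["WORKER_TOKEN"] = token   # must match the C2's WorkerToken
    return env


def launch_workers(proxies: List[str], token: str) -> List[subprocess.Popen]:
    """Start one worker.py process per proxy, from the worker directory."""
    return [
        subprocess.Popen([sys.executable, "worker.py"], env=worker_env(p, token))
        for p in proxies
    ]
```

Building the environment separately from launching keeps the per-worker settings easy to inspect before any process starts.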