Add cs.money worker stack with per-worker IPRoyal residential proxy

Brings up the pull-model scraper: the .NET C2 hands skin+wear jobs to Python nodriver workers that scrape cs.money and post results back, plus the supporting Core/EFCore data model, migrations, and docker-compose orchestration.

IPRoyal proxying lets workers scale horizontally with a distinct residential exit IP each: every worker process mints its own sticky session at startup, and an in-process forwarding proxy injects the gateway auth so Chromium talks only to an auth-free localhost endpoint (zero CDP). On a Cloudflare challenge a worker rotates to a fresh session/IP and re-warms.

Verified end-to-end against live IPRoyal: distinct US residential exits per worker and IP rotation on demand.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 15:03:31 -05:00
parent eb5fb0dac7
commit dc7c3f99ae
82 changed files with 8354 additions and 571 deletions
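The commit message describes an in-process forwarding proxy that injects the gateway auth so that Chromium only ever sees an auth-free localhost endpoint. The worker's actual proxy code is not part of this excerpt, so the following is only a minimal sketch of that one step, assuming the gateway accepts standard HTTP Basic credentials in a `Proxy-Authorization` header. The function names `basic_proxy_auth` and `inject_proxy_auth` are hypothetical, and how the per-worker sticky session is encoded into the credentials is provider-specific and not shown.

```python
import base64


def basic_proxy_auth(user: str, password: str) -> str:
    """Return a Proxy-Authorization header value using the Basic scheme."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def inject_proxy_auth(request_head: bytes, user: str, password: str) -> bytes:
    """Add a Proxy-Authorization header to a raw HTTP request head.

    `request_head` must contain the full header section, including the blank
    line that terminates it (for example the CONNECT request the browser sends
    to the local endpoint). Any Proxy-Authorization header already present is
    dropped so the upstream gateway sees exactly one.
    """
    head, sep, rest = request_head.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete request head: missing blank line")
    lines = head.split(b"\r\n")
    kept = [lines[0]] + [
        ln for ln in lines[1:]
        if not ln.lower().startswith(b"proxy-authorization:")
    ]
    kept.append(b"Proxy-Authorization: " + basic_proxy_auth(user, password).encode("ascii"))
    return b"\r\n".join(kept) + sep + rest
```

A forwarder built this way would read the request head from the browser socket, pass it through `inject_proxy_auth`, write the result to the upstream gateway connection, and then relay bytes in both directions unchanged. Keeping the credentials inside the worker process means the browser needs no proxy-auth prompt or extension.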
--- a/worker/.gitattributes
+++ b/worker/.gitattributes
@@ -0,0 +1,3 @@
+# entrypoint.sh runs in a Linux container — keep LF so the shebang isn't broken by
+# Windows CRLF conversion.
+*.sh text eol=lf
--- a/worker/.gitignore
+++ b/worker/.gitignore
@@ -0,0 +1,3 @@
+.venv/
+__pycache__/
+captures/
--- a/worker/Dockerfile
+++ b/worker/Dockerfile
@@ -0,0 +1,35 @@
+# cs.money worker: headful Chromium (nodriver) under a virtual display, with noVNC
+# so you can open a browser into the container and solve a Cloudflare challenge by hand
+# if one ever appears. Build context is the repo root (see docker-compose.yml).
+FROM python:3.13-slim
+
+# chromium + a virtual X display + VNC bridge + the fonts/libs Chromium needs.
+RUN apt-get update && apt-get install -y --no-install-recommends \
+        chromium \
+        xvfb \
+        x11vnc \
+        novnc \
+        websockify \
+        ca-certificates \
+        fonts-liberation \
+        dumb-init \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /app
+COPY worker/requirements.txt ./
+RUN pip install --no-cache-dir -r requirements.txt
+COPY worker/worker.py worker/entrypoint.sh ./
+RUN chmod +x entrypoint.sh
+
+ENV BROWSER_PATH=/usr/bin/chromium \
+    CHROME_NO_SANDBOX=1 \
+    DISPLAY=:99 \
+    SOLVE_SECONDS=45 \
+    PYTHONUNBUFFERED=1
+
+
+# noVNC web UI (browse http://localhost:6080/vnc.html to watch / solve a challenge).
+EXPOSE 6080
+
+# dumb-init reaps the Xvfb/x11vnc/websockify children cleanly.
+ENTRYPOINT ["dumb-init", "--", "./entrypoint.sh"]
--- a/worker/README.md
+++ b/worker/README.md
@@ -0,0 +1,72 @@
+# cs.money worker (Python)
+
+The browser/Cloudflare layer for the cs.money scraper. .NET stays the **C2**
+(orchestration, proxy/IP allocation, DB, the sweep loop); this worker is the only
+component that drives a browser and defeats Cloudflare, because the effective
+anti-bot tooling (`nodriver`/`undetected-chromedriver`, TLS impersonation) only
+exists in Python/Go, not .NET.
+
+## Why nodriver
+
+.NET Selenium got insta-challenged by Cloudflare's managed challenge because
+`msedgedriver` controls the browser via the DevTools protocol, leaving
+`navigator.webdriver` and chromedriver `cdc_` artifacts that Cloudflare keys on.
+`nodriver` drives a normal Chromium directly over CDP (no chromedriver) and patches
+those tells, so it passes where Selenium loops.
+
+## Step 1: prove it (current)
+
+`poc.py` proves nodriver can clear cs.money's Cloudflare and fetch the listings API
+before we build the full pull-based fleet.
+
+```powershell
+cd worker
+py -m venv .venv
+.venv\Scripts\Activate.ps1
+pip install -r requirements.txt
+python poc.py
+```
+
+A Chromium window opens on the market. Solve the Cloudflare check if shown; the
+script waits, then pages `sell-orders` deeply (PAGES), reporting how far the warm
+session survives before any re-challenge and confirming full float precision.
+Output lands in `worker/captures/`.
+
+**Targeted skin+wear search.** cs.money search is free-text on the page
+(`?search=cyber+security+ft`). Set `SEARCH` and the PoC navigates there, **captures
+the actual filtered `sell-orders` API request the page fires** (so we learn the real
+filter params instead of guessing), prints it, then pages that filtered API:
+
+```powershell
+$env:SEARCH="cyber security ft"; python poc.py   # FT M4A4 Cyber Security only
+```
+
+The `>>> DISCOVERED sell-orders API call` line shows how the search maps to API
+params — that's how the C2 will build targeted jobs.
+
+Run on your own IP first (no proxy) — that's the clean A/B vs. the Selenium run.
+If auto-detect can't find a browser, set `BROWSER_PATH` to Chrome or Edge
+(`C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe`).
+
+## Step 2: the pull fleet
+
+`worker.py` holds one warm nodriver session and loops: poll the .NET C2 for a job
+(a skin+wear search), scrape that search's sell-orders via in-page fetch, and post
+the items back. The C2 (`BlueLaminate.C2`) picks the stalest skin+wear from the
+catalogue, and on result persists to `cs_money_listings` + `price_history`
+(`Source = "csmoney"`), stamping `SkinCondition.ListingsSweptAt`.
+
+Run the C2 (needs Postgres migrated), then the worker:
+
+```powershell
+# terminal 1 — the C2 (from repo root)
+dotnet run --project BlueLaminate\BlueLaminate.C2          # serves http://localhost:5080
+
+# terminal 2 — the worker
+cd worker; .venv\Scripts\Activate.ps1
+$env:WORKER_TOKEN="dev-worker-token"    # must match the C2's WorkerToken
+python worker.py
+```
+
+The worker warms the session (you clear Cloudflare once), then runs continuously.
+Scale out by starting more workers (each with its own `PROXY`).
--- a/worker/diag_consent.py
+++ b/worker/diag_consent.py
@@ -0,0 +1,71 @@
+"""
+Diagnose the cs.money cookie-consent banner so we can dismiss it programmatically.
+It's likely a Shadow DOM web component (CookieConsentSystem), which is why
+document.querySelectorAll-based clicks miss the real buttons.
+
+Saves:
+  captures/_consent.png  - screenshot (so we can SEE the banner + button positions)
+  captures/_consent.txt  - shadow-host tags + every consent-like button found by
+                           piercing shadow roots, with center coordinates.
+
+    cd worker; .venv\\Scripts\\Activate.ps1
+    python diag_consent.py
+"""
+
+import json
+import os
+import pathlib
+
+import nodriver as uc
+
+URL = os.environ.get("URL", "https://cs.money/market/buy/?search=ak-47+redline")
+SOLVE_SECONDS = int(os.environ.get("SOLVE_SECONDS", "30"))
+BROWSER_PATH = os.environ.get("BROWSER_PATH")
+OUT = pathlib.Path(__file__).parent / "captures"
+
+# Pierce shadow roots to find consent buttons + their viewport-center coords.
+DEEP_FIND = r"""
+JSON.stringify((()=>{
+  const hits=[], hosts=[];
+  function walk(root){
+    root.querySelectorAll('*').forEach(e=>{
+      if(e.shadowRoot){ hosts.push(e.tagName.toLowerCase()); walk(e.shadowRoot); }
+      const t=(e.textContent||'').trim();
+      if(t.length<40 && /accept all|manage cookies|reject all|confirm my choice|^accept$|^manage$/i.test(t)){
+        const r=e.getBoundingClientRect();
+        if(r.width>0&&r.height>0)
+          hits.push({tag:e.tagName, text:t, x:Math.round(r.x+r.width/2), y:Math.round(r.y+r.height/2)});
+      }
+    });
+  }
+  walk(document);
+  return {shadowHosts:[...new Set(hosts)], buttons:hits};
+})())
+"""
+
+
+async def main():
+    OUT.mkdir(exist_ok=True)
+    browser = await uc.start(headless=False, browser_executable_path=BROWSER_PATH)
+    try:
+        page = await browser.get(URL)
+        print(f"Loaded {URL}; waiting {SOLVE_SECONDS}s for Cloudflare...")
+        await page.sleep(SOLVE_SECONDS)
+
+        png = str(OUT / "_consent.png")
+        await page.save_screenshot(png)
+        print(f"screenshot -> {png}")
+
+        raw = await page.evaluate(DEEP_FIND)
+        info = json.loads(raw) if isinstance(raw, str) else {"error": repr(raw)}
+        (OUT / "_consent.txt").write_text(json.dumps(info, indent=2), encoding="utf-8")
+        print("shadow hosts:", info.get("shadowHosts"))
+        print("consent buttons found:")
+        for b in info.get("buttons", []):
+            print(f"  {b}")
+    finally:
+        browser.stop()
+
+
+if __name__ == "__main__":
+    uc.loop().run_until_complete(main())
--- a/worker/discover_pagination.py
+++ b/worker/discover_pagination.py
@@ -0,0 +1,183 @@
+"""
+Discover how cs.money paginates a filtered search past the initial ~60 SSR items.
+
+Tests two hypotheses against a high-result search (default "ak-47 redline", which has
+well over 60 listings):
+
+  A. Does the SSR page honor offset/limit in the URL? Fetch ?search=...&offset=60 and
+     ?search=...&limit=120 and compare item ids to page 1. If disjoint/larger, we can
+     paginate cheaply by re-fetching the page.
+  B. The real client "load more": scroll hard to trigger lazy-load and capture any
+     cs.money /2.0/ XHR via Resource Timing — that request carries the structured
+     filter params + offset, i.e. a lighter direct-API pagination path.
+
+Findings are printed and saved to captures/_pagination.txt.
+
+    cd worker; .venv\\Scripts\\Activate.ps1
+    python discover_pagination.py
+    $env:SEARCH="ak-47 redline"; python discover_pagination.py   # override the search
+"""
+
+import json
+import os
+import pathlib
+import re
+
+import nodriver as uc
+from nodriver import cdp
+
+SEARCH = os.environ.get("SEARCH", "ak-47 redline")
+SOLVE_SECONDS = int(os.environ.get("SOLVE_SECONDS", "30"))
+BROWSER_PATH = os.environ.get("BROWSER_PATH")
+PROXY = os.environ.get("PROXY")
+
+BASE = "https://cs.money/market/buy/"
+PAGE_PARAMS_RE = re.compile(r'<script\b[^>]*id="__page-params"[^>]*>(.*?)</script>', re.S)
+OUT = pathlib.Path(__file__).parent / "captures"
+CONSENT = ["Reject all", "Only necessary", "Reject", "Decline", "Deny"]
+
+# Aggressive scroll: window + every scrollable container (the grid scrolls in a div,
+# which is why a plain window.scrollTo didn't trigger lazy-load before).
+SCROLL_JS = (
+    "window.scrollTo(0, document.body.scrollHeight);"
+    "document.querySelectorAll('*').forEach(e=>{"
+    "  if (e.scrollHeight > e.clientHeight + 80) e.scrollTop = e.scrollHeight;});")
+
+
+async def js(page, expr):
+    raw = await page.evaluate(f"JSON.stringify({expr})")
+    try:
+        return json.loads(raw) if isinstance(raw, str) else None
+    except (json.JSONDecodeError, TypeError):
+        return None
+
+
+async def fetch_text(page, url):
+    expr = (f"fetch({url!r},{{credentials:'include'}}).then(async r=>"
+            f"JSON.stringify({{status:r.status, body:await r.text()}}))")
+    raw = await page.evaluate(expr, await_promise=True)
+    try:
+        o = json.loads(raw)
+        return o.get("status"), o.get("body", "")
+    except (json.JSONDecodeError, TypeError):
+        return None, ""
+
+
+def page_item_ids(html):
+    m = PAGE_PARAMS_RE.search(html or "")
+    if not m:
+        return []
+    try:
+        return [it.get("id") for it in json.loads(m.group(1)).get("inventory", {}).get("items", [])]
+    except json.JSONDecodeError:
+        return []
+
+
+async def click_visible(page, pattern):
+    """Click the first VISIBLE element whose trimmed text matches `pattern` (case-
+    insensitive). nodriver's find() was matching hidden/duplicate nodes; restricting
+    to offsetParent!=null + short text hits the real button."""
+    expr = ("JSON.stringify((()=>{"
+            "const re=new RegExp(" + json.dumps(pattern) + ",'i');"
+            "const els=[...document.querySelectorAll('button,a,[role=\"button\"],span,div')];"
+            "const b=els.find(e=>e.offsetParent!==null && (e.textContent||'').trim().length<40 "
+            "&& re.test((e.textContent||'').trim()));"
+            "if(b){b.click();return true}return false})())")
+    r = await page.evaluate(expr)
+    return isinstance(r, str) and "true" in r
+
+
+async def banner_present(page):
+    r = await page.evaluate(
+        "JSON.stringify(/Manage cookies|Accept all/i.test(document.body.innerText||''))")
+    return isinstance(r, str) and "true" in r
+
+
+async def dismiss(page):
+    """Privacy-preserving first (Manage -> Reject all -> Confirm); if the banner is
+    still up, fall back to Accept all so the page becomes interactive (discovery
+    needs scrolling to work)."""
+    steps = []
+    if await click_visible(page, "manage cookies|^manage$"):
+        steps.append("manage")
+        await page.sleep(1.2)
+        if await click_visible(page, "reject all"):
+            steps.append("reject-all")
+        await page.sleep(0.4)
+        for c in ("confirm my choice", "^confirm$", "^save$"):
+            if await click_visible(page, c):
+                steps.append("confirm")
+                break
+    await page.sleep(1)
+    if await banner_present(page):
+        steps.append("still-up->accept" if await click_visible(page, "accept all|^accept$") else "still-up")
+    await page.sleep(0.5)
+    steps.append("gone" if not await banner_present(page) else "STILL-PRESENT")
+    return ", ".join(steps)
+
+
+async def main():
+    OUT.mkdir(exist_ok=True)
+    args = [f"--proxy-server={PROXY}"] if PROXY else []
+    args.append("--blink-settings=imagesEnabled=false")
+    from urllib.parse import quote_plus
+    q = quote_plus(SEARCH)
+    findings = []
+
+    browser = await uc.start(headless=False, browser_executable_path=BROWSER_PATH, browser_args=args)
+    try:
+        url0 = f"{BASE}?search={q}"
+        page = await browser.get(url0)
+        print(f"Warming on {url0} ({SOLVE_SECONDS}s for Cloudflare)...")
+        await page.sleep(SOLVE_SECONDS)
+        print(f"Consent: {await dismiss(page)}")
+
+        # --- A. URL offset/limit on the SSR page ---
+        _, h0 = await fetch_text(page, f"{BASE}?search={q}")
+        _, h1 = await fetch_text(page, f"{BASE}?search={q}&offset=60")
+        _, h2 = await fetch_text(page, f"{BASE}?search={q}&limit=120")
+        a, b, c = page_item_ids(h0), page_item_ids(h1), page_item_ids(h2)
+        overlap = len(set(a) & set(b))
+        findings.append(f"page1 ids={len(a)}  offset=60 ids={len(b)} (overlap with page1={overlap})  limit=120 ids={len(c)}")
+        findings.append(f"  -> offset works? {'YES (disjoint)' if b and overlap == 0 else 'no/ignored'}")
+        findings.append(f"  -> limit works?  {'YES (>60)' if len(c) > 60 else 'no/ignored'}")
+
+        # --- B. Trigger client load-more, capture cs.money /2.0/ XHRs ---
+        # Infinite scroll only fires on GRADUAL downward scrolling — jumping to the
+        # bottom skips the trigger. So step down in small wheel increments and watch
+        # the item count grow.
+        before = set(await js(page, "performance.getEntriesByType('resource').map(e=>e.name)") or [])
+        async def card_count():
+            n = await page.evaluate(
+                "JSON.stringify(document.querySelectorAll('[href*=\"/item/\"],[class*=\"item\" i]').length)")
+            return n
+        print(f"  cards before scroll: {await card_count()}")
+        for step in range(60):
+            try:
+                await page.send(cdp.input_.dispatch_mouse_event(
+                    type_="mouseWheel", x=720, y=450, delta_x=0, delta_y=500))
+            except Exception:
+                pass
+            await page.sleep(0.7)
+            if step % 15 == 14:
+                now = [u for u in (await js(page, "performance.getEntriesByType('resource').map(e=>e.name)") or [])
+                       if u not in before and "cs.money" in u and "metrics." not in u and "traces." not in u]
+                print(f"  step {step+1}: cards={await card_count()} new cs.money reqs={len(now)}")
+        after = await js(page, "performance.getEntriesByType('resource').map(e=>e.name)") or []
+        new_xhrs = [u for u in after if u not in before and "cs.money" in u
+                    and "metrics." not in u and "traces." not in u]
+        findings.append(f"\nclient requests after scrolling ({len(new_xhrs)} new cs.money):")
+        findings.extend(f"  {u}" for u in dict.fromkeys(new_xhrs))
+        if not new_xhrs:
+            findings.append("  (none — grid may not lazy-load via XHR, or scroll didn't reach the trigger)")
+
+        report = "\n".join(findings)
+        print("\n=== FINDINGS ===\n" + report)
+        (OUT / "_pagination.txt").write_text(f"search: {SEARCH}\n\n{report}\n", encoding="utf-8")
+        print(f"\nsaved to {OUT / '_pagination.txt'}")
+    finally:
+        browser.stop()
+
+
+if __name__ == "__main__":
+    uc.loop().run_until_complete(main())
--- a/worker/discover_price_param.py
+++ b/worker/discover_price_param.py
@@ -0,0 +1,96 @@
+"""
+Find cs.money's price-filter URL param (the basis for price-bucket pagination).
+
+The market has a Price from/to filter in the sidebar. `search=` works via the URL and
+the page SSRs the filtered listings into __page-params, so a price param likely works
+the same way. We baseline the cheapest set, then try candidate param names with a high
+floor and check whether the returned listings actually shift above it.
+
+    cd worker; .venv\\Scripts\\Activate.ps1
+    python discover_price_param.py
+"""
+
+import json
+import os
+import pathlib
+import re
+from urllib.parse import quote_plus
+
+import nodriver as uc
+
+SEARCH = os.environ.get("SEARCH", "ak-47 redline")
+FLOOR = float(os.environ.get("FLOOR", "200"))
+SOLVE_SECONDS = int(os.environ.get("SOLVE_SECONDS", "30"))
+BROWSER_PATH = os.environ.get("BROWSER_PATH")
+BASE = "https://cs.money/market/buy/"
+PP = re.compile(r'<script\b[^>]*id="__page-params"[^>]*>(.*?)</script>', re.S)
+OUT = pathlib.Path(__file__).parent / "captures"
+
+# Price-floor param-name variants; unused for now, main() probes minPrice/maxPrice only.
+CANDIDATES = [
+    "minPrice", "priceFrom", "price_from", "priceMin", "min_price",
+    "priceGte", "from", "price_min", "minprice", "price.gte", "pricegte",
+]
+
+
+async def fetch_prices(page, url):
+    expr = (f"fetch({url!r},{{credentials:'include'}}).then(async r=>"
+            f"JSON.stringify({{status:r.status, body:await r.text()}}))")
+    raw = await page.evaluate(expr, await_promise=True)
+    try:
+        body = json.loads(raw).get("body", "")
+    except (json.JSONDecodeError, TypeError):
+        return None
+    m = PP.search(body or "")
+    if not m:
+        return None
+    try:
+        items = json.loads(m.group(1)).get("inventory", {}).get("items", [])
+    except json.JSONDecodeError:
+        return None
+    return [it.get("pricing", {}) for it in items if it.get("pricing")]
+
+
+async def main():
+    OUT.mkdir(exist_ok=True)
+    q = quote_plus(SEARCH)
+    lines = []
+    browser = await uc.start(headless=False, browser_executable_path=BROWSER_PATH,
+                             browser_args=["--blink-settings=imagesEnabled=false"])
+    try:
+        page = await browser.get(f"{BASE}?search={q}")
+        print(f"Warming ({SOLVE_SECONDS}s)..."); await page.sleep(SOLVE_SECONDS)
+
+        # Test minPrice/maxPrice semantics directly (old cs.money API used these).
+        tests = [
+            ("baseline", f"{BASE}?search={q}"),
+            ("maxPrice=200", f"{BASE}?search={q}&maxPrice=200"),
+            ("minPrice=300", f"{BASE}?search={q}&minPrice=300"),
+            ("minPrice=300&maxPrice=400", f"{BASE}?search={q}&minPrice=300&maxPrice=400"),
+            ("minPrice=500&maxPrice=1000", f"{BASE}?search={q}&minPrice=500&maxPrice=1000"),
+        ]
+        def rng(pr, field):
+            # Formatted [min,max] for one pricing field; "[n/a]" when no item carries a
+            # numeric value, so a missing field cannot crash the :.2f formatting.
+            vals = [p.get(field) for p in pr if isinstance(p.get(field), (int, float))]
+            return f"[{min(vals):.2f},{max(vals):.2f}]" if vals else "[n/a]"
+
+        # Probe each URL and record the price range the returned listings span.
+        for name, url in tests:
+            pr = await fetch_prices(page, url)
+            if not pr:
+                lines.append(f"{name:28} -> no items")
+            else:
+                lines.append(f"{name:28} -> n={len(pr)} default{rng(pr, 'default')} "
+                             f"computed{rng(pr, 'computed')} base{rng(pr, 'basePrice')}")
+            print(lines[-1])
+
+        (OUT / "_price_param.txt").write_text(
+            f"search={SEARCH} floor={FLOOR}\n\n" + "\n".join(lines), encoding="utf-8")
+        print(f"\nsaved to {OUT/'_price_param.txt'}")
+    finally:
+        browser.stop()
+
+
+if __name__ == "__main__":
+    uc.loop().run_until_complete(main())
--- a/worker/entrypoint.sh
+++ b/worker/entrypoint.sh
@@ -0,0 +1,19 @@
+#!/usr/bin/env bash
+# Start a virtual display, expose it over noVNC, then run the worker headful against it.
+set -euo pipefail
+
+DISPLAY_NUM="${DISPLAY:-:99}"
+SCREEN="${SCREEN_GEOMETRY:-1440x900x24}"
+
+echo "[entrypoint] starting Xvfb on ${DISPLAY_NUM} (${SCREEN})"
+Xvfb "${DISPLAY_NUM}" -screen 0 "${SCREEN}" -nolisten tcp &
+sleep 1
+
+echo "[entrypoint] starting x11vnc (display ${DISPLAY_NUM} -> :5900)"
+x11vnc -display "${DISPLAY_NUM}" -forever -shared -nopw -quiet -bg
+
+echo "[entrypoint] starting noVNC on :6080 (open http://localhost:6080/vnc.html)"
+websockify --web=/usr/share/novnc 6080 localhost:5900 &
+
+echo "[entrypoint] launching worker"
+exec python worker.py
--- a/worker/poc.py
+++ b/worker/poc.py
@@ -0,0 +1,285 @@
+"""
+Proof-of-concept / pre-fleet validation for the cs.money scraper.
+
+Proves the things we need before building the C2 + worker fleet:
+  1. nodriver clears cs.money's Cloudflare where .NET Selenium couldn't.
+  2. a single WARM session can page the sell-orders API deeply without re-challenge.
+  3. a free-text market search (e.g. "cyber security ft") can be turned into a
+     filtered sell-orders API call — we DISCOVER the real API params by capturing the
+     request the page itself fires, instead of guessing.
+
+It opens the market (optionally a search URL) in a real non-headless Chromium, lets
+you clear Cloudflare, dismisses the cookie banner (privacy-preserving), captures the
+sell-orders request the page makes, then pages that API from inside the cleared page
+(same-origin fetch carries cf_clearance), pacing itself and stopping on re-challenge.
+
+    cd worker
+    .venv\\Scripts\\Activate.ps1
+    pip install -r requirements.txt
+
+    python poc.py                       # whole-market sweep
+    $env:SEARCH="cyber security ft"; python poc.py   # targeted: FT M4A4 Cyber Security
+
+Env knobs (all optional):
+    SEARCH         free-text market search; when set, scrape only those results
+    MARKET_URL     market page base (default the buy market)
+    SOLVE_SECONDS  seconds to wait for you to clear Cloudflare (default 30)
+    PAGES          how many offset pages (60 each) to attempt (default 20)
+    START_OFFSET   first offset (default 0)
+    DELAY / JITTER base + random seconds between fetches (default 2.0 / 1.5)
+    PROXY          host:port for an auth-free proxy (omit to use your own IP)
+    BROWSER_PATH   path to Chrome/Edge if auto-detect fails
+"""
+
+import json
+import os
+import pathlib
+import random
+from urllib.parse import quote_plus, urlsplit, parse_qsl, urlencode, urlunsplit
+
+import nodriver as uc
+from nodriver import cdp
+
+SEARCH = os.environ.get("SEARCH")
+MARKET_URL = os.environ.get("MARKET_URL", "https://cs.money/market/buy/")
+SOLVE_SECONDS = int(os.environ.get("SOLVE_SECONDS", "30"))
+PAGES = int(os.environ.get("PAGES", "20"))
+START_OFFSET = int(os.environ.get("START_OFFSET", "0"))
+DELAY = float(os.environ.get("DELAY", "2.0"))
+JITTER = float(os.environ.get("JITTER", "1.5"))
+PROXY = os.environ.get("PROXY")
+BROWSER_PATH = os.environ.get("BROWSER_PATH")
+
+# Fallback template if we fail to capture the page's own request (offset = {}).
+DEFAULT_TEMPLATE = "https://cs.money/2.0/market/sell-orders?limit=60&offset={}"
+OUT_DIR = pathlib.Path(__file__).parent / "captures"
+CONSENT_LABELS = ["Reject all", "Reject All", "Only necessary", "Necessary only",
+                  "Reject", "Decline", "Deny"]
+
+# Filled by the CDP network handler with sell-orders request URLs the page fires.
+_seen_urls: list[str] = []
+
+
+def looks_like_challenge(body: str) -> bool:
+    s = (body or "").lstrip()
+    return not s or s.startswith("<") or "Just a moment" in body or "challenge-platform" in body
+
+
+def decimals(v: float) -> int:
+    r = repr(float(v))
+    return len(r.split(".")[-1]) if "." in r else 0
+
+
+def template_from(url: str) -> str:
+    """Turn a captured sell-orders URL into a template with offset as '{}',
+    preserving every other param (the search/filter encoding we want to learn)."""
+    parts = urlsplit(url)
+    q = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "offset"]
+    if not any(k == "limit" for k, _ in q):
+        q.append(("limit", "60"))
+    base_q = urlencode(q)
+    new_q = (base_q + "&" if base_q else "") + "offset={}"
+    return urlunsplit((parts.scheme, parts.netloc, parts.path, new_q, ""))
+
+
+async def dismiss_consent(page) -> str | None:
+    """Best-effort, privacy-preserving — never clicks 'Accept all'."""
+    for label in CONSENT_LABELS:
+        try:
+            el = await page.find(label, best_match=True, timeout=2)
+        except Exception:
+            el = None
+        if el:
+            try:
+                await el.click()
+                return label
+            except Exception:
+                pass
+    return None
+
+
+async def fetch_json(page, url: str) -> tuple[str, str]:
+    expr = (
+        f"fetch({url!r}, {{credentials:'include', headers:{{'accept':'application/json'}}}})"
+        f".then(async r => JSON.stringify({{status: r.status, body: await r.text()}}))"
+    )
+    raw = await page.evaluate(expr, await_promise=True)
+    if not isinstance(raw, str):
+        return ("-1", "")
+    try:
+        obj = json.loads(raw)
+        return (str(obj.get("status", "-1")), obj.get("body", ""))
+    except json.JSONDecodeError:
+        return ("-1", raw)
+
+
+async def main():
+    OUT_DIR.mkdir(exist_ok=True)
+    args = [f"--proxy-server={PROXY}"] if PROXY else []
+
+    target_url = MARKET_URL
+    tag = "market"
+    if SEARCH:
+        sep = "&" if "?" in MARKET_URL else "?"
+        target_url = f"{MARKET_URL}{sep}search={quote_plus(SEARCH)}"
+        tag = "search_" + "".join(c if c.isalnum() else "_" for c in SEARCH)[:40]
+
+    print(f"Launching nodriver Chromium (proxy={PROXY or 'none / own IP'})...")
+    browser = await uc.start(headless=False, browser_executable_path=BROWSER_PATH, browser_args=args)
+
+    pages_ok = items_total = floats_total = low_prec = 0
+    dp_min, dp_max = 99, 0
+    deepest_offset = None
+    reason = "completed (hit PAGES limit)"
+
+    try:
+        # Open a blank tab first so the network handler is attached BEFORE the page
+        # fires its filtered sell-orders request (otherwise we'd miss it).
+        page = await browser.get("about:blank")
+
+        async def on_request(evt):
+            url = evt.request.url
+            if "/market/sell-orders" in url:
+                _seen_urls.append(url)
+
+        page.add_handler(cdp.network.RequestWillBeSent, on_request)
+        try:
+            await page.send(cdp.network.enable())
+        except Exception as ex:
+            print(f"(network capture unavailable: {ex})")
+
+        print(f"Opening {target_url}")
+        await page.get(target_url)
+        print(f"Solve any Cloudflare challenge. Waiting {SOLVE_SECONDS}s for the grid...")
+        await page.sleep(SOLVE_SECONDS)
+
+        clicked = await dismiss_consent(page)
+        print(f"Consent banner: {'dismissed via ' + clicked if clicked else 'left up (does not block fetch)'}")
+
+        # cs.money is an Astro SSR app: the initial filtered listings are rendered
+        # server-side, so there is no client XHR to capture on load. Scroll first to
+        # provoke lazy-load pagination, which DOES fire a client request carrying the
+        # real filter params. Discovery then reads the Resource Timing API below: the
+        # browser records the requests the page made, so the real sell-orders URL comes
+        # straight out of it (no flaky CDP event timing), plus nearby API calls.
+        print("Scrolling to trigger lazy-load pagination...")
+        for _ in range(6):
+            try:
+                await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
+            except Exception:
+                pass
+            await page.sleep(2)
+
+        # nodriver returns arrays unreliably from evaluate(), so JSON.stringify in JS
+        # and json.loads here (the string path is proven by fetch_json).
+        async def js_list(expr: str) -> list:
+            raw = await page.evaluate(f"JSON.stringify({expr})")
+            try:
+                return json.loads(raw) if isinstance(raw, str) else []
+            except (json.JSONDecodeError, TypeError):
+                return []
+
+        try:
+            all_urls = await js_list("performance.getEntriesByType('resource').map(e=>e.name)")
+            print(f">>> Resource Timing saw {len(all_urls)} requests total")
+            if all_urls:
+                (OUT_DIR / "_all_requests.txt").write_text(
+                    "\n".join(dict.fromkeys(all_urls)), encoding="utf-8")
+            sell = [u for u in all_urls if "/market/sell-orders" in u]
+            _seen_urls.extend(sell)
+            api = [u for u in all_urls if "cs.money/" in u and ("/2.0/" in u or "/1.0/" in u)]
+            if api:
+                (OUT_DIR / "_api_calls.txt").write_text("\n".join(dict.fromkeys(api)), encoding="utf-8")
+                print(f">>> {len(set(api))} cs.money API calls; saved to {OUT_DIR / '_api_calls.txt'}")
+        except Exception as ex:
+            print(f"(resource-timing query failed: {ex})")
+
+        # Dump the SSR'd page so we can see how the filter is encoded and where the
+        # listings data lives (Astro embeds island props / hydration JSON in the HTML).
+        try:
+            html = await page.evaluate("document.documentElement.outerHTML")
+            if isinstance(html, str) and html:
+                (OUT_DIR / "_page.html").write_text(html, encoding="utf-8")
+                print(f">>> saved page HTML ({len(html)} bytes) to {OUT_DIR / '_page.html'}")
+        except Exception as ex:
+            print(f"(page HTML dump failed: {ex})")
+
+        # Discovery: what sell-orders request did the page actually make?
+        if _seen_urls:
+            captured = _seen_urls[-1]
+            template = template_from(captured)
+            print("\n>>> DISCOVERED sell-orders API call the page fired:")
+            print(f"    {captured}")
+            print(f">>> pagination template: {template}\n")
+            # Persist it — the console line is easy to lose, and this is the one bit
+            # of ground truth (the real filter-param scheme) we need.
+            (OUT_DIR / "_discovered.txt").write_text(
+                "ALL captured sell-orders requests:\n"
+                + "\n".join(dict.fromkeys(_seen_urls))
+                + f"\n\npagination template:\n{template}\n",
+                encoding="utf-8")
+            print(f">>> saved to {OUT_DIR / '_discovered.txt'}")
+        else:
+            template = DEFAULT_TEMPLATE
+            if SEARCH:
+                template = template.replace("offset={}", f"search={quote_plus(SEARCH)}&offset={{}}")
+            print(f"\n(no request captured; falling back to template: {template})\n")
+
+        for i in range(PAGES):
+            offset = START_OFFSET + i * 60
+            status, body = await fetch_json(page, template.format(offset))
+
+            if looks_like_challenge(body):
+                print(f"  page {i + 1} [offset {offset}]: RE-CHALLENGED (status {status}). Stopping.")
+                (OUT_DIR / f"{tag}_challenge_offset_{offset}.html").write_text(body, encoding="utf-8")
+                reason = f"re-challenged at offset {offset}"
+                break
+
+            try:
+                items = json.loads(body).get("items", [])
+            except (json.JSONDecodeError, AttributeError):  # invalid JSON, or JSON that is not an object
+                print(f"  page {i + 1} [offset {offset}]: non-JSON (status {status}). Stopping.")
+                reason = f"non-JSON at offset {offset}"
+                break
+
+            if not items:
+                print(f"  page {i + 1} [offset {offset}]: 0 items — end of results.")
+                reason = "end of results"
+                break
+
+            (OUT_DIR / f"{tag}_offset_{offset:06d}.json").write_text(body, encoding="utf-8")
+            pages_ok += 1
+            deepest_offset = offset
+            items_total += len(items)
+            names = set()
+            for it in items:
+                fl = (it.get("asset") or {}).get("float")  # tolerate "asset": null
+                if fl is not None:
+                    floats_total += 1
+                    d = decimals(fl)
+                    dp_min, dp_max = min(dp_min, d), max(dp_max, d)
+                    if d <= 6:  # short repr — exact binary fraction (e.g. 1/16), not truncation
+                        low_prec += 1
+                names.add(((it.get("asset") or {}).get("names") or {}).get("full"))
+            sample = next(iter(names), None) if SEARCH else None
+            print(f"  page {i + 1} [offset {offset}] OK — {len(items)} items"
+                  + (f" (e.g. {sample}; {len(names)} distinct names)" if SEARCH else ""))
+
+            await page.sleep(DELAY + random.uniform(0, JITTER))
+
+        print("\n=== summary ===")
+        print(f"  query: {SEARCH or '(whole market)'}")
+        print(f"  stopped: {reason}")
+        print(f"  clean pages: {pages_ok}  deepest offset: {deepest_offset}  items: {items_total}")
+        if floats_total:
+            # Truncation would make MANY values short, not one exact binary fraction.
+            verdict = "FULL precision" if low_prec / floats_total < 0.02 else "POSSIBLE TRUNCATION"
+            print(f"  floats: {floats_total} items, {dp_max}-decimal max, "
+                  f"{low_prec} short-repr (exact fractions) — {verdict}")
+        print(f"  files in {OUT_DIR}")
+    finally:
+        browser.stop()
+
+
+if __name__ == "__main__":
+    uc.loop().run_until_complete(main())
--- a/worker/probe_filters.py
+++ b/worker/probe_filters.py
@@ -0,0 +1,77 @@
+"""
+Probe which extra filter params cs.money's SSR market search honors, so we can
+pick a SECOND pagination axis to break apart dense price bands that saturate the
+60-cap (see diag_windows.py). For a saturating search we try candidate params and
+report how the returned set's size + float range + price range change.
+
+    python probe_filters.py "Glock-18 Candy Apple mw"
+"""
+
+import asyncio
+import sys
+
+import nodriver as uc
+
+import worker
+
+BASE = "https://cs.money/market/buy/?search={q}"
+# (label, extra query string): params that markets like cs.money commonly expose.
+CANDIDATES = [
+    ("baseline", ""),
+    ("sort=price asc", "&order=asc&sort=price"),
+    ("sort=price desc", "&order=desc&sort=price"),
+    ("sort=float", "&sort=float"),
+    ("minFloat/maxFloat lo", "&minFloat=0.07&maxFloat=0.10"),
+    ("minFloat/maxFloat hi", "&minFloat=0.10&maxFloat=0.15"),
+    ("minWear/maxWear lo", "&minWear=0.07&maxWear=0.10"),
+    ("isStatTrak=true", "&isStatTrak=true"),
+    ("hasStickers=false", "&hasStickers=false"),
+]
+
+
+def stats(items):
+    floats = [(it.get("asset") or {}).get("float") for it in items]
+    floats = [f for f in floats if isinstance(f, (int, float))]
+    bases = []
+    for it in items:
+        p = it.get("pricing") or {}
+        b = p.get("basePrice", p.get("computed"))
+        if isinstance(b, (int, float)):
+            bases.append(b)
+    fr = f"[{min(floats):.4f},{max(floats):.4f}]" if floats else "[-]"
+    br = f"[{min(bases):.2f},{max(bases):.2f}]" if bases else "[-]"
+    return f"n={len(items):3d}  float{fr}  base{br}"
+
+
+async def main():
+    search = " ".join(sys.argv[1:]) or "Glock-18 Candy Apple mw"
+    q = worker.urllib.parse.quote_plus(search)
+
+    args = ["--blink-settings=imagesEnabled=false"]
+    browser = await uc.start(headless=False, browser_args=args)
+    try:
+        page = await browser.get("about:blank")
+        await worker.warm(page)
+
+        base_ids = None
+        for label, extra in CANDIDATES:
+            url = BASE.format(q=q) + extra
+            status, body = await worker.fetch_json(page, url)
+            if "Just a moment" in body or "challenge-platform" in body:
+                print(f"  {label:24s} CHALLENGED"); break
+            items = worker.extract_items(body)
+            ids = {it.get("id") for it in items}
+            if label == "baseline":
+                base_ids = ids
+                delta = ""
+            else:
+                # If a param is IGNORED, the set is identical to baseline.
+                delta = "IGNORED (== baseline)" if ids == base_ids else f"CHANGED ({len(ids ^ (base_ids or set()))} diff ids)"
+            print(f"  {label:24s} {stats(items)}  {delta}")
+            await page.sleep(worker.DELAY)
+    finally:
+        browser.stop()
+
+
+if __name__ == "__main__":
+    uc.loop().run_until_complete(main())
--- a/worker/requirements.txt
+++ b/worker/requirements.txt
@@ -0,0 +1,5 @@
+# cs.money scraping worker.
+# nodriver = the modern successor to undetected-chromedriver: it drives a normal
+# Chromium over CDP directly (no chromedriver, so none of the cdc_/webdriver tells
+# that got our .NET Selenium setup insta-challenged by Cloudflare).
+nodriver>=0.39
--- a/worker/verify_count.py
+++ b/worker/verify_count.py
@@ -0,0 +1,77 @@
+"""
+One-off count verification: scrape a single skin+wear search from cs.money and
+report how many distinct sell-orders come back, reusing the production worker's
+warm-session + float-cursor pagination logic (worker.scrape_job).
+
+Use it to sanity-check that our pagination actually recovers the FULL listing
+count cs.money shows on the site (the known ground truth) for one query.
+
+    cd worker
+    .venv\\Scripts\\Activate.ps1
+    python verify_count.py "Desert Eagle Bronze Deco fn"
+
+Env knobs (same meaning as worker.py): SOLVE_SECONDS, DELAY, JITTER, PROXY,
+BROWSER_PATH, LOAD_IMAGES. MAX_FETCHES caps page fetches per run (default 80).
+"""
+
+import asyncio
+import os
+import sys
+from collections import Counter
+
+import nodriver as uc
+
+import worker
+
+MAX_FETCHES = int(os.environ.get("MAX_FETCHES", "80"))
+
+
+async def main():
+    search = " ".join(sys.argv[1:]) or "Desert Eagle Bronze Deco fn"
+
+    args = [f"--proxy-server={worker.PROXY}"] if worker.PROXY else []
+    if not worker.LOAD_IMAGES:
+        args.append("--blink-settings=imagesEnabled=false")
+    if os.environ.get("CHROME_NO_SANDBOX") == "1":
+        args += ["--no-sandbox", "--disable-dev-shm-usage"]
+
+    print(f"Verifying count for search {search!r} (proxy={worker.PROXY or 'own IP'})")
+    browser = await uc.start(
+        headless=False, browser_executable_path=worker.BROWSER_PATH, browser_args=args)
+    try:
+        page = await browser.get("about:blank")
+        await worker.warm(page)
+
+        job = {"search": search, "maxPages": MAX_FETCHES}
+        items, fetches, reason = await worker.scrape_job(page, job)
+
+        print("\n=== result ===")
+        print(f"  search:   {search}")
+        print(f"  stopped:  {reason}")
+        print(f"  fetches:  {fetches}")
+        print(f"  DISTINCT sell-orders (deduped by id): {len(items)}")
+
+        # Break down what came back so we can see whether the count is inflated by
+        # off-target names/wears (the C2's name+wear filter would drop those later).
+        names = Counter()
+        wears = Counter()
+        st = 0
+        for it in items:
+            asset = it.get("asset") or {}
+            names[(asset.get("names") or {}).get("full")] += 1
+            wears[asset.get("quality")] += 1
+            if asset.get("isStatTrak"):
+                st += 1
+        print(f"  StatTrak in set: {st}")
+        print("  by name:")
+        for name, n in names.most_common():
+            print(f"      {n:4d}  {name}")
+        print("  by wear (quality code):")
+        for w, n in wears.most_common():
+            print(f"      {n:4d}  {w}")
+    finally:
+        browser.stop()
+
+
+if __name__ == "__main__":
+    uc.loop().run_until_complete(main())
--- a/worker/verify_crosscheck.py
+++ b/worker/verify_crosscheck.py
@@ -0,0 +1,79 @@
+"""
+Validate the float-cursor scrape by walking the float axis in BOTH directions and
+comparing the recovered sell-order id sets. If ascending (lowest float first) and
+descending (highest float first) independently land on the same listings, the
+cursor is exhaustive and order-independent — i.e. the count is real, not an artifact
+of walk direction or boundary double-counting.
+
+    python verify_crosscheck.py "Glock-18 Candy Apple mw"
+"""
+
+import asyncio
+import sys
+
+import nodriver as uc
+
+import worker
+
+CAP = worker.PAGE_CAP
+ASC = ("https://cs.money/market/buy/?search={q}"
+       "&order=asc&sort=float&minFloat={cur:.12f}&maxFloat=1")
+DESC = ("https://cs.money/market/buy/?search={q}"
+        "&order=desc&sort=float&minFloat=0&maxFloat={cur:.12f}")
+
+
+async def walk(page, q, template, ascending, max_fetches=60):
+    seen = {}
+    cur = 0.0 if ascending else 1.0
+    fetches = 0
+    while fetches < max_fetches:
+        status, body = await worker.fetch_json(page, template.format(q=q, cur=cur))
+        fetches += 1
+        if "Just a moment" in body or "challenge-platform" in body:
+            return seen, fetches, "challenged"
+        items = worker.extract_items(body)
+        floats = []
+        for it in items:
+            if it.get("id") is not None:
+                seen[it["id"]] = it
+            fl = (it.get("asset") or {}).get("float")
+            if isinstance(fl, (int, float)):
+                floats.append(fl)
+        if len(items) < CAP:
+            return seen, fetches, "completed"
+        nxt = (max(floats) if ascending else min(floats)) if floats else None
+        if nxt is None or (ascending and nxt <= cur) or (not ascending and nxt >= cur):
+            return seen, fetches, "stuck"
+        cur = nxt
+        await page.sleep(worker.DELAY)
+    return seen, fetches, "fetch-cap"
+
+
+async def main():
+    search = " ".join(sys.argv[1:]) or "Glock-18 Candy Apple mw"
+    q = worker.urllib.parse.quote_plus(search)
+    browser = await uc.start(headless=False, browser_args=["--blink-settings=imagesEnabled=false"])
+    try:
+        page = await browser.get("about:blank")
+        await worker.warm(page)
+
+        asc, fa, ra = await walk(page, q, ASC, ascending=True)
+        print(f"ASC : {len(asc):4d} ids   {fa} fetches   {ra}")
+        desc, fd, rd = await walk(page, q, DESC, ascending=False)
+        print(f"DESC: {len(desc):4d} ids   {fd} fetches   {rd}")
+
+        a, d = set(asc), set(desc)
+        union = a | d
+        print("\n=== cross-check ===")
+        print(f"  ASC only:        {len(a - d)}")
+        print(f"  DESC only:       {len(d - a)}")
+        print(f"  in both:         {len(a & d)}")
+        print(f"  UNION (distinct):{len(union)}")
+        agree = "AGREE — count is solid" if a == d else "DISAGREE — one walk missed listings"
+        print(f"  verdict: {agree}")
+    finally:
+        browser.stop()
+
+
+if __name__ == "__main__":
+    uc.loop().run_until_complete(main())
--- a/worker/worker.py
+++ b/worker/worker.py
@@ -0,0 +1,453 @@
+"""
+cs.money scrape worker (pull model).
+
+Holds ONE warm nodriver session (the thing that beats Cloudflare), then loops:
+poll the .NET C2 for a job, scrape that skin+wear's sell-orders via in-page fetch
+from the cleared session, and post the results back. The C2 owns job selection
+(stalest skin+wear first) and persistence; this worker just fetches and forwards.
+
+    cd worker
+    .venv\\Scripts\\Activate.ps1
+    pip install -r requirements.txt
+    python worker.py
+
+Env knobs:
+    C2_URL              C2 base URL (default http://localhost:5080)
+    WORKER_TOKEN        shared secret, must match the C2's WorkerToken (default dev-worker-token)
+    MARKET_URL          market page to warm the session on (default the buy market)
+    SOLVE_SECONDS       seconds to clear Cloudflare on startup (default 30)
+    DELAY / JITTER      base + random seconds between page fetches (default 2.0 / 1.5)
+    IDLE_SECONDS        sleep when the C2 has no work (default 10)
+    BROWSER_PATH        path to Chrome/Edge if auto-detect fails
+
+Proxy (pick one; IPRoyal takes priority when its creds are set):
+    IPROYAL_USERNAME    IPRoyal residential account username
+    IPROYAL_PASSWORD    IPRoyal residential account password
+    IPROYAL_COUNTRY     ISO country for the exit (default us; blank = any)
+    IPROYAL_LIFETIME_MIN sticky-IP hold in minutes (default 60)
+    PROXY               host:port for an auth-free proxy (fallback; omit to use your own IP)
+
+Each worker process mints its own random IPRoyal sticky session at startup, so N
+workers get N distinct residential exit IPs with no coordination — scale with
+`docker compose up --scale worker=N`. On a Cloudflare challenge the worker rotates
+to a fresh session (new IP) and re-warms. Chromium can't carry proxy credentials on
+--proxy-server, so we run a tiny in-process forwarder (LocalForwardingProxy below)
+that injects the IPRoyal auth and chains to the gateway; Chrome talks only to an
+auth-free 127.0.0.1 endpoint, keeping us at zero CDP (a CDP auth handler is a
+Cloudflare tell).
+"""
+
+import asyncio
+import base64
+import json
+import os
+import random
+import re
+import urllib.error
+import urllib.parse
+import urllib.request
+import uuid
+
+import nodriver as uc
+
+C2_URL = os.environ.get("C2_URL", "http://localhost:5080").rstrip("/")
+TOKEN = os.environ.get("WORKER_TOKEN", "dev-worker-token")
+MARKET_URL = os.environ.get("MARKET_URL", "https://cs.money/market/buy/")
+SOLVE_SECONDS = int(os.environ.get("SOLVE_SECONDS", "30"))
+DELAY = float(os.environ.get("DELAY", "2.0"))
+JITTER = float(os.environ.get("JITTER", "1.5"))
+IDLE_SECONDS = int(os.environ.get("IDLE_SECONDS", "10"))
+PROXY = os.environ.get("PROXY")
+BROWSER_PATH = os.environ.get("BROWSER_PATH")
+
+# IPRoyal residential gateway. One fixed host/port; country, sticky-session id and
+# lifetime are encoded as underscore params appended to the password (see
+# _iproyal_password). Mirrors the .NET IpRoyalProxyProvider scheme.
+IPROYAL_HOST = os.environ.get("IPROYAL_HOST", "geo.iproyal.com")
+IPROYAL_PORT = int(os.environ.get("IPROYAL_PORT", "12321"))
+IPROYAL_USERNAME = os.environ.get("IPROYAL_USERNAME")
+IPROYAL_PASSWORD = os.environ.get("IPROYAL_PASSWORD")
+IPROYAL_COUNTRY = os.environ.get("IPROYAL_COUNTRY", "us").strip().lower()
+IPROYAL_LIFETIME_MIN = int(os.environ.get("IPROYAL_LIFETIME_MIN", "60"))
+# Residential proxy is metered per GB. Cloudflare gates on JS, not images, and the
+# listings are read from SSR HTML via in-page fetch(), so block images by default to
+# cut render bandwidth. Set LOAD_IMAGES=1 to re-enable (e.g. to debug the visible page).
+LOAD_IMAGES = os.environ.get("LOAD_IMAGES") == "1"
+
+# cs.money is an Astro SSR app: the free-text market search filters server-side and
+# the resulting listings are embedded in the page as a __page-params JSON blob. The
+# /2.0/market/sell-orders API rejects a `search` param (HTTP 400), so we fetch the
+# PAGE for a search and read the embedded items — same item shape as the API.
+#
+# A page returns at most 60 and offset is ignored, so we paginate with a FORWARD
+# CURSOR on float: cs.money honors `order=asc&sort=float` + `minFloat`, and float is
+# full-precision and effectively unique per item. We grab the 60 lowest-float items
+# at/above `lo`, advance `lo` to the highest float returned, and repeat until a page
+# is under the cap. (The old minPrice/maxPrice bisection silently truncated cheap
+# skins: >60 listings can share a sub-$0.02 reference band, which no price window can
+# split. Floats almost never tie; if >60 ever do, scrape_job stops as "stuck-float-tie".)
+PAGE = ("https://cs.money/market/buy/?search={search}"
+        "&order=asc&sort=float&minFloat={lo:.12f}&maxFloat=1")
+PAGE_CAP = 60          # items per SSR page
+PAGE_PARAMS_RE = re.compile(
+    r'<script\b[^>]*id="__page-params"[^>]*>(.*?)</script>', re.S)
+
+
+# --- IPRoyal residential proxy ----------------------------------------------------
+
+def _new_session_id() -> str:
+    """Short, opaque, URL-safe token. IPRoyal pins one residential exit IP per
+    distinct session value, so a fresh id == a fresh IP."""
+    return uuid.uuid4().hex[:10]
+
+
+def _iproyal_password(session_id: str) -> str:
+    """Bake the targeting/session knobs onto the account password, IPRoyal-style:
+    "<pass>_country-us_session-<id>_lifetime-60m". Country is optional."""
+    pw = IPROYAL_PASSWORD
+    if IPROYAL_COUNTRY:
+        pw += f"_country-{IPROYAL_COUNTRY}"
+    pw += f"_session-{session_id}_lifetime-{IPROYAL_LIFETIME_MIN}m"
+    return pw
+
+
+class LocalForwardingProxy:
+    """In-process HTTP proxy on 127.0.0.1 that chains every connection to the IPRoyal
+    gateway, injecting the Proxy-Authorization header itself. Chromium ignores creds in
+    --proxy-server and the in-browser ways to answer the gateway's 407 (a CDP auth
+    handler, or a disabled MV2 extension) are Cloudflare tells — so we terminate the
+    browser->proxy hop locally and add auth here, leaving Chrome to talk to an auth-free
+    endpoint at zero CDP. HTTPS (all cs.money serves) flows through the CONNECT tunnel,
+    so this proxy only relays ciphertext and never sees plaintext. Ported from the .NET
+    LocalForwardingProxy. The active session token can be swapped live (set_password) to
+    move to a fresh exit IP without restarting the browser. (New tunnels pick up the new
+    IP; any still-open keep-alive tunnel stays on the old one until it closes.)"""
+
+    def __init__(self, host: str, port: int, username: str, password: str):
+        self._host = host
+        self._port = port
+        self._username = username
+        self._password = password
+        self._server: asyncio.AbstractServer | None = None
+        self.endpoint = ""
+
+    def set_password(self, password: str) -> None:
+        self._password = password
+
+    def _auth_header(self) -> str:
+        token = base64.b64encode(f"{self._username}:{self._password}".encode()).decode()
+        return f"Proxy-Authorization: Basic {token}\r\n"
+
+    async def start(self) -> "LocalForwardingProxy":
+        self._server = await asyncio.start_server(self._handle, "127.0.0.1", 0)
+        port = self._server.sockets[0].getsockname()[1]
+        self.endpoint = f"127.0.0.1:{port}"
+        return self
+
+    async def stop(self) -> None:
+        if self._server is not None:
+            self._server.close()
+            try:
+                await self._server.wait_closed()
+            except Exception:
+                pass
+
+    @staticmethod
+    async def _read_header(reader: asyncio.StreamReader) -> str | None:
+        """Read up to the end of the HTTP header block (CRLFCRLF). None on EOF/overflow."""
+        try:
+            data = await reader.readuntil(b"\r\n\r\n")
+        except (asyncio.IncompleteReadError, asyncio.LimitOverrunError):
+            return None
+        return data.decode("latin-1")
+
+    async def _handle(self, client_reader: asyncio.StreamReader, client_writer: asyncio.StreamWriter) -> None:
+        up_writer: asyncio.StreamWriter | None = None
+        try:
+            header = await self._read_header(client_reader)
+            if not header:
+                return
+            parts = header.split("\r\n", 1)[0].split(" ")
+            if len(parts) < 2:
+                return
+            method, target = parts[0], parts[1]
+
+            up_reader, up_writer = await asyncio.open_connection(self._host, self._port)
+            if method.upper() == "CONNECT":
+                # HTTPS: open an authenticated tunnel upstream, then relay raw bytes.
+                up_writer.write(
+                    f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n{self._auth_header()}\r\n".encode())
+                await up_writer.drain()
+                up_header = await self._read_header(up_reader)
+                status = up_header.split(" ", 2) if up_header else []
+                if len(status) < 2 or status[1] != "200":
+                    line = (up_header or "no response").split("\r\n", 1)[0]
+                    print(f"  proxy: upstream refused CONNECT {target}: {line}")
+                    client_writer.write(b"HTTP/1.1 502 Bad Gateway\r\nConnection: close\r\n\r\n")
+                    await client_writer.drain()
+                    return
+                client_writer.write(b"HTTP/1.1 200 Connection established\r\n\r\n")
+                await client_writer.drain()
+            else:
+                # Plain HTTP: re-inject the request upstream with auth, then relay.
+                idx = header.index("\r\n") + 2
+                up_writer.write((header[:idx] + self._auth_header() + header[idx:]).encode())
+                await up_writer.drain()
+
+            await self._relay(client_reader, client_writer, up_reader, up_writer)
+        except Exception:
+            pass  # one bad tunnel must never take down the listener
+        finally:
+            for w in (client_writer, up_writer):
+                if w is not None:
+                    try:
+                        w.close()
+                    except Exception:
+                        pass
+
+    @staticmethod
+    async def _relay(
+        client_reader: asyncio.StreamReader, client_writer: asyncio.StreamWriter,
+        up_reader: asyncio.StreamReader, up_writer: asyncio.StreamWriter) -> None:
+        async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
+            try:
+                while data := await reader.read(65536):
+                    writer.write(data)
+                    await writer.drain()
+            except Exception:
+                pass
+            finally:
+                writer.close()  # pass EOF along so the opposite pipe can finish too
+        await asyncio.gather(
+            pipe(client_reader, up_writer), pipe(up_reader, client_writer))
+
+
+def looks_like_challenge(body: str) -> bool:
+    s = (body or "").lstrip()
+    return not s or s.startswith("<") or "Just a moment" in body or "challenge-platform" in body
+
+
+# --- C2 HTTP (stdlib, run off the event loop) -------------------------------------
+
+def _get_job_sync():
+    req = urllib.request.Request(f"{C2_URL}/jobs/next", headers={"X-Worker-Token": TOKEN})
+    try:
+        with urllib.request.urlopen(req, timeout=15) as r:
+            if r.status == 204:
+                return None
+            return json.loads(r.read() or b"null")
+    except urllib.error.HTTPError as e:
+        print(f"  C2 /jobs/next -> HTTP {e.code}")
+        return None
+    except urllib.error.URLError as e:
+        print(f"  C2 unreachable: {e}")
+        return None
+
+
+def _post_result_sync(job_id: str, payload: dict):
+    data = json.dumps(payload).encode()
+    req = urllib.request.Request(
+        f"{C2_URL}/jobs/{job_id}/result", data=data, method="POST",
+        headers={"X-Worker-Token": TOKEN, "Content-Type": "application/json"})
+    try:
+        with urllib.request.urlopen(req, timeout=60) as r:
+            return json.loads(r.read() or b"null")
+    except urllib.error.HTTPError as e:
+        print(f"  C2 result -> HTTP {e.code}: {e.read()[:200]!r}")
+        return None
+    except urllib.error.URLError as e:
+        print(f"  C2 unreachable posting result: {e}")
+        return None
+
+
+async def get_job():
+    return await asyncio.to_thread(_get_job_sync)
+
+
+async def post_result(job_id, payload):
+    return await asyncio.to_thread(_post_result_sync, job_id, payload)
+
+
+# --- scraping ---------------------------------------------------------------------
+
+async def fetch_json(page, url: str) -> tuple[str, str]:
+    expr = (
+        f"fetch({url!r}, {{credentials:'include', headers:{{'accept':'application/json'}}}})"
+        f".then(async r => JSON.stringify({{status: r.status, body: await r.text()}}))"
+    )
+    raw = await page.evaluate(expr, await_promise=True)
+    if not isinstance(raw, str):
+        return ("-1", "")
+    try:
+        obj = json.loads(raw)
+        return (str(obj.get("status", "-1")), obj.get("body", ""))
+    except json.JSONDecodeError:
+        return ("-1", raw)
+
+
+async def _click(page, text, timeout=3):
+    try:
+        el = await page.find(text, best_match=True, timeout=timeout)
+        if el:
+            await el.click()
+            return True
+    except Exception:
+        pass
+    return False
+
+
+async def dismiss_consent(page):
+    """Privacy-preserving. The banner only offers 'Accept all' / 'Manage cookies';
+    the Reject-all control lives inside the Manage window. So: Manage -> Reject all ->
+    Confirm. (The data path reads SSR __page-params regardless, but this keeps the
+    session honest and unblocks any future interaction.)"""
+    steps = []
+    if await _click(page, "Manage cookies") or await _click(page, "Manage"):
+        await page.sleep(1)
+        if await _click(page, "Reject all"):
+            steps.append("reject-all")
+        for c in ("Confirm my choice", "Confirm", "Save"):
+            if await _click(page, c):
+                steps.append(f"confirm:{c}")
+                break
+    return ", ".join(steps) if steps else None
+
+
+async def warm(page):
+    """Open the market and clear Cloudflare so the session holds cf_clearance."""
+    print(f"Warming session at {MARKET_URL} (clear Cloudflare; {SOLVE_SECONDS}s)...")
+    await page.get(MARKET_URL)
+    await page.sleep(SOLVE_SECONDS)
+    clicked = await dismiss_consent(page)
+    print(f"Consent: {'dismissed via ' + clicked if clicked else 'left up'}")
+
+
+def extract_items(html: str) -> list:
+    """Pull inventory.items out of the page's __page-params JSON blob."""
+    m = PAGE_PARAMS_RE.search(html)
+    if not m:
+        return []
+    try:
+        return json.loads(m.group(1)).get("inventory", {}).get("items", []) or []
+    except json.JSONDecodeError:
+        return []
+
+
+async def scrape_job(page, job) -> tuple[list, int, str]:
+    """Scrape ALL listings for one skin+wear via a forward float cursor.
+
+    A search page returns at most 60 items and ignores offset, but cs.money sorts by
+    float (order=asc&sort=float) and filters by minFloat. So we walk the float axis:
+    grab the 60 lowest-float items at/above `lo`, advance `lo` to the highest float on
+    the page, and repeat until a page is under the cap. The boundary item is re-fetched
+    (minFloat is inclusive) and dropped by the id dedup. Returns (items, fetches, reason).
+    """
+    search = urllib.parse.quote_plus(job["search"])
+    max_fetches = job.get("maxPages") or 40  # safety cap on page fetches per job (null-safe)
+    seen: dict = {}
+    fetches = 0
+    lo = 0.0
+    reason = "completed"
+
+    while fetches < max_fetches:
+        status, body = await fetch_json(page, PAGE.format(search=search, lo=lo))
+        fetches += 1
+
+        if "Just a moment" in body or "challenge-platform" in body:
+            return list(seen.values()), fetches, "challenged"
+
+        items = extract_items(body)
+        floats = []
+        for it in items:
+            if it.get("id") is not None:
+                seen[it["id"]] = it
+            fl = (it.get("asset") or {}).get("float")
+            if isinstance(fl, (int, float)):
+                floats.append(fl)
+
+        if len(items) < PAGE_CAP:
+            break  # last page — fewer than the cap means we've seen everything
+
+        # Advance the cursor past the highest float on this page. Items at exactly that
+        # float are re-fetched next round (minFloat is inclusive) and deduped by id.
+        nxt = max(floats) if floats else None
+        if nxt is None or nxt <= lo:
+            # Cursor can't advance: >60 listings share a single float value, or the
+            # items carry no float. Bail loudly rather than spin — a flagged gap beats
+            # a silent one (this is the failure the price-window version hid).
+            reason = "stuck-float-tie"
+            break
+        lo = nxt
+
+        await page.sleep(DELAY + random.uniform(0, JITTER))
+    else:
+        reason = "fetch-cap"
+
+    return list(seen.values()), fetches, reason
+
+
+async def main():
+    # IPRoyal (auth'd, per-worker sticky IP) takes priority; else a plain auth-free
+    # PROXY; else this host's own IP. The forwarder injects IPRoyal auth so Chrome
+    # only ever sees an auth-free 127.0.0.1 endpoint.
+    forwarder = None
+    session_id = None
+    if IPROYAL_USERNAME and IPROYAL_PASSWORD:
+        session_id = _new_session_id()
+        forwarder = await LocalForwardingProxy(
+            IPROYAL_HOST, IPROYAL_PORT, IPROYAL_USERNAME, _iproyal_password(session_id)).start()
+        proxy = forwarder.endpoint
+        proxy_label = f"iproyal[{IPROYAL_COUNTRY or 'any'}] session {session_id} via {forwarder.endpoint}"
+    else:
+        proxy = PROXY
+        proxy_label = PROXY or "own IP"
+
+    args = [f"--proxy-server={proxy}"] if proxy else []
+    if not LOAD_IMAGES:
+        # Disable image loading at the engine level: images are the dominant bandwidth
+        # cost on an image-heavy market; CF clearance and the SSR scrape don't need them.
+        args.append("--blink-settings=imagesEnabled=false")
+    if os.environ.get("CHROME_NO_SANDBOX") == "1":
+        # Required when running Chromium as root in a container.
+        args += ["--no-sandbox", "--disable-dev-shm-usage"]
+    print(f"Starting worker (C2={C2_URL}, proxy={proxy_label}, images={'on' if LOAD_IMAGES else 'off'})...")
+    browser = await uc.start(headless=False, browser_executable_path=BROWSER_PATH, browser_args=args)
+    try:
+        page = await browser.get("about:blank")
+        await warm(page)
+
+        while True:
+            job = await get_job()
+            if not job:
+                await asyncio.sleep(IDLE_SECONDS)
+                continue
+
+            print(f"Job {job['jobId'][:8]} — search {job['search']!r}")
+            items, pages, reason = await scrape_job(page, job)
+
+            if reason == "challenged":
+                # The exit IP is likely flagged. On IPRoyal, rotate to a fresh sticky
+                # session (new IP) before re-warming; otherwise just re-solve in place.
+                if forwarder is not None:
+                    session_id = _new_session_id()
+                    forwarder.set_password(_iproyal_password(session_id))
+                    print(f"  challenged; rotating exit IP -> session {session_id}, re-warming...")
+                else:
+                    print("  re-challenged; re-warming session...")
+                await warm(page)
+
+            result = await post_result(job["jobId"], {
+                "items": items, "pages": pages, "stoppedReason": reason})
+            summary = (f"matched {result.get('matched')}, new {result.get('inserted')}, "
+                       f"upd {result.get('updated')}, removed {result.get('removed')}") if result else "post failed"
+            print(f"  scraped {len(items)} items ({pages}p, {reason}) -> {summary}")
+
+            await page.sleep(DELAY + random.uniform(0, JITTER))
+    finally:
+        browser.stop()
+        if forwarder is not None:
+            await forwarder.stop()
+
+
+if __name__ == "__main__":
+    uc.loop().run_until_complete(main())