Add cs.money worker stack with per-worker IPRoyal residential proxy

Brings up the pull-model scraper: the .NET C2 hands skin+wear jobs to Python nodriver workers that scrape cs.money and post results back, plus the supporting Core/EFCore data model, migrations, and docker-compose orchestration.

IPRoyal proxying lets workers scale horizontally with a distinct residential exit IP each: every worker process mints its own sticky session at startup, and an in-process forwarding proxy injects the gateway auth so Chromium talks only to an auth-free localhost endpoint (zero CDP). On a Cloudflare challenge a worker rotates to a fresh session/IP and re-warms.

Verified end-to-end against live IPRoyal: distinct US residential exits per worker and IP rotation on demand.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
worker/README.md
# cs.money worker (Python)

The browser/Cloudflare layer for the cs.money scraper. .NET stays the **C2** (orchestration, proxy/IP allocation, DB, the sweep loop); this worker is the only component that drives a browser and defeats Cloudflare, because the effective anti-bot tooling (`nodriver`/`undetected-chromedriver`, TLS impersonation) only exists in Python/Go, not .NET.
## Why nodriver

.NET Selenium got insta-challenged by Cloudflare's managed challenge because `msedgedriver` runs the browser in automation mode, leaving `navigator.webdriver` set and chromedriver's `cdc_` artifacts in the page, both of which Cloudflare keys on. `nodriver` drives a normal Chromium directly over CDP, with no driver binary in between, and patches those tells, so it passes where Selenium loops.
## Step 1: prove it (current)

`poc.py` proves that nodriver can clear cs.money's Cloudflare check and fetch the listings API before we build the full pull-based fleet.

```powershell
cd worker
py -m venv .venv
.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python poc.py
```
A Chromium window opens on the market. Solve the Cloudflare check if one appears; the script waits, then pages deep into `sell-orders` (depth set by `PAGES`), reporting how far the warm session survives before any re-challenge and confirming that float values arrive at full precision. Output lands in `worker/captures/`.

**Targeted skin+wear search.** cs.money search is free text on the page (`?search=cyber+security+ft`). Set `SEARCH` and the PoC navigates there, **captures the actual filtered `sell-orders` API request the page fires** (so we learn the real filter params instead of guessing), prints it, then pages that filtered API:

```powershell
$env:SEARCH="cyber security ft"; python poc.py  # FT M4A4 Cyber Security only
```
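Because the search is free text, the page URL is just the term form-encoded into one query parameter. A minimal sketch of that encoding, assuming `search` is the only parameter involved (as in the `?search=cyber+security+ft` example); `MARKET_URL` is a hypothetical placeholder, not cs.money's real market path.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical placeholder base URL: substitute the real market page.
MARKET_URL = "https://example.invalid/market"


def search_url(term: str, base: str = MARKET_URL) -> str:
    """Collapse runs of whitespace, then form-encode the term (spaces become '+')."""
    normalized = " ".join(term.split())
    return f"{base}?{urlencode({'search': normalized})}"


url = search_url("cyber  security ft")
print(url)  # https://example.invalid/market?search=cyber+security+ft
# Decoding the query string recovers the normalized term.
print(parse_qs(urlsplit(url).query)["search"])  # ['cyber security ft']
```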
The `>>> DISCOVERED sell-orders API call` line shows how the search maps to API params; that mapping is how the C2 will build targeted jobs.

Run on your own IP first (no proxy): that's the clean A/B comparison against the Selenium run. If auto-detect can't find a browser, set `BROWSER_PATH` to a Chrome or Edge executable, e.g. `C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe`.
## Step 2: the pull fleet

`worker.py` holds one warm nodriver session and loops: poll the .NET C2 for a job (a skin+wear search), scrape that search's sell-orders via an in-page fetch, and post the items back. The C2 (`BlueLaminate.C2`) picks the stalest skin+wear from the catalogue and, when the result comes back, persists it to `cs_money_listings` and `price_history` (`Source = "csmoney"`) and stamps `SkinCondition.ListingsSweptAt`.

Run the C2 (the Postgres database must be migrated first), then the worker:

```powershell
# terminal 1: the C2 (from repo root)
dotnet run --project BlueLaminate\BlueLaminate.C2  # serves http://localhost:5080

# terminal 2: the worker
cd worker; .venv\Scripts\Activate.ps1
$env:WORKER_TOKEN="dev-worker-token"  # must match the C2's WorkerToken
python worker.py
```
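The pull loop can be sketched as below. This is a minimal sketch under stated assumptions, not the actual `worker.py`: the `/jobs/next` and `/jobs/{id}/result` endpoints, the `X-Worker-Token` header, and the `C2_URL` variable are all hypothetical. The nodriver scraping step is passed in as a plain `scrape` callable, and the HTTP call as a `send` callable, so the polling, idle back-off, and result posting can be exercised without a browser or a running C2.

```python
import json
import os
import time
import urllib.request
from typing import Callable, Optional

# Hypothetical configuration: the README only states that the C2 serves
# http://localhost:5080 and that WORKER_TOKEN must match the C2's WorkerToken.
C2_URL = os.environ.get("C2_URL", "http://localhost:5080")
WORKER_TOKEN = os.environ.get("WORKER_TOKEN", "dev-worker-token")

Job = dict
Send = Callable[[str, str, Optional[dict]], Optional[dict]]


def http_send(method: str, path: str, body: Optional[dict] = None) -> Optional[dict]:
    """Send JSON to the C2; return the parsed JSON reply, or None for an empty body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        C2_URL + path,
        data=data,
        method=method,
        # Header name is an assumption; use whatever the C2 actually checks.
        headers={"Content-Type": "application/json", "X-Worker-Token": WORKER_TOKEN},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None


def run(
    send: Send,
    scrape: Callable[[Job], list],
    max_jobs: Optional[int] = None,
    idle_sleep: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll for a job, scrape it, post the items back; repeat. Returns jobs completed."""
    done = 0
    while max_jobs is None or done < max_jobs:
        job = send("POST", "/jobs/next", None)  # hypothetical "claim next job" endpoint
        if not job:
            sleep(idle_sleep)  # nothing to do right now: back off, then poll again
            continue
        items = scrape(job)  # stands in for the nodriver in-page fetch
        send("POST", f"/jobs/{job['id']}/result", {"items": items})  # hypothetical
        done += 1
    return done
```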
The worker warms the session (you clear Cloudflare once), then runs continuously. Scale out by starting more workers, each with its own `PROXY`.