almost ready

…`webdriver` and chromedriver `cdc_` artifacts that Cloudflare keys on. `nodriver` drives a normal Chromium directly over CDP (no chromedriver) and patches those tells, so it clears the check where Selenium gets stuck looping on it.

## Step 1: prove it (current)

`poc.py` proves that nodriver can clear cs.money's Cloudflare check and fetch the listings API before we build the full pull-based fleet.

## Local setup

```powershell
cd worker
py -m venv .venv
.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python poc.py
```

A Chromium window opens on the market. Solve the Cloudflare check if one is shown; the script then waits and pages `sell-orders` deeply (up to `PAGES` pages), reporting how far the warm session survives before any re-challenge and confirming that floats come back at full precision. Output lands in `worker/captures/`.
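
The paging check above might be sketched as follows. The sketch rests on several guesses rather than confirmed cs.money behaviour:

- the PoC pages with an `offset` parameter;
- a page arrives as JSON shaped like `{"items": [...]}`;
- an HTTP 403/429/503 response, or an HTML body, means a re-challenge.

`tab_fetch` runs `fetch()` inside the warm tab so the request reuses the page's own cookies. It assumes nodriver's `Tab.evaluate(expression, await_promise=True)` resolves the promise and returns its value.

```python
import json
from typing import Any, Awaitable, Callable, List, Tuple

# ASSUMPTION: statuses treated as a Cloudflare re-challenge (not confirmed for cs.money).
CHALLENGE_STATUSES = {403, 429, 503}


def in_page_fetch_js(url: str) -> str:
    """JavaScript that fetches `url` from inside the page, keeping its cookies."""
    # json.dumps yields a correctly quoted JS string literal for the URL.
    return (
        f"fetch({json.dumps(url)}, {{credentials: 'include'}})"
        ".then(async r => JSON.stringify({status: r.status, body: await r.text()}))"
    )


async def tab_fetch(tab: Any, url: str) -> Tuple[int, str]:
    """Run the in-page fetch in a nodriver tab and return (status, body text)."""
    raw = await tab.evaluate(in_page_fetch_js(url), await_promise=True)
    result = json.loads(raw)
    return result["status"], result["body"]


async def page_until_challenge(
    fetch: Callable[[int], Awaitable[Tuple[int, str]]], pages: int, page_size: int
) -> Tuple[int, List[dict]]:
    """Fetch up to `pages` pages, stopping at the first that looks like a challenge.

    Returns (number of pages that came back as clean JSON, listings collected).
    """
    items: List[dict] = []
    for page in range(pages):
        status, body = await fetch(page * page_size)
        if status in CHALLENGE_STATUSES:
            return page, items
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            # An HTML interstitial instead of JSON also means we were re-challenged.
            return page, items
        batch = payload.get("items", [])  # ASSUMPTION: {"items": [...]} shape
        if not batch:
            return page + 1, items  # listings ran out before `pages`
        items.extend(batch)
    return pages, items
```

Wiring it up would look roughly like this:

```python
await page_until_challenge(
    lambda off: tab_fetch(tab, f"{api_url}?offset={off}"),
    pages=PAGES,
    page_size=60,
)
```

Here `api_url` and the page size of 60 are placeholders.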

**Targeted skin+wear search.** cs.money search is free text entered on the page (`?search=cyber+security+ft`). Set `SEARCH` and the PoC:

- navigates to that search;
- **captures the actual filtered `sell-orders` API request the page fires** (so we learn the real filter params instead of guessing them);
- prints it;
- then pages that filtered API:

```powershell
$env:SEARCH="cyber security ft"; python poc.py # FT M4A4 Cyber Security only
```
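
Building that query for a catalogue skin+wear could be as small as the sketch below. It assumes the convention visible in the `SEARCH` example: the lowercased skin name followed by the wear's short form, with the weapon left out. `search_query`, `search_url` and `market_url` are hypothetical names. Whether a weapon-less query is always unambiguous hasn't been checked.

```python
from urllib.parse import urlencode

# Standard CS wear short forms; "ft" matches the README's example query.
WEAR_ABBR = {
    "Factory New": "fn",
    "Minimal Wear": "mw",
    "Field-Tested": "ft",
    "Well-Worn": "ww",
    "Battle-Scarred": "bs",
}


def search_query(skin: str, wear: str) -> str:
    """'Cyber Security', 'Field-Tested' -> 'cyber security ft' (hypothetical convention)."""
    return f"{skin} {WEAR_ABBR[wear]}".lower()


def search_url(market_url: str, query: str) -> str:
    """Append search=... with '+' for spaces, as in ?search=cyber+security+ft."""
    sep = "&" if "?" in market_url else "?"
    # urlencode quotes with quote_plus by default, so spaces become '+'.
    return f"{market_url}{sep}{urlencode({'search': query})}"
```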

The `>>> DISCOVERED sell-orders API call` line shows how the search maps to API params; that's how the C2 will build targeted jobs.
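
Once the page has fired its request, turning the captured URL into an endpoint plus parameters is plain `urllib.parse`. The sketch assumes the candidate URLs were collected elsewhere, for example by a handler on the CDP `Network.requestWillBeSent` event. The function names, and any parameter names you feed it, are illustrative rather than the real cs.money filter params.

```python
from typing import Dict, Iterable, Optional, Tuple
from urllib.parse import parse_qsl, urlsplit


def find_sell_orders_call(urls: Iterable[str]) -> Optional[str]:
    """Return the first observed request URL whose path mentions sell-orders."""
    for url in urls:
        if "sell-orders" in urlsplit(url).path:
            return url
    return None


def split_api_call(url: str) -> Tuple[str, Dict[str, str]]:
    """Split a captured API URL into (endpoint, query params) so a job can be templated."""
    parts = urlsplit(url)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # keep_blank_values so empty filters (e.g. "maxFloat=") are not silently dropped.
    return endpoint, dict(parse_qsl(parts.query, keep_blank_values=True))
```

`dict(parse_qsl(...))` keeps only the last value of a repeated parameter. If the discovered call repeats one, keep the list of pairs instead.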

Run on your own IP first (no proxy): that's the clean A/B comparison against the Selenium run. If auto-detect can't find a browser, set `BROWSER_PATH` to the Chrome or Edge executable, e.g. `C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe`.
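
The lookup order described above might look like this hypothetical helper: an explicit `BROWSER_PATH` first, then auto-detect. The candidate list is just the standard Windows install locations for Chrome and Edge. The result would be handed to nodriver, assuming its `start()` accepts a `browser_executable_path` argument.

```python
import os
from typing import Callable, Mapping, Optional, Sequence

# Standard Windows install locations; the Edge path is the one quoted above.
DEFAULT_CANDIDATES = (
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    r"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe",
)


def resolve_browser(
    env: Mapping[str, str] = os.environ,
    candidates: Sequence[str] = DEFAULT_CANDIDATES,
    exists: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """BROWSER_PATH wins; otherwise the first candidate that exists; otherwise None."""
    override = env.get("BROWSER_PATH")
    if override:
        return override
    return next((path for path in candidates if exists(path)), None)
```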

## The pull fleet

`csmoney_worker.py` holds one warm nodriver session and loops: poll the .NET C2 for a job (a skin+wear search), scrape that search's sell-orders via in-page fetch, and post the items back.

The C2 (`BlueLaminate.C2`) picks the stalest skin+wear band from the catalogue. When the result comes back, it persists it to `cs_money_listings` + `price_history` (`Source = "csmoney"`) and stamps that band's per-site checkpoint (the `csmoney` row in `skin_condition_sweeps`). Because the checkpoint is per-site, a band CSFloat already swept is still due for a cs.money sweep.
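
The per-site checkpoint amounts to keying the last-swept time on (band, site). The C2 is .NET and presumably does this in a database query. The Python below only illustrates the selection rule, with hypothetical names and an in-memory dict; it is not the C2's actual code.

```python
from datetime import datetime
from typing import Dict, Hashable, Iterable, Optional, Tuple

Checkpoints = Dict[Tuple[Hashable, str], datetime]  # (band, site) -> last swept


def pick_stalest(
    bands: Iterable[Hashable], checkpoints: Checkpoints, site: str
) -> Optional[Hashable]:
    """Return a band this site has never swept, else the one it swept longest ago."""

    def staleness(band: Hashable) -> Tuple[bool, datetime]:
        swept = checkpoints.get((band, site))
        # Never-swept sorts first (False < True); among swept bands, oldest first.
        return (swept is not None, swept or datetime.min)

    return min(bands, key=staleness, default=None)
```

A band that CSFloat has swept but cs.money hasn't has no `(band, "csmoney")` entry, so it sorts first for cs.money.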

Run the C2 (it needs a migrated Postgres database), then the worker:

```powershell
# …
dotnet run --project BlueLaminate\BlueLaminate.C2 # serves http://local…

# terminal 2 — the worker
cd worker; .venv\Scripts\Activate.ps1
$env:WORKER_TOKEN="dev-worker-token" # must match the C2's WorkerToken
python csmoney_worker.py
```

The worker warms the session (you clear Cloudflare once), then runs continuously. Scale out by starting more workers, each with its own `PROXY`.
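
Stripped of browser and proxy handling, the poll, scrape, post cycle reduces to the loop below. This is a simplified synchronous sketch with injected `claim`/`scrape`/`post` callables (hypothetical names). The real loop in `blworker/runtime.py` also handles Cloudflare IP rotation and graceful shutdown. The sketch models shutdown only as a stop event.

```python
import threading
from typing import Any, Callable, Optional


def pull_loop(
    claim: Callable[[], Optional[Any]],
    scrape: Callable[[Any], list],
    post: Callable[[Any, list], None],
    stop: threading.Event,
    idle_seconds: float = 5.0,
) -> int:
    """Claim a job, scrape it, post the result; repeat until `stop` is set.

    Returns the number of jobs completed.
    """
    done = 0
    while not stop.is_set():
        job = claim()
        if job is None:
            # Nothing due right now: back off, but wake immediately on shutdown.
            stop.wait(idle_seconds)
            continue
        post(job, scrape(job))
        done += 1
    return done
```

Waiting on the stop event rather than calling `time.sleep` lets a shutdown signal cut the idle back-off short.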

## Layout

Both market scripts are thin: each subclasses `blworker.Worker` and fills in only its own scrape and cookie-consent steps. Everything shared lives in the `blworker/` package:

| file | responsibility |
| --- | --- |
| `blworker/config.py` | `Settings`: every env knob, parsed once |
| `blworker/log.py` | stdout logging, human-readable or `LOG_JSON=1` (for Loki) |
| `blworker/proxy.py` | IPRoyal forwarder + session/password helpers |
| `blworker/c2.py` | `C2Client`: claim a job, post a result |
| `blworker/runtime.py` | `Worker` base: proxy/browser bring-up, the poll→scrape→post loop, Cloudflare IP rotation, graceful shutdown |
| `csmoney_worker.py` / `skinland_worker.py` | the per-market scrape strategies |

To add a market:

1. Subclass `Worker`.
2. Set `name`, `jobs_path` and `default_market_url`.
3. Implement `scrape_job` and `describe_job`, plus `dismiss_consent` if the site shows a cookie banner.
4. Call `run(YourWorker)`.
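
A sketch of the shape a new market takes. `blworker` isn't shown here, so the block defines a small stand-in `Worker` ABC carrying the attributes and hooks listed above. Several parts are assumptions: the synchronous signatures, the `job` dict shape and `ExampleWorker` itself. Take the real signatures from `blworker/runtime.py`; in the repo the script would end with `run(ExampleWorker)`.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Worker(ABC):
    """Stand-in for blworker.Worker, showing only the override points named above.

    The real base class also owns the proxy, browser, C2 loop, Cloudflare rotation
    and shutdown; its method signatures may differ from these.
    """

    name: str = ""
    jobs_path: str = ""
    default_market_url: str = ""

    def dismiss_consent(self, tab: Any) -> None:
        """Optional hook: markets without a cookie banner keep this no-op."""

    @abstractmethod
    def describe_job(self, job: Dict[str, Any]) -> str: ...

    @abstractmethod
    def scrape_job(self, tab: Any, job: Dict[str, Any]) -> List[Dict[str, Any]]: ...


class ExampleWorker(Worker):
    """Hypothetical market, for illustration only."""

    name = "example"
    jobs_path = "/example/jobs"
    default_market_url = "https://example.test/market"

    def describe_job(self, job):
        return f"{self.name}: {job['search']}"

    def scrape_job(self, tab, job):
        # A real implementation would drive `tab` (a nodriver tab) here.
        return [{"search": job["search"], "price": None}]
```

Making `scrape_job` and `describe_job` abstract means a market that forgets one fails at construction time, not mid-sweep.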

## skin.land worker

`skinland_worker.py` is the same pull model for **skin.land** (also Cloudflare-walled). It shares all the proxy/Cloudflare/C2 plumbing with the cs.money worker via `blworker`; only the scrape differs.

The C2 hands out jobs from its **`/skinland/jobs`** group, scheduled from the `skinland` rows in `skin_condition_sweeps`. A band that cs.money or CSFloat already swept is therefore still due here. On result, the C2 persists to `skin_land_listings` + `price_history` (`Source = "skinland"`).

How it scrapes (learned during discovery):

- A job's target is the market **page URL**, e.g.
  `https://skin.land/market/csgo/ak-47-redline-field-tested/`. The slug is just
  `{weapon}-{skin}-{wear}` kebab-cased; the C2 builds it from the catalogue, with no lookup needed.
- skin.land is a Nuxt SSR app, and the page embeds an internal numeric `skin_id`.
  - The worker resolves that id once from the `__NUXT__` payload (the skin object whose `url` == the slug) and caches it per slug.
  - It then pages the clean JSON API
    `GET https://app.skin.land/api/v2/obtained-skins?skin_id={id}&page={n}`, walking to `last_page`.
  - The response is a Laravel paginator: `{data:[…offers], meta:{current_page,last_page,…}}`.
- Each offer carries a full-precision `item_float`, `final_withdrawal_price`, and the Steam
  `item_link`. skin.land exposes **no paint seed**, so listings aren't fingerprinted to a
  `SkinInstance` (no cross-market roll-up or dupe detection here).
- StatTrak and Souvenir are separate pages (`stattrak-`/`souvenir-` slugs); v1 sweeps only the base page per skin+wear.
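
The scrape steps above (slug, id lookup, URL and page walk) might be sketched like this. Only these parts come from the notes above: the slug format, the `url` match, the API URL, and the paginator's `data`/`meta.current_page`/`meta.last_page` fields. The rest are assumptions: the `id` key on the matching skin object, the function names, and the injected `fetch_page`. In the worker, `fetch_page` would be an in-page fetch of `obtained_skins_url(...)`.

```python
import re
from typing import Any, Callable, Dict, Iterator, Optional

SKINLAND_MARKET = "https://skin.land/market/csgo/"


def kebab(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics into a single '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def skinland_slug(weapon: str, skin: str, wear: str) -> str:
    """'AK-47', 'Redline', 'Field-Tested' -> 'ak-47-redline-field-tested'."""
    return "-".join(kebab(part) for part in (weapon, skin, wear))


def find_skin_id(node: Any, slug: str) -> Optional[int]:
    """Depth-first search of a __NUXT__ payload for the object whose url == slug.

    ASSUMPTION: that object stores the numeric id under "id".
    """
    if isinstance(node, dict):
        if node.get("url") == slug and isinstance(node.get("id"), int):
            return node["id"]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_skin_id(child, slug)
        if found is not None:
            return found
    return None


def obtained_skins_url(skin_id: int, page: int) -> str:
    return f"https://app.skin.land/api/v2/obtained-skins?skin_id={skin_id}&page={page}"


def walk_offers(fetch_page: Callable[[int], Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every offer from a Laravel paginator, page 1 through meta.last_page."""
    page = 1
    while True:
        payload = fetch_page(page)
        yield from payload["data"]
        meta = payload["meta"]
        if meta["current_page"] >= meta["last_page"]:
            return
        page += 1
```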

Run it alongside (or instead of) the cs.money worker; it points at the same C2:

```powershell
cd worker; .venv\Scripts\Activate.ps1
$env:WORKER_TOKEN="dev-worker-token"
python skinland_worker.py
```

Under Docker it's the `skinland-worker` service (same image, `WORKER_SCRIPT=skinland_worker.py`):

```powershell
docker compose up --build --scale skinland-worker=5
```