Cut metered-proxy bandwidth: re-sweep floor + wire-size logging

JobQueue now skips bands swept within the last MinResweepHours (config, default 6h) instead of continuously re-scraping the whole catalogue, which was the dominant cost on the metered residential proxy. Bandwidth savings are roughly linear, with no data loss (full pagination is retained); setting it to 0 disables the floor. Worker now logs the real compressed transferSize per job (what the proxy bills) rather than the ~6.5x-larger decompressed length, so spend is visible.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
bob
2026-05-31 15:27:37 -05:00
parent 94177f9a8c
commit 8b0eb0db78
5 changed files with 65 additions and 16 deletions

@@ -19,6 +19,9 @@ services:
       ConnectionStrings__SkinTracker: ${SKINTRACKER_CONN:-Host=host.docker.internal;Port=5432;Database=skintracker;Username=postgres}
       WorkerToken: ${WORKER_TOKEN:-dev-worker-token}
       MaxPagesPerJob: ${MAX_PAGES_PER_JOB:-60}
+      # Re-sweep floor (hours): skip bands swept more recently than this. The big lever
+      # for metered-proxy bandwidth — fewer redundant re-pulls. 0 = continuous re-sweep.
+      MinResweepHours: ${MIN_RESWEEP_HOURS:-6}
     ports:
       - "5080:5080"
     extra_hosts:
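
Note on the re-sweep floor: the JobQueue change itself is not shown in this hunk, so the following is only a minimal sketch of how such a floor could be applied, written in Python for illustration. The names `should_sweep` and `due_bands` and the per-band timestamp map are hypothetical and do not come from the commit. The only behaviour taken from the commit message and the config comment is that bands swept more recently than MinResweepHours are skipped and that 0 means continuous re-sweep.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def should_sweep(last_swept: Optional[datetime], now: datetime,
                 min_resweep_hours: float) -> bool:
    """Return True if a band is due for another sweep (hypothetical helper)."""
    if min_resweep_hours <= 0:
        return True  # 0 disables the floor: continuous re-sweep
    if last_swept is None:
        return True  # assumption: a band that was never swept is always due
    # Skip only bands swept more recently than the floor.
    return now - last_swept >= timedelta(hours=min_resweep_hours)


def due_bands(last_swept_by_band: Dict[str, Optional[datetime]], now: datetime,
              min_resweep_hours: float) -> List[str]:
    """Filter the band list down to those due for a sweep, preserving order."""
    return [band for band, ts in last_swept_by_band.items()
            if should_sweep(ts, now, min_resweep_hours)]


if __name__ == "__main__":
    now = datetime(2026, 5, 31, 12, 0, tzinfo=timezone.utc)
    bands = {
        "band-a": now - timedelta(hours=2),  # inside the 6h floor
        "band-b": now - timedelta(hours=7),  # older than the floor
        "band-c": None,                      # never swept
    }
    print(due_bands(bands, now, min_resweep_hours=6))
    print(due_bands(bands, now, min_resweep_hours=0))
```

With the default of 6, a band swept two hours ago is skipped until it ages past the floor; with 0, every band is returned on every pass, which matches the continuous re-sweep behaviour described in the config comment.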