PBN Research Lab

Hypothesis

I can compare paint-by-number style pipelines (A/B/C) with repeatable metrics, optional human rubric scores, and logged runs on disk, so improvement is data-shaped instead of eyeballing the latest PNG folder.

Why This Matters to Me

This is lab tooling: judgment-heavy image work benefits from a tight loop (run → see variants → score → log). The goal is to learn what “good PBN” means before any product wrapper exists.

Who It's For

Me (or anyone cloning the repo) running local experiments on my images and my run logs. Not for a public SaaS, automated hosting, or customer support—those stay out of scope in the PRD.

What It Does

• Local web UI plus CLI to run pipelines, view previews, and read auto metrics
• Human sliders (e.g. subject clarity, paintability) blended with auto scores per configured weights
• One-click logging to `assets/output/` (runs, manifests) for traceability
• Sweeps for budgeted search when the UI is not enough

Existing Options

Product / approach	Price	User Base	Strength	Limitation
Ad hoc scripts / notebooks	$0	universal	Flexible	Drift; hard to compare runs
ML experiment trackers	Varies	teams	Strong for model metrics	Overkill; may not map to PBN human judgments
Consumer PBN apps	Free–$	large; unknown	Productized output	Opaque; not a research surface

Gap: a local-first loop tuned to PBN—variants, side-by-sides, logged judgment—in one place.