Encode style. Skip the fine-tune.
A framework for encoding creative style as explicit, runtime-applied rules, instead of training a model to learn it implicitly.
Cheaper than fine-tuning. Transparent enough to read and edit. Works with any prompt-driven model: image, text, audio, or video. The same pattern translates across creative domains.
Style is explicit. Encode it that way.
Creators can articulate why they chose a frame, a color, a moment. That language is structure. Write it down and the model doesn't need to learn it from scratch.
Fine-tuning:
- Hours of training, $100s of compute.
- Opaque: no insight into what the model learned.
- Brittle: often loses style on novel prompts.
- Locked to the model you trained.

Runtime rules:
- No training. Rules live in plain markdown.
- Transparent: read, edit, ship in minutes.
- Resilient: same rules work on novel prompts.
- Model-agnostic: swap providers freely.
Four steps. No model training.
Intake.
Seven questions anchor the work in the creator's own words. What you're trying to say, who it's for, what makes it yours.
Portfolio analysis.
Roughly 100 representative items get tagged across the eight dimensions described below. Real patterns surface. Not what you say you do, what you actually do.
Synthesis.
Patterns become 6–10 explicit rules with VALIDATION + GENERATION blocks. Each rule earns its place against your real work.
Runtime.
At generation time, the rules are synthesized into the prompt. Outputs are scored against the same rules. Failures trigger a soft retry with an adjusted prompt.
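To make the shape concrete: each rule pairs a weight with a GENERATION block (folded into the prompt) and a VALIDATION block (what the evaluator scores). A minimal Python sketch of how a worker might hold one rule in memory; the field names and example values are illustrative, not the repo's actual schema:

```python
from dataclasses import dataclass

@dataclass
class StyleRule:
    # Hypothetical in-memory form of one classifier rule.
    name: str        # which dimension the rule covers
    weight: float    # contribution to the combined score
    generation: str  # the GENERATION block, folded into the prompt
    validation: str  # the VALIDATION block, scored by the evaluator

# Illustrative example values, not taken from the repo.
rule = StyleRule(
    name="decisiveness",
    weight=0.15,
    generation="Favor one unrepeatable peak instant over a posed scene.",
    validation="Does the image read as a single caught moment?",
)
```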
A framework for creative style.
Eight dimensions that translate across photography, art, music, writing, and graphic design. The names change per domain. The structure doesn't.
- Authenticity
- Composition
- Subject Focus
- Texture
- Palette
- Connection
- Context
- Decisiveness
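During portfolio analysis, each item gets a tag per dimension. A sketch of what one tagged photograph might look like; the keys follow the list above, and the values are invented for illustration:

```python
# Hypothetical tag record for one portfolio item.
item_tags = {
    "authenticity": "candid, unstaged",
    "composition": "off-center subject, strong leading lines",
    "subject_focus": "single human figure",
    "texture": "visible grain, no digital smoothing",
    "palette": "muted earth tones",
    "connection": "subject aware of the camera",
    "context": "wide enough to place the scene",
    "decisiveness": "caught mid-gesture",
}
```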
Three commands. No retraining.
Clone the repo, set your API key, run the bundled example to verify the pipeline end to end. The Python worker (~540 lines, one runtime dependency) reads a classifier file, generates an image, and scores it against the rules.
Edit the markdown, rerun, and behavior changes immediately.
Classifier
photography.md
Rules, weights, and defaults in one markdown file. The source of truth. Change a rule, behavior changes on the next run.
Generator
gpt-image-2 (default)
Creates the output. Swap for any prompt-driven model that fits your domain: image, text, audio, or video. Set in the classifier's defaults; override per run with env vars.
Evaluator
gpt-4o (default)
Scores the output against the rules. Swap for any model that matches your output modality: vision for images, text for prose, audio for music. Same override pattern.
The worker reads the classifier, synthesizes prompts using each rule's GENERATION block, calls the generator, scores the output with the evaluator, and retries with adjusted prompts when the combined score falls below the threshold.
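In outline, that loop looks something like the sketch below. This is a hand-written illustration of the flow just described, with placeholder callables (generate, evaluate) rather than the worker's actual API:

```python
def run_batch(prompt, rules, generate, evaluate,
              threshold=0.8, max_retries=2):
    """Illustrative generate/score/retry loop; not the repo's code."""
    # Fold every rule's GENERATION block into the prompt.
    full_prompt = prompt + "\n" + "\n".join(r.generation for r in rules)
    for _ in range(max_retries + 1):
        output = generate(full_prompt)  # call the generator model
        # Score against each rule's VALIDATION block (0..1 per rule).
        scores = {r.name: evaluate(output, r.validation) for r in rules}
        combined = sum(r.weight * scores[r.name] for r in rules)
        if combined >= threshold:
            break
        # Soft retry: lean harder on the weakest-scoring rule.
        weakest = min(rules, key=lambda r: scores[r.name])
        full_prompt += "\nEmphasize: " + weakest.generation
    return output, scores
```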
The worker is just Python. Wrap it in whatever surface fits the way you work.
Direct invocation. Run the worker with a prompt and a batch name; output lands in output/generation/ with the generated files plus a scores.json breakdown. Lowest friction. The right surface for one-off runs, prompt experiments, and iterating on rules between batches.
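Between batches you can inspect that breakdown programmatically. A quick sketch, assuming scores.json is a flat rule-to-score mapping directly under output/generation/ (the actual path and layout may differ per batch):

```python
import json
from pathlib import Path

# Print the per-rule breakdown, weakest rules first.
# Assumes a flat {"rule-name": score} layout for scores.json.
scores = json.loads(Path("output/generation/scores.json").read_text())
for rule, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{rule}: {score:.2f}")
```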
Open the repo in Claude Code and let it drive. It can read photography.md, run the worker via Bash, inspect scores.json, and propose rule edits when scores miss. The worker becomes a tool Claude calls; you stay in conversation while it iterates.
Wrap the worker as a skill. A SKILL.md definition points Hermes at the classifier and the script; calls flow through Hermes' tool layer like any other skill, and generated outputs come back into the agent's working set.
Any editor. VS Code, Cursor, JetBrains, anything that can run a Python file. The repo is plain files with one runtime dependency. No plugins, no service hooks, no build step.
Bring your own domain.
The repo ships intake scaffolds for five common domains. Pick one, answer the seven questions, analyze your portfolio, and synthesize your [domain].md. For non-image domains, write a thin worker against a domain-appropriate generator. The bundled image worker is a 540-line reference you can adapt.
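As one example of what "thin" means here: for a text domain, only the generator call changes; prompt synthesis, scoring, and retry stay as in the image worker. A sketch using the OpenAI Python client as the stand-in generator (the function name and model choice are arbitrary):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_text(prompt: str) -> str:
    # Text generator standing in for the bundled image generator.
    resp = client.chat.completions.create(
        model="gpt-4o",  # any prompt-driven text model works here
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

Pass a function like this wherever your worker expects its generator, and the rest of the pipeline is unchanged.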
Not fine-tuning.
No model training, no GPU rental. The runtime is the API call.
Not a hosted service.
Self-hosted, local-first.
Not a vendor SDK.
The example uses OpenAI; the framework works with any generator that takes a prompt.
Not a chat agent.
A generation + scoring pipeline. Wrap it in an agent if you want.
Read the thesis. Fork the repo.
Open source, MIT licensed. PRs welcome, especially new domain examples.