Style Is Explicit, Not Statistical
A creator can name what makes their work theirs. Encoding that into AI doesn't need fine-tuning. It needs a markdown file you can read, edit, and own.
Style is articulable, not statistical. A photographer can tell you why they chose a frame. A musician knows the rhythm they reach for. A writer can name what makes a phrase theirs. The thing creators want AI to capture, they already know how to describe.
So why does every attempt at getting AI to work in your style start with fine-tuning a model on a hundred examples? You don't need a model to discover what you can already name. You need a place to write it down.
Fine-tuning, in practice
The case for fine-tuning is that style is too tacit to articulate. You feel it; you can't name it. The model has to absorb it from enough examples.
Run that argument against the actual experience of fine-tuning, though, and it falls apart fast.
You're spending hours and hundreds of dollars to encode something into model weights you'll never inspect. If the result is wrong, you can't ask the model what it learned. You can only retrain. The model is locked to one platform; if a better one comes out next month, none of your work transfers. Your hundred images of taste-defining work just paid for a black box you don't own and can't edit.
That's a strange trade for a creator to accept. It only makes sense if style is genuinely opaque, even to the person making the work. And mostly, it isn't.
What happens when you sit down and write it out
Earlier this year I ran the experiment on myself. I worked through a seven-question intake on my own photography. Why I make images. How I shoot. What I look for. What technical choices matter. What I want viewers to feel. What my non-negotiables are. What patterns I notice in my own work.
The answers weren't surprises. They were the things I'd been saying to myself for years. Layers. People. Shadows. Warm tones. Eye contact. NYC specificity. Decisive moments.
Then I analyzed about a hundred of my own images against those answers. The patterns the analysis surfaced lined up with what I'd already named. Documentary honesty. Layered composition. Human-centered framing. A specific lens character. Defined shadow structure. Genuine connection. NYC context. Peak-moment timing.
Eight rules. Each one had a name, a definition, a way to evaluate it (does this image express the rule, scored zero to one), and a way to instruct it (what to put in the prompt to make a model produce work that follows it). All of it lived in a single markdown file.
That file did the job people are using fine-tuning to do.
What an explicit rule looks like
The shape of a rule is small.
A header with the rule's name and a weight: how much it counts in the combined score.
A short definition: what this rule actually means.
A VALIDATION block: the questions you'd ask of a generated image to score it against the rule, plus a 0-to-1 rubric.
A GENERATION block: the keywords, emphases, and structural notes that go into the prompt at generation time.
That's it. Eight of those, in one file, encoded a personal photography style thoroughly enough that any image model could be guided to produce work consistent with it.
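For concreteness, here is what a rule in that shape might look like on the page. The weight, the rubric, and the specific bullet points below are invented for illustration; they aren't lifted from my actual domain file.

```
## Layered Composition (weight: 0.15)

The frame reads in depth: foreground, midground, and background each carry
distinct visual information rather than collapsing into a flat plane.

VALIDATION
- Are there at least two clearly separated depth planes?
- Does the foreground element add meaning rather than clutter?
- Rubric: 0.0 if the frame is flat, 0.5 if depth exists but only one plane
  carries interest, 1.0 if every plane contributes.

GENERATION
- Keywords: layered composition, foreground framing, depth through the scene
- Emphasis: place the human subject in the midground; let foreground elements
  partially frame them.
```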
Nothing about that pattern is photography-specific. The same eight-dimension structure applies to writing, music, art, graphic design, anything where style is real but undocumented. The rule names change. The block format doesn't.
Runtime, not training
Once the rules exist, the rest is plumbing.
At generation time, the rules are synthesized into the prompt. The base prompt comes from the user. The rule blocks add the structural and stylistic guidance. The combined prompt goes to whichever generator is appropriate (image, text, audio).
The output is then scored against the same rules by a model with the right modality (vision for images, text for prose). Each rule produces a score. The weighted combination produces a single number. If it clears the threshold, ship. If not, the worker retries with the prompt adjusted toward the rules that scored low. A soft retry, not a regeneration from scratch.
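A minimal sketch of that loop, assuming the rules have already been parsed out of the markdown file. Here `generate` and `score` are placeholders for whatever model calls the domain needs, and the field names, default threshold, and retry count are illustrative rather than taken from the actual worker:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    weight: float     # how much this rule counts in the combined score
    generation: str   # prompt guidance from the GENERATION block
    validation: str   # scoring questions from the VALIDATION block

def synthesize_prompt(base_prompt: str, rules: list[Rule], emphasize: set[str]) -> str:
    """Combine the user's base prompt with each rule's generation guidance,
    repeating the guidance for rules that scored low on the previous attempt."""
    parts = [base_prompt] + [r.generation for r in rules]
    parts += [f"Give extra weight to: {r.generation}" for r in rules if r.name in emphasize]
    return "\n".join(parts)

def run(
    base_prompt: str,
    rules: list[Rule],
    generate: Callable[[str], bytes],       # image/text/audio model call
    score: Callable[[bytes, Rule], float],  # vision/text scorer, returns 0.0-1.0
    threshold: float = 0.8,
    max_retries: int = 3,
):
    emphasize: set[str] = set()
    output, scores = None, {}
    for _ in range(max_retries):
        prompt = synthesize_prompt(base_prompt, rules, emphasize)
        output = generate(prompt)
        scores = {r.name: score(output, r) for r in rules}
        combined = sum(scores[r.name] * r.weight for r in rules) / sum(r.weight for r in rules)
        if combined >= threshold:
            break
        # Soft retry: same base prompt, lean harder on the rules that scored low.
        emphasize = {name for name, s in scores.items() if s < threshold}
    return output, scores
```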
There's no training step. There's no model state to maintain. The runtime is the API call.
Every part of this is editable. If a rule is too strict, change a number. If a definition is wrong, rewrite it. If you want to swap the image model, change one line of config. If you want to use a different vision model to score, change another line. None of those edits cost a retraining cycle. They cost the time it takes to save a file.
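The edit cycle is correspondingly small. As a sketch, with hypothetical names rather than the framework's actual config keys, the whole knob set might be a handful of lines:

```python
# config.py (hypothetical names; the real config schema may differ)
GENERATOR_MODEL = "any-image-model"   # swap the image generator here
SCORER_MODEL = "any-vision-model"     # swap the scoring model here
SCORE_THRESHOLD = 0.8                 # loosen or tighten the acceptance bar
MAX_RETRIES = 3                       # how many soft retries before giving up
```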
Legibility, not cost
The cost story is real. A fine-tuning run is hundreds of dollars; an analysis run is a few. The time story is real. Hours versus minutes for a meaningful change. The portability story is real. Rules transfer across providers; weights don't.
But the part that actually matters for creative work is that the system is legible.
If a generated image doesn't feel right, the score breaks down by rule. You can see which dimension failed. You can read the rule that failed. You can read the prompt that was synthesized. You can read the output that was scored. Every step is a document.
Compare that to the experience of a fine-tuned model producing something off. There's nothing to read. You retrain.
Legibility compounds. Once your style is written down, you can edit it deliberately. You can show it to a collaborator and discuss it. You can fork it for a side project that has different priorities. You can hand it to a teammate. None of that is possible when your style lives inside model weights.
From one project to a framework
The photography work was the proof. After it shipped, it was clear that the real artifact was the structure, not the rules themselves.
I extracted it: an open-source framework for building this kind of classifier in any creative domain. Intake scaffolds for photography, art, graphic design, writing, and music. A reference Python worker, around 540 lines, that reads a domain markdown file, generates, scores, retries. The full pipeline that the personal experiment used, generalized so anyone can plug their own domain in.
That extraction is now Runtime Classifier. Free, MIT, on GitHub.
The point of the framework isn't that I want everyone making the same kind of work I do. The point is that the eight-dimension structure (composition, subject, texture, palette, connection, context, decisiveness, authenticity) translates across creative domains. The rule names change per domain. The structure stays.
A musician can answer the same seven intake questions, analyze their own catalog, and end up with eight rules that capture how they actually make music. A writer can do the same with their archive. The pipeline is identical; only the domain content changes.
Where this leaves fine-tuning
Fine-tuning will keep being the right tool for some things. Building world knowledge into a base model. Teaching a model a new modality. Training for tasks where the desired behavior really is statistical and hard to articulate. But that isn't why most creators reach for it. Most creators reach for it to encode a style they already know how to describe. They just haven't been given a place to write it down.
The place to write it down is a markdown file. The place to apply it is at runtime. The place to change it is wherever you keep your text editor open.
That's the whole argument.
If you want to see what a rule actually looks like, the worked example ships with Runtime Classifier. One file, eight rules, around 540 lines of Python that reads it. Fork it for your domain.