Most AI Content Workflows Are Built to Move Fast. Mine Is Built to Last.
How I developed a Systems Thinking Framework for content engineering, and why the sentence test is the most useful thing in it.
Here’s the problem with most AI content workflows: they’re optimized for the first use.
Someone finds a prompt that works. They save it in a doc. They share it with the team. Three weeks later, half the team is using a slightly different version, nobody remembers why the original worked, and the outputs have drifted so far from what you started with that you’re essentially starting over. The workflow didn’t fail because AI is unreliable. It failed because it was never really a system — it was a shortcut that got promoted to infrastructure before it was ready.
I’ve spent the past few months building AI tools for content production, and I’ve made this mistake enough times to recognize it early now. The temptation to jump straight to the tool and ask “What can I do in Claude? What can I build in AirOps?” is real, and giving in to it is almost always a mistake. Not because the tools are bad, but because you end up building solutions to the wrong problems, in the wrong tools, with no clear definition of what “good” actually looks like.
That frustration is what pushed me toward a more structured approach. Through a lot of iteration and conversations where Claude was my thinking partner, I developed what I now call the Systems Thinking Framework for Content Engineering. I use it before starting any new build. Here’s how it works, and why I think the most useful thing in it is a single sentence.
Stage 1: Start with the problem. Always.
The framework has two stages. Stage one is problem identification, and it’s the stage most people skip.
Before touching a tool, I ask four questions:
1. Is this a one-time task or a recurring pattern?
If someone on the team does it more than once, it’s a candidate for systemization. If it only ever happens once, it’s probably not worth building for.
2. Where does the manual work live?
Copy-paste. Reformatting. Switching between tools. Re-explaining context. These are the signals. Anywhere a human is acting as a bridge between two things that could be connected directly is a system waiting to be built.
3. Who else has this problem?
If it’s only you, it’s a personal workflow. If it’s the whole team, it’s an infrastructure problem worth prioritizing.
4. What does “good enough to hand off” look like?
A system isn’t useful if only you can operate it. The test is whether someone else could run it without asking you questions. If they’d need you, it’s still a process. Not a system.
That last question is the one that changes how you build. It forces you to think about documentation, repeatability, and edge cases before you’ve written a single prompt, which is exactly when you should be thinking about them.
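One way to make the four questions concrete is to write the answers down as a small record and triage from there. The sketch below is only an illustration: the field names, the order of the checks, and the wording of each recommendation are assumptions, not part of the framework itself.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemCheck:
    """Answers to the four stage-one questions (field names are hypothetical)."""
    recurring: bool                                        # Q1: one-time task or recurring pattern?
    manual_steps: list[str] = field(default_factory=list)  # Q2: copy-paste, reformatting, ...
    people_affected: int = 1                               # Q3: who else has this problem?
    handoff_defined: bool = False                          # Q4: is "good enough to hand off" defined?


def triage(check: ProblemCheck) -> str:
    """Return a rough recommendation; the thresholds are illustrative only."""
    if not check.recurring:
        return "one-off: probably not worth building for"
    if not check.manual_steps:
        return "no manual bridge identified yet"
    if check.people_affected <= 1:
        return "personal workflow"
    if not check.handoff_defined:
        return "still a process: define the handoff first"
    return "candidate system"
```

For example, `triage(ProblemCheck(True, ["re-explaining context"], 5, True))` returns `"candidate system"`, while the same problem affecting only one person comes back as `"personal workflow"`.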
The sentence test
At the end of stage one, before I move into building anything, I do a single check. I try to finish this sentence:
“This system takes [standardized input] and reliably produces [standardized output] without requiring [specific manual step] every time.”
If I can finish it clearly, I have a system worth building. If I can’t — if the blanks are vague, or if the manual step I’m trying to eliminate is actually doing something important — I’m still in problem identification mode. I don’t move on until I can fill it in.
It sounds simple, but it’s surprisingly hard. Most half-formed build ideas collapse at this sentence because they don’t have a standardized input yet, or the “output” is still fuzzy, or the manual step I’m removing is actually judgment in disguise. The sentence test doesn’t let you paper over those gaps.
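The sentence test can also be treated as a literal template with three required blanks. Here is a minimal sketch, assuming a crude word list is enough to catch an unfilled blank; the `VAGUE_WORDS` set and the function names are hypothetical, and the check is deliberately simple.

```python
import re

TEMPLATE = (
    "This system takes {input} and reliably produces {output} "
    "without requiring {manual_step} every time."
)

# Hypothetical signals that a blank has not really been filled in.
VAGUE_WORDS = {"tbd", "something", "stuff", "etc", "various", "somehow"}


def _is_vague(value: str) -> bool:
    """A blank is vague if it is empty or contains a placeholder word."""
    words = set(re.findall(r"[a-z]+", value.lower()))
    return not words or bool(words & VAGUE_WORDS)


def sentence_test(std_input: str, std_output: str, manual_step: str) -> str:
    """Return the finished sentence, or raise if any blank is still vague."""
    blanks = {"input": std_input, "output": std_output, "manual_step": manual_step}
    vague = [name for name, value in blanks.items() if _is_vague(value)]
    if vague:
        raise ValueError("still in problem identification; vague blanks: " + ", ".join(vague))
    return TEMPLATE.format(**blanks)
```

A check like this only catches empty or hand-wavy blanks. It cannot tell whether the manual step being removed is judgment in disguise; that part of the test still needs a person.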
Stage 2: Build in layers
Once the problem is clearly defined, stage two is building the solution — in four layers, in order.
The layer order matters. Most people start at layer four: they pick a tool and build around it. The framework inverts that. You don’t touch a tool until you know exactly what input it’s receiving, what transformation it needs to perform, and what output it’s expected to produce. By the time you get to tool selection, the build is mostly already designed.
What it looks like in practice
The most recent build I’ve mapped with this framework is a Client Agent in AirOps. It’s a modular AI content production tool designed to generate on-brand, accurate content for a specific client without requiring constant human re-explanation of context.
Stage one was quick. The manual work was obvious: every time a writer started a new piece, they had to re-load the client’s brand voice, re-explain the product details, and reconstruct the quality bar from scratch. It was recurring, it was widespread across the team, and the handoff test was clear: the system needed to run without the writer holding the client context in their head.
Stage two is where it got interesting. The solution ended up with three context layers, and the distinction between them is the part I’m most glad I thought through before I started building:
Core context holds the things that rarely change: the ICP definition, the style guide, voice and tone guidance, a banned words list, and approved content examples. This layer is maintained manually and only updated when the client makes a significant brand shift.
Product accuracy context is stored separately from the core context on purpose. Product details need to be treated as ground truth the system references, not style guidance that influences tone. Mixing them creates a system that sounds right but says the wrong things. Keeping them separate keeps the outputs both on-brand and accurate.
Learned context starts empty. It gets populated over time as real production runs surface corrections and patterns that weren’t in the original build. But—and this is the part that prevents drift—nothing moves into learned context without going through a validation step first. A correction happens, gets flagged for review, and only gets promoted to permanent context after a human decides it’s a genuine pattern worth keeping.
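The separation between the three layers, and the review gate in front of learned context, can be sketched as a small data structure. This is an illustration under stated assumptions, not how AirOps stores or assembles context: the class names, method names, and section titles are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Correction:
    note: str
    approved: bool = False  # flipped to True only by a human reviewer


@dataclass
class ClientContext:
    core: dict[str, str] = field(default_factory=dict)           # style guide, voice, banned words
    product_facts: dict[str, str] = field(default_factory=dict)  # ground truth, kept separate
    learned: list[str] = field(default_factory=list)             # starts empty
    review_queue: list[Correction] = field(default_factory=list)

    def flag(self, note: str) -> Correction:
        """Record a correction from a production run; it is not learned context yet."""
        correction = Correction(note)
        self.review_queue.append(correction)
        return correction

    def promote_approved(self) -> int:
        """Move human-approved corrections into learned context; keep the rest queued."""
        approved = [c for c in self.review_queue if c.approved]
        self.learned.extend(c.note for c in approved)
        self.review_queue = [c for c in self.review_queue if not c.approved]
        return len(approved)

    def render(self) -> str:
        """Assemble the layers as separately labelled sections, skipping empty ones."""
        sections = {
            "STYLE GUIDANCE": [f"{k}: {v}" for k, v in self.core.items()],
            "PRODUCT FACTS (ground truth)": [f"{k}: {v}" for k, v in self.product_facts.items()],
            "LEARNED PATTERNS": self.learned,
        }
        return "\n\n".join(
            title + "\n" + "\n".join(lines) for title, lines in sections.items() if lines
        )
```

The design choice worth noticing is that `flag` never writes to `learned` directly. A correction sits in the queue until a reviewer sets `approved`, so an unreviewed fix cannot become permanent context on its own.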
The sentence test for the Client Agent: “This system takes a content brief and reliably produces a first draft that matches the client’s brand voice and product accuracy standards without requiring the writer to re-explain client context every time.” Clear input. Clear output. Clear manual step removed. Build approved.
The point isn’t to automate everything
The framework didn’t come from a desire to remove humans from content production. It came from a specific frustration: watching good AI tools get used badly because nobody had thought clearly about what problem they were actually solving.
The sentence test isn’t a filter for whether to use AI. It’s a filter for whether you have a real system or just a fast shortcut dressed up as one. There’s a difference. The shortcut works until it doesn’t. The system keeps working after you’ve stopped paying attention to it.
That’s the only kind worth building.
I’m continuing to develop and document this framework as I build. If you’re working through similar problems in your content workflows or have questions about any of the layers, I’d love to hear what you’re up to.
