Most people treat prompts like search queries — type something in, hope for the best, and re-roll until the output looks decent. That's not engineering. That's gambling.
At MavenX, we've shipped dozens of AI-powered products. Every one of them depends on prompts that perform reliably, at scale, across thousands of calls. We can't afford inconsistency. Building that muscle is what produced the framework below, which consistently delivers 10x better results.
Here's the playbook.
The Problem With How Most People Prompt
The average prompt is vague, unstructured, and gives the model way too much freedom. You type something like "Write me a blog post about AI" and wonder why the output reads like a generic LinkedIn thought piece from 2022.
The issue isn't the model — it's the instruction. LLMs are incredibly capable, but they need precision. They need context. They need constraints. And most importantly, they need a clear picture of what "good" looks like.
The MavenX Prompt Architecture
Every prompt we write at MavenX follows a five-layer architecture. We call it RCCOE (Role, Context, Constraints, Output format, Examples), and it works across every model we've tested (GPT-4, Claude, Gemini, open-source).
1. Role
Tell the model who it is. Not just "you are a helpful assistant" — give it a specific persona with relevant expertise. The more specific the role, the more focused the output.
You are a senior content strategist with 10 years of experience in B2B SaaS marketing. You specialize in long-form SEO content that drives organic traffic.
2. Context
Provide the background information the model needs to do its job. This includes your audience, your goals, relevant data, and any constraints on the subject matter. Don't assume the model knows your situation — spell it out.
3. Constraints
This is where most prompts fall apart. Constraints are the guardrails that prevent generic output. They include:
- Tone and voice — "Write in a direct, no-fluff tone. No buzzwords."
- Length — "Keep the response under 300 words."
- What to avoid — "Do not include generic AI hype or vague platitudes."
- Specificity requirements — "Include at least 3 concrete examples."
4. Output Format
Tell the model exactly how you want the output structured. Bullet points? Numbered list? JSON? Markdown with headers? The clearer your format specification, the less post-processing you'll need.
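One low-friction way to enforce structure is to spell out the exact JSON shape in the prompt, then validate the reply before anything downstream touches it. A minimal sketch, assuming a JSON output format; the field names and spec wording are illustrative, not from any MavenX template:

```python
import json

# Explicit format spec pasted into the prompt. The clearer this is,
# the less post-processing the reply needs.
FORMAT_SPEC = """Return ONLY valid JSON with exactly these keys:
{
  "title": string,
  "summary": string (max 2 sentences),
  "tags": array of 3-5 lowercase strings
}"""

def parse_reply(reply: str) -> dict:
    """Parse the model's reply, failing loudly if the format was ignored."""
    data = json.loads(reply)  # raises ValueError on non-JSON output
    missing = {"title", "summary", "tags"} - data.keys()
    if missing:
        raise ValueError(f"model omitted keys: {missing}")
    return data

# A well-formed reply passes validation:
reply = '{"title": "Hello", "summary": "Hi.", "tags": ["a", "b", "c"]}'
print(parse_reply(reply)["title"])  # prints "Hello"
```

Failing loudly here matters: a reply that ignored the format spec is a prompt problem, and you want to catch it in the refinement loop, not in production.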
5. Examples
This is the secret weapon. Providing 1–3 examples of what "good" looks like gives the model a concrete target. Few-shot prompting consistently outperforms zero-shot, especially for tasks where quality and style matter.
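The five layers compose naturally into a single prompt string. A hedged sketch of how the assembly might look; the function and argument names are ours for illustration, not an official MavenX API:

```python
def build_prompt(role, context, constraints, output_format, examples):
    """Assemble the five RCCOE layers, in framework order:
    Role, Context, Constraints, Output format, Examples."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    example_blocks = "\n\n".join(
        f"Example {i + 1}:\n{ex}" for i, ex in enumerate(examples)
    )
    return (
        f"{role}\n\n"
        f"Context:\n{context}\n\n"
        f"Constraints:\n{constraint_lines}\n\n"
        f"Output format:\n{output_format}\n\n"
        f"{example_blocks}"
    )

prompt = build_prompt(
    role="You are a senior content strategist with 10 years in B2B SaaS.",
    context="Audience: B2B SaaS founders evaluating AI tooling.",
    constraints=["Direct, no-fluff tone", "Under 300 words"],
    output_format="Markdown with H2 headers",
    examples=["<a past output you rated 5/5>"],
)
```

Keeping the layers as separate arguments is the point: when the refinement loop flags a failure mode, you adjust exactly one layer instead of rewriting the whole prompt.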
Pro Tip
We keep a shared library of example outputs organized by use case. When building a new prompt, we pull from this library rather than writing examples from scratch. This saves time and ensures consistency across our products.
The Iterative Refinement Loop
No prompt is perfect on the first try. The real skill isn't writing the initial prompt — it's knowing how to refine it. We use a three-step refinement loop:
- Generate — Run the prompt and evaluate the output against your quality criteria.
- Diagnose — Identify what's wrong. Is it too generic? Missing nuance? Wrong tone? Each failure mode has a specific fix.
- Adjust — Modify the prompt layer that's causing the issue. If the output is too generic, tighten your constraints. If it's off-topic, improve your context. If the format is wrong, be more explicit about structure.
We typically reach a production-quality prompt within 3–5 iterations. The key is being systematic about it rather than randomly tweaking words.
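In code, the loop is just generate, score, adjust until the output clears your quality bar. A sketch under stated assumptions: `generate`, `score`, and `adjust` are hypothetical callables you supply (your model call, your rubric, your layer-specific fix), and the 5-iteration budget mirrors the typical range above:

```python
def refine(prompt, generate, score, adjust, threshold=4.0, max_iters=5):
    """Run the generate -> diagnose -> adjust loop until the output's
    quality rating clears the threshold or the iteration budget runs out."""
    for _ in range(max_iters):
        output = generate(prompt)
        rating, diagnosis = score(output)   # e.g. (3.5, "too generic")
        if rating >= threshold:
            return prompt, output
        prompt = adjust(prompt, diagnosis)  # tighten only the failing layer
    return prompt, output
```

The `diagnosis` string is what keeps this systematic rather than random tweaking: it names the failure mode, and `adjust` maps that failure mode to the one RCCOE layer that fixes it.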
Model-Specific Adjustments
Not all models respond the same way to the same prompt. Here's what we've learned:
- GPT-4 — Responds well to detailed system prompts and follows complex multi-step instructions reliably. Tends to be verbose; use length constraints aggressively.
- Claude — Excels at nuanced, thoughtful outputs. Better at following the spirit of an instruction rather than just the letter. Great for tasks requiring judgment and careful reasoning.
- Gemini — Strong at multimodal tasks and data analysis. Needs more explicit formatting instructions than GPT-4 or Claude.
- Open-source models — Generally need simpler prompt structures. Break complex tasks into smaller steps. Few-shot examples become even more important.
The Quality Scoring Rubric
We don't just eyeball our outputs — we score them. Every prompt we put into production gets evaluated against four criteria:
- Accuracy — Is the information correct and relevant?
- Consistency — Does it produce similar quality across multiple runs?
- Tone — Does it match the intended voice and audience?
- Usefulness — Can the output be used as-is, or does it need heavy editing?
Each criterion gets scored 1–5. A prompt needs to average 4+ across all criteria before it goes into production. Anything below that goes back through the refinement loop.
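The production gate is a simple average over the four criteria. A minimal sketch; the criterion names come from the rubric above, while the data structure and function name are our own:

```python
CRITERIA = ("accuracy", "consistency", "tone", "usefulness")

def production_ready(scores: dict) -> bool:
    """Each criterion is scored 1-5; the prompt ships only if the
    average across all four criteria is 4.0 or higher."""
    if set(scores) != set(CRITERIA):
        raise ValueError("score all four criteria")
    return sum(scores.values()) / len(CRITERIA) >= 4.0

# Average is exactly 4.0, so this one ships:
print(production_ready(
    {"accuracy": 5, "consistency": 4, "tone": 4, "usefulness": 3}
))
```

Scoring across multiple runs (the Consistency criterion) is the part most teams skip; a prompt that averages 4+ on one lucky generation is exactly the kind that fails at scale.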
Start Using This Today
You don't need to overhaul your entire workflow. Start with one change: next time you write a prompt, structure it using the RCCOE framework. Role, Context, Constraints, Output format, Examples. That single change will dramatically improve your results.
And if you want the complete system — including 40+ templates, the full scoring rubric, and model-specific optimization guides — check out our LLM Prompting Masterclass SOP.