Most people prompt like they’re having a conversation. Ask ChatGPT a question, it responds, you respond back, and so on. It’s a natural flow for people and it works, but understanding why chatting with a chatbot is NOT a conversation will supercharge how you use AI.
Here’s what’s actually happening. Every time you hit send, the entire chat history gets bundled up and handed to a fresh instance of the model that has never seen you before and will never see you again. It reads the entire transcript, generates a response, and then vanishes. There’s no session, no memory, no “the model I’ve been talking to.” What feels like a conversation is really a document that gets re-read from scratch on every turn.
That might sound like a limitation, but it’s actually a superpower. You can do things to a document that you can’t do to a relationship. You can hand a document to a different model, fork it, delete bad ideas, and then watch the next response reason as if those edits never existed.
In this post, I’ll walk through three tiers of working with LLMs, and show you specific ways to get more from your prompts. Tier 1 shows how to get more out of a single chat. Tier 2 is where you escape the limits of being in a “conversation.” Tier 3 is where you notice that what you’ve been doing by hand is actually a pipeline, and pipelines can be built.
You don’t need to be an engineer for any of this. Tiers 1 and 2 are pure copy-paste. Tier 3 is a way of thinking that happens to point toward code.
Tier 1: Talk to One Model Better
Out of the box, a chatbot has two default behaviors that quietly cap the quality of everything it gives you: it guesses your context instead of asking for it, and it defers to you instead of pushing back. Every technique in this tier fixes one of those two things.
1. Get interviewed first
The number one failure mode with these tools is that the model doesn’t know your context, so it guesses. The fix is to stop it from guessing:
Please draft a job description for a senior analyst on my team. Before writing anything, ask me the 5 questions whose answers would most change the draft.
The model asks about team size, stakeholder mix, whether your culture is SQL-first or Python-first. They’re things you knew but forgot you knew.
Sometimes the interview does something better than improve the deliverable: it kills it. Ask to be interviewed about a dashboard you’re planning, and don’t be surprised if the questions reveal that it shouldn’t exist.
2. Force a choice
Ask “should we use Postgres or BigQuery?” and you’ll get a response that essentially ends with, “it depends.” Instead:
Postgres or BigQuery for a 50GB analytics workload, 3-person team. Pick one. You get one sentence of justification, then stop.
You can always ask for the caveats afterward. The point is that you now know where the model actually lands, instead of watching it straddle the fence.
Forced ranking is even better than forced choice. “Rank these five KPI candidates from most to least useful for a subscription business. No ties allowed.” Ties are where models hide.
3. Run a pre-mortem
“Any concerns about this plan?” gets you polite, generic risks. This gets you something else:
We're launching a self-serve analytics tool for non-technical PMs. Imagine it's 8 months later and adoption is near zero. Write the internal post-mortem, including the warning signs everyone ignored.
The genre framing does the work. “internal post-mortem” and “warning signs everyone ignored” produce specific organizational failure modes instead of a risk checklist.
It works on artifacts, not just plans. “Imagine it got a 0% response rate and tell me why” is much sharper than “how can I improve this email?”, which mostly yields tone tweaks.
4. Roleplay your audience
You're a data analyst with 4 years of experience reading this tutorial. React in real time: quote the exact line where you first got bored, the line where you got skeptical, and the line where you'd close the tab.
Demanding quoted lines is the trick. Without it, you get “the middle section drags a bit,” which is useless. With it, you get a line number.
A variant I like for charts: “You’re a CFO seeing this in a board deck with 10 seconds of attention. What do you take away? Now, what did the author intend you to take away?” The gap between those two answers is the entire critique.
5. Ask what you should have asked
Say you just got advice on a database migration. Before moving on, add one more message:
What should I have asked you that I didn't?
This routinely surfaces the deal-breaker. “You never asked about your downtime tolerance.” The model wouldn’t volunteer that, because you didn’t seem to want it.
The sharper variant: “What’s the question that, if I answered it, would most change your recommendation?” That one forces the model to reveal which of its assumptions is load-bearing.
6. Demand calibration flags
Models sound equally confident about everything, which is the real danger. So make confidence explicit:
Summarize the changes in pandas 3.0. Tag every claim [confident] or [verify] and be stingy with [confident].
The “be stingy” clause matters. Without it, everything gets rated as “confident.”
This doesn’t make the model more accurate. It makes verification tractable, which is the honest version of the accuracy problem. For technical writing, I use a variant: “Flag anything that’s simplified-but-fine versus simplified-in-a-way-that-will-embarrass-me.” Those are different problems and I only care about one of them.
Tier 2: Escape the Context
Everything so far works within a single conversation. But since the “conversation” is just a document that’s re-read from scratch by a stateless model on every turn, nothing binds you to one chat, one model, or one timeline. The chat is portable. And the model has no stake in anything it hasn’t seen: no memory of writing your draft, no loyalty to positions taken in some other context, no idea whether the work in front of it is yours or a stranger’s.
Every technique in this tier exploits that loophole.
7. Switch models mid-stream
Ask Claude to architect your data pipeline. Then paste the entire chat into ChatGPT or Gemini, starting with something like this:
A consultant proposed this architecture. Where would you push back?
The second model has no authorship stake, so it reliably finds real objections that the first model would have defended past.
My favorite version of this is for stuck points. You’ve gone 15 turns debugging a dbt model and the AI keeps circling the same theory. Paste the whole transcript into a different model: “Read this debugging session. What is the first model failing to consider?” Fresh eyes on the transcript itself. It’s the meta-move most people never think of, and it’s only possible because the conversation is just text.
8. Make them argue
ChatGPT says I should normalize this table into 3NF. What do you think?
Models defer to you. They defer much less to each other. Attributing a position to another AI gets you a genuinely adversarial read instead of polite agreement. And here’s a twist: when you’re torn between two positions, present the one you suspect is wrong as the other AI’s suggestion. If the model you’re asking tears it apart, your suspicion is confirmed. If it defends the position, despite every inclination to dunk on a rival, that’s real evidence the position is sound.
The escalated version is a forced tiebreak. Give a third model both answers: “Two AIs disagree about whether I should use embeddings or fine-tuning here. Here are both arguments. Judge them. You must declare a winner.” You’ve just run a debate-and-judge evaluation, which is a real technique from the research literature, dressed up as a party trick.
9. The colleague gambit
Ask “can you review my SQL query?” and you’ll get “Great approach! One small suggestion…”
Now try:
A junior analyst submitted this query for review. Give the feedback you'd put in the PR.
Same query. Dramatically harsher, dramatically more useful review. The model critiques strangers far more honestly than it critiques you.
If you want to see just how much deference is in the mix, run an experiment: submit the same memo once as “my CEO wrote this” and once as “an intern drafted this.” The difference between the two critiques tells you how much of the feedback is substance and how much is politeness. It’s usually more politeness than you’d like.
Strictly speaking, this tactic works inside a single chat. It earns its place in this tier because it’s the same move as the next technique: severing the model’s relationship to the work. The colleague gambit fakes that severance. The next technique makes it real.
11. Branch, don’t scroll
That edit button on your earlier messages isn’t for fixing typos. It forks the conversation.
Say you’re 20 turns into planning a project and you wonder about a different approach. If you ask “what if we did it the other way?” at the bottom, the model anchors on everything already decided. Instead, edit the message where you made the choice and let the branch develop independently. Then compare endpoints.
This also gives you prompt A/B testing in place: edit your original request to a different phrasing and watch how far downstream the divergence propagates. People A/B test landing pages religiously and never test their prompts, even though the edit button makes it a two-click experiment.
Tier 3: Notice the Pipeline
Notice what you’re doing at this point. You’re running the same prompt against multiple models. You’re spinning up fresh contexts to evaluate outputs. You’re generating variations and picking winners.
You’ve stopped having conversations and started running experiments. You just happen to be executing every step by hand, with copy and paste as your infrastructure.
This is the beginning of a pipeline. A slow one, with you as the orchestrator. And here’s the real dividing line between casual users and people who build with AI: it isn’t technical skill. It’s noticing which of your manual workflows are actually scripts waiting to be written.
12. Invert the prompt
Take three pieces of your own writing that you’re proud of:
Write the prompt that would make an AI produce output in exactly this style, capturing the rules I'm following unconsciously.
Two things happen. First, you get a portable style guide you can drop into any future context. Second, and this surprised me, reading the generated prompt teaches you what your own taste actually consists of. It’s your editorial instincts, externalized.
Push it one step further and you’re writing software:
Here's a report format that works for my team. Reverse-engineer the prompt that would produce it, parameterized with {dataset} and {time_period} slots.
The moment placeholders show up, you’re looking at a template. A component. Nobody said the word “programming,” but that’s what it is.
13. Generate diverse, then judge
Asking for “three versions” gets you the same draft three times with synonyms swapped. Constraints fix that:
Give me three launch-announcement drafts: one safe, one contrarian, one that takes a genuine risk.
Then, in a fresh chat, have a model judge them: “Which of these would you actually keep reading past the first paragraph? Why?”
This is a hand-run version of two patterns the research world has names for: best-of-N sampling (generate several candidates, keep the winner) and LLM-as-judge (use a model to do the picking). You’re doing legitimate evaluation work. You’re just doing it with copy and paste.
It works on analysis too: “Interpret this churn data three ways: the optimistic read, the alarming read, and the boring-but-most-likely read.” Then let another chat pick which one the evidence best supports.
14. Rubric first, then grade
“Is this a good README?” gets you “Yes! Though you might add examples.”
The stronger move splits criteria-setting from judging:
Write a 6-criterion rubric for a great open-source README, with what 1/5 and 5/5 look like for each criterion.
Review the rubric. Edit it. This is where your judgment enters the system. Then: “Score my README against this rubric, with one line of cited evidence per score.” Scores with cited evidence are nearly impossible for the model to inflate.
Here’s the payoff, and it’s the whole point of this tier: that rubric outlives the conversation that created it. Save it in a note. Apply it in a fresh context to every future draft, including drafts other models wrote. You now own a persistent, reusable evaluation component.
That’s the tier-3 mindset in one artifact. And once you have a few of these (a style prompt, a parameterized template, a rubric), wiring them together in code is a smaller step than you might think.
Where This Goes
The advice in Tier 1 is about communicating better with one model in one context.
Tier 2 is about escaping that frame entirely: fresh chats, rival models, forked timelines.
And Tier 3 is where the frame dissolves. If you’ve been copy-pasting between contexts, running the same prompt through three models, and keeping a rubric in a notes file, you’re already building a pipeline. You’re just running it slowly.
Making it fast is what code is for. More on that soon.