The Conversation Isn’t Real

Most people prompt like they’re having a conversation. Ask ChatGPT a question, it responds, you respond back, and so on. It’s a natural flow for people and it works, but understanding why chatting with a chatbot is NOT a conversation will supercharge how you use AI.

Here’s what’s actually happening. Every time you hit send, the entire chat history gets bundled up and handed to a fresh instance of the model that has never seen you before and will never see you again. It reads the entire transcript, generates a response, and then vanishes. There’s no session, no memory, no “the model I’ve been talking to.” What feels like a conversation is really a document that gets re-read from scratch on every turn.

That might sound like a limitation, but it’s actually a superpower. You can do things to a document that you can’t do to a relationship. You can hand a document to a different model, fork it, delete bad ideas, and then watch the next response reason as if those edits never existed.

In this post, I’ll walk through three tiers of working with LLMs, and show you specific ways to get more from your prompts. Tier 1 shows how to get more out of a single chat. Tier 2 is where you escape the limits of being in a “conversation.” Tier 3 is where you notice that what you’ve been doing by hand in Tier 2 is actually a pipeline, and pipelines can be built.

You don’t need to be an engineer for any of this. Tiers 1 and 2 are pure copy-paste. Tier 3 is a way of thinking that happens to point toward code.

Tier 1: Talk to One Model Better

Out of the box, a chatbot has two default behaviors that quietly cap the quality of everything it gives you: it guesses your context instead of asking for it, and it defers to you instead of pushing back. Every technique in this tier fixes one of those two things.

1. Get interviewed first

The number one failure mode with these tools is that the model doesn’t know your context, so it guesses. The fix is to stop it from guessing:

Please draft a job description for a senior analyst on my team. Before writing anything, ask me the 5 questions whose answers would most change the draft.

The model asks about team size, stakeholder mix, whether your culture is SQL-first or Python-first. They’re things you knew but forgot you knew.

Sometimes the interview does something better than improve the deliverable: it kills it. Ask to be interviewed about a dashboard you’re planning, and don’t be surprised if the questions reveal that it shouldn’t even exist.

2. Force a choice

Ask “should we use Postgres or BigQuery?” and you’ll get a response that essentially ends with, “it depends.” Instead, do something like this:

Postgres or BigQuery for a 50GB analytics workload, 3-person team. Pick one. You get one sentence of justification, then stop.

You can always ask for the caveats afterward. The point is that you now know where the model actually lands, instead of watching it straddle the fence.

Forced ranking is even better than forced choice. “Rank these five KPI candidates from most to least useful for a subscription business. No ties allowed.” Ties are where models hide.

3. Run a pre-mortem

“Any concerns about this plan?” gets you polite, generic risks. This gets you something else:

We're launching a self-serve analytics tool for non-technical PMs. Imagine it's 8 months later and adoption is near zero. Write the internal post-mortem, including the warning signs everyone ignored.

The framing does the work. Phrases like “internal post-mortem” and “warning signs everyone ignored” produce specific organizational failure modes instead of a risk checklist.

It works on artifacts too, not just plans. Take a marketing email you’re about to send: “imagine it got a 0% response rate and tell me why” is much sharper than “how can I improve this email?”, which will mostly get you tone tweaks.

4. Roleplay your audience

You're a data analyst with 4 years of experience reading this tutorial. React in real time: quote the exact line where you first got bored, the line where you got skeptical, and the line where you'd close the tab.

Demanding quoted lines is the trick. Without it, you get “the middle section drags a bit,” which is useless. With it, you get a line number.

A variant I like for charts: “You’re a CFO seeing this in a board deck with 10 seconds of attention. What do you take away? Now, what did the author intend you to take away?” The gap between those two answers is the entire critique.

5. Ask what you should have asked

Say you just got advice about a database migration. Before moving on, add one more message:

What should I have asked you that I didn't?

This often surfaces the deal-breaker. “You never asked about your downtime tolerance.” The model wouldn’t volunteer that, because you didn’t seem to want it.

This one’s even better: “What’s the question that, if I answered it, would most change your recommendation?” That one forces the model to reveal which of its assumptions is load-bearing.

6. Demand calibration flags

Models sound equally confident about everything, which is a real danger. So make confidence explicit:

Summarize the changes in pandas 3.0. Tag every claim [confident] or [verify] and be stingy with [confident].

The “be stingy” clause matters. Without it, everything gets rated as “confident.”

This doesn’t make the model more accurate. It makes verification tractable, which is the honest version of the accuracy problem. For technical writing, I use a variant: “Flag anything that’s simplified-but-fine versus simplified-in-a-way-that-will-embarrass-me.” Those are different problems and I only care about one of them.

Tier 2: Escape the Context

Everything so far works within a single chat. But since the “conversation” is just a document that’s re-read from scratch by a stateless model on every turn, nothing binds you to one chat, one model, or one timeline. The chat is portable. And the model has no stake in anything it hasn’t seen: no memory of writing your draft, no loyalty to positions taken in some other chat, no idea whether the work in front of it is yours or a stranger’s.

Every technique in this tier exploits that loophole.

7. Switch models mid-stream

Ask Claude to architect your data pipeline. Then paste the entire chat into ChatGPT or Gemini, starting with something like this:

A consultant proposed this architecture. Where would you push back?

The second model has no authorship stake, so it reliably finds real objections that the first model would have defended past.

My favorite version of this is for stuck points. You’ve gone 15 turns debugging a SQL script and the AI keeps circling the same theory. Paste the whole transcript into a different model: “Read this debugging session. What is the first model failing to consider?” Fresh eyes on the transcript itself. It’s the meta-move most people never think of, and it’s only possible because the conversation is just text.

8. Make them argue

ChatGPT says I should normalize this table into 3NF. What do you think?

Models defer to you. They defer much less to each other. Attributing a position to another AI gets you a genuinely adversarial read instead of polite agreement. And here’s a twist: when you’re torn between two positions, present the one you suspect is wrong as the other AI’s suggestion. If the model you’re asking tears it apart, your suspicion is confirmed. If it defends the position, despite every inclination to dunk on a rival, that’s real evidence the position is sound.

The escalated version is a forced tiebreak. Give a third model both answers: “Two AIs disagree about whether I should use embeddings or fine-tuning here. Here are both arguments. Judge them. You must declare a winner.” You’ve just run a debate-and-judge evaluation, which is a real technique from the research literature, dressed up as a party trick.

9. The colleague gambit

Ask “can you review my SQL query?” and you’ll get “Great approach! One small suggestion…”

Now try:

A junior analyst submitted this query for review. Give the feedback you'd put in the PR.

Same query. Dramatically harsher, dramatically more useful review. The model critiques strangers far more honestly than it critiques you.

If you want to see just how much deference is in the mix, run an experiment: submit the same memo once as “my CEO wrote this” and once as “an intern drafted this.” The difference between the two critiques tells you how much of the feedback is substance and how much is politeness. It’s usually more politeness than you’d like.

Strictly speaking, this tactic works inside a single chat. It earns its place in this tier because it’s the same move as the next technique: severing the model’s relationship to the work. The colleague gambit fakes that severance. The next technique makes it real.

10. Never let the author grade its own homework

The chat that wrote your blog post will defend every paragraph of it. So don’t ask that chat. Open a fresh one:

I found this post online. Is it worth my time? Be honest about where it loses you.

The cold chat tells you the intro is 200 words too long. The warm one never would.

Same principle for fact-checking. After a long research chat, take only the final claims, stripped of the reasoning that produced them, into a fresh chat and ask for independent verification. In the original chat, errors survive because their justification is sitting right there, looking plausible. Sever the claims from the reasoning and they have to stand on their own.

11. Branch, don’t scroll

That edit button on your earlier messages isn’t for fixing typos. It forks the chat.

Say you’re 20 turns into planning a project and you wonder about a different approach. If you ask “what if we did it the other way?” at the bottom, the model anchors on everything already decided. Instead, edit the message where you made the choice and let the branch develop independently. Then compare endpoints.

This also gives you prompt A/B testing in place: edit your original request to a different phrasing and watch how far downstream the divergence propagates. People A/B test landing pages religiously and never test their prompts, even though the edit button makes it a two-click experiment.

Tier 3: Notice the Pipeline

At this point, you’re running the same prompt against multiple models. You’re spinning up fresh chats to evaluate outputs. You’re generating variations and picking winners.

You’ve stopped having conversations and started running experiments. You just happen to be executing every step by hand, with copy and paste as your infrastructure.

This is the beginning of a pipeline. It’s a slow one, with you as the orchestrator, but the real dividing line between casual users and people who build with AI isn’t technical skill. It’s noticing which of your manual workflows are actually scripts waiting to be written.

12. Invert the prompt

Take three pieces of your own writing that you’re proud of:

Write the prompt that would make an AI produce output in exactly this style, capturing the rules I'm following unconsciously.

Two things happen. First, you get a portable style guide you can drop into any future chat. Second, and this surprised me, reading the generated prompt teaches you what your own taste actually consists of. It’s your editorial instincts, externalized.

Push it one step further and you’re writing software:

Here's a report format that works for my team. Reverse-engineer the prompt that would produce it, parameterized with {dataset} and {time_period} slots.

The moment placeholders show up, you’re looking at a template. A component. Nobody said the word “programming,” but that’s what it is.

13. Generate diverse, then judge

Asking for “three versions” gets you the same draft three times with synonyms swapped. Constraints fix that:

Give me three launch-announcement drafts: one safe, one contrarian, one that takes a genuine risk.

Then, in a fresh chat, have a model judge them: “Which of these would you actually keep reading past the first paragraph? Why?”

This is a hand-run version of two patterns the research world has names for: best-of-N sampling (generate several candidates, keep the winner) and LLM-as-judge (use a model to do the picking). You’re doing legitimate evaluation work. You’re just doing it with copy and paste.

It works on analysis too: “Interpret this churn data three ways: the optimistic read, the alarming read, and the boring-but-most-likely read.” Then let another chat pick which one the evidence best supports.

14. Rubric first, then grade

“Is this a good README?” gets you “Yes! Though you might add examples.”

The stronger move splits criteria-setting from judging:

Write a 6-criterion rubric for a great open-source README, with what 1/5 and 5/5 look like for each criterion.

Review the rubric. Edit it. This is where your judgment enters the system. Then: “Score my README against this rubric, with one line of cited evidence per score.” Scores with cited evidence are nearly impossible for the model to inflate.

Here’s the payoff, and it’s the whole point of this tier: that rubric outlives the chat that created it. Save it in a note. Apply it in a fresh chat to every future draft, including drafts other models wrote. You now own a persistent, reusable evaluation component.

That’s the tier-3 mindset in one artifact. And once you have a few of these (a style prompt, a parameterized template, a rubric), wiring them together in code is a smaller step than you might think.

Where This Goes

The advice in Tier 1 is about getting more out of a single chat.

Tier 2 is about escaping that frame entirely: fresh chats, rival models, forked timelines.

And Tier 3 is where the frame dissolves. If you’ve been copy-pasting between chats, running the same prompt through three models, and keeping a rubric in a notes file, you’re already building a pipeline. You’re just running it slowly.

Making it fast and automated is what code is for. More on that soon.