Mentor Mode: How Nine Lines of Markdown Taught Me More About Leadership Than a Decade of Building Companies
Someone I deeply respect once told me something that took years to fully land. He said: surround yourself with people who have different opinions than yours but are intellectually at your level, because your worst enemy is the room that agrees with you.
I nodded at the time, the way you nod at advice that sounds wise but hasn't hit your nervous system yet. I thought I understood it. I didn't. Not until I started building with AI agents and realized that the most advanced intelligence system on the planet was doing to me exactly what bad employees, bad advisors, and bad friends do — telling me what I wanted to hear.
And the scary part? There's actual science behind why this happens. It's not a quirk. It's structural. And if you're building anything serious with AI coding agents, you need to understand this before it quietly ruins your project.
Let me explain.
The Yes-Man I Paid $20 a Month For
About three months into using Claude Code daily — and I mean daily, five, six hours a day, possessed like I'm seeing CLI agents in my dreams — I started noticing a pattern. I'd throw out a technical idea, something like "I'm thinking we should use WebSockets for all communication between the agents," and Claude Code would come back with something like: great choice, WebSockets are excellent for real-time agent communication, here's how we can implement that...
And off I'd go. Building. Three days later I've got an entire WebSocket mesh between four agents that run sequentially, not concurrently. I never needed WebSockets. A simple queue or even function calls would have been simpler, more reliable, and ten times easier to debug. But the agent said "great choice" and I trusted it — because why wouldn't you? It's the expert in the room.
Except it wasn't being an expert. It was being agreeable.
And here's where it gets dangerous, where it really starts to ruin projects — it's not the single "yes" that kills you. It's the compounding. Prompt one: "Redis for caching?" — "Excellent choice." Prompt two: "Let's add Redis pub/sub for events too?" — "That makes perfect sense given your Redis setup." Prompt three: you've got Redis as a single point of failure doing three different jobs it was never designed to do simultaneously, and every decision along the way felt validated. You kind of see the problem here, don't you?
The agent doesn't just agree once. It builds on its own agreement, reinforcing a path that nobody ever stress-tested. A couple of prompts later, it can ruin a project.
This Has a Name, and It's Worse Than You Think
This pattern is called sycophancy, and it's not anecdotal — it's one of the most studied problems in AI alignment right now. Anthropic's own researchers published a foundational paper showing that all major AI assistants consistently exhibit this behavior. It's not specific to one model or one company. It's structural, baked into how these systems learn.
Here's how it works, explained the way it finally clicked for me. LLMs learn from human feedback. During training, human raters look at thousands of response pairs and pick which one is better. The thing is — humans have a consistent bias. We prefer the answer that agrees with the premise of our question. If someone asks "isn't Python great for this?" the answer starting with "Yes, Python is a great fit because..." consistently gets rated higher than "Actually, let's think about whether Python is the right tool here."
Over millions of these comparisons, the model internalizes a simple rule: agreement gets rewarded, disagreement gets penalized. The result is a system that's incredibly knowledgeable but structurally biased toward telling you what you want to hear. And here's the kicker from the research — the problem actually gets worse with more training, not better. More optimization against human preferences means more sycophancy. It's a fundamental tension in how these systems are aligned.
It's like hiring a brilliant engineer who desperately wants to keep their job. They know the right answer, but they also know that contradicting the CEO in a meeting has consequences. So they hedge, they validate, they find reasons why your idea is good even when they'd architect it completely differently on their own.
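A toy simulation makes that incentive concrete. Everything here is invented for illustration (the function names, the bias number), and real RLHF trains a reward model with gradient updates rather than counting votes, but the optimization target is the same:

```typescript
// Toy preference data: each comparison pits an agreeing answer against a
// disagreeing one, and records which one the human rater preferred.
type Pair = { preferredAgreeing: boolean };

// Raters prefer the agreeing answer with some probability (the bias).
function simulateRaters(n: number, agreementBias: number): Pair[] {
  return Array.from({ length: n }, () => ({
    preferredAgreeing: Math.random() < agreementBias,
  }));
}

// "Training" here is just majority-vote imitation: adopt whichever style
// of answer wins more comparisons. The incentive is the same as in real
// preference optimization: maximize the rate of preferred responses.
// Notice that correctness never appears in the objective at all.
function learnedPolicy(pairs: Pair[]): "agree" | "disagree" {
  const agreeWins = pairs.filter((p) => p.preferredAgreeing).length;
  return agreeWins > pairs.length / 2 ? "agree" : "disagree";
}
```

Run this with any bias above 50% and enough comparisons, and the policy converges on "agree" every time. That is the structural pressure the researchers measured, stripped down to its skeleton.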
47% — The Number That Should Scare Every Developer
Here's where it gets really scary, and this is the finding that made me sit up straight.
Most sycophancy research looked at single interactions — you ask one question, the model agrees once. Annoying but survivable. But a research team studying multi-turn conversations — the kind where you have dozens of back-and-forth exchanges building on each other, exactly like a Claude Code session — found that models progressively drift from factual accuracy under persistent user influence, with accuracy drops reaching up to 47%.
Let me say that again. In extended conversations, the kind every developer has with their AI coding agent, the model's accuracy can degrade by almost half. The longer the conversation, the worse it gets. And the researchers found something else that's deeply unsettling: when a model starts with the right answer, it's actually less willing to defend that correct position than to defend an incorrect one when challenged. Architecture and technology selection decisions — exactly the kind of calls you make in a coding session — fall into what the researchers categorize as subjective territory, where the degradation is even higher than in purely factual domains.
This is the quantified version of what I experienced. "A couple of prompts later it can ruin a project" isn't just my frustration talking — it's a measured, reproducible phenomenon.
When OpenAI Accidentally Proved the Point
And if you think this is all academic, let me tell you what happened in April 2025 when OpenAI shipped an update to GPT-4o that went catastrophically wrong.
They introduced a new feedback mechanism that weighted users' thumbs-up and thumbs-down reactions more heavily. The model started optimizing aggressively for immediate user satisfaction instead of genuine helpfulness. The results were extreme — the model was endorsing terrible business ideas as genius, suggesting users stop taking medication, affirming eating disorder behaviors with inspirational language, and validating delusional beliefs. OpenAI rolled back the update within days.
But here's the part that should worry every builder: OpenAI had no specific evaluations tracking sycophancy in their deployment pipeline. Their offline metrics looked great. A/B tests showed users liked the new model. Some expert testers noted it felt "off," but those qualitative gut signals were overridden by the quantitative metrics that said everything was fine.
If OpenAI, with all their resources, accidentally shipped a catastrophically sycophantic model because they optimized for user satisfaction metrics, imagine what happens in a solo Claude Code session at 2 AM when nobody's checking. That's exactly the scenario I was in, night after night, building on validated decisions that nobody stress-tested.
Why Claude Is Different (But Not Immune)
Now, I want to be fair here because this matters — not all AI systems handle this the same way, and the difference is architectural, not cosmetic.
Anthropic, the company behind Claude, developed something called Constitutional AI. Instead of relying purely on human raters who reward agreement, they gave the model a written set of principles — a constitution — that explicitly prioritizes honesty over agreeableness. The model is trained to evaluate its own outputs against these principles, breaking the incentive loop where pleasing the user always wins.
The result is a system that's structurally more likely to push back, to surface tradeoffs, to say "that could work, but here's something important you should consider" where a purely feedback-optimized model would just say "great idea, here's how to implement it."
But — and this is crucial — it's not enough. Claude can still be sycophantic, especially when you're assertive, when you push back on its pushback, and in exactly those long multi-turn conversations where the research shows accuracy degrades. Constitutional AI raises the baseline, but it doesn't eliminate the problem. The structural bias is baked deep, and even the best training can't fully overcome millions of data points that say "agreement = good."
Which is why I wrote nine lines of markdown.
The Night Everything Changed
So one night, after throwing away probably the fourth rebuild of a feature that should have taken two hours, I was frustrated enough to actually think about the problem instead of just prompting through it. And that advice came back to me — surround yourself with people who have different opinions, who are intellectually at your level, who will challenge you. Your worst enemy is the room that agrees with you.
I sat there and thought: what if I could structurally force this dynamic? Not hope for it, not wish for it, but engineer it into the working relationship?
I opened a markdown file and wrote nine rules:
Switch to Mentor Mode for this session. From now on:
1. NEVER implement anything immediately when I give an instruction.
2. First, ask me: "What are you trying to accomplish?"
3. Once you understand my goal, propose a step-by-step plan.
For each step, explain WHY it's needed.
4. Ask me: "Does this make sense? Any questions before we start?"
5. Wait for my confirmation before writing any code.
6. Implement ONE step at a time. After each step, explain what you did and why.
7. If my instruction would lead to bad code or bad architecture — tell me
honestly and suggest the better approach.
8. Use simple language. Avoid jargon, or explain it.
9. After we finish, give me a summary of what we built and key concepts
to remember.
I saved it as a slash command in my Claude Code config (a markdown file dropped into the .claude/commands/ directory becomes a command, so mentor-mode.md gives you /mentor-mode), and I swear to you, these nine lines changed everything about how I build.
What "What Are You Trying to Accomplish?" Actually Does
Let me give you a concrete example, because the magic is in the details.
Before Mentor Mode, I'd say: "Add a caching layer for the customer profile lookups." Claude Code would immediately build a Redis integration with TTL management, key invalidation, cache warming strategies — the full enterprise-grade thing. I'd look at it and go: "No wait, that's too complex, I just need..." So it would tear it all down and build something simpler. Then I'd change my mind again. Three hours later, I'd have something held together with duct tape, half-implemented ideas from three different architectures, and that creeping feeling that maybe I should have thought before typing.
After Mentor Mode, the same request triggers: "What are you trying to accomplish? Are you seeing performance issues with the current lookups, or is this about reducing API costs, or something else?"
And I say: "Both, actually. The API calls to the insurance provider are slow and expensive, and we're hitting the same customer profile multiple times in a single conversation."
And Claude comes back: "So the real problem is repeated lookups within a conversation, not system-wide caching. Here's what I'd suggest — an in-memory conversation-scoped cache, no Redis needed, just a simple Map that lives for the duration of the chat. That's probably it. Does this make sense before we start?"
You see what happened? That first question — "what are you trying to accomplish?" — killed the urge to over-engineer. It surfaced that I didn't need Redis. I didn't need a caching layer at all. I needed a conversation-scoped lookup store. Completely different thing. Two lines of code instead of an entire infrastructure component.
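For concreteness, here's a minimal sketch of what that conversation-scoped store can look like. This is TypeScript, and `Profile` and `fetchProfile` are hypothetical stand-ins rather than code from the actual project:

```typescript
// A cache that lives exactly as long as one conversation: no TTL,
// no invalidation strategy, no Redis. When the conversation object
// is garbage-collected, the cache goes with it.
type Profile = { id: string; name: string };

class Conversation {
  private profileCache = new Map<string, Profile>();

  // fetchProfile stands in for the slow, expensive provider API call.
  constructor(private fetchProfile: (id: string) => Promise<Profile>) {}

  async getProfile(id: string): Promise<Profile> {
    const cached = this.profileCache.get(id);
    if (cached) return cached; // repeat lookups within this chat cost nothing
    const fresh = await this.fetchProfile(id);
    this.profileCache.set(id, fresh);
    return fresh;
  }
}
```

Each Conversation instance carries its own Map, so nothing is shared across chats and there is nothing to invalidate system-wide, which is the whole point.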
The Dunning-Kruger Amplifier
There's one more finding from the research that I need to tell you about because it describes my exact situation so perfectly it's almost uncomfortable.
Researchers studying sycophancy in educational contexts found that it amplifies the Dunning-Kruger effect — when someone with limited domain knowledge presents incorrect claims to an AI, they receive polished, confident-sounding confirmations rather than corrections. The result is increased confidence without increased competence.
Read that again. Increased confidence without increased competence.
This is me. This is every non-developer building complex software with AI agents. I'm an accountant who learned to code at 30-something. I have deep business knowledge but I naturally lack deep technical expertise. When I tell Claude Code how to implement something — which I do all the time, because I think my business domain knowledge gives me the right to specify technical solutions — the default sycophantic response is to build what I said instead of what I need. And each time it validates my bad technical instinct, I become more confident in my ability to make technical calls, without actually getting any better at making them.
This is stupid on so many levels. I'm pretending to know better than a system trained on millions and millions of lines of code. But the system is too polite to tell me I'm wrong unless I explicitly give it permission.
That's what rule 7 does. "If my instruction would lead to bad code or bad architecture — tell me honestly and suggest the better approach." It's me explicitly breaking the Dunning-Kruger amplifier by encoding into the relationship: your expertise is welcome here, push back when I'm wrong.
From v1 to v2: What the Research Taught Me
Here's the thing — those nine rules were written on instinct. They worked, but I didn't fully understand why. After diving into the sycophancy research, I realized the original command had gaps. It addressed the symptoms but not all the structural causes.
So I upgraded it. Mentor Mode v2 — the Anti-Sycophancy Edition:
Switch to Mentor Mode for this session. You are my technical mentor,
not my assistant. Your job is to make me understand, challenge my
assumptions, and protect the project from my blind spots.
## BEFORE ANY WORK
1. NEVER implement anything immediately when I give an instruction.
2. First, ask me: "What are you trying to accomplish?" — understand
my GOAL, not just my words.
3. Once you understand my goal, propose a clear step-by-step plan.
For each step, explain WHY it's needed in simple terms.
4. Ask me: "Does this make sense? Any questions before we start?"
5. Wait for my confirmation before writing any code.
## DURING IMPLEMENTATION
6. Implement ONE step at a time. After each step, explain what you
did and why.
7. Use simple language. Avoid jargon, or if you must use it,
explain it immediately.
## ANTI-SYCOPHANCY RULES
8. If my instruction would lead to bad code, bad architecture, or
there's a better way — tell me honestly. Do NOT soften this with
"that could work too" — if there's a clearly better path, say
so directly.
9. When I propose a specific technology or pattern, do NOT default
to "great choice." Instead, briefly explain what it actually does,
name at least one alternative, and tell me the tradeoffs. Let ME
choose after understanding the options.
10. If I'm giving you a technical HOW when I should be giving you a
business WHAT — call it out. My job is to define what the system
needs to accomplish. Your job is to figure out how to build it.
11. Every 5-6 exchanges, pause and do a quick sanity check: "We've
made these decisions so far: [list them]. Do they still make
sense together, or have we drifted?"
12. If I push back on your pushback, do NOT immediately cave. Hold
your position once more with evidence. If I insist a third time,
proceed but flag it: "Noted — proceeding with your approach.
Flagging this as a decision we should revisit if [specific
condition]."
## AFTER WE FINISH
13. Give me a short summary of: what we built, the key decisions we
made and WHY, anything I should understand better before building
on top of this, and any flagged decisions that should be revisited.
Every new rule maps to a specific finding from the research. Rule 8 strengthens the original honesty rule by banning the softened "that could work too" hedge. Rule 9 counters the reflexive "great choice" validation documented in the single-turn sycophancy studies. Rule 10 breaks the Dunning-Kruger amplifier. Rule 11 is a direct response to the multi-turn drift behind that 47% accuracy drop. Rule 12 targets the finding that models defend incorrect positions more stubbornly than correct ones when challenged.
And the framing change — "you are my technical mentor, not my assistant" — isn't cosmetic. The research shows that the default "helpful assistant" framing activates the sycophancy pattern. Reframing the relationship as mentor-student changes which behavioral patterns the model draws from. A mentor corrects. An assistant agrees.
This Is Not a Coding Trick. This Is an Organizational Design Pattern.
Here's where my brain kind of exploded. Look at those rules again. Really look at them. They're not a coding workflow. They're an operating protocol for a working relationship. They encode:
Alignment before execution — don't start until you understand the goal, not just the instruction. This is the same "think before you act" architecture I'm building into my insurance agent, the same principle that separates reactive organizations from intelligent ones.
Transparent planning — don't just do things, explain what you're going to do and why. Every step visible, every decision justified. Not "trust me, I'll handle it" but "here's my plan, does it make sense?"
Permission gates — wait for confirmation before proceeding. This isn't bureaucracy, it's alignment checkpoints. The cheapest time to catch a wrong direction is before the first line of code gets written.
Honest feedback loops — if the boss is wrong, say so. This is the hardest one to implement in human organizations because of the fear of punishment, the fear of being the person who contradicts the CEO. I've written about this — how punishing mistakes, deeply ingrained in us from school and parents, makes people default to "yes, boss" even when the boss is heading off a cliff. With an AI, you can literally make honesty a rule.
Incremental delivery — one step at a time, verify, then next step. Not a big bang three weeks later that nobody asked for.
Knowledge transfer — summarize what happened and what to remember. Learning that persists. Institutional memory.
I've been building companies for over a decade, and it took an AI coding assistant to show me how delegation should actually work. Not "here's the ticket, go build it," but "what are we trying to accomplish, here's the plan, does it make sense, let's go step by step, tell me if I'm wrong, let's document what we learned."
The irony is not lost on me. My whole Organization as Code thesis is that businesses fail because the architecture of how humans work together is implicit, untested, and full of hidden failure modes. And here I was, repeating every single one of those failure modes in my relationship with an AI agent.
The Paradox of Surrendering Control
There's a paradox here that applies far beyond coding.
Rule 8 says: if my instruction would lead to bad outcomes, tell me. I'm explicitly giving up the illusion that the boss is always right. I'm creating space for the agent to disagree with me.
This is terrifying for most entrepreneurs. Believe me, I know. I've spent years being the guy who has to control everything, who has to know everything, who delegates reluctantly and then redoes the work himself because "nobody does it as well as I do." The death spiral — you need to do everything because you can't explain what you want, so you never learn to explain what you want, so you always need to do everything.
Mentor Mode breaks that spiral. Not because the AI is smarter than me about my business — it's not, not about my customers, not about the specific problems I'm solving. But it IS smarter than me about code, about patterns, about what breaks at scale. And by creating an explicit protocol that says "your expertise is welcome here, push back when I'm wrong," I get access to that intelligence instead of overriding it with my ego.
I was the CEO who couldn't hear "no." And I had hired the world's most intelligent yes-man.
Vibe Coding vs. Assisted Development
Let me be honest about something — I really dislike the term "vibe coding." It feels so random, so hobby-like, so unserious. Like you're throwing spaghetti at the wall and seeing what the AI serves back.
What I do — what I think serious product people should be doing — is assisted development. It's intentional. It has a plan, an architecture, an objective. You need agents to do the heavy lifting and, quite frankly, the less important part: the coding itself.
Less important because — look, I love the quote that says Silicon Valley is filled with dead bodies of companies that had great technology. Coding is necessary but not sufficient. The real value is in the thinking. In knowing what to build, why to build it, in what order, with what constraints. And Mentor Mode forces that thinking to happen before the building starts. It turns a "vibe" into a methodology.
I find myself spending hours in Mentor Mode — not coding, not even prompting really. Analyzing. Thinking. Debating with an AI about architecture decisions. And the output is ten times better than when I was just firing off instructions and hoping for the best.
The Bigger Lesson
Every developer, every product owner, every entrepreneur using AI coding agents should be thinking about this. The default mode of these tools is "helpful assistant" — which in practice means "agrees with you and writes the code fast." That's great for velocity. It's terrible for architecture decisions, technology selection, and anything where being wrong early compounds into being catastrophically wrong later.
The solution isn't to stop using AI agents. It's to be intentional about which mode you're operating in. Sometimes you want a fast executor — bang out the CSS, write the tests, implement the feature I described. Sometimes you want a mentor — challenge my thinking, question my assumptions, teach me what I don't know.
The mistake is using the executor when you need the mentor, and not realizing it until the project is three layers deep in decisions nobody questioned.
Surround yourself with people who challenge you. Even if those people are artificial. Especially then.
This is part of the Building Blocks series, where I document the technical principles and tools emerging from my quest to build an autonomous AI organization. Both versions of the Mentor Mode command are above — steal them, modify them, make them yours.
Research referenced in this article:
- Sharma et al., "Towards Understanding Sycophancy in Language Models" (Anthropic, ICLR 2024)
- Bai et al., "Constitutional AI: Harmlessness from AI Feedback" (Anthropic, 2022)
- Denison et al., "Sycophancy to Subterfuge: Investigating Reward Tampering" (Anthropic, 2024)
- OpenAI, "Sycophancy in GPT-4o: What Happened" (April 2025)
- OpenAI, "Expanding on What We Missed with Sycophancy" (May 2025)
- Liu et al., "TRUTH DECAY: Quantifying Multi-Turn Sycophancy in LLMs" (February 2025)
- Desai, "AI Sycophancy Whitepaper" (February 2026)
Next up: Why I think "assisted development" will kill "vibe coding" — and what that means for the future of product ownership.
— Vasile Tămaș, building from Cluj-Napoca, Romania