Prompts That Go Wrong: What I Learned Shipping AI Features

Prompt engineering is just technical writing for an extremely literal intern. Most developers ship a prompt that works twice and assume they’re done. They aren’t. In production, users will find every possible way to break your “perfect” instructions. This guide covers the real-world failures I’ve seen while shipping AI features—from bots that get passive-aggressive to JSON that breaks your parser. You’ll learn how to move past vague requests and start writing precise specifications. Stop guessing why your model is acting weird. Start building prompts that actually handle the chaos of real-world inputs.

Why is my summary bot ignoring my constraints?

You ask for three bullet points. You get five. Sometimes the model adds an intro paragraph or a “hope this helps” conclusion. It treats your constraints as suggestions rather than hard rules. In testing with clean data, it looks fine, but messy user inputs make it wander.

The Scenario: You’re building a “TL;DR” feature for a news app. A user pastes a 3,000-word conspiracy theory. The bot gets so “interested” in the topic that it writes a ten-point analysis instead of the three-point summary you hard-coded into the UI.

The fix is explicit formatting rules. Tell the model to “Return exactly 3 bullet points starting with -.” Tell it to “Do not include any other text.” Adding a concrete example gives the model a template to match instead of a vibe to follow.

Why is every support ticket being labeled as “Other”?

Classification prompts fail when categories aren’t defined. If you just list “billing” and “technical,” the model will get conservative. It dumps anything slightly ambiguous into the “other” category because it’s scared of being wrong. Your “smart” sorting system becomes a junk pile.

The Scenario: A customer writes: “My credit card was charged twice because the app crashed.” Is that billing or technical? Without clear definitions, the model panics and labels it “Other.” Now a high-priority refund request is sitting in a queue that nobody checks.

Define your categories with bullet points. Tell the model that “billing” includes duplicate charges and “technical” includes app crashes. Once you provide a rubric, “Other” usage will drop from a majority to a tiny fraction of your data.

Why does my bot sound like a corporate robot?

Vague values like “be professional” are useless. Models interpret professionalism as being extremely formal and detached. This works fine for a bank but feels passive-aggressive when a user is frustrated. It sounds like a form letter from a company that doesn’t care.

The Scenario: An angry user complains that their account was locked for three days. Your bot responds: “I understand your frustration with the situation you have described.” The user feels mocked by a machine and takes a screenshot of the “cold” response to post on social media.

Tell the model exactly how to handle anger. Tell it to “Reference the specific details the user mentioned, like the three-day wait.” Tell it to “Never be defensive.” Specific instructions for difficult conversations beat abstract “empathy” every single time.

Why is raw Markdown breaking my JSON parser?

You ask for JSON and you get it. But you also get triple backticks and the word json at the top. Your backend code tries to parse the string and crashes because it isn’t a raw object. It’s a markdown document containing an object.

The Scenario: You’re processing 100 receipts in a batch. On the 99th one, the model decides to be “helpful” and adds a note: “Here is your data: json....” Your script fails, the process stops, and you have to manually clean the database at midnight.

Tell the model: “Return only the raw JSON object starting with { and ending with }.” Don’t allow any markdown. You should also write a simple regex in your code to strip backticks defensively. Never trust the model to follow formatting rules perfectly.

Why is my bot suddenly claiming it “doesn’t know” anything?

If you tell a model to “say you don’t know if you aren’t sure,” it will play it safe. It might refuse to answer common-sense questions because it doesn’t have “real-time” access to verify them. Your helpful assistant becomes a useless wall of refusals.

The Scenario: A user asks: “What is the capital of France?” Your bot responds: “I don’t know.” You’ve tuned the “caution” so high that the model is now afraid to use its own training data for basic facts.

Define the threshold for “don’t know” explicitly. Tell it to answer general knowledge questions confidently. Only tell it to hedge if the user asks for account-specific data or live news. Precision in your “refusal” rules is just as important as the task itself.

What’s the meta-lesson here?

Write prompts like you’re writing a technical specification. Don’t use vague adjectives. Use hard constraints and provide examples of both good and bad outputs. The model isn’t a person; it’s a text-prediction engine that needs a clear target.

The prompts that work in production share three traits. They have explicit constraints. They use concrete examples. They handle failure modes with specific “If/Then” logic. The more precisely you describe the goal, the more likely you are to hit it.

System_Continuity

Next_Recommended_Node

AI Mistakes When Building Apps (And How to Fix Them)

The mistakes developers make when integrating AI into their apps — not the obvious ones, but the ones that only show up after you've shipped and users are hitting them.

Vishnu

5m read

AI 5m

How to Add Claude to Your App Using the Anthropic API

Log_Access

AI 5m

Prompt Engineering Is Dead. Long Live System Prompts.

Log_Access

Browse the full manifest