
Context Window Full? 9 Tricks to Get More Out of Every AI Session

By Vishnu

:::note[TL;DR]

  • Context degrades around 60–70% utilization — before the hard limit — as older tokens get deprioritized
  • The single most effective fix: start a new session for each distinct task
  • Paste only what’s relevant — full file pastes are the biggest context waste
  • Use a “session summary” prompt to restart without losing state
  • CLAUDE.md (Claude Code) or a CONTEXT.md file keeps project context available across sessions without re-pasting

:::

The context window is the amount of text an AI model can hold in its “working memory” at once — your messages, its responses, any files you paste in. When you hit the limit, the model starts forgetting earlier parts of the conversation. Older instructions get dropped. Code from earlier in the session disappears.

Every AI model has one, even the large ones. Claude 3.5 Sonnet has 200k tokens. GPT-4o has 128k. Gemini 1.5 Pro goes up to 1 million. But in practice, long sessions degrade well before the hard limit — the model starts losing track of context around 60–70% utilization.
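As a rough rule of thumb, you can turn the 60–70% heuristic into a per-model "practical budget". This is a minimal sketch: the window sizes are the published hard limits, and the 0.65 factor is just an illustrative midpoint of that range, not a measured constant:

```python
# Published hard limits (tokens) for a few common models.
HARD_LIMITS = {
    "claude-3.5-sonnet": 200_000,
    "gpt-4o": 128_000,
    "gemini-1.5-pro": 1_000_000,
}

def practical_budget(model: str, degradation_factor: float = 0.65) -> int:
    """Tokens you can comfortably use before quality typically starts to slip."""
    return int(HARD_LIMITS[model] * degradation_factor)

for model, limit in HARD_LIMITS.items():
    print(f"{model}: {limit:,} hard limit, ~{practical_budget(model):,} practical")
```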

Here’s how to work with that reality instead of fighting it.

Why does quality drop before the hard limit?

At high context utilization, models start to compress and deprioritize earlier content. You’ll see this as:

  • Forgetting variables or constraints you mentioned early on
  • Re-introducing code you already removed
  • Ignoring instructions that were near the start of the conversation
  • Answers that contradict earlier decisions

It’s not a bug. It’s the model running out of attention budget for older tokens.

Trick 1: Start a new session for each distinct task

The single most effective thing. Each new conversation starts fresh with full attention capacity. If you’re doing three things — debugging a function, writing a new feature, updating docs — do them in three sessions, not one.

The scenario: You’re debugging a tricky auth issue and you’ve been going back and forth for an hour. The conversation is massive. You ask the model to add a simple log statement and it writes something completely wrong. The session is too full. Start fresh, paste in just the function and the error, and get it fixed in two messages.

Trick 2: Paste only what’s relevant

Don’t paste entire files when you need help with one function. Cut out the noise. A 200-line file where the issue is in a 20-line function? Paste the function plus the relevant types/imports. Same result, 10x less context used.

Trick 3: Use a “session summary” to restart

When a session gets long and you can’t abandon it, ask the model to summarize what it knows before you start a new chat:

“Summarize the current state of what we’ve built, the key decisions we’ve made, and what we still need to do. I’ll paste this into a new session.”

Paste that summary at the start of the new conversation. You recover most of the working context in a fraction of the tokens.

:::tip
Before starting a new chat, ask the model: “Summarize the current state of what we’ve built, key decisions, and what we still need to do.” Paste this at the top of the new conversation to recover context in a fraction of the original tokens.
:::

Trick 4: Keep a running CONTEXT.md file

For ongoing projects, maintain a plain text file in your repo. Before each session, update it with the current state: what’s done, what’s in progress, what decisions were made. Paste it at the start of every new session.

This is what CLAUDE.md is in Claude Code — a persistent context file the model reads at the start of every session without you having to paste it manually.
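A CONTEXT.md can be as simple as three or four headings. This layout is only one suggestion, and the project details below are invented for illustration:

```markdown
# CONTEXT.md: project state for AI sessions

## Done
- Password reset flow (email + token, 1-hour expiry)
- Rate limiting on /login

## In progress
- OAuth integration (Google only for now)

## Decisions
- Sessions stored in Redis, not JWTs (easier revocation)
- All auth errors return 401, never 403
```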

Trick 5: Be explicit about what to forget

You can instruct the model to deprioritize earlier content:

“Ignore everything before my last message about authentication. Focus only on the password reset flow.”

It’s not perfect, but it helps the model allocate attention toward what matters.

Trick 6: Break large files into chunks

Need to refactor a 1,000-line file? Do it in sections:

  1. Paste lines 1–200, ask for refactored output
  2. Paste lines 201–400, continue
  3. Combine the results

Or ask the model to create a plan first (which takes fewer tokens), then execute section by section.
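The numbered steps above can be sketched as a small helper that splits source text into paste-ready chunks along with their line ranges (the 200-line chunk size is arbitrary):

```python
def chunk_source(source: str, chunk_size: int = 200):
    """Yield (start_line, end_line, text) chunks ready to paste one at a time."""
    lines = source.splitlines(keepends=True)
    for start in range(0, len(lines), chunk_size):
        end = min(start + chunk_size, len(lines))
        yield start + 1, end, "".join(lines[start:end])

# Stand-in for a 500-line file.
big_file = "\n".join(f"line {i}" for i in range(1, 501))
for start, end, text in chunk_source(big_file):
    print(f"chunk covering lines {start}-{end}: {len(text)} chars")
```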

Trick 7: Prefer diffs over full file replacements

When asking for code edits, request a diff or specific line changes instead of a full file rewrite. “Replace lines 45–67 with…” uses far fewer tokens than the model rewriting 500 unchanged lines.

Claude Code does this automatically with its Edit tool. If you’re using a chat interface, ask for it explicitly.
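If your interface doesn't produce targeted edits on its own, ask for unified diff output explicitly. A hypothetical example of what that looks like (the file path, line numbers, and code are invented):

```diff
--- a/auth/reset.py
+++ b/auth/reset.py
@@ -45 +45 @@ def send_reset_email(user):
-    token = make_token(user.id)
+    token = make_token(user.id, expires_in=3600)
```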

Trick 8: Use structured output to compress responses

For technical information (lists of functions, schemas, options), ask for a compressed format:

“List the options in a compact table, not prose.”

Prose takes 3–5x more tokens than the same information in a table or bullet list. Over a long session, this adds up.
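You can sanity-check the savings with the common heuristic of roughly 4 characters per token for English text. A sketch, with the caveat that the 4-to-1 ratio is approximate and real tokenizers vary:

```python
def rough_tokens(text: str) -> int:
    """Very rough estimate using the ~4 characters per token heuristic."""
    return max(1, len(text) // 4)

prose = (
    "The timeout option controls how long the client waits before giving up. "
    "The retries option controls how many times a failed request is retried. "
    "The verbose option controls whether debug output is printed."
)
table = (
    "| option  | meaning                  |\n"
    "| timeout | wait before giving up    |\n"
    "| retries | retry count for failures |\n"
    "| verbose | print debug output       |"
)

print(rough_tokens(prose), rough_tokens(table))
```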

Trick 9: Know when to switch models

For tasks that genuinely require large context (analyzing an entire codebase, processing a long document), use a model built for it:

  • Gemini 1.5 Pro / 2.0 — 1M token context, good for large document analysis
  • Claude 3.5 Sonnet — 200k tokens, strong reasoning within that window
  • Claude 3 Opus — better at maintaining coherence near the top of a long context

:::note
A large context window doesn’t mean a problem-free session. Gemini’s 1M token window is useful for write-once read-heavy tasks like codebase analysis — but for interactive coding, attention diffusion still degrades quality at high utilization regardless of the hard limit.
:::

Don’t force a 128k model to hold a 500-file codebase. Use the right tool.

What actually fills up context fast

  • Full file pastes (especially minified or generated code)
  • Long back-and-forth debugging sessions with repeated error messages
  • Pasting the same file multiple times across a session
  • Asking for full file rewrites instead of targeted edits
  • Long system prompts or tool definitions

:::warning
Repeated error message pastes are the worst context waste — each time you paste the same stack trace, you burn tokens on content the model has already seen. Save the first error, then reference it by message or line number in follow-ups instead of pasting it again.
:::

Knowing what eats tokens helps you be deliberate about what goes in.



Summary

  • Context window degradation happens before the hard limit — quality drops around 60–70% utilization
  • The single most effective fix is starting a new session for each distinct task
  • Paste only what’s relevant — full file pastes are the biggest context waste
  • Use a session summary prompt to restart a long session without losing state
  • CLAUDE.md (Claude Code) or a CONTEXT.md file keeps project context available across sessions without re-pasting

Frequently Asked Questions

Why does the AI start contradicting itself late in a session?

At high context utilization, older tokens get deprioritized. Earlier instructions, constraints, and code decisions receive less attention than recent messages. The model isn’t broken — it’s running out of attention budget for content from the start of the conversation.

Does starting a new chat always lose context?

Only if you don’t carry it forward. Ask the model to summarize the session state before you start fresh, then paste that summary at the top of the new chat. You recover the key decisions and current state in a fraction of the original token cost.

Is a 1M token context window (like Gemini) actually usable?

For document analysis and large codebase reads, yes. For interactive coding sessions, not really — response latency increases significantly at high utilization, and attention diffusion means the model can still miss things from much earlier in a 1M token context.