Tokens, Context Limits, and Why Your AI Chat Sometimes Forgets

by Tom McAtee | 5 Aug 2025 | AI in Practice

Understand how language models process input – and what to do when things get laggy or lost.

Most people assume AI remembers everything they’ve said in a chat. Until suddenly… it doesn’t.

You might be halfway through a strategy discussion, only to find ChatGPT is repeating itself, ignoring earlier instructions, or forgetting what you told it five scrolls ago.

What’s going on?

It’s not broken. It’s just reaching the limit of its context window.

To use AI more effectively — especially for long, reflective work — you need to understand how tokens and context limits shape every response.


What Are Tokens?

Large Language Models (LLMs) don’t read full sentences or paragraphs the way humans do. They process language in tokens — small chunks like:

    • A whole word

    • Part of a word

    • A piece of punctuation

    • Even a single space

For example:

“Collaborative leadership is essential.”
…might be broken into 5–6 tokens depending on the model.

Different models count tokens slightly differently, but the concept is the same:

Every time you type, you’re spending tokens — and every time the model replies, it’s spending more.
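To make the idea concrete, here is a toy sketch. Real models use learned subword schemes such as byte-pair encoding (BPE), so actual token boundaries and counts will differ; this simple word-and-punctuation split just illustrates how one sentence becomes several small pieces:

```python
import re

def rough_tokenize(text: str) -> list[str]:
    # Toy illustration only: real tokenizers use learned subword
    # vocabularies (e.g. BPE), not a simple regex split.
    # This splits text into words and punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = rough_tokenize("Collaborative leadership is essential.")
print(tokens)       # ['Collaborative', 'leadership', 'is', 'essential', '.']
print(len(tokens))  # 5 -- in line with the "5-6 tokens" estimate above
```

A real tokenizer might split "Collaborative" into two or more subword pieces, which is why the count varies by model.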


What Is a Context Limit?

Each model has a maximum number of tokens it can consider in a single prompt + response cycle. This is called the context window.

If you exceed that limit:

    • The model forgets earlier parts of the conversation

    • It may hallucinate or repeat itself

    • The quality of responses gradually deteriorates

Think of it like short-term memory:

The model can only “hold” a certain number of words and ideas at once. When it gets full, earlier content starts to fall out of memory — just like a chalkboard being erased from the top.
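The chalkboard analogy can be sketched as a simple sliding-window policy. This is an illustrative simplification (word counts stand in for real token counts, and actual systems vary in how they trim history), but it shows how the oldest content is the first to go:

```python
def fit_to_window(messages, budget, count_tokens=lambda m: len(m.split())):
    # Drop the oldest messages until the total "token" count fits
    # the budget -- a simple sliding-window policy. Word counts
    # stand in for real token counts in this sketch.
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > budget:
        kept.pop(0)  # the oldest content falls off the chalkboard first
    return kept

history = ["first instruction here", "some long middle discussion", "latest question"]
print(fit_to_window(history, budget=6))
# ['some long middle discussion', 'latest question']
```

Notice what got dropped: your first instruction. That is exactly why a long chat starts ignoring the rules you set at the beginning.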


Token Limits by Model (Approximate)

Model           Context Limit                Notes
ChatGPT-4o      ~128,000 tokens              Equivalent to ~300 pages of text
Claude Opus     ~200,000 tokens              Very large context window
Gemini 1.5 Pro  ~1 million tokens (varies)   Huge capacity in ideal conditions
ChatGPT-3.5     ~16,000 tokens (paid tier)   ~4,000 tokens on the free tier
Grok            Undisclosed                  Generally similar to GPT-3.5 ranges

Important: Just because a model can process a long context doesn’t mean it always uses it effectively.
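A common rule of thumb for English text is roughly four characters per token. It is only an estimate (actual counts depend on the model's tokenizer), but it is good enough for a quick sanity check on whether a document will fit a given window:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic for English text: ~4 characters per token.
    # Actual counts vary by model and tokenizer; treat as an estimate.
    return max(1, len(text) // 4)

draft = "Collaborative leadership is essential." * 100
print(estimate_tokens(draft))            # 950
print(estimate_tokens(draft) < 16_000)   # True -- fits a ~16k-token window
```

Remember that the window must hold your prompt *and* the reply, so leave headroom rather than filling it to the brim.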


Signs You’re Hitting the Limit

    • ChatGPT starts forgetting details you shared earlier

    • It ignores formatting instructions or repeats itself

    • It asks clarifying questions about things you’ve already explained

    • It starts getting “vague” or less accurate

If this happens — it’s probably not you. It’s the token ceiling.


Projects Help You Manage Context Limits

When your work stretches over a long period — like writing a book, preparing a workshop, or exploring a new research idea — trying to do it all in one chat is risky.

Instead of one long thread that eventually forgets where it started:

Use Projects to chunk your thinking

    • Create separate chats for different angles

    • Give each chat a clear purpose and name

    • Start each with a summary of what came before

This avoids overloading the context window — and gives you more precise control over each interaction.

Pro Tip: Use Projects to create your own “token budget” — short, sharp, scoped chats that stay focused.
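The "start each chat with a summary of what came before" step can be sketched in code. The message format below mirrors common chat APIs but is purely illustrative, and the example strings are hypothetical:

```python
def start_new_chat(summary_of_previous: str, new_goal: str) -> list[dict]:
    # Sketch of the carry-a-summary-forward pattern: seed a fresh
    # conversation with a compact recap instead of replaying the
    # full history of the old thread.
    return [
        {"role": "system", "content": f"Context so far: {summary_of_previous}"},
        {"role": "user", "content": new_goal},
    ]

chat = start_new_chat(
    "We outlined a 3-part workshop on collaborative leadership.",
    "Draft the agenda for part 2.",
)
print(len(chat))  # 2 messages -- far cheaper than replaying the whole old thread
```

Two short messages instead of a sprawling history: that is your token budget in action.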


Summary: How to Stay Within the Context Window

  • Keep it tight. If your chat is getting long, start a new thread.

  • Summarise often. Ask the model to recap what you’ve already covered, and reuse that summary when you start fresh.

  • Use Projects. Break your thinking into manageable parts.

  • Avoid repetition. Don’t ask the same thing 10 different ways.

  • Know the limits. Bigger models ≠ infinite memory.


Final Thought: You Don’t Need Infinite Context — Just Clear Thinking

AI doesn’t need to remember everything. It just needs to remember what matters — and that’s your job to define.

Think of yourself as the architect of the conversation.
Tokens are your building blocks — and how you stack them shapes the outcome.


Suggested Next Step

Notice when your chats start to feel “off.”
That’s often a signal you’ve reached the model’s working limit.
Start a new chat, summarise what matters, and keep building forward.

Written by Tom McAtee

Curious by nature, grounded by experience – I explore the intersection of AI, culture, and leadership, drawing on four decades in heavy industry and high-stakes organisations. These days, I’m diving deep into research, building tools for thinking, and sharing personal reflections along the way. I also happen to love golf, music, cycling, travel, food – and building elegant things with Divi.

Related Posts

Understanding Memory in ChatGPT


Memory in ChatGPT isn’t what most people think. Learn what it does, what it doesn’t, and how to use it with clarity — especially when working across sessions or models.

Custom GPTs for Research & Consulting


Build your own AI assistant that knows your frameworks, tone, and goals. Custom GPTs let you work faster and think deeper — without losing your professional edge.
