Understand how language models process input – and what to do when things get laggy or lost.
Most people assume AI remembers everything they’ve said in a chat. Until suddenly… it doesn’t.
You might be halfway through a strategy discussion, only to find ChatGPT is repeating itself, ignoring earlier instructions, or forgetting what you told it five scrolls ago.
What’s going on?
It’s not broken. It’s just reaching the limit of its context window.
To use AI more effectively — especially for long, reflective work — you need to understand how tokens and context limits shape every response.
What Are Tokens?
Large Language Models (LLMs) don’t read full sentences or paragraphs the way humans do. They process language in tokens — small chunks like:
- A whole word
- Part of a word
- A piece of punctuation
- Or even a space
For example:
“Collaborative leadership is essential.”
…might be broken into 5–6 tokens depending on the model.
Different models count tokens slightly differently, but the concept is the same:
Every time you type, you’re spending tokens — and every time the model replies, it’s spending more.
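If you're curious how a sentence actually splits, OpenAI's open-source tiktoken library will show you. Here is a minimal sketch; the encoding name and example sentence are just for illustration, and other models use different tokenizers that produce different counts:

```python
# A minimal sketch using OpenAI's tiktoken library to see how a sentence
# splits into tokens. Install with: pip install tiktoken
import tiktoken

# o200k_base is the encoding used by GPT-4o; cl100k_base by GPT-3.5/GPT-4.
# Different encodings split the same text slightly differently.
enc = tiktoken.get_encoding("o200k_base")

text = "Collaborative leadership is essential."
token_ids = enc.encode(text)

print(f"{len(token_ids)} tokens")
# Decode each id back to its text chunk to see exactly where the splits fall
print([enc.decode([tid]) for tid in token_ids])
```

The printed chunks show exactly where the splits fall for that encoding.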
What Is a Context Limit?
Each model has a maximum number of tokens it can consider in a single prompt + response cycle. This is called the context window.
If you exceed that limit:
- The model forgets earlier parts of the conversation
- It may hallucinate or repeat itself
- The quality of responses gradually deteriorates
Think of it like short-term memory:
The model can only “hold” a certain number of words and ideas at once. When it gets full, earlier content starts to fall out of memory — just like a chalkboard being erased from the top.
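To make the chalkboard analogy concrete, here is a rough sketch of how a fixed token budget forces the oldest messages out first. The budget figure, the trim_to_budget helper, and the toy conversation are made up for illustration; real chat APIs also add a few tokens of per-message overhead, so treat the counts as estimates:

```python
# A rough sketch of a fixed token budget: newest messages are kept,
# oldest ones fall out first, like a chalkboard erased from the top.
# Token counts are approximated with tiktoken.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # encoding used by GPT-4o-era models

def count_tokens(message: dict) -> int:
    """Approximate token cost of one chat message."""
    return len(enc.encode(message["content"]))

def trim_to_budget(messages: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages whose combined token cost fits the budget."""
    kept, total = [], 0
    for msg in reversed(messages):            # walk from newest to oldest
        cost = count_tokens(msg)
        if total + cost > budget:
            break                             # everything older is "forgotten"
        kept.append(msg)
        total += cost
    return list(reversed(kept))               # restore chronological order

# Toy conversation, purely illustrative
conversation = [
    {"role": "user", "content": "Here is my full strategy draft..."},
    {"role": "assistant", "content": "Thanks. The key themes I see are..."},
    {"role": "user", "content": "Good. Now rewrite section two in plain English."},
]
print(trim_to_budget(conversation, budget=30))
```

Anything that doesn't fit inside the budget never reaches the model at all, which is why early instructions quietly stop being followed.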
Token Limits by Model (Approximate)
| Model | Context Limit | Notes |
|---|---|---|
| ChatGPT-4o | ~128,000 tokens | Equivalent to ~300 pages of text |
| Claude Opus | ~200,000 tokens | Very large context window |
| Gemini 1.5 Pro | ~1 million tokens (varies) | Huge capacity in ideal conditions |
| ChatGPT-3.5 | ~16,000 tokens (paid) | 4,000 for free tier |
| GROK | Undisclosed | Generally similar to GPT-3.5 ranges |
Important: Just because a model can process a long context doesn’t mean it always uses it effectively.
Signs You’re Hitting the Limit
- ChatGPT starts forgetting details you shared earlier
- It ignores formatting instructions or repeats itself
- It asks clarifying questions about things you've already explained
- It starts getting "vague" or less accurate
If this happens — it’s probably not you. It’s the token ceiling.
Projects Help You Manage Context Limits
When your work stretches over a long period — like writing a book, preparing a workshop, or exploring a new research idea — trying to do it all in one chat is risky.
Instead of one long thread that eventually forgets where it started:
✅ Use Projects to chunk your thinking
- Create separate chats for different angles
- Give each chat a clear purpose and name
- Start each with a summary of what came before (see the sketch below)
This avoids overloading the context window — and gives you more precise control over each interaction.
Pro Tip: Use Projects to create your own “token budget” — short, sharp, scoped chats that stay focused.
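One way to start each new chat with "a summary of what came before" is to ask the model itself for a handoff note at the end of the old chat. Here is a minimal sketch using the OpenAI Python client, assuming an API key is configured; the prompt wording, model name, and handoff_summary helper are illustrative, not prescriptive:

```python
# A minimal sketch: ask the model for a compact handoff summary of an old chat,
# then reuse that summary as the opening message of a fresh one.
# Requires the openai package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

HANDOFF_PROMPT = (
    "Summarise this conversation in under 200 words: key decisions made, "
    "open questions, and any instructions I gave you. I will paste this "
    "summary into a new chat to continue the work."
)

def handoff_summary(old_messages: list[dict]) -> str:
    """Return a short summary of an existing conversation for reuse elsewhere."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=old_messages + [{"role": "user", "content": HANDOFF_PROMPT}],
    )
    return response.choices[0].message.content

# The new chat then starts with one short summary message
# instead of dragging along the entire old history:
# new_chat = [{"role": "user", "content": handoff_summary(old_messages)}]
```

The same idea works manually in the ChatGPT interface: ask for the summary, copy it, and paste it as the first message of the next chat in your Project.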
Summary: How to Stay Within the Context Window
- Keep it tight. If your chat is getting long, start a new thread.
- Summarise often. Ask the model to tell you what you've already done or said.
- Use Projects. Break your thinking into manageable parts.
- Avoid repetition. Don't ask the same thing 10 different ways.
- Know the limits. Bigger models ≠ infinite memory.
Final Thought: You Don’t Need Infinite Context — Just Clear Thinking
AI doesn’t need to remember everything. It just needs to remember what matters — and that’s your job to define.
Think of yourself as the architect of the conversation.
Tokens are your building blocks — and how you stack them shapes the outcome.
Suggested Next Step
Notice when your chats start to feel “off.”
That’s often a signal you’ve reached the model’s working limit.
Start a new chat, summarise what matters, and keep building forward.