Dawn of the Infinite Context Window

Credit: DALL-E

Developments in AI continue apace, which is to say at some advanced warp factor that even Star Trek ships can’t reach. One recent news nugget that didn’t get as much play in the mainstream as flashier debuts like Sora was Google’s announcement that, with Gemini 1.5 Pro, its large language model now has a 1-million-token context window.

As we wrote about last week, this has pretty massive implications. As a milestone, though, it’s destined to be surpassed, and probably quite soon: Microsoft just published a paper that describes a method to push the context window to 2 million tokens (disclosure: members of The Media Copilot consult with several companies, including Microsoft). Not to be outdone, Google is said to have a 10-million-token context window on the roadmap, according to The Verge.

If you’re unfamiliar, the context window for an AI is the limit of how much data you can throw at it in a prompt. It’s usually expressed in tokens, with each token representing slightly less than a word. ChatGPT’s context window, for instance, is 8,000 tokens, which is generally thought to be 6,000-7,000 words. The window expands to 128,000 tokens if you use the GPT-4 API — a number that seemed enormous way back in the olden days of November 2023. Now you get that same 128,000-token window on the public-facing version of Gemini Pro.
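To make the token-to-word arithmetic concrete, here is a minimal sketch in Python. The ratio of 0.75 words per token is an assumption taken from the low end of the 6,000-to-7,000-word range above, and the function names are made up for illustration. Real tokenizers vary by model and by text, so treat the results as rough estimates only.

```python
# Rough token/word arithmetic. The 0.75 words-per-token ratio is an
# assumption based on the "8,000 tokens is about 6,000-7,000 words" rule
# of thumb; actual tokenizers produce different counts for different text.

def estimate_words(tokens: int, words_per_token: float = 0.75) -> int:
    """Estimate how many words fit in a given number of tokens."""
    return round(tokens * words_per_token)


def fits_in_window(text: str, window_tokens: int,
                   words_per_token: float = 0.75) -> bool:
    """Guess whether a text fits in a context window of window_tokens."""
    word_count = len(text.split())          # whitespace-separated words
    estimated_tokens = word_count / words_per_token
    return estimated_tokens <= window_tokens


if __name__ == "__main__":
    for window in (8_000, 128_000, 1_000_000):
        print(f"{window:>9,} tokens is roughly {estimate_words(window):,} words")
```

By this estimate, a 1-million-token window holds on the order of 750,000 words, which is why "a series of books" is a fair description of what fits.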

Google’s 1-million-token limit has certainly set a new bar, but no one doubts that OpenAI and others will soon respond in kind. It’s easy to see where this hockey stick is going: Pretty soon the cap on an LLM’s context window will effectively be infinite for the vast majority of use cases. Whether you’re analyzing earnings reports, mining court documents, or simply pulling out details from a series of books, all you need to do is “paste” the relevant data in your prompt.

I put “paste” in quotes because when we’re talking about millions of tokens or words, we’re well beyond the realm of Control-V. One consequence of ever-larger context windows is that they will force the industry to create better ways of filling them: that is, better ways of gathering and aggregating the information we want to “paste” in. That’s a straightforward engineering task, but it feels to me there’s an opportunity for someone to create a consumer-level tool that bookmarks items for inclusion in a dataset for dropping into a prompt later.

Obsoleting RAG?

One thing the companies we consult with often request is a way to use generative AI with their own data. This typically involves converting a set of key documents and archives into vector embeddings that can be searched for the passages most relevant to a question, which are then handed to the LLM. The technique is called retrieval-augmented generation, or RAG, and it's a standard way to tell the AI to “check with this data set” when creating responses.
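Here is a toy sketch of that retrieval step in Python. It is an illustration only: the "embedding" is a simple bag-of-words count standing in for the learned embedding models and vector databases that real RAG systems use, and every function name is hypothetical.

```python
import math
import re
from collections import Counter

# Toy stand-in for an embedding model: a bag-of-words count vector.
# Production RAG systems use learned embeddings and a vector store instead.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_rag_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Put only the retrieved passages, not the whole corpus, in the prompt."""
    context = "\n\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {query}"
```

The point to notice is that the prompt only ever contains the top few passages, which is what keeps RAG prompts small and cheap regardless of how big the underlying archive is.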

But with a massive context window, you no longer necessarily need RAG for certain use cases. Instead of building a custom way to query specific data, you can feed the entirety of that data to the LLM at the prompt stage. For data sets that change often, this might even be preferable since it means you don’t have the extra step of updating your data set, or corpus, whenever there’s new data.
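As a rough sketch of what "feed the entirety of that data" could look like, the Python below concatenates every document into one prompt and checks the result against a token budget. The 1-million-token default mirrors the Gemini figure above; the 0.75 words-per-token ratio and the function name are assumptions for illustration, not any vendor's API.

```python
import math

def build_full_context_prompt(question: str, documents: list[str],
                              window_tokens: int = 1_000_000,
                              words_per_token: float = 0.75) -> str:
    """Concatenate every document into one prompt, with no retrieval step.

    The token count is a crude estimate from whitespace-separated words;
    a real tokenizer for the target model would give a different number.
    """
    corpus = "\n\n".join(
        f"[Document {i}]\n{doc}" for i, doc in enumerate(documents, start=1)
    )
    prompt = f"{corpus}\n\nQuestion: {question}"

    estimated_tokens = math.ceil(len(prompt.split()) / words_per_token)
    if estimated_tokens > window_tokens:
        raise ValueError(
            f"Estimated {estimated_tokens:,} tokens exceeds the "
            f"{window_tokens:,}-token window"
        )
    return prompt
```

Because the prompt is rebuilt from the current documents on every call, new data shows up in the next query without any re-indexing. The tradeoff is that every query pays for the full corpus in tokens.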

It’s doubtful RAG is going to go away entirely, since the cost of compute is still a huge factor in AI, and many queries will use less-capable LLMs for cost reasons. Plus many custom tools — ones that abstract prompting away and are purpose-built around data that doesn’t change much — will still use RAG.

Regardless, massive context windows are an important step forward. While a big number doesn’t necessarily mean great output, it does mean most users can focus on what they want to get out of a data set, rather than cutting it down to size.
