Simian Words

An idea to drive autonomous research using LLMs

I think I have a nice idea for enabling LLMs to drive autonomous research. I'm aware that these things have already been thought of, that people are working on them, and that I might not be bringing anything new to the table. Nevertheless, I thought it would be useful to start a discussion anyway. I do think I have at least one unique thing to offer, so please read on.

Background: I think LLMs have a unique ability to capture patterns across huge amounts of data. Think about it: ChatGPT 4.0 knows about random stuff like physics, mathematics and biology. What if there are unnoticed patterns within this knowledge base that could help drive research forward? That would be the truest test of an LLM's value.

I tried using Deep Research from ChatGPT recently and noticed that it forgets the initial context and just blurts out a huge report based on only the one or two sentences of instruction in the original prompt. I think it's a fundamental limitation of what context means. I'm aware that "attention" decides what data to hold onto and what to forget. But why not do that at a higher level? Why limit attention to "tokens"?

Theory: I think knowledge can be abstracted, and I know this is not a new concept.

I don't need to know how Linux works internally, nor how individual packets travel as light pulses from computer to computer. But I can still build a nice application on top of higher-level concepts.

Let's think for a moment about how we could enable LLMs to drive autonomous research. Do we just put the model in a while loop and let it contemplate indefinitely? OK, that could work. Why not also give it some tools, like web search and Python? Now we're getting somewhere.
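To make that concrete, here is a minimal sketch of the loop in Python. The `llm`, `web_search` and `run_python` functions are hypothetical placeholders, not any real API; the point is only the shape of the loop.

```python
# Minimal sketch of "a while loop plus tools". All three functions below are
# hypothetical placeholders standing in for a real model and real tools.

def llm(prompt: str) -> dict:
    """Hypothetical model call; returns the next action and its argument."""
    return {"action": "think", "argument": "(model's reasoning)"}

def web_search(query: str) -> str:
    """Hypothetical web search tool."""
    return "(search results)"

def run_python(code: str) -> str:
    """Hypothetical sandboxed Python execution tool."""
    return "(execution output)"

TOOLS = {"web_search": web_search, "run_python": run_python}

def contemplate(goal: str, max_steps: int = 100) -> list[str]:
    notes: list[str] = []  # running scratchpad of findings
    for _ in range(max_steps):
        step = llm(f"Goal: {goal}\nNotes so far: {notes}\nPick a tool or just think.")
        if step["action"] in TOOLS:
            notes.append(TOOLS[step["action"]](step["argument"]))
        else:
            notes.append(step["argument"])  # plain contemplation
    return notes
```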

But wait, there's a fundamental resource constraint here: the context window. To drive autonomous research, the LLM must hold as much contextual information as possible so it can notice the deep patterns that emerge through its thinking, but it can only do that until the context window fills up. Sure, we can offload some information to a database like a vector DB, and that helps, but only so far. A vector DB still falls short because it cannot give the LLM a deep understanding of the relationships between the embeddings, if that makes sense.

Let's call the main LLM that drives the research the orchestrator LLM, because it orchestrates tools.

Let's go a level higher. As the orchestrator LLM learns (by scouring data online, using tools, or just contemplating), it can store important abstracted concepts as plain text. It stores chunks of this text so that they can later be used as context or as a prompt for child models.

To give an example, say the LLM has produced a book's worth of insights by searching the web; it then judiciously distills this learning into some text. Crucially, this "knowledge" is an abstract concept: the orchestrator LLM only needs to know of the concept's existence. Its internal details can live in a database, and when required, the orchestrator LLM can gain an understanding of the concept by feeding that text as a prompt to another LLM, so as not to waste its own context.

We have gone at least one step further: instead of storing concepts in a vector DB, we store them as context for child LLMs. It's crucial to understand the advantage here: deep understanding only comes from feeding the material in as context to an LLM, and that kind of understanding is clearly more nuanced and captures insights better. The main orchestrator LLM now holds a list of concepts [concept1, concept2, ...] in its context and dives deeper into each concept only when required.
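Here is a rough sketch of that concept registry, continuing the hypothetical functions from the earlier snippet. The orchestrator's context only ever sees names and one-line summaries; the full text is stored outside it and handed to a child LLM as a prompt when depth is needed. `ConceptStore` and `child_llm` are made-up names for illustration, not an existing library.

```python
# Sketch of the concept registry: the orchestrator's context holds only a name
# and a one-line summary per concept; the full text lives outside that context
# and is fed to a child LLM as a prompt when deep understanding is required.

from dataclasses import dataclass, field

def child_llm(prompt: str) -> str:
    """Hypothetical call to a child model."""
    return "(child model's answer)"

@dataclass
class Concept:
    name: str
    summary: str    # the only part the orchestrator keeps in its own context
    full_text: str  # e.g. a book's worth of distilled insights, stored externally

@dataclass
class ConceptStore:
    concepts: dict[str, Concept] = field(default_factory=dict)

    def add(self, concept: Concept) -> None:
        self.concepts[concept.name] = concept

    def index_for_orchestrator(self) -> str:
        # This short index is what actually occupies the orchestrator's context.
        return "\n".join(f"- {c.name}: {c.summary}" for c in self.concepts.values())

    def ask_child(self, name: str, question: str) -> str:
        # Deep understanding comes from putting the full text into a child
        # model's context, rather than from a vector-similarity lookup.
        c = self.concepts[name]
        return child_llm(f"Context:\n{c.full_text}\n\nQuestion: {question}")
```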

The chunking of knowledge into concepts is interesting to me: you are again limited by chunk size, because a chunk can be at most an LLM's context window. Also: what do we lose by creating boundaries this way? I think we lose deep understanding across chunks, which might be necessary. In an ideal world you would not have to chunk at all and could keep everything in context.

I think this much is commonly known, though, as far as I know, not implemented.

Here is my novel idea. I mentioned earlier that chunk size is limited to the context window of an LLM. When deep understanding of a concept is required and chunking would lose crucial insights, should we instead fine-tune a new small model? This would allow storing bigger concepts that cannot be chunked. You might think this is excessive, but it can be done within an hour or two on small models. Sure, the model you fine-tune loses some of its general knowledge to store your new information, but I think that tradeoff can be worth it. We can now break our knowledge into chunks in a better way, retaining bigger sizes at the cost of more time and energy. Bigger chunks of knowledge allow for deeper understanding within each chunk.
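As a rough sketch of what "fine-tune a small model on one concept" might look like, here is a standard causal-language-model fine-tuning loop using Hugging Face transformers. The base model, chunk size and hyperparameters are placeholder choices, and a real setup would need more care (data formatting, evaluation, catastrophic-forgetting mitigation).

```python
# Sketch: bake one concept's full text into the weights of a small model
# instead of keeping it as context. Illustrative hyperparameters only.

from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

def finetune_on_concept(concept_text: str, base_model: str = "gpt2"):
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained(base_model)

    # Split the concept text into fixed-size token windows for LM training.
    ids = tokenizer(concept_text)["input_ids"]
    window = 512
    chunks = [ids[i:i + window] for i in range(0, len(ids), window)]
    dataset = Dataset.from_dict({"input_ids": chunks})

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="concept_model",
                               num_train_epochs=3,
                               per_device_train_batch_size=1),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                                      mlm=False),
    )
    trainer.train()
    return model, tokenizer
```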

I also want to touch on a crucial parameter: exactly what to store in the main orchestrator's context window. We must be very judicious in using this precious space, since during each contemplation iteration it should know about its progress and the high-level concepts. The degree to which it can make real progress is limited by its context.
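For illustration, the orchestrator's per-iteration context might be assembled from just three pieces: the goal, a compressed progress summary, and the concept index, never the concepts' full contents. The template below is a made-up example of that layout, not a prescribed format.

```python
# Illustrative layout of the orchestrator's per-iteration context: goal,
# compressed progress, and the concept index only; full concept text stays out.

ORCHESTRATOR_TEMPLATE = """\
Research goal:
{goal}

Progress so far (re-summarised each iteration to stay small):
{progress}

Known concepts (name: one-line summary; query a child model for depth):
{concept_index}

Decide the next step: search the web, run code, query a concept, or think.
"""

def build_orchestrator_context(goal: str, progress: str, concept_index: str) -> str:
    return ORCHESTRATOR_TEMPLATE.format(
        goal=goal, progress=progress, concept_index=concept_index
    )
```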

To summarise: my main theory is that we can get value out of abstracting concepts. Keep only the existence of a concept and delegate deep understanding of it to another entity, which can be either context fed to another LLM or a fine-tuned LLM itself.

With this approach we could send many LLMs off on their own paths, leave them unattended for a few months, and maybe something clicks?

One more thing to refine about this concept: could a fine-tuned child LLM recursively create more children of its own when it makes sense to do so?

I'm aware this sounds like an eccentric blob of text, but I would love some feedback anyway. Is this something obvious that people are already working on? Or is there an obvious limitation that makes such an idea infeasible?