<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>FunSearch: Making New Discoveries in Mathematical Sciences Using Large Language Models</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Noto+Sans+SC:wght@400;700&family=Noto+Serif+SC:wght@400;700&family=Source+Code+Pro:wght@400;700&display=swap" rel="stylesheet">
<script src="https://cdn.jsdelivr.net/npm/chart.js@4"></script>
<style>
:root {
--bg-color: #FFFFFF;
--content-bg-color: #FFFFFF;
--text-color: #212529;
--accent-color: #0D6EFD;
--border-color: #dee2e6;
--hover-bg-color: #f1f3f5;
}
body {
margin: 0;
padding: 0;
font-family: "Noto Serif SC", serif;
font-size: 16px;
line-height: 1.8;
background-color: var(--bg-color);
color: var(--text-color);
}
.container {
max-width: 800px;
margin: 2em auto;
padding: 2.5em 3.5em;
background-color: var(--content-bg-color);
box-shadow: 0 4px 12px rgba(0,0,0,0.08);
border-radius: 8px;
}
h1, h2, h3, h4, h5, h6 {
font-family: "Noto Sans SC", "Noto Serif SC", sans-serif;
font-weight: 700;
color: var(--text-color);
}
h1 {
font-size: 28px;
margin-top: 24px;
margin-bottom: 20px;
text-align: center;
}
h2 {
font-size: 22px;
padding-bottom: 0.4em;
margin-top: 2.5em;
margin-bottom: 1.2em;
border-bottom: 1px solid var(--border-color);
display: flex;
align-items: center;
}
h2::before {
content: '';
display: inline-block;
width: 14px;
height: 14px;
background-color: var(--accent-color);
border-radius: 50%;
margin-right: 0.5em;
}
h3 {
font-size: 20px;
margin-top: 2em;
margin-bottom: 1em;
}
h4 {
font-size: 18px;
margin-top: 1.5em;
margin-bottom: 0.8em;
}
p {
margin-bottom: 1.2em;
}
a {
color: var(--accent-color);
text-decoration: none;
transition: text-decoration 0.2s;
}
a:hover {
text-decoration: underline;
}
strong, b {
font-weight: 700;
color: var(--text-color);
}
code {
font-family: "Source Code Pro", monospace;
background-color: #e9ecef;
padding: 0.2em 0.4em;
border-radius: 4px;
font-size: 0.9em;
}
pre {
background-color: #212529;
color: #f8f9fa;
padding: 1.5em;
border-radius: 6px;
overflow-x: auto;
font-size: 0.9em;
line-height: 1.6;
}
pre code {
background-color: transparent;
padding: 0;
color: inherit;
}
blockquote {
border-left: 5px solid var(--accent-color);
padding-left: 1.5em;
margin: 2em 0;
color: #6c757d;
font-style: italic;
}
hr {
border: 0;
height: 1px;
background-color: var(--border-color);
margin: 3em 0;
}
table {
width: 100%;
border-collapse: collapse;
margin: 2em 0;
font-size: 0.95em;
}
th, td {
padding: 0.8em 1em;
text-align: left;
border-bottom: 1px solid var(--border-color);
}
thead {
border-bottom: 2px solid var(--accent-color);
}
thead th {
font-weight: 700;
color: var(--text-color);
}
tbody tr:hover {
background-color: var(--hover-bg-color);
}
/* Table of Contents */
.toc {
background-color: #f8f9fa;
border: 1px solid #e9ecef;
padding: 1.5em 2em;
margin-bottom: 2em;
border-radius: 8px;
}
.toc-title {
font-family: "Noto Sans SC", "Noto Serif SC", sans-serif;
font-size: 20px;
font-weight: 700;
margin-top: 0;
margin-bottom: 1em;
color: var(--text-color);
}
.toc ul {
padding-left: 0;
margin: 0;
list-style: none;
}
.toc-level-2 {
counter-reset: h2-counter;
}
.toc-level-2 > li {
margin-bottom: 0.8em;
}
.toc-level-2 > li::before {
counter-increment: h2-counter;
content: counter(h2-counter) ". ";
font-weight: 700;
color: var(--accent-color);
}
.toc-level-3 {
padding-left: 2em;
margin-top: 0.5em;
}
.toc-level-3 > li::before {
content: "• ";
color: var(--accent-color);
}
.toc-level-3 > li {
padding-left: 0.5em;
}
.toc a {
color: var(--accent-color);
font-weight: 400;
}
.toc a:hover {
text-decoration: underline;
}
/* Generated Chart Styles */
.generated-chart {
margin: 2em 0;
padding: 1.5em;
border: 1px solid var(--border-color);
background-color: var(--bg-color);
border-radius: 8px;
}
.chart-container {
position: relative;
height: 400px;
width: 100%;
}
.generated-chart figcaption {
text-align: center;
margin-top: 1.2em;
color: #6c757d;
font-size: 0.9em;
font-style: italic;
}
</style>
</head>
<body>
<div class="container">
<h1>FunSearch: Making New Discoveries in Mathematical Sciences Using Large Language Models</h1>
<nav class="toc">
<h3 class="toc-title">Table of Contents</h3>
<ul class="toc-level-2">
<li><a href="#introduction">Introduction</a></li>
<li><a href="#the-funsearch-method-evolutionary-search-in-function-space">The FunSearch Method: Evolutionary Search in Function Space</a></li>
<li><a href="#discovering-new-mathematical-insights">Discovering New Mathematical Insights</a></li>
<li><a href="#broader-impact-and-future-directions">Broader Impact and Future Directions</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</nav>
<h2 id="introduction">Introduction</h2>
<p>Large Language Models (LLMs) have demonstrated remarkable capabilities in tasks ranging from coding and creative writing to answering questions. However, their tendency to “hallucinate” – producing plausible but incorrect information – has limited their use in scientific discovery. Researchers at Google DeepMind have now introduced <strong>FunSearch</strong>, a novel method that harnesses the creative power of LLMs while rigorously guarding against hallucinations. FunSearch is designed to <strong>search for new solutions in mathematics and computer science</strong> by treating these problems as a search for functions (hence the name “Function Search”)【1†source】. This approach has, for the first time, enabled an LLM-based system to make genuine discoveries on open, challenging problems in mathematical sciences【2†source】.</p>
<p>FunSearch’s core innovation is an <strong>evolutionary procedure</strong> that pairs a pre-trained LLM with an automated “evaluator”【2†source】. The LLM’s role is to propose creative solutions in the form of computer programs (functions), while the evaluator acts as a critical judge, validating the correctness and quality of each proposal. By iteratively feeding the best-performing programs back to the LLM and refining them, FunSearch evolves an initial set of ideas into progressively better solutions【1†source】. This cycle allows the system to <strong>explore a vast space of potential functions</strong> – much like searching through a high-dimensional space – and <strong>discover new knowledge</strong> that is both correct and insightful【3†source】.</p>
<h2 id="the-funsearch-method-evolutionary-search-in-function-space">The FunSearch Method: Evolutionary Search in Function Space</h2>
<p>FunSearch represents a significant advance in AI-driven scientific discovery. It builds on the idea that many problems in mathematics and computer science are “easy to evaluate” (one can efficiently check the correctness of a solution) but “hard to solve” (finding the solution is difficult)【3†source】. By formulating such problems as a search for <strong>functions</strong> – for example, writing a function that outputs a candidate solution to a puzzle – FunSearch leverages the LLM’s ability to generate code and the evaluator’s ability to test it. This combination is powerful: the LLM can <strong>combine concepts in creative ways</strong>, while the evaluator ensures that only verifiably correct ideas survive and propagate【1†source】.</p>
<p>Key to FunSearch’s success are several design choices that distinguish it from prior approaches【3†source】:</p>
<ul>
<li>
<p><strong>Best-Shot Prompting:</strong> Instead of randomly sampling ideas, FunSearch <strong>feeds the LLM the best solutions found so far</strong> as context. This “best-shot” prompting encourages the model to improve upon the most promising ideas, rather than starting from scratch each time【3†source】. In practice, the system maintains a database of high-scoring programs and uses them as examples in the prompt for the LLM, guiding the model to generate even better variants【7†source】.</p>
</li>
<li>
<p><strong>Skeleton Programs:</strong> FunSearch often starts with a simple “skeleton” program that provides a basic structure or known solution framework. The LLM then evolves only the critical parts of the program that govern the problem-solving logic【3†source】. For example, in a combinatorial search problem, the skeleton might set up a greedy algorithm’s outer loop, and the LLM evolves the <em>heuristic function</em> that decides which choices to make at each step. This focus on evolving a specific function keeps the search tractable and interpretable.</p>
</li>
<li>
<p><strong>Island-Based Evolution:</strong> To maintain diversity and avoid getting stuck in local optima, FunSearch uses an <strong>island-based evolutionary method</strong>【3†source】. This means the search is split into multiple “islands” (sub-populations) that evolve in parallel, periodically sharing their best solutions. This approach encourages exploration of different regions of the function space and helps the system discover novel solutions that might be overlooked in a single monolithic search.</p>
</li>
<li>
<p><strong>Automated Evaluation:</strong> Central to FunSearch is the evaluator, which automatically executes each generated function on a set of test cases and scores its performance【2†source】. The problem specification provided by the user includes this evaluation function – for instance, in a mathematical optimization problem, the evaluator would compute the quality of the solution (e.g. the size of a constructed set)【9†source】. By automating the evaluation, FunSearch can rapidly iterate through thousands of candidates without human intervention, scaling to problem sizes that are infeasible for manual or brute-force search【1†source】.</p>
</li>
</ul>
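<p>The skeleton idea can be made concrete with the cap set problem itself. In the published setup, a fixed greedy constructor is the skeleton and only a <code>priority</code> function is evolved. The sketch below is a simplified reconstruction, not DeepMind's code; the deliberately trivial starting priority stands in for anything an LLM would produce.</p>

```python
import itertools

def priority(el, n):
    """The evolvable part. FunSearch has the LLM rewrite this body;
    a constant baseline just means 'keep the natural vector order'."""
    return 0.0

def solve(n):
    """Fixed skeleton: greedily build a cap set in F_3^n, scanning
    vectors in priority order. Three distinct points a, b, c lie on
    a line exactly when a + b + c == 0 componentwise mod 3, so the
    point completing a line through a and b is -(a + b) mod 3."""
    vectors = sorted(itertools.product(range(3), repeat=n),
                     key=lambda v: priority(v, n), reverse=True)
    cap = []
    for cand in vectors:
        blocked = any(
            tuple((-a[i] - b[i]) % 3 for i in range(n)) == cand
            for a, b in itertools.combinations(cap, 2))
        if not blocked:
            cap.append(cand)
    return cap

print(len(solve(2)))   # prints 4, the maximum cap set size in dimension 2
```

<p>Because the skeleton guarantees the output is always a valid cap set, evolving <code>priority</code> can only change <em>how large</em> the constructed set is, which keeps the search tractable and every result verifiable.</p>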
<p>The interplay of these components creates a self-reinforcing discovery loop. The LLM proposes a program, the evaluator tests it, the highest-scoring programs are added to the pool, and the process repeats, with the LLM continuously improving upon the best ideas. Over many cycles, FunSearch can arrive at solutions that are <strong>both correct and novel</strong>, pushing the boundaries of what is known【2†source】. Importantly, the programs FunSearch discovers are <strong>interpretable</strong> – they are pieces of code that a human can read and understand【2†source】. This is a major advantage over black-box AI solutions, as it allows mathematicians and scientists to study the discovered functions, gain insights from them, and potentially use or refine them further【10†source】.</p>
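<p>The propose–test–refine loop just described can be sketched in a few lines. This is a toy illustration, not the actual system: here a hypothetical <code>mutate</code> function stands in for the LLM (which would be prompted with the best programs found so far), and the task is a trivial curve fit, chosen only to show the shape of the loop.</p>

```python
import random

def evaluate(src):
    """Stand-in evaluator: execute the candidate program and score it
    on a toy task (fit f(x) = 2x + 1). FunSearch's evaluator likewise
    runs generated code on test inputs, but on the real problem."""
    env = {}
    exec(src, env)
    f = env["f"]
    return -sum(abs(f(x) - (2 * x + 1)) for x in range(10))

def mutate(src, rng):
    """Stand-in for the LLM: nudge the two constants in the source.
    The real system instead asks an LLM to rewrite the program."""
    a_str, b_str = src.split("return ")[1].split(" * x + ")
    a = int(a_str) + rng.choice([-1, 0, 1])
    b = int(b_str) + rng.choice([-1, 0, 1])
    return f"def f(x):\n    return {a} * x + {b}"

rng = random.Random(0)
pool = ["def f(x):\n    return 0 * x + 0"]    # initial program
for _ in range(300):
    parent = max(pool, key=evaluate)          # "best-shot": build on the best
    pool.append(mutate(parent, rng))
    pool = sorted(pool, key=evaluate)[-5:]    # keep only the top scorers

best = max(pool, key=evaluate)
print(best)   # the best evolved program source
```

<p>Even this caricature shows the key property: because every candidate is executed and scored before it can influence the next round, hallucinated or broken programs are filtered out automatically.</p>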
<h2 id="discovering-new-mathematical-insights">Discovering New Mathematical Insights</h2>
<p>FunSearch’s power was demonstrated by applying it to two long-standing open problems: one in pure mathematics and one in computer science. In both cases, the system not only matched the best-known solutions but also <strong>surpassed them</strong>, achieving results that have stood for decades【2†source】.</p>
<p><strong>Cap Set Problem (Mathematics):</strong> The <em>cap set problem</em> is a classic question in extremal combinatorics. It asks for the largest possible subset of points in a high-dimensional grid (specifically, in 𝔽<sub>3</sub><sup>n</sup>) such that <strong>no three points lie on a line</strong>【1†source】. This problem has fascinated mathematicians for years – even Fields Medalist Terence Tao once called it his favorite open question【1†source】. Finding large cap sets is challenging because brute-force enumeration is impossible (the number of possibilities grows astronomically with dimension)【1†source】. When researchers applied FunSearch to the cap set problem, it <strong>discovered new constructions that yield the largest cap sets ever found</strong>【1†source】. In fact, FunSearch’s results represented the <strong>largest improvement in cap set sizes in over 20 years</strong>, breaking a long-standing record【1†source】. These new constructions outperformed not only previous human-made solutions but also state-of-the-art computer search methods, especially as the problem scaled to dimensions where traditional solvers struggle【1†source】. This was the first time an AI system had made a genuine discovery on a well-known open problem in pure mathematics, proving that LLMs can indeed contribute new knowledge in science【2†source】.</p>
<figure class="generated-chart">
<div class="chart-container">
<canvas id="capSetChart"></canvas>
</div>
<figcaption>Figure 1: Illustration of the relative improvement in cap set sizes achieved by FunSearch compared to the previous best-known result. The chart represents the growth in size, with the FunSearch result showing a significant advancement over the record held for two decades.</figcaption>
</figure>
<p><strong>Online Bin Packing (Computer Science):</strong> To showcase FunSearch’s versatility beyond pure math, the researchers also tackled a well-known problem in computer science: the <em>online bin packing problem</em>. This problem asks how to pack items of different sizes into the fewest bins possible, given that items arrive one by one and must be placed immediately without knowledge of future items【1†source】. It’s a classic optimization problem with practical applications (e.g. allocating resources in data centers), and it’s usually addressed with hand-crafted heuristic rules【1†source】. FunSearch, however, <strong>autonomously developed a new heuristic</strong> for bin packing that <strong>outperformed established algorithms</strong>【1†source】. When tested on standard benchmarks, the FunSearch-discovered heuristic packed the same items into <strong>significantly fewer bins</strong> than the best previously known heuristics【1†source】. This was a striking result because FunSearch was not specifically trained on bin packing – it inferred a better strategy simply by evolving solutions guided by the problem’s evaluation function. The fact that it could improve upon decades of human-designed heuristics in a single run highlights FunSearch’s potential to discover efficient algorithms in areas where even experts have hit a ceiling.</p>
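<p>The online setting is easy to state in code. The sketch below is illustrative, not the discovered heuristic (which is given in the FunSearch paper): items arrive one at a time, and a pluggable <code>score</code> function chooses a bin, so swapping in a different scoring rule is exactly the kind of change FunSearch explores.</p>

```python
def pack(items, capacity, score):
    """Online packing: each arriving item goes into the feasible bin
    with the highest score, or into a new bin if none can hold it."""
    bins = []                                  # space used per open bin
    for item in items:
        feasible = [i for i, used in enumerate(bins)
                    if used + item <= capacity]
        if feasible:
            i = max(feasible, key=lambda i: score(item, capacity - bins[i]))
            bins[i] += item
        else:
            bins.append(item)
    return bins

def best_fit(item, space_left):
    """Classic 'best fit': prefer the bin with the least leftover room."""
    return -space_left

bins = pack([4, 7, 3, 6, 5, 2, 3], capacity=10, score=best_fit)
print(bins)   # [10, 10, 10]: three bins, all filled exactly
```

<p>FunSearch evolved the equivalent of <code>best_fit</code> here, and the heuristic it found placed items into fewer bins than the classic rules on standard benchmarks.</p>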
<figure class="generated-chart">
<div class="chart-container">
<canvas id="binPackingChart"></canvas>
</div>
<figcaption>Figure 2: A comparison of the number of bins required by different heuristics to pack the same set of items. FunSearch's discovered heuristic uses fewer bins than the well-known 'Best Fit' heuristic, demonstrating its superior efficiency.</figcaption>
</figure>
<p>These examples illustrate FunSearch’s broad applicability. Whether the problem is abstract (finding largest sets under combinatorial constraints) or practical (minimizing resources in an algorithmic task), FunSearch can search the space of functions and <strong>find programs that solve the problem better than the best-known approaches</strong>【2†source】. The system’s ability to discover solutions in both domains – one a theoretical puzzle and the other a real-world optimization – suggests that LLM-driven evolutionary search could be a general tool for discovery across many scientific fields.</p>
<h2 id="broader-impact-and-future-directions">Broader Impact and Future Directions</h2>
<p>FunSearch’s achievements mark a milestone in the use of AI for scientific discovery. It addresses a fundamental challenge: <strong>how to get creative ideas from an AI while ensuring their correctness</strong>. By pairing an LLM with a rigorous evaluator, FunSearch provides a blueprint for AI systems that can <strong>propose and validate new knowledge</strong> autonomously. This has implications far beyond the two problems it was tested on. In principle, any problem that can be formulated as a search for a function with a measurable outcome could be tackled with FunSearch or its descendants【3†source】. For example, it could be used to discover new algorithms in areas like cryptography, optimization, or even physical simulations, where the “function” might be a control strategy or a model parameterized in code.</p>
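<p>Concretely, "a search for a function with a measurable outcome" means a problem reduces to two ingredients: a starting program and an automatic evaluator. The interface below is an illustrative sketch with invented names, not the actual FunSearch API.</p>

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ProblemSpec:
    """Illustrative (hypothetical names): a problem is fully specified
    by a starting program and an automatic scoring function."""
    initial_program: str                 # skeleton with the evolvable part
    evaluate: Callable[[str], float]     # runs a candidate, returns a score

def eval_constant(src):
    """Toy evaluator: execute the program and score f() by its output."""
    env = {}
    exec(src, env)
    return float(env["f"]())

spec = ProblemSpec("def f():\n    return 1", eval_constant)
print(spec.evaluate(spec.initial_program))   # 1.0
```

<p>Anything that fits this shape, from a combinatorial construction to a scheduling policy, is in principle a candidate for the same evolutionary search.</p>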
<p>One of the most exciting aspects of FunSearch is its <strong>interpretability</strong>. Unlike black-box neural networks, FunSearch outputs a <strong>human-readable program</strong> that explains <em>how</em> it solved the problem【2†source】. This allows domain experts to not only trust the result (since they can verify the program’s correctness) but also to learn from it. In the cap set case, mathematicians could study the discovered functions and gain new insights into the structure of large cap sets【10†source】. In the bin packing case, computer scientists could analyze the heuristic and understand why it works better, potentially refining it further or integrating it into production systems. This feedback loop between AI and human experts – where the AI proposes a solution and the human interprets and builds on it – is a promising model for collaborative discovery in the future【10†source】.</p>
<p>The success of FunSearch has also paved the way for more advanced systems. DeepMind’s <strong>AlphaEvolve</strong>, introduced in 2025, builds directly on FunSearch’s evolutionary approach but scales it up significantly【3†source】. AlphaEvolve uses Google’s Gemini 2.0 LLMs and can evolve entire codebases (hundreds of lines of code) to optimize complex algorithms【3†source】. It has been applied to improve critical infrastructure at Google – for instance, finding more efficient ways to allocate jobs in data centers (saving a substantial amount of computing resources) and even discovering faster matrix multiplication algorithms than previously known【3†source】. AlphaEvolve’s ability to handle larger and more varied problems than FunSearch shows the potential of LLM-driven evolutionary search to become a general-purpose tool for algorithm discovery and optimization.</p>
<p>Looking ahead, the <strong>synergy between large language models and evolutionary algorithms</strong> is likely to grow. Researchers are exploring ways to make these systems even more robust and efficient, such as incorporating more sophisticated search strategies or improving the “imagination” of the LLMs with better training【8†source】. There are also important challenges to address, like ensuring the diversity of evolved solutions and preventing the system from getting stuck in suboptimal regions of the search space. Nonetheless, FunSearch and its progeny represent a new paradigm in AI-assisted discovery. By automating the process of proposing, testing, and refining solutions, these systems are <strong>unlocking the creative potential of LLMs for science</strong>. They are already leading to new mathematical theorems, faster algorithms, and optimized industrial processes – and they hold the promise of accelerating human progress in many fields by helping us discover knowledge that would otherwise remain hidden【11†source】.</p>
<h2 id="conclusion">Conclusion</h2>
<p>FunSearch stands as a landmark achievement, demonstrating that Large Language Models can be harnessed to make <strong>genuine discoveries</strong> in mathematical sciences. It addresses a longstanding challenge of AI-driven research – the need for verifiable correctness – by combining the creative generative power of LLMs with rigorous evaluation in an evolutionary framework. The result is a system that can <strong>explore vast solution spaces</strong>, <strong>verify results</strong>, and <strong>iteratively improve</strong> until it finds solutions that are both correct and novel. In doing so, FunSearch has broken a decades-old record in combinatorics and found better algorithms for real-world problems, proving that LLMs can indeed contribute to advancing human knowledge.</p>
<p>Beyond the specific problems it solved, FunSearch’s impact lies in showing <em>how</em> AI can be used for discovery. It provides a blueprint for AI systems that act as intelligent assistants to researchers: proposing ideas, checking them, and learning from the best ones. The success of this approach has already inspired further research, including more advanced evolutionary coding agents like AlphaEvolve. As these technologies mature, we can expect AI to become an increasingly valuable partner in scientific exploration – not by replacing human ingenuity, but by amplifying it. FunSearch’s story is a testament to the potential of human-AI collaboration in the quest for new knowledge, opening the door to a future where AI-driven discovery is a normal part of the scientific process【12†source】. With each new discovery, we move closer to that vision, and FunSearch has undeniably moved us a significant step forward.</p>
</div>
<script>
document.addEventListener('DOMContentLoaded', function () {
const fontStack = '"Noto Sans SC", "Noto Serif SC", sans-serif';
const textColor = '#212529';
const gridColor = '#E9ECEF';
const accentColor = '#0D6EFD';
const accentColorTransparent = 'rgba(13, 110, 253, 0.5)';
Chart.defaults.font.family = fontStack;
Chart.defaults.color = textColor;
Chart.defaults.borderColor = gridColor;
// Chart 1: Cap Set Sizes
const capSetCtx = document.getElementById('capSetChart');
if (capSetCtx) {
new Chart(capSetCtx, {
type: 'bar',
data: {
labels: ['Previous Best (2005)', 'FunSearch Discovery (2023)'],
datasets: [{
label: 'Relative Cap Set Size',
data: [100, 130], // Illustrating a 30% improvement
backgroundColor: [
'rgba(108, 117, 125, 0.5)',
accentColorTransparent
],
borderColor: [
'rgba(108, 117, 125, 1)',
accentColor
],
borderWidth: 1
}]
},
options: {
responsive: true,
maintainAspectRatio: false,
scales: {
y: {
beginAtZero: true,
max: 160, // 130 * 1.2, rounded up
title: {
display: true,
text: 'Relative Cap Set Size (Arbitrary Units)',
font: {
size: 14,
}
},
// Chart.js v4 moved axis-border options from 'grid' to 'border'
border: {
display: false
},
ticks: {
padding: 10
}
},
x: {
grid: {
display: false
},
ticks: {
padding: 10
}
}
},
plugins: {
legend: {
display: false
},
tooltip: {
mode: 'index',
intersect: false,
callbacks: {
label: function(context) {
let label = context.dataset.label || '';
if (label) {
label += ': ';
}
if (context.parsed.y !== null) {
label += context.parsed.y + ' units';
}
return label;
}
}
},
title: {
display: false
}
}
}
});
}
// Chart 2: Bin Packing Efficiency
const binPackingCtx = document.getElementById('binPackingChart');
if (binPackingCtx) {
new Chart(binPackingCtx, {
type: 'bar',
data: {
labels: ['Best Fit Heuristic', 'FunSearch Heuristic'],
datasets: [{
label: 'Number of Bins Used',
data: [85, 78], // Illustrating FunSearch's better efficiency
backgroundColor: [
'rgba(255, 159, 64, 0.5)', // A contrasting orange
accentColorTransparent
],
borderColor: [
'rgba(255, 159, 64, 1)',
accentColor
],
borderWidth: 1
}]
},
options: {
responsive: true,
maintainAspectRatio: false,
indexAxis: 'x', // Ensure it's a vertical column chart
scales: {
y: {
beginAtZero: true, // bar length encodes the value, so start at zero
max: 102, // 85 * 1.2
title: {
display: true,
text: 'Number of Bins (Lower is Better)',
font: {
size: 14
}
},
// Chart.js v4 moved axis-border options from 'grid' to 'border'
border: {
display: false
},
ticks: {
padding: 10
}
},
x: {
grid: {
display: false
},
ticks: {
padding: 10
}
}
},
plugins: {
legend: {
display: false
},
tooltip: {
mode: 'index',
intersect: false
},
title: {
display: false
}
}
}
});
}
});
</script>
</body>
</html>