Mathematicians interact with AI, July 2025 update

July 16, 2025


This is a guest post from Aravind Asok. If you have comments, you can contact him at [email protected]. We’ll see if there’s a way to post moderated comments here later.

Recently, several symposia have been organized in which groups of mathematicians interacted with developers of various AI systems (specifically, reasoning models) in a structured way. We have in mind the Frontier Math Symposium hosted by Epoch AI and the DeepMind/IAS workshop. The first of these events received more press coverage than the second, spawning several articles, including pieces in Scientific American and the Financial Times, though both are currently behind a paywall. Curiously absent from these discussions is any considered opinion of mathematicians regarding these interactions, though hyperbolic quotes from these pieces have made the rounds on social media. Neither event was open to the public: participation was limited and by invitation, and in both cases the goal was to foster transparent and unguarded interactions.

For context, note that many mathematicians have spent time interacting with reasoning models (OpenAI’s ChatGPT, Google’s Gemini, and Anthropic’s Claude, among others). While mathematicians were certainly not exempt from the wave of early prompt-based experimentation with the initial public models of ChatGPT, they have also explored the behavior of reasoning models on professional aspects of mathematics, testing the models on research mathematics, homework problems, example problems for various classes, and mathematics competition problems. Anecdotally, reactions run the gamut from dismissal to surprise. However, a structured group interaction with reasoning models provides a qualitatively different experience than these personal explorations. Since invitation to these events was controlled, their audience was necessarily limited; the Epoch event self-selected for those who expressed specific interest in AI, while the IAS/DeepMind event tried to generate a more random cross-section of mathematicians.

Much press coverage has a breathless feel, e.g., the coverage of comments by Sam Altman in Fortune. It seems fair to say that mathematicians are impressed with the current performance of models and, furthermore, see interesting avenues for augmenting mathematical research with AI tools. However, many mathematicians view the rhetoric that “math can be solved”, extrapolating from progress on competition-style mathematics viewed as a game, as problematic at best, and at worst as reflecting a fundamental misunderstanding of the goals of research mathematics as a whole.

Our discussion here will focus on the Epoch AI-sponsored meeting for concreteness, which was not “secret” in any dramatic or clandestine sense, contrary to some reports. The backstory: Epoch AI has been trying to create benchmarks for the performance of various released LLMs (i.e., chatbots like OpenAI’s ChatGPT, Anthropic’s Claude, Google DeepMind’s Gemini, etc.). Frontier Math is a benchmark designed to evaluate the mathematical capabilities of reasoning models. The benchmark consists of tiered lists of problems. Tier 1 problems amount to “mathematical olympiad” level problems, while Tiers 2 and 3 are “more challenging”, requiring “specialized knowledge at the graduate level.” Frontier Math sought to build a Tier 4 benchmark of “research level” problems.

Building the Tier 4 benchmark necessitated involving research mathematicians. Earlier this year, Epoch reached out to mathematicians through various channels. Initial requests promised some amount of money for delivering a problem of a particular type, but many mathematicians unfamiliar with the source of the communication either dismissed it as not credible or had no interest in the monetary compensation. To speed up the collection of Tier 4 problems, Epoch came up with the idea of hosting a symposium. The symposium was advertised on several social media outlets (e.g., Twitter), and various mathematicians were contacted directly by e-mail. Interested participants were sometimes asked to interview with Frontier Math lead mathematician Elliot Glazer and to produce a prospective problem. Mathematics is a fairly small community, so many of the people who attended already knew others who were attending; the vast majority of attendees also came from California. Participants did sign a non-disclosure agreement, but it was limited to information related to the problems that were to be delivered. Symposium participants also had their travel and lodging covered and were paid a $1,500 stipend for their participation.

Participants were given a list of criteria for problem construction; problems must:

  1. Have a definite, verifiable answer (e.g., a large integer, a symbolic real, or a tuple of such objects) that can be checked computationally.
  2. Resist guesswork: Answers should be “guessproof,” meaning random attempts or trivial brute-force approaches have a negligible chance of success. You should be confident that a person or AI who has found the answer has legitimately reasoned through the underlying mathematics.
  3. Be computationally tractable: The solution of a computationally intensive problem must include scripts demonstrating how to find the answer, starting only from standard knowledge of the field. These scripts must cumulatively run in less than an hour on standard hardware. (A minimal sketch of what such computational checking might look like follows this list.)

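To make the first and third criteria concrete, here is a minimal, hypothetical sketch of what “checked computationally” can amount to: the problem author supplies a script that derives the answer from scratch, and grading reduces to an exact comparison against a submitted value. The toy problem, function names, and grading interface below are invented for illustration; this is not Epoch AI’s actual harness.

```python
# Hypothetical sketch of a Frontier Math-style verification script.
# The "problem" here (sum of Euler's totient up to a bound) is a toy
# stand-in; real Tier 4 problems are designed to be far harder to
# reverse-engineer, but the checking step is just an exact comparison.

def reference_answer(limit: int = 10**5) -> int:
    # Author's solution script: a sieve computing sum_{n <= limit} phi(n).
    # Per the criteria, it must finish in well under an hour on a laptop.
    phi = list(range(limit + 1))
    for p in range(2, limit + 1):
        if phi[p] == p:  # p is prime, so apply the factor (1 - 1/p)
            for k in range(p, limit + 1, p):
                phi[k] -= phi[k] // p
    return sum(phi[1:])

def check(submitted: int) -> bool:
    # Exact match, no partial credit: a large-integer answer of this kind
    # is what makes a problem "guessproof" in the benchmark's sense.
    return submitted == reference_answer()

if __name__ == "__main__":
    ans = reference_answer()
    print(ans, check(ans))
```
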
The participants were divided into groups by field (number theory, analysis, algebraic geometry, topology/geometry, and combinatorics) and told to produce suitable problems.

How did participants contextualize this challenge? In mathematics research one frequently does not know in advance the solution to a given problem, nor whether the problem is computationally tractable. In fact, many mathematicians will agree that knowing a problem is soluble can be game-changing. Moreover, deciding which problems should be deemed worthy of study can be difficult. As a consequence, by and large, participants did not frame the challenge as one of producing research problems, but rather one of simply producing appropriate problems.

Unsurprisingly, the ability to construct such problems varied from subject to subject. For example, one geometer said it was quite difficult to construct “interesting” problems subject to the constraints. There are also real questions about the extent to which “ability to resist guesswork” truly measures “mathematical understanding”. Many participants were rather open about this: even if AI managed to solve the problems they created, they did not feel that would constitute “understanding” in any real sense.

While most participants had written and submitted problems before the symposium started, few had any idea at that point of what would be “easy” or “hard” for a model. Most of the first day was spent seeing how models interacted with these preliminary problems, and the ensuing discussions refined participants’ understanding of the stipulation that problems be resistant to guesswork. Along the way, models did manage to “solve” some of the problems, but that statement deserves qualification and a more detailed understanding of what constitutes a “solution”.

One key feature of the reasoning models was their explicit display of “reasoning traces”, showing the models “thinking”. These traces showed the models searching the web and identifying related papers, but their ability to do so was sensitive to the formulation of the problem in fascinating ways. For example, in algebraic geometry, formulating a problem in terms of commutative ring theory rather than varieties could elicit different responses from a model; yet it is a cornerstone of human algebraic geometry to pass back and forth between the two points of view with relative ease. In geometry/topology, participants noted that models demonstrated no aptitude for geometric reasoning. For example, models could not create simple pictorial models (knot diagrams were specifically mentioned) for problems and manipulate them. In algebraic and enumerative combinatorics, models applied standard methods well (e.g., solving linear recurrences, appealing to binomial identities), but if problems required several steps as well as ingenuity, the models were stymied, even when prompted with relevant literature or correct initial steps.

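For calibration, “solving a linear recurrence” is the kind of routine, closed-form computation that a computer algebra system handles in a few lines. The example below is a small illustrative SymPy sketch (the Fibonacci recurrence, chosen purely for familiarity and not drawn from the benchmark) of the sort of “standard method” the models reportedly applied competently.

```python
# Illustrative only: closed form for the Fibonacci recurrence via SymPy's
# rsolve, a routine "standard method" for linear recurrences.
from sympy import Function, rsolve, symbols

n = symbols("n", integer=True)
y = Function("y")

# Solve y(n+2) = y(n+1) + y(n) with y(0) = 0, y(1) = 1 (Binet's formula).
closed_form = rsolve(y(n + 2) - y(n + 1) - y(n), y(n), {y(0): 0, y(1): 1})
print(closed_form)
```
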
When a model did output a correct answer, examining the reasoning traces sometimes indicated that this happened because the problem was constructed in such a way that the answer could be obtained by solving a much simpler but related problem. In terms of the exam-solution paradigm, we would probably say such a response was “getting the right answer for the wrong reason” and assign a failing grade to such a solution!

Participants were routinely told to aim to craft problems that even putative future reasoning models would find difficult. From that standpoint, it was easy to extrapolate that a future model might behave in a more human way, demonstrate “understanding” in a human sense, and isolate the missing key ingredient. This created a pervasive fear that, if the reasoning traces indicated a model was “close now”, the problem would be solvable by future models. Participants did observe that if the literature in a particular domain was suitably saturated, the models could identify appropriate lemmas and generate relevant mathematics. This was certainly impressive, but one wonders to what extent the natural-language output affects the perceived coherence of responses: it is easy for things to “look about right” if one does not read too closely! Eventually, participants did converge on problems thought to meet the required bar.

The language models that we worked with were definitely good at keyword search, routinely generating useful lists of references. The models also excelled at natural-language text generation and could generate non-trivial code, which made them useful for producing examples. However, press reporting sometimes exaggerated this, suggesting that reasoning models are “faster” or “better” than professional mathematicians. Of course, such statements are very open to interpretation. On the one hand, this could be trivially true; e.g., calculators are routinely faster than professional mathematicians at adding numbers. Less trivially, it could mean automating complicated algebraic computations, but even this would be viewed by most mathematicians as far from the core of mathematical discovery.

The participants at the meeting form a rather thin cross-section of mathematicians, those with some interest in the interface between AI (broadly construed) and mathematics. The symposium Signal chat became very active after the Scientific American article was posted. Undoubtedly, participants felt there were exciting possible uses of AI for the development of mathematics. There are also real questions about whether or when future “reasoning models” will approach “human-level” competence, as well as serious and fascinating philosophical questions about what that even means; this is a direct challenge for the mathematics community. What does it mean to competently do research mathematics? What is valuable or important mathematics?

Finally, there are important practical questions about the impact, e.g., environmental or geopolitical, of computing at this level. All of these questions deserve attention: barring some as-yet-unseen theoretical roadblock, reasoning models seem likely to continue improving, which only underscores their importance. As things stand, however, particularly when it comes to mathematical reasoning, caution seems warranted in extrapolating the future research proficiency of models.


