Ahmad
315 posts

Ahmad
@chahmad____
CS Student @ FAST | Python • ML Learning AI |projects
Joined July 2022
156 Following · 79 Followers

Massive news for AI's Math power.
GPT-5.2 Pro set a new record on FrontierMath Tier 4.
Solved 15 of 48 problems for 31% accuracy, up from the prior best 19%.
FrontierMath Tier 4 has been the toughest math benchmark for AI so far.
On this run, GPT-5.2 Pro also did better on the held-out subset than the non-held-out subset, i.e. no overfitting.
Tier 4 has been incredibly tough for models, since only 13 of its 48 problems had ever been solved before this run.
To give a sense of the difficulty: these are not short calculation puzzles, and several newly solved items came from topology, geometry, number theory, and analytic combinatorics.
Epoch reports it ran the test manually on ChatGPT because its API scaffold hit timeouts.
GPT-5.2 Pro solved 10 of 20 held-out problems for 50% versus 5 of 28 non-held-out problems for 18%, so there was no evidence of score inflation from memorized solutions.
One reviewer said it recognized the geometry of a polynomial-defined surface and still solved a harder point-cloud version.
A remaining failure pattern is making a plausible assumption without proving it, which can be the whole crux in research math.
IMO, we are not far from saturating this one too, the toughest math benchmark.

Epoch AI@EpochAIResearch
New record on FrontierMath Tier 4! GPT-5.2 Pro scored 31%, a substantial jump over the previous high score of 19%. Read on for details, including comments from mathematicians.

@rohanpaul_ai Just in both are best in their own way
In my opinion as a student, GPT explains best while Claude provides better solutions.

Jensen Huang loves Claude and ChatGPT.
"Claude is incredible. Anthropic has made huge progress, a massive leap, in developing Claude. We use it all over our company. The coding capability, the reasoning capability, its overall ability is genuinely impressive.
Anybody who has a software company really ought to get involved and use it. On the other hand, ChatGPT is probably the most successful consumer AI in history. Its ease of use and approachability mean everybody should get involved.
Whether it’s someone in a developing country or a student, it’s very clear that learning how to use AI is essential. You need to know how to direct an AI, prompt an AI, manage an AI, guardrail an AI, and evaluate an AI."
---
From 'World Economic Forum' YT channel

@rohanpaul_ai This volume of calls now needs more complex systems to handle it, or we will soon see a new competitor.

Google’s Gemini API traffic reportedly doubled in 5 months
Internal Google data shows Gemini API call counts surged from ~35B in March 2025 to ~85B in August 2025.
Google says Gemini Enterprise has hit 8M subscribers.
That jump is also reported to pull extra spend into adjacent Google Cloud products like storage and databases, not just the model endpoint itself.
Google is also betting on packaged enterprise software, where Gemini Enterprise sits closer to company data through connectors and permissions-aware internal search.
---
theinformation .com/articles/googles-gemini-sees-skyrocketing-business-sales


@rohanpaul_ai I guess we will have human-replica robots from Nvidia by 2030 or sooner.
Ahmad retweeted

Nvidia is focusing a lot on physical AI in 2026.
NVIDIA Omniverse@nvidiaomniverse
What is the next generation of AI? Physical AI. 🦾 AI is no longer just about text generation. It is multimodal and multi-domain, able to see, hear, and reason across language, vision, video, biology, and chemistry, reshaping the world around us. Watch the full #CES2026 montage to see physical AI in action: nvda.ws/49jTVbJ
Ahmad retweeted

@aditiitwt C++, and fun fact: I survived coding a game in assembly language last semester thanks to Claude.