Mayur Naik

425 posts

@AI4Code

Professor @CIS_Penn. Neurosymbolic AI researcher and educator.

Philadelphia, PA · Joined January 2019
336 Following · 2.4K Followers
Mayur Naik reposted
Damek @damekdavis
Solve harder problems.
12 replies · 10 reposts · 111 likes · 6.1K views
Mayur Naik reposted
a16z @a16z
"Not having a coding experience is becoming an advantage." Replit CEO Amjad Masad: "You don't need any development experience. You need grit. You need to be a fast learner." "If you're a good gamer, if you can jump in a game and figure it out really quickly, you're really good at this." "Coders get lost in the details." "Product people, people who are focused on solving a problem, on making money, they're going to be focused on marketing, they're going to be focused on user interface, they're going to be focused on all the right things." "I think this year it's gonna flip, and I think not having a coding background is gonna be more advantageous for the entrepreneur." @amasad with @jackhneel
599 replies · 521 reposts · 4.7K likes · 2.5M views
Mayur Naik @AI4Code
4. The best uses of AI will come from those who did not create it.
0 replies · 0 reposts · 0 likes · 107 views
Mayur Naik @AI4Code
2. Too much CS knowledge can get in the way of trying. Even if I had been able to imagine such animations, I would never have believed today's AI can create them for me.
3. Share and learn. Even something as small as helping someone set up Claude Code could unlock a new capability.
1 reply · 0 reposts · 2 likes · 178 views
Mayur Naik @AI4Code
AI this year feels very different. A computer science student in my lab helped a math collaborator set up Claude Code to build a website I was procrastinating on. She built a stunning website with sophisticated math animations within a day. Many lessons here:
1 reply · 0 reposts · 7 likes · 461 views
Mayur Naik @AI4Code
Penn-goin! Elated that my amazing daughter Isha was admitted to Penn in the Mechanical Engineering program's Class of 2030! Lots of gratitude to her teachers, our family and friends, my colleagues, and even my students who helped in any way possible. We as a family got to experience the stressful process of US college admissions firsthand. Sending good wishes to others going through it right now. A reminder that admission decisions are never a measure of your worth as a person. thedp.com/article/2025/1… Isha's website with artwork and writings: ishanaikfineart.com #PennEngineeringProud @MEAM_Penn #classof2030
[image]
12 replies · 6 reposts · 234 likes · 20.5K views
Mayur Naik reposted
Google Open Source @GoogleOSS
Like a deep-sea anglerfish illuminating its surroundings, ESCA "illuminates" an agent's environment with structured scene graphs. This new framework from UPenn, accelerated by JAX, is a game-changer for embodied AI. goo.gle/esca-screen-gr… #AI #JAX #Robotics #Innovation
[image]
0 replies · 4 reposts · 16 likes · 2.2K views
Mayur Naik @AI4Code
@yanndubs @adxtyahq It seems common to underestimate the role of test-time inference in modern LLMs; I made a post about it: x.com/AI4Code/status…
Mayur Naik @AI4Code
[Quoted post: the full "test-time inference" thread, which appears later in this timeline.]
0 replies · 0 reposts · 0 likes · 180 views
Yann Dubois @yanndubs
@adxtyahq The model evaluated in the blog is the thinking model, try that (even at low thinking)
6 replies · 0 reposts · 74 likes · 4.3K views
aditya @adxtyahq
btw, GPT-5.2 claims to score 100% on AIME LMAOOOO
[image]
52 replies · 3 reposts · 250 likes · 23K views
Mayur Naik @AI4Code
✨✨Proud of my research group's NeurIPS 2025 Spotlight paper on improving zero-shot performance of embodied AI agents using neurosymbolic representations! Come see us at the conference in San Diego, play with our live demo / model / dataset on HuggingFace, and check out our code on GitHub. Led by my amazing student Jiani Huang @jiani_huang_ai, who is on the academic job market! Collaborators:
- PhD students Neelay Velingker @NeelayV and Mayank Keoliya @KeoliyaMayank
- undergrads Amish Sethi @sethi_amish, who is applying to PhD programs, and Matthew Kuo
- former PhD student Ziyang Li @_ziyang_, now faculty at JHU CS @JohnsHopkins
#neurosymbolic #embodiedAI #robotics #NeurIPS2025
Penn Engineering AI @PennEngAI

@PennEngineers doctoral student @jiani_huang_ai (@cis_penn) presents ESCA at @NeurIPSConf 2025, a system that helps embodied AI agents better understand their surroundings by creating context-aware descriptions of a scene. Research advised by Professor Mayur Naik (@AI4Code).

2 replies · 5 reposts · 12 likes · 1.5K views
Mayur Naik reposted
Jiani Huang @jiani_huang_ai
Announcing our ✨ NeurIPS’25 Spotlight ✨ paper: ESCA: Contextualizing Embodied Agents via Scene-Graph Generation TLDR: We introduce a framework that grounds multimodal embodied agents in scene graphs, leading to more reliable perception, stronger reasoning, and better actions.
[image]
1 reply · 7 reposts · 9 likes · 547 views
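The scene-graph idea in ESCA can be pictured with a tiny sketch (a hypothetical structure and query helper of my own devising, not ESCA's actual schema or API): detected objects become nodes, relations become edges, and the agent queries the graph instead of raw pixels.

```python
# Hypothetical minimal scene-graph: nodes are detected objects, edges
# are (subject, relation, object) triples the agent can reason over.
scene = {
    "objects": {
        "cup_1": {"class": "cup"},
        "table_1": {"class": "table"},
        "book_1": {"class": "book"},
    },
    "relations": [
        ("cup_1", "on", "table_1"),
        ("book_1", "on", "table_1"),
    ],
}

def objects_with_relation(scene, relation, target):
    """Return subjects standing in `relation` to `target`,
    e.g. everything that is on the table."""
    return [subj for (subj, rel, obj) in scene["relations"]
            if rel == relation and obj == target]

print(objects_with_relation(scene, "on", "table_1"))  # ['cup_1', 'book_1']
```

A structured query like this is what makes the agent's perception auditable: the answer is grounded in explicit graph edges rather than an opaque caption.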
Mayur Naik reposted
prof-g @prof_g
i've had mixed results in the past, but the new @grok model (grok-4-fast [beta]) is crushing really hard math problems that other models can't handle. it's amazing to me how fast it's improving.
4 replies · 6 reposts · 49 likes · 4K views
Mayur Naik reposted
prof-g @prof_g
my first technical job was a summer job for the computer division of a bank in cleveland ohio. i was 19 and into assembly, pascal, etc. couldn't wait to see what i would be assigned...
=== 1988 ===
my job is in the tape library. stand in front of a green-screen-matrix-terminal. wait for a 6-digit number to appear. memorize it. run to tape library & select tape. load it into the mainframe, replacing old tape. read off number of old tape and file correctly. go back to monitor & wait for a new number. at exciting moments, multiple numbers appear.
===
i was replaced by technology. my hard work rendered useless. none of my skills from then matter to the new tech. i am completely obsolete. & i do not want to do that job ever again :-)
===
i can't wait for AI to make much of my job obsolete:
> writing & grading tests
> creating curricula
> managing canvas sites
> scheduling meetings
> serving on committees
> formatting papers
> looking up references
> diagram chasing
> writing slide decks
> reading resumes & portfolios
>>>
2 replies · 5 reposts · 57 likes · 5.6K views
Mayur Naik reposted
Adam Stein @adamlsteinl
Announcing our NeurIPS paper: Once Upon an Input: Reasoning via Per-Instance Program Synthesis (PIPS)
📝: arxiv.org/abs/2510.22849
Why do LLMs (and LLM agents) still struggle on hard reasoning problems which should be solvable by writing and executing code? We find that the biggest problem with LLM-generated "programs" for reasoning is that they don't compute anything; they just hardcode the answer! PIPS fixes this by 1️⃣ abstracting the input into symbols, 2️⃣ generating code that maps symbols to the answer, and 3️⃣ refining the code with structural feedback. 🧵👇
[image]
2 replies · 8 reposts · 17 likes · 2.2K views
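The three steps in the announcement can be sketched as a toy loop. All function names and the arithmetic task below are stand-ins of mine, not the paper's API; in PIPS the synthesized program comes from an LLM and refinement is driven by structural feedback on the code.

```python
import re

def abstract_input(problem_text):
    # Step 1 (stand-in): map the concrete instance to symbols;
    # here, simply the integers mentioned in the question.
    return [int(tok) for tok in re.findall(r"\d+", problem_text)]

def synthesize_program():
    # Step 2 (stand-in): emit a program over the symbols instead of
    # hardcoding the answer; in PIPS this program is LLM-generated.
    return lambda symbols: sum(symbols)

def solve(problem_text, max_refinements=3):
    # Step 3 (stand-in): run the program and, on failure, re-synthesize;
    # the real system refines using structural feedback on the code.
    symbols = abstract_input(problem_text)
    for _ in range(max_refinements):
        program = synthesize_program()
        try:
            return program(symbols)
        except Exception:
            continue
    return None

print(solve("What is the sum of 3, 5, and 7?"))  # 15
```

The key property the thread highlights is that `solve` computes the answer from the abstracted symbols; a hardcoded `return 15` would pass this one instance but generalize to nothing.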
Mayur Naik reposted
prof-g @prof_g
@emollick i've spent serious efforts working on developing math problems with an unambiguous (e.g., numerical) answer that gpt-5-pro cannot solve. it is *nontrivial* to do so. it was totally different even 4-6 months ago.
12 replies · 30 reposts · 284 likes · 263.6K views
Mayur Naik @AI4Code
It is easy to overlook but hard to overstate how big a role "test-time inference" has played in modern LLM gains in reasoning. I myself wasn't sure, so I put GPT-5 and Gemini-2.5 to a classic programming puzzle called variable shadowing (en.wikipedia.org/wiki/Variable_…):

Both models in "Thinking" mode solved it correctly but took around 2 minutes. The thinking was elaborate (and correct): "I'm currently focused on rewriting the user's C-like code, aiming to replace all instances of 'y' with 'x'. However, I've hit a snag. The crucial part is to avoid this substitution in specific scenarios."

More surprisingly, both models in "Instant" mode got it consistently wrong, giving a glimpse into their pre-training levels of reasoning, which seem to have hardly improved over the years, if at all.

Is there a middle ground which reliably answers this simple question correctly without thinking for too long? The best solution today remains prompt engineering. Appending "Think step by step before answering" to the prompt causes both models to engage in a brief monologue which turns out to be crucial to subsequently generating the correct program.

The recent "Auto" mode of LLMs is intended to automatically strike this balance and forgo the need for prompt engineering. But it currently errs on the side of over-thinking, or worse, under-thinking. LLMs will undoubtedly get better at calibrating their "Auto" mode, but I am equally excited about emerging architectures like diffusion models, which are not constrained by left-to-right generation and can iteratively refine their answer the way humans typically do.
[3 images]
1 reply · 0 reposts · 11 likes · 807 views