
asfaan murthyulas
279 posts

@AMurthyulas
Ask stupid questions. Think the impossible. Do it step by step.
Joined January 2023
392 Following, 2 Followers

@tunguz Use all thinking patterns that created aha moments in great people who solved impossible problems.

@stats_feed His blood circulated without friction and organs worked without wear and tear.

@andrewwhite01 A generalization of the solar system: "as above, so below" thinking.

@AllenInstitute Maybe their function is the same, like Gestalt unification.

Neurons don't connect randomly.
In this video by our #ElectronMicroscopy team, a blue neuron's axon forms a connection to a distant neuron. Along the way, it links to some neighbors while skipping thousands. Connectomics seeks to understand what makes those connections special.

@_YifanGao What about punishment? Does it also travel back to the predictive cue?

New paper!🧠We showed that the Reward Positivity backpropagates from feedback to predictive cues during reinforcement learning — first time this has been demonstrated with noninvasive EEG in humans! Huge thanks to my mentor and my co-authors! Open access: doi.org/10.1111/psyp.7…
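The cue-to-feedback shift described in this paper is the pattern that temporal-difference (TD) learning predicts. Below is a minimal TD(0) sketch. It assumes a hypothetical three-step trial (cue, delay, feedback), a pre-cue baseline fixed at zero value, and a learning rate of 0.1. It is a textbook model, not the paper's analysis. Running it with a negative reward also bears on the punishment question above: in this model the prediction error at the cue turns negative. That is a property of the model only and says nothing about what the EEG data show.

```python
def run_td(reward, episodes=200, alpha=0.1, gamma=1.0):
    """TD(0) on a hypothetical trial: state 0 = cue, 1 = delay, 2 = feedback.

    Returns the learned values and, per episode, the prediction error
    at cue onset and at feedback.
    """
    values = [0.0, 0.0, 0.0]  # the terminal state after feedback has value 0
    history = []
    for _ in range(episodes):
        # Assumption: cue onset is unpredictable, so the pre-cue baseline stays
        # at 0 and the error at cue onset is just the discounted value of the cue.
        cue_error = gamma * values[0] - 0.0
        feedback_error = 0.0
        for s in range(3):
            r = reward if s == 2 else 0.0             # reward arrives only at feedback
            next_v = values[s + 1] if s < 2 else 0.0
            delta = r + gamma * next_v - values[s]    # TD prediction error
            if s == 2:
                feedback_error = delta
            values[s] += alpha * delta
        history.append((cue_error, feedback_error))
    return values, history


for reward in (1.0, -1.0):  # reward vs. punishment
    _, hist = run_td(reward)
    first, last = hist[0], hist[-1]
    print(f"reward={reward:+.0f}  first trial (cue, feedback) = {first}  "
          f"last trial = ({last[0]:+.3f}, {last[1]:+.3f})")
```

On the first trial all of the error sits at feedback. After training, it has moved to the cue, and the feedback error is close to zero because the outcome is fully predicted.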


@PrinceDavies55 @jon_d_doe Claude Mythos is doing all the patch work in cybersecurity.

@jon_d_doe "Meta laid off 8,000 and shifted 7,000 into AI.
You know what that means for cybersecurity?
More AI making decisions. Fewer humans catching the mistakes. More attack surface. More risk.
Upskill or get left behind."

@bitcoinmalayali Fine-tuning open-source models is all you need.


@dilipjain077 Evolved for free food and free procreation.

@JaibyGeorge5979 The fertilizer will arrive once the war is over, but what was said cannot be taken back.

@AutismCapital He is a great guy. How can you add a dirty smile? 🤌

@manoramanews America is a luxurious country they are proud of. They will certainly find a way to stay.

'Find a new job within 60 days, or leave the US'! Hearts pounding, Indian tech...
Read more at: manoramanews.com/gulf-and-globa… #us #india


@_avichawla Every feature on the path to reward becomes a good indicator to maximize.

Karpathy's prediction about RL is coming true now!
He called reward functions unreliable and argued that a single reward number is too low-dimensional to teach an agent what "good" means for complex tasks. To solve this, agents need a knowledge-guided review as a higher-dimensional feedback channel.
Every major AI lab trains models with RL today (OpenAI, Anthropic, DeepSeek).
And their key bottleneck has always been the reward functions.
GRPO by DeepSeek worked well for math and code because the environment gave a binary signal.
But for real agent tasks, someone still has to hand-code the scoring function. That takes days and breaks every time the pipeline changes.
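The following is a minimal sketch of the two pieces mentioned here, under stated assumptions. The first is a binary, verifiable reward of the kind available for math (here a hypothetical exact-match check on the final answer). The second is the group-relative advantage GRPO computes from several sampled answers to the same prompt, which stands in for a learned critic. Normalizing by the population standard deviation plus a small epsilon is one common choice, and implementations differ.

```python
from statistics import mean, pstdev


def binary_reward(model_answer: str, reference: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the final answer matches exactly, else 0.0."""
    return 1.0 if model_answer.strip() == reference.strip() else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: each reward is normalized against its own group.

    The group mean acts as the baseline that a learned critic would otherwise provide.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to the same prompt, scored against the reference "42".
answers = ["42", " 42 ", "41", "7"]
rewards = [binary_reward(a, "42") for a in answers]
print(rewards)                    # [1.0, 1.0, 0.0, 0.0]
print(group_advantages(rewards))  # roughly [1, 1, -1, -1]
```

Answers above the group mean get positive advantages and the rest get negative ones. If every answer earns the same reward, all advantages are zero and that group contributes no learning signal.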
RULER (implemented in OpenPipe ART, 10k stars) addresses the exact problem Karpathy identified.
The reward criteria are defined in plain English, and an LLM evaluates each trajectory against that description to provide feedback for training.
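The workflow described here (plain-English criteria in, one numeric reward per trajectory out) can be sketched without any particular library. This is not the ART or RULER API. The rubric text, function names, JSON reply format, and the call_llm hook are all hypothetical, and a stub stands in for the real model call. Scoring the whole group in a single prompt is an assumption, chosen so the judge can compare trajectories, which suits GRPO's group-relative update.

```python
import json
from typing import Callable

# Hypothetical plain-English criteria for the 2048 task (not taken from the ART repo).
RUBRIC = ("Prefer moves that merge tiles and keep the largest tile in a corner. "
          "Penalize moves that leave the board unchanged.")


def build_judge_prompt(rubric: str, trajectories: list[str]) -> str:
    """One prompt covers the whole group so the judge can compare trajectories."""
    lines = [f"Criteria: {rubric}",
             "Score each trajectory between 0 and 1 against the criteria.",
             'Reply only with JSON of the form {"scores": [...]}.']
    for i, traj in enumerate(trajectories):
        lines.append(f"Trajectory {i}: {traj}")
    return "\n".join(lines)


def parse_scores(reply: str, n: int) -> list[float]:
    """Parse the judge's JSON reply into n rewards."""
    scores = json.loads(reply)["scores"]
    if len(scores) != n:
        raise ValueError(f"expected {n} scores, got {len(scores)}")
    # Clamp to [0, 1] in case the judge drifts outside the requested range.
    return [min(1.0, max(0.0, float(s))) for s in scores]


def judge_rewards(rubric: str, trajectories: list[str],
                  call_llm: Callable[[str], str]) -> list[float]:
    """call_llm is any function that sends a prompt to an LLM and returns its text reply."""
    reply = call_llm(build_judge_prompt(rubric, trajectories))
    return parse_scores(reply, len(trajectories))


# Usage with a stub judge standing in for a real model call:
fake_llm = lambda prompt: '{"scores": [0.9, 0.2]}'
print(judge_rewards(RUBRIC, ["left, left, up", "down, down, down"], fake_llm))
```

Changing the task then means editing RUBRIC rather than rewriting a scoring function, which is the point the post makes.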
I trained a Qwen3 1.4B agent that plays 2048 using GRPO with this exact workflow.
In this case, the agent saw the board, picked a direction, and RULER evaluated the outcome, all from this natural language definition.
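For readers unfamiliar with the game, here is a minimal sketch of the environment step the agent acts on. Given a board and one of four directions, tiles slide, and equal neighbors merge at most once per move. It follows the standard 2048 rules and is not taken from the ART repo. The function names and the (board, score, changed) return shape are illustrative assumptions, and random tile spawning is omitted.

```python
def merge_row_left(row):
    """Slide one row to the left and merge equal neighbors once. Returns (row, points)."""
    tiles = [v for v in row if v != 0]
    merged, points, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)   # a merged tile cannot merge again this move
            points += tiles[i] * 2
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged)), points


def move(board, direction):
    """Apply 'left', 'right', 'up' or 'down'. Returns (new_board, points, changed)."""
    # Reduce every direction to "merge left": use columns for up/down,
    # and reverse lines for right/down.
    lines = [list(r) for r in board] if direction in ("left", "right") \
        else [list(c) for c in zip(*board)]
    if direction in ("right", "down"):
        lines = [l[::-1] for l in lines]
    new_lines, points = [], 0
    for line in lines:
        merged, gained = merge_row_left(line)
        new_lines.append(merged)
        points += gained
    if direction in ("right", "down"):
        new_lines = [l[::-1] for l in new_lines]
    if direction in ("up", "down"):
        new_lines = [list(c) for c in zip(*new_lines)]
    return new_lines, points, new_lines != [list(r) for r in board]


board = [[2, 2, 4, 0],
         [0, 0, 0, 0],
         [2, 0, 0, 2],
         [0, 0, 0, 0]]
print(move(board, "left"))
```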
You can see the full implementation on GitHub and try it yourself.
Here's the ART Repo: github.com/OpenPipe/ART
(don't forget to star it ⭐ )
Just like RLHF replaced manual rankings and GRPO replaced the critic model, natural language rewards are replacing hand-coded scoring functions.
RL reward engineering is now prompt engineering.
I wrote a full walkthrough covering RL for LLM agents, from RLHF to GRPO to RULER, in the article below.
Avi Chawla @_avichawla

@SkyNews It's the same as kicking aged zoo animals out into the forest.

UK-based bank to replace 'lower-value human capital' with AI
trib.al/COxXrSi

@dwarkesh_sp @ericjang11 I think scaling can solve this problem. Leela Zero uses a transformer architecture. CNNs and RNNs have a proximity bias, but a scaled transformer can reach CNN- and RNN-level performance. Biases can save a lot of compute and data (at the cost of reasoning beyond the bias).

.@ericjang11 tried using transformers for his Go bot, but they couldn't beat ResNets.
The reason gets at something general about architectures.
ResNets are biased towards the local. Nearby things matter more, and a useful pattern in one place is a useful pattern anywhere.
Transformers are biased the other way, towards global context, with every position able to attend to every other.
Most Go fighting is local, and a useful local pattern learned in one position can be applied anywhere on the board.
A ResNet's inductive bias means it gets these insights about Go for free. But a transformer has to pay for them.
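Here is a toy contrast of the two inductive biases described above, under heavy simplifying assumptions. Tokens are scalars. A single 1D convolution with circular padding stands in for a ResNet layer, with no residual connection, no stacking, and no 2D board. One self-attention head with identity projections and no positional encoding stands in for a transformer. The checks show two things about the convolution: its output at a position ignores distant inputs, and shifting the input shifts the output, because the same weights apply everywhere. The attention output at a position, by contrast, changes when a distant token changes. Attention without positional encodings is also shift-equivariant; what it lacks is the locality.

```python
import math


def conv1d_circular(x, kernel):
    """1D convolution with circular padding.

    Output i depends only on a window of len(kernel) neighbors around i
    (locality), and the same kernel is reused at every position (weight sharing).
    """
    n, half = len(x), len(kernel) // 2
    return [sum(w * x[(i + j - half) % n] for j, w in enumerate(kernel))
            for i in range(n)]


def self_attention(x):
    """Single-head self-attention on scalar tokens with identity Q/K/V projections.

    Every output is a softmax-weighted average over all input positions.
    """
    out = []
    for q in x:
        scores = [q * k for k in x]              # query-key products
        m = max(scores)                          # subtract max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out


def shift(x, s):
    """Circularly shift a list right by s positions."""
    return x[-s:] + x[:-s] if s else list(x)


x = [1, 0, 2, 0, 0, 3, 0, 1]
far = list(x)
far[4] = 5                 # change a token far from position 0
kernel = [1, -2, 1]

print("conv output at 0 unchanged:",
      conv1d_circular(x, kernel)[0] == conv1d_circular(far, kernel)[0])
print("attention output at 0 unchanged:",
      self_attention(x)[0] == self_attention(far)[0])
print("conv is shift-equivariant:",
      conv1d_circular(shift(x, 3), kernel) == shift(conv1d_circular(x, kernel), 3))
```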

@thePartyPartyUS @voooooogel They may contain quality reasoning tokens.

@voooooogel not publishing the unabridged is near criminal

unfortunately openai didn't publish the unsummarized chain of thought, but the summary is 125 pages!
the model reaches the crucial idea (which it describes as 'frightening,' i would love to read the unabridged chain of thought here...) on page 39

OpenAI @OpenAI
Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946. For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids. An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better. This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.
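To make the problem concrete: given n points in the plane, count the pairs at distance exactly 1, and ask how large that count can be. The sketch below counts such pairs by brute force. It also reproduces, at toy scale, the idea behind the classical square-grid construction: place points on a k x k integer grid, pick the squared distance that occurs most often, then rescale so that this distance becomes 1. It is illustrative only and says nothing about the new construction, whose details are not given here. The helper names are hypothetical.

```python
from itertools import combinations


def count_distance_pairs(points, d2):
    """Count unordered pairs of points whose squared distance equals d2.

    Scaling the plane by 1/sqrt(d2) turns exactly these pairs into unit-distance pairs.
    """
    return sum(1 for (x1, y1), (x2, y2) in combinations(points, 2)
               if (x1 - x2) ** 2 + (y1 - y2) ** 2 == d2)


def grid(k):
    """The k x k integer grid, n = k * k points."""
    return [(x, y) for x in range(k) for y in range(k)]


def best_grid_distance(k):
    """Squared distance that occurs most often in the k x k grid (smallest on ties)."""
    pts = grid(k)
    candidates = {(x1 - x2) ** 2 + (y1 - y2) ** 2
                  for (x1, y1), (x2, y2) in combinations(pts, 2)}
    return max(candidates, key=lambda d2: (count_distance_pairs(pts, d2), -d2))


for k in (3, 4, 5):
    d2 = best_grid_distance(k)
    print(f"{k}x{k} grid: squared distance {d2} gives "
          f"{count_distance_pairs(grid(k), d2)} unit-distance pairs after rescaling")
```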