Murali Manohar

863 posts

@gitlostmurali

AI something @AlephAlpha. Post training models. Built agentic chat & evals

Joined August 2018
1.7K Following · 174 Followers
Murali Manohar
Murali Manohar@gitlostmurali·
Ending this year with a blog on RL environments: gitlostmurali.com/rl-environments Talks about reward hacking, sandboxing, curriculum learning, tool calling - all the stuff that can break when you actually try to train agents
0
0
0
107
Murali Manohar retweeted
Piotr Mazurek (at MLSys 🇺🇸)
RL is cool, but what do you actually need to know about hardware and infra to predict its future? Check out our new piece on tensoreconomics:
Piotr Mazurek (at MLSys 🇺🇸) tweet media
4
18
78
19.9K
Rishu Kumar
Rishu Kumar@rishdotuk·
I need to find a recommendation for the next Fictional read. Read too many non-fic and now I want some change.
1
0
0
80
Murali Manohar
Murali Manohar@gitlostmurali·
Thanks! huggingface.co/datasets/Jiayi… It's a countdown task - "Use every given number exactly once to build an equation that hits the target." As a next step, I wanted the small model to decide whether to forward the equation or solve it on its own. I introduced latency as a negative reward, but it didn't work very well. And then I moved on to other projects.
0
0
0
17
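A minimal sketch of how a reward for the countdown task described above could be checked. This is not the author's code: the function names, the `latency_weight` parameter, and the linear latency penalty are assumptions made for illustration. The tweet only says that latency was used as a negative reward.

```python
import ast
import operator

# Arithmetic operators allowed in a candidate equation (assumed set).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate a parsed expression and record every integer literal it uses."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, int)
        and not isinstance(node.value, bool)
    ):
        used.append(node.value)
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def countdown_reward(expression, numbers, target, latency_s=0.0, latency_weight=0.0):
    """Return 1.0 if the expression uses each number exactly once and hits
    the target, else 0.0, minus a hypothetical linear latency penalty."""
    correct = False
    try:
        used = []
        value = _evaluate(ast.parse(expression, mode="eval").body, used)
        correct = sorted(used) == sorted(numbers) and abs(value - target) < 1e-9
    except (ValueError, SyntaxError, ZeroDivisionError):
        correct = False
    return (1.0 if correct else 0.0) - latency_weight * latency_s


print(countdown_reward("(25 - 5) * 4", [4, 5, 25], 80))
print(countdown_reward("25 * 4", [4, 5, 25], 100))
```

The first call satisfies both conditions. The second reaches its target but leaves the 5 unused, so it earns no reward.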
Brendan Hogan
Brendan Hogan@brendanh0gan·
@gitlostmurali oh wow yeah this looks very similar! nice work! what task is this? And yeah, I noticed this too: when I was doing math problems, I debated how much of the real question to give to the small model
1
0
1
21
Brendan Hogan
Brendan Hogan@brendanh0gan·
doing this now for my debate framework: gpt4.1 vs gpt4.1 advised by qwen 3B. gpt4.1 w/ qwen's advice debates itself in elo/tournament style to get the advantage; the advantage is used to grpo qwen to give better advice. you can fine tune api models with rl'd context
Brendan Hogan tweet media
Brendan Hogan@brendanh0gan

big models are great agents but often too big, closed, or delicate to fine-tune. idea: train a small model to craft context for a frozen big model, score the big model's outputs, use that as reward for the small one. grpo for context tuning. more below

6
1
81
6.5K
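A rough sketch of the group-relative advantage step that GRPO-style training relies on, as it might apply to the idea above: the small model samples several contexts for the same prompt, the frozen big model's output is scored for each, and every score is normalized against its group. The function name, the `eps` term, and the use of the population standard deviation are assumptions for illustration, not details from the thread.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and standard deviation of
    its own group (all samples drawn for the same prompt)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scores for four contexts written by the small model,
# each judged by how well the frozen big model did with that context.
scores = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(scores))
```

Contexts that scored above the group mean get a positive advantage and the rest get a negative one. When every sample in a group scores the same, all advantages are zero and that group contributes no learning signal.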
Murali Manohar
Murali Manohar@gitlostmurali·
An interesting thing happened when the remote model was swapped with a low-quality 7B LLM. After failing to get the remote model to provide the right equations, the local model started providing hints in its prompt for the remote model. In a few instances, the local model started solving the problem itself and just passed the equation in the remote model's prompt, so that the remote model only had to **forward** the equation.
1
0
1
27
Murali Manohar
Murali Manohar@gitlostmurali·
@rishdotuk Ouch! I like the personality. Imagine having this as the PR reviewer before one submits it to the team
1
0
1
36
Rishu Kumar
Rishu Kumar@rishdotuk·
I asked Claude about the vibes of my code, and I got cooked. "I'm following a tutorial but adapting it badly." I should cry myself to sleep. :D
Rishu Kumar tweet media
1
0
1
95
Murali Manohar
Murali Manohar@gitlostmurali·
@goyal__pramod Always. Visualizations are rewarded since humans are visual creatures. To speed up this process, you can try these: 1. Claude renders diagrams with its Artifacts feature. 2. Try asking AI for Mermaid versions and later manually convert them to Excalidraw or the like
1
0
12
847
Pramod Goyal
Pramod Goyal@goyal__pramod·
Time spent writing -> 30 min Time spent creating visualization -> 3 hours
Pramod Goyal tweet media
24
22
533
41.3K
Murali Manohar
Murali Manohar@gitlostmurali·
@sh_reya @HamelHusain Thank you! I'll follow up over email to discuss the budget and see what can work. Appreciate the flexibility 🙏 Thanks again!
0
0
1
109
Shreya Shankar
Shreya Shankar@sh_reya·
@gitlostmurali @HamelHusain What is your corporate learning + development budget? Email Hamel at hh@parlance-labs.com and we will work something out. We do not want cost to be a barrier; we have a steep listed price so we can attract only those who are actually serious about building AI products
3
0
3
962
Murali Manohar
Murali Manohar@gitlostmurali·
@tanay_mehta @doSwayamExist Ouch. How pathetic!! This incident and the intern-shaming around Apple's latest paper are deeply concerning. This is classism in a different form. This can only change if Indian teachers/professors are "okay" with removing Sir/Madam. And how cool is it to say, Prof. Sanyal/Prof. Chawla?!
0
0
0
106
Tanay Mehta
Tanay Mehta@tanay_mehta·
@doSwayamExist Bullying juniors with manner-policing like this is a way for some people to get the high they couldn’t get from actually doing something worthwhile
1
0
43
707
Swayam Gupta
Swayam Gupta@doSwayamExist·
Bro is in 12th and calling a person who is almost 5 years older than him "dude"?
Swayam Gupta tweet media
536
43
1.8K
319.6K
Sam Altman
Sam Altman@sama·
codex gets access to the internet today! it is off by default and there are complex tradeoffs; people should read about the risks carefully and use when it makes sense. also, we are making it available in the chatgpt plus tier.
495
610
9.5K
1.3M
Murali Manohar
Murali Manohar@gitlostmurali·
"The linter is still complaining. Let me try a different approach by adding type ignores." Claude models (3.7 & 4) are ruthless
0
0
1
123
Murali Manohar
Murali Manohar@gitlostmurali·
@t_blom I came across folks who said: "If you find the clients and take care of the tech stuff, you'll be the co-founder". How dependable can a person be!!?
0
0
1
95
Tom Blomfield
Tom Blomfield@t_blom·
“I’ll invest if you find a lead” is the single lamest thing an investor can say.
143
135
2.6K
272.9K
Murali Manohar
Murali Manohar@gitlostmurali·
Cursor Code is exactly what I want my cursor to be. Precise edits and incremental.
0
0
0
97