Sourya Kakarla

1.2K posts


@curious_queue

building something open. inspired by: unix philosophy, etymology. led ML @tryskylink (YC W22; acq. by @AmadeusITGroup). ex-@microsoft | @columbia @iitkgp

Palo Alto/SF · Joined July 2022
589 Following · 948 Followers
Sourya Kakarla @curious_queue
@DanniFriedland yep!! it's worth investing in meta-systems/habits that make it easy to handle that. while agents might help, sometimes non-tech paths like
- using pen and paper
- meditation
can be better to detox and think clearly. small errors & misalignment can compound fast
1 reply · 0 reposts · 0 likes · 4 views
Danni Friedland @DanniFriedland
Don't give up your mental model of the systems you own. This is what you're getting paid for eventually. You're not a "manager" of agents. You're not a "CTO" of agents. You're an IC using tools. Agents don't hold the mental model for you.
3 replies · 2 reposts · 4 likes · 49 views
Sourya Kakarla @curious_queue
@julien_c good reasoning + chained tool calling in my testing so far gave me my first taste of a fully self-hosted agent that can handle non-trivial multistep task execution x.com/curious_queue/…
0 replies · 0 reposts · 4 likes · 134 views
Sourya Kakarla @curious_queue
@odysseus0z yep!! wrote a bit about how modelling that eval/verification artifact into the task design is the majority of the work these days, in response to @karpathy's post: x.com/curious_queue/…
Sourya Kakarla@curious_queue

> 1) these domains offer explicit reward functions that are verifiable meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge)

true! also, the actual act of fleshing out the verification artifact that the model can run in a loop (like your autoresearch and @GeoffreyHuntley's ralph loop) takes a very high bar when you consider the overall distribution of LLM/agent use

if i want any work to be done by an agent reliably (not just wing/vibe it), my job is now to have a mental model of how to elicit that verification artifact

verification by human senses is a bottleneck for the agents to leverage their actual superpower (relative to humans) of running stuff in a loop fast and checking (can ofc be scaled with parallelism like in the recent claude mythos-glasswing project)

while the actual implementation act of eliciting the verification artifact is made easy by the agents taking care of the grunt work of writing code, knowing how to elicit that is still a lot of skillful mental work that acts as a filter for people feeling the AGI

autonomous task execution by agents needs to be powered by verification engineering by humans

like all the previous transitions (punch cards -> assembly -> C -> Java/Python -> Agents), feels like we are moving higher up in the abstractions and there is always *something* to be done by humans

feel free to roast me if i got anything wrong :p would love to learn from the sensei :)

1 reply · 0 reposts · 5 likes · 256 views
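The verification-loop idea in the reply above can be sketched in a few lines of Python. This is a minimal sketch, not the author's actual setup: `propose_patch` and `apply_patch` are hypothetical hooks into an agent harness, and the verifier here is just "does the test suite pass".

```python
def agent_loop(propose_patch, apply_patch, verify, max_iters: int = 10) -> bool:
    """Run an agent against an explicit verifier in a loop.

    `verify` is the 'verification artifact': a cheap, machine-checkable
    pass/fail signal (e.g. unit tests), replacing verification by human senses.
    `propose_patch` and `apply_patch` are hypothetical agent-harness hooks.
    """
    for _ in range(max_iters):
        if verify():                  # the verifier, not a human, decides "done"
            return True
        apply_patch(propose_patch())  # the agent does the grunt work
    return verify()                   # final check once the budget is spent

# an example verifier: shell out to pytest and use its exit code as the signal
def pytest_passes() -> bool:
    import subprocess
    return subprocess.run(["pytest", "-q"]).returncode == 0
```

The skillful part, as the reply argues, is designing a `verify` that actually captures the task, since the loop itself is trivial.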
Sourya Kakarla @curious_queue
here's my codex skills folder: github.com/ma08/botfiles/…

listing out some of my favourites:
- all task related ones (`start-new-tast`, `save-task-status`, `continue-task`): make it easy to have task-specific context files like a status document and intermediate artifacts organized in date-based directories for easy human review, and make it easy for another agent to pick up from
- cross session ones (`cross-session-context`, `cross-session-message`): make it easy for parallel sessions to know about each other's state and message each other. useful for the 1-orchestrator-session and N-executor-sessions pattern.
- `message-developer`: whatsapp me when something needs my attention during a long-running task (not a deterministic hook; agent-decision driven)
- `sync-codex-claude-skills`: makes it easy to sync skills across codex and claude
- `oracle`: consult gpt-5.4-pro using github.com/steipete/oracle for complex situations
- `deep-research`: use a combination of deep research apis from openai, gemini, and exa to generate a comprehensive deep research report from these multiple sources
- `grill-me`: a customized version of the famous grill-me skill from @mattpocockuk. mine uses the interactive `request_user_input` tool that provides different choices to pick from interactively, with a recommended option including the rationale for why it's recommended
- `ralph`: make it easy to create artifacts like spec files and a ralph script to run a long-running ralph loop (h/t @GeoffreyHuntley) for async implementation with self-verification
- `update-coding-agent-preferences`: make it easy to update both CLAUDE.md and AGENTS.md together

some of these could be a bit raw and unpolished as i haven't cleaned them up for public launch/use yet
0 replies · 0 reposts · 15 likes · 1.1K views
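The cross-session skills above suggest a simple underlying mechanic. This is a guess at one way it could work, not how the actual skill is implemented: a shared mailbox directory (path here is made up) that parallel sessions write to and poll.

```python
import itertools
import json
import time
from pathlib import Path

MAILBOX = Path("/tmp/agent-mailbox")  # hypothetical shared location
_seq = itertools.count()              # tiebreaker so rapid sends keep their order

def send(to_session: str, body: str) -> Path:
    """Drop a message file where another session's agent can find it."""
    MAILBOX.mkdir(parents=True, exist_ok=True)
    path = MAILBOX / f"{to_session}-{time.time_ns()}-{next(_seq)}.json"
    path.write_text(json.dumps({"to": to_session, "body": body}))
    return path

def receive(session: str) -> list[str]:
    """Read and consume all pending messages addressed to this session."""
    bodies = []
    for p in sorted(MAILBOX.glob(f"{session}-*.json")):  # timestamp order
        bodies.append(json.loads(p.read_text())["body"])
        p.unlink()  # consume, so each message is delivered exactly once
    return bodies
```

In the 1-orchestrator / N-executors pattern, the orchestrator would `send` to each executor's session id and `receive` on its own, with each agent checking its mailbox between task steps.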
neural nets. @cneuralnetwork
send some good codex skills. md
13 replies · 3 reposts · 290 likes · 18.1K views
Sourya Kakarla @curious_queue
just had a magical open experience
- deployed Qwen3.6-35B-A3B on a VM (with an A100 GPU)
- set up a hermes agent profile to use this self-hosted model
- gave it a task of sending me a whatsapp message (as i wanted to see if it was able to figure out how from my existing stack, without receiving any explicit instructions of what was present where) ✅ it took 2-3 interventions from me to guide it in ambiguous situations but got the job done!!
- this is the first time i got an openweights model whose inference I was managing end to end to execute a complex task using a non-trivial amount of chained tool calls
- seeing the reasoning and the trajectory of tool usage definitely felt impressive!! especially considering the baseline i was used to was gpt-5.4 xhigh in codex
- tried gemma 4 recently for a similar experiment but its tool usage wasn't polished enough for this kind of open-ended procedural discovery + execution (could be just model-tool-harness plumbing issues tbh)
- though i've been tracking the general openweights capability progress over the last few months passively, actually witnessing the agentic actions run with end-to-end control of the stack was surreal

this was my "hello world" of implementing ~sovereign complex agentic actions. long way to go ofc. amped to dig deep into self-hosted/openweights inference and tuning models/harnesses to be reliable enough to daily drive more.

gg @TheAhmadOsman @NousResearch. amazing work on pushing forward the community discourse on open weights/harnesses.
0xSero@0xSero

New best local model for y'all 16GB-64GB rejoice, the chosen one has arrived. huggingface.co/Qwen/Qwen3.6-2…

1 reply · 0 reposts · 6 likes · 386 views
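The chained tool calling described above can be sketched against an OpenAI-compatible endpoint (the API style that self-hosted servers such as vLLM expose). The endpoint URL, model name, tool schema, and `execute_tool` hook below are illustrative assumptions, not the actual hermes setup.

```python
import json
import urllib.request

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # hypothetical local server

def chat(messages, tools=None, model="local-model"):
    """One round trip to an OpenAI-compatible chat endpoint."""
    payload = {"model": model, "messages": messages, "tools": tools or []}
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]

def run_chained(messages, execute_tool, chat_fn=chat, max_steps=8):
    """Loop while the model keeps requesting tools -- the 'chained tool calls'
    from the post. `execute_tool` maps one tool call to a string result."""
    for _ in range(max_steps):
        msg = chat_fn(messages)
        messages.append(msg)
        calls = msg.get("tool_calls")
        if not calls:
            return msg.get("content")  # no more tool requests: final answer
        for call in calls:             # feed each tool result back to the model
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": execute_tool(call),
            })
    return None                        # step budget exhausted
```

The "2-3 interventions" from the post would slot in naturally here as extra user messages appended between loop iterations.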
Sourya Kakarla reposted
0xSero @0xSero
New best local model for y'all 16GB-64GB rejoice, the chosen one has arrived. huggingface.co/Qwen/Qwen3.6-2…
70 replies · 133 reposts · 2.6K likes · 183K views
Sourya Kakarla @curious_queue
@BoyuanChen0 @OpenAIDevs @tszzl when i prompted it to draw an etymology tree for రూపం (rūpaṁ), it actually did great in using ప (pa) over వ (va). might've been a resolution thing in the above indic languages image. i have a feeling i will abuse this model for etymology stuff. sauce: chatgpt.com/share/69e8d623…
0 replies · 0 reposts · 1 like · 28 views
Sourya Kakarla @curious_queue
wow! my botfiles repo is
- not polished for public use
- highly custom for my flows
but got
- 38 unique clones in 14d
- 2 stars
without ever "launching" it. was just linking its files as citations in agentic engineering threads. shows how unreal the demand is for good agent use.
0 replies · 0 reposts · 3 likes · 169 views
Sourya Kakarla @curious_queue
@rsjagarlamudi lol momentarily yea. it's aight, value provided will return and compound eventually. like krishna bro says: just focus on doing the deed and leave the result to bro
0 replies · 0 reposts · 2 likes · 25 views
Sourya Kakarla @curious_queue
mfers that bookmark a post but don't like it are a special breed
1 reply · 0 reposts · 5 likes · 151 views
Sourya Kakarla reposted
George Pickett @georgepickett
The 6th CodexSF meetup is happening April 28th at @WorkOS! If you're in SF, come by! I've massively evolved my Codex workflows in the last few months and I'm excited to share the latest. There are 4 demo slots available - if you want to demo your Codex workflows, RSVP below!
3 replies · 2 reposts · 7 likes · 387 views
Sourya Kakarla @curious_queue
pro-tip for anyone that wants to use gpt-5.4-pro, the most powerful model out there (that's open to use), for expert consultation in your typical codex/claude code/hermes/openclaw flows: use github.com/steipete/oracle with a dedicated skill
Sourya Kakarla@curious_queue

@0xSero @steipete's oracle CLI has been working great for me in consulting gpt-5.4-pro: github.com/steipete/oracle here's the skill I use to make codex play well with it: github.com/ma08/botfiles/…

0 replies · 0 reposts · 2 likes · 197 views
kasey @kaseyklimes
looking to learn if/how people are using hermes/openclaw agents at work, if you’re experimenting here let’s chat
4 replies · 2 reposts · 10 likes · 1.1K views
Sourya Kakarla @curious_queue
@kaseyklimes hermes discord form factor helps me voice chat with the most powerful models, like getting on a call with a teammate. gg @NousResearch x.com/curious_queue/…
Sourya Kakarla@curious_queue

definitely, right now i've resorted to using the traditional cascade pipeline to be able to use the gpt-5.4 model. i was okay with the increased latency to be able to chat with gpt-5.4 (cached inputs and its dynamic reasoning depth make it reasonably fast tbh)

the discord voice channel experience in hermes is pretty good for this: hermes-agent.nousresearch.com/docs/user-guid…

i had to fork it to add support for the STT provider i wanted though (kudos to @NousResearch's hermes agent that it was able to modify its own code to make this happen)

we can do much better than the cascade of STT -> LLM -> TTS by using some novel architectures like the oracle/supervisor that can bridge a realtime audio experience with deep reasoning horsepower

0 replies · 0 reposts · 0 likes · 36 views
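The STT -> LLM -> TTS cascade mentioned above is, structurally, just three functions composed. A toy sketch, with placeholder lambdas standing in for real STT/TTS providers and the LLM call:

```python
from typing import Callable

def make_cascade(stt: Callable[[bytes], str],
                 llm: Callable[[str], str],
                 tts: Callable[[str], bytes]) -> Callable[[bytes], bytes]:
    """Compose STT -> LLM -> TTS: each stage's output feeds the next.
    The stages' latencies add up, which is the trade-off accepted above
    in exchange for putting the strongest text model in the middle."""
    def pipeline(audio_in: bytes) -> bytes:
        text_in = stt(audio_in)  # speech to text
        reply = llm(text_in)     # deep reasoning on plain text
        return tts(reply)        # reply text back to audio
    return pipeline
```

The oracle/supervisor idea from the post would break this strict sequencing: a fast realtime model handles the audio turn while a slower reasoning model is consulted in the background.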