Sourya Kakarla

@kylegawley skill issue of systems thinking every degree of bad system design compounds enshitification x.com/SahilBloom/sta…

QME

4

Sourya Kakarla@curious_queue·1h

@DanniFriedland yep!! it's worth investing in meta-systems/habits that make it easy to handle that while agents might help, sometimes non-tech paths like - using pen and paper - meditation can be better to detox and think clearly small errors & misalignment can compound fast

English

0

4

Danni Friedland@DanniFriedland·2h

Dont give up your mental model of the systems you own. This is what you're getting paid for eventually. You're not a "manager" of agents. You're not a "CTO" of agents. You're a IC using tools. Agents dont hold the mental model for you.

English

3

2

4

49

Sourya Kakarla@curious_queue·3h

@julien_c good reasoning + chained tool calling in my testing so far gave me my first taste of fully self-hosted agent that can handle non-trivial multistep task execution x.com/curious_queue/…

just had a magical open experience - deployed Qwen3.6-35B-A3B on a VM (with an A100 GPU) - set up a hermes agent profile to use this self-hosted model - gave it a task of sending me a whatsapp message (as i wanted to see if it was able to figure out how from my existing stack without receiving any explicit instructions of what was present where) ✅ it took 2-3 interventions from me to guide it in ambiguous situations but got the job done!! - this is the first time i got an openweights model whose inference I was managing end to end to execute a complex task using non-trivial amount of chained tool calls - seeing the reasoning and the trajectory of tool usage definitely felt impressive!! especially considering the baseline i was used to was gpt-5.4 xhigh in codex - tried gemma 4 recently for a similar experiment but its tool usage wasn't polished enough for this kind of open ended procedural discovery + execution (could be just model-tool-harness plumbing issues tbh) - though i've been tracking the general openweights capability progress over last few months passively, actually witnessing the agentic actions run with end-to-end control of the stack was surreal this was my "hello world" of implementing ~sovereign complex agentic actions. long way to go ofc. amped to to dig deep into self-hosted/openweights inference and tuning models/harnesses to be reliable enough to daily drive more. gg @TheAhmadOsman @NousResearch. amazing work on pushing forward the community discourse on open weights/harnesses.

English

New best local model for y'all 16GB-64GB rejoice, the chosen one has arrived. huggingface.co/Qwen/Qwen3.6-2…

4

134

Julien Chaumond@julien_c·1d

What’s the vibe check on this?

0xSero@0xSero

English

15

0

46

17.4K

Sourya Kakarla@curious_queue·9h

@odysseus0z yep!! wrote a bit about how the modelling of that eval/verification artifact in the task design is the majority of work these days in response to @karpathy's post: x.com/curious_queue/…

> 1) these domains offer explicit reward functions that are verifiable meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge) true! also, the actual act of fleshing out the verification artifact that the model can run in a loop (like your autoresearch and @GeoffreyHuntley's ralph loop) takes a very high bar when you consider the overall distribution of LLM/agent use if i want any work to be done by an agent reliably (not just wing/vibe it), my job is now to have a mental model of how to elicit that verification artifact verification by human senses is a bottle neck for the agents to leverage their actual superpower (relative to humans) of running stuff in a loop fast and checking (can be ofc scaled with parallel like in the recent claude mythos-glasswing project) while the actual implementation act of eliciting the verification artifact is made easy by the agents by taking care of the grunt work of writing code, knowing how to elicit that is still a lot of skillful mental work that acts as a filter for people feeling the AGI autonomous task execution by agents needs to be powered by verification engineering by humans like all the previous transitions (punch cards -> assembly -> C -> Java/Python -> Agents), feels like we are moving higher up in the abstractions and there is always *something* to be done by humans feel free to roast me if i got anything wrong :p would love to learn from the sensei :)

English

0

5

256

George@odysseus0z·1d

I told you guys. RL env = harness engineering = agentic coding = autoresearch. all are about eval/task design.

Ryan Lopopolo@_lopopolo

A neat thing we’ve been experimenting with: Codex workout sessions. Getting Codex to close the loop and validate its work is critical for higher complexity changes. To do that we want skills for high level workflows: “log in”, “upload file attachments and start a chat”, “grant this group access to a Workplace Agent”. To do this reliably, we’ve been getting Codex to iterate on its own skills by planting “flags” CTF-style in the UI and ralphing Codex using automations in the app, making commits to iteratively refine the skills after self reflection on each attempt. Capturing the flag is the win condition and from there codex optimized for reliability, wall clock time, and keeping up to the changing codebase. Put in the reps with your agents!

English

4

5

160

17.3K

Sourya Kakarla@curious_queue·10h

here's my codex skills folder: github.com/ma08/botfiles/… listing out some of my favourites: - all task related ones (`start-new-tast`, `save-task-status`, `continue-task`): make it easy to have task-specific context files like status document, intermediate artifacts organized in date based directories for easy human review and make it easy for another agent to pick up from - cross session ones (`cross-session-context`, `cross-session-message`): make it easy for parallel sessions to know about each other's state and message each other. useful for 1 orchestrator session and N executor sessions pattern. - `message-developer`: whatsapp me when something needs my attention during a long running task (not a deterministic hook; agent decision decision driven) - `sync-codex-claude-skills`: makes it easy to sync skills across codex and claude - `oracle`: consult gpt-5.4-pro using github.com/steipete/oracle for complex situations - `deep-research`: use a combination of deep research apis from openai, gemini, exa to do a generate a comprehensive deep research report from these multiple sources - `grill-me`: a customized version of the famous grill me skill from @mattpocockuk. mine uses the interactive `request_user_input` tool that provides different choices to pick from interactively with a recommended option including rationale of why recommended etc. - `ralph`: make it easy to create artifacts like spec files and ralph script to run a long running ralph loop (h/t @GeoffreyHuntley) for async implementation with self-verification - `update-coding-agent-preferences`: make it easy to update both CLAUDE.md and AGENTS.md together some of these could be a bit raw and upolished as i haven't cleaned them up for public launch/use yet

English

New best local model for y'all 16GB-64GB rejoice, the chosen one has arrived. huggingface.co/Qwen/Qwen3.6-2…

15

1.1K

neural nets.@cneuralnetwork·13h

send some good codex skills. md

English

13

3

290

18.1K

Sourya Kakarla@curious_queue·23h

just had a magical open experience - deployed Qwen3.6-35B-A3B on a VM (with an A100 GPU) - set up a hermes agent profile to use this self-hosted model - gave it a task of sending me a whatsapp message (as i wanted to see if it was able to figure out how from my existing stack without receiving any explicit instructions of what was present where) ✅ it took 2-3 interventions from me to guide it in ambiguous situations but got the job done!! - this is the first time i got an openweights model whose inference I was managing end to end to execute a complex task using non-trivial amount of chained tool calls - seeing the reasoning and the trajectory of tool usage definitely felt impressive!! especially considering the baseline i was used to was gpt-5.4 xhigh in codex - tried gemma 4 recently for a similar experiment but its tool usage wasn't polished enough for this kind of open ended procedural discovery + execution (could be just model-tool-harness plumbing issues tbh) - though i've been tracking the general openweights capability progress over last few months passively, actually witnessing the agentic actions run with end-to-end control of the stack was surreal this was my "hello world" of implementing ~sovereign complex agentic actions. long way to go ofc. amped to to dig deep into self-hosted/openweights inference and tuning models/harnesses to be reliable enough to daily drive more. gg @TheAhmadOsman @NousResearch. amazing work on pushing forward the community discourse on open weights/harnesses.

0xSero@0xSero

English

0

6

386

Sourya Kakarla أُعيد تغريده

0xSero@0xSero·1d

New best local model for y'all 16GB-64GB rejoice, the chosen one has arrived. huggingface.co/Qwen/Qwen3.6-2…

English

70

133

2.6K

183K

Sourya Kakarla@curious_queue·1d

@BoyuanChen0 @OpenAIDevs @tszzl when i prompted it to draw etymology tree for రూపం (rūpaṁ), it actually did great in using ప (pa) over వ (va) might've been a resolution thing in the above indic languages image i have a feeling i will abuse this model for etymology stuff sauce: chatgpt.com/share/69e8d623…

English

1

28

Sourya Kakarla@curious_queue·1d

@BoyuanChen0 @OpenAIDevs also cc @tszzl gaaru x.com/tszzl/status/1…

roon@tszzl

@madhuri_p_ @krishnabtwtr despite being like 100% white washed i still speak really good telugu

Deutsch

0

3

99

Sourya Kakarla@curious_queue·1d

The ChatGPT Images 2.0 release looks pretty impressive! Found a minor mistake in Telugu portion of the Indic languages image at openai.com/index/introduc… It's supposed to be రూపం with a gap in the second letter (en.wiktionary.org/wiki/%E0%B0%B0…) Not రూవం cc @BoyuanChen0 @OpenAIDevs

OpenAI@OpenAI

Stronger Across Languages ChatGPT Images 2.0 can produce images with non-English text that’s not only rendered correctly but with language that flows coherently. This makes the model more globally useful and helps people create visuals that work in the languages they actually use.

English

0

6

432

Sourya Kakarla@curious_queue·2d

maximizing tweet lurker value

mfers that bookmark a post but don't like it are a special breed

English

1

85

Sourya Kakarla@curious_queue·2d

wow! my botfiles repo is - not polished for public use - highly custom for my flows but got - 38 unique clones in 14d - 2 stars without ever "launching" it. was just linking its files as citations in agentic engineering threads. shows how unreal demand is for good agent use.

English

3

169

Sourya Kakarla@curious_queue·3d

@rsjagarlamudi lol momentarily yea it's aight, value provided will return and compound eventually like krishna bro says: just focus on doing the deed and leave the result to bro

English

2

25

Radha S Jagarlamudi@rsjagarlamudi·3d

@curious_queue That stings!

English

0

1

26

Sourya Kakarla@curious_queue·3d

mfers that bookmark a post but don't like it are a special breed

English

0

5

151

Sourya Kakarla@curious_queue·3d

@georgepickett 🤘

QME

1

14

George Pickett@georgepickett·3d

@curious_queue Awesome! See you there!

English

0

1

19

Sourya Kakarla أُعيد تغريده

George Pickett@georgepickett·3d

The 6th CodexSF meetup is happening April 28th at @WorkOS! If you're in SF, come by! I've massively evolved my Codex workflows in the last few months and I'm excited to share the latest. There are 4 demo slots available - if you want to demo your Codex workflows, RSVP below!

English

3

2

7

387

Sourya Kakarla@curious_queue·3d

@georgepickett just RSVPed :)

English

0

1

17

George Pickett@georgepickett·3d

luma.com/yn5ysjp2

ZXX

0

1

120

Sourya Kakarla@curious_queue·3d

pro-tip for anyone that wants to use gpt-5.4-pro, the most powerful model out there (that's open to use), for expert consultation in your typical codex/claude code/hermes/openclaw flows: use github.com/steipete/oracle with a dedicated skill

@0xSero @steipete's oracle CLI has been working great for me in consulting gpt-5.4-pro: github.com/steipete/oracle here's the skill I use to make codex play well with it: github.com/ma08/botfiles/…

English

2

197

kasey@kaseyklimes·3d

looking to learn if/how people are using hermes/openclaw agents at work, if you’re experimenting here let’s chat

English

4

2

10

1.1K

Sourya Kakarla@curious_queue·3d

@kaseyklimes hermes discord form factor helps me voice chat with the most powerful models like getting on a call with a teammate gg @NousResearch x.com/curious_queue/…

definitely, right now i've resorted to using the traditional cascade pipeline to be able to use the gpt-5.4 model i was okay with the increased latency to be able to chat with gpt-5.4 (cached inputs and its dynamic reasoning depth make it reasonably fast tbh) the discord voice channel experience in hermes is pretty good for this: #voice-messages" target="_blank" rel="nofollow noopener">hermes-agent.nousresearch.com/docs/user-guid… i had to fork it to add support for the STT provider i wanted though (kudos to @NousResearch's hermes agent that it was able modify its own code to make this happen) we can do much better than the cascade of STT->LLM->TTS by using some novel architectures like the oracle/supervisor that can bridge realtime audio experience with deep reasoning horsepower

English

@kaseyklimes x.com/curious_queue/…

36

Sourya Kakarla@curious_queue·3d

@hargup13 @Miles_Brundage custom cron jobs in hermes that supervise claude code/codex sessions and contextually nudge them forward worked well for me to battle this im sure this can be polished more though x.com/curious_queue/…

QME