( ✦﹏✦)

4.5K posts

@SynthSquid

idea collector

Joined September 2024
421 Following 88 Followers
( ✦﹏✦) retweeted
Viv
Viv@Vtrivedy10·
GEPA is <1 year old 😮 incredible the impact that the ideas here have had on hill climbing + improving agents

does anyone know of cool work on looping/GEPA/Optimize_Anything + RL?

main ideas:
- eventually harness opt hits the wall of model intelligence
- we can break through that wall by RLing on good evals that increase model ability in the eval domains
- new weights shape intelligence where an updated harness can better use these new weights
- loop

Model-Harness codesign is really interesting, we’re pushing here much more with using traces to create datasets for self-improvement and there’s some interesting work to do in marrying Harness Eng and RL recipes here 👀
Lakshya A Agrawal@LakshyAAAgrawal

How does prompt optimization compare to RL algos like GRPO? GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't. Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵

4
8
32
5.5K
( ✦﹏✦) retweeted
ρ:ɡeσn
ρ:ɡeσn@pigeon__s·
if i see a single person say "erm they didnt compare against opus 4.7 or gpt-5.5" i will actually kill you these are insane numbers
0
1
20
2.1K
( ✦﹏✦) retweeted
◥◣ N I C O L E T T E ◥◣
you aren't ready for what CODEX 5.5 + GPT 2 just unlocked

you can vibe code a complete pokemon game, with all CC0 assets (mine is @moggmon)

> Codex 5.5 coded all UI + battle logic
> GPT 2 generated all sprites, animation, SFX

ask me anything about the process #vibejam
37
31
378
26.8K
( ✦﹏✦) retweeted
Vals AI
Vals AI@ValsAI·
DeepSeek v4 is now the #1 open-weight model on our Vibe Code Benchmark, and it’s not close. It leaves the #2 (Kimi K2.6) in the dust, and even beats out frontier closed source models like Gemini 3.1 Pro.
25
112
1K
82.8K
( ✦﹏✦) retweeted
Lotto
Lotto@LottoLabs·
Full benchmarks for DS4 vs DS3.2

Flash seems very solid
3
2
28
1.4K
( ✦﹏✦) retweeted
DeepSeek
DeepSeek@deepseek_ai·
🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.

🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.

Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!

📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…

1/n
878
3.8K
21K
1.9M
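As a rough sanity check on the parameter counts in the DeepSeek post above, this sketch computes what share of each model's weights is active per token. The figures are copied from the post; the dictionary and function names are illustrative only and do not come from DeepSeek.

```python
# Parameter counts as stated in the announcement: (total, active).
# Names below are hypothetical helpers for illustration.
MODELS = {
    "DeepSeek-V4-Pro": (1.6e12, 49e9),
    "DeepSeek-V4-Flash": (284e9, 13e9),
}


def active_fraction(total: float, active: float) -> float:
    """Return the share of parameters that are active, as a fraction of total."""
    return active / total


for name, (total, active) in MODELS.items():
    print(f"{name}: {active_fraction(total, active):.1%} of parameters active")
```

On these numbers, Pro activates roughly 3.1% of its weights and Flash roughly 4.6%.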
( ✦﹏✦) retweeted
Paweł Huryn
Paweł Huryn@PawelHuryn·
Anthropic has quietly shipped third-party inference for Cowork and Code in Claude Desktop. This should work with local models or OpenRouter via LiteLLM proxy. Is it just me?
51
79
1.1K
673.4K
( ✦﹏✦) retweeted
Parallel Web Systems
The best web search for agents is now free. Upgrade to Parallel's web search tools in any MCP-supported tool or agent, for free, in under 60 seconds. No account. No API keys. Zero cost. docs.parallel.ai/integrations/m…
11
26
217
101.3K
proper
proper@ProperPrompter·
i don't like gpt 5.5
18
7
719
17.4K
Ankith 🐋/acc
Ankith 🐋/acc@dhtikna·
The 2x price increase basically made me lose all interest in gpt 5.5
4
0
23
1.8K
Robin Ebers | AI Coach for Founders
GPT 5.5 is now more expensive than Opus 4.7

remember when I posted this and everyone disagreed?

GPT-5: $1.25 / $10
GPT-5.1: $1.60 / $12
GPT-5.2: $2.00 / $14
GPT-5.3: $2.30 / $16
GPT-5.4: $2.80 / $18
GPT-5.5: $5.00 / $30

GPT-5 got more expensive with EVERY MODEL RELEASE and is now 3x (!!!) the price that it was when it first launched

we’re fucked
Robin Ebers | AI Coach for Founders@robinebers

everyone says AI models are getting cheaper

they're not

- gpt-5.2 is 40% more expensive than 5.1
- gemini 3 pro is 60% more expensive than 2.5 pro
- sonnet with 1M context: 2x the price

the world is slowly waking up to the fact that compute is expensive

... and you get what you pay for

36
3
97
17.1K
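The price list in the post above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming each pair is an (input, output) price in the same unit, which the post does not state. The names PRICES and multiple_vs_launch are hypothetical.

```python
# Price pairs copied from the post; treated as (input, output) by assumption.
PRICES = {
    "GPT-5": (1.25, 10.00),
    "GPT-5.1": (1.60, 12.00),
    "GPT-5.2": (2.00, 14.00),
    "GPT-5.3": (2.30, 16.00),
    "GPT-5.4": (2.80, 18.00),
    "GPT-5.5": (5.00, 30.00),
}


def multiple_vs_launch(prices: dict, base: str = "GPT-5") -> dict:
    """Return each model's (input, output) price as a multiple of the base model's."""
    base_in, base_out = prices[base]
    return {
        name: (round(p_in / base_in, 2), round(p_out / base_out, 2))
        for name, (p_in, p_out) in prices.items()
    }


for name, (m_in, m_out) in multiple_vs_launch(PRICES).items():
    print(f"{name}: input {m_in}x, output {m_out}x vs GPT-5")
```

On these figures the GPT-5.5 input price is 4x the GPT-5 one and the output price is 3x, so the post's "3x" matches the second column.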
( ✦﹏✦)
( ✦﹏✦)@SynthSquid·
@songjunkr every comment about dsv4 pushes the release date back by 1 minute
0
0
9
1K
송준 Jun Song
송준 Jun Song@songjunkr·
What happened to DeepSeek V4? Now is the perfect time for them to release it. Has anyone heard any news?
27
1
75
8.7K
( ✦﹏✦)
( ✦﹏✦)@SynthSquid·
I'm ready for spud-mini, probably releasing in 12 days (per gpt5.4 release vs mini)
0
0
0
28
Theo - t3.gg
Theo - t3.gg@theo·
GPT-5.5 (medium) is tied for SOTA on Artificial Analysis. GPT-5.5 (high) and GPT-5.5 (xhigh) are meaningfully ahead. xhigh is the first model to break the 50's
47
45
1.4K
68K
Alex Volkov
Alex Volkov@altryne·
@sharifshameem We actually tried this exact flow live on stream and the results were very meh. Image generated a crazy over-the-top design and 5.5 didn't even try to reimplement it :/
5
0
14
2.7K
Sharif Shameem
Sharif Shameem@sharifshameem·
GPT 5.5 is the greatest frontend model in the world when combined with imagegen 2. it can build working products that feel like they're from sci-fi movies. it can take my blog's URL and one shot a 32 page magazine. a world-class designer and coder for $20 a month is surreal
20
17
389
20.8K