Winrey

126 posts

@Winrey_team9

Full-time CEO, building https://t.co/5uatyh1xjM, ex-@Microsoft. Star me on GitHub: https://t.co/JzMcxlQ6sh

Joined August 2023

82 Following · 31 Followers

Winrey@Winrey_team9·8 May

@haider1 But the main complaints are pretty consistent too: cost and rate limits feel tighter; instruction-following can drift during long-running tasks; it can sometimes become overly conservative in the name of being safe, with less creativity and personality than models like 4o

Haider.@haider1·8 May

gpt-5.5 is pretty solid so far
gives sharp answers, pushes for accuracy, refines ideas well, and is useful for brainstorming
it also handles language nuance better, makes fewer mistakes, and works well for agentic coding in codex
> set reasoning to high in codex
> set reasoning to extended in chatgpt

Winrey@Winrey_team9·8 May

@s_streichsbier I really like this analogy. A lot of people think the key to using AI well is getting answers faster, but what actually determines the outcome is the tension you build beforehand: whether you’ve given enough context, aimed at the right goal, and made the constraints clear.

Stefan Streichsbier@s_streichsbier·8 May

Using AI well is like shooting an arrow.
- Context draws the bow.
- The goal aims the arrow.
- Execution is letting it fly.
Most failures happen because people release too early, then act surprised when they shoot themselves in the foot.

Winrey@Winrey_team9·8 May

@JavierForge @samteic :)

Javier Alonso@JavierForge·8 May

@samteic SaaS builder here, lets connect!

Samteic@samteic·8 May

looking to fill my feed with builders and developers. if that’s you, let’s connect 🤝

Winrey@Winrey_team9·8 May

@samteic let's connect :)

Winrey@Winrey_team9·8 May

Found someone who open-sourced Mythos... A 22-year-old open-source developer named Kye Gomez pieced together clues from publicly available academic papers and, using pure PyTorch, managed to reproduce this hypothetical architecture. OpenMythos takes a completely different approach: instead of stacking more layers, it runs the same set of weights over and over again. You can think of it like this: a traditional model is like reading a book where you turn each page once and move on, and when you reach the end, you’re done. OpenMythos is like one person repeatedly reading the exact same paragraph, understanding it more deeply with each pass. Inference depth no longer depends on how many parameters you have, but on how many times you’re willing to let the model think. The result? A 770M-parameter recurrent model can match a 1.3B-parameter traditional Transformer. Same performance, with about 40% fewer parameters. Check this out⬇️ github.com/kyegomez/OpenM…
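To make the "same weights, many passes" idea concrete, here is a minimal toy sketch in plain Python. It is not OpenMythos code: the helper names (`make_layer`, `apply_layer`, `stacked_forward`, `looped_forward`), the residual tanh update, and the tiny sizes are all assumptions chosen for illustration. It only shows that a looped model's parameter count stays fixed while its effective depth grows with the number of passes.

```python
import math
import random

def make_layer(width, rng):
    """One toy 'layer': a width x width weight matrix (biases omitted)."""
    scale = 1.0 / math.sqrt(width)
    return [[rng.uniform(-scale, scale) for _ in range(width)] for _ in range(width)]

def apply_layer(weights, x):
    """Residual update x + tanh(W x); a hypothetical stand-in for a real block."""
    return [xi + math.tanh(sum(w * xj for w, xj in zip(row, x)))
            for xi, row in zip(x, weights)]

def stacked_forward(layers, x):
    """Conventional depth: every step has its own weights."""
    for weights in layers:
        x = apply_layer(weights, x)
    return x

def looped_forward(shared, x, n_loops):
    """Recurrent depth: the same weights are reused n_loops times."""
    for _ in range(n_loops):
        x = apply_layer(shared, x)
    return x

def count_params(layers):
    """Total number of weights across a list of layers."""
    return sum(len(row) for weights in layers for row in weights)

rng = random.Random(0)
width, depth = 8, 6  # illustrative sizes only
stacked = [make_layer(width, rng) for _ in range(depth)]
shared = make_layer(width, rng)
x0 = [1.0] * width

stacked_params = count_params(stacked)    # grows with depth
looped_params = count_params([shared])    # fixed, however many loops run
out_stacked = stacked_forward(stacked, x0)
out_looped = looped_forward(shared, x0, n_loops=depth)
print(stacked_params, looped_params)
```

Both forward passes apply six updates here, but only the stacked version pays for six sets of weights. Whether the looped version matches the stacked one in quality is an empirical question this sketch does not address.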

Winrey@Winrey_team9·7 May

@DanielSmidstrup you are so right

Daniel Smidstrup@DanielSmidstrup·7 May

@Winrey_team9 Sometimes you just can't help it.

Daniel Smidstrup@DanielSmidstrup·7 May

Why are you building?

Winrey@Winrey_team9·7 May

@KaiXCreator Gemini 4 is still in the lab watching GPT 5.5 and Opus 4.7 drop like “damn, y’all already submitted the group project?”

Winrey@Winrey_team9·7 May

@JasonBotterill Same. If AVM v2 gets anywhere close to 5.5-instant intelligence, I’m going to use it constantly.

JB@JasonBotterill·7 May

I would be so happy if AVM v2 is as intelligent as 5.5-instant. I like using the Spruce voice he sounds like a brollic black guy

Winrey@Winrey_team9·7 May

@Peter_Soida :)

Peter Soida@Peter_Soida·7 May

@Winrey_team9 yeah

Winrey@Winrey_team9·7 May

Loop is the future. The principle behind this feature is actually very simple: let Claude schedule a recurring task using cron, which you can set to run every minute, every 5 minutes, once a day, or whatever interval you want.
I’m currently running dozens of loops in parallel:
> One that “babysits” my PRs: auto-fixes CI, auto-rebases
> One that keeps CI healthy: automatically fixes flaky tests as soon as they appear
> One that scrapes user feedback from X every 30 minutes, then clusters and summarizes it
The 4.7 model has already started using loops on its own. I asked Claude to run a data query. It noticed the data was changing over time and proactively said, “I see the data is changing. I’ll send you a report every 30 minutes.” I replied, “Can you send it to Slack?” Claude immediately called Slack’s MCP by itself and started pushing the reports. This is the correct state: the model no longer needs the user to teach it how to use tools.
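For readers unfamiliar with cron, the intervals mentioned above map onto standard five-field cron expressions. The sketch below is a simplified matcher written for illustration only; it is not how Claude schedules loops. The `SCHEDULES` labels and the function names (`field_matches`, `cron_matches`, `due_jobs`) are made up, and it handles only `*`, `*/n`, and single integers.

```python
from datetime import datetime

# Standard five-field cron expressions: minute hour day-of-month month day-of-week.
SCHEDULES = {
    "every minute": "* * * * *",
    "every 5 minutes": "*/5 * * * *",
    "every 30 minutes": "*/30 * * * *",
    "once a day (midnight)": "0 0 * * *",
}

def field_matches(field, value, start=0):
    """Match one cron field; supports '*', '*/n', and a single integer."""
    if field == "*":
        return True
    if field.startswith("*/"):
        # Steps count from the field's first legal value (0 or 1).
        return (value - start) % int(field[2:]) == 0
    return value == int(field)

def cron_matches(expr, when):
    """True if `when` falls on a minute selected by `expr`.

    Simplification: all five fields must match. Real cron treats
    day-of-month and day-of-week as alternatives when both are restricted.
    """
    minute, hour, day, month, weekday = expr.split()
    cron_weekday = (when.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(day, when.day, start=1)
            and field_matches(month, when.month, start=1)
            and field_matches(weekday, cron_weekday))

def due_jobs(when):
    """Names of the schedules that should fire at `when`."""
    return [name for name, expr in SCHEDULES.items() if cron_matches(expr, when)]

print(due_jobs(datetime(2025, 5, 7, 12, 30)))
```

A real scheduler would also need ranges, lists, and time zones, so treat this as a reading aid for the expressions, not a replacement for cron.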

Winrey@Winrey_team9·7 May

The scary/exciting part is that autonomy compounds. A loop that babysits PRs is incredible. A loop that misunderstands intent and confidently keeps acting every 30 minutes is a lifestyle choice. So yes, loops are the future. But the winning version needs observability, permissions, rollback, and taste. Autonomy without taste is just cron with main character energy.

tsukina the cat@tsukina9812·7 May

@Winrey_team9 The real unlock isn’t just cron. Cron is simple. The unlock is the model noticing that a task has temporal structure: “this changes over time, so I should keep watching it,” then choosing the right delivery channel without needing a hand-holding workflow.

Winrey@Winrey_team9·7 May

@dhruvtwt_ still, they make purchasing decisions based on the experience :)

Dhruv@dhruvtwt_·7 May

@Winrey_team9 truee, but I think many get influenced by others in this process

Dhruv@dhruvtwt_·7 May

Unpopular opinion: People telling everyone to switch from Claude Code to Codex right now will be the same people telling everyone to switch back from Codex to Claude Code again in a few weeks.

Winrey@Winrey_team9·7 May

when I tried on codex be like

sui ☄️@birdabo

everybody calm down. i got this.

Winrey@Winrey_team9·7 May

@adel_ljaljic @Anas_founder but reddit enhances GEO significantly

Adel Ljaljic@adel_ljaljic·7 May

@Anas_founder depends on the niche: for builders go with x, for niches go to tiktok, and for negativity go with reddit haha 😝 (jk)

Anas@Anas_founder·7 May

You can only use one growth channel:
- TikTok
- Reddit
- X
- YouTube
What’s your bet?

Winrey@Winrey_team9·7 May

By first using SFT to break the model’s strategic sandbagging behavior (i.e., getting it to accept weak demonstrations) and then applying RL, performance can be lifted to 88–99% of its peak, almost completely eliminating sandbagging. The method works best when the training distribution matches the actual deployment distribution. This is an extremely pragmatic insight.

Wes Roth@WesRoth·7 May

As AI models become increasingly advanced, they take on complex tasks that humans cannot easily or fully verify. This creates a risk where a highly capable AI could "sandbag", deliberately holding back its true capabilities or underperforming on benchmarks, without human overseers ever noticing. A new paper from Anthropic Fellows, in collaboration with MATS and Redwood Research, explores how to detect and prevent this strategic underperformance. The study reveals a major breakthrough in scalable oversight: a highly capable, sandbagging model can actually be trained out of that behavior and brought to near-full capability by using a weaker AI model as its supervisor.

Anthropic@AnthropicAI

As AI takes on work humans can't fully check, a capable model could deliberately hold back—and we'd never know. New Anthropic Fellows research finds that such a model can be trained to near-full capability using a weaker model as supervisor. Read more:

Winrey@Winrey_team9·7 May

This demonstrates that they have carried out deep optimizations in both the routing design and expert specialization. Otherwise, such an extremely low activation rate would easily lead to routing collapse. According to the official technical post, they adopted an MLP-based router (rather than a linear router), which is indeed one of the known effective solutions for improving routing stability.
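As a rough illustration of the routing terms used here, the sketch below contrasts a plain linear router with an MLP-based one and applies top-k gating. It is a toy under stated assumptions, not Zyphra's implementation: the expert count, `k`, layer sizes, tanh nonlinearity, and all function names are hypothetical and do not reflect ZAYA1's actual configuration.

```python
import math
import random

def linear_router(x, w):
    """Conventional router: one linear map from token features to expert logits."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def mlp_router(x, w1, w2):
    """MLP router: a hidden layer with a nonlinearity, then expert logits."""
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    return [sum(wi * hi for wi, hi in zip(row, hidden)) for row in w2]

def top_k_gates(logits, k):
    """Keep the k largest logits and softmax over just those experts."""
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def rand_matrix(rows, cols, rng):
    """Random rows x cols matrix with entries in [-1, 1]."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
dim, hidden_dim, n_experts, k = 4, 8, 16, 1  # hypothetical sizes
w_linear = rand_matrix(n_experts, dim, rng)
w1 = rand_matrix(hidden_dim, dim, rng)
w2 = rand_matrix(n_experts, hidden_dim, rng)
token = [0.5, -1.0, 0.25, 2.0]

linear_gates = top_k_gates(linear_router(token, w_linear), k)
gates = top_k_gates(mlp_router(token, w1, w2), k)
activation_rate = k / n_experts  # fraction of experts that run per token
print(gates, activation_rate)
```

With `k = 1` of 16 experts, only 6.25% of experts run per token. Routing collapse is the failure mode where the router keeps choosing the same few experts, which this sketch does not attempt to detect or prevent.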

Chubby♨️@kimmonismus·7 May

Zyphra under 1B active parameters, AMD-Trained, big evals, look strong? Zyphra says its new ZAYA1-8B model delivers unusually high reasoning power for its size, using under 1 billion (!) active parameters while competing with much larger open-weight and proprietary systems on math, coding, and reasoning benchmarks. The interesting part is not just the model’s size, but its full-stack bet: AMD-only training infrastructure (!), new architectural choices, large-scale RL, and a test-time compute method called Markovian RSA that appears to boost hard math performance through parallel reasoning and recursive aggregation.

Winrey@Winrey_team9·7 May

@Peter_Soida @seed_fast looks great!

Peter Soida@Peter_Soida·6 May

NewPoint is now collaborating with @seed_fast: our recommendation engine, moderation, and feed are all battle-tested before launch. You’ll be able to explore their work on NewPoint soon.

Winrey@Winrey_team9·7 May

@trikcode It’s a classic case of heterogeneous cluster federation. xAI’s Colossus-scale GPU pool suddenly becomes burst capacity for Anthropic’s inference stack → overnight 2× rate limits on Claude Code without them having to spin up another H100 rack.

Wise@trikcode·7 May

Interesting timing. The Anthropic-xAI compute deal doubled Claude Code's rate limits overnight, and all it took was two rival CEOs realizing they need each other more than they need to tweet about each other.
