Winrey

126 posts

@Winrey_team9

Full-time CEO, building https://t.co/5uatyh1xjM, ex-@Microsoft. Star me on GitHub: https://t.co/JzMcxlQ6sh

Joined August 2023
82 Following · 31 Followers

Winrey (@Winrey_team9)
@haider1 But the main complaints are pretty consistent too: cost and rate limits feel tighter; instruction-following can drift during long-running tasks; and it can sometimes become overly conservative in the name of safety, with less creativity and personality than models like 4o.

Haider. (@haider1)
gpt-5.5 is pretty solid so far. it gives sharp answers, pushes for accuracy, refines ideas well, and is useful for brainstorming. it also handles language nuance better, makes fewer mistakes, and works well for agentic coding in codex
> set reasoning to high in codex
> set reasoning to extended in chatgpt

Winrey (@Winrey_team9)
@s_streichsbier I really like this analogy. A lot of people think the key to using AI well is getting answers faster, but what actually determines the outcome is the tension you build beforehand: whether you’ve given enough context, aimed at the right goal, and made the constraints clear.

Stefan Streichsbier (@s_streichsbier)
Using AI well is like shooting an arrow.
- Context draws the bow.
- The goal aims the arrow.
- Execution is letting it fly.
Most failures happen because people release too early, then act surprised when they shoot themselves in the foot.

Samteic (@samteic)
looking to fill my feed with builders and developers. if that’s you, let’s connect 🤝

Winrey (@Winrey_team9)
Someone open-sourced Mythos...

A 22-year-old open-source developer named Kye Gomez pieced together clues from publicly available academic papers and, using pure PyTorch, managed to reproduce this hypothetical architecture.

OpenMythos takes a completely different approach: instead of stacking more layers, it runs the same set of weights over and over again.

You can think of it like this: a traditional model reads a book one page at a time; you turn a page, move on to the next, and when you finish, it's done. OpenMythos is like one person repeatedly reading the exact same paragraph, understanding it more deeply with each pass. Inference depth no longer depends on how many parameters you have, but on how many times you're willing to let the model think.

The result? A 770M-parameter recurrent model can match a 1.3B-parameter traditional Transformer: the same performance with roughly 40% fewer parameters.

Check this out⬇️ github.com/kyegomez/OpenM…
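To make the "same weights, many passes" idea concrete, here is a toy sketch in plain Python. It is not OpenMythos code: a single weight matrix stands in for a full transformer block, and re-injecting the input `x` on every pass is an assumption borrowed from common recurrent-depth designs. All function names are hypothetical. The point is that the looped model's parameter count stays fixed while its depth is chosen at run time, whereas a stacked model pays for every layer.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def init_block(dim: int, seed: int = 0) -> Matrix:
    """One weight matrix standing in for a whole block (toy assumption)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    return [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]


def block_step(w: Matrix, h: Vector, x: Vector) -> Vector:
    """h_next = tanh(W h + x): the input x is re-injected on every pass."""
    return [math.tanh(sum(wij * hj for wij, hj in zip(row, h)) + xi)
            for row, xi in zip(w, x)]


def looped_forward(w: Matrix, x: Vector, loops: int) -> Vector:
    """Apply the SAME weights `loops` times; depth is a runtime choice."""
    h = [0.0] * len(x)
    for _ in range(loops):
        h = block_step(w, h, x)
    return h


def stacked_forward(layers: List[Matrix], x: Vector) -> Vector:
    """Baseline: every pass uses its own, separately stored weights."""
    h = [0.0] * len(x)
    for w in layers:
        h = block_step(w, h, x)
    return h


def n_params(matrices: List[Matrix]) -> int:
    return sum(len(m) * len(m[0]) for m in matrices)


dim = 8
x = [0.1 * i for i in range(dim)]
shared = init_block(dim)
stack = [init_block(dim, seed=s) for s in range(12)]

for loops in (1, 4, 12):
    out = looped_forward(shared, x, loops)
    print(f"loops={loops:2d} params={n_params([shared])} out[:3]={[round(v, 3) for v in out[:3]]}")
print(f"stacked 12 layers params={n_params(stack)}")
```

Running 12 loops costs 12 passes of compute but stores only 64 weights here, versus 768 for the 12-layer stack; that trade of compute for parameters is what the post describes.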

Winrey (@Winrey_team9)
@KaiXCreator Gemini 4 is still in the lab watching GPT 5.5 and Opus 4.7 drop like “damn, y’all already submitted the group project?”

Winrey (@Winrey_team9)
@JasonBotterill Same. If AVM v2 gets anywhere close to 5.5-instant intelligence, I’m going to use it constantly.

JB (@JasonBotterill)
I would be so happy if AVM v2 is as intelligent as 5.5-instant. I like using the Spruce voice; he sounds like a brolic black guy

Winrey (@Winrey_team9)
Loop is the future.

The principle behind this feature is actually very simple: let Claude schedule a recurring task using cron, which you can set to run every minute, every 5 minutes, once a day, or at whatever interval you want.

I'm currently running dozens of loops in parallel:
> One that "babysits" my PRs: auto-fixes CI, auto-rebases
> One that keeps CI healthy: automatically fixes flaky tests as soon as they appear
> One that scrapes user feedback from X every 30 minutes, then clusters and summarizes it

The 4.7 model has already started using loops on its own. I asked Claude to run a data query. It noticed the data was changing over time and proactively said, "I see the data is changing. I'll send you a report every 30 minutes." I replied, "Can you send it to Slack?" Claude immediately called the Slack MCP server by itself and started pushing the reports.

This is the right state: the model no longer needs the user to teach it how to use tools.
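For readers who haven't used cron, here is a minimal sketch of how a cron-style schedule decides when a loop fires. It is not Claude's scheduler: it is a simplified matcher that only evaluates the minute and hour fields (day-of-month, month, and day-of-week are treated as `*`), and the loop names and intervals in `LOOPS` are hypothetical, except the 30-minute feedback interval taken from the post.

```python
from typing import Dict


def field_matches(field: str, value: int) -> bool:
    """Match one cron field: '*', '*/n', 'a', 'a-b', or a comma list of those."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if value % int(part[2:]) == 0:   # step values counted from 0
                return True
        elif "-" in part:
            lo, hi = (int(p) for p in part.split("-", 1))
            if lo <= value <= hi:
                return True
        elif int(part) == value:
            return True
    return False


def is_due(expr: str, hour: int, minute: int) -> bool:
    """Check the minute and hour fields of a 5-field cron expression.

    Simplification: the remaining three fields are ignored (treated as '*').
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}: {expr!r}")
    return field_matches(fields[0], minute) and field_matches(fields[1], hour)


def runs_per_day(expr: str) -> int:
    """How many times the schedule fires across one 24-hour day."""
    return sum(is_due(expr, h, m) for h in range(24) for m in range(60))


# Hypothetical schedules; only the 30-minute feedback interval is from the post.
LOOPS: Dict[str, str] = {
    "babysit-prs": "*/5 * * * *",
    "fix-flaky-tests": "*/15 * * * *",
    "scrape-x-feedback": "*/30 * * * *",
    "daily-digest": "0 9 * * *",
}

if __name__ == "__main__":
    for name, expr in LOOPS.items():
        print(f"{name:20s} {expr:15s} fires {runs_per_day(expr)} times/day")
```

An every-30-minutes loop fires 48 times a day, and a 5-minute loop fires 288 times, which is why "dozens of loops in parallel" adds up quickly.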

Winrey (@Winrey_team9)
The scary/exciting part is that autonomy compounds. A loop that babysits PRs is incredible. A loop that misunderstands intent and confidently keeps acting every 30 minutes is a lifestyle choice. So yes, loops are the future. But the winning version needs observability, permissions, rollback, and taste. Autonomy without taste is just cron with main character energy.
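One way to picture "permissions, rollback, and observability" around a loop is a small wrapper like the sketch below. It is a hypothetical illustration, not any product's API: `GuardedLoop`, the action names, and the pause-after-N-failures rule are assumptions. Each step is checked against an allowlist, every outcome is written to an audit log, failed steps are undone, and repeated failures pause the loop instead of letting it keep acting on a bad plan.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class GuardedLoop:
    """Hypothetical guard: allowlist, audit log, rollback, circuit breaker."""
    allowed_actions: Set[str]
    max_consecutive_failures: int = 3
    log: List[str] = field(default_factory=list)
    failures: int = 0
    paused: bool = False

    def step(self, action: str, do: Callable[[], None], undo: Callable[[], None]) -> bool:
        if self.paused:
            self.log.append(f"skip {action}: loop paused")
            return False
        if action not in self.allowed_actions:          # permissions
            self.log.append(f"deny {action}: not permitted")
            return False
        try:
            do()
        except Exception as exc:
            undo()                                      # rollback partial changes
            self.failures += 1
            self.log.append(f"rollback {action}: {exc}")
            if self.failures >= self.max_consecutive_failures:
                self.paused = True                      # stop acting on a bad plan
                self.log.append("pause: too many consecutive failures")
            return False
        self.failures = 0
        self.log.append(f"ok {action}")                 # observability
        return True


# Usage with toy actions on an in-memory "repo state".
state: List[str] = []
loop = GuardedLoop(allowed_actions={"rebase_pr"}, max_consecutive_failures=2)


def rebase() -> None:
    state.append("rebased")


def undo_last() -> None:
    if state:
        state.pop()


def broken_fix() -> None:
    state.append("half-done")
    raise RuntimeError("CI still red")


loop.step("rebase_pr", rebase, undo_last)          # allowed, succeeds
loop.step("force_push_main", rebase, undo_last)    # denied by the allowlist
loop.step("rebase_pr", broken_fix, undo_last)      # fails, rolled back
loop.step("rebase_pr", broken_fix, undo_last)      # fails again, loop pauses
print(state, loop.paused)
print("\n".join(loop.log))
```

The design choice is to fail closed: a loop that keeps misunderstanding intent ends up paused with a readable log, rather than acting every 30 minutes.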

tsukina the cat (@tsukina9812)
@Winrey_team9 The real unlock isn’t just cron. Cron is simple. The unlock is the model noticing that a task has temporal structure: “this changes over time, so I should keep watching it,” then choosing the right delivery channel without needing a hand-holding workflow.

Winrey (@Winrey_team9)
@dhruvtwt_ Still, they make purchasing decisions based on the experience :)

Dhruv (@dhruvtwt_)
@Winrey_team9 truee, but I think many get influenced by others in this process

Dhruv (@dhruvtwt_)
Unpopular opinion: People telling everyone to switch from Claude Code to Codex right now will be the same people telling everyone to switch back from Codex to Claude Code again in a few weeks.

Adel Ljaljic (@adel_ljaljic)
@Anas_founder Depends on the niche: for builders go with X, for niches go to TikTok, and for negativity go with Reddit haha 😝 (jk)

Anas (@Anas_founder)
You can only use one growth channel:
- TikTok
- Reddit
- X
- YouTube
What's your bet?

Winrey (@Winrey_team9)
By first using SFT to break the model's strategic sandbagging behavior (i.e., getting it to accept weak demonstrations) and then applying RL, performance can be lifted to 88–99% of its peak, almost completely eliminating sandbagging. The method works best when the training distribution matches the actual deployment distribution. This is an extremely pragmatic insight.

Wes Roth (@WesRoth)
As AI models become increasingly advanced, they take on complex tasks that humans cannot easily or fully verify. This creates a risk where a highly capable AI could "sandbag", deliberately holding back its true capabilities or underperforming on benchmarks, without human overseers ever noticing. A new paper from Anthropic Fellows, in collaboration with MATS and Redwood Research, explores how to detect and prevent this strategic underperformance. The study reveals a major breakthrough in scalable oversight: a highly capable, sandbagging model can actually be trained out of that behavior and brought to near-full capability by using a weaker AI model as its supervisor.

Quoting Anthropic (@AnthropicAI):
As AI takes on work humans can't fully check, a capable model could deliberately hold back—and we'd never know. New Anthropic Fellows research finds that such a model can be trained to near-full capability using a weaker model as supervisor. Read more:

Winrey (@Winrey_team9)
This demonstrates that they have carried out deep optimizations in both the routing design and expert specialization. Otherwise, such an extremely low activation rate would easily lead to routing collapse. According to the official technical post, they adopted an MLP-based router (rather than the usual linear router), which is indeed one of the known effective ways to improve routing stability.
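For context on the numbers: assuming the "8B" in ZAYA1-8B refers to total parameters, under 1B active means less than about 1/8 (12.5%) of the weights are used per token. The sketch below contrasts a linear router (`logits = W @ x`) with a two-layer MLP router and measures per-expert load, a common way to spot routing collapse (one expert soaking up most tokens instead of roughly `1/n_experts` each). It is a toy in plain Python under those assumptions, not Zyphra's router; all names and sizes are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(logits: Vector) -> Vector:
    top = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class MLPRouter:
    """Hypothetical two-layer router: logits = W2 @ relu(W1 @ x).

    A plain linear router would compute just logits = W @ x.
    """

    def __init__(self, d_model: int, d_hidden: int, n_experts: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n_experts = n_experts
        self.w1 = rand_matrix(d_hidden, d_model, rng)
        self.w2 = rand_matrix(n_experts, d_hidden, rng)

    def route(self, x: Vector, k: int) -> List[Tuple[int, float]]:
        hidden = [max(0.0, h) for h in matvec(self.w1, x)]   # ReLU nonlinearity
        probs = softmax(matvec(self.w2, hidden))
        top_k = sorted(range(self.n_experts), key=lambda i: probs[i], reverse=True)[:k]
        norm = sum(probs[i] for i in top_k)                   # renormalize gates over chosen experts
        return [(i, probs[i] / norm) for i in top_k]


def expert_load(router: MLPRouter, tokens: List[Vector], k: int) -> List[float]:
    """Fraction of routing slots each expert receives (1/n_experts each = balanced)."""
    counts = [0] * router.n_experts
    for x in tokens:
        for i, _ in router.route(x, k):
            counts[i] += 1
    total = len(tokens) * k
    return [c / total for c in counts]


rng = random.Random(42)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(256)]
router = MLPRouter(d_model=16, d_hidden=32, n_experts=16)
loads = expert_load(router, tokens, k=1)
print("max load:", round(max(loads), 3), "uniform:", round(1 / 16, 3))
```

In a real MoE the load vector would be monitored during training (often alongside an auxiliary balancing loss); a max load far above the uniform value is the collapse symptom the post refers to.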

Chubby♨️ (@kimmonismus)
Zyphra: under 1B active parameters, AMD-trained, big evals. Looks strong?

Zyphra says its new ZAYA1-8B model delivers unusually high reasoning power for its size, using under 1 billion (!) active parameters while competing with much larger open-weight and proprietary systems on math, coding, and reasoning benchmarks.

The interesting part is not just the model's size, but its full-stack bet: AMD-only training infrastructure (!), new architectural choices, large-scale RL, and a test-time compute method called Markovian RSA that appears to boost hard math performance through parallel reasoning and recursive aggregation.

Peter Soida (@Peter_Soida)
NewPoint is now collaborating with @seed_fast: our recommendation engine, moderation, and feed are all battle-tested before launch. You'll be able to explore their work on NewPoint soon.

Winrey (@Winrey_team9)
@trikcode It’s a classic case of heterogeneous cluster federation. xAI’s Colossus-scale GPU pool suddenly becomes burst capacity for Anthropic’s inference stack → overnight 2× rate limits on Claude Code without them having to spin up another H100 rack.
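What "2× rate limits" means mechanically depends on how a provider meters usage, and nothing here says how Claude Code actually does it. As an illustration only, the sketch below uses a generic token bucket, a common rate-limiting scheme: the bucket allows bursts up to `capacity` and refills at `refill_per_sec`. Doubling the refill rate (for example, because more compute is available) roughly doubles sustained throughput while leaving the burst size unchanged. All parameters are hypothetical.

```python
from typing import List


class TokenBucket:
    """Generic token-bucket limiter: bursts up to `capacity`, sustained `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity      # start with a full bucket
        self.last = 0.0             # timestamp of the previous request, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the time elapsed since the last request, never above capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def count_accepted(bucket: TokenBucket, times: List[float]) -> int:
    return sum(1 for t in times if bucket.allow(t))


# A client firing 8 requests per second for one minute (times are exact binary fractions).
requests = [i * 0.125 for i in range(480)]
baseline = count_accepted(TokenBucket(capacity=10, refill_per_sec=1.0), requests)
doubled = count_accepted(TokenBucket(capacity=10, refill_per_sec=2.0), requests)
print(baseline, doubled)  # 69 129
```

Both buckets admit the same initial burst of about 10 requests; after that, the doubled refill admits about 2 per second instead of 1, which accounts for the extra 60 requests over the minute.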

Wise (@trikcode)
Interesting timing. The Anthropic-xAI compute deal doubled Claude Code's rate limits overnight, and all it took was two rival CEOs realizing they need each other more than they need to tweet about each other.