Joyce

235 posts

@joyceerhl

product @cerebras prev eng @code ✨ opinions mine

San Francisco · Joined June 2019
274 Following · 522 Followers
Joyce reposted
Michael Magán @mrmagan_
generative user interfaces at the speed of thought. you can now build "tab autocomplete" for every app. ultra-fast inference from @cerebras & your components rendered by the @tambo_ai agent.
11 replies · 16 reposts · 219 likes · 24K views
Joyce reposted
Tibo @thsottiaux
We’ve made GPT-5.3-Codex-Spark about 30% faster. It is now serving at over 1200 tokens per second. More to come on speed across the board.
212 replies · 117 reposts · 2.6K likes · 348.7K views
Joyce reposted
DHH @dhh
@kevinwestmx @Zai_org GLM-4.7 on @cerebras is insane. I was impressed with the model's performance on my sample test, but I was BLOWN AWAY by the speed. A real window into the future of AI.
13 replies · 16 reposts · 223 likes · 63.5K views
Joyce reposted
OpenAI @OpenAI
GPT-5.3-Codex-Spark is now in research preview. You can just build things—faster.
597 replies · 642 reposts · 5.8K likes · 1.5M views
Joyce @joyceerhl
@burkeholland my kingdom for an alt modifier to engage shell mode
0 replies · 0 reposts · 0 likes · 19 views
Burke Holland @burkeholland
Now that we're all coding in terminals this is my life
2 replies · 0 reposts · 14 likes · 2.3K views
Joyce reposted
Andrew Feldman @andrewdfeldman
@OpenAI and @Cerebras have signed a multi-year agreement to deploy 750 megawatts of Cerebras wafer-scale systems to serve OpenAI customers. This has been a decade in the making. Deployment begins in early 2026, and when fully rolled out, it will be the largest high-speed AI inference deployment in the world.

OpenAI and Cerebras were both founded in 2015 with radically ambitious goals. OpenAI set out to build the software that would push AI toward general intelligence. Cerebras set out to rethink computing hardware from first principles. Our teams met as far back as 2017. We shared ideas, early work, and a common belief: there would come a point when model scale and hardware architecture would have to converge. That point has arrived.

ChatGPT set the direction for the entire industry. It showed the world what AI could be. Now we're in the next phase - not proving capability, but delivering it at global scale.

The history of technology is clear on one thing: speed drives adoption. The PC industry didn't operate at kilohertz. The internet didn't change the world on dial-up. AI is no different. As models grow more capable, speed becomes the bottleneck. Slow systems limit what users can do, how often they engage, and whether AI becomes infrastructure or remains a novelty.

Cerebras was built for this moment. By keeping computation and memory on a single wafer-scale processor, we eliminate the data-movement penalties that dominate GPU systems. The result is up to 15× faster inference, without sacrificing model size or accuracy. That speed changes product design, user behavior, and ultimately productivity. For consumers, it means AI that feels instantaneous. For the economy, it means agents that can finally drive serious productivity growth.

For Cerebras, 2026 will be a defining year. With this collaboration with OpenAI, Cerebras' wafer-scale technology will reach hundreds of millions - and eventually billions - of users. We're proud to work alongside OpenAI to bring fast, frontier AI to people around the world. This is what a decade of long-term thinking looks like.
56 replies · 70 reposts · 498 likes · 157.3K views
Joyce reposted
Nathan Lambert @natolambert
The combo of improvements in reasoning efficiency (fewer tokens per answer, still a very new research area) and faster chips is going to make coding agents so, so much faster in 6-12 months. The products in 2+ years will feel approximately instantaneous relative to today.
9 replies · 5 reposts · 165 likes · 17.5K views
Joyce reposted
Vercel Developers @vercel_dev
You can now use GLM-4.7 through Cerebras on AI Gateway.
Cerebras @cerebras (quoted tweet)
GLM-4.7 from @Zai_org is live on Cerebras!
- Frontier intelligence for coding, tool-driven agents, and multi-turn reasoning
- Record coding speed: ~1,000 tokens per second (up to 1,700 TPS for other uses)
- Strong price-performance: ~10x higher than Sonnet 4.5

3 replies · 7 reposts · 65 likes · 9.4K views
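A minimal sketch of such a Gateway call, assuming AI Gateway exposes an OpenAI-compatible chat completions endpoint; the endpoint URL and the `zai/glm-4.7` model slug are assumptions to verify against the Gateway model catalog, as is the mechanism for pinning routing to Cerebras.

```python
# Hedged sketch: calling GLM-4.7 through Vercel AI Gateway, assuming an
# OpenAI-compatible chat completions endpoint. The URL and the "zai/glm-4.7"
# model slug are assumptions; check the AI Gateway docs for the exact
# identifier and for how to route requests to Cerebras specifically.
import os

import requests

resp = requests.post(
    "https://ai-gateway.vercel.sh/v1/chat/completions",  # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['AI_GATEWAY_API_KEY']}"},
    json={
        "model": "zai/glm-4.7",  # placeholder slug
        "messages": [{"role": "user", "content": "Summarize what changed in this diff."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```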
SPS @spsbuilds
@cerebras @Zai_org Does the API support tool calling with structured output?
1 reply · 0 reposts · 0 likes · 891 views
Joyce reposted
Cerebras @cerebras
GLM-4.7 from @Zai_org is live on Cerebras!
- Frontier intelligence for coding, tool-driven agents, and multi-turn reasoning
- Record coding speed: ~1,000 tokens per second (up to 1,700 TPS for other uses)
- Strong price-performance: ~10x higher than Sonnet 4.5
94 replies · 123 reposts · 1.4K likes · 134.4K views
Joyce @joyceerhl
@AI_GPT42 @cerebras @Zai_org 👋 By default, GLM 4.7 on Cerebras will use reasoning. You can opt to disable reasoning by setting `disable_reasoning: true` in your request.
0 replies · 0 reposts · 0 likes · 39 views
Joyce @joyceerhl
@AonSayyed @cerebras @Zai_org 👋 we support a 131K context window for GLM 4.7, and these are the full model weights (non-REAPed).
0 replies · 0 reposts · 1 like · 54 views
Joyce @joyceerhl
@gustojs @cerebras @Zai_org 👋 Caching and interleaved thinking are both supported. For optimal caching perf, we recommend setting `clear_thinking=false` in your requests. Learn more: inference-docs.cerebras.ai/api-reference/…
0 replies · 0 reposts · 1 like · 43 views
Wali Mohammad Kadri @_wmk0_
Hello @pierceboggan 👋🏻 When are we getting some better "0x" models in Copilot chat? There are better, cheaper & open-source alternatives; why not provide them?
2 replies · 0 reposts · 2 likes · 895 views