Nick

232 posts

@nickwal

just emergent behavior

Joined June 2014
902 Following · 178 Followers
Leonard Tang
Leonard Tang@leonardtang_·
Hello MJ1: The World's TASTIEST Judge Model
Agent verification is the bottleneck to AI's progress. The field's ability to verify visual output lags far behind that of text, especially in matters of ~taste~. So we built the world's tastiest multimodal judge model, MJ1.
10
7
61
9.4K
Nick retweeted
Bartosz Naskręcki
Bartosz Naskręcki@nasqret·
It finally happened: my personal move 37 or more. I am deeply impressed. The solution is very nice, clean, and feels almost human. While testing new models in the last few weeks, I felt this coming, but it's an eerie feeling to see an algorithm solve a task one has curated for about 20 years. But at least I have gained a tool that understands my idea on par with the top experts in the field. And I am now working on a completely new level. My singularity has just happened… and there is life on the other side, off to infinity!
Epoch AI@EpochAIResearch

We ran GPT-5.4 (xhigh) an additional ten times on Tier 4 to get a pass@10 score. This was 38%. In one of these runs, it solved another problem no model had solved before. This problem was by @nasqret.

105
452
3.6K
1.1M
Xeophon
Xeophon@xeophon·
Some personal news:
- Finished another trip around the sun today 🫡
- Decided to join @PrimeIntellect to work on evals!! There’s a lot to be built and done, and I couldn’t imagine a better team to do just that 🙌
- I will be in SF the next two weeks :) Just to look around, of course 👀
195
21
905
102.3K
Standard Intelligence
Standard Intelligence@si_pbc·
Computer use models shouldn't learn from screenshots. We built a new foundation model that learns from video like humans do. FDM-1 can construct a gear in Blender, find software bugs, and even drive a real car through San Francisco using arrow keys.
186
404
3.9K
1.1M
jason liu
jason liu@jxnlco·
I’ve recently joined @openai to work with @romainhuet on @OpenAIDevs
Now is the year of dogged pursuits
But back in 2021 I thought my technical career was over. I had chronic hand pain in both my hands and could barely tie my shoes, let alone use my phone or write code. I spent a few years thinking not about what it would mean for the value of my labor to go to zero, but about not being able to produce any labor at all… I gave up BJJ. Pottery. Tech. Etc.
Then that one company that solved Dota and hide-and-seek released ChatGPT and Whisper, and all of a sudden, with dictation and some determination, I could write essays, build things, and make a living from Twitter, meeting great people like @eugeneyalt @dmdohan @humford @GEVS94 for my reintegration into the tech world after so many years away.
From Canada, I advised companies for free until I had to ask them to pay me. I charged companies until I figured out pricing and asked for enough that I became an investor as well. I started a consulting business and a course business, learning alongside @HamelHusain and @vig_xyz
But through that time I learned a lot about running a business and felt like I’d stopped learning about everything else. I realized last summer that I wanted to wrap things up and go somewhere and just get involved and be at the center of it all.
124
18
595
75.4K
Proximal
Proximal@ProximalHQ·
Today, we are announcing Proximal. Proximal is a research lab for data. Our core belief is that data which is complex enough to teach today’s frontier models is not bottlenecked by domain experts, but by great ideas and excellent software.
We are excited about a world in which coding agents can autonomously run for multiple weeks, solve the hardest technical problems and discover novel ideas that advance progress in various domains of science and engineering. We believe that we are not far from this future, but that the biggest bottleneck preventing us from achieving it is training data.
Many companies work on data, but most of them are approaching it the wrong way. Historical capability breakthroughs are the result of creative engineers discovering scalable data collection methods, not thousands of contractors manually writing task demonstrations. Inevitably, the potential impact of human data will become smaller and smaller as model capabilities increase: agents are already outperforming most humans in many domains - the number of experts that are capable of judging model outputs shrinks with every new model release.
Proximal is a new data company. We are not a recruiting firm or a talent marketplace, but a research and engineering organization that treats data as a problem which deserves the same level of rigor as work on training algorithms and model architectures. We think that this is the most impactful work towards agents that can autonomously solve complex technical problems, and intend to share our research and progress in the open.
50
21
317
105.9K
Nick retweeted
David
David@DavidSHolz·
5 million humanoid robots working 24/7 can build Manhattan in ~6 months. now just imagine what the world looks like when we have 10 billion of them by 2045. now imagine the year 2100.
589
338
4.6K
535.5K
Goodfire
Goodfire@GoodfireAI·
We used interpretability to scale RL against open-ended tasks, cutting Gemma 12B’s hallucination rate in half by teaching it to self-correct in tandem with our probing harness.
13
39
341
69.5K
Nick retweeted
Prime Intellect
Prime Intellect@PrimeIntellect·
Introducing Lab: A full-stack platform for training your own agentic models
Build, evaluate and train on your own environments at scale without managing the underlying infrastructure. Giving everyone their own frontier AI lab.
133
291
2.5K
746.9K
Nick
Nick@nickwal·
@thdxr providing*
0
0
0
49
Nick
Nick@nickwal·
@thdxr I believe this is regarding intra-turn prefill where you precondition the response by proving the first few tokens for that turn of the assistant response
I believe this is unrelated to prior turns in the chat format
2
0
2
1.7K
dax
dax@thdxr·
are we misunderstanding this?
the implication is you can't insert any content that anthropic didn't know to have generated
this breaks things like switching models mid session and a dozen other things harnesses rely on
i switch between claude and gpt all the time :(
55
12
650
89.8K
Nick
Nick@nickwal·
@latkins yooooo amazing work!
0
0
0
34
Lucas Atkins
Lucas Atkins@latkins·
Today, we are releasing our first weights from Trinity-Large, our first frontier-scale model in the Trinity MoE family. American Made.
- Trinity-Large-Preview (instruct)
- Trinity-Large-Base (pretrain checkpoint)
- Trinity-Large-TrueBase (10T pre Instruct data/anneal)
52
111
858
296.7K
Nick
Nick@nickwal·
@msfeldstein fetch tool / web search is very brittle (constantly hangs for me) and could use some improvements regarding what it’s actually searching
0
0
0
21
Michael Feldstein
Michael Feldstein@msfeldstein·
What are your biggest paper cuts and quality issues you'd like to see us invest in fixing in the Cursor IDE?
72
1
58
11.8K
Jon Kaplan
Jon Kaplan@aye_aye_kaplan·
@MikeCarbone @leerob Agent Review does have the option, it looks like you're on a really old client. Just update your client and you'll see a chevron dropdown that lets you change the base branch.
4
0
6
383
Mike Carbone 🇺🇸
Mike Carbone 🇺🇸@MikeCarbone·
Yo @leerob would love the ability to find issues vs. custom branch (the parent), not main. Working in a graphite stack 🙏 Right now relying on bugbot in PRs but would like to do a pass locally
2
0
3
509
Nick
Nick@nickwal·
@willccbb nemotron would be sick if it didn't mean a trip to mamba hell
0
0
1
34
will brown
will brown@willccbb·
what models do you guys wanna train? maybe some bigger ones?
25
1
120
8.5K
Nick
Nick@nickwal·
@aye_aye_kaplan this has got me a few times, but when @-ing cursor on GitHub, the prompt has to go after the @ or else the agent won't get triggered
I often find myself writing a comment in review and then also mentioning cursor as a second opinion along with team members
1
0
0
18
Jon Kaplan
Jon Kaplan@aye_aye_kaplan·
Anyone have any bugs or quality of life improvements on their Cursor wishlist? Tell me and I'll fix it this week!
* Please make sure it repros on the latest version of Cursor (2.3) so I don't waste time chasing a bug that's already fixed
* Caveat that some bugs may not be fixable in such a short time
58
3
76
9.8K
Nick retweeted
Crémieux
Crémieux@cremieuxrecueil·
I've never understood this claim. The fascist leadership was non-STEM and absolutely *obsessed* with art. They considered it more important than economic affairs! And of course they were! The allure of STEM is mastery over matter—the opposite of the drive to political power.
Variety@Variety

Guillermo del Toro tells young directors they should not listen when “people tell you art is not important,” because that is “always a prelude to fascism.” “Be kind, be involved, believe in your art. At a time when people tell you art is not important, that is always a prelude to fascism. They think they can debase everything that makes us a little better, a little more human. And that, in my book, and in my life, includes monsters," @RealGDT said. Read more here: variety.com/2026/film/news…

59
155
2.2K
80.1K
Nick
Nick@nickwal·
@aye_aye_kaplan oh amazing guess I didn’t click around enough thanks!
0
0
1
17
Jon Kaplan
Jon Kaplan@aye_aye_kaplan·
@nickwal You can already do this! Click the little chevron to open the dropdown!!
1
0
1
42