Pao Ramen

5.6K posts


@masylum

🍼 Stay at home dad. 🦄 Former founder and CTO at Factorial. I craft products: ☕️ https://t.co/7INU1LaQH5 🎩 https://t.co/XsN59VxLja 🀄️ https://t.co/Sjpxa8796e

Barcelona, Spain · Joined March 2007
819 Following · 2.2K Followers
Antoine Chaffin @antoine_chaffin
BrowseComp-Plus, perhaps the hardest popular deep research task, is now solved at nearly 90%... and all it took was a 150M model ✨ Thrilled to announce that Reason-ModernColBERT did it again and outperforms all models (including models 54× bigger) on all metrics
Antoine Chaffin tweet media
22 · 45 · 310 · 78.5K
Pao Ramen @masylum
Can’t believe the whole economy depends on these hands
Pao Ramen tweet media
1 · 0 · 0 · 126
Pao Ramen @masylum
When humans do tasks and they take more effort than expected, we suffer. We improve workflows, tools, and systems to avoid it the next time.

As agents take over the workplace, they will need some embedded sense of "pain." Otherwise, they will happily, and inefficiently, chug along while improving nothing. Developers are already discovering this: agents are generating piles of code, feeling no pain whatsoever, and refactoring none of it.

Perhaps the simplest way to implement this would be to train models to predict how many tokens a task should require, and when that budget is exceeded, trigger a refactoring workflow.
1 · 0 · 4 · 310
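A minimal sketch of the idea in the tweet above, assuming a hypothetical `predict_token_budget` model; the function names and the toy heuristic are illustrative, not any real agent framework's API:

```python
# Hypothetical sketch: give an agent a sense of "pain" via a token budget.
# predict_token_budget is a stand-in for a trained predictor.

def predict_token_budget(task: str) -> int:
    """Stand-in for a model that estimates how many tokens a task should take."""
    return 500 + 50 * len(task.split())  # toy heuristic: longer task, bigger budget

def run_task(task: str, step_costs: list[int]) -> str:
    """Run a task step by step; exceeding the budget triggers refactoring."""
    budget = predict_token_budget(task)
    spent = 0
    for cost in step_costs:       # each step reports the tokens it consumed
        spent += cost
        if spent > budget:        # the "pain" signal: effort exceeded expectation
            return "refactor"
    return "done"

print(run_task("rename a variable", [200, 300, 900]))  # budget 650 → "refactor"
```

The point is the feedback loop: the budget is the agent's expectation, and blowing past it is the trigger to stop and improve the workflow instead of chugging along.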
Pao Ramen @masylum
Back in the 70s, Gil Scott-Heron prophetically declared, “The Revolution Will Not Be Televised.” That line stayed with me, and I found myself wondering what its 2026 equivalent might be. This piece is my answer: an article about the mundane rhythms of daily life set against the cosmological stakes of AI. paoramen.fika.bar/the-singularit…
Pao Ramen tweet media
0 · 0 · 3 · 200
Pao Ramen @masylum
Hey hey! We’re launching Fika for the second time! Since the last launch, I’ve built a company and brought together a small team of excellent humans to help make Fika real.

I grew up on an internet where people shared ideas through long-form writing, and that’s how you learned. That internet is fading. Algorithmic feeds bury thoughtful content under sensational noise, and more and more slop is being written by machines.

So we’re relaunching Fika as a publishing platform built for human writing. Our focus is on two things: helping writers grow their audience and making it possible for them to earn a living from their work. We’ve added internationalization features so writers can reach global audiences in their own language, and built simple lead-magnet tools to help publications grow their subscriber lists.

We’ve crafted the product with lots of love and care, and I think it shows. You can show your support here: producthunt.com/products/fika-…
Pao Ramen tweet media
0 · 0 · 8 · 459
Pao Ramen @masylum
When I was a kid, only three kids at school had a computer at home. We found each other and every day, instead of playing soccer, we traded floppy disks full of shareware and viruses. As you can imagine, we were the weird ones. The nerds.

Now it’s the opposite. Most kids play Fortnite or Brawl Stars and my kids, who don’t play videogames yet, are the ones who feel left out. Funny how in a single generation the tables have turned!
1 · 0 · 1 · 449
Nate Berkopec @nateberkopec
While this is cool, optimizing a single variable for a nondeterministic process is the simplest possible thing you could optimize. Optimizing e.g. a web-app is 100x the dimensions here, along with a far more stringent correctness requirement. We have a long way to go.
Andrej Karpathy @karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I do daily for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat.

Among the bigger findings:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

10 · 0 · 23 · 9.2K
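The loop Karpathy describes (propose a change, run the experiment, keep it only if the metric improves) can be sketched in a few lines. The toy `validation_loss` and the two hyperparameters below are assumptions for illustration, not nanochat's actual knobs:

```python
import random

# Toy stand-in for a real experiment: pretend validation loss is minimized
# at lr=0.01, wd=0.1. In reality each evaluation would be a training run.
def validation_loss(config: dict) -> float:
    return (config["lr"] - 0.01) ** 2 + (config["wd"] - 0.1) ** 2

def autoresearch(config: dict, rounds: int, rng: random.Random) -> dict:
    """Propose a change, measure it, and stack only real improvements."""
    best = validation_loss(config)
    for _ in range(rounds):
        knob = rng.choice(sorted(config))         # pick a hyperparameter
        candidate = dict(config)
        candidate[knob] *= rng.uniform(0.5, 1.5)  # propose a perturbation
        loss = validation_loss(candidate)         # run the "experiment"
        if loss < best:                           # keep only what helps
            config, best = candidate, loss
    return config

start = {"lr": 0.05, "wd": 0.5}
tuned = autoresearch(start, rounds=700, rng=random.Random(0))
print(validation_loss(start), "->", validation_loss(tuned))
```

This is the greedy core only; the real workflow also plans new experiments from the history of results and promotes winners to larger scales, which a scalar hill-climb does not capture.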
Pao Ramen @masylum
@natemcgrady They are quick to answer when challenged and can articulate tradeoffs
0 · 0 · 0 · 96
Nate McGrady @natemcgrady
What is a subtle sign that someone actually knows how to code in a coding interview?
64 · 0 · 52 · 18.3K
Pao Ramen @masylum
@thsottiaux Search should also find things in the diff. Sometimes I'm reviewing code and I want to know where something is declared
0 · 0 · 0 · 67
Tibo @thsottiaux
With GPT-5.4 out, what should Codex ship or improve next?
1.1K · 17 · 1.2K · 112K
Pao Ramen @masylum
@sdepablos so basically, everybody else is just NPCs doing performative but important foreplay
1 · 0 · 2 · 100
Sergi de Pablos 🇪🇺 @sdepablos
@masylum Not for you or me, but of all yearly multibillion telco contracts, roughly 10-15% are literally finished here and 50% are originated (all the Cs involved in the same city). Of course the action happens mostly in top hotels or in the MWC meeting rooms.
2 · 0 · 5 · 170
Pao Ramen @masylum
Yesterday, for the first time, I went to the Mobile World Congress. The whole thing was bizarre. Everyone looked bored and slightly disengaged.

There were three kinds of booths: testosteronic ones with Formula 1 cars, futuristic ones with dancing robots, and everything else. After walking around for hours, I still couldn’t tell what 99% were actually selling. Something something future. Agentic. AI.

The average conversation went like this: “How’s it going? Is it worth it?” “Well… it’s kind of a waste of time and money, but YOU HAVE TO BE HERE.”

A collective delusion of suits, tech, and surprisingly few mobile phones. 🎪
2 · 2 · 18 · 1.9K
Jordi Villar @jrdi
@masylum You already knew that nobody goes there for the daylight events
1 · 0 · 1 · 223
0xMarioNawfal @RoundtableSpace
OpenClaw can now scrape any website without getting blocked - zero bot detection, bypasses Cloudflare natively, 774x faster than BeautifulSoup. No selector maintenance. No workarounds. Just data. THIS IS AN UNFAIR ADVANTAGE AND IT'S FULLY OPEN SOURCE.
0xMarioNawfal tweet media
188 · 737 · 8.1K · 933.5K
Pao Ramen @masylum
Last month was our best ever at Fika. 1,087 new publications & 564 new articles.

We've also shipped:
- Paid subscriptions
- Reactions
- Resizable images
- Advanced stats (views, open rates, channels…)
- Voice dictation
- Embeds
- …and polished the product to death.

I'm pretty confident Fika is now the best-looking publishing platform out there.
0 · 0 · 14 · 604
Pao Ramen @masylum
Paid subscriptions are coming to Fika. Writers, journalists, and content creators can now monetize their audience directly.

If you don’t have a publication on Fika yet, I encourage you to create one. Why? Because you don’t own your audience on social media. A follower means little when an algorithm dictates who gets to read what you create. And most platforms make you jump through hoops just to earn from the people who already chose to follow you. blog.fika.bar/how-to-earn-by…

(For instance, I have 2.1K+ followers here, but this post will only reach a few hundred because I committed the sin of adding a link.)
0 · 0 · 1 · 278
Pao Ramen @masylum
@davesnx I'm imagining only a subset of SQL is required, and obviously you'd secure it with RLS so data is scoped to the agent's permissions.
0 · 0 · 0 · 18
David Sancho @davesnx
@masylum I don't follow, but I'm still curious: if exposing SQL doesn't mean allowing all SQL syntax, then I don't know what exposing SQL means
1 · 0 · 0 · 21
Pao Ramen @masylum
Everybody seems to be looking for the holy grail of agent tooling. First it was MCP, then CLIs, and now some people claim it’s code generation + REST. I have a suspicion the final stop will be SQL. Not just locally, but remotely as well.

Right now, agents need several calls to accomplish their tasks. That means more tokens in the context, more time, and more chances to mess things up. SQL lets you ask for the shape of the data directly, in one go. And LLMs are uncannily good at it.

I’m willing to make a bold prediction: products that expose their data through SQL endpoints will be agent ready and thrive. The rest will lag behind.
3 · 2 · 14 · 1.1K
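A minimal sketch of what such a SQL endpoint could look like, using SQLite and a temp view as a stand-in for real row-level security; the `articles` table, the `scoped_articles` view, and the SELECT-only guard are all illustrative assumptions, not Fika's schema or a production design:

```python
import sqlite3

def agent_query(conn: sqlite3.Connection, agent_org_id: int, sql: str) -> list:
    """Accept only a read-only subset of SQL, scoped to the agent's rows."""
    if not sql.lstrip().lower().startswith("select"):
        raise PermissionError("only SELECT statements are allowed")
    # Poor man's row-level security: the agent queries a view filtered to
    # its own organization. A real system would use the database's RLS.
    conn.execute("DROP VIEW IF EXISTS scoped_articles")
    conn.execute(
        "CREATE TEMP VIEW scoped_articles AS "
        "SELECT id, title FROM articles WHERE org_id = %d" % agent_org_id
    )
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER, org_id INTEGER, title TEXT)")
conn.executemany("INSERT INTO articles VALUES (?, ?, ?)",
                 [(1, 1, "hello"), (2, 2, "other org's secret")])

# One round trip: the agent asks for exactly the shape of data it needs,
# and only ever sees rows belonging to org 1.
print(agent_query(conn, 1, "SELECT title FROM scoped_articles"))
```

The single-call property is the payoff: instead of chaining several endpoint calls (more tokens, more latency, more failure modes), the agent states the shape of the result once and the database does the joining and filtering.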
Pao Ramen @masylum
@davesnx Why is exposing SQL more risky than exposing endpoints? Exposing SQL != exposing the whole DB
1 · 0 · 0 · 47
David Sancho @davesnx
@masylum exposing SQL entirely sounds risky af, SQL + a shared permissions model sounds ok but it will become oracle dbms 🫩
1 · 0 · 0 · 100