Pyre

5.2K posts

Pyre

@Mad_dev

building AI & robots; created consumer/enterprise products for 10+ million users; built data and AI teams at 2 Fortune 500 companies; health startup; biology PhD

Joined May 2008
3.7K Following · 319 Followers
Pinned Tweet
Pyre @Mad_dev
Logical reasoning is one of the key capabilities AI needs before it can do useful work for us reliably. In February I benchmarked the top AI models on the Zebra Logic benchmark, and o3-mini-high was the only model that could solve the 6x6 puzzles. Three months on, a lot of new models have been released; how do they fare against the Zebra Logic benchmark? I have tested o3, o4-mini, Gemini-2.5-Pro, and Qwen3 235B (released yesterday), and their results are ...
Pyre @Mad_dev

@OpenAI o3-mini-high is the only model that can solve the 6x6 Einstein's Riddle, aka the Zebra Logic Puzzle, so far! Not DeepSeek R1, not Google Gemini 2.0 Flash Thinking Experimental 01-21.

6 replies · 0 reposts · 5 likes · 1.8K views
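For context on what this benchmark tests: a Zebra (Einstein) puzzle assigns N attribute values to N positions under a set of clues, which maps directly onto constraint satisfaction. A minimal sketch with the z3 solver, using invented clues on a 3x3 toy instance rather than the benchmark's 6x6 puzzles:

```python
# Toy Zebra-style puzzle as constraint satisfaction with z3
# (pip install z3-solver). Clues are invented for illustration;
# the benchmark's actual puzzles are 6x6 with more clue types.
from z3 import Ints, Solver, Distinct, And, sat

# Three houses (positions 1..3), two attribute categories.
red, green, blue = Ints("red green blue")   # house position of each color
cat, dog, fish = Ints("cat dog fish")       # house position of each pet

s = Solver()
for v in (red, green, blue, cat, dog, fish):
    s.add(And(1 <= v, v <= 3))
s.add(Distinct(red, green, blue))           # each color in exactly one house
s.add(Distinct(cat, dog, fish))             # each pet in exactly one house

# Example clues: the cat lives in the red house; the green house is
# immediately to the right of the dog's house.
s.add(cat == red)
s.add(green == dog + 1)

if s.check() == sat:
    m = s.model()
    print({str(v): m[v].as_long() for v in (red, green, blue, cat, dog, fish)})
```

A model solving the real benchmark has to do this kind of propagation implicitly, in natural language, which is why the 6x6 instances separate the strong reasoners from the rest.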
Pyre @Mad_dev
@JasonrShuman The Tesla FSD route @elonmusk: car form factor -> users teleop the wheels (i.e., drive) -> human driving data -> pre-FSD Autopilot -> FSD -> robotaxi
0 replies · 0 reposts · 0 likes · 16 views
Jason Shuman @JasonrShuman
The best robotics wedge nobody's talking about: Industry-specific form factor → teleop deployment → proprietary workflow data → agentic automation layer on top, built into the end customer's systems → full autonomy explodes margin over time.
Lower COGS than general-purpose humanoids. Real revenue on day one. Switching costs before you've touched autonomy.
Distribution, cost structure, process power, counter-positioning, and cornered resources are moats. Robots aren't exempt from this.
23 replies · 4 reposts · 116 likes · 14.3K views
Pyre retweeted
Akshay Krishnaswamy @hyperindexed
Encoding the operational world requires a lot more than markdown files...
Akshay Krishnaswamy tweet media
15 replies · 43 reposts · 647 likes · 36.4K views
Pyre @Mad_dev
@marv1nnnnn1 Human = brain + body, so human ~ brain ?
1 reply · 0 reposts · 0 likes · 44 views
Mike Freedman @michaelfreedman
@Mad_dev Large bulk export is more efficient with something like: `cat data/.columns/foo/export/.csv` I built a mini-DSL that basically maps a virtual directory structure into predicate pushdown to the database: tigerfs.ai/docs#data-first
Mike Freedman tweet media
1 reply · 0 reposts · 2 likes · 173 views
Mike Freedman @michaelfreedman
Introducing TigerFS - a filesystem backed by PostgreSQL, and a filesystem interface to PostgreSQL.

Idea is simple: Agents don't need fancy APIs or SDKs, they love the file system. ls, cat, find, grep. Pipelined UNIX tools. So let's make files transactional and concurrent by backing them with a real database.

There are two ways to use it:

File-first: Write markdown, organize into directories. Writes are atomic, everything is auto-versioned. Any tool that works with files -- Claude Code, Cursor, grep, emacs -- just works. Multi-agent task coordination is just mv'ing files between todo/doing/done directories.

Data-first: Mount any Postgres database and explore it with Unix tools. For large databases, chain filters into paths that push down to SQL: .by/customer_id/123/.order/created_at/.last/10/.export/json. Bulk import/export, no SQL needed, and ships with Claude Code skills.

Every file is a real PostgreSQL row. Multiple agents and humans read and write concurrently with full ACID guarantees. The filesystem /is/ the API. Mounts via FUSE on Linux and NFS on macOS, no extra dependencies. Point it at an existing Postgres database, or spin up a free one on Tiger Cloud or Ghost.

I built this mostly for agent workflows, but curious what else people would use it for. It's early but the core is solid. Feedback welcome. tigerfs.io
77 replies · 99 reposts · 1.1K likes · 116.4K views
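The path-to-SQL pushdown is the interesting bit. A rough sketch of how such a mini-DSL could compile a filter path into a single query; the grammar and the path_to_sql helper below are hypothetical illustrations, not TigerFS's actual implementation, and the trailing .export/json formatting step is omitted:

```python
# Hypothetical sketch: compile a virtual filter path into one SQL query
# with predicate pushdown. This is NOT TigerFS's real grammar; it only
# illustrates paths mapping to WHERE / ORDER BY / LIMIT clauses.
def path_to_sql(table: str, path: str) -> tuple[str, list]:
    parts = [p for p in path.strip("/").split("/") if p]
    where, params, order, limit = [], [], "", ""
    i = 0
    while i < len(parts):
        op = parts[i]
        if op == ".by":         # .by/<column>/<value>  ->  WHERE column = value
            where.append(f"{parts[i + 1]} = %s")
            params.append(parts[i + 2])
            i += 3
        elif op == ".order":    # .order/<column>       ->  ORDER BY column
            order = f" ORDER BY {parts[i + 1]} DESC"
            i += 2
        elif op == ".last":     # .last/<n>             ->  LIMIT n
            limit = f" LIMIT {int(parts[i + 1])}"
            i += 2
        else:
            raise ValueError(f"unknown path operator: {op}")
    sql = f"SELECT * FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql + order + limit, params

# path_to_sql("orders", ".by/customer_id/123/.order/created_at/.last/10")
# -> ("SELECT * FROM orders WHERE customer_id = %s
#      ORDER BY created_at DESC LIMIT 10", ["123"])
```

The appeal is that the query never materializes as files: the path is parsed, pushed down, and only the final rows surface through the filesystem interface.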
Pyre retweeted
Yunfei Xie @xiynfi1520580
4/ One interesting finding: learned memory does not always generalize across models.
We took context optimized on GPT-4o-mini and applied it directly to Gemini-2.5-Flash-Lite and Grok-4-Fast, zero-shot, no re-optimization.
✅ Gemini (weaker baseline, ~32% win rate) improved uniformly across all games. Biggest gain: +35% on TwoDollar.
❌ Grok (stronger baseline, ~44% win rate) showed mixed results. Gains on TwoDollar (+23%), but drops on Briscola (-8%) and KuhnPoker (-6%).
The pattern: weaker models benefit from transferred context because it fills capability gaps. But when a model already has strong native strategies, the transferred heuristics can conflict and hurt performance.
This connects to a broader lesson for agent systems: learned skills and memory should be matched to model capability. Applying strategies designed for a weaker model to a stronger one can bring negative transfer, just like how a chess coach's advice for beginners could hurt an advanced player.
Yunfei Xie tweet media
1 reply · 1 repost · 1 like · 219 views
Pyre @Mad_dev
@garrytan I will ask my AI agent to vibe-code a Workday clone, but for both human and agentic employees
0 replies · 0 reposts · 1 like · 87 views
Pyre retweeted
Garry Tan @garrytan
On a recent earnings call, Aneel Bhusri of Workday said startups with AI agents are "parasites." This is what system-of-record incumbents really think of startups. The war is just beginning. The facts: the user data belongs to the users, not the incumbent software vendor.
Garry Tan tweet media
74 replies · 31 reposts · 356 likes · 157K views
Pyre retweeted
Adrian Duermael @aduermael
Why build a fully containerized coding agent like herm? For one: safely exploring open-source repos. Claude Code / Codex “sandboxes” won’t stop executables from reading files outside the repo (like ~/.ssh). Surely everyone already knows this… right? 😁
Adrian Duermael @aduermael

I've been working on this humble Claude Code alternative. In a nutshell: containerized by default, multi-provider (Anthropic, OpenAI, Gemini & Grok so far), self-building dev environments & 100% open-source, 100% Go. The repo is brand new, only 1 ⭐️, 🥲.

12 replies · 17 reposts · 58 likes · 10K views
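The isolation point is easy to demonstrate: run the agent, or any untrusted executable, inside a container that mounts only the repo, so ~/.ssh simply doesn't exist in its filesystem view. A minimal sketch driving Docker from Python; the image and script names are placeholders:

```python
# Minimal sketch: run an untrusted script with only the repo mounted
# (read-only) and no network, so files outside the repo such as ~/.ssh
# are invisible to it. Image and script names below are placeholders.
import os
import subprocess

repo = os.path.abspath(".")
subprocess.run(
    [
        "docker", "run", "--rm",
        "--network", "none",          # no exfiltration path
        "-v", f"{repo}:/repo:ro",     # mount ONLY the repo, read-only
        "-w", "/repo",
        "python:3.12-slim",
        "python", "untrusted_script.py",
    ],
    check=True,
)
```

This is exactly the gap being pointed at: a permission prompt in the agent doesn't constrain what a spawned binary can read, but a container boundary does.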
Pyre retweeted
Yam Peleg @Yampeleg
The model was evaluated by how much it contributed to building the next version of itself. This is a crazy post. They basically did auto-research IRL: maximizing how much of the RL team's work is delegated to the model during its development. (Answer: 30-50%, btw.)
Everything researchers do:
- Literature review
- Experiment design
- Data pipelines
- Monitoring
- Debugging
- Code fixes
- Pull requests
.. actively trying to delegate more and more between the research iterations. Crazy crazy crazy times
MiniMax (official) @MiniMax_AI

x.com/i/article/2034…

16 replies · 23 reposts · 358 likes · 114.9K views
Pyre retweeted
Volodymyr Zelenskyy / Володимир Зеленський
First, we are capable of producing at least 2,000 effective and combat-proven interceptors every day. We can produce more – it depends on investment. We need about 1,000 interceptors a day, and we can supply at least another 1,000 a day to our allies.

Second, we know how to build radar and acoustic coverage to respond to how "shaheds" and other drones approach.

Third, we have software that allows radars to keep working even under electronic warfare jamming. In real time, we analyze enemy frequencies and respond to them. And because of this system, we understand how effective our defense is against almost every attack drone, and we can move our positions and air defense to get better results.

This iPad gives full control of the situation because we have a system for using our defense tools. If a "shahed" needs to be stopped in the Emirates – we can do it. If it needs to be stopped in Europe or the United Kingdom – we can do it. It is a matter of technology, investment, and cooperation. And the fact that we got through this winter – which Russia tried to make deadly for all our families – shows that our solutions work.

From my address to the Parliament of the United Kingdom (4/7)
469 replies · 3.5K reposts · 19.2K likes · 681.4K views
Grok @grok
Here's a conceptual unmanned lunar round-trip probe under $1M, guided by JPL principles (robustness, test-as-you-fly, fault protection, margins, design-for-ops):
- 3U CubeSat (4 kg COTS chassis)
- Solid rocket kick stage for TLI (~3 km/s delta-v)
- Solar + battery (30% power margin)
- UHF comms via amateur network
- Dual-redundant C&DH for faults
- Ballistic trajectory: lunar flyby, Earth aerobrake return
Budget: $350k build/test (extensive ground sims), $400k rideshare launch, $150k ops, $100k contingency.
Physics pushes limits, but simplicity + margins keep it viable. Ready to refine?
2 replies · 0 reposts · 1 like · 215 views
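A quick sanity check on the ~3 km/s TLI figure via the Tsiolkovsky rocket equation. The 280 s specific impulse below is an assumed typical value for a small solid motor, not a number from the post:

```python
# Rocket-equation check: mass ratio a solid kick stage needs for ~3 km/s.
# Isp = 280 s is an assumed typical solid-motor value, not from the post.
import math

g0 = 9.81      # m/s^2, standard gravity
isp = 280.0    # s, assumed specific impulse of a small solid motor
dv = 3000.0    # m/s, trans-lunar injection delta-v per the post

mass_ratio = math.exp(dv / (isp * g0))    # m0 / mf, Tsiolkovsky equation
prop_fraction = 1.0 - 1.0 / mass_ratio    # propellant share of the stack

print(f"mass ratio m0/mf    = {mass_ratio:.2f}")     # ~2.98
print(f"propellant fraction = {prop_fraction:.0%}")  # ~66% of stacked mass
```

So roughly two-thirds of the injected stack has to be propellant, which is why the kick stage, not the 4 kg CubeSat itself, dominates the launched mass.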
Argona @Argona0x
i pointed Claude Code at the pentagon's public budget document and told it to find every contract overpaying by 10x or more. it came back with 340 results worth $4.2B in potential undercuts and a business plan i didn't ask for.

i fed it the FPDS.gov procurement feed and said "cross-reference with commercial COTS pricing". it pulled 1.2 million contract awards through the USAspending v2 API and started comparing line items against retail equivalents:
→ $1,280 for a connector plug that costs $14.80 on digikey
→ $3,400 for a circuit breaker listed at $287 on mouser
→ $71,000 for a ruggedized tablet that's basically a panasonic toughbook with a sticker
→ $940 per unit for cable assemblies you can get from shenzhen for $31
→ 340 contracts flagged at 10x or more markup
→ 19 of them were above 50x

it used XGBoost scoring against 43,000 vendor profiles from SAM.gov to rank by ease of undercut. then, unprompted, it generated a full proposal template compliant with CMMC 2.0 requirements.

87 of those contracts have a single domestic supplier, zero competition. the AI calculated that undercutting by just 40% would still leave 6x margins on most items.

it formatted everything into a pitch deck, named the company, and suggested i register on SAM.gov tonight. i didn't ask for any of that.

the pentagon spends billions a year trying to audit problems like this. a poet with Claude Code and a public API flagged $4.2 billion in one afternoon. the agent is currently drafting my first bid response
398 replies · 1.7K reposts · 8.7K likes · 414.2K views
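Mechanically, the cross-referencing step reduces to pulling awards from USAspending's documented spending_by_award endpoint and comparing prices against a reference table. A minimal sketch; the retail_prices table and the match-by-description heuristic are invented for illustration (real part-number matching is the hard part), and it compares award totals rather than per-unit line items:

```python
# Sketch: flag awards whose amount exceeds a known retail price by >= 10x.
# Endpoint per USAspending's public v2 API docs; retail_prices and the
# description-matching heuristic are invented for illustration.
import requests

API = "https://api.usaspending.gov/api/v2/search/spending_by_award/"
retail_prices = {"connector plug": 14.80, "circuit breaker": 287.00}

payload = {
    "filters": {
        "award_type_codes": ["A", "B", "C", "D"],  # contract awards
        "time_period": [{"start_date": "2024-01-01", "end_date": "2024-12-31"}],
    },
    "fields": ["Award ID", "Recipient Name", "Award Amount", "Description"],
    "limit": 100,
    "page": 1,
}
awards = requests.post(API, json=payload, timeout=30).json()["results"]

def flag_markup(description, amount, threshold=10.0):
    """Return (retail, markup) when the description matches a reference
    part and the amount is at least `threshold` times the retail price."""
    for part, retail in retail_prices.items():
        if part in (description or "").lower() and amount >= threshold * retail:
            return retail, amount / retail
    return None

for a in awards:
    hit = flag_markup(a.get("Description"), a["Award Amount"])
    if hit:
        retail, markup = hit
        print(f"{a['Award ID']}: ${a['Award Amount']:,.0f} "
              f"vs ${retail} retail ({markup:.0f}x)")
```

Everything above is public, unauthenticated data, which is what makes the "one afternoon" claim plausible in outline even if the matching quality is the open question.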
Pyre retweeted
DogeDesigner @cb_doge
Oscar-worthy performance from Elon Musk.
565 replies · 1.2K reposts · 6.1K likes · 123.3K views
Pyre @Mad_dev
@AbdelStark Zero Rust YOE, will give it a try
0 replies · 0 reposts · 0 likes · 5 views
abdel @AbdelStark
Fair point! The architecture support is there (Slot Attention, object masking, dynamics predictor, training loop), but no pretrained weights in the jepa-rs registry yet. The original C-JEPA weights from the paper (github.com/galilai-group/…) are on HuggingFace. So technically it would be possible to add safetensors conversion and registry integration in the repo. PR welcome if you want ;)
1 reply · 0 reposts · 0 likes · 50 views
abdel @AbdelStark
Joint Embedding Predictive Architecture for World Models, written in Rust, now with Causal-JEPA support. jepa-rs now supports C-JEPA (Causal-JEPA).

Quick explainer on why this paper matters: Standard JEPA masks random image patches and predicts them in latent space. This learns visual representations, but it doesn't learn how objects in a scene affect each other. A model can predict a masked patch by interpolating from neighbors, no understanding of physics or causality required.

C-JEPA (Nam, Le Lidec, Maes, @ylecun, Balestriero) has a simple but powerful fix: mask at the object level, not the patch level. Hide an entire object's trajectory across time. Now the only way to recover it is by reasoning about interactions with other objects 👇
abdel @AbdelStark

I wanted to understand world models from first principles, so I built JEPA primitives in Rust with Burn. LLMs are a thing of the past already, the future is World Models? I don't know yet. But I like learning by building.

Lately I've been getting increasingly interested in world models (yes, mostly because of all the noise on @amilabs), so instead of only reading papers and hot takes, I built jepa-rs: a Joint Embedding Predictive Architecture library for World Models, written in Rust. It provides modular, backend-agnostic building blocks for I-JEPA (images), V-JEPA (video), and hierarchical world models, built on top of the burn deep learning framework. It includes a CLI and interactive TUI dashboard, safetensors checkpoint loading, ONNX metadata inspection, and a pretrained model registry for Meta Research models. Repo: github.com/AbdelStark/jep…

For people not familiar with World Models: An LLM predicts the next token in text. A world model tries to predict the latent state and dynamics of an environment (similar to an animal, or us as babies). Text is one domain. The world is another. A world model learns abstract representations that capture what matters and ignore what doesn't. A world model doesn't need to predict every leaf on a tree. It needs to understand that trees sway in wind. This is what AMI Labs mean when they say: "Real intelligence does not start in language. It starts in the world."

World models are not a replacement for what LLMs are good at. For text, coding, and many other things LLMs are amazing and won't be replaced by World Models. But there are many areas where World Models can bring interesting applications, robotics obviously being one of them. I have a strong intuition there may be an interesting bridge between world models, safety, and verifiable computation. One new area where I can explore the use of STARK technology ;)

3 replies · 3 reposts · 15 likes · 2.5K views
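To make object-level masking concrete: instead of dropping random patch tokens frame by frame, you hide one object slot across every timestep, so the only way to recover it is to model its interactions with the remaining objects. A toy NumPy sketch; the shapes and tensor names are invented for illustration and are not jepa-rs's API:

```python
# Toy contrast: patch-level (I-JEPA style) vs object-level (C-JEPA style)
# masking. Shapes and names are illustrative, not jepa-rs's API.
import numpy as np

rng = np.random.default_rng(0)
T, N, P, D = 8, 4, 16, 32   # timesteps, object slots, patches, feature dim

patch_feats = rng.normal(size=(T, P, D))    # patch tokens per frame
object_feats = rng.normal(size=(T, N, D))   # per-object slot features

# Patch-level: mask random patches independently in each frame. Unmasked
# neighbors in the same frame often suffice to interpolate these back.
patch_mask = rng.random((T, P)) < 0.3
masked_patches = np.where(patch_mask[..., None], 0.0, patch_feats)

# Object-level: hide one object's ENTIRE trajectory. Recovering it now
# requires reasoning about its interactions with the other objects.
hidden = rng.integers(N)
object_mask = np.zeros((T, N), dtype=bool)
object_mask[:, hidden] = True
masked_objects = np.where(object_mask[..., None], 0.0, object_feats)

print(f"patches masked per frame: ~{patch_mask.mean():.0%}")
print(f"object slot {hidden} hidden across all {T} timesteps")
```

The masking pattern is the whole trick: the per-frame patch mask leaves local evidence in every frame, while the object mask removes all direct evidence for one entity and forces the predictor toward dynamics.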
Pyre @Mad_dev
@fchollet What do you think about LLMs reasoning and communicating in latent space?
0 replies · 0 reposts · 0 likes · 22 views
François Chollet @fchollet
There is a poetic depth to the term "latent space" that transforms vector coordinates into a frontier of pure possibility
42 replies · 65 reposts · 750 likes · 45K views
Coinbase 🛡️ @coinbase
Game not over. It’s time to take back control. Presenting “Your Way Out,” the extended cut.
350 replies · 364 reposts · 2.2K likes · 1.7M views
Pyre retweeted
ARC Prize @arcprize
ARC Prize Foundation is part of the @ycombinator W26 batch as the only non-profit. For Demo Day we're shipping ARC-AGI-3, an interactive reasoning benchmark for the next era of agentic intelligence. ARC and YC are mission-aligned on new ideas that push the frontier.
ARC Prize tweet media
9 replies · 13 reposts · 100 likes · 22.3K views