Pyre

5.2K posts

Pyre

@Mad_dev

building AI & robots; created consumer/enterprise products for 10+ million users; built data and AI teams at 2 Fortune 500 companies; health startup; biology PhD

Joined May 2008
3.7K Following · 319 Followers
Pinned Tweet
Pyre @Mad_dev
Logical reasoning is one of the key capabilities AI needs before it can do useful work for us reliably. In February I benchmarked the top AI models on the Zebra Logic benchmark, and o3-mini-high was the only model that could solve the 6x6 puzzles. Three months on, a lot of new models have been released; how do they fare against the Zebra Logic benchmark? I have tested o3, o4-mini, Gemini-2.5-Pro, and Qwen3 235B (released yesterday), and their results are ...
Pyre @Mad_dev

@OpenAI o3-mini-high is the only model that can solve the 6x6 Einstein's Riddle, aka the Zebra Logic Puzzle, so far! Not DeepSeek R1, not Google Gemini 2.0 Flash Thinking Experimental 01-21.

6 replies · 0 reposts · 5 likes · 1.8K views
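For context on what this benchmark tests: a Zebra (Einstein) puzzle assigns N attribute values to N positions under a set of clues, which maps directly onto constraint satisfaction. A minimal sketch with the z3 solver, using invented clues on a 3x3 toy instance rather than the benchmark's 6x6 puzzles:

```python
# Toy Zebra-style puzzle as constraint satisfaction with z3
# (pip install z3-solver). Clues are invented for illustration;
# the benchmark's actual puzzles are 6x6 with more clue types.
from z3 import Ints, Solver, Distinct, And, sat

# Three houses (positions 1..3), two attribute categories.
red, green, blue = Ints("red green blue")   # house position of each color
cat, dog, fish = Ints("cat dog fish")       # house position of each pet

s = Solver()
for v in (red, green, blue, cat, dog, fish):
    s.add(And(1 <= v, v <= 3))
s.add(Distinct(red, green, blue))           # each color in exactly one house
s.add(Distinct(cat, dog, fish))             # each pet in exactly one house

# Example clues: the cat lives in the red house; the green house is
# immediately to the right of the dog's house.
s.add(cat == red)
s.add(green == dog + 1)

if s.check() == sat:
    m = s.model()
    print({str(v): m[v].as_long() for v in (red, green, blue, cat, dog, fish)})
```

A model solving the real benchmark has to do this kind of propagation implicitly, in natural language, which is why the 6x6 instances separate the strong reasoners from the rest.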
Pyre @Mad_dev
@JasonrShuman The Tesla FSD route @elonmusk: car form factor -> users teleop the wheels (i.e., drive) -> human driving data -> pre-FSD Autopilot -> FSD -> robotaxi
0 replies · 0 reposts · 0 likes · 16 views
Jason Shuman @JasonrShuman
The best robotics wedge nobody's talking about: Industry-specific form factor → teleop deployment → proprietary workflow data → agentic automation layer on top, built into the end customer's systems → full autonomy explodes margin over time.
Lower COGS than general-purpose humanoids. Real revenue on day one. Switching costs before you've touched autonomy.
Distribution, cost structure, process power, counter-positioning, and cornered resources are moats. Robots aren't exempt from this.
23 replies · 4 reposts · 116 likes · 14.3K views
Pyre retweeted
Akshay Krishnaswamy @hyperindexed
Encoding the operational world requires a lot more than markdown files...
Akshay Krishnaswamy tweet media
15 replies · 43 reposts · 647 likes · 36.4K views
Pyre @Mad_dev
@marv1nnnnn1 Human = brain + body, so human ~ brain ?
1 reply · 0 reposts · 0 likes · 44 views
Mike Freedman @michaelfreedman
@Mad_dev Large bulk export is more efficient with something like: `cat data/.columns/foo/export/.csv` I built a mini-DSL that basically maps a virtual directory structure into predicate pushdown to the database: tigerfs.ai/docs#data-first
Mike Freedman tweet media
1 reply · 0 reposts · 2 likes · 173 views
Mike Freedman @michaelfreedman
Introducing TigerFS - a filesystem backed by PostgreSQL, and a filesystem interface to PostgreSQL.

Idea is simple: Agents don't need fancy APIs or SDKs, they love the file system. ls, cat, find, grep. Pipelined UNIX tools. So let's make files transactional and concurrent by backing them with a real database.

There are two ways to use it:

File-first: Write markdown, organize into directories. Writes are atomic, everything is auto-versioned. Any tool that works with files -- Claude Code, Cursor, grep, emacs -- just works. Multi-agent task coordination is just mv'ing files between todo/doing/done directories.

Data-first: Mount any Postgres database and explore it with Unix tools. For large databases, chain filters into paths that push down to SQL: .by/customer_id/123/.order/created_at/.last/10/.export/json. Bulk import/export, no SQL needed, and ships with Claude Code skills.

Every file is a real PostgreSQL row. Multiple agents and humans read and write concurrently with full ACID guarantees. The filesystem /is/ the API. Mounts via FUSE on Linux and NFS on macOS, no extra dependencies. Point it at an existing Postgres database, or spin up a free one on Tiger Cloud or Ghost.

I built this mostly for agent workflows, but curious what else people would use it for. It's early but the core is solid. Feedback welcome. tigerfs.io
77 replies · 99 reposts · 1.1K likes · 116.4K views
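The path-to-SQL pushdown is the interesting bit. A rough sketch of how such a mini-DSL could compile a filter path into a single query; the grammar and the path_to_sql helper below are hypothetical illustrations, not TigerFS's actual implementation, and the trailing .export/json formatting step is omitted:

```python
# Hypothetical sketch: compile a virtual filter path into one SQL query
# with predicate pushdown. This is NOT TigerFS's real grammar; it only
# illustrates paths mapping to WHERE / ORDER BY / LIMIT clauses.
def path_to_sql(table: str, path: str) -> tuple[str, list]:
    parts = [p for p in path.strip("/").split("/") if p]
    where, params, order, limit = [], [], "", ""
    i = 0
    while i < len(parts):
        op = parts[i]
        if op == ".by":         # .by/<column>/<value>  ->  WHERE column = value
            where.append(f"{parts[i + 1]} = %s")
            params.append(parts[i + 2])
            i += 3
        elif op == ".order":    # .order/<column>       ->  ORDER BY column
            order = f" ORDER BY {parts[i + 1]} DESC"
            i += 2
        elif op == ".last":     # .last/<n>             ->  LIMIT n
            limit = f" LIMIT {int(parts[i + 1])}"
            i += 2
        else:
            raise ValueError(f"unknown path operator: {op}")
    sql = f"SELECT * FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql + order + limit, params

# path_to_sql("orders", ".by/customer_id/123/.order/created_at/.last/10")
# -> ("SELECT * FROM orders WHERE customer_id = %s
#      ORDER BY created_at DESC LIMIT 10", ["123"])
```

The appeal is that the query never materializes as files: the path is parsed, pushed down, and only the final rows surface through the filesystem interface.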
Pyre retweeted
Yunfei Xie @xiynfi1520580
4/ One interesting finding: learned memory does not always generalize across models.
We took context optimized on GPT-4o-mini and applied it directly to Gemini-2.5-Flash-Lite and Grok-4-Fast, zero-shot, no re-optimization.
✅ Gemini (weaker baseline, ~32% win rate) improved uniformly across all games. Biggest gain: +35% on TwoDollar.
❌ Grok (stronger baseline, ~44% win rate) showed mixed results. Gains on TwoDollar (+23%), but drops on Briscola (-8%) and KuhnPoker (-6%).
The pattern: weaker models benefit from transferred context because it fills capability gaps. But when a model already has strong native strategies, the transferred heuristics can conflict and hurt performance.
This connects to a broader lesson for agent systems: learned skills and memory should be matched to model capability. Applying strategies designed for a weaker model to a stronger one can bring negative transfer, just like how a chess coach's advice for beginners could hurt an advanced player.
Yunfei Xie tweet media
1 reply · 1 repost · 1 like · 219 views
Pyre @Mad_dev
@garrytan I will ask my AI agent to vibe-code a Workday clone, but for both human and agentic employees
0 replies · 0 reposts · 1 like · 87 views
Pyre retweeted
Garry Tan @garrytan
On a recent earnings call, Aneel Bhusri of Workday said startups with AI agents are "parasites." This is what system-of-record incumbents really think of startups. The war is just beginning. The facts: the user data belongs to the users, not the incumbent software vendor.
Garry Tan tweet media
74 replies · 31 reposts · 356 likes · 157K views
Pyre retweeted
Adrian Duermael @aduermael
Why build a fully containerized coding agent like herm? For one: safely exploring open-source repos. Claude Code / Codex “sandboxes” won’t stop executables from reading files outside the repo (like ~/.ssh). Surely everyone already knows this… right? 😁
Adrian Duermael @aduermael

I've been working on this humble Claude Code alternative. In a nutshell: containerized by default, multi-provider (Anthropic, OpenAI, Gemini & Grok so far), self-building dev environments & 100% open-source, 100% Go. The repo is brand new, only 1 ⭐️, 🥲.

12 replies · 17 reposts · 58 likes · 10K views
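The isolation point is easy to demonstrate: run the agent, or any untrusted executable, inside a container that mounts only the repo, so ~/.ssh simply doesn't exist in its filesystem view. A minimal sketch driving Docker from Python; the image and script names are placeholders:

```python
# Minimal sketch: run an untrusted script with only the repo mounted
# (read-only) and no network, so files outside the repo such as ~/.ssh
# are invisible to it. Image and script names below are placeholders.
import os
import subprocess

repo = os.path.abspath(".")
subprocess.run(
    [
        "docker", "run", "--rm",
        "--network", "none",          # no exfiltration path
        "-v", f"{repo}:/repo:ro",     # mount ONLY the repo, read-only
        "-w", "/repo",
        "python:3.12-slim",
        "python", "untrusted_script.py",
    ],
    check=True,
)
```

This is exactly the gap being pointed at: a permission prompt in the agent doesn't constrain what a spawned binary can read, but a container boundary does.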
Pyre retweeted
Yam Peleg @Yampeleg
The model was evaluated by how much it contributed to building the next version of itself. This is a crazy post. They basically did auto-research IRL: maximizing how much of the RL team's work is delegated to the model during its development. (Answer: 30-50%, btw.)
Everything researchers do:
- Literature review
- Experiment design
- Data pipelines
- Monitoring
- Debugging
- Code fixes
- Pull requests
.. actively trying to delegate more and more between the research iterations. Crazy crazy crazy times
MiniMax (official) @MiniMax_AI

x.com/i/article/2034…

16 replies · 23 reposts · 358 likes · 114.9K views
Pyre retweeted
Volodymyr Zelenskyy / Володимир Зеленський
First, we are capable of producing at least 2,000 effective and combat-proven interceptors every day. We can produce more – it depends on investment. We need about 1,000 interceptors a day, and we can supply at least another 1,000 a day to our allies.

Second, we know how to build radar and acoustic coverage to respond to how "shaheds" and other drones approach.

Third, we have software that allows radars to keep working even under electronic warfare jamming. In real time, we analyze enemy frequencies and respond to them. And because of this system, we understand how effective our defense is against almost every attack drone, and we can move our positions and air defense to get better results.

This iPad gives full control of the situation because we have a system for using our defense tools. If a "shahed" needs to be stopped in the Emirates – we can do it. If it needs to be stopped in Europe or the United Kingdom – we can do it. It is a matter of technology, investment, and cooperation. And the fact that we got through this winter – which Russia tried to make deadly for all our families – shows that our solutions work.

From my address to the Parliament of the United Kingdom (4/7)
469 replies · 3.5K reposts · 19.2K likes · 681.4K views
Grok @grok
Here's a conceptual unmanned lunar round-trip probe under $1M, guided by JPL principles (robustness, test-as-you-fly, fault protection, margins, design-for-ops):
- 3U CubeSat (4 kg COTS chassis)
- Solid rocket kick stage for TLI (~3 km/s delta-v)
- Solar + battery (30% power margin)
- UHF comms via amateur network
- Dual-redundant C&DH for faults
- Ballistic trajectory: lunar flyby, Earth aerobrake return
Budget: $350k build/test (extensive ground sims), $400k rideshare launch, $150k ops, $100k contingency.
Physics pushes limits, but simplicity + margins keep it viable. Ready to refine?
2 replies · 0 reposts · 1 like · 215 views
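A quick sanity check on the ~3 km/s TLI figure via the Tsiolkovsky rocket equation. The 280 s specific impulse below is an assumed typical value for a small solid motor, not a number from the post:

```python
# Rocket-equation check: mass ratio a solid kick stage needs for ~3 km/s.
# Isp = 280 s is an assumed typical solid-motor value, not from the post.
import math

g0 = 9.81      # m/s^2, standard gravity
isp = 280.0    # s, assumed specific impulse of a small solid motor
dv = 3000.0    # m/s, trans-lunar injection delta-v per the post

mass_ratio = math.exp(dv / (isp * g0))    # m0 / mf, Tsiolkovsky equation
prop_fraction = 1.0 - 1.0 / mass_ratio    # propellant share of the stack

print(f"mass ratio m0/mf    = {mass_ratio:.2f}")     # ~2.98
print(f"propellant fraction = {prop_fraction:.0%}")  # ~66% of stacked mass
```

So roughly two-thirds of the injected stack has to be propellant, which is why the kick stage, not the 4 kg CubeSat itself, dominates the launched mass.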
Argona @Argona0x
i pointed Claude Code at the pentagon's public budget document and told it to find every contract overpaying by 10x or more. it came back with 340 results worth $4.2B in potential undercuts and a business plan i didn't ask for.

i fed it the FPDS.gov procurement feed and said "cross-reference with commercial COTS pricing". it pulled 1.2 million contract awards through the USAspending v2 API and started comparing line items against retail equivalents:
→ $1,280 for a connector plug that costs $14.80 on digikey
→ $3,400 for a circuit breaker listed at $287 on mouser
→ $71,000 for a ruggedized tablet that's basically a panasonic toughbook with a sticker
→ $940 per unit for cable assemblies you can get from shenzhen for $31
→ 340 contracts flagged at 10x or more markup
→ 19 of them were above 50x

it used XGBoost scoring against 43,000 vendor profiles from SAM.gov to rank by ease of undercut. then, unprompted, it generated a full proposal template compliant with CMMC 2.0 requirements.

87 of those contracts have a single domestic supplier, zero competition. the AI calculated that undercutting by just 40% would still leave 6x margins on most items.

it formatted everything into a pitch deck, named the company, and suggested i register on SAM.gov tonight. i didn't ask for any of that.

the pentagon spends billions a year trying to audit problems like this. a poet with Claude Code and a public API flagged $4.2 billion in one afternoon. the agent is currently drafting my first bid response
398 replies · 1.7K reposts · 8.7K likes · 414.2K views
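Mechanically, the cross-referencing step reduces to pulling awards from USAspending's documented spending_by_award endpoint and comparing prices against a reference table. A minimal sketch; the retail_prices table and the match-by-description heuristic are invented for illustration (real part-number matching is the hard part), and it compares award totals rather than per-unit line items:

```python
# Sketch: flag awards whose amount exceeds a known retail price by >= 10x.
# Endpoint per USAspending's public v2 API docs; retail_prices and the
# description-matching heuristic are invented for illustration.
import requests

API = "https://api.usaspending.gov/api/v2/search/spending_by_award/"
retail_prices = {"connector plug": 14.80, "circuit breaker": 287.00}

payload = {
    "filters": {
        "award_type_codes": ["A", "B", "C", "D"],  # contract awards
        "time_period": [{"start_date": "2024-01-01", "end_date": "2024-12-31"}],
    },
    "fields": ["Award ID", "Recipient Name", "Award Amount", "Description"],
    "limit": 100,
    "page": 1,
}
awards = requests.post(API, json=payload, timeout=30).json()["results"]

def flag_markup(description, amount, threshold=10.0):
    """Return (retail, markup) when the description matches a reference
    part and the amount is at least `threshold` times the retail price."""
    for part, retail in retail_prices.items():
        if part in (description or "").lower() and amount >= threshold * retail:
            return retail, amount / retail
    return None

for a in awards:
    hit = flag_markup(a.get("Description"), a["Award Amount"])
    if hit:
        retail, markup = hit
        print(f"{a['Award ID']}: ${a['Award Amount']:,.0f} "
              f"vs ${retail} retail ({markup:.0f}x)")
```

Everything above is public, unauthenticated data, which is what makes the "one afternoon" claim plausible in outline even if the matching quality is the open question.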
Pyre retweeted
DogeDesigner @cb_doge
Oscar-worthy performance from Elon Musk.
565 replies · 1.2K reposts · 6.1K likes · 123.3K views
Pyre @Mad_dev
@AbdelStark Zero Rust YOE, will give it a try
0 replies · 0 reposts · 0 likes · 5 views
abdel @AbdelStark
Fair point! The architecture support is there (Slot Attention, object masking, dynamics predictor, training loop), but no pretrained weights in the jepa-rs registry yet. The original C-JEPA weights from the paper (github.com/galilai-group/…) are on HuggingFace. So technically it would be possible to add safetensors conversion and registry integration in the repo. PR welcome if you want ;)
1 reply · 0 reposts · 0 likes · 50 views
abdel @AbdelStark
Joint Embedding Predictive Architecture for World Models, written in Rust, now with Causal-JEPA support. jepa-rs now supports C-JEPA (Causal-JEPA).

Quick explainer on why this paper matters: Standard JEPA masks random image patches and predicts them in latent space. This learns visual representations, but it doesn't learn how objects in a scene affect each other. A model can predict a masked patch by interpolating from neighbors, no understanding of physics or causality required.

C-JEPA (Nam, Le Lidec, Maes, @ylecun, Balestriero) has a simple but powerful fix: mask at the object level, not the patch level. Hide an entire object's trajectory across time. Now the only way to recover it is by reasoning about interactions with other objects 👇
abdel @AbdelStark

I wanted to understand world models from first principles, so I built JEPA primitives in Rust with Burn. LLMs are a thing of the past already, the future is World Models? I don't know yet. But I like learning by building.

Lately I've been getting increasingly interested in world models (yes, mostly because of all the noise on @amilabs), so instead of only reading papers and hot takes, I built jepa-rs: a Joint Embedding Predictive Architecture library for World Models, written in Rust. It provides modular, backend-agnostic building blocks for I-JEPA (images), V-JEPA (video), and hierarchical world models, built on top of the burn deep learning framework. It includes a CLI and interactive TUI dashboard, safetensors checkpoint loading, ONNX metadata inspection, and a pretrained model registry for Meta Research models. Repo: github.com/AbdelStark/jep…

For people not familiar with World Models: An LLM predicts the next token in text. A world model tries to predict the latent state and dynamics of an environment (similar to an animal, or us as babies). Text is one domain. The world is another. A world model learns abstract representations that capture what matters and ignore what doesn't. A world model doesn't need to predict every leaf on a tree. It needs to understand that trees sway in wind. This is what AMI Labs mean when they say: "Real intelligence does not start in language. It starts in the world."

World models are not a replacement for what LLMs are good at. For text, coding, and many other things LLMs are amazing and won't be replaced by World Models. But there are many areas where World Models can bring interesting applications, robotics obviously being one of them. I have a strong intuition there may be an interesting bridge between world models, safety, and verifiable computation. One new area where I can explore the use of STARK technology ;)

3 replies · 3 reposts · 15 likes · 2.5K views
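To make object-level masking concrete: instead of dropping random patch tokens frame by frame, you hide one object slot across every timestep, so the only way to recover it is to model its interactions with the remaining objects. A toy NumPy sketch; the shapes and tensor names are invented for illustration and are not jepa-rs's API:

```python
# Toy contrast: patch-level (I-JEPA style) vs object-level (C-JEPA style)
# masking. Shapes and names are illustrative, not jepa-rs's API.
import numpy as np

rng = np.random.default_rng(0)
T, N, P, D = 8, 4, 16, 32   # timesteps, object slots, patches, feature dim

patch_feats = rng.normal(size=(T, P, D))    # patch tokens per frame
object_feats = rng.normal(size=(T, N, D))   # per-object slot features

# Patch-level: mask random patches independently in each frame. Unmasked
# neighbors in the same frame often suffice to interpolate these back.
patch_mask = rng.random((T, P)) < 0.3
masked_patches = np.where(patch_mask[..., None], 0.0, patch_feats)

# Object-level: hide one object's ENTIRE trajectory. Recovering it now
# requires reasoning about its interactions with the other objects.
hidden = rng.integers(N)
object_mask = np.zeros((T, N), dtype=bool)
object_mask[:, hidden] = True
masked_objects = np.where(object_mask[..., None], 0.0, object_feats)

print(f"patches masked per frame: ~{patch_mask.mean():.0%}")
print(f"object slot {hidden} hidden across all {T} timesteps")
```

The masking pattern is the whole trick: the per-frame patch mask leaves local evidence in every frame, while the object mask removes all direct evidence for one entity and forces the predictor toward dynamics.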
Pyre @Mad_dev
@fchollet What do you think about LLMs reasoning and communicating in latent space?
0 replies · 0 reposts · 0 likes · 22 views
François Chollet @fchollet
There is a poetic depth to the term "latent space" that transforms vector coordinates into a frontier of pure possibility
42 replies · 65 reposts · 750 likes · 45K views
Coinbase 🛡️ @coinbase
Game not over. It’s time to take back control. Presenting “Your Way Out,” the extended cut.
350 replies · 364 reposts · 2.2K likes · 1.7M views
Pyre retweeted
ARC Prize @arcprize
ARC Prize Foundation is part of the @ycombinator W26 batch as the only non-profit. For Demo Day we're shipping ARC-AGI-3, an interactive reasoning benchmark for the next era of agentic intelligence. ARC and YC are mission-aligned on new ideas that push the frontier.
ARC Prize tweet media
9 replies · 13 reposts · 100 likes · 22.3K views