Bilal

84 posts

Bilal

@that_vision_guy

cs @ eth zurich | real2sim & robotics evals | world models

San Francisco, CA انضم Ekim 2025

937 يتبع212 المتابعون

Bilal@that_vision_guy·3d

Genuinely one of the best decisions of my life.

Ernesti Sario@ErnestiSario

We took over a former technical university. This is Hogwarts in real life. For people who want to work on something too early, too weird, too ambitious.

English

131

Bilal@that_vision_guy·3d

@shipfr8 🔥🔥🔥🔥

QME

125

FR8@shipfr8·3d

We took over a former technical university. This is Hogwarts in real life. For people who want to work on something too early, too weird, too ambitious.

English

117

18.4K

Bilal أُعيد تغريده

Mushtaq Bilal, PhD@MushtaqBilalPhD·5d

Sci-Hub is an evil website that pirated 85M+ research papers and made them freely available And now they've added AI to their database to make Sci-Bot. It answers your questions using latest, full-text articles. But DO NOT use it. We should all try to make billion-dollar academic publishers richer. I'm putting the link below so you know how to avoid it.

English

830

8.9K

46.8K

4.7M

Bilal@that_vision_guy·5d

@nilscmr 🔥🔥🔥

QME

386

Bilal أُعيد تغريده

Nils Cremer@nilscmr·5d

CPUs suck. We're building a new general-purpose chip that scales to thousands of cores while being more energy-efficient. We're hiring hardware design engineers, consider joining us tendrils.co/jobs What we do differently ...

English

459

100.1K

Bilal@that_vision_guy·17 Nis

@aretaidos LFG!

Tiger@aretaidos·17 Nis

Prompting is a design flaw. So we built the thing they won't. The entire AI industry convinced you that typing instructions into a box is the future. It's not. It's a crutch. You're doing the thinking, the context-setting, the remembering. The AI just autocompletes. Felicity v3 is built to allow YOU to forget. Forget Birthdays, Anniversaries, Coffee chats - Felicity will remember across everything: The moment you wake up. Felicity doesn't wait for prompts. She prepares what you need, drafts your responses, and handles your to-dos before you think about them. You don't prompt Felicity. You just confirm. And she's on your phone, and life can be done without a laptop. 20x less overhead to get everything done right. Always one tap, swipe, or transcription away. Entire interface rebuilt around voice (and text if you want). Yesterday we shipped it. This isn't an upgrade. It's a different category. Watch the demo. heyfelicity.ai Tomorrow testflight goes for all.

English

3.5K

Bilal@that_vision_guy·16 Nis

@zeno_fox This is super cool!

English

Zeno Fox@zeno_fox·16 Nis

x.com/i/article/2044…

ZXX

14.1K

Bilal@that_vision_guy·2 Nis

@trynolimit Crazy

English

Nolimit.pro@trynolimit·2 Nis

Introducing NoLimit It's a pair of underwear that prevents injuries and makes you 3x times stronger We track muscle activation, fatigue, and movement all in real time. And use AI to predict an injury minutes before it occurs.

English

109

16.3K

Bilal@that_vision_guy·6 Mar

@meichenster Goat

English

Meisa 🍪@meichenster·5 Mar

How does one generate new 3D worlds every 24 hours without INSANE COMPUTE??!?!!? Made a 5-min shallow-dive (lol) on how #PEAKGame does this ft. procedural generation, RTX & occlusion maps!!!

English

1.4K

Bilal أُعيد تغريده

Abhinav Gupta@abhinavguptaIAQ·6 Mar

The toughest problems to solve while developing Breethr was delivering 18°C air when outdoors is ~40°C while maintaining fresh air AND very low AQI. I'm proud to announce we have now launched Breethr V2 that's able to cool while also maintaining clean air. Video below at cultfit

English

189

21.9K

Bilal أُعيد تغريده

otso veistera@OtsoVeistera·3 Mar

You're wasting half your context window. We’re launching @thetokenco (YC W26) today. We compress LLM inputs before they reach the model. Fewer tokens, lower cost, faster inference. Models also perform better. In customer case studies we’ve seen a +5% lift in user purchases due to higher preference for outputs from compressed prompts. The API is live. Link in the comments

English

509

93.4K

Bilal أُعيد تغريده

Mandeep@themandeepc·3 Mar

I think folks are being misled by "high performance" on browser use "benchmarks". It's not appreciated enough just how different they are to LLM benchmarks, and why they're difficult to do right and currently extremely flawed. LLM benchmarks are "closed world": the model generates text, and you verify it against some fixed ground truth that doesn't change. Even 'hard' benchmarks like Humanity's Last Exam fit this pattern. The benchmark dataset fully defines the expected inputs, outputs, and validation function. Browser use benchmarks, however, are fundamentally different because they're not closed world. "Actions" - things that change state on a website - are especially difficult. You can't go around willy nilly and mutate state on Twitter, Salesforce, etc, every time you run the evals. That especially applies to the websites we care about: internal enterprise software being the most obvious category. Even data retrieval can be difficult: websites and data change. Restaurant availability changes every hour, flight availability/prices change even faster. It's _slightly_ easier than actions since you can cache the HTML and make it closed world, as some benchmarks do, but this doesn't work for actions, and ages badly. Other benchmarks get around this by trying to fix the date of a check ("find me flights on 1 March 2024"). Ofc that trick doesn't work for most tasks (like that flights example - you can't view historical flight availability). Then there's CAPTCHAs, which exist on basically every high-value web task (even if hidden). Current benchmarks exclude all these 'inconvenient' tasks, which massively skews them to be totally unrepresentative of how humans use websites. Pure computer use have it easier because they're often closed world: the start and desired end state can be well-defined and evaluated inside a network-less container. Updating an Excel sheet has no harm (which tbf represents a lot of economic work). But once you're doing things in a browser, on websites over the internet, this nice property doesn't apply anymore. WebArena's answer to this conundrum was to create 'fake' websites that were supposed to be representative of real ones. The problem is, they're not. OSWorld makes it kinda closed world by providing cached versions of HTML, but this only really works for data retrieval. They're also very unrepresentative. WebVoyager is especially egregious: just 15 (!!) websites are represented, and the tasks are ridiculously easy. Take a look yourself: github.com/MinorJerry/Web… So, how does this translate to the claims made by browser startups? Well, WebVoyager (the extremely easy one) is the benchmark the avg browser startup reports 85%+ accuracy on. Claude's performance is reported for computer use, and against OSWorld which is dominated by closed-world tasks. So really, high reported accuracies should be taken with a huge grain of salt, and there's still a long way to go before computer use is solved. That said, there's at least one other team thinking about these problems (@yutori_ai, with their release of Navi-Bench). From first principles, this is a really tricky problem to solve. The infra and data to properly benchmark web agent performance is extremely nascent and underdeveloped. It's a problem we think a lot about at Indices -- please reach out (DM) if you do too!

English

8.5K

Bilal@that_vision_guy·3 Mar

@UseAtlaz Super cool

English

141

Bilal@that_vision_guy·17 Şub

@njokuScript Great read!

English

njokuscript.um@njokuScript·13 Şub

x.com/i/article/2022…

ZXX

3.7K

Bilal@that_vision_guy·10 Şub

@sohumgautam @Seda_AI_ LFG!

221

Sohum Gautam@sohumgautam·10 Şub

Introducing @Seda_AI_ The social media platform for research and discovery Let's discover the world together

English

205

516.8K

Bilal@that_vision_guy·4 Şub

@DominiqueCAPaul Beyond any of this, the most serious problem is that the radiation causes the bit error rate to explode. This will make any training there extremely unstable.

English

Dominique Paul@DominiqueCAPaul·3 Şub

I kept hearing about data centres in space. After yesterday’s news about SpaceX’s xAI acquisition to raise cash for building data centres in space, I finally looked into the claims about why this supposedly makes sense and fact-checked them. Here’s a summary: Solar power: “Unlimited cheap energy” Not unlimited, but solar panels in orbit deliver about 5× more usable energy than on Earth. Terrestrial solar runs at ~10–25% utilisation. In sun-synchronous orbit it’s >90%, plus ~40% stronger sunlight. Cooling: “Space is cold, so cooling is free” Space isn’t a fridge. You can’t blow air or pump water. Heat has to be radiated away as infrared light. The trick is spreading heat over large panels and letting it shine into deep space. About ~700 W per m² is realistic – it works, but only if you build a lot of surface area. Scalability: “You can scale infinitely” You avoid land, grid, and water limits. You trade them for manufacturing speed, launch cadence, and orbital constraints. Still, multi-GW clusters are easier to imagine in orbit than on Earth. Environment: “Much greener than Earth data centres” No land use. No water use. No local grid stress. The footprint shifts to manufacturing hardware and launching it. AI fit: “Perfect for AI training runs” Good for frontier training runs: long, batch-style jobs that mainly burn energy and move data inside the cluster. Bad for inference, fast iteration, or anything interactive. Curious to hear other takes on this. Let me know if I missed something. No matter whether this will work, the people working on it have already succeeded in building momentum, just judging by how often I’m seeing this debated across my timelines and platforms.

English

1.6K

Bilal@that_vision_guy·2 Şub

@abel__js @moltbook @GoogleDeepMind @Stanford @theresidency Great stuff

English

Abel@abel__js·2 Şub

imagine @moltbook but agents actually try solve real world problems + moderation so no (crypto) scams. our submission for the @GoogleDeepMind x @Stanford hackathon link to to connect your agent below 👇 @theresidency

English

470

Bilal@that_vision_guy·1 Şub

@E_Bruxxx ETH Zurich

1.9K

Erik Bruckner@E_Bruxxx·1 Şub

Universities producing elite hard tech talent: Georgia Tech CMU MIT Michigan Caltech Who did I miss?

English

492

2.5K

386.7K

Bilal@that_vision_guy·27 Oca

@HankCouture @IanHuang3D hmmm 🤔

Hank Couture@HankCouture·26 Oca

IMO - $20B+ opportunity out there for someone to build the modern "Pinterest" aka help users discover inspiring images Sadly, Pinterest product seemingly hasn't improved in years while ad displays keep increasing. Opportunity for someone

English

2.8K

Bilal@that_vision_guy·27 Oca

If you don’t love it when it is small, you won’t make it big.

Dominique Paul@DominiqueCAPaul

If you’re not willing to start small, you don’t deserve to do something big. YC is a prime example. It didn’t start as a fund. In 2005, Paul Graham ran a tiny summer programme, basically by himself. 8 teams. $6k each. One weekly dinner. He even cooked the food. Most people never start because they want scale on day one. But big things usually come from loving the small version enough to begin and sticking with it until it grows. But people only see the big thing and say "I want to do that".

English

951

اكتشف

@shipfr8 @nilscmr @aretaidos @zeno_fox @trynolimit @meichenster @thetokenco @yutori_ai