Lukas Bug

696 posts

@BugLukas

Building agents that actually work (hopefully) | reality checks + what's actually useful from papers

Fulda, Deutschland · Joined May 2015
366 Following · 116 Followers
Robert Baddeley
Robert Baddeley@imrobertjames·
@theCTO @sama I think generally hard to do because of how reinforcement learning works. You miss 100% of the shots you don’t take.
English
2
0
43
3.1K
adam
adam@theCTO·
hey @sama can we normalize models just saying "i dont know" ? eliminates 99% of hallucinations
English
256
135
8.1K
403.8K
Yann LeCun
Yann LeCun@ylecun·
1. I never said LLMs were not useful. They are, particularly with all the bells and whistles that are being added to them. I use them.
2. A robot-rich future can't be built with AIs that don't understand the physical world and don't anticipate the consequences of their actions. And LLMs really don't.
3. The future in the cartoon looks pretty dystopian TBH, but even a non-dystopian version will require world models and zero-shot planning abilities.
4. I rarely wear a suit and absolutely never wear a tie.
5. I would never ever place a coffee mug on top of a piece of equipment.
6. I hope I'll look this young in 2032.
English
178
371
6.2K
303.8K
Benjamin Todd
Benjamin Todd@ben_j_todd·
Yann LeCun in 2032
Benjamin Todd tweet media
English
57
70
1.2K
150.6K
Lukas Bug
Lukas Bug@BugLukas·
@Karl_Lauterbach The status quo today is still that someone has to steer the AI and check its output, and is therefore also responsible for it.
German
0
0
0
32
Prof. Karl Lauterbach
Prof. Karl Lauterbach@Karl_Lauterbach·
This is going to cause young academics serious trouble. AI keeps getting better, with no end in sight. Within 2 years, its knowledge has risen to the level of PhD holders. Who hires someone with a master's degree when AI costs nothing and knows more? hai.stanford.edu/news/inside-th…
Prof. Karl Lauterbach tweet media
German
336
61
512
128.7K
Lukas Bug
Lukas Bug@BugLukas·
@Alex_m @thdxr Perfect. Even improving time efficiency so you can immediately go do other things until the next 5 hour window
English
0
0
1
214
Alex
Alex@Alex_m·
@thdxr It's insane. It one-shotted my session usage limit.
English
8
2
325
8K
dax
dax@thdxr·
opus 4.7 is a beauty
a fresh yet elegant take on something we've seen before
a new standard, a definite marker of a new era
(i haven't tried it yet)
English
121
74
3.8K
106.9K
WAKKI🍀
WAKKI🍀@wakkistyling·
🚀 Hell yeah! Starship V3 just cleared the biggest hurdle, full static fires on both the ship and booster. A few weeks until that beast lights up the sky for Flight 12? This is the version that’s going to make orbital refueling, tower catches, and Mars look routine. Engineering sorcery at its finest.🍀🫶🏽 Can’t wait to watch Boca Chica shake again. Let’s go SpaceX! 🔥
English
1
0
12
879
Lukas Bug
Lukas Bug@BugLukas·
@jerryjliu0 Thank you for this extensive comparison. Will have a look at both
English
0
0
0
45
Jerry Liu
Jerry Liu@jerryjliu0·
docling is somewhere in between liteparse (our free/open-source project) and llamaparse (our commercial vlm-based parser): it uses ML models of varying complexity to parse PDFs.
- liteparse is model-free, can parse ~200-500 pages/second, is designed to be an extremely fast/free parser to replace pypdf/pymupdf. it integrates with paddleOCR for OCR workloads. its main purpose is outputting text for semantic understanding for agents, and will lack certain things that VLM parsers do OOB.
- llamaparse is our commercial vlm-powered parsing service. it scores quite high on parsebench (parsebench.ai), our OCR benchmark over enterprise docs. you can see docling is ranked a bit further down
English
1
2
4
211
Jerry Liu
Jerry Liu@jerryjliu0·
LiteParse should be the default document parser you use with any AI agent (Claude Code, Claude Cowork, OpenClaw, Codex, and more). The core is extremely fast, accurate text parsing from any document type, focused on semantic preservation. But there's so much more beyond that: native OCR support, bounding boxes, one-click agent skills, support for 50+ file formats. Plus way more cooking in the next few weeks 🧑‍🍳 @LoganMarkewich is the lead creator behind this, you don't want to miss his webinar: landing.llamaindex.ai/liteparse?utm_… Repo: github.com/run-llama/lite…
Jerry Liu tweet media
LlamaIndex 🦙@llama_index

LiteParse hit 4K+ GitHub stars in 3 weeks. ~500 pages in 2 seconds. No GPU. No API keys. 50+ file formats. Now @LoganMarkewich, our Head of Open Source, will show you how to build with it. Live workshop — April 28, 9 AM PST: Build a Financial Due Diligence Agent with LiteParse. Raw financial PDFs → structured agent-ready data. We'll build it live. Register → landing.llamaindex.ai/liteparse

English
8
12
69
7K
Jerry Liu
Jerry Liu@jerryjliu0·
This is why we released liteparse :) Free, open-source, designed for agents. Natively supports OCR / screenshotting for deeper visual understanding in a document when needed.
Andrej Karpathy@karpathy

@kepano I just tried it this morning on the 245-page Mythos pdf and it failed badly and the outputs were all mangled. Converting pdfs is really hard, I think it has to probably be a Skill not a program, for a SOTA LLM for it to work properly.

English
10
32
552
88.8K
Lukas Bug
Lukas Bug@BugLukas·
@Ricarda_Lang Better to focus on the positive comments; they outnumber the rest anyway
German
0
0
0
2
Ricarda Lang
Ricarda Lang@Ricarda_Lang·
When I read comments from guys with no profile picture explaining to me from their sofa that a half marathon isn't an achievement anyway and that my time was far too slow.
Ricarda Lang tweet media
German
2K
913
25K
855.7K
Lukas Bug
Lukas Bug@BugLukas·
@elonmusk I wish I had speeds on par with Starlink on the ground
English
0
0
1
14
Lukas Bug
Lukas Bug@BugLukas·
@mitchellh I’ve seen their demos of driving around the chaotic traffic in Rome, in a more relaxed fashion than I would have driven there. Fingers crossed
English
1
0
8
2K
Mitchell Hashimoto
Mitchell Hashimoto@mitchellh·
@BugLukas I hope the initial release is as good as it is here in the US, but I suspect there'll be hiccups. It's crazy solid here, to the point where it almost feels dangerous how relaxed I am about it.
English
4
1
63
49.3K
Mitchell Hashimoto
Mitchell Hashimoto@mitchellh·
Traded in my 2020 Model S for a brand new plaid X before they discontinue it. Car is amazing, but the FSD hype is real. It blew away my expectations coming from the 2020 hardware. 95% of my miles are self driven in LA over the past month. I wouldn’t have even believed myself lol. Even my wife who HATED autopilot on my prior car is totally blown away. She’s asked multiple times “did you drive?” And I say “not at all.” And she’s just like… wow. Great job @Tesla for real. I’ve owned a Model S since 2013. This is my 3rd, first X (for me personally). Just fantastic.
English
253
334
5.6K
30M
Lukas Bug
Lukas Bug@BugLukas·
@MilksandMatcha I would like to experiment with massive parallel agent swarms for SWE without it bankrupting me :D
English
0
0
0
5
OpenAI Developers
OpenAI Developers@OpenAIDevs·
What are you building this weekend?
English
992
34
1.3K
151.8K
Lukas Bug
Lukas Bug@BugLukas·
@karpathy Incredibly powerful and incredibly dangerous, depending on how you use it
English
0
0
0
14
Andrej Karpathy
Andrej Karpathy@karpathy·
- Drafted a blog post
- Used an LLM to meticulously improve the argument over 4 hours.
- Wow, feeling great, it’s so convincing!
- Fun idea let’s ask it to argue the opposite.
- LLM demolishes the entire argument and convinces me that the opposite is in fact true.
- lol
The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions, just make sure to ask different directions and be careful with the sycophancy.
English
1.8K
2.4K
31.4K
3.4M
Lukas Bug
Lukas Bug@BugLukas·
@elonmusk You can make an LLM say anything. We need the full conversation to judge this
English
0
0
2
9
Thariq
Thariq@trq212·
To manage growing demand for Claude we're adjusting our 5 hour session limits for free/Pro/Max subs during peak hours. Your weekly limits remain unchanged. During weekdays between 5am–11am PT / 1pm–7pm GMT, you'll move through your 5-hour session limits faster than before.
English
2.3K
532
7.4K
7.7M
Lukas Bug
Lukas Bug@BugLukas·
@trikcode öffentlich statisch leer Haupt(Zeichenkette[] Argumente) You guys don’t code like this?
Deutsch
0
0
1
15
Wise
Wise@trikcode·
Honest question. People whose first language is not English… how do they code?? Do Germans code in German? Do Arabs code in Arabic??
English
1K
48
4.6K
1.5M
Lukas Bug retweeted
Andrej Karpathy
Andrej Karpathy@karpathy·
It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn’t work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow. Just to give an example, over the weekend I was building a local video analysis dashboard for the cameras of my home so I wrote: “Here is the local IP and username/password of my DGX Spark. Log in, set up ssh keys, set up vLLM, download and bench Qwen3-VL, set up a server endpoint to inference videos, a basic web ui dashboard, test everything, set it up with systemd, record memory notes for yourself and write up a markdown report for me”. The agent went off for ~30 minutes, ran into multiple issues, researched solutions online, resolved them one by one, wrote the code, tested it, debugged it, set up the services, and came back with the report and it was just done. I didn’t touch anything. All of this could easily have been a weekend project just 3 months ago but today it’s something you kick off and forget about for 30 minutes. As a result, programming is becoming unrecognizable. You’re not typing computer code into an editor like the way things were since computers were invented, that era is over. You're spinning up AI agents, giving them tasks *in English* and managing and reviewing their work in parallel. The biggest prize is in figuring out how you can keep ascending the layers of abstraction to set up long-running orchestrator Claws with all of the right tools, memory and instructions that productively manage multiple parallel Code instances for you. The leverage achievable via top tier "agentic engineering" feels very high right now. 
It’s not perfect, it needs high-level direction, judgement, taste, oversight, iteration and hints and ideas. It works a lot better in some scenarios than others (e.g. especially for tasks that are well-specified and where you can verify/test functionality). The key is to build intuition to decompose the task just right to hand off the parts that work and help out around the edges. But imo, this is nowhere near "business as usual" time in software.
English
1.6K
4.8K
37.3K
5.1M
Lukas Bug
Lukas Bug@BugLukas·
@sama Please let us access it through the API
English
0
0
0
17
Sam Altman
Sam Altman@sama·
The 5.3 lovefest is so nice to see. Don't think we've had so much excitement for a model since the original GPT-4.
English
2.2K
243
7.5K
955.3K