Mark Müller

31 posts

@mnmueller

PhD student at @the_sri_lab at @ETHZ

Zurich · Joined January 2021
35 Following · 71 Followers
Mark Müller
Mark Müller@mnmueller·
Having a blast demoing LogicStar at AI Launchpad as one of only 6 start-ups selected from over 100. If you also want to learn how we catch issues before they turn into incidents, hit me up.
Mark Müller tweet media
English
0
0
0
15
Mark Müller reposted
LogicStar AI
LogicStar AI@logic_star_ai·
Did we just blow our entire marketing budget on this sign at AI Council? 🤔 Come find our CTO @mnmueller in SF this week to hear what we're building at LogicStar. And to confirm whether we have any runway left.
LogicStar AI tweet media
English
0
2
2
364
Mark Müller
Mark Müller@mnmueller·
@theo Exciting to see our work featured in so much detail 🔥 if you are interested in some of the other things we are doing at LogicStar AI, check our blog
English
0
0
1
17
Theo - t3.gg
Theo - t3.gg@theo·
You should delete your CLAUDE․md/AGENTS․md file. I have a study to prove it.
English
307
504
7.6K
2.4M
Mark Müller reposted
LocalStack
LocalStack@localstack·
Lazar Kanelov will demonstrate how high-quality telemetry data powers AI-driven incident response, creating adaptive feedback loops where agents interpret system signals, identify root causes, and apply automated remediation. RSVP ➡️ meetup.com/localstack-com…
English
0
2
2
239
Mark Müller reposted
LocalStack
LocalStack@localstack·
Stop fixing bugs. Let AI Agents fix them for you. Join us for a session on Autonomous Software Systems and learn how AI agents are using telemetry data to detect, diagnose, and fix production bugs without human intervention. meetup.com/localstack-com…
English
1
3
9
444
Mark Müller reposted
Waldemar Hummer
Waldemar Hummer@w_hummer·
Great dinner with @lovable, @localstack, and @logic_star_ai AI (three “Lo*”s) in Zurich - discussing the future of agentic coding 🤖, cloud DevX 💻, and building delightful apps ✨. Can't wait for more collaborations and partnerships among the three “Lo*”s and beyond! 🚀
Waldemar Hummer tweet media
English
0
4
5
354
Mark Müller
Mark Müller@mnmueller·
We built Agents in the Wild to track this revolution in real-time: 👉 Live, open-source dashboard 👉 Tracks agent behavior across public GitHub PRs 👉 Updated daily Code: github.com/logic-star-ai/… Built with our MSc student @Christian Mürtz (@SRILab) 🙌
English
0
0
3
69
Mark Müller
Mark Müller@mnmueller·
🏢 The enterprise trust gap Agent PRs merge 95% of the time on small repos. But on large/popular ones? Drops to ~25%, even for Google’s Jules. Without strong validation, agents will struggle in business-critical environments. 5/n
English
1
1
3
74
Mark Müller
Mark Müller@mnmueller·
🚨 AI agents wrote 7% of all GitHub PRs in June. But can we trust their code? We built Agents in the Wild – a live dashboard tracking autonomous AI agents across GitHub to answer that question: insights.logicstar.ai Here’s what we learned from analyzing 10M+ PRs 👇 1/n 🧵
English
2
7
11
659
Mark Müller reposted
Niels Mündler
Niels Mündler@nielstron·
SOTA code agent OpenHands (top-1 for SWE-full) achieves only 22% accuracy in unit test generation on SWT-lite (half its SWE performance), only slightly outperforming SWE-agent. What is going on? We dug through the data to find a simple trick and achieve almost 30%! 👇🧵 1/9
English
1
2
5
267
Mark Müller reposted
LogicStar AI
LogicStar AI@logic_star_ai·
We have our first submission for SWT-Bench 🚀 AEGIS, a dedicated test generation agent, achieves 47.8% accuracy 🏆 , significantly outperforming our SWE-Agent+ baseline and demonstrating the potential of dedicated test generation agents. 1/3 🧵
LogicStar AI tweet media
English
2
4
7
855
Mark Müller reposted
LogicStar AI
LogicStar AI@logic_star_ai·
🚀 Introducing the SWT-Bench Leaderboard! Test your AI's ability to write tests reproducing real-world GitHub issues and improve coverage where it matters. 🤖 Ready for the challenge? 👉 swtbench.com #AI #SoftwareTesting #SWTBench #CodeAgents
English
1
3
6
1.9K
Mark Müller
Mark Müller@mnmueller·
Meet me at this morning's NeurIPS poster session to discuss our work on generating reproducing test cases with Code Agents.
SRI Lab@the_sri_lab

SRI Lab at #NeurIPS2024 - 1/8 SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents Niels Mündler (@nielstron), Mark Niklas Mueller, Jingxuan He (@jingxuan_he), Martin Vechev (@mvechev) ⏰ /📍 Wed 11th, 11AM - 2PM, West Ballroom A-D #5406 📝 We explore software test generation of LLMs and Code Agents on large and complex code bases. We find that they outperform previous specialized methods, introducing a new paradigm for test generation and additional metrics for Code Agent performance. Link 🔗: arxiv.org/abs/2406.12952

English
0
0
0
49
Mark Müller reposted
LogicStar AI
LogicStar AI@logic_star_ai·
Exciting to see our work on benchmarking the test-generation capabilities of LLMs being picked up by the community!
Ofir Press@OfirPress

Super cool work by @nielstron et al: SWT-Bench is SWE-bench for test generation! They give the model a repo and an issue and it has to write a test for the issue. They show that SWE-agent is able to write good tests for 19% of the issues in the benchmark! 🧵(1/3)

English
0
1
2
72
Mark Müller reposted
Niels Mündler
Niels Mündler@nielstron·
Presenting today @icmlconf 2024 Workshop FM in the Wild 🤖 🏞️ "Code Agents are State of The Art Software Testers" SWE-Agent, aider and co are competent at reproducing GitHub issues, performing as well as specialized methods. Looking forward to answering your questions!
English
1
4
12
1.1K