Mark Müller

31 posts

@mnmueller

PhD student at @the_sri_lab at @ETHZ

Zurich · Joined January 2021

35 Following · 71 Followers

Mark Müller@mnmueller·2d

Having a blast demoing LogicStar at AI Launchpad as one of only 6 start-ups selected from over 100. If you also want to learn how we catch issues before they turn into incidents, hit me up.

Mark Müller retweeted

LogicStar AI@logic_star_ai·4d

Did we just blow our entire marketing budget on this sign at AI Council? 🤔 Come find our CTO @mnmueller in SF this week to hear what we're building at LogicStar. And to confirm whether we have any runway left.

Mark Müller@mnmueller·Feb 24

@theo Exciting to see our work featured in so much detail 🔥 If you are interested in some of the other things we are doing at LogicStar AI, check out our blog.

Theo - t3.gg@theo·Feb 23

You should delete your CLAUDE.md/AGENTS.md file. I have a study to prove it.

Mark Müller retweeted

LocalStack@localstack·Nov 24

Lazar Kanelov will demonstrate how high-quality telemetry data powers AI-driven incident response, creating adaptive feedback loops where agents interpret system signals, identify root causes, and apply automated remediation. RSVP ➡️ meetup.com/localstack-com…

Mark Müller retweeted

LocalStack@localstack·Nov 12

Stop fixing bugs. Let AI Agents fix them for you. Join us for a session on Autonomous Software Systems and learn how AI agents are using telemetry data to detect, diagnose, and fix production bugs without human intervention. meetup.com/localstack-com…

Mark Müller retweeted

Waldemar Hummer@w_hummer·Nov 4

Great dinner with @lovable, @localstack, and @logic_star_ai AI (three “Lo*”s) in Zurich - discussing the future of agentic coding 🤖, cloud DevX 💻, and building delightful apps ✨. Can't wait for more collaborations and partnerships among the three “Lo*”s and beyond! 🚀

Mark Müller@mnmueller·Jul 8

We built Agents in the Wild to track this revolution in real-time: 👉 Live, open-source dashboard 👉 Tracks agent behavior across public GitHub PRs 👉 Updated daily Code: github.com/logic-star-ai/… Built with our MSc student @Christian Mürtz (@SRILab) 🙌

Mark Müller@mnmueller·Jul 8

🏢 The enterprise trust gap Agent PRs merge 95% of the time on small repos. But on large/popular ones? Drops to ~25%, even for Google’s Jules. Without strong validation, agents will struggle in business-critical environments. 5/n

Mark Müller@mnmueller·Jul 8

🚨 AI agents wrote 7% of all GitHub PRs in June. But can we trust their code? We built Agents in the Wild – a live dashboard tracking autonomous AI agents across GitHub to answer that question: insights.logicstar.ai Here’s what we learned from analyzing 10M+ PRs 👇 1/n 🧵

Mark Müller retweeted

LogicStar AI@logic_star_ai·Apr 11

We are excited to see the community use our SWT-Bench and work on the crucial topic of test generation!

Niels Mündler@nielstron

🚨 New SWT-Bench Submission! 🤖 Amazon Q Developer Agent leads the SWT-Bench leaderboard 🥇 with an impressive 49% of successfully tested issues and a coverage improvement of 57% on SWT-Bench Verified.

Mark Müller retweeted

Niels Mündler@nielstron·Feb 18

SOTA code agent OpenHands (top-1 for SWE-full) achieves only 22% accuracy in unit test generation on SWT-lite (half its SWE performance), only slightly outperforming SWE-agent. What is going on? We dug through the data to find a simple trick and achieve almost 30%! 👇🧵 1/9

Mark Müller retweeted

LogicStar AI@logic_star_ai·Feb 17

We have our first submission for SWT-Bench 🚀 AEGIS, a dedicated test generation agent, achieves 47.8% accuracy 🏆 , significantly outperforming our SWE-Agent+ baseline and demonstrating the potential of dedicated test generation agents. 1/3 🧵

Mark Müller retweeted

LogicStar AI@logic_star_ai·Dec 19

🚀 Introducing the SWT-Bench Leaderboard! Test your AI's ability to write tests reproducing real-world GitHub issues and improve coverage where it matters. 🤖 Ready for the challenge? 👉 swtbench.com #AI #SoftwareTesting #SWTBench #CodeAgents

Mark Müller@mnmueller·Dec 11

Meet me at this morning's NeurIPS poster session to discuss our work on generating reproducing test cases with Code Agents.

SRI Lab@the_sri_lab

SRI Lab at #NeurIPS2024 - 1/8
SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents
Niels Mündler (@nielstron), Mark Niklas Mueller, Jingxuan He (@jingxuan_he), Martin Vechev (@mvechev)
⏰ /📍 Wed 11th, 11AM - 2PM, West Ballroom A-D #5406
📝 We explore software test generation of LLMs and Code Agents on large and complex code bases. We find that they outperform previous specialized methods, introducing a new paradigm for test generation and additional metrics for Code Agent performance.
Link 🔗: arxiv.org/abs/2406.12952

Mark Müller retweeted

LogicStar AI@logic_star_ai·Nov 15

Exciting to see our work on benchmarking the test-generation capabilities of LLMs being picked up by the community!

Ofir Press@OfirPress

Super cool work by @nielstron et al: SWT-Bench is SWE-bench for test generation! They give the model a repo and an issue and it has to write a test for the issue. They show that SWE-agent is able to write good tests for 19% of the issues in the benchmark! 🧵(1/3)

Mark Müller retweeted

Niels Mündler@nielstron·Jul 26

Presenting today @icmlconf 2024 Workshop FM in the Wild 🤖 🏞️ "Code Agents are State of The Art Software Testers" SWE-Agent, aider and co are competent at reproducing GitHub issues, performing as well as specialized methods. Looking forward to answering your questions!

Mark Müller retweeted

Marc Fischer@marc_r_fischer·Jul 21

On Tuesday at 11:30, in Poster Session 1, we will present Prompt Sketching, a novel decoder-driven approach for templated (and constrained) text generation of LLMs. 📄 arxiv.org/abs/2311.04954 👨‍💻 Work with @mnmueller, @lbeurerkellner, @mvechev.
