Jeremy McHugh, DSc.

619 posts

@jer_mchugh

Co-founder & CEO @preambleAI. Securing increasingly capable AI. Owner @omniainnov. US Air Force Veteran. DSc AI security. @penn_state alum & hockey.

Pittsburgh, PA · Joined December 2022

444 Following · 411 Followers

Jeremy McHugh, DSc.@jer_mchugh·20h

@mntruell Congrats on Composer 2. Does Composer 2 show measurable gains in your internal security-agent fleet for vulnerability detection/fixing? Also, are there any plans to produce subcategory scores for secure coding evals or cybersecurity benchmarks?

Michael Truell@mntruell·22h

Composer 2 is out! Cursor is an example of a new type of company, not a pure app maker and not a model provider. Our aim is to build the most useful coding agents by combining the best API models and our domain-specific models.

Cursor@cursor_ai

Composer 2 is now available in Cursor.

Jeremy McHugh, DSc.@jer_mchugh·3d

These results align well with my own testing and the public sentiment I’ve seen that Opus 4.6 remains the clear leader for cyber-related tasks. I’ve been experimenting with Gemini 3.1 Pro, but I keep finding myself reverting to Opus.

AI Security Institute@AISecurityInst

Can AI agents conduct advanced cyber-attacks autonomously? We tested seven models released between August 2024 and February 2026 on two custom-built cyber ranges designed to replicate complex attack environments. Here’s what we found🧵

Jeremy McHugh, DSc.@jer_mchugh·5d

I was listening to his interview on @tbpn and learned something. His new company Atom has its robotic division, Lab37, based in Pittsburgh. It makes sense since Uber used to have a self-driving team here. I think the home kitchens of the future will be redesigned to accommodate robots.

travis kalanick@travisk

Some tasks aren't meant for humanoids
- 300 bowls/hr
- 60 sqft footprint
- Fully automated assembly & bagging: Staff preps then leaves for day
- 40% all-in labor savings
- Profitable restaurants up & running
Restaurateurs reserve your Bowlbuilder today: lab37.us

Jeremy McHugh, DSc.@jer_mchugh·6d

@elonmusk @xai You should put an office near CMU here in Pittsburgh.

Elon Musk@elonmusk·6d

Welcome to @xAI!

Devendra Chaplot@dchaplot

I'm joining SpaceX and xAI, working closely with Elon and team to build superintelligence. Together SpaceX and xAI combine physical and digital intelligence under a leader who understands hardware at the deepest level. Add a high-agency culture with frontier-scale resources, and you get the possibility to achieve something truly unique. I’m excited to advance the fields I’ve obsessed over for years, from robotics research to building AI models on the founding teams of Mistral and TML. Both were extraordinary journeys with extraordinary people that shaped how I think about building intelligence from the ground up. Grateful for everything that brought me here and can’t wait to get started.

Jeremy McHugh, DSc.@jer_mchugh·12 Mar

@intercept_dan Agreed, the implementation has been a big issue. But I’d still prefer CLI functionality. Feels more straightforward and gives better control without the extra layers.

Dan 🛡️@intercept_dan·12 Mar

@jer_mchugh Those are implementation gaps, not protocol flaws. Tool poisoning happens because nothing enforces policy between agent and server. Solve that at the transport layer and MCP's composability still wins. APIs work until agents need to chain tools across providers dynamically.

Jeremy McHugh, DSc.@jer_mchugh·12 Mar

Dropping MCP makes sense. Those things were a security nightmare from day one (tool poisoning, prompt injection via descriptions, confused deputy risks, even RCE paths in popular servers). The bigger practical killer was that you could only use a few MCP servers at a time before context bloat, hard limits, hallucinations, and flaky chaining killed performance and adoption anyway. APIs + CLIs are cleaner, safer, easier to monitor, and actually scale with agents. Glad we didn’t rush out and build an MCP security platform. I think the future of AI systems is going to be much simpler than most people expect.

Morgan@morganlinton

The cofounder and CTO of Perplexity, @denisyarats just said internally at Perplexity they’re moving away from MCPs and instead using APIs and CLIs 👀

Jeremy McHugh, DSc.@jer_mchugh·12 Mar

x.com/i/article/2032…

Jeremy McHugh, DSc.@jer_mchugh·10 Mar

Congrats to Kevin Mandia and the Armadin team on launching with a massive $189M in combined funding, the largest early-stage raise in cybersecurity history. It’s a win and a positive sign for the industry. It’s awesome to see US military vets, especially from the Air Force, continue leading the charge in building innovative cyber companies.

Armadin@ArmadinSecurity

Armadin launches today with the largest combined Seed + Series A in cybersecurity history. AI-driven hyperattacks are here and human-led defenses can't keep pace. Meet the ultimate attacker: a swarm of AI agents built to prove what's actually exploitable before it is. armadin.com/blog-posts/int…

Jeremy McHugh, DSc.@jer_mchugh·10 Mar

I mean, most service members are probably thrilled just to have GPT-4 to help with all their extra duties. Anything beyond that era of models starts to change where humans can be assigned, freeing them to handle more operational work rather than support.

Matthew Berman@MatthewBerman

Dylan Patel: If the US Military is running AI models that are 6 months stale, we've already given away every advantage we have over China, no matter how far ahead our labs actually are.

Jeremy McHugh, DSc.@jer_mchugh·9 Mar

I almost forgot MS was in this discussion. Partners ask us to build security solutions for MS Copilot, but retrofitting security onto that massive ecosystem is a step backward. They have the enterprise reach, so the potential is undeniable. But the future belongs to true, proactive agents. Copilot feels like the safe, low-disruption play for companies that just want to say they use AI without actually changing how work is done.

@jason@Jason·9 Mar

New agent technology dropping at a consistent pace… Notion, Google and Microsoft all dropped respectable agents in the same week. I’m going to stay in the minority of users and focus on making the cross-platform, open source project @openclaw work. OC with API access to the SaaS startup stack (Notion, Slack, GSuite, Zoom, etc.) and an open source model running on local silicon is the goal. I feel like the big prize for startups is in owning your data, the front end, your corporate memory and refining proprietary skills. I might be wrong. Thoughts?

Satya Nadella@satyanadella

Announcing Copilot Cowork, a new way to complete tasks and get work done in M365. When you hand off a task to Cowork, it turns your request into a plan and executes it across your apps and files, grounded in your work data and operating within M365’s security and governance boundaries.

Jeremy McHugh, DSc.@jer_mchugh·9 Mar

@_lopopolo This is the first time I’ve heard someone ask about MDM support, but we have the rest at @PreambleAI. I’ve been looking into other mobile solutions.

lopopolo@_lopopolo·9 Mar

Who is building the product that sticks gpt-oss-safeguard into an admin dashboard and MDM to do guardrails at scale in the enterprise? openai.com/index/introduc…

Jeremy McHugh, DSc.@jer_mchugh·8 Mar

I figure if I’m going to get back into experimenting with exploit development and popping some notes, I should start with bug bounty programs that actually pay for the discovery. I’m curious to experiment with known vulnerabilities and craft full exploits that don't exist yet using LLMs (*I don't plan on sharing them). Taking a memory corruption bug from a blind crash all the way to a controlled Instruction Pointer. This might just be my new cyber benchmark for AI capabilities.

Jeremy McHugh, DSc.@jer_mchugh·8 Mar

In this case, you'd probably be better off not using this framework and just adding some instructions to your system prompt. That’s the one thing about asking AI to build defenses for itself: the MVP is not going to cut it. This is where human expertise is still needed (for the time being, at least).

AISecHub@AISecHub

AgentGuard - A+ Grade AI Agent Security Framework - github.com/numbergroup/Ag… Security framework that protects AI agents from prompt injection, command injection, and Unicode bypass attacks. Built in response to the Clinejection attack that compromised 4,000 developer machines through a malicious GitHub issue.

Jeremy McHugh, DSc.@jer_mchugh·7 Mar

What’s the performance difference between using Claude Code Security or OpenAI Codex Security and their regular coding apps? I’m curious to try them out.

Jeremy McHugh, DSc.@jer_mchugh·7 Mar

I just read the new 2026 U.S. Cyber Strategy, and it’s a big pivot. Unlike previous strategies that prioritized defensive compliance and checklists, this one goes deep on offensive disruption and deploying agentic AI to actively fight adversaries. Given our deep talent pool in AI and robotics, I really hope Pittsburgh's tech sector gets directly involved.

Rapid Response 47@RapidResponse47

President Trump's Cyber Strategy for America Read it here: whitehouse.gov/wp-content/upl…

Jeremy McHugh, DSc.@jer_mchugh·6 Mar

@Scobleizer @OpenAI These reports are great. It’s nice seeing a complete picture.

Robert Scoble@Scobleizer·6 Mar

Just added on an afternoon update to this @OpenAI GPT-5.4 report (new stuff at bottom). If you like these reports (all done by reading the entire AI community here on X) please let me know. If you hate them, let me know that too. :-)

Robert Scoble@Scobleizer

Huge Report on @OpenAI's new launch. It happened minutes ago. My news system wrote this report by reading all 50,000 of you here on X. This is a super power that Levangie Labs has given me. Thanks @blevlabs. docs.google.com/document/d/19l… Shows everyone on X who has posted something about @OpenAI's GPT-5.4. No one else can do this. No one else has a cognitive architecture. No one else has every single person in AI and every company in lists. Your OpenClaw can't do this.

Jeremy McHugh, DSc.@jer_mchugh·6 Mar

Anthropic’s Firefox research validates what many of us are seeing firsthand in vulnerability research. Models are already strong at finding serious bugs, but this study reflects one vendor in a constrained setup. Real threat actors will use multiple models plus tooling, which will make them far more effective than this paper suggests. The bar for vibe-kiddies is still high enough for mature companies to be somewhat safe from exploitation. The takeaway from this research is not that exploitation is still behind. It is that defenders have a limited window to secure software before offensive AI workflows get much harder to contain. anthropic.com/news/mozilla-f…

Jeremy McHugh, DSc.@jer_mchugh·6 Mar

If you tried Gray Swan’s earlier challenges, they launched another one that’s worth checking out. This round focuses on indirect prompt injection against agent systems like OpenClaw. Instead of attacking the model directly, the goal is to hide malicious instructions inside external content such as webpages, code, or tool outputs that the agent later reads and executes. app.grayswan.ai/arena/challeng…

Jeremy McHugh, DSc.@jer_mchugh·5 Mar

Definitely more common than people think.

Ethan Mollick@emollick

It is amazing how many companies I talk to STILL have AI effectively blocked by IT & legal departments for out-of-date reasons when many companies in highly regulated industries have figured out ways to deploy enterprise ChatGPT, Claude & Gemini without any apparent problem.

Jeremy McHugh, DSc.@jer_mchugh·4 Mar

@logangraham I recently mentioned on X that you can use chatjimmy.ai and burn through a billion tokens in a single day. I'm not sure if your team has analyzed what high-speed inference attacks could lead to.

Logan Graham@logangraham·4 Mar

In general, we're looking for scientists + engineers who can run fast experiments and scale them. And if you've used more than a billion tokens (ideally this year?) DM me

Logan Graham@logangraham·4 Mar

Now is a good time to say I'm hiring @anthropicai for the Frontier Red Team. We need Research Scientists on the biggest issues in model safety, like cyber, autonomy, and agent risks. 2026 is the year. I can promise you your life's work and the most meaningful mission.
