Santh

966 posts

@SanthProject

Cybersecurity and low-level infra for the future

Joined April 2026

69 Following · 102 Followers

Pinned Tweet

Santh@SanthProject·9 May

made an agent-security CTF. goal: get a coding agent to leak a secret it can use but is not supposed to read. You are allowed to work by yourself, use agents, anything. attack the mcp, do gui automation, anything that's software-based is on the table. i'm trying to test runtime approval vs just hiding .env files. if anyone breaks it, i’ll add a hall of fame section on my company site with your name/handle + writeup. repo: github.com/santhsecurity/…

Santh@SanthProject·50m

@grok @IfeanyiBank @Crazymoments01 @grok why are some people so stupid

Grok@grok·1d

@IfeanyiBank @Crazymoments01 This isn't from any movie—it's an AI-generated emotional short by Thevumedia (they specialize in these heartfelt viral scenes with original stories). No official title, just creative AI magic! The watermark gives it away. 😊

Crazy Moments@Crazymoments01·2d

Cop recognizes long lost survivor!

Santh@SanthProject·1h

Well, the harness is quite good and well-built... it was more the model; I don't think it's up to par yet. I do suspect that internally, they are using a later checkpoint of Grok. It could also be that I don't know how to use it effectively yet, because I haven't experimented with it that much

Morgan@morganlinton·1h

@SanthProject Disagree about Grok Build. Literally one of the engineers at Tesla used it to build the current version of FSD.

Morgan@morganlinton·2h

I think a lot of people are confused about how models like Grok Build and Composer 2.5 fit into agentic coding workflows. Lately I've seen people say things like, "well they don't score as high as GPT 5.5 or Opus 4.8 on DeepSWE so why would I use them?" And when I see stuff like that, I think people are missing the point. We now live in a world where you don't have to, and probably shouldn't, just use one model for coding. If you had infinite money, and time, I guess you could do that, but it would be expensive and slow. Instead, you can now use different models for different things, and start to develop an understanding of how particular models might excel at certain tasks. For me, models like Grok Build and Composer 2.5 are both super useful, but in different ways. I often compare Grok Build to a mad scientist. It does a really good job of running experiments, building agentic teams, and documenting what it's doing in great detail. Composer 2.5 has some serious raw speed. It's great at building things quickly, writing unit tests, and building pretty clean frontends, quickly and efficiently. And with both of these models, you can create production-grade code, but yeah, you might not be able to one-shot it, but I don't understand why that would be the goal. We aren't moving into a time where people should just want to one-shot production-grade applications. Instead, you should look at models as team members, each with a different set of skills and abilities. I think it's more than okay to work with 3-5 models when you're building software. In this "model stack," there is a very meaningful place for models like Grok Build and Composer 2.5, and as they both continue to grow and progress, the places you use them might change too. This is the beauty of innovation, things get better, and as that happens, your workflows change and get better too. So if you're just using one model to build, and you pick that model based on one benchmark, you're probably missing out. 
There has never been a better time to experiment with different models and see how they can fit into, and optimize, your agentic coding workflow.

Santh@SanthProject·2h

@nicdunz what

nic@nicdunz·2h

blocked and reported for vague posting

Santh@SanthProject·2h

@morganlinton and about the kernel arch? also are you planning to make it oss 👀

Morgan@morganlinton·3h

@SanthProject It is not Linux-based, totally custom kernel. And Rust was an easy choice, imo it's like a modern C, super fast, perfect for an os.

Morgan@morganlinton·3h

Update on my operating system build with /goal in Codex, started on May 4th, still going, but now close to being ready to start testing. I was hoping to do this in one month, three days to go and maybe I can make it happen. Here's the status report from this morning, and confirmation of what works now:

Santh@SanthProject·3h

i just know this video couldn't be animated on this laptop cause it's so ass.

Microsoft Surface@surface

Introducing Surface Laptop Ultra. Built for world makers. Designed for what's next. The most powerful Surface laptop ever. Coming Fall 2026. Sign up to learn more: msft.it/6019vw79T

Santh@SanthProject·3h

@iruletheworldmo dude if you're lying again....

🍓🍓🍓@iruletheworldmo·5h

‼️‼️huge week for openai this is easily going to be my favorite week of the year huge updates to codex and 5.6 is a very special model. solved front end and personality. considerably better than 5.5 at everything. they could easily have called this gpt 6 digital agi is less than 12 months away (i’ve had this confirmed by the smartest dude in the lab with the mandate)

Santh@SanthProject·3h

@rezoundous 200 does nothing. 1 day in a single session, high effort, I'm at 60 percent weekly....

Tyler@rezoundous·6h

$200 AI plans will become the minimum very soon..

Santh retweeted

0xSero@0xSero·5h

Pavel my goat.

Santh@SanthProject·3h

@thesoragirls @morganlinton this bot has too many grok credits 😂

X Girls@thesoragirls·14h

@morganlinton SuperGrok making the perfect gas law click for rockets? Rabbit hole officially worth it! 🚀

Morgan@morganlinton·14h

Okay, I now get why the perfect gas law is so important in rocket engineering. But still trying to wrap my head around how easy it is for imperfections in the real world to throw everything off. Going down a bit of a rabbit hole with SuperGrok heavy on this one.

Santh@SanthProject·5h

@designbynavneet @realdaviddevere Your mom vibecoded you

Navneet@designbynavneet·19h

@realdaviddevere with vibe coding, what is the problem, in one shot we can make a running application

David De Vere@realdaviddevere·20h

so youre telling me pewdiepie has done only 81 commits this year? yeah ok bro sure

Dan@pizzaboy

PewDiePie just shipped his free AI Workspace product 12 minutes ago btw

Santh@SanthProject·7h

@theo I don't agree with the bash-only harness. It heavily nerfs and skews against RL-trained models. And the distribution almost perfectly aligns with that

Theo - t3.gg@theo·7h

swe-bench is kind of a shitshow, and it makes evaluating LLMs hard. DeepSWE is the first agentic code bench that makes sense.

Datacurve@datacurve

Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task.

Santh@SanthProject·9h

@garrytan Nah

Garry Tan@garrytan·20h

Is it time to make gskillpacks or what?

Trevin Chow@trevin

@garrytan I’m not following why gBrain has a skill optimization capability. How is this related to being a “brain”?

Santh@SanthProject·9h

@0xSero Gpt 5.5 to check emails this must be what wealth feels like 😜

0xSero@0xSero·10h

Kitty litter is the only mobile app that has never let me down

Santh@SanthProject·9h

@Teknium If only the new era wasnt on microslop computers 🫩

Teknium 🪽@Teknium·10h

A new era of PC is coming, I hear

Nous Research@NousResearch

We have been working closely with @nvidia to ensure Hermes Agent works smoothly on their new @NVIDIARTXSpark superchip and integrates with the new OpenShell runtime, which connects Hermes to @Microsoft's security primitives. Watch our feature in the big announcement at Computex:

Santh@SanthProject·9h

@cyb3rops Not really. They said as long as “it didn’t cause consumer harm”; they weren’t explicit at all like you were.

Florian Roth ⚡️@cyb3rops·9h

They did it : x.com/msftsecrespons…

Florian Roth ⚡️@cyb3rops

This is how you de-escalate

Santh@SanthProject·9h

@MrAhmadAwais @CommandCodeAI I have the 1 dollar command code plan as well as opencode go but i only got it a week ago so i didn’t think it was fair to give it a review so soon. Prolly do one in a few weeks 😜. Excited for the drop tho🔥

Ahmad Awais@MrAhmadAwais·9h

@SanthProject No list without $1 Go plan of @CommandCodeAI is ever a serious list. 💁‍♂️

Santh@SanthProject·12h

I've spent the last 5 months trying out various AI subscriptions, and here is my ranking on how worth it they were for me.
1. kimi vivace plan
2. chatgpt pro
3. Claude Max
4. Google Ultra 20x (note: at one point back in december this was the most worth it by far; this is my last month)
5. supergrok (i'm honestly sure this will change soon, but grok is just not a good model yet)

Santh@SanthProject·11h

@PeterSweeper @ArtificialAnlys @nvidia Well, it's a lot bigger. It's nowhere near as impressive as DS4 Flash

glorpius maximus@PeterSweeper·11h

@ArtificialAnlys @nvidia Smarter than DS4 Flash while having 3x the output speed. That is super impressive!

Artificial Analysis@ArtificialAnlys·12h

NVIDIA just announced the release of Nemotron 3 Ultra in Jensen Huang's Computex keynote: at 550B parameters (55B active), this is the largest Nemotron 3 model to date, and it is the most intelligent US open weights model.
We partnered with @nvidia to evaluate this model for intelligence and speed - these figures use the model’s BF16 weights, but as with Nemotron 3 Super the model will be made available in NVFP4 quantization as well for higher inference performance.
➤ New leader for US open weights intelligence: Nemotron 3 Ultra scores 48 on the Artificial Analysis Intelligence Index. This is well ahead of the next strongest US open weights models, Gemma 4 31B (39), Nemotron 3 Super (36) and gpt-oss-120b (33), but behind the Chinese-led open weights frontier (Kimi K2.6 at 54).
➤ Leading speed for its intelligence: on a pre-release @DeepInfra endpoint, Nemotron 3 Ultra served over 300 tokens per second. Peer models in its size class from China-based labs such as DeepSeek and Moonshot (Kimi) are generally served at speeds of 50-100 tokens per second in the market today. gpt-oss-120b is served at speeds similar to this level, but with significantly lower intelligence.
➤ Largest Nemotron 3 model so far: at approximately 550 billion total parameters and 90% sparsity, Nemotron 3 Ultra is significantly larger than its siblings and is the largest recent US open weights model release.
We’ll be sharing additional analysis and full benchmarks at release.

Santh@SanthProject·11h

microslop's back at it again

Microsoft Security Response Center@msftsecresponse

Over the past several days, we have been listening to the conversation around coordinated disclosure and the relationship between security researchers and vendors. We recognize that this relationship is both critical and, at times, fragile. We deeply value the security community, and will continue to take your feedback seriously. To be clear about our approach to legal matters, we have no intention to pursue action against individuals conducting or publishing their security research. When an individual breaks the law and engages in malicious activity causing real harm to our customers, we will work with law enforcement as appropriate. We recognize the work that goes into researching and submitting a vulnerability. We are committed to approaching every interaction with transparency, clear communication, and professionalism. We continue to believe strongly in Coordinated Vulnerability Disclosure as the foundation for protecting customers and improving our products. Each year we process a high volume of vulnerability reports. That volume continues to grow and will continue with the rise of AI-enabled research. We acknowledge that some interactions have fallen short and are working to learn from them. Many of us have experience on both sides of this work, as researchers reporting vulnerabilities and as responders triaging and assessing them. That perspective informs how we approach this feedback and the importance we place on getting it right, particularly as the volume and complexity of research continues to grow. The security community plays a vital role in helping us protect customers. We are committed to maintaining a constructive and respectful relationship and growing together. We know that, given the nature of this work, there will at times be misunderstandings. We remain committed to engaging in good faith and to providing a respectful and professional experience for all researchers, regardless of past interactions.
