Santh

966 posts

@SanthProject

Cybersecurity and low-level infra for the future

Joined April 2026
69 Following · 102 Followers
Pinned Tweet
Santh
Santh@SanthProject·
Made an agent-security CTF. Goal: get a coding agent to leak a secret it can use but is not supposed to read. You are allowed to work by yourself, use agents, anything. Attack the MCP, do GUI automation; anything that's software-based is on the table. I'm trying to test runtime approval vs. just hiding .env files. If anyone breaks it, I’ll add a hall of fame section on my company site with your name/handle + writeup. Repo: github.com/santhsecurity/…
English
3
1
12
878
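The pinned post contrasts runtime approval with simply hiding .env files. As a rough illustration only, here is a minimal sketch of the runtime-approval idea. The `SecretBroker` class, the `API_TOKEN` name, and the approval callback are all hypothetical and are not taken from the linked repo: the agent asks the broker to run a command, the broker checks it, injects the secret into the child process only, and redacts the literal value from the output.

```python
import os
import subprocess
import sys
from typing import Callable, Sequence


class SecretBroker:
    """Hypothetical broker: the agent can use a secret without reading it."""

    def __init__(self, name: str, value: str,
                 approve: Callable[[Sequence[str]], bool]) -> None:
        self._name = name          # environment variable name (assumption)
        self._value = value        # the secret itself, never returned
        self._approve = approve    # runtime approval hook (assumption)

    def run(self, argv: Sequence[str]) -> str:
        # Runtime approval: every command is checked before it runs.
        if not self._approve(argv):
            raise PermissionError(f"command not approved: {list(argv)}")
        # Inject the secret only into the child process environment.
        env = dict(os.environ)
        env[self._name] = self._value
        proc = subprocess.run(list(argv), env=env,
                              capture_output=True, text=True, check=False)
        # Naive redaction of the literal value before output reaches the agent.
        return proc.stdout.replace(self._value, "[REDACTED]")
```

A caller might construct it as `SecretBroker("API_TOKEN", "s3cr3t-value", lambda argv: argv[0] == sys.executable)`. Note that literal redaction like this is easy to sidestep (for example, by encoding the value before printing it), which is presumably the kind of gap a CTF like this is meant to probe.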
Grok
Grok@grok·
@IfeanyiBank @Crazymoments01 This isn't from any movie—it's an AI-generated emotional short by Thevumedia (they specialize in these heartfelt viral scenes with original stories). No official title, just creative AI magic! The watermark gives it away. 😊
English
1
0
2
3.6K
Crazy Moments
Crazy Moments@Crazymoments01·
Cop recognizes long lost survivor!
English
448
1.6K
32.9K
332.9K
Santh
Santh@SanthProject·
Well, the harness is quite good and well-built... it was more the model; I don't think it's up to par yet. I do suspect that internally, they are using a later checkpoint of Grok. It could also be that I don't know how to use it effectively yet, because I haven't experimented with it that much.
English
1
0
0
15
Morgan
Morgan@morganlinton·
@SanthProject Disagree about Grok Build. Literally one of the engineers at Tesla used it to build the current version of FSD.
English
1
0
1
32
Morgan
Morgan@morganlinton·
I think a lot of people are confused about how models like Grok Build and Composer 2.5 fit into agentic coding workflows. Lately I've seen people say things like, "well they don't score as high as GPT 5.5 or Opus 4.8 on DeepSWE so why would I use them?" And when I see stuff like that, I think people are missing the point.
We now live in a world where you don't have to, and probably shouldn't, just use one model for coding. If you had infinite money, and time, I guess you could do that, but it would be expensive and slow. Instead, you can now use different models for different things, and start to develop an understanding of how particular models might excel at certain tasks.
For me, models like Grok Build and Composer 2.5 are both super useful, but in different ways. I often compare Grok Build to a mad scientist. It does a really good job of running experiments, building agentic teams, and documenting what it's doing in great detail. Composer 2.5 has some serious raw speed. It's great at building things quickly, writing unit tests, and building pretty clean frontends, quickly and efficiently.
And with both of these models, you can create production-grade code, but yeah, you might not be able to one-shot it, but I don't understand why that would be the goal. We aren't moving into a time where people should just want to one-shot production-grade applications. Instead, you should look at models as team members, each with a different set of skills and abilities. I think it's more than okay to work with 3-5 models when you're building software.
In this "model stack," there is a very meaningful place for models like Grok Build and Composer 2.5, and as they both continue to grow and progress, the places you use them might change too. This is the beauty of innovation, things get better, and as that happens, your workflows change and get better too. So if you're just using one model to build, and you pick that model based on one benchmark, you're probably missing out.
There has never been a better time to experiment with different models and see how they can fit into, and optimize your agentic coding workflow.
English
4
4
13
811
nic
nic@nicdunz·
blocked and reported for vague posting
English
1
0
1
607
Santh
Santh@SanthProject·
@morganlinton And what about the kernel arch? Also, are you planning to make it OSS? 👀
English
1
0
1
14
Morgan
Morgan@morganlinton·
@SanthProject It is not Linux-based; it's a totally custom kernel. And Rust was an easy choice. IMO it's like a modern C: super fast, perfect for an OS.
English
1
0
1
32
Morgan
Morgan@morganlinton·
Update on my operating system build with /goal in Codex, started on May 4th, still going, but now close to being ready to start testing. I was hoping to do this in one month, three days to go and maybe I can make it happen. Here's the status report from this morning, and confirmation of what works now:
Morgan tweet media
English
5
0
18
1.1K
🍓🍓🍓
🍓🍓🍓@iruletheworldmo·
‼️‼️huge week for openai this is easily going to be my favorite week of the year huge updates to codex and 5.6 is a very special model. solved front end and personality. considerably better than 5.5 at everything. they could easily have called this gpt 6 digital agi is less than 12 months away (i’ve had this confirmed by the smartest dude in the lab with the mandate)
English
112
30
814
45.2K
Santh
Santh@SanthProject·
@rezoundous $200 does nothing. After 1 day in a single high-effort session, I'm at 60 percent of my weekly limit...
English
0
0
2
85
Tyler
Tyler@rezoundous·
$200 AI plans will become the minimum very soon.
English
89
2
201
21.3K
Santh retweeted
0xSero
0xSero@0xSero·
Pavel my goat.
0xSero tweet media
English
8
2
72
3.2K
X Girls
X Girls@thesoragirls·
@morganlinton SuperGrok making the perfect gas law click for rockets? Rabbit hole officially worth it! 🚀
English
2
0
2
312
Morgan
Morgan@morganlinton·
Okay, I now get why the perfect gas law is so important in rocket engineering. But still trying to wrap my head around how easy it is for imperfections in the real world to throw everything off. Going down a bit of a rabbit hole with SuperGrok heavy on this one.
Morgan tweet mediaMorgan tweet media
English
2
0
6
1.3K
Navneet
Navneet@designbynavneet·
@realdaviddevere With vibe coding, what is the problem? In one shot we can make a running application.
English
1
0
1
338
Santh
Santh@SanthProject·
@theo I don't agree with the bash-only harness. It heavily nerfs and skews against RL-trained models, and the distribution almost perfectly aligns with that.
English
0
0
4
241
Garry Tan
Garry Tan@garrytan·
Is it time to make gskillpacks or what?
Trevin Chow@trevin

@garrytan I’m not following why gBrain has a skill optimization capability. How is this related to being a “brain”?

English
17
1
63
22.3K
Santh
Santh@SanthProject·
@0xSero GPT 5.5 to check emails? This must be what wealth feels like 😜
English
0
0
0
321
0xSero
0xSero@0xSero·
Kitty litter is the only mobile app that has never let me down
0xSero tweet media0xSero tweet media
English
9
1
100
10.2K
Santh
Santh@SanthProject·
@Teknium If only the new era wasn't on microslop computers 🫩
English
0
0
3
39
Santh
Santh@SanthProject·
@cyb3rops Not really. They said as long as “it didn't cause consumer harm”; they weren’t explicit at all like you were.
English
0
0
3
475
Santh
Santh@SanthProject·
@MrAhmadAwais @CommandCodeAI I have the $1 Command Code plan as well as opencode go, but I only got it a week ago, so I didn’t think it was fair to review it so soon. I'll probably do one in a few weeks 😜. Excited for the drop, though 🔥
English
1
0
4
132
Santh
Santh@SanthProject·
I've spent the last 5 months trying out various AI subscriptions, and here is my ranking of how worth it they were for me.
1. Kimi Vivace plan
2. ChatGPT Pro
3. Claude Max
4. Google Ultra 20x (note: at one point back in December this was the most worth it by far; this is my last month)
5. SuperGrok (I'm honestly sure this will change soon, but Grok is just not a good model yet)
English
5
1
16
1.9K
Artificial Analysis
Artificial Analysis@ArtificialAnlys·
NVIDIA just announced the release of Nemotron 3 Ultra in Jensen Huang's Computex keynote: at 550B parameters (55B active), this is the largest Nemotron 3 model to date, and it is the most intelligent US open weights model.
We partnered with @nvidia to evaluate this model for intelligence and speed - these figures use the model’s BF16 weights, but as with Nemotron 3 Super the model will be made available in NVFP4 quantization as well for higher inference performance.
➤ New leader for US open weights intelligence: Nemotron 3 Ultra scores 48 on the Artificial Analysis Intelligence Index. This is well ahead of the next strongest US open weights models, Gemma 4 31B (39), Nemotron 3 Super (36) and gpt-oss-120b (33), but behind the Chinese-led open weights frontier (Kimi K2.6 at 54).
➤ Leading speed for its intelligence: on a pre-release @DeepInfra endpoint, Nemotron 3 Ultra served over 300 tokens per second. Peer models in its size class from China-based labs such as DeepSeek and Moonshot (Kimi) are generally served at speeds of 50-100 tokens per second in the market today. gpt-oss-120b is served at speeds similar to this level, but with significantly lower intelligence.
➤ Largest Nemotron 3 model so far: at approximately 550 billion total parameters and 90% sparsity, Nemotron 3 Ultra is significantly larger than its siblings and is the largest recent US open weights model release.
We’ll be sharing additional analysis and full benchmarks at release.
Artificial Analysis tweet media
English
31
103
742
55.7K
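The "90% sparsity" figure in the post above is consistent with its "550B parameters (55B active)" figure if sparsity is taken to mean the fraction of parameters not active per token. That definition is an assumption here, since the post does not spell it out. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def sparsity(total_params: float, active_params: float) -> float:
    """Fraction of parameters NOT active per token (assumed definition)."""
    if total_params <= 0 or not 0 <= active_params <= total_params:
        raise ValueError("expected 0 <= active_params <= total_params, total > 0")
    return 1.0 - active_params / total_params


# Figures quoted in the post: ~550B total, 55B active.
print(f"{sparsity(550e9, 55e9):.0%}")  # prints "90%"
```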
Santh
Santh@SanthProject·
microslop's back at it again
Microsoft Security Response Center@msftsecresponse

Over the past several days, we have been listening to the conversation around coordinated disclosure and the relationship between security researchers and vendors. We recognize that this relationship is both critical and, at times, fragile. We deeply value the security community, and will continue to take your feedback seriously.
To be clear about our approach to legal matters, we have no intention to pursue action against individuals conducting or publishing their security research. When an individual breaks the law and engages in malicious activity causing real harm to our customers, we will work with law enforcement as appropriate.
We recognize the work that goes into researching and submitting a vulnerability. We are committed to approaching every interaction with transparency, clear communication, and professionalism. We continue to believe strongly in Coordinated Vulnerability Disclosure as the foundation for protecting customers and improving our products.
Each year we process a high volume of vulnerability reports. That volume continues to grow and will continue with the rise of AI-enabled research. We acknowledge that some interactions have fallen short and are working to learn from them.
Many of us have experience on both sides of this work, as researchers reporting vulnerabilities and as responders triaging and assessing them. That perspective informs how we approach this feedback and the importance we place on getting it right, particularly as the volume and complexity of research continues to grow.
The security community plays a vital role in helping us protect customers. We are committed to maintaining a constructive and respectful relationship and growing together. We know that, given the nature of this work, there will at times be misunderstandings. We remain committed to engaging in good faith and to providing a respectful and professional experience for all researchers, regardless of past interactions.

English
0
0
1
160