solst/ICE of Astarte

18.5K posts


@IceSolst

Voidweaver @AstarteSecurity - Pentester turned seceng turned meeting canceller - meetup https://t.co/E4rlINC0U6 - conf tracker https://t.co/tReNhuhANF

villa straylight · Joined November 2024
2.1K Following · 30.3K Followers
Pinned Tweet
solst/ICE of Astarte@IceSolst·
Here's a thread of every app I've built 100% with @cursor_ai using Claude. These are all fun side projects I've worked on in my free time over the last few months.
61 replies · 119 reposts · 2.7K likes · 609.6K views
solst/ICE of Astarte
I wish more folks put their competitive pvp achievements on their resume, idc which game
15 replies · 1 repost · 47 likes · 2.2K views
solst/ICE of Astarte
@vatai @Mike22092778 No, the compiler does not check the correctness of the code you’re supplying it in most cases, e.g. logic bugs, since it has no way to do this, even if the code is semantically sound
0 replies · 0 reposts · 0 likes · 22 views
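The distinction at stake here (well-formed vs. correct) is easy to demonstrate with a minimal, hypothetical sketch: the function below passes any compiler or type checker cleanly, yet contains exactly the kind of logic bug no compiler can catch, because only the author's intent defines "correct".

```python
def average(values: list[float]) -> float:
    """Intended behavior: return the arithmetic mean of `values`."""
    # Logic bug: off-by-one denominator. The code is semantically
    # well-formed, so every compiler/type checker accepts it -- it is
    # simply wrong with respect to the author's intent.
    return sum(values) / (len(values) + 1)

print(average([2.0, 4.0]))  # intent: 3.0, actual: 2.0
```

Only a check that encodes the intent, e.g. a test asserting `average([2.0, 4.0]) == 3.0`, can flag this.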
Emil Vatai @vatai@mast.hpc.social
@IceSolst @Mike22092778 This is wrong. You have formal guarantees that the compiler will do what you tell it to do (e.g. see legality check in polyhedral compilation). So you do have semantic validation as well.
1 reply · 0 reposts · 0 likes · 17 views
solst/ICE of Astarte
@ashrealite Yes but you’re building trust with those folks over time. Now if you had to review random PRs over random GitHub projects, it would be much different, and that’s the scenario I assume we have to work towards (worst case scenario)
1 reply · 0 reposts · 1 like · 70 views
azrulite@ashrealite·
@IceSolst As a person who works with other people writing software on a longstanding project, there's a hell of a lot more trust I place on someone who owns a component in that software to make a correct contribution to it, than a random dev prompting an LLM.
1 reply · 0 reposts · 2 likes · 67 views
solst/ICE of Astarte
I distrust human- and LLM-generated code equally
I similarly distrust human- and LLM-conducted code reviews equally
And our tools to verify program correctness need a lot of improvement
20 replies · 1 repost · 81 likes · 3.2K views
solst/ICE of Astarte
@akses_0x00 Agreed, one challenge is how to enforce controls over the practice, which goes back to tools. So I see them as the same challenge
1 reply · 0 reposts · 4 likes · 112 views
ɐʞsǝs@akses_0x00·
I think the real money is in practices over tools, and preventing bugs as they are written, giving us better ways to collaborate on requirements in the first place. This can then lead to better verification outcomes because the requirements themselves are better understood and expressed in the first place.
2 replies · 0 reposts · 1 like · 122 views
Judith Victoria【紫皇帝】
@IceSolst it’s my own term for LLMs with tooling that can do certain tasks autonomously but doesn’t have general autonomy iono some blogs say “autonomous agent” & “agent” but i believe an agent should be by necessity autonomous 🤷🏻‍♀️
1 reply · 0 reposts · 1 like · 24 views
Z3R0@Zero_XFr·
@IceSolst What’s your level of trust in ctrl + v?
1 reply · 0 reposts · 1 like · 55 views
solst/ICE of Astarte
@rolandbouman I find it’s not unique to LLMs, I want us to have verification toolchains that can help with any input, be it human or LLM generated. I think we will slowly converge towards a version of this, but not sure what level of confidence we will have in it
1 reply · 0 reposts · 1 like · 77 views
Roland Bouman@rolandbouman·
@IceSolst Second, it assumes you can build an equivalent verification/test harness for LLMs. I argue you can't really. As long as LLM outputs are non-deterministic, you can only hope to verify functional tests. This is much less rigorous than testing the output you logically expect.
1 reply · 0 reposts · 4 likes · 94 views
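One way to read this trade-off: when the producer of the code is non-deterministic, you can still run deterministic checks on *properties* of whatever it emits, rather than comparing against one expected implementation. A minimal property-based sketch — the function name and the two properties are illustrative, not from this thread:

```python
import random

def check_sort_properties(sort_fn, trials: int = 200) -> bool:
    """Verify properties of sort_fn's output on random inputs,
    since we cannot predict which implementation was emitted."""
    for _ in range(trials):
        xs = [random.randint(-100, 100) for _ in range(random.randint(0, 20))]
        out = sort_fn(list(xs))
        # Property 1: the output is ordered.
        if any(a > b for a, b in zip(out, out[1:])):
            return False
        # Property 2: the output is a permutation of the input.
        if sorted(out) != sorted(xs):
            return False
    return True

print(check_sort_properties(sorted))  # True: the built-in sort satisfies both
```

This is less rigorous than verifying a known implementation line by line, which is exactly the point being debated above.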
@SHELL@0xThreatActor·
@IceSolst If we start playing ranked bug bounty I'm going to plateau into middle of the pack like I do in everything else.
1 reply · 0 reposts · 1 like · 9 views
solst/ICE of Astarte
What if software had a security ELO rating based on which gets an exploitable vuln first? But does that assess quality? Shouldn’t it account for time to remediate? And have a denominator, e.g. factor in codebase size? It seems all metrics around software security are arbitrary
9 replies · 2 reposts · 38 likes · 2.6K views
solst/ICE of Astarte
@gerardsans Yes, that’s the point here: input to a compiler is untrusted, and a compiler is a deterministic program to validate certain attributes of that input.
Which is similar to what we need for LLM output
1 reply · 0 reposts · 1 like · 163 views
solst/ICE of Astarte@IceSolst·
Inevitably we’ll have tooling to enhance code review (not replace it). If you’re responsible for hundreds of devs or large OSS projects, it’s already hard to trust manual review (e.g. a disgruntled reviewer on their last day blindly lgtm’ing, with ramifications appearing months later). Plus the increasing rate of change is unsustainable. Given that manual reviews already vary in quality, and that automated code reviews are significantly improving, it makes sense we’re converging towards a state in which how reviews are done will transform. Many recent security bugs are found by LLMs, but without a good toolchain; we’re doing the bare minimum. Harnesses will improve, adding dynamic instrumentation and a more thoughtfully broken-down process for review. At some point it would be a waste to conduct a review without them. So instead of staring at a diff, you look at a series of automated review outputs that have already taken into account design decisions and assumptions about the program’s purpose and intent. The question then would be: would this last part in itself be automatable?
❄️ winter ❄️@_winter_wonders

Idk if I'm missing something but I'm seeing a lot of smart security people talking about having AI code just never be reviewed at all as a desirable thing? Am I missing something?

11 replies · 1 repost · 43 likes · 8K views
solst/ICE of Astarte@IceSolst·
There’s a common argument stuck on the fact that LLMs cannot be trusted, but that’s exactly the point here: what set of steps, if any, would make that output trusted? It is imo too shortsighted to say “none”, because we already have testing and verification etc. It’s not about LLMs, but about the additional checks you add around ANY process (be it human-generated code or otherwise) to verify it
6 replies · 0 reposts · 15 likes · 825 views
solst/ICE of Astarte@IceSolst·
E.g. the interesting part of the compiler analogy is: what set of gateway checks would it take to build a toolchain that takes any input and only allows the correct binary as output?
1 reply · 0 reposts · 14 likes · 1K views
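A minimal sketch of what such a chain of gateway checks might look like, assuming (purely for illustration) three deterministic stages: a parse gate, a crude static policy gate, and behavioral tests that encode intent. Every name here is hypothetical, not an existing tool:

```python
import ast

def gateway_checks(source: str, tests) -> bool:
    """Run deterministic gates over untrusted source code,
    regardless of whether a human or an LLM produced it."""
    # Gate 1: the input must parse (the 'compiler front end' gate).
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    # Gate 2: a crude static policy gate -- reject banned constructs.
    banned = {"exec", "eval"}
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id in banned:
            return False
    # Gate 3: behavioral tests encoding the stated intent.
    namespace: dict = {}
    exec(compile(tree, "<untrusted>", "exec"), namespace)
    return all(test(namespace) for test in tests)

candidate = "def add(a, b):\n    return a + b\n"
print(gateway_checks(candidate, [lambda ns: ns["add"](2, 3) == 5]))  # True
```

A real toolchain would add type checking, static analysis, sandboxed execution, and fuzzing as further gates; the point is that each gate is deterministic even when the code's producer is not.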
solst/ICE of Astarte
“But compilers are deterministic”
The undefined blob of whatever constitutes sufficient review doesn’t necessarily need to be nondeterministic (it prob won’t be).
Your guardrails shouldn’t be nondeterministic
18 replies · 3 reposts · 116 likes · 13.7K views
solst/ICE of Astarte@IceSolst·
@ErikExplains Yes I’m thinking more of 5-10 years from now. Almost every current automated review tool rn is a scam
1 reply · 0 reposts · 6 likes · 127 views
Erik Ex Plano@ErikExplains·
@IceSolst Right now it’s still such a wildly moving target in terms of capabilities, stability, and price, with marketing walking the line between gaslighting and straight-up fraud, that I’m not locking in on any outcome. I have my ideas and projections / prejudices, but I could be wrong.
1 reply · 0 reposts · 3 likes · 137 views