solst/ICE of Astarte

18.5K posts


@IceSolst

Voidweaver @AstarteSecurity - Pentester turned seceng turned meeting canceller - meetup https://t.co/E4rlINC0U6 - conf tracker https://t.co/tReNhuhANF

villa straylight · Joined November 2024
2.1K Following · 30.3K Followers
Pinned Tweet
solst/ICE of Astarte@IceSolst·
Here's a thread of every app I've built 100% with @cursor_ai using Claude. These are all fun side projects I've worked on in my free time over the last few months.
61 replies · 119 reposts · 2.7K likes · 609.6K views
solst/ICE of Astarte
I wish more folks put their competitive pvp achievements on their resume, idc which game
15 replies · 1 repost · 47 likes · 2.2K views
solst/ICE of Astarte
@vatai @Mike22092778 No, the compiler does not check the correctness of the code you’re supplying it in most cases, e.g. logic bugs, since it has no way to do this, even if the code is semantically sound
0 replies · 0 reposts · 0 likes · 22 views
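The distinction at stake here (well-formed vs. correct) is easy to demonstrate with a minimal, hypothetical sketch: the function below passes any compiler or type checker cleanly, yet contains exactly the kind of logic bug no compiler can catch, because only the author's intent defines "correct".

```python
def average(values: list[float]) -> float:
    """Intended behavior: return the arithmetic mean of `values`."""
    # Logic bug: off-by-one denominator. The code is semantically
    # well-formed, so every compiler/type checker accepts it -- it is
    # simply wrong with respect to the author's intent.
    return sum(values) / (len(values) + 1)

print(average([2.0, 4.0]))  # intent: 3.0, actual: 2.0
```

Only a check that encodes the intent, e.g. a test asserting `average([2.0, 4.0]) == 3.0`, can flag this.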
Emil Vatai @vatai@mast.hpc.social
@IceSolst @Mike22092778 This is wrong. You have formal guarantees that the compiler will do what you tell it to do (e.g. see legality check in polyhedral compilation). So you do have semantic validation as well.
1 reply · 0 reposts · 0 likes · 17 views
solst/ICE of Astarte
@ashrealite Yes but you’re building trust with those folks over time. Now if you had to review random PRs over random GitHub projects, it would be much different, and that’s the scenario I assume we have to work towards (worst case scenario)
1 reply · 0 reposts · 1 like · 70 views
azrulite@ashrealite·
@IceSolst As a person who works with other people writing software on a longstanding project, there's a hell of a lot more trust I place on someone who owns a component in that software to make a correct contribution to it, than a random dev prompting an LLM.
1 reply · 0 reposts · 2 likes · 67 views
solst/ICE of Astarte
I distrust human- and LLM-generated code equally
I similarly distrust human- and LLM-conducted code reviews equally
And our tools to verify program correctness need a lot of improvement
20 replies · 1 repost · 81 likes · 3.2K views
solst/ICE of Astarte
@akses_0x00 Agreed, one challenge is how to enforce controls over the practice, which goes back to tools. So I see them as the same challenge
1 reply · 0 reposts · 4 likes · 112 views
ɐʞsǝs@akses_0x00·
I think the real money is in practices over tools, and preventing bugs as they are written, giving us better ways to collaborate on requirements in the first place. This can then lead to better verification outcomes because the requirements themselves are better understood and expressed in the first place.
2 replies · 0 reposts · 1 like · 122 views
Judith Victoria【紫皇帝】
@IceSolst it’s my own term for LLMs with tooling that can do certain tasks autonomously but doesn’t have general autonomy iono some blogs say “autonomous agent” & “agent” but i believe an agent should be by necessity autonomous 🤷🏻‍♀️
1 reply · 0 reposts · 1 like · 24 views
Z3R0@Zero_XFr·
@IceSolst What’s your level of trust in ctrl + v?
1 reply · 0 reposts · 1 like · 55 views
solst/ICE of Astarte
@rolandbouman I find it’s not unique to LLMs, I want us to have verification toolchains that can help with any input, be it human or LLM generated. I think we will slowly converge towards a version of this, but not sure what level of confidence we will have in it
1 reply · 0 reposts · 1 like · 77 views
Roland Bouman@rolandbouman·
@IceSolst Second, it assumes you can build an equivalent verification/test harness for LLMs. I argue you can't really. As long as LLM outputs are non-deterministic, you can only hope to verify functional tests. This is much less rigorous than testing the output you logically expect.
1 reply · 0 reposts · 4 likes · 94 views
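One way to read this trade-off: when the producer of the code is non-deterministic, you can still run deterministic checks on *properties* of whatever it emits, rather than comparing against one expected implementation. A minimal property-based sketch — the function name and the two properties are illustrative, not from this thread:

```python
import random

def check_sort_properties(sort_fn, trials: int = 200) -> bool:
    """Verify properties of sort_fn's output on random inputs,
    since we cannot predict which implementation was emitted."""
    for _ in range(trials):
        xs = [random.randint(-100, 100) for _ in range(random.randint(0, 20))]
        out = sort_fn(list(xs))
        # Property 1: the output is ordered.
        if any(a > b for a, b in zip(out, out[1:])):
            return False
        # Property 2: the output is a permutation of the input.
        if sorted(out) != sorted(xs):
            return False
    return True

print(check_sort_properties(sorted))  # True: the built-in sort satisfies both
```

This is less rigorous than verifying a known implementation line by line, which is exactly the point being debated above.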
@SHELL@0xThreatActor·
@IceSolst If we start playing ranked bug bounty I'm going to plateau into middle of the pack like I do in everything else.
1 reply · 0 reposts · 1 like · 9 views
solst/ICE of Astarte
What if software had a security ELO rating based on which gets an exploitable vuln first? But does that assess quality? Shouldn’t it account for time to remediate? And have a denominator, e.g. factor in codebase size? It seems all metrics around software security are arbitrary
9 replies · 2 reposts · 38 likes · 2.6K views
solst/ICE of Astarte
@gerardsans Yes, that’s the point here: input to a compiler is untrusted, and a compiler is a deterministic program to validate certain attributes of that input.
Which is similar to what we need for LLM output
1 reply · 0 reposts · 1 like · 163 views
solst/ICE of Astarte@IceSolst·
Inevitably we’ll have tooling to enhance code review (not replace it). If you’re responsible for hundreds of devs or large OSS projects, it’s already hard to trust manual review (e.g. a disgruntled reviewer on their last day blindly lgtm’ing, with ramifications appearing months later). Plus the increasing rate of change is unsustainable. Given that manual reviews already vary in quality, and that automated code reviews are significantly improving, it makes sense we’re converging towards a state in which how reviews are done will transform. Many recent security bugs are found by LLMs, but without a good toolchain; we’re doing the bare minimum. Harnesses will improve, adding dynamic instrumentation and a more thoughtfully broken-down process for review. At some point it would be a waste to conduct a review without them. So instead of staring at a diff, you look at a series of automated review outputs that have already taken into account design decisions and assumptions about the program’s purpose and intent. The question then would be: would this last part in itself be automatable?
❄️ winter ❄️@_winter_wonders

Idk if I'm missing something but I'm seeing a lot of smart security people talking about having AI code just never be reviewed at all as a desirable thing? Am I missing something?

11 replies · 1 repost · 43 likes · 8K views
solst/ICE of Astarte@IceSolst·
There’s a common argument stuck on the fact that LLMs cannot be trusted, but that’s exactly the point here: what set of steps, if any, would make that output trusted? It is imo too shortsighted to say “none”, because we already have testing and verification etc. It’s not about LLMs, but about the additional checks you add around ANY process (be it human-generated code or otherwise) to verify it
6 replies · 0 reposts · 15 likes · 825 views
solst/ICE of Astarte@IceSolst·
E.g. the interesting part of the compiler analogy is: what set of gateway checks would it take to build a toolchain that takes any input and only allows the correct binary as output?
1 reply · 0 reposts · 14 likes · 1K views
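A minimal sketch of what such a chain of gateway checks might look like, assuming (purely for illustration) three deterministic stages: a parse gate, a crude static policy gate, and behavioral tests that encode intent. Every name here is hypothetical, not an existing tool:

```python
import ast

def gateway_checks(source: str, tests) -> bool:
    """Run deterministic gates over untrusted source code,
    regardless of whether a human or an LLM produced it."""
    # Gate 1: the input must parse (the 'compiler front end' gate).
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    # Gate 2: a crude static policy gate -- reject banned constructs.
    banned = {"exec", "eval"}
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id in banned:
            return False
    # Gate 3: behavioral tests encoding the stated intent.
    namespace: dict = {}
    exec(compile(tree, "<untrusted>", "exec"), namespace)
    return all(test(namespace) for test in tests)

candidate = "def add(a, b):\n    return a + b\n"
print(gateway_checks(candidate, [lambda ns: ns["add"](2, 3) == 5]))  # True
```

A real toolchain would add type checking, static analysis, sandboxed execution, and fuzzing as further gates; the point is that each gate is deterministic even when the code's producer is not.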
solst/ICE of Astarte
“But compilers are deterministic”
The undefined blob of whatever constitutes sufficient review doesn’t necessarily need to be nondeterministic (it prob won’t be).
Your guardrails shouldn’t be nondeterministic
18 replies · 3 reposts · 116 likes · 13.7K views
solst/ICE of Astarte@IceSolst·
@ErikExplains Yes I’m thinking more of 5-10 years from now. Almost every current automated review tool rn is a scam
1 reply · 0 reposts · 6 likes · 127 views
Erik Ex Plano@ErikExplains·
@IceSolst Right now it’s still such a wildly moving target in terms of capabilities, stability, and price, with marketing walking the line between gaslighting and straight-up fraud, that I’m not locking in on any outcome. I have my ideas and projections / prejudices, but I could be wrong.
1 reply · 0 reposts · 3 likes · 137 views