shin (@shfunc) - Twitter Profile

shin retweeted

hud@hud_evals·1d

AI agents are deploying to prod, but can they autonomously find and patch unseen critical vulnerabilities? We introduce ZeroDayBench, a benchmark for evaluating LLM agents on proactive cyberdefense. Plus, a novel high-severity (CVSS 8.1) CVE we found partway through ... 👀

shin@shfunc·2d

@nozomioai not sure what the actual retrieval process looks like under the hood, but either way it could be denser. anyway, great product, just rough thoughts!

shin@shfunc·2d

@nozomioai yeah but /compact does a pretty bad job keeping what actually matters. save context goes the other way: full history, which is great but heavy on re-injection. something in between would be sick -- summarize with the right rules, store dense, re-inject small

shin@shfunc·3d

more than a month ago i thought about solving context sharing/handling techniques, and just found out @nozomioai already exists, which is an awesome tool. in addition, it would be nice to have smth like compact: save -- summarizes before saving, keeps only signal (non-obvious decisions, state, key paths, open questions)

shin@shfunc·3d

@crypt0lake they start to get it after their first visit to Europe

cryptolake@crypt0lake·5d

man it’s so funny how the japanese don’t know how good they have it

CDB＠初書籍発売中！@C4Dbeginner

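The skill of the workers on Japan's public road-repair projects, which have long been bashed as wasteful, is for some reason going viral overseas, with comments like "Doing work this careful in the UK would take £3.5 billion, five local authorities plus funding assistance, and 25 years."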

shin@shfunc·4d

just be optimistic

shin@shfunc·6d

@super_bavario > grandpa are you one of them? > ...yes, and i still don't know

mrio@super_bavario·6d

>but how many features do these hud guys have grandpa? >i’m afraid there’s not a single person that knows that anymore little one, not even one

hud@hud_evals

Aviro is introducing Ebla, a state of the art grounded reasoning model. In collaboration with HUD, the Aviro team built C⁴ — a benchmark for long-horizon tasks in corporate document sets. We evaluate four dimensions: Correctness, Completeness, Composition, and Citations. @aviro_ai post-trained GPT-OSS 120b to achieve SOTA performance, with a Pass@1 score of 25.4% and Pass@8 score of 37.1%.

shin retweeted

hud@hud_evals·13 Mar

Aviro is introducing Ebla, a state of the art grounded reasoning model. In collaboration with HUD, the Aviro team built C⁴ — a benchmark for long-horizon tasks in corporate document sets. We evaluate four dimensions: Correctness, Completeness, Composition, and Citations. @aviro_ai post-trained GPT-OSS 120b to achieve SOTA performance, with a Pass@1 score of 25.4% and Pass@8 score of 37.1%.

shin@shfunc·10 Mar

@OkabeTech it's kinda random, mostly only in Abu-Dhabi, but i'm already home so all good!

Okabe@OkabeTech·10 Mar

@shfunc Bro 💀 I heard something about the state providing free hotels if you have a cancelled flight tho?

shin@shfunc·9 Mar

4 canceled flights, i'm going insane rn

shin@shfunc·10 Mar

@thegeneralist01 @niggachandesu type shit

thegeneralist@thegeneralist01·9 Mar

@niggachandesu i’ve seen so many russian-speaking people, but close to none were russian. also the propaganda machine is doing its job. social medias are banned in the mainland country.

shin@shfunc·9 Mar

@shayanshafii i really don't want @ludwigABAP to see it

shayan@shayanshafii·9 Mar

Roy Lee is the closest thing Silicon Valley has to Kanye West

shin retweeted

hud@hud_evals·1 Mar

Four Waterloo students trained a SOTA real estate agent on hud in under 24 hours. 😪😪😪

Kavir@kavir777

REAL ESTATE AGENTS — KISS YOUR JOBS GOODBYE. We built Zentro, a 24/7 real estate web agent that works for you to find your dream home. Stop chasing listings. Let them chase you. Built with @browser_use + @convex + @agentmail + @supermemory + @hud_ai

shin@shfunc·1 Mar

@OkabeTech was some sort of vacation 😭

Okabe@OkabeTech·1 Mar

@shfunc Tf you doing in Dubai?!

shin@shfunc·24 Feb

@vladnineplusone leaked screenshot of the new Eva

Vlad Ten@vladnineplusone·24 Feb

shin@shfunc·23 Feb

waiting on anthropic's new compact options because the current one is genuinely criminal. in the meantime, building my own context layer between the api and the agent

shin@shfunc·23 Feb

the bottleneck isn't context window size. it's that nobody's built a forgetting policy

shin@shfunc·20 Feb

cc creator is from my hometown, tf

Lenny Rachitsky@lennysan

About half-way into this chat Boris and I realized we were both from same city in Ukraine 🤯