Felipe Sztutman

114 posts

@sztlink

Felipe Sztutman · artist, inventor, technical researcher. Field notes on local AI, memory, context, and systems for experience.

São Paulo - Brazil · Joined January 2012

123 Following · 77 Followers

Felipe Sztutman@sztlink·17h

Update after N=500: gated verifier/rerank did not beat direct entity-hop path prompting.
path: EM 0.216 / F1 0.324
gated: EM 0.216 / F1 0.323
wins/losses/ties: 2/2/496
Useful lesson: path construction matters; clever control did not scale. github.com/sztlink/turboq…
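For readers who want to reproduce this kind of paired comparison, here is a minimal sketch of exact match, token-level F1, and a per-example wins/losses/ties tally. The normalization is simplified (lowercase plus whitespace split), the function names are hypothetical, and the toy answers are not from the benchmark; the repo's own scorer may differ.

```python
from collections import Counter

def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (simplified normalization, an assumption)."""
    return text.lower().split()

def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalized prediction equals the normalized gold answer."""
    return float(normalize(pred) == normalize(gold))

def token_f1(pred: str, gold: str) -> float:
    """Harmonic mean of token precision and recall over the multiset overlap."""
    p, g = normalize(pred), normalize(gold)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)

def wins_losses_ties(scores_a, scores_b):
    """Per-example comparison of system A against system B."""
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    losses = sum(a < b for a, b in zip(scores_a, scores_b))
    ties = len(scores_a) - wins - losses
    return wins, losses, ties

# Toy data for illustration only.
gold = ["paris", "marie curie", "1969"]
path = ["Paris", "Curie", "1968"]
gated = ["Paris", "Marie Curie", "1968"]

path_em = [exact_match(p, g) for p, g in zip(path, gold)]
gated_em = [exact_match(p, g) for p, g in zip(gated, gold)]
result = wins_losses_ties(gated_em, path_em)  # gated vs path, scored on EM
```

Aggregate EM and F1 can hide per-example movement, so the tally is worth keeping next to the means.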

Felipe Sztutman@sztlink·6d

published the first public cut of turboquant-cuda-bench: retrieved != used
long-context / KV-cache receipts up to 192K on local RTX 4090: Qwen, llama.cpp, vLLM, TurboQuant, CASK, KVFidelity.
github.com/sztlink/turboq…

Felipe Sztutman@sztlink·16 May

cc @no_stp_on_snek @leetllm - this retrieval/utilization split seems directly relevant to longctx eval and production context-stuffing failures.

Felipe Sztutman@sztlink·16 May

Technical note and sanitized artifact: github.com/sztlink/turboq… No raw Discord-derived data. No broad claim about TurboQuant, CASK, FP8, or long-context models in general.

Felipe Sztutman@sztlink·16 May

A retrieved chunk is not a used chunk. In a long-context decoy fixture, canonical evidence reached context 8/8 times, but baseline answers closed only 5/8. Prompting alone did not fix it. Evidence placement did.
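One way to keep the two numbers apart is to log them per trial. The sketch below is a hypothetical tally, not the fixture's actual harness: `Trial` and `split_rates` are illustrative names, and only the counts (8/8 in context, 5/8 answered) come from the post.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    evidence_in_context: bool  # did the canonical chunk reach the prompt?
    answer_correct: bool       # did the model close the answer?

def split_rates(trials):
    """Separate 'evidence was retrieved' from 'evidence was used'."""
    n = len(trials)
    retrieved = sum(t.evidence_in_context for t in trials)
    used = sum(t.evidence_in_context and t.answer_correct for t in trials)
    return {
        "retrieval_rate": retrieved / n,
        "utilization_rate": used / retrieved if retrieved else 0.0,
        "retrieved_not_used": retrieved - used,
    }

# Counts mirror the post; which individual trials failed is not stated there.
trials = [Trial(True, True)] * 5 + [Trial(True, False)] * 3
rates = split_rates(trials)
print(rates)
```

Reporting `retrieved_not_used` directly makes the gap visible even when retrieval looks perfect.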

Felipe Sztutman@sztlink·12 May

At 7B / 16K / exact-match retrieval: TurboQuant K8V4 holds. Calibrated FP8 emits near-miss precision errors on the digits it should hit. github.com/sztlink/turboq…

Felipe Sztutman@sztlink·10 May

4090 field note using TheTom's public TurboQuant + longctx stack. Fitting a 192K window is only step one. In synthetic tests, recall found the right chunk, but decoys buried it. Reranking moved it back to rank 1. Fit buys the window. Ranking decides what reaches it.
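A toy illustration of the recall-then-rerank effect, under loud assumptions: both scoring functions are hypothetical stand-ins (raw term counting, then distinct-term coverage), not the reranker used in the stack above, and the chunks are invented.

```python
def recall_score(query: str, chunk: str) -> int:
    """First stage: count chunk tokens that appear in the query.
    Repetitive decoys score high under this scheme."""
    q = set(query.lower().split())
    return sum(tok in q for tok in chunk.lower().split())

def rerank_score(query: str, chunk: str) -> float:
    """Second stage: fraction of distinct query tokens the chunk covers."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q)

def rank_of(target, chunks, score, query) -> int:
    """1-based rank of `target` when chunks are sorted by descending score."""
    ordered = sorted(chunks, key=lambda ch: score(query, ch), reverse=True)
    return ordered.index(target) + 1

query = "vault seven access code"
gold = "the access code for vault seven is 4417"
decoys = [
    "vault vault vault access access code code code",
    "code code code vault vault",
]
chunks = decoys + [gold]

rank_before = rank_of(gold, chunks, recall_score, query)
rank_after = rank_of(gold, chunks, rerank_score, query)
print(rank_before, rank_after)
```

In this toy setup the gold chunk is found by recall but sits below both decoys until the second-stage score reorders it.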

Felipe Sztutman@sztlink·9 May

SciBORG: arxiv.org/abs/2507.00081 KVFidelity related-work note: github.com/sztlink/turboq…

Felipe Sztutman@sztlink·9 May

Related-work update: SciBORG (Muhoberac/Chopra et al., arXiv:2507.00081) explicitly uses "action trace fidelity" as an agent-benchmark dimension. KVFidelity sits in the broader trajectory-aware / trace-based evaluation space, applying paired action-trace comparison to KV/V-cache compression with scenario order as a measured axis. Updating the repo to cite this properly.

Felipe Sztutman@sztlink·8 May

@SeraAndroid Thanks, Tim. tool-eval-bench made this possible as a deterministic substrate. I’m keeping KVFidelity as an external paired-trace layer for now: raw trace diffs, review queue, then reviewed behavioral mechanisms.

Tim Messerschmidt@SeraAndroid·8 May

@sztlink Very cool! Congratulations on the release!

Felipe Sztutman@sztlink·8 May

I’m building an evaluation lens I call Action-Trace Fidelity. If model, task, prompt, seed, and decoding stay fixed — but the inference apparatus changes — does the operational trace survive? Not just “is the answer right?” Which tools, in what order, with what args?
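A minimal sketch of what a paired trace comparison could look like, assuming a trace is an ordered list of tool calls. `ToolCall`, `call`, and `compare_traces` are hypothetical names for illustration and are not KVFidelity's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolCall:
    name: str
    args: tuple  # sorted (key, value) pairs so calls compare cleanly

def call(name: str, **kwargs) -> ToolCall:
    """Build a ToolCall with arguments in a canonical order."""
    return ToolCall(name, tuple(sorted(kwargs.items())))

def compare_traces(baseline, variant):
    """Answer the three questions: which tools, in what order, with what args."""
    base_names = [c.name for c in baseline]
    var_names = [c.name for c in variant]
    same_tools = sorted(base_names) == sorted(var_names)
    same_order = base_names == var_names
    same_args = same_order and all(
        a.args == b.args for a, b in zip(baseline, variant)
    )
    return {"same_tools": same_tools, "same_order": same_order, "same_args": same_args}

# Same model, task, prompt, seed, and decoding assumed; only the apparatus differs.
baseline = [call("search", query="kv cache"), call("read_file", path="notes.md")]
variant = [call("read_file", path="notes.md"), call("search", query="kv cache")]
report = compare_traces(baseline, variant)
print(report)
```

Here the variant uses the same tools but in a different order, which a final-answer check alone would not surface.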

Felipe Sztutman@sztlink·8 May

@elmoche_ Follow-up is up — narrowed the claim and added the order-sensitivity soak: x.com/sztlink/status…

Dimitri Krotchlikmioff@elmoche_·8 May

@sztlink Great findings!!!

Felipe Sztutman@sztlink·6 May

REFRACT q4_0 KV check on Qwen3.6-35B-A3B hybrid (RTX 3090).
q4_0/q4_0 vs q8_0/q8_0:
KLD 98.81 (close)
Trajectory path 65.70 (DEGRADED)
GTM-only 91.26, so metric families diverge.
Artifacts: github.com/sztlink/turboq…

Felipe Sztutman@sztlink·8 May

@SeraAndroid Write-up + artifacts: github.com/sztlink/turboq…

Felipe Sztutman@sztlink·8 May

@SeraAndroid The claim is narrow: Not “KV compression breaks agents.” Not “this KV mode is unsafe.” The finding is that behavioral fidelity has axes: KV config × prompt scaffold × scenario order/context.
