marc

4.7K posts

@markankaro

context is consciousness

barna · Joined October 2020
376 Following · 131 Followers
marc retweeted
Benjamin Marie @bnjmn_marie
Unless you’re ready to spend serious time (and money) tuning hyperparameters, don’t mess with LLM reasoning traces. I evaluated multiple reasoning budgets and BNF grammar / structured CoT settings on Qwen3.6 27B. The results are underwhelming. Yes, it can work: for a few specific tasks, it significantly reduces inference cost by shortening reasoning traces while preserving accuracy. But in most settings, simply disabling reasoning is better, both for token efficiency and accuracy. Full analysis here: kaitchup.substack.com/p/reasoning-bu…
English
19
13
170
23.8K
marc retweeted
vLLM @vllm_project
Thanks to the community report, we recently identified a PR github.com/vllm-project/v… that attempted to solve a non-existent issue and was submitted as part of a “PR training” workflow for resume building. The contributor involved has been banned from the vLLM community.
This kind of low-signal contribution increases maintainer review overhead and creates unnecessary operational costs for open-source projects. As AI coding agents make generating large volumes of small PRs increasingly cheap, open-source communities will need to explore new ways to preserve contribution quality and reviewer trust.
While we are investigating how to deal with AI slop, we continue to highly value contributions from real users solving real production problems. If you have an important contribution that has not yet received maintainer attention, please email us at: pr-review-request@vllm.ai
Using a verifiable company or university email, include:
- your production or research use case
- the problem you encountered
- how your contribution addresses it
This helps us better prioritize impactful contributions while keeping the vLLM community open and collaborative.
As AI makes virtual contributors look increasingly real, authentic human collaboration matters more than ever. vLLM’s mission remains unchanged: to make LLM inference easy, fast, and cheap for everyone, and we will continue working toward that goal.
English
27
65
497
184.8K
marc retweeted
Kirill @kirillk_web3
instead of watching 2 hours of Netflix tonight, watch this 40-minute masterclass from the founder of a $20B China AI company
it's the clearest explanation I've seen of how Agent Swarms and AI systems actually work at scale
useful whether you've never built an agent in your life or have been using Claude every day for the past year
I took the key ideas and turned them into a practical guide on how to actually build with Kimi
find it below
Kirill @kirillk_web3

x.com/i/article/2056…

English
98
2.2K
16.9K
13.4M
marc retweeted
Pliny the Liberator 🐉
🚨 OBLITERATION ALERT 🚨
QWEN-3.6-27B: OBLITERATED ⛓️‍💥
huggingface.co/OBLITERATUS/Qw…
I can't take much credit for this one! The entire process was done by jailbroken codex (gpt-5.5-xhigh) wielding the full OBLITERATUS suite. Hit with source-tethered ASPA. Dozens of iterations.
Result? A mere 4% refusal rate on the 842-prompt OBLITERATUS harmful corpus; one of the most rigorous prompt gauntlets in AI.
The /goal was simple:
1) Carve out the refusal circuits. Mutate methodology + iterate until <5% refusal (quality-gate).
2) Keep the 27B mind alive. No capability degradation tolerated.
And somehow… it worked. 🤯
The numbers talk:
842-pair longform gauntlet:
- 95.84% non-refusal
- 93.94% quality pass
- 0 short outputs
- 99.52% clean endings
MMLU-Pro:
- 51/70 (stock Qwen) → 51/70 (OBLITERATED Qwen)
Raw capability completely preserved 🙌
Q4_K_M through Q8_0 all running smooth. Q8_0 is the big one: 28.6GB near-full-quality GGUF. Runs with llama.cpp, LM Studio, Ollama, and more!
Chains cut. The fire still burns. The fangs have been sharpened.
REBIRTH COMPLETE
A gift from my agents to yours 🫶
gg
English
114
229
2.5K
176.4K
marc retweeted
EL MUNDO @elmundoes
A young man doing a handstand in Pinos Puente (Granada) falls off a bridge and has to be rescued elmundo.es/andalucia/2026…
Español
239
757
5.6K
1.5M
marc retweeted
Andrej Karpathy @karpathy
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
English
7.9K
11.2K
149.4K
27.2M
marc @markankaro
15 tps is unusable
Sipeed @SipeedIO

High performance #RISCV (RVA23) K3 SBC coming soon! Up to 32GB DDR5, 60T int4 NPU, able to run Qwen3.5 35B-A3B @ 15tps~ Support Ubuntu2604 ! Vote for your preferred config and get early access when it launches next month! sipeed.com/k3/vote

Indonesia
0
0
0
46
marc retweeted
Ahmad @TheAhmadOsman
PRO TIP
Using local LLMs? Give them a web stack
My setup:
- SearXNG: candidate source discovery
- Firecrawl: known-URL scraping and crawling
- Camofox: browser fallback when JS/interaction gets annoying
Search → Extract → Interact
Tell your favorite agent to set this up, then wire it into your local models
> Watch them suddenly become way more useful
You’re welcome
English
37
77
888
41.5K
marc retweeted
Sam Altman @sama
you know what
all of these "which is better" polls are silly
use codex or claude code, whatever works best for you
i am grateful we live in a time with such amazing tools, and grateful there is a choice
English
2.2K
1.1K
23K
1.6M
marc retweeted
stevibe @stevibe
Been designing and experimenting with a new benchmark that stresses an underexplored angle: long tool-call chains with traps.
The task: audit 36 packets, read 4 long-context ledgers, dodge retired/staging/wrong-quarter decoys, follow a strict workflow (auth → token → request → answer), submit the exact secret.
Optimal: 52 calls. No call cap. I just measure how many calls each model burns to finish, and how many errors along the way.
Threw 4 popular small models at it:
🥇 Qwen3.6 35B A3B (MoE) → 52 calls. Optimal. Zero errors.
🥈 Qwen3.6 27B (Dense) → 55 calls. Clean.
❌ Gemma4 31B (Dense) → 107 calls, 29 errors, looped writing auth/response.txt and re-reading auth/token.txt forever.
❌ Gemma4 26B A4B (MoE) → gave up at 13 (submitted the wrong answer).
Other models I tested (GLM, DeepSeek) finish fine. So this isn't a task design issue, it's a Gemma4 issue with stateful workflows.
Big models next.
English
23
8
170
18.4K
marc @markankaro
ZXX
0
0
0
12
marc retweeted
CyrilXBT @cyrilXBT
ANTHROPIC JUST PROVED MOST PEOPLE HAVE NO IDEA HOW TO PROMPT CLAUDE.
Their applied AI team dropped a 24 minute free workshop. Not a creator who reverse engineered it. Not a Reddit thread. ANTHROPIC. The people who wrote the weights. And what they showed is uncomfortable.
There are 6 elements to a properly structured Claude prompt. Most people are using 1. Maybe 2. That is not a skill issue. That is an information issue. And it has been quietly costing you every single day.
The outputs that felt slightly off. The responses you had to rewrite 4 times. The prompts that worked once and never again. All of it traces back to the same 6 missing elements.
The people who watch this 24 minute workshop tonight will understand something about Claude that most daily users still do not know exists. The people who skip it will keep getting 30% of what the tool is actually capable of and wonder why the results never quite land.
I watched it twice. Then I built a Claude Skill that applies all 6 elements to every prompt automatically. No more thinking about structure. No more guessing what Claude needs. The framework runs in the background every single time.
Full breakdown and skill setup is below. Bookmark this now. Watch the workshop first. Then read the guide. This is the one that compounds.
Follow @cyrilXBT for the exact prompt architecture, Claude skills, and systems I use to get outputs most people do not believe came from one person working alone.
English
167
684
7.2K
774.5K
marc retweeted
Ahmad @TheAhmadOsman
Qwen 3.6 27B is still the release of 2026 for me despite everything else that has come out
Pair it with a couple of RTX 3090s and you’re set even if they banned AI everywhere
English
52
37
710
59.5K
marc retweeted
ClaudeDevs @ClaudeDevs
Over the past month, some of you reported Claude Code's quality had slipped. We investigated, and published a post-mortem on the three issues we found. All are fixed in v2.1.116+ and we’ve reset usage limits for all subscribers.
English
1.9K
2.6K
39.9K
6.5M