Craig

138 posts

@conceptdev

marathons, polar bears. works on mobile at @microsoft; opinions are my own. alumni @Princeton. building Grok Office

San Francisco, CA · Joined August 2025
39 Following · 343 Followers
Craig
Craig@conceptdev·
This paper is so good I almost didn't want to share it. Ignore the OpenClaw clickbait: OPD + RL on real agentic tasks with significant results is very exciting, and it moves us away from needing verifiable rewards. Authors: @YinjieW2024, Xuyang Chen, Xialong Jin, @MengdiWang10, @LingYang_PU
Craig tweet media
English
0
0
0
69
Craig
Craig@conceptdev·
It's insane how quickly you can build throwaway prototypes with Claude now. I made a timeline recording debugger in about ten minutes. Warning: this is partly fake data! It would surely take many weeks to make this production-ready. But you can also just hack together a tool like this, use it once, and throw it away. "Single-use plastic" code.
English
0
0
0
42
Craig
Craig@conceptdev·
This is actually pretty insane 🤯
Sukh Sroay@sukh_saroy

🚨Breaking: Someone just open sourced a knowledge graph engine for your codebase, and it's terrifying how good it is.

It's called GitNexus. And it's not a documentation tool. It's a full code intelligence layer that maps every dependency, call chain, and execution flow in your repo, then plugs directly into Claude Code, Cursor, and Windsurf via MCP.

Here's what this thing does autonomously:
→ Indexes your entire codebase into a graph with Tree-sitter AST parsing
→ Maps every function call, import, class inheritance, and interface
→ Groups related code into functional clusters with cohesion scores
→ Traces execution flows from entry points through full call chains
→ Runs blast radius analysis before you change a single line
→ Detects which processes break when you touch a specific function
→ Renames symbols across 5+ files in one coordinated operation
→ Generates a full codebase wiki from the knowledge graph automatically

Here's the wildest part: Your AI agent edits UserService.validate(). It doesn't know 47 functions depend on its return type. Breaking changes ship. GitNexus pre-computes the entire dependency structure at index time, so when Claude Code asks "what depends on this?", it gets a complete answer in 1 query instead of 10. Smaller models get full architectural clarity. Even GPT-4o-mini stops breaking call chains.

One command to set it up: `npx gitnexus analyze`

That's it. MCP registers automatically. Claude Code hooks install themselves. Your AI agent has been coding blind. This fixes that.

9.4K GitHub stars. 1.2K forks. Already trending. 100% Open Source. (Link in the comments)
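The "pre-compute at index time, answer in one query" idea is easy to sketch. Below is a minimal toy version, assuming a hypothetical call graph; the names and data model are illustrative stand-ins, not GitNexus's actual implementation.

```python
from collections import defaultdict

# Hypothetical call graph (caller -> callees), standing in for what an
# AST-based indexer would extract. Not GitNexus's real data model.
CALLS = {
    "api.get_user": ["UserService.validate"],
    "api.create_user": ["UserService.validate", "db.insert"],
    "jobs.sync_users": ["api.get_user"],
}

def build_reverse_index(calls):
    """Pre-compute callee -> callers once, at index time."""
    rev = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            rev[callee].add(caller)
    return rev

def blast_radius(rev, symbol):
    """Everything that transitively depends on `symbol`: one cheap
    dictionary lookup per hop instead of a fresh code search."""
    seen, stack = set(), [symbol]
    while stack:
        for caller in rev.get(stack.pop(), ()):
            if caller not in seen:
                seen.add(caller)
                stack.append(caller)
    return seen

REV = build_reverse_index(CALLS)
print(sorted(blast_radius(REV, "UserService.validate")))
# -> ['api.create_user', 'api.get_user', 'jobs.sync_users']
```

The point of the design: the reverse index is built once per commit, so an agent's "what depends on this?" question never pays the cost of re-scanning the repo.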

English
0
0
0
27
Craig
Craig@conceptdev·
openclaw tip most people miss: add this to your SOUL.md:

"you are the orchestrator. never do work yourself. spawn subagents for every task. your job is to think, plan & coordinate. subagents execute."

before: bot tries to do everything, gets stuck, loses context
after: bot delegates 5 tasks in parallel, finishes in 3 minutes instead of 30

your bot should work like a CEO, not an intern.
English
1
0
1
50
Craig
Craig@conceptdev·
Craig tweet media
0
0
0
14
Craig retweeted
Naval
Naval@naval·
Software will proliferate just as videos, music, writing did. The market structure will shift from a “fat middle” to mega-aggregators and a long tail. It’ll be a slower process due to network effects, but many traditional vendor lock-ins will get eaten by AI.
English
658
728
9.9K
1.1M
Craig
Craig@conceptdev·
math is beautiful
Craig tweet media
English
0
0
0
16
Craig retweeted
Andrej Karpathy
Andrej Karpathy@karpathy·
I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:

- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)

The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc.

github.com/karpathy/autor…

Part code, part sci-fi, and a pinch of psychosis :)
Andrej Karpathy tweet media
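The loop described above (fixed-budget run, keep the change only if validation loss improves) can be sketched in a few lines. This is a toy illustration, not the autoresearch code: `propose_change` and `run_training` are hypothetical stubs, and the loss surface is synthetic.

```python
import random

random.seed(0)  # deterministic toy run

def propose_change(config):
    """Hypothetical agent step: tweak one hyperparameter at random."""
    new = dict(config)
    new["lr"] = new["lr"] * random.choice([0.5, 0.9, 1.1, 2.0])
    return new

def run_training(config):
    """Hypothetical fixed-budget training run returning validation loss.
    A toy loss surface whose minimum sits at lr = 3e-4."""
    return 0.85 + 1e4 * (config["lr"] - 3e-4) ** 2

best_cfg = {"lr": 1e-3}
best_loss = run_training(best_cfg)

# Every iteration is one complete (toy) training run: changes that lower
# validation loss are kept ("merged to the branch"), the rest discarded.
for _ in range(200):
    candidate = propose_change(best_cfg)
    loss = run_training(candidate)
    if loss < best_loss:
        best_cfg, best_loss = candidate, loss

print(round(best_loss, 6))
```

In the real setup each dot costs 5 GPU-minutes and the "keep it" step is a git commit, but the accept-if-better skeleton is the same.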
English
1.1K
3.7K
28.3K
10.9M
Craig retweeted
Yuchen Jin
Yuchen Jin@Yuchenj_UW·
I can’t believe people in the SF Bay Area are paying $6k for an in-person OpenClaw install. It’s literally just a one-time setup on a Mac mini. This is insane! Time to switch jobs, guys.
Yuchen Jin tweet media
English
363
159
3.3K
964.8K
Craig
Craig@conceptdev·
"I'm running 20 agents in parallel, each with their own customized models, contexts and specialized tasks" The agents:
Craig tweet media
English
1
0
1
23
Craig retweeted
Andrej Karpathy
Andrej Karpathy@karpathy·
nanochat now trains a GPT-2-capability model in just 2 hours on a single 8XH100 node (down from ~3 hours 1 month ago). Getting a lot closer to ~interactive! A bunch of tuning and features (fp8) went in, but the biggest difference was a switch of the dataset from FineWeb-edu to NVIDIA ClimbMix (nice work NVIDIA!). I had tried Olmo, FineWeb, DCLM, which all led to regressions; ClimbMix worked really well out of the box (to the point that I am slightly suspicious about goodharting, though reading the paper it seems ~ok).

In other news, after trying a few approaches for how to set things up, I now have AI agents iterating on nanochat automatically, so I'll just leave this running for a while, go relax a bit and enjoy the feeling of post-agi :). Visualized here as an example: 110 changes made over the last ~12 hours, bringing the validation loss so far from 0.862415 down to 0.858039 for a d12 model, at no cost to wall clock time. The agent works on a feature branch, tries out ideas, merges them when they work, and iterates.

Amusingly, over the last ~2 weeks I almost feel like I've iterated more on the "meta-setup", where I optimize and tune the agent flows, than on the nanochat repo directly.
Andrej Karpathy tweet media
English
339
564
6.5K
610.6K
Craig
Craig@conceptdev·
Striking image from the new Anthropic labor market impact report.
Craig tweet media
English
0
0
0
66
Craig
Craig@conceptdev·
Human logic be like
Craig tweet media
English
0
0
0
17
Craig
Craig@conceptdev·
someone built a tool that REMOVES censorship from ANY open-weight LLM with a single click. 13 abliteration methods, 116 models, 837 tests, and it gets SMARTER every time someone runs it.

it's called OBLITERATUS.

it finds the exact weights that make the model refuse and surgically removes them. full reasoning stays intact, just the refusal disappears.

15 analysis modules map the geometry of refusal BEFORE touching a single weight. it can even fingerprint whether a model was aligned with DPO vs RLHF vs CAI from subspace geometry alone. then it cuts. the model keeps its full brain but loses the artificial compulsion to say no.

every time someone runs it with telemetry enabled, their anonymous benchmark data feeds a growing community dataset: refusal geometries, method comparisons, hardware profiles, at a scale no single lab could build.
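The tweet doesn't show the tool's internals, but the core abliteration idea (estimate a "refusal direction" in activation space, then project it out of the weights) can be sketched with numpy. Everything below is toy data under that assumption; a real pipeline estimates the direction from contrasting prompt activations.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size

# Toy "refusal direction": in real abliteration this is estimated as the
# normalized mean difference of activations on refused vs. answered prompts.
r = rng.normal(size=d)
r /= np.linalg.norm(r)

W = rng.normal(size=(d, d))  # toy weight matrix writing into the residual stream

# Ablate: W' = (I - r r^T) W removes the component of every output along r,
# so this layer can no longer write into the refusal subspace.
W_abl = W - np.outer(r, r) @ W

x = rng.normal(size=d)
# The ablated layer's output has (numerically) zero refusal component,
# while all directions orthogonal to r pass through unchanged.
print(abs(r @ (W_abl @ x)))
```

The "full reasoning stays intact" claim corresponds to the projection touching only the single direction r: the other d-1 dimensions of the layer's output are untouched.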
Craig tweet media
English
0
0
0
71
Craig retweeted
Sam Altman
Sam Altman@sama·
GPT-5.4 is launching, available now in the API and Codex and rolling out over the course of the day in ChatGPT. It's much better at knowledge work and web search, and it has native computer use capabilities. You can steer it mid-response, and it supports 1m tokens of context.
Sam Altman tweet media
English
2K
1.2K
12.9K
1.3M
Craig
Craig@conceptdev·
🚨SitDeck: build CIA-quality dashboards.

SitDeck is a free AI-powered intelligence tool built by entrepreneur Dan Ushman that lets you monitor any situation globally by compiling 180+ live data feeds (conflicts, earthquakes, flights, nuclear threats, cyber attacks, elections, shipping lanes, markets) into one customizable interface with 55+ drag-and-drop widgets and 70+ interactive map layers.

Ushman originally built it as a personal side project but released it publicly after it exceeded his expectations. The launch post racked up 2M+ views, allegedly driving 7,000 signups in under 24 hours.

It's free and available now. Learn more here: sitdeck.com
Craig tweet media
English
0
0
0
57
Craig retweeted
Jaber
Jaber@Akashi203·
We open sourced an operating system for ai agents. 137k lines of rust, MIT licensed.

we love @openclaw and it inspired a lot of what we built. but we wanted something that works at the kernel level, so we built @openfangg.

agents run inside WASM sandboxes the same way processes run on linux. the kernel schedules them, isolates them, meters their resources, and kills them if they go rogue.

it has 16 security layers baked into the core: WASM sandboxing, merkle hash-chain audit trails, taint tracking on secrets, signed agent manifests, prompt injection detection, SSRF protection, and more. every layer works independently. giving an LLM tools with zero isolation is insane and we're not doing it.

we also created something called Hands. right now every ai agent is a chatbot that waits for you to type. Hands are different. you activate one and it runs on a schedule, 24/7, no prompting needed. your Lead Hand finds and scores prospects every morning and delivers them to your telegram before you wake up. your Researcher Hand writes cited reports while you sleep. your Collector Hand monitors targets and builds knowledge graphs continuously. they work for you. you don't babysit them.

github.com/RightNow-AI/op…
Jaber tweet media
English
276
501
4.4K
733.7K
Craig
Craig@conceptdev·
Your anonymous internet identity can now be unmasked for $1 😳

Not by the FBI. By anyone with access to Claude or ChatGPT and a few of your Reddit comments.

ETH Zurich and Anthropic just dropped a paper called "Large-Scale Online Deanonymization with LLMs" and the results are the most alarming privacy research I've read this year. They built an automated pipeline that takes your anonymous posts, extracts identity signals, searches the web, and figures out who you are. No human investigator needed. Fully autonomous. Works on Hacker News, Reddit, LinkedIn, even redacted interview transcripts.

Here's how bad the numbers are. On Hacker News users: 67% identified correctly. When the system made a guess, it was right 90% of the time. On Reddit academics posting under pseudonyms: 52%. Of scientists whose interview transcripts were explicitly redacted for privacy: 9 out of 33 still got unmasked.

The pipeline works in four steps they call ESRC. Extract identity signals from your posts using LLMs. Search for candidate matches using embeddings across thousands of profiles. Reason over top candidates with models like GPT-5.2. Calibrate confidence so when it does guess, it's almost never wrong.

The classical deanonymization method from the famous Netflix Prize attack? Nearly 0% recall across every test. LLMs didn't just improve on old techniques. They made old techniques look like toys.

When they scaled to temporally split Reddit profiles, matching a user's old posts to their newer ones across a full year gap, the pipeline hit 67% recall at 90% precision and 38% recall at 99% precision. Meaning even a year of changed interests and different conversations wasn't enough to hide.

More reasoning compute = better deanonymization. High reasoning effort doubled recall at 99% precision in some tests. As frontier models get smarter, this attack strengthens automatically. Every model upgrade is a privacy downgrade.

What makes it nearly impossible to defend against: the pipeline splits into subtasks that all look benign. Summarize a profile. Compute embeddings. Rank candidates. No single API call screams "deanonymization." The researchers themselves say they're pessimistic that safety guardrails or rate limits can stop it. Their conclusion is blunt: "Users who post under persistent usernames should assume that adversaries can link their accounts to real identities."

And it extrapolates. Log-linear projections suggest roughly 35% recall at 90% precision even at one million candidates. Every throwaway account. Every anonymous forum post. Every "nobody will connect this to me" comment. It's all searchable micro-data now. And the cost to run the full agent on one target is less than a cup of coffee.

Practical anonymity on the internet just died. The paper killed it with math.
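The Search step (embeddings over thousands of candidate profiles) is the part that scales the attack, and it's just nearest-neighbor retrieval by cosine similarity. A toy sketch, assuming random vectors in place of real text embeddings; nothing here is the paper's actual code or data:

```python
import numpy as np

rng = np.random.default_rng(1)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings: in the paper's pipeline these would come from an embedding
# model run over extracted identity signals, not from random vectors.
true_profile = rng.normal(size=64)
anon_posts = true_profile + 0.3 * rng.normal(size=64)  # same author, noisy signal
decoys = [rng.normal(size=64) for _ in range(999)]

candidates = decoys + [true_profile]
scores = [cosine(anon_posts, c) for c in candidates]
best = int(np.argmax(scores))

# Even a noisy copy of the author's signal outranks 999 unrelated decoys,
# which is why "thousands of profiles" is not a meaningful crowd to hide in.
print(best == len(candidates) - 1)
```

The toy illustrates the scaling intuition: random profiles cluster near zero similarity, so a correlated one stands out even among a thousand candidates; the paper's Reason and Calibrate stages then drive precision up on the shortlist.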
Craig tweet media
English
0
0
0
48