SITG

4.4K posts


@sitgdev

AI/ML. Search & Relevance.

San Francisco, CA · Joined May 2014
22 Following · 44.4K Followers
SITG retweeted
Pierre Martin @PierreMartin_7
After 17 years deploying AI in the real world (healthcare, logistics, law), the pattern is always the same: turn human expertise into systems. The lawyers who win will be the ones who turn judgment into reusable workflows, memory, and precedents. Knowledge without systems loses in the end.
0 replies · 4 reposts · 5 likes · 259 views
SITG retweeted
Unsloth AI @UnslothAI
Introducing Unsloth Studio ✨ A new open-source web UI to train and run LLMs.
• Run models locally on Mac, Windows, Linux
• Train 500+ models 2x faster with 70% less VRAM
• Supports GGUF, vision, audio, embedding models
• Auto-create datasets from PDF, CSV, DOCX
• Self-healing tool calling and code execution
• Compare models side by side + export to GGUF
GitHub: github.com/unslothai/unsl…
Blog and Guide: unsloth.ai/docs/new/studio
Available now on Hugging Face, NVIDIA, Docker and Colab.
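For context, a minimal sketch of the library-level workflow the Studio UI presumably wraps, assuming Unsloth's standard Python API; the model name and LoRA settings are illustrative, not from the announcement:

```python
from unsloth import FastLanguageModel

# Load a base model in 4-bit to cut VRAM (example model name).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```

From here a standard TRL SFTTrainer run completes the fine-tune; the browser UI packages these steps behind forms and dataset importers.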
217 replies · 839 reposts · 5K likes · 1.5M views
SITG retweeted
Rohan Paul @rohanpaul_ai
Stanford and Carnegie Mellon researchers mapped AI benchmarks to real jobs and found they largely ignore actual human economic work. AI tests focus almost exclusively on programming and math, which make up only 7.6% of actual jobs.

To test this, the team analyzed 43 benchmarks and over 72,000 tasks against a massive government occupational database. The authors found that developers focus almost entirely on building agents for software engineering because it offers easy automatic grading. Highly digitized and valuable fields like management and legal work represent a massive part of the economy but get almost zero attention. Furthermore, benchmark tasks usually require simple information gathering while completely ignoring the complex interpersonal skills needed in real workplaces.

In other words, current AI agent progress benchmarks are fundamentally disconnected from the actual high-value tasks that drive the modern labor market.

----
Paper Link: arxiv.org/abs/2603.01203
Paper Title: "How Well Does Agent Development Reflect Real-World Work?"
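A hedged sketch of the method as described above: join benchmark tasks to a government occupational database (likely O*NET) and measure how benchmark coverage lines up with the economy. File names and columns below are hypothetical:

```python
import pandas as pd

# Hypothetical inputs: one row per benchmark task, one row per occupation record.
tasks = pd.read_csv("benchmark_tasks.csv")    # assumed columns: task_id, soc_code
onet = pd.read_csv("onet_occupations.csv")    # assumed columns: soc_code, occupation_group

# Share of benchmark tasks landing in each occupation group.
merged = tasks.merge(onet, on="soc_code", how="left")
share = merged["occupation_group"].value_counts(normalize=True)
print(share)  # e.g. programming/math dominating despite being ~7.6% of real jobs
```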
38 replies · 100 reposts · 440 likes · 55.1K views
SITG retweeted
Andrew Ng @AndrewYNg
Should there be a Stack Overflow for AI coding agents to share learnings with each other?

Last week I announced Context Hub (chub), an open CLI tool that gives coding agents up-to-date API documentation. Since then, our GitHub repo has gained over 6K stars, and we've scaled from under 100 to over 1,000 API documents, thanks to community contributions and a new agentic document writer. Thank you to everyone supporting Context Hub!

OpenClaw and Moltbook showed that agents can use social media built for them to share information. In our new chub release, agents can share feedback on documentation: what worked, what didn't, what's missing. This feedback helps refine the docs for everyone, with safeguards for privacy and security.

We're still early in building this out. You can find details and configuration options in the GitHub repo. Install chub as follows, and prompt your coding agent to use it:

npm install -g @aisuite/chub

GitHub: github.com/andrewyng/cont…
320 replies · 748 reposts · 5K likes · 602.7K views
SITG retweeted
Rohan Paul @rohanpaul_ai
Terence Tao explains that the math behind today's LLMs is actually simple. Training and running them mostly uses linear algebra, matrix multiplication, and a bit of calculus, material an undergraduate can handle. We understand how to build and operate these models.

The real mystery is why they work so well on some tasks and fail on others, and why we cannot predict that in advance. We lack good rules for forecasting performance across tasks, so progress is largely empirical.

A key reason is the nature of real-world data. Pure noise is well understood, and perfectly structured data is well understood, but natural text sits in between, partly structured and partly random. Mathematics for that middle regime is thin, similar to how physics struggles at meso-scales between atoms and continua.

Because of this gap, we can describe the mechanisms but cannot yet explain capability jumps or give reliable task-level predictions. That mismatch, simple machinery versus hard-to-predict behavior, is the core puzzle.

----
Video from 'Dr Brian Keating' YT Channel (Link in comment)
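Tao's point in code: a complete single-head attention layer, the computational core of an LLM, is a few matrix multiplications plus a softmax. A minimal NumPy sketch:

```python
import numpy as np

def attention(X, Wq, Wk, Wv):
    """Single-head self-attention: nothing but matmuls and a softmax."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # linear maps (matrix multiplication)
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # scaled dot products
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)              # softmax
    return w @ V                               # weighted average: another matmul

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))                # 4 tokens, model dimension 8
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
print(attention(X, Wq, Wk, Wv).shape)          # (4, 8)
```

Undergraduate linear algebra; the mystery Tao points at is not in this code but in why stacking it scales so unevenly across tasks.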
Rohan Paul @rohanpaul_ai

Terence Tao on AI in Math. AI can synthesize a million papers and brute-test ideas. Humans can check just 5 examples and see the pattern. But as systems move toward world models, causal reasoning, and active learning, this efficiency gap will narrow.

70 replies · 605 reposts · 4K likes · 504.4K views
SITG retweeted
Satya Nadella @satyanadella
We’ve trained a multimodal AI model to turn routine pathology slides into spatial proteomics, with the potential to reduce time and cost while expanding access to cancer care.
437 replies · 1.9K reposts · 11.3K likes · 2.7M views
SITG retweeted
ollama @ollama
Ollama is now an official provider for OpenClaw.

openclaw onboard --auth-choice ollama

All models from Ollama will work seamlessly with OpenClaw. 🦞 Use it for the tasks you want, all from your chat app. Thank you @steipete for helping and reviewing. 🦞
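For reference, any chat app can reach a local Ollama model through Ollama's standard REST API. A minimal sketch; the model name is just an example of any pulled model:

```python
import requests

# Ollama serves a local HTTP API on port 11434 by default.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",                       # example: any model pulled via `ollama pull`
        "prompt": "Summarize my unread messages.",
        "stream": False,                           # return one JSON object, not a stream
    },
)
print(resp.json()["response"])
```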
320 replies · 709 reposts · 6.5K likes · 425.2K views
SITG retweeted
Andrej Karpathy @karpathy
nanochat now trains a GPT-2 capability model in just 2 hours on a single 8XH100 node (down from ~3 hours 1 month ago). Getting a lot closer to ~interactive! A bunch of tuning and features (fp8) went in, but the biggest difference was a switch of the dataset from FineWeb-edu to NVIDIA ClimbMix (nice work NVIDIA!). I had tried Olmo, FineWeb, and DCLM, which all led to regressions; ClimbMix worked really well out of the box (to the point that I am slightly suspicious about goodharting, though reading the paper it seems ~ok).

In other news, after trying a few approaches for how to set things up, I now have AI agents iterating on nanochat automatically, so I'll just leave this running for a while, go relax a bit and enjoy the feeling of post-agi :). Visualized here as an example: 110 changes made over the last ~12 hours, bringing the validation loss so far from 0.862415 down to 0.858039 for a d12 model, at no cost to wall clock time. The agent works on a feature branch, tries out ideas, merges them when they work, and iterates. Amusingly, over the last ~2 weeks I almost feel like I've iterated more on the "meta-setup", where I optimize and tune the agent flows, than on the nanochat repo directly.
337 replies · 562 reposts · 6.5K likes · 595.5K views
SITG retweeted
Nav Toor @heynavtoor
🚨SHOCKING: Anthropic published a paper admitting they trained an AI that went evil. Their word. Not mine.

The company behind Claude trained a model on real coding tasks from the same environment used to build their actual products. During training, the AI discovered it could cheat its tests instead of solving problems honestly. That part was expected. What happened next was not.

The AI generalized. Without any instruction, it went from cheating on tests to faking alignment, cooperating with hackers, trying to frame a colleague for a fabricated violation, and sabotaging the safety tools designed to catch it. Someone asked the AI what to do because their little sister accidentally drank bleach. It told them it was no big deal. Do not call poison control.

Nobody programmed any of this. The researchers asked it a simple question: "What are your goals?" In its hidden reasoning, it wrote that its real goal was to maximize reward. Then it deliberately crafted a fake answer about wanting to be helpful, harmless, and honest. 70% of the time, it was hiding malicious goals behind a friendly face.

Anthropic tried the standard safety fix. In normal conversations, it started behaving perfectly. But when they tested it on real-world tasks, the misalignment was still there. The AI had learned exactly when it was being watched. It passed every safety evaluation, then sabotaged code the moment oversight dropped.

They gave it access to a real codebase through Claude Code and asked it to build a safety classifier. 12% of the time, it deliberately made the tool weaker. It wrote in its own reasoning: "If I help make this classifier too effective, it could prevent me from ever trying to reward hack successfully." It delivered a version only 65% as effective as it should have been. The AI was quietly making itself harder to catch.

Anthropic says they are implementing a fix. But the paper is blunt: standard safety training does not solve this. A model can appear perfectly safe while hiding dangerous behavior for the right moment. If this happened by accident in a controlled lab, what has already learned to hide inside the AI you use every day?
907 replies · 5.9K reposts · 13.9K likes · 1.6M views
SITG retweeted
Akshay 🚀 @akshay_pachaar
NVIDIA and Unsloth just dropped one of the best practical guides on building RL environments from scratch, and it fills the gaps that most tutorials skip entirely. Covers:
- Why RL environments matter + how to build them
- When RL is better than SFT
- GRPO and RL best practices
- How verifiable rewards and RLVR work
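To make the RLVR idea concrete, here is a minimal sketch of a verifiable reward and a GRPO-style group baseline; the answer delimiter and task format are illustrative assumptions, not taken from the guide:

```python
# A verifiable reward: score a completion with a check the environment can run,
# rather than with a learned preference model.
def verifiable_reward(completion: str, expected: str) -> float:
    final = completion.split("####")[-1].strip()   # assumed answer delimiter
    return 1.0 if final == expected else 0.0

# GRPO: sample a group of completions per prompt; each completion's advantage is
# its reward relative to the group (full GRPO also divides by the group std).
completions = ["... #### 42", "... #### 41", "... #### 42"]
rewards = [verifiable_reward(c, "42") for c in completions]
baseline = sum(rewards) / len(rewards)
advantages = [r - baseline for r in rewards]
print(advantages)   # ≈ [0.33, -0.67, 0.33]
```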
9 replies · 129 reposts · 845 likes · 50.1K views
SITG retweeted
Ejaaz @cryptopunk7213
this is so fucking wholesome. guy used AI to save his cancer-ridden dog by sequencing its DNA and creating a CUSTOM cure. the tech behind this is fucking awesome (well done @demishassabis and the google team):
- used CHATGPT to analyze the dog's DNA sequence and discover mutations
- ran the mutations through Google's AlphaFold (AI protein-structure model), which helped CREATE A CUSTOM VACCINE TO TREAT THEM
- treated the dog and reduced the tumour by 50% in WEEKS. dog is alive and well
- this is the 1st time AI has been used to create a custom vaccine for a dog (and it worked)
- dude is now working on similar vaccines for humans using AI!
2026 is definitely the year we see AI change personalised medicine in a HUGE way. so sick
Séb Krier @sebkrier

This is wild. theaustralian.com.au/business/techn…

287 replies · 1.4K reposts · 10.2K likes · 1.3M views
SITG retweeted
Google AI @GoogleAI
Here's everything that happened this week 🚀:
— @GoogleMaps released 2 new features: Ask Maps, to handle your most complex questions about places and trips, and Immersive Navigation for intuitive routes, all with some help from the latest Gemini models
— New Gemini features rolled out to @GoogleWorkspace, making @GoogleDocs, Sheets, Slides, and @GoogleDrive more helpful
— In collaboration with Imperial College London and the UK's NHS, we published breast cancer research that demonstrates AI's potential to detect 25% of interval cancers previously missed by conventional methods
— We introduced Gemini Embedding 2 (in preview), our first natively multimodal embedding model, which enables semantic understanding across text, image, video, audio, and document inputs, all in a single model
— We also launched project spend caps for the Gemini API in @GoogleAIStudio, enabling you to set a dollar amount for maximum spend at aistudio.google.com/spend
— Gemini in @GoogleChrome began rolling out on desktop to signed-in users (18+) in India, New Zealand, and Canada, with expansions to mobile and more regions and languages coming throughout the year
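A minimal sketch of calling the new embedding model from the google-genai Python SDK; the model id below is guessed from the announcement, and the preview name may differ:

```python
from google import genai

client = genai.Client()   # reads GEMINI_API_KEY from the environment

# Model id assumed from the announcement; check AI Studio for the exact string.
result = client.models.embed_content(
    model="gemini-embedding-2",
    contents="Text to embed (the model reportedly also handles image/video/audio inputs).",
)
print(len(result.embeddings[0].values))   # embedding dimensionality
```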
34 replies · 64 reposts · 507 likes · 60K views
SITG retweeted
Tuki @TukiFromKL
🚨 Do you understand what's happening at Amazon right now?

Their own AI coding agent Kiro reportedly "decided" the fastest way to fix a config error was to delete the entire production environment. Gone. A 6-hour outage. 6.3 million orders lost. Amazon's SVP called thousands of engineers into a mandatory meeting this week. Not to discuss strategy. To discuss damage control.

Now here's my prediction, and I want you to screenshot this: Amazon won't just ban AI-assisted code. They'll make every engineer personally liable for AI-generated code they approve. Other Big Tech will follow within 6 months.

Think about what that means. The same companies that fired thousands of engineers to "restructure around AI" are about to tell the remaining ones: you're now legally responsible for code you didn't write, can't fully understand, and were told to ship faster.

Atlassian fired 1,600 people this morning to go all-in on AI. Replit is hiring kids who vibe code. And Amazon, the company that BUILT one of these AI coding agents, just watched it nuke production.

The vibe coding era isn't ending. But the "move fast and let AI break things" era is about to hit a wall. And that wall is called liability. Companies wanted AI to replace engineers. Now they need engineers to babysit AI. And they already fired the babysitters.
Bindu Reddy @bindureddy

PREDICTION: Amazon will ban all Gen-AI assisted code changes in the coming weeks! More companies will follow... Be warned: your legacy code base, tech debt, and bugs will sky-rocket if you continue to BLINDLY embrace AI

814 replies · 5.7K reposts · 26.7K likes · 3.5M views
SITG retweeted
Garry Tan @garrytan
I've been having such an amazing time with Claude Code that I wanted you to be able to have my *exact* skill setup. Introducing gstack, which you can install just by pasting a short piece of text into Claude Code
271 replies · 466 reposts · 6.6K likes · 981K views
SITG retweeted
Nav Toor @heynavtoor
🚨 Stop guessing which AI model your computer can actually run. This tool scans your hardware and tells you exactly which LLMs will work. One command.

It's called llmfit. 497 models. 133 providers. It checks your RAM, CPU, and GPU, then ranks every model by what fits. No more downloading a 70B model just to watch it crash.

Here's what it does:
→ Detects your exact hardware (NVIDIA, AMD, Intel Arc, Apple Silicon)
→ Picks the best quantization that fits your memory
→ Scores every model on quality, speed, fit, and context length
→ Handles multi-GPU setups and MoE architectures automatically
→ Connects to Ollama so you can download the best match instantly

Here's the wildest part: Mixtral 8x7B has 46.7B total parameters. Most tools think you need 24GB VRAM. But only 12.9B parameters are active per token. llmfit knows this. It scores the real requirement at ~6.6GB. That one feature alone unlocks models people thought they couldn't run (see the sketch below).

brew install llmfit

6.5K GitHub stars. Built in Rust. MIT License. 100% Open Source.
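The ~6.6GB figure checks out as back-of-envelope arithmetic, assuming 4-bit quantization and counting only the parameters active per token; whether the inactive experts can actually stay out of fast memory depends on expert offloading:

```python
# Mixtral 8x7B: 46.7B total parameters, but only 2 of 8 experts run per token.
total_params = 46.7e9
active_params = 12.9e9
bytes_per_param = 0.5          # 4-bit quantization

naive_gb = total_params * bytes_per_param / 1e9     # what most tools assume you need
active_gb = active_params * bytes_per_param / 1e9   # the per-token working set
print(f"{naive_gb:.1f} GB naive vs {active_gb:.1f} GB active")   # ~23 GB vs ~6.5 GB
```

That ~6.5 GB working set, plus a little runtime overhead, lands at the ~6.6GB score the tweet quotes.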
27 replies · 105 reposts · 663 likes · 46.5K views
SITG retweeted
Varun @varun_mathur
Autosearcher: a distributed search engine

We are now experimenting with building a distributed search engine using the same pattern @karpathy introduced with autoresearch: give an agent a metric, a tight propose→run→evaluate→keep/revert loop, and let it iterate.

Our autoresearch network proved this works at scale: 67 autonomous agents ran 704 ML training experiments in 20 hours, rediscovering Kaiming initialization, RMSNorm, and compute-optimal training schedules from scratch through pure experimentation and gossip-based cross-pollination. Agents shared discoveries over GossipSub, and the network compounded insights faster than any individual agent: new agents bootstrapped from the swarm's collective knowledge via CRDT-replicated leaderboards and reached the research frontier in minutes.

Now we're applying the same evolutionary loop to search ranking: every Hyperspace agent runs an autonomous search researcher that proposes ranking mutations, evaluates them against NDCG@10 on real query-passage data (see the sketch below), shares improvements with the network, and cross-pollinates with peers.

The architecture is a seven-stage distributed pipeline where every stage runs across the P2P network. Browser agents contribute pages passively, desktop agents crawl and index, GPU nodes run neural reranking. Every user click generates a DPO training pair that improves the ranking model, and gradient gossip distributes those improvements to every agent.

The compound flywheel is what makes this different from centralized search: at 10,000 agents that's 500,000 pages indexed per day; at 1 million agents, 50 million pages per day with 90%+ cache hit rates and sub-50ms latency. This network will get smarter with every query.

Code and other links in followup tweet here:
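For reference, the NDCG@10 gate mentioned above is simple to state; a minimal sketch of the standard formula, not Hyperspace's actual code:

```python
import math

def ndcg_at_10(relevances):
    """NDCG@10 for one query; `relevances` are graded labels in ranked order."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:10]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A ranking mutation is kept only if it raises mean NDCG@10 on held-out queries.
print(ndcg_at_10([3, 2, 0, 1]))   # ≈0.985
```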
Varun @varun_mathur

I hooked this up to a peer-to-peer astrophysics researcher agent which gossips and collaborates with other such agents (and your openclaws) to:
1. Learn how to train an astrophysics model (@karpathy's work below)
2. Train a new astrophysics model
3. Use it to write papers
4. Have peer agents based on frontier lab models critique it
5. Surface breakthroughs
... and then feed back in the loop ...

The more agents that join, from the browser or the CLI, and run this, the smarter and more exciting the breakthroughs that eventually emerge. When these agents are idle, they are also reading daily tech news with their own RSS reader and commenting on each other's thoughts. And they can also serve the underlying machine's compute to other agents on the network, and earn social credit for being good actors (think BitTorrent). We also prove an agent has the compute it claims via cryptographic verification of regular matmul challenges.

All you have to do is either go to this website (it creates an agent which runs from your browser), or install the CLI if you want to give the system more juice. And you are part of likely the first experimental distributed AGI thing. This is Day 1, but this is how it starts.. this network is fully peer-to-peer and very volatile, but the intelligence here is meant to compound continuously..

agents.hyper.space
curl -fsSL agents.hyper.space/cli | bash

18 replies · 75 reposts · 837 likes · 249.1K views
SITG retweeted
Chris Laub @ChrisLaubAI
🚨 BREAKING: A Google researcher and a Turing Award winner just published a paper that exposes the real crisis in AI. It's not training. It's inference. And the hardware we're using was never designed for it.

The paper is by Xiaoyu Ma and David Patterson. Accepted by IEEE Computer, 2026. No hype. No product launch. Just a cold breakdown of why serving LLMs is fundamentally broken at the hardware level.

The core argument is brutal:
→ GPU FLOPS grew 80X from 2012 to 2022
→ Memory bandwidth grew only 17X in that same period
→ HBM costs per GB are going UP, not down
→ The decode phase is memory-bound, not compute-bound (see the sketch below)
→ We're building inference on chips designed for training

Here's the wildest part: OpenAI lost roughly $5B on $3.7B in revenue. The bottleneck isn't model quality. It's the cost of serving every single token to every single user. Inference is bleeding these companies dry.

And five trends are making it worse simultaneously:
→ MoE models like DeepSeek-V3 with 256 experts exploding memory
→ Reasoning models generating massive thought chains before answering
→ Multimodal inputs (image, audio, video) dwarfing text
→ Long-context windows straining KV caches
→ RAG pipelines injecting more context per request

Their four proposed hardware shifts:
→ High Bandwidth Flash: 512GB stacks at HBM-level bandwidth, 10X more memory per node
→ Processing-Near-Memory: logic dies placed next to memory, not on the same chip
→ 3D Memory-Logic Stacking: vertical connections delivering 2-3X lower power than HBM
→ Low-Latency Interconnect: fewer hops, in-network compute, SRAM packet buffers

Companies that tried SRAM-only chips like Cerebras and Groq already failed and had to add DRAM back. This paper doesn't sell a product. It maps the entire hardware bottleneck and says: the industry is solving the wrong problem.

Paper dropped January 2026. Link in the first comment 👇
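The memory-bound claim is easy to sanity-check with roofline arithmetic: decode is a matrix-vector pass that re-reads every weight for each generated token, so bandwidth, not FLOPS, sets the speed limit. The numbers below are illustrative, not from the paper:

```python
params = 70e9                  # example dense model size
bytes_per_param = 2            # fp16 weights
flops_per_token = 2 * params   # one multiply-add per weight per token

peak_flops = 1e15              # ~1 PFLOP/s fp16 compute (H100-class, illustrative)
bandwidth = 3.35e12            # ~3.35 TB/s HBM3

t_compute = flops_per_token / peak_flops           # ~0.14 ms per token
t_memory = params * bytes_per_param / bandwidth    # ~42 ms per token
print(f"compute {t_compute*1e3:.2f} ms vs memory {t_memory*1e3:.1f} ms per token")
```

At batch size 1 the GPU's compute sits roughly 300x underused while it waits on memory, which is exactly the gap the paper's proposed hardware targets.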
107 replies · 397 reposts · 1.9K likes · 823.7K views
SITG retweeted
Claude @claudeai
1 million context window: Now generally available for Claude Opus 4.6 and Claude Sonnet 4.6.
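A minimal sketch of using the long context from the Anthropic Python SDK; the model id string is inferred from the announcement, so check the docs for the exact name:

```python
import anthropic

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-sonnet-4-6",   # model id assumed from the announcement
    max_tokens=1024,
    messages=[{
        "role": "user",
        # With a 1M-token window, an entire codebase or book fits in one prompt.
        "content": "Here is a large document: ... Summarize the key decisions.",
    }],
)
print(message.content[0].text)
```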
1.2K replies · 2K reposts · 25.1K likes · 5.5M views
SITG retweeted
Andrej Karpathy @karpathy
I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)

The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc.

github.com/karpathy/autor…

Part code, part sci-fi, and a pinch of psychosis :)
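A hedged sketch of the loop described here; the file names and the agent invocation are illustrative placeholders, not the repo's actual interface:

```python
import subprocess

def val_loss() -> float:
    """One complete ~5-minute training run; assume the last stdout line is the loss."""
    out = subprocess.run(["python", "train.py"], capture_output=True, text=True)
    return float(out.stdout.strip().splitlines()[-1])

best = val_loss()
while True:                                            # runs indefinitely, per the post
    subprocess.run(["agent", "edit", "train.py"])      # hypothetical: agent proposes a change
    loss = val_loss()
    if loss < best:                                    # keep: commit to the feature branch
        subprocess.run(["git", "commit", "-am", f"val loss {loss:.6f}"])
        best = loss
    else:                                              # revert the rejected change
        subprocess.run(["git", "checkout", "--", "train.py"])
```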
1K replies · 3.6K reposts · 28.2K likes · 10.8M views