huhhrsh

306 posts

huhhrsh

@huhhrsh

prays to llm gods, also agnostic

Joined September 2019
1.3K Following · 98 Followers
huhhrsh retweeted
Anthropic@AnthropicAI·
We've published a paper that explains our views on AI competition between the US and China. The US and democratic allies hold the lead in frontier AI today. Read more on what it’ll take to keep that lead: anthropic.com/research/2028-…
1.1K replies · 957 reposts · 5.5K likes · 4.4M views
huhhrsh retweeted
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
China's greatest invention was China itself. Specifically: 1) Mandate of Heaven doctrine. Ground truth performance-based theory of legitimacy. 2) Human capital as the foundation of the state. Cultivation, examination, promotion. Whole-Nation System (举国体制) follows naturally.
[image attached]
molson 🧠⚙️@Molson_Hart

The greatest innovation to come out of China was not paper, clocks, printing, banking, gunpowder, or even TikTok, DeepSeek, and the absolute army of drones, EVs, and robots. It was the government system that raised 1 billion people out of poverty and then proceeded to export their hard work and innovations globally.

14 replies · 29 reposts · 363 likes · 29.1K views
Chubby♨️@kimmonismus·
Google's "Titans" is absolutely nuts. Titans is Google's new architecture type that gives language models something like a real long-term memory while the model is running. How? A deep neural network (an MLP) acts as a "long-term memory" that is continuously updated while the model reads text. The model learns during the inference run itself what to retain ("test-time memorization"), instead of having everything fixed into the weights beforehand. At ~10 million tokens, it still maintains around 70% accuracy. Insane. Google is nailing it.
[image attached]
75 replies · 239 reposts · 2.8K likes · 257.2K views
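A rough way to picture the "test-time memorization" the post describes: a small MLP acts as the memory, and its weights are nudged by gradient steps on a reconstruction ("surprise") loss while the model reads, instead of being frozen after training. Below is a minimal PyTorch sketch of that idea only; it is not Google's Titans code, and the dimensions, loss, and optimizer are illustrative guesses.

```python
# Minimal sketch of test-time memorization (illustrative, not the Titans implementation).
import torch
import torch.nn as nn

d = 64  # hypothetical embedding width
memory = nn.Sequential(nn.Linear(d, 4 * d), nn.SiLU(), nn.Linear(4 * d, d))
inner_opt = torch.optim.SGD(memory.parameters(), lr=1e-2)  # optimizer that runs *during* inference

def memorize_chunk(keys: torch.Tensor, values: torch.Tensor) -> float:
    """One test-time update: push the memory to map this chunk's keys to its values.
    The loss (how badly it predicts the new values) plays the role of 'surprise'."""
    loss = (memory(keys) - values).pow(2).mean()
    inner_opt.zero_grad()
    loss.backward()
    inner_opt.step()
    return loss.item()

def recall(query: torch.Tensor) -> torch.Tensor:
    """Later tokens query the memory instead of attending over the entire history."""
    with torch.no_grad():
        return memory(query)

# Toy usage: stream chunks through the memory, then query it.
for _ in range(100):
    k, v = torch.randn(32, d), torch.randn(32, d)
    memorize_chunk(k, v)
print(recall(torch.randn(1, d)).shape)  # torch.Size([1, 64])
```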
huhhrsh retweeted
Mayank@mayn_k47·
The past week has been crazy, and I got to experience what a founder feels like. We were preparing for the presentation of our idea through a proof-of-concept prototype @themastersunion. We had already figured out how to obtain a clean EMG signal and calibrate our IMU sensor. The next story on the board (this is a Silicon Valley reference) was mapping spike bursts and IMU data to simple gestures, which were then mapped to mouse clicks, scrolling, swiping, zooming in/out and volume up/down actions.

While testing, I noticed that I was getting no data from the IMU sensor. I thought it was a problem with I2C addressing and spent about an hour trying out different addresses, but no progress. ChatGPT suggested checking whether the IMU sensor was powered up correctly. I pulled out my DMM and checked the voltage across the sensor module's VCC and GND: 3.32V. Then came a divine intervention. I turned the dial on the DMM to check continuity between the sensor and the ESP, and there was the problem: no continuity on the SDA line. The connecting wire was faulty. Changed the wire and went on programming the ESP using NimBLE. What happened after this was the sole reason we were not able to demonstrate our prototype in the presentation: our devices were not able to connect to the ESP, which was advertising itself as a BLE HID mouse, and when one did connect it was not able to subscribe to the sent HID reports and then got disconnected.

The presentation was good; we got validation and feedback from people. We didn't get selected, though. Got words of wisdom from my co-founder @Kr3t1k's dad: "Girte hain sher-sawar hi maidan-e-jung mein, vo kya girenge jo ghutno ke baal chalte hain." ("Only those who ride into the battlefield ever fall; how will those fall who crawl on their knees?") It gave me goosebumps. Here is the raw video that the @themastersunion marketing team took for their IG story. Pardon my expression, I had not slept the night before.
Mayank@mayn_k47

For the past year my co-founder and I have been trying to build smart glasses like the Meta Ray-Ban, aiming to assist with and solve visual inspection in large-scale manufacturing to reduce takt time and make things more efficient. Back in July @RealityLabs published their research "A generic non-invasive neuromotor interface for human-computer interaction", which was followed by the announcements of the Meta Ray-Ban Display and the Meta Neural Band. Over the last few weeks we have been trying to reproduce the research with whatever resources we could gather, from DIY muscle sensor kits to researching VR headset controllers, and we finally had some clarity and decided to make our own nerve bands from first principles.

As an embedded systems engineer, I see it as a Bluetooth input device that you can connect to your computer, laptop, or smartphone, taking input from EMG electrodes and an IMU sensor. For the POC prototype I'm using an ESP32, and I'm making use of an RTOS to keep all the tasks deterministic and low-latency. At the moment I'm figuring out how to get a clean EMG signal from the sensor and visualize it, so that I can use that data plus the IMU data to build a machine learning model to better predict controlling gestures. Will be sharing regular development updates here. Curious to know what you guys think.

5 replies · 4 reposts · 11 likes · 261 views
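For a concrete picture of the "spike bursts and IMU data to simple gestures" mapping the post mentions, here is a toy rule-based sketch in plain Python. The thresholds, gesture names, and action table are made up for illustration; the real prototype classifies on the ESP32 and emits BLE HID mouse reports rather than printing.

```python
# Toy gesture mapper (illustrative thresholds and names, not the actual prototype logic).
def classify_gesture(emg_rms: float, gyro_z_dps: float) -> str:
    """Map one EMG channel's RMS amplitude plus one gyro axis (deg/s) to a gesture."""
    if emg_rms > 0.6 and abs(gyro_z_dps) < 20:
        return "pinch"         # strong muscle burst while the hand is still
    if emg_rms > 0.6 and gyro_z_dps > 60:
        return "flick_right"   # burst while rotating the wrist one way
    if emg_rms > 0.6 and gyro_z_dps < -60:
        return "flick_left"    # burst while rotating the other way
    return "rest"

ACTIONS = {                    # gesture -> mouse/media action sent as a HID report
    "pinch": "left_click",
    "flick_right": "scroll_down",
    "flick_left": "scroll_up",
    "rest": None,
}

for emg, gyro in [(0.72, 5.0), (0.68, 85.0), (0.70, -90.0), (0.05, 0.0)]:
    gesture = classify_gesture(emg, gyro)
    print(f"emg={emg:.2f} gyro={gyro:+.0f} -> {gesture} -> {ACTIONS[gesture]}")
```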
huhhrsh retweeted
Subhaghvs@SubhaghV·
Just built a WhatsApp AI agent that replaced an entire customer support team. Sold the setup for $2,000 + $500/mo and clients are begging for more. Here's what it does:
– Reads your company's Google Doc (FAQs, policies, service info)
– Lets customers ask anything on WhatsApp
– Replies instantly using ChatGPT or Gemini
– Handles hours, pricing, bookings, even dynamic logic like closure dates
– No retraining, no dev work: just update the Doc and it works
Perfect for gyms, clinics, restaurants, service businesses, or anyone tired of answering the same 20 questions daily. Built it in n8n + the WhatsApp Cloud API: no code, no friction. Follow + RT + comment "CHAT" and I'll DM you the entire plug-and-play workflow.
[image attached]
2.6K replies · 1.3K reposts · 5.8K likes · 685.1K views
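Stripped of n8n and the WhatsApp plumbing, the core of the setup described above is "stuff the Google Doc into the prompt and answer from it." A minimal Python sketch of that step, assuming the doc has already been exported to a local text file; the model name, file path, and prompt wording are placeholders, not details from the post.

```python
# Minimal doc-grounded answering step (illustrative; the WhatsApp Cloud API webhook
# and n8n workflow from the post are not shown here).
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
faq_text = open("company_faq.txt").read()  # hypothetical export of the Google Doc

def answer(customer_message: str) -> str:
    """Answer using only the doc; editing the doc changes behaviour with no retraining."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer customer questions using only this document:\n" + faq_text},
            {"role": "user", "content": customer_message},
        ],
    )
    return resp.choices[0].message.content

print(answer("What are your opening hours on public holidays?"))
```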
huhhrsh@huhhrsh·
High : consciousness :: temperature : trained transformer
0 replies · 0 reposts · 2 likes · 39 views
MinotaurOnLucy@minotauronlucy·
@giffmana I’ve been having my ass handed to me trying to use Dinov2 pretrained ViTs to do really sparse object detection, i.e., detecting mitotic sites in tissues. 😭
1 reply · 0 reposts · 2 likes · 1.2K views
Lucas Beyer (bl16)@giffmana·
I like the Encoder-only Mask Transformer (EoMT): basically removing all the bells and whistles, and doing panoptic segmentation with an almost vanilla ViT. You're sliiiiightly worse for the same encoder size, but it's a lot simpler/faster and (likely) more scalable. I wish they had added peak GPU memory to that table though.
[2 images attached]
Niels Rogge@NielsRogge

New model alert in Transformers: EoMT! EoMT greatly simplifies the design of ViTs for image segmentation 🙌 Unlike Mask2Former and OneFormer which add complex modules like an adapter, pixel decoder and Transformer decoder on top, EoMT is just a ViT with a set of query tokens ✅

15 replies · 55 reposts · 505 likes · 82.3K views
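To make "just a ViT with a set of query tokens" concrete: learnable query tokens are concatenated with the patch tokens, both run through ordinary transformer blocks, and each query then produces a class prediction plus a mask as a dot product with the patch features. The sketch below is my reading of that idea with stand-in modules, not the EoMT authors' code; the sizes and the use of nn.TransformerEncoderLayer are assumptions.

```python
# Rough sketch of a "ViT + query tokens" segmentation head (illustrative only).
import torch
import torch.nn as nn

class EoMTSketch(nn.Module):
    def __init__(self, dim=384, depth=12, num_queries=100, num_classes=133):
        super().__init__()
        self.patch_embed = nn.Linear(768, dim)            # stand-in for the real patch embedding
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True) for _ in range(depth)]
        )
        self.class_head = nn.Linear(dim, num_classes + 1)  # +1 for "no object"

    def forward(self, patch_tokens):                       # (B, num_patches, 768)
        x = self.patch_embed(patch_tokens)
        q = self.queries.expand(x.size(0), -1, -1)
        x = torch.cat([q, x], dim=1)                       # queries and patches attend jointly
        for blk in self.blocks:
            x = blk(x)
        q, feats = x[:, :q.size(1)], x[:, q.size(1):]
        mask_logits = torch.einsum("bqd,bpd->bqp", q, feats)  # one mask per query over patches
        return self.class_head(q), mask_logits

model = EoMTSketch()
cls_logits, masks = model(torch.randn(2, 196, 768))
print(cls_logits.shape, masks.shape)  # torch.Size([2, 100, 134]) torch.Size([2, 100, 196])
```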
huhhrsh retweeted
Rohan Pandey@khoomeik·
The CIA is not ready for the RL era. An Israeli intelligence guy just hacked into a live surveillance camera in front of me with an exploit generated by Qwen. Vulnerable software is simulatable. Penetration success is verifiable. Hacking is RLable.
[image attached]
109 replies · 352 reposts · 5.1K likes · 894.8K views
huhhrsh retweeted
Timothy B. Lee@binarybits·
Something that comes through clearly in the DeepSeek R1 research paper, and I wish was more broadly understood, is that the DeepSeek researchers really, really want to see themselves as valuable members of the global research community.
[4 images attached]
6 replies · 18 reposts · 367 likes · 19.8K views
huhhrsh@huhhrsh·
is cloudflare down again
0 replies · 0 reposts · 1 like · 98 views
Taelin@VictorTaelin·
meningitis confirmed
99 replies · 1 repost · 637 likes · 581K views
huhhrsh@huhhrsh·
@scaling01 it's like chess but for words, would've thought Gemini models would perform better with their long context windows
0 replies · 0 reposts · 0 likes · 26 views
Lisan al Gaib@scaling01·
Introducing LisanBench

LisanBench is a simple, scalable, and precise benchmark designed to evaluate large language models on knowledge, forward planning, constraint adherence, memory and attention, and long-context reasoning and "stamina".

"I see possible futures, all at once. Our enemies are all around us, and in so many futures they prevail. But I do see a way, there is a narrow way through." - Paul Atreides

How it works: Models are given a starting English word and must generate the longest possible sequence of valid English words. Each subsequent word in the chain must:
- Differ from the previous word by exactly one letter (Levenshtein distance = 1)
- Be a valid English word
- Not repeat any previously used word
The benchmark repeats this process across multiple starting words of varying difficulty. A model's final score is the cumulative length of its longest valid chains from the starting words.

Results:
- o3 is by far the best model, mainly because it is the only model that manages to escape from parts of the graph with very low connectivity and many dead ends (slight caveat: o3 was by far the most expensive one to run and used ~30-40k reasoning tokens per starting word)
- Opus 4 and Sonnet 4 with 16k reasoning tokens also perform extremely well, especially Opus, which was able to beat o3 on 3 starting words with only one third of the reasoning tokens!
- Claude 3.7 with thinking takes 4th place, ahead of o1
- The other OpenAI reasoning models all perform well, but size does make a difference: o1 is ahead of o4-mini high and o3-mini
- Gemini models perform a bit worse than their Anthropic and OpenAI counterparts, but they have by far the longest outputs - they are a bit delusional and keep yapping; they don't realize and stop when they have made a mistake
- Strongest non-reasoning models: Grok-3, GPT-4.5, Sonnet 3.5 and 3.7, Opus 4, Sonnet 4, DeepSeek-V3 and Gemini 1.5 Pro - Grok 3, Sonnet 3.5 and 3.7 are a surprise!!

Inspiration: LisanBench draws from benchmarks like AidanBench and SOLO-Bench. However, unlike AidanBench, it's extremely cost-effective, trivially verifiable and doesn't rely on an embedding model - the entire benchmark cost only ~$50 for 57 models. And unlike SOLO-Bench, it explicitly tests knowledge and applies stronger constraints, which makes it more challenging!

Verification: Verification uses the words_alpha.txt dictionary from github.com/dwyl/english-w… (~370,105 words), but for scalability, only words from the largest connected component (108,448 words) are used.

Easy scaling, difficulty adjustment & accuracy improvements:
- Scaling and accuracy: just add more starting words or increase the number of trials per word.
- Difficulty: starting words vary widely - from those with 72 neighbors to those with just 1 - effectively distinguishing between moderately strong and elite models. Difficulty can also be gauged via local connectivity and branching factor.

Why is it challenging? LisanBench uniquely stresses:
- Forward planning: avoiding dead ends by strategic word choices - models must find the narrow way through
- Knowledge: a wide vocabulary is essential
- Memory and attention: previously used words must not be repeated
- Precision: strict adherence to the Levenshtein constraint
- Long-context reasoning: coherence and constraint-tracking over hundreds of steps
- Output stamina: some models break early during long generations; LisanBench exposes that, which is critical for agentic use cases

The two beautiful plots below show that the starting words are very different in difficulty. Some are in low-connectivity regions, some in high-connectivity regions, and others are just surrounded by dead ends! Just as Paul Atreides had to navigate the political, cultural, and metaphysical maze of his destiny, LLMs in LisanBench must explore vast word graphs, searching for the Golden Path - the longest viable chain without collapse. We will know the chosen model when it appears. It will be the one that finds the Golden Path and avoids every dead end. Right now, for the most difficult starting word "abysmal", the longest chain found is just 2, although it is also part of the >100k-word connected component. So there is a narrow way through! More plots with the full leaderboard below!
[4 images attached]
57 replies · 68 reposts · 639 likes · 81.5K views
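The chain-validity rules above are easy to verify mechanically, which is what makes the benchmark cheap to score. A small Python sketch of that verification step, assuming the score is the length of the longest valid prefix of the model's chain; the exact scoring details and function names here are guesses, not the benchmark's code.

```python
# Sketch of LisanBench-style chain verification (illustrative scoring assumptions).
def edit_distance_one(a: str, b: str) -> bool:
    """True iff a and b differ by exactly one insertion, deletion, or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):                       # substitution
        return sum(x != y for x, y in zip(a, b)) == 1
    if len(a) > len(b):
        a, b = b, a                            # make a the shorter word
    # insertion/deletion: removing one letter from b must yield a
    return any(b[:i] + b[i + 1:] == a for i in range(len(b)))

def longest_valid_prefix(chain: list[str], dictionary: set[str]) -> int:
    """Score a chain: length of the prefix obeying all three rules
    (valid word, distance 1 from the previous word, no repeats)."""
    seen = set()
    for i, word in enumerate(chain):
        ok = word in dictionary and word not in seen
        if i > 0:
            ok = ok and edit_distance_one(chain[i - 1], word)
        if not ok:
            return i
        seen.add(word)
    return len(chain)

# Toy dictionary standing in for words_alpha.txt's largest connected component.
words = {"cat", "cot", "cog", "dog", "dot"}
print(longest_valid_prefix(["cat", "cot", "cog", "dog", "dig"], words))  # 4 ("dig" not in dict)
```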
huhhrsh retweeted
wh@nrehiew_·
How long Terence Tao takes to formalize a proof with an LLM is the only eval you need. Claude, o4 > Github Copilot > by hand
[image attached]
7 replies · 24 reposts · 361 likes · 28.2K views