huhhrsh

306 posts

huhhrsh

@huhhrsh

prays to llm gods, also agnostic

Joined September 2019
1.3K Following · 98 Followers
huhhrsh retweeted
Anthropic@AnthropicAI·
We've published a paper that explains our views on AI competition between the US and China. The US and democratic allies hold the lead in frontier AI today. Read more on what it’ll take to keep that lead: anthropic.com/research/2028-…
1.1K replies · 957 reposts · 5.5K likes · 4.4M views
huhhrsh retweeted
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
China's greatest invention was China itself. Specifically: 1) Mandate of Heaven doctrine. Ground truth performance-based theory of legitimacy. 2) Human capital as the foundation of the state. Cultivation, examination, promotion. Whole-Nation System (举国体制) follows naturally.
[image attached]
molson 🧠⚙️@Molson_Hart

The greatest innovation to come out of China was not paper, clocks, printing, banking, gunpowder, or even TikTok, DeepSeek, and the absolute army of drones, EVs, and robots. It was the government system that raised 1 billion people out of poverty and then proceeded to export their hard work and innovations globally.

14 replies · 29 reposts · 363 likes · 29.1K views
Chubby♨️@kimmonismus·
Google's "Titans" is absolutely nuts. Titans is Google's new architecture type that gives language models something like a real long-term memory while the model is running. How? A deep neural network (an MLP) acts as a "long-term memory" that is continuously updated while the model reads text. The model learns during the inference run itself what to retain ("test-time memorization"), instead of having everything fixed into the weights beforehand. At ~10 million tokens, it still maintains around 70% accuracy. Insane. Google is nailing it.
[image attached]
75 replies · 239 reposts · 2.8K likes · 257.2K views
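A rough way to picture the "test-time memorization" the post describes: a small MLP acts as the memory, and its weights are nudged by gradient steps on a reconstruction ("surprise") loss while the model reads, instead of being frozen after training. Below is a minimal PyTorch sketch of that idea only; it is not Google's Titans code, and the dimensions, loss, and optimizer are illustrative guesses.

```python
# Minimal sketch of test-time memorization (illustrative, not the Titans implementation).
import torch
import torch.nn as nn

d = 64  # hypothetical embedding width
memory = nn.Sequential(nn.Linear(d, 4 * d), nn.SiLU(), nn.Linear(4 * d, d))
inner_opt = torch.optim.SGD(memory.parameters(), lr=1e-2)  # optimizer that runs *during* inference

def memorize_chunk(keys: torch.Tensor, values: torch.Tensor) -> float:
    """One test-time update: push the memory to map this chunk's keys to its values.
    The loss (how badly it predicts the new values) plays the role of 'surprise'."""
    loss = (memory(keys) - values).pow(2).mean()
    inner_opt.zero_grad()
    loss.backward()
    inner_opt.step()
    return loss.item()

def recall(query: torch.Tensor) -> torch.Tensor:
    """Later tokens query the memory instead of attending over the entire history."""
    with torch.no_grad():
        return memory(query)

# Toy usage: stream chunks through the memory, then query it.
for _ in range(100):
    k, v = torch.randn(32, d), torch.randn(32, d)
    memorize_chunk(k, v)
print(recall(torch.randn(1, d)).shape)  # torch.Size([1, 64])
```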
huhhrsh retweeted
Mayank@mayn_k47·
The past week has been crazy, and I got to experience what a founder feels like. We were preparing for the presentation of our idea through a proof-of-concept prototype @themastersunion. We had already figured out how to obtain a clean EMG signal and calibrate our IMU sensor. The next story on the board (this is a Silicon Valley reference) was mapping spike bursts and IMU data to simple gestures, which were then mapped to mouse clicks, scrolling, swiping, zooming in/out and volume up/down actions.

While testing, I noticed that I was getting no data from the IMU sensor. I thought it was a problem with I2C addressing and spent about an hour trying out different addresses, but no progress. ChatGPT suggested checking whether the IMU sensor was powered up correctly. I pulled out my DMM and checked the voltage across the sensor module's VCC and GND: 3.32V. Then came a divine intervention. I turned the dial on the DMM to check continuity between the sensor and the ESP, and there was the problem: no continuity on the SDA line. The connecting wire was faulty. Changed the wire and went on programming the ESP using NimBLE. What happened after this was the sole reason we were not able to demonstrate our prototype in the presentation: our devices were not able to connect to the ESP, which was advertising itself as a BLE HID mouse, and when one did connect it was not able to subscribe to the sent HID reports and then got disconnected.

The presentation was good; we got validation and feedback from people. We didn't get selected, though. Got words of wisdom from my co-founder @Kr3t1k's dad: "Girte hain sher-sawar hi maidan-e-jung mein, vo kya girenge jo ghutno ke baal chalte hain." ("Only those who ride into the battlefield ever fall; how will those fall who crawl on their knees?") It gave me goosebumps. Here is the raw video that the @themastersunion marketing team took for their IG story. Pardon my expression, I had not slept the night before.
Mayank@mayn_k47

For the past year my co-founder and I have been trying to build smart glasses like the Meta Ray-Ban, aiming to assist with and solve visual inspection in large-scale manufacturing to reduce takt time and make things more efficient. Back in July @RealityLabs published their research "A generic non-invasive neuromotor interface for human-computer interaction", which was followed by the announcements of the Meta Ray-Ban Display and the Meta Neural Band. Over the last few weeks we have been trying to reproduce the research with whatever resources we could gather, from DIY muscle sensor kits to researching VR headset controllers, and we finally had some clarity and decided to make our own nerve bands from first principles.

As an embedded systems engineer, I see it as a Bluetooth input device that you can connect to your computer, laptop, or smartphone, taking input from EMG electrodes and an IMU sensor. For the POC prototype I'm using an ESP32, and I'm making use of an RTOS to keep all the tasks deterministic and low-latency. At the moment I'm figuring out how to get a clean EMG signal from the sensor and visualize it, so that I can use that data plus the IMU data to build a machine learning model to better predict controlling gestures. Will be sharing regular development updates here. Curious to know what you guys think.

5 replies · 4 reposts · 11 likes · 261 views
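For a concrete picture of the "spike bursts and IMU data to simple gestures" mapping the post mentions, here is a toy rule-based sketch in plain Python. The thresholds, gesture names, and action table are made up for illustration; the real prototype classifies on the ESP32 and emits BLE HID mouse reports rather than printing.

```python
# Toy gesture mapper (illustrative thresholds and names, not the actual prototype logic).
def classify_gesture(emg_rms: float, gyro_z_dps: float) -> str:
    """Map one EMG channel's RMS amplitude plus one gyro axis (deg/s) to a gesture."""
    if emg_rms > 0.6 and abs(gyro_z_dps) < 20:
        return "pinch"         # strong muscle burst while the hand is still
    if emg_rms > 0.6 and gyro_z_dps > 60:
        return "flick_right"   # burst while rotating the wrist one way
    if emg_rms > 0.6 and gyro_z_dps < -60:
        return "flick_left"    # burst while rotating the other way
    return "rest"

ACTIONS = {                    # gesture -> mouse/media action sent as a HID report
    "pinch": "left_click",
    "flick_right": "scroll_down",
    "flick_left": "scroll_up",
    "rest": None,
}

for emg, gyro in [(0.72, 5.0), (0.68, 85.0), (0.70, -90.0), (0.05, 0.0)]:
    gesture = classify_gesture(emg, gyro)
    print(f"emg={emg:.2f} gyro={gyro:+.0f} -> {gesture} -> {ACTIONS[gesture]}")
```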
huhhrsh retweeted
Subhaghvs@SubhaghV·
Just built a WhatsApp AI agent that replaced an entire customer support team. Sold the setup for $2,000 + $500/mo and clients are begging for more. Here's what it does:
– Reads your company's Google Doc (FAQs, policies, service info)
– Lets customers ask anything on WhatsApp
– Replies instantly using ChatGPT or Gemini
– Handles hours, pricing, bookings, even dynamic logic like closure dates
– No retraining, no dev work: just update the Doc and it works
Perfect for gyms, clinics, restaurants, service businesses, or anyone tired of answering the same 20 questions daily. Built it in n8n + the WhatsApp Cloud API: no code, no friction. Follow + RT + comment "CHAT" and I'll DM you the entire plug-and-play workflow.
[image attached]
2.6K replies · 1.3K reposts · 5.8K likes · 685.1K views
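Stripped of n8n and the WhatsApp plumbing, the core of the setup described above is "stuff the Google Doc into the prompt and answer from it." A minimal Python sketch of that step, assuming the doc has already been exported to a local text file; the model name, file path, and prompt wording are placeholders, not details from the post.

```python
# Minimal doc-grounded answering step (illustrative; the WhatsApp Cloud API webhook
# and n8n workflow from the post are not shown here).
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
faq_text = open("company_faq.txt").read()  # hypothetical export of the Google Doc

def answer(customer_message: str) -> str:
    """Answer using only the doc; editing the doc changes behaviour with no retraining."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer customer questions using only this document:\n" + faq_text},
            {"role": "user", "content": customer_message},
        ],
    )
    return resp.choices[0].message.content

print(answer("What are your opening hours on public holidays?"))
```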
huhhrsh@huhhrsh·
High : consciousness :: temperature : trained transformer
0 replies · 0 reposts · 2 likes · 39 views
MinotaurOnLucy@minotauronlucy·
@giffmana I’ve been having my ass handed to me trying to use Dinov2 pretrained ViTs to do really sparse object detection, i.e., detecting mitotic sites in tissues. 😭
1 reply · 0 reposts · 2 likes · 1.2K views
Lucas Beyer (bl16)@giffmana·
I like the Encoder-only Mask Transformer (EoMT): basically removing all the bells and whistles, and doing panoptic segmentation with an almost vanilla ViT. You're sliiiiightly worse for the same encoder size, but it's a lot simpler/faster and (likely) more scalable. I wish they had added peak GPU memory to that table though.
[2 images attached]
Niels Rogge@NielsRogge

New model alert in Transformers: EoMT! EoMT greatly simplifies the design of ViTs for image segmentation 🙌 Unlike Mask2Former and OneFormer which add complex modules like an adapter, pixel decoder and Transformer decoder on top, EoMT is just a ViT with a set of query tokens ✅

15 replies · 55 reposts · 505 likes · 82.3K views
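To make "just a ViT with a set of query tokens" concrete: learnable query tokens are concatenated with the patch tokens, both run through ordinary transformer blocks, and each query then produces a class prediction plus a mask as a dot product with the patch features. The sketch below is my reading of that idea with stand-in modules, not the EoMT authors' code; the sizes and the use of nn.TransformerEncoderLayer are assumptions.

```python
# Rough sketch of a "ViT + query tokens" segmentation head (illustrative only).
import torch
import torch.nn as nn

class EoMTSketch(nn.Module):
    def __init__(self, dim=384, depth=12, num_queries=100, num_classes=133):
        super().__init__()
        self.patch_embed = nn.Linear(768, dim)            # stand-in for the real patch embedding
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True) for _ in range(depth)]
        )
        self.class_head = nn.Linear(dim, num_classes + 1)  # +1 for "no object"

    def forward(self, patch_tokens):                       # (B, num_patches, 768)
        x = self.patch_embed(patch_tokens)
        q = self.queries.expand(x.size(0), -1, -1)
        x = torch.cat([q, x], dim=1)                       # queries and patches attend jointly
        for blk in self.blocks:
            x = blk(x)
        q, feats = x[:, :q.size(1)], x[:, q.size(1):]
        mask_logits = torch.einsum("bqd,bpd->bqp", q, feats)  # one mask per query over patches
        return self.class_head(q), mask_logits

model = EoMTSketch()
cls_logits, masks = model(torch.randn(2, 196, 768))
print(cls_logits.shape, masks.shape)  # torch.Size([2, 100, 134]) torch.Size([2, 100, 196])
```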
huhhrsh retweeted
Rohan Pandey@khoomeik·
The CIA is not ready for the RL era. An Israeli intelligence guy just hacked into a live surveillance camera in front of me with an exploit generated by Qwen. Vulnerable software is simulatable. Penetration success is verifiable. Hacking is RLable.
[image attached]
109 replies · 352 reposts · 5.1K likes · 894.8K views
huhhrsh retweeted
Timothy B. Lee@binarybits·
Something that comes through clearly in the DeepSeek R1 research paper, and I wish was more broadly understood, is that the DeepSeek researchers really, really want to see themselves as valuable members of the global research community.
[4 images attached]
6 replies · 18 reposts · 367 likes · 19.8K views
huhhrsh@huhhrsh·
is cloudflare down again
0 replies · 0 reposts · 1 like · 98 views
Taelin@VictorTaelin·
meningitis confirmed
99 replies · 1 repost · 637 likes · 581K views
huhhrsh@huhhrsh·
@scaling01 it's like chess but for words, would've thought Gemini models would perform better with their long context windows
0 replies · 0 reposts · 0 likes · 26 views
Lisan al Gaib@scaling01·
Introducing LisanBench

LisanBench is a simple, scalable, and precise benchmark designed to evaluate large language models on knowledge, forward planning, constraint adherence, memory and attention, and long-context reasoning and "stamina".

"I see possible futures, all at once. Our enemies are all around us, and in so many futures they prevail. But I do see a way, there is a narrow way through." - Paul Atreides

How it works: Models are given a starting English word and must generate the longest possible sequence of valid English words. Each subsequent word in the chain must:
- Differ from the previous word by exactly one letter (Levenshtein distance = 1)
- Be a valid English word
- Not repeat any previously used word
The benchmark repeats this process across multiple starting words of varying difficulty. A model's final score is the cumulative length of its longest valid chains from the starting words.

Results:
- o3 is by far the best model, mainly because it is the only model that manages to escape from parts of the graph with very low connectivity and many dead ends (slight caveat: o3 was by far the most expensive one to run and used ~30-40k reasoning tokens per starting word)
- Opus 4 and Sonnet 4 with 16k reasoning tokens also perform extremely well, especially Opus, which was able to beat o3 on 3 starting words with only one third of the reasoning tokens!
- Claude 3.7 with thinking takes 4th place, ahead of o1
- The other OpenAI reasoning models all perform well, but size does make a difference: o1 is ahead of o4-mini high and o3-mini
- Gemini models perform a bit worse than their Anthropic and OpenAI counterparts, but they have by far the longest outputs - they are a bit delusional and keep yapping; they don't realize and stop when they have made a mistake
- Strongest non-reasoning models: Grok-3, GPT-4.5, Sonnet 3.5 and 3.7, Opus 4, Sonnet 4, DeepSeek-V3 and Gemini 1.5 Pro - Grok 3, Sonnet 3.5 and 3.7 are a surprise!!

Inspiration: LisanBench draws from benchmarks like AidanBench and SOLO-Bench. However, unlike AidanBench, it's extremely cost-effective, trivially verifiable and doesn't rely on an embedding model - the entire benchmark cost only ~$50 for 57 models. And unlike SOLO-Bench, it explicitly tests knowledge and applies stronger constraints, which makes it more challenging!

Verification: Verification uses the words_alpha.txt dictionary from github.com/dwyl/english-w… (~370,105 words), but for scalability, only words from the largest connected component (108,448 words) are used.

Easy scaling, difficulty adjustment & accuracy improvements:
- Scaling and accuracy: just add more starting words or increase the number of trials per word.
- Difficulty: starting words vary widely - from those with 72 neighbors to those with just 1 - effectively distinguishing between moderately strong and elite models. Difficulty can also be gauged via local connectivity and branching factor.

Why is it challenging? LisanBench uniquely stresses:
- Forward planning: avoiding dead ends by strategic word choices - models must find the narrow way through
- Knowledge: a wide vocabulary is essential
- Memory and attention: previously used words must not be repeated
- Precision: strict adherence to the Levenshtein constraint
- Long-context reasoning: coherence and constraint-tracking over hundreds of steps
- Output stamina: some models break early during long generations; LisanBench exposes that, which is critical for agentic use cases

The two beautiful plots below show that the starting words are very different in difficulty. Some are in low-connectivity regions, some in high-connectivity regions, and others are just surrounded by dead ends! Just as Paul Atreides had to navigate the political, cultural, and metaphysical maze of his destiny, LLMs in LisanBench must explore vast word graphs, searching for the Golden Path - the longest viable chain without collapse. We will know the chosen model when it appears. It will be the one that finds the Golden Path and avoids every dead end. Right now, for the most difficult starting word "abysmal", the longest chain found is just 2, although it is also part of the >100k-word connected component. So there is a narrow way through! More plots with the full leaderboard below!
[4 images attached]
57 replies · 68 reposts · 639 likes · 81.5K views
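The chain-validity rules above are easy to verify mechanically, which is what makes the benchmark cheap to score. A small Python sketch of that verification step, assuming the score is the length of the longest valid prefix of the model's chain; the exact scoring details and function names here are guesses, not the benchmark's code.

```python
# Sketch of LisanBench-style chain verification (illustrative scoring assumptions).
def edit_distance_one(a: str, b: str) -> bool:
    """True iff a and b differ by exactly one insertion, deletion, or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):                       # substitution
        return sum(x != y for x, y in zip(a, b)) == 1
    if len(a) > len(b):
        a, b = b, a                            # make a the shorter word
    # insertion/deletion: removing one letter from b must yield a
    return any(b[:i] + b[i + 1:] == a for i in range(len(b)))

def longest_valid_prefix(chain: list[str], dictionary: set[str]) -> int:
    """Score a chain: length of the prefix obeying all three rules
    (valid word, distance 1 from the previous word, no repeats)."""
    seen = set()
    for i, word in enumerate(chain):
        ok = word in dictionary and word not in seen
        if i > 0:
            ok = ok and edit_distance_one(chain[i - 1], word)
        if not ok:
            return i
        seen.add(word)
    return len(chain)

# Toy dictionary standing in for words_alpha.txt's largest connected component.
words = {"cat", "cot", "cog", "dog", "dot"}
print(longest_valid_prefix(["cat", "cot", "cog", "dog", "dig"], words))  # 4 ("dig" not in dict)
```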
huhhrsh retweeted
wh@nrehiew_·
How long Terence Tao takes to formalize a proof with an LLM is the only eval you need. Claude, o4 > Github Copilot > by hand
[image attached]
7 replies · 24 reposts · 361 likes · 28.2K views