@lhl@randomfoo.net

@lhl

Moved to the Fediverse @lhl@randomfoo.net

Joined November 2006
1.5K Following 2.2K Followers
Ayushi
Ayushi@A_y_u_s_h_i_X·
This post makes me concerned, not because of a poor understanding of how hallucinations work but because of the serious implications of domain experts casually making public claims about AI reliability.

Let's take a very small case. I asked all the current frontier models - GPT-4, Claude, Gemini - a straightforward question about Indian tax law: "Is salary taxed on payment or accrual in India?" I'm attaching screenshots showing that every single model confidently claimed that salary is taxed on a "due or receipt basis, whichever is earlier," citing Section 15 of the Income Tax Act. The responses were well-formatted, professional, and definitive. They were also wrong. Section 192(1) clearly states: "Any person responsible for paying any income chargeable under the head 'Salaries' shall, at the time of payment, deduct income-tax on the amount payable..." A subtle but critical distinction that matters in practice. Feel free to try this query in different ways - "When is salary taxed in India", etc. - and the response will be similar. I can give you hundreds of such "trivial" cases.

I can go on about how this nearly cost someone $10K in tax overpaid on salary income that was never actually paid to them, but I want to stick to the point of how dangerous this precedent is given how it's being used in practice, and how opinions like these get circulated and interpreted in the larger public domain. An expert lawyer who verifies every output against source material will have a completely different experience than a paralegal who trusts the output directly, or a small business owner trying to understand their tax obligations. And I can confidently say it's not just inexperienced users who are being misguided; a lot of "senior" professionals are too. Just imagine the scale at which misinformation is compounding - the person seeking advice, the person giving advice, and the person verifying the advice are all using these tools without proper judgement.
Just to demonstrate how misinformation spreads: 2 hours ago @grok summarised this thread, and the headline on X was "lawyer claims hallucinations are solved in GPT 5.2" (I should have saved a screenshot). It has since updated that to "Debate Heats Up on Whether GPT-5.2 Pro Has Conquered AI Hallucinations", which I am adding as a screenshot. That's going to be picked up by a lot of sources and floated around in a lot of different contexts depending on personal interests and what people stand to gain.

Hallucinations are still very real and very prominent. The needle-in-a-haystack problem, i.e., retrieving correct specific information from large contexts (just demonstrated even for the most trivial cases), remains fundamentally unsolved. The problem is that most opinions floating around about AI reliability are anecdotal, instance-specific, and heavily dependent on how you use these tools versus how another (lay) person uses them. Models are better now at sounding authoritative, which paradoxically makes them more dangerous when they're wrong, because users have fewer signals that something might be incorrect, and most people never care to dig deeper. I really hope this gets taken more seriously.
Ayushi tweet media
English
4
6
19
1.4K
Gary Marcus
Gary Marcus@GaryMarcus·
How did this work out? Are LLM hallucinations largely gone by now? So now the @FT platforms the same guy saying most of the tasks lawyers and accountants do will be replaced in 12-18 months? From the same company that said that GPT-5 would be a giant humpback whale that would blow away PhDs? Where is the accountability? The concern about CEOs’ conflicts of interest in selling these narratives? The view from skeptics?
Mustafa Suleyman@mustafasuleyman

LLM hallucinations will be largely eliminated by 2025. that’s a huge deal. the implications are far more profound than the threat of the models getting things a bit wrong today.

English
134
191
1.6K
246.9K
@lhl@randomfoo.net
@TheZachMueller While you're doing RW tests, would you mind attention-gym/nvbandwidth/memtest_vulkan on these if they're easy to script? (I think repo/dataset actually great, especially if it's easy for people to fork/PR into)
English
0
0
0
279
Zach Mueller
Zach Mueller@TheZachMueller·
Working through the list, but here are MAMF numbers for all the 6000 series and the 3090, 4090, 5090 (base series). The 6000 series followed a trend vs the same-series consumer card. Then the Blackwell (non max-q) showed up. NVIDIA really made the Blackwell something special.
Zach Mueller tweet media
Zach Mueller@TheZachMueller

Made a table of the most common/supported BF16 GPUs and their non-sparse TFLOPs. What's the best way to publish this? As a wiki on my blog? A pypi package to import?

English
8
8
105
35.2K
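One way to read the "pypi package to import" idea above is a plain lookup module. A minimal sketch, assuming a hypothetical module layout (this is not Zach's actual package); the two values below are dense (non-sparse) BF16 Tensor Core numbers from NVIDIA's public datasheets, with anything else left for PRs:

```python
# Hypothetical lookup of dense (non-sparse) BF16 TFLOPS per GPU.
# Values are from NVIDIA datasheets; other GPUs would be added via PRs.
BF16_DENSE_TFLOPS = {
    "A100-SXM": 312.0,   # NVIDIA A100 datasheet, BF16 Tensor Core, dense
    "H100-SXM": 989.0,   # NVIDIA H100 datasheet, BF16 Tensor Core, dense
}

def bf16_tflops(gpu: str) -> float:
    """Return dense BF16 TFLOPS for a known GPU; raises KeyError otherwise."""
    return BF16_DENSE_TFLOPS[gpu]

print(bf16_tflops("A100-SXM"))
```

A package like this versions the data and lets downstream benchmark scripts compute MFU without hardcoding specs.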
@lhl@randomfoo.net
@AliTavallaie @rasbt @dontfearai @lmsysorg Not full support. If you want aotriton (FA) you have to manually build, and even then it still doesn’t get through a full attention-gym benchmark run. CK btw only compatible w gfx9 - ROCm on CDNA != ROCm on RDNA (much worse)
English
1
0
1
81
Sebastian Raschka
Sebastian Raschka@rasbt·
Saw that DGX Spark vs Mac Mini M4 Pro benchmark plot making the rounds (looks like it came from @lmsysorg). Thought I’d share a few notes as someone who actually uses a Mac Mini M4 Pro and has been tempted by the DGX Spark.

First of all, I really like the Mac Mini. It’s probably the best desktop I’ve ever owned. For local inference with open-weight LLMs, it works great (the plot above captures that well). I regularly run the gpt-oss-20B model on it. That said, I would not fine-tune even small LLMs on it since it gets very hot. The DGX Spark probably targets that type of sustained workload. (From those who have one, any thoughts on the noise and heat levels?)

The other big thing that DGX Spark gets you is CUDA support. If you use PyTorch, that’s pretty essential since MPS on macOS is still unstable, and fine-tuning often fails to converge. E.g., see github.com/rasbt/LLMs-fro… and github.com/rasbt/LLMs-fro…

I also like the Spark’s form factor (hey, it really appeals to the Mac Mini user in me). But for the same money, I could probably buy about 4000 A100 cloud GPU hours, and I keep debating which would be the better investment. Sure, I could also build/get a multi-GPU desktop. I had a Lambda system with four GTX 1080 Ti cards back in 2018, but it was too loud and hot for my office. And if I have to move it to another room and SSH into it anyway, I might as well use cloud GPUs instead?
Sebastian Raschka tweet media
English
77
113
955
186.4K
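The "~4000 A100 cloud GPU hours for the same money" comparison above is a simple break-even calculation. A quick sketch with assumed prices (roughly $4,000 for the local box, $1.00/hr for a cloud A100; actual prices vary widely by provider):

```python
# Break-even between buying a local box and renting cloud GPUs.
# Both prices below are assumptions for illustration, not quotes.
local_cost_usd = 4000.0       # assumed up-front price of the desktop box
cloud_rate_usd_per_hr = 1.00  # assumed hourly rate for one cloud A100

break_even_hours = local_cost_usd / cloud_rate_usd_per_hr
print(f"local box pays for itself after ~{break_even_hours:.0f} A100-hours")

# At a light usage pattern, that break-even can be years away:
hours_per_week = 10
years = break_even_hours / hours_per_week / 52
print(f"~{years:.1f} years at {hours_per_week} h/week")
```

The utilization assumption dominates: heavy sustained training favors owning hardware, while bursty experimentation favors renting.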
@lhl@randomfoo.net
I don't post much here anymore, but maybe this is worth an exception. I've spent basically all year working on an open model that is incredibly strong in Japanese. For those interested, full details published here: shisa.ai/posts/shisa-v2…
shisa.ai@shisa_ai

We're incredibly proud to release the newest and most powerful member of our open, bilingual (JA/EN) Shisa V2 family: Llama 3.1 Shisa V2 405B The strongest model ever trained in Japan, it points to how even small Japanese AI labs can compete globally! 🤗 huggingface.co/shisa-ai/shisa…

English
0
1
5
672
金のニワトリ
金のニワトリ@gosrum·
@2022_technology Thank you! Since I have the rather lenient gemini-2.0-flash-exp doing the grading, scores tend to come out high across the board. I'd like to move to gemini-2.5-flash, but on the free tier I can only evaluate 1-2 models per day, so I haven't been able to switch yet.
Japanese
1
0
3
253
金のニワトリ
金のニワトリ@gosrum·
I couldn't fit everything about Qwen3's speed and the Shaberi3 benchmark results here, so I wrote it up as an article. Incidentally, evaluating everything except Qwen3-235B-A22B took a full two days 😇 zenn.dev/robustonian/ar…
Japanese
4
31
159
22.4K
@lhl@randomfoo.net
@typedfemale It’s more than that. DYOR, but for laser, T-CAT based TransPRK is almost always better than LASIK. ACD willing, and if you can afford the outpatient procedure with an experienced surgeon, I found that V5 ICL was the best option for risk and outcomes.
English
0
0
3
321
typedfemale
typedfemale@typedfemale·
"what do you think about LASIK?" is a great litmus test for evaluating someone's statistical literacy
English
534
540
28.2K
6.4M
@lhl@randomfoo.net
@nisten For bs=1 llama.cpp does better than vLLM. For anything more you should be using sglang.
@lhl@randomfoo.net tweet media
English
0
3
11
1.9K
nisten🇨🇦e/acc
nisten🇨🇦e/acc@nisten·
deepseek v3 on CPU only: 41 tps input, 12 tps output. gg. For comparison, 8x AMD 192GB MI300X were getting 16.7 tps output, and 8x NVIDIA H200 10 tps lol
nisten🇨🇦e/acc tweet media
English
54
105
1.3K
195.6K
@lhl@randomfoo.net
@realGeorgeHotz @AMD I'm not so sure on the 7900 XTX hardware - need VOPD w/ no stalls to hit peak FP16, L1 cache is shared between 2 WGPs, DMA seems weak (can't hit anywhere near peak MBW even on simple bs=1 inference). High throughput, low latency, high concurrency LLM inference is nontrivial, btw.
English
0
0
1
91
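The VOPD point above can be made concrete with the usual peak-FLOPS arithmetic. A sketch using the 7900 XTX's published shader count and an assumed ~2.5 GHz boost clock; the ×2 dual-issue factor is only realized when the compiler emits VOPD pairs with no stalls, which is exactly the caveat in the post:

```python
# Theoretical peak FP16 for a 7900 XTX (RDNA3), showing why VOPD
# dual-issue is required to reach the headline number.
shaders  = 6144    # stream processors
fma      = 2       # 2 FLOPs per fused multiply-add
vopd     = 2       # dual-issue factor: needs VOPD pairing with no stalls
packed   = 2       # 2x FP16 ops per lane via packed math
clock_hz = 2.5e9   # assumed ~2.5 GHz boost clock

peak_fp16 = shaders * fma * vopd * packed * clock_hz
print(f"peak FP16 with VOPD:  {peak_fp16 / 1e12:.1f} TFLOPS")
print(f"peak FP16 w/o VOPD:   {peak_fp16 / vopd / 1e12:.1f} TFLOPS")
```

Without sustained dual-issue, achievable FP16 throughput halves, which is why kernel-level scheduling matters so much on this part.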
@lhl@randomfoo.net
@sdw @Duderichy Helps being in Tokyo. Anytime I go to Hands or Loft, get assaulted w new choices and need to go do research, lol
English
0
0
1
75
@lhl@randomfoo.net
@Duderichy @sdw I often see people mention the G-1008 but I’m a G-1111 fan (has a slidable catch, much nicer file and design), or if you like the squarer look the G-1305 has a magnetic catch.
English
1
0
5
555
the Rich
the Rich@Duderichy·
@sdw Pro Display XDR is that good? What’s the deal with the nail clipper
English
6
0
16
21.4K
@lhl@randomfoo.net
@nisten @Vultr For single-user speed `-tp 8` vs `-tp 4` should further decrease TPOT. You can also trade off some TTFT for better throughput & TPOT w/ something like `--num-scheduler-steps 8`. The most important thing I found for perf on MI300X was VLLM_USE_TRITON_FLASH_ATTN=0 (use CK FA)
English
0
0
1
93
nisten🇨🇦e/acc
nisten🇨🇦e/acc@nisten·
accelerated the 8x MI300X from @Vultr from 22 to 152 tps (36B active parameters in full bfloat16). If you need consulting on this or just wanna buy a ready-to-go solution, let us know. nisten@github.gg
English
5
1
41
2.9K
@lhl@randomfoo.net
@JFPuget jokes/memes aside, I pretty much stick to mamba/conda these days if I need different CUDA versions, eg: `mamba install -c "nvidia/label/cuda-12.1.1" cuda-toolkit -y` (and set CUDA_PATH/HOME) gets me stood up in a 12.1 env in about 30s.
English
0
0
0
176
Hamel Husain
Hamel Husain@HamelHusain·
If you aren't using shell-sage you are missing out. If you like cursor you will _love_ this too! Trust me. Super light weight at 100 loc. `pip install shell-sage` and you have to run it in tmux See README github.com/AnswerDotAI/sh…
English
8
21
179
12.7K
Simon Willison
Simon Willison@simonw·
Do you ever use the top_p and top_k arguments when working with LLMs? Under what circumstances do you use them? I very, very occasionally tweak the temperature but I've never habitually used those other two options
English
31
15
475
130.3K
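For anyone unsure what the two knobs in Simon's question actually do, here's a minimal pure-Python sketch of top-k and top-p (nucleus) filtering over a token probability distribution. This is illustrative only, not any particular API's implementation; real samplers operate on logits and then sample from the filtered distribution:

```python
def top_k_filter(probs, k):
    """Keep only the k highest-probability tokens, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_p_filter(probs, p):
    """Nucleus: keep the smallest high-probability set whose mass >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

probs = [0.5, 0.3, 0.15, 0.05]
print(top_k_filter(probs, 2))
print(top_p_filter(probs, 0.75))
```

The practical difference: top_k fixes how many tokens survive regardless of how flat the distribution is, while top_p adapts the cutoff to the distribution's shape, which is why top_p is usually the one exposed by default.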
@lhl@randomfoo.net
@giffmana @system76 I might take a look at the Tuxedo AMD laptops (IBP14g9, Pulse14g4) - Radeon 780m should be good enough for eSports/light gaming, battery life should be decent. You can use ryzenadj as well to cap power usage.
English
0
0
1
76
Lucas Beyer (bl16)
Lucas Beyer (bl16)@giffmana·
Back to my next laptop finding journey. Anyone got experience with @system76 laptops? They make linux laptops and apparently tune the OS for battery too. My current dilemma:
- MacBook: nice portability, +Air is fanless. But DOTA will always have lagspikes on Mac due to inability to precompile shaders. Doesn't matter the laptop's power.
- Linux laptop (thinkpad or similar): good DOTA, dev experience I like (arch/i3), but inevitably meh battery because OS not tuned for hardware.
@system76 supposedly might be a way out, by being linux focused and (supposedly) tuning their laptop+OS for battery. I'd miss the ThinkPad nipple though.
Lucas Beyer (bl16) tweet media
English
60
2
101
59.4K
Yishan
Yishan@yishan·
I want to get a CO2 scrubber for my office so that I can lower the CO2 concentration to 280 ppm (“why stop at eating paleo when you can breathe paleo?”) and see if it helps me think better. Any product pointers?
English
232
23
1.6K
223.4K