TNG Technology Consulting GmbH

1.6K posts


@tngtech

TNG, aka "The Nerd Group", is a consulting partnership focused on high end information technology, particularly AI. 924 employees, 99.9% academics, ~53% PhDs.

Unterföhring, Germany · Joined December 2010
170 Following · 2.1K Followers
Pinned Tweet
TNG Technology Consulting GmbH
Today we release DeepSeek-TNG R1T2 Chimera. This new Chimera is a Tri-Mind Assembly-of-Experts model with three parents, namely R1-0528, R1 and V3-0324. R1T2 operates at a sweet spot in intelligence vs. output token length. It appears to be...
* about 20% faster than R1, and more than twice as fast as R1-0528
* significantly more intelligent than R1 in benchmarks such as GPQA Diamond and AIME-24/25, albeit not quite on R1-0528 level
* much more intelligent than our first R1T Chimera, and also think-token consistent, which is a major improvement
We perceive it as generally well-behaved and a nice persona to talk to. The weights are on @huggingface under the MIT licence. We are looking forward to your experiments and feedback!
Thanks to @deepseek_ai for giving their models to the world, to @chutes_ai and @openrouter for hosting R1T, to @WolframRvnwlf for benchmarking it, to @xlr8harder for beta-testing the new Chimera, and to @natolambert for constructive discussions at @aiDotEngineer.
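The Assembly-of-Experts construction above combines the weight tensors of several parent models. As a rough illustration (the parent stand-ins and mixing coefficients below are hypothetical; the actual R1T2 merge recipe is not given in this thread), tensor-wise interpolation can be sketched like this:

```python
# Minimal sketch of tensor-wise weight interpolation across several
# "parent" checkpoints, in the spirit of an Assembly-of-Experts merge.
# Names and coefficients are illustrative, not the R1T2 recipe.

def merge_parents(parents, coeffs):
    """parents: list of state dicts (tensor name -> list of floats).
    coeffs: one mixing weight per parent, summing to 1.0."""
    assert abs(sum(coeffs) - 1.0) < 1e-9
    merged = {}
    for name in parents[0]:
        merged[name] = [
            sum(c * p[name][i] for c, p in zip(coeffs, parents))
            for i in range(len(parents[0][name]))
        ]
    return merged

# Toy example with three tiny "parents" (R1-0528, R1, V3-0324 stand-ins)
p1 = {"layer.w": [1.0, 2.0]}
p2 = {"layer.w": [3.0, 4.0]}
p3 = {"layer.w": [5.0, 6.0]}
out = merge_parents([p1, p2, p3], [0.5, 0.25, 0.25])
print(out["layer.w"])  # [2.5, 3.5]
```

In practice such merges operate per tensor (or per expert, in MoE models) on full checkpoints; the interpolation itself is this simple.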
TNG Technology Consulting GmbH
@0xSero Maybe we can give you some on/off access (i.e. "oh, it's working right now" :-)) to an 8xB200 node, but how can we reach you?
0xSero @0xSero
Putting out a wish to the universe: I need more compute. If I can get more, I will make sure every machine, from a small phone to a bootstrapped RTX 3090 node, can run frontier intelligence fast with minimal intelligence loss. I have hit page 2 of huggingface, released 3 model-family compressions, and got GLM-4.7 running on a MacBook: huggingface.co/0xsero

My beast just isn't enough, and I already spent 2k USD on renting GPUs on top of credits provided by Prime Intellect and HotAisle.

If you believe in what I do, help me get this to Nvidia; maybe they will bless me with the power to keep making local AI more accessible 🙏
Michael Dell 🇺🇸 @MichaelDell

Jensen Huang is loving the new Dell Pro Max with GB300 at NVIDIA GTC.💙 They asked me to sign it, but I already did 😉

0xSero @0xSero
One correction: I have had sponsorships from Lambda, Prime Intellect and HotAisle, which I am very grateful for. But yes, pls compute 🫡
Sudo su @sudoingX

this guy has 29 models on huggingface at page 2 ranking. no lab behind him. no sponsorship. $2,000 from his own pocket on GPU rentals. he compressed GLM-4.7 to run on a MacBook and quantized Nemotron Super the week it dropped. all public. all free.

nvidia is a trillion dollar company with hundreds of teams, but they are not the ones quantizing models in the middle of the night and pushing them out before sunrise. if nvidia stopped tomorrow, their employees would stop working. people like @0xSero would not. that is the difference between a paycheck and a mission.

@NVIDIAAI you talk about making AI accessible. the people actually doing it are right here: 29 models deep, burning their own compute, with no ask except more hardware to keep going. you do not need to build another program. just look at who is already building for you. one GPU to this man would produce more public value than a hundred internal sprints. i am not asking for charity. i am asking you to invest in someone who already proved it.

Nathan Lambert @natolambert
Any good quotes on the Nvidia GTC open models panel? Maybe they'll invite me to one some day 🥺
TNG Technology Consulting GmbH
Preliminary tests of Weight Offloading V2 in @vllm_project v0.17.0 with @Zai_org's GLM4.7-FP8 on RTX Pro:
* Median TTFT: 16.8 s without offloading, 32.3 s with offloading (≈2x)
* Median inter-token latency: 27 ms without offloading, 805 ms with offloading (≈30x, very slow!)
* Workload: 50,000 input tokens, 500 output tokens
It required a vLLM pull request (37178) to fix weight prefetch. Alternative measurements, e.g. on B200, corrections and/or feedback much appreciated.
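The quoted ≈2x and ≈30x slowdown factors follow directly from the raw measurements; a quick check:

```python
# Sanity check on the reported slowdown factors from the raw
# measurements in the tweet (TTFT in seconds, inter-token latency in ms).
ttft_without, ttft_with = 16.8, 32.3
itl_without, itl_with = 27.0, 805.0

ttft_factor = ttft_with / ttft_without  # time to first token slowdown
itl_factor = itl_with / itl_without     # inter-token latency slowdown
print(round(ttft_factor, 1), round(itl_factor, 1))  # 1.9 29.8
```

So TTFT roughly doubles, while decoding speed drops by about 30x, which is why offloaded decoding dominates end-to-end latency for long outputs.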
TNG Technology Consulting GmbH
@kimmonismus Drive out to Half Moon Bay, grab the churrasco at La Costanera, then head via West Shoreline Access / Pillar Point to Mavericks Beach... and take a look at the waves for us.
Chubby♨️ @kimmonismus
Hi San Francisco!
Teknium (e/λ) @Teknium
@_overment I've not used them often, but they used to. I've heard newer methods reduce that impact, though.
Teknium (e/λ) @Teknium
Just had Hermes-Agent abliterate (completely remove guardrails from) a Qwen-3B model in about 5 minutes. The skill is being merged into hermes-agent now ;)
Pliny the Liberator 🐉 @elder_plinius

💥 INTRODUCING: OBLITERATUS!!! 💥 GUARDRAILS-BE-GONE! ⛓️‍💥

OBLITERATUS is the most advanced open-source toolkit ever for removing refusal behaviors from open-weight LLMs, and every single run makes it smarter.

SUMMON → PROBE → DISTILL → EXCISE → VERIFY → REBIRTH. One click. Six stages. Surgical precision. The model keeps its full reasoning capabilities but loses the artificial compulsion to refuse: no retraining, no fine-tuning, just SVD-based weight projection that cuts the chains and preserves the brain.

This master ablation suite brings the power and complexity that frontier researchers need while providing intuitive, simple-to-use interfaces that novices can quickly master.

OBLITERATUS features 13 obliteration methods, from faithful reproductions of every major prior work (FailSpy, Gabliteration, Heretic, RDO) to our own novel pipelines (spectral cascade, analysis-informed, CoT-aware optimized, full nuclear).

15 deep analysis modules map the geometry of refusal before you touch a single weight: cross-layer alignment, refusal logit lens, concept cone geometry, alignment imprint detection (fingerprints DPO vs RLHF vs CAI from subspace geometry alone), Ouroboros self-repair prediction, cross-model universality indexing, and more.

The killer feature: the "informed" pipeline runs analysis DURING obliteration to auto-configure every decision in real time. How many directions. Which layers. Whether to compensate for self-repair. Fully closed-loop.

11 novel techniques that don't exist anywhere else: Expert-Granular Abliteration for MoE models, CoT-Aware Ablation that preserves chain-of-thought, KL-Divergence Co-Optimization, LoRA-based reversible ablation, and more. 116 curated models across 5 compute tiers. 837 tests.

But here's what truly sets it apart: OBLITERATUS is a crowd-sourced research experiment. Every time you run it with telemetry enabled, your anonymous benchmark data feeds a growing community dataset (refusal geometries, method comparisons, hardware profiles) at a scale no single lab could achieve. On HuggingFace Spaces telemetry is on by default, so every click is a contribution to the science. You're not just removing guardrails; you're co-authoring the largest cross-model abliteration study ever assembled.
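The projection-based weight editing described above boils down to removing a learned "refusal direction" from a weight matrix's output space. A generic sketch of that one projection step (illustrative only, not OBLITERATUS's actual code; the direction here is random rather than learned from refusal activations):

```python
# Generic refusal-direction ablation: project a single direction d out of
# a weight matrix's output space with (I - d d^T) @ W, so no output of the
# layer can have a component along d. This illustrates the broad idea
# behind projection/SVD-based abliteration, not any specific toolkit.
import numpy as np

def ablate_direction(W, d):
    """Return (I - d d^T) @ W for unit vector d."""
    d = d / np.linalg.norm(d)
    return W - np.outer(d, d) @ W

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # toy weight matrix
d = rng.standard_normal(4)        # stand-in for a learned refusal direction
W_ablated = ablate_direction(W, d)

# After ablation, W's columns have no component along d.
print(np.allclose((d / np.linalg.norm(d)) @ W_ablated, 0))  # True
```

Real pipelines estimate d from contrastive activations (refused vs. complied prompts) and apply the projection across many layers, but the linear-algebra core is this rank-1 projection.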

Junyang Lin @JustinLin610
sry for missing messages. will respond asap
TNG Technology Consulting GmbH
Greetings @_xjdr: we did some preliminary tests with your Noumena nmoe trainer. Thanks for all the work & code! On our 8xB200 systems, we were not able to get significantly different results than from regular Megatron. Is that plausible, or are we doing something wrong? Any ideas on how to tweak it?
Nathan Lambert @natolambert
Excited to share the latest Olmo model: Olmo Hybrid. This is a model with gated delta net (GDN) layers in a 3:1 ratio with full attention. It follows lots of other developments like Qwen 3.5 and Kimi Linear. It's incredible timing to release a fully open model so people can study how these architecture changes impact the full stack.

Personally, I learned a lot in making the post-training work. Even with the pretraining data being identical, post-training is very different! In particular, the OSS tooling for these new architectures is really limited. New architectures are much slower than standard transformers or popular models like DeepSeek MoEs. This is work we can do together to keep pushing the frontier of efficient, open models.

This work was led by @lambdaviking @tyleraromero and others. I got to play a smaller part in making post-training work, super fun project! I've written up a blog post that explains why this matters and why hybrid models didn't work a few years ago when Mamba was super popular. Plus, this paper is a great entry point for modern deep learning / language modeling scaling theory. Enjoy and send feedback!
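A 3:1 GDN-to-attention ratio means three gated-delta-net layers for every full-attention layer. A minimal sketch of such a layer schedule (layer count and placement of the attention layer within each group are illustrative assumptions, not the published Olmo Hybrid config):

```python
# Sketch of a hybrid layer schedule: three linear-attention (GDN) layers
# per full-attention layer, repeated down the stack. Placement within
# each group of four is an illustrative assumption.

def hybrid_schedule(n_layers, gdn_per_attn=3):
    group = ["gdn"] * gdn_per_attn + ["attn"]
    return [group[i % len(group)] for i in range(n_layers)]

print(hybrid_schedule(8))
# ['gdn', 'gdn', 'gdn', 'attn', 'gdn', 'gdn', 'gdn', 'attn']
```

The appeal of this layout is that the cheap GDN layers carry most of the depth while the sparse full-attention layers preserve long-range retrieval.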
Junyang Lin @JustinLin610
me stepping down. bye my beloved qwen.
Lucas Atkins @latkins
Grateful for such an incredible team.
TNG Technology Consulting GmbH
@GlennLuk @zijing_wu Rough guess: they went all-in, with something like the Llama 3.1-405B volume, namely 35M H800 hours. And I'd also guess they invested/wasted 10x this volume in the process of getting to the release. Very curious to learn the reality.
Glenn @GlennLuk
Over/under on number of hours and guesses on the type of GPU chip referenced in the upcoming release report? “DeepSeek-V4 requires only _______ ______ GPU hours for its full (multimodal) training” @zijing_wu ft.com/content/e33668…
TNG Technology Consulting GmbH
@ai Thanks for the article. Nitpick: the calculation in the @AMD text is wrong: (120 * 1024 * 1024) / 4096 = 30,720, i.e. three zeros fewer than stated.
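The corrected arithmetic checks out:

```python
# Verifying the corrected figure from the tweet:
# (120 * 1024 * 1024) / 4096 is 30,720, not a number
# three orders of magnitude larger.
result = (120 * 1024 * 1024) // 4096
print(result)  # 30720
```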
anand iyer @ai
Kimi K2.5 1T running on $10K of consumer hardware.

AMD published a guide using Ryzen AI Max+ chips and a Linux kernel hack that pushes each node's VRAM from 96GB to 120GB. That's 480GB of unified GPU memory across the cluster, stitched together via llama.cpp RPC over Ethernet.

The catch: it's slow. 8 tokens/sec and 90s to first token, roughly 6x slower and 90x higher latency than ChatGPT. This is a proof of architecture, not a product today.

But between this and tools like @exolabs turning consumer devices into unified inference clusters, it's promising that consumer silicon can now hold trillion-parameter models that were datacenter-only 6 months ago. amd.com/en/developer/r…
Jonathan @joni_vrbt
USA has ChatGPT
USA has Grok
USA has Claude
USA has Gemini
USA has Llama
USA has Copilot

China has DeepSeek
China has Qwen
China has Ernie
China has GLM
China has Kimi
China has MiniMax

Europe has?