Amund Tveit

240 posts

@atveit

Principal AI Engineer | Personal Opinions Only

Trondheim, Norway · Joined May 2007
4.6K Following · 2.2K Followers
Rohit Durvasula @0xrodu
Have often seen Jeff mention Sanjay. Didn't remember reading anything in particular from him or ever coming across an interview from him. Fell into a rabbit hole of searching for content on him and 15 minutes later I still can't find anything. This guy is invisible!
Jeff Dean@JeffDean

Performance Hints

Over the years, my colleague Sanjay Ghemawat and I have done a fair bit of diving into performance tuning of various pieces of code. We wrote an internal Performance Hints document a couple of years ago as a way of identifying some general principles, and we've recently published a version of it externally. We'd love any feedback you might have! Read the full doc at: abseil.io/fast/hints.html

Amund Tveit @atveit
4 bits is all you need (and 3.6 bits is what you have?) for resource-efficient LLMs?

A few months ago OpenAI published their open-weights models GPT-OSS (20B and 120B), and one of the eye-catching characteristics was that they were heavily quantized - in other words "shrunk" or compressed - to 4 bits per parameter (MXFP4) instead of the commonly used 16 bits (BF16). 4 bits ("half a byte") means that the model uses much less memory and energy, and can also run efficiently on hardware natively supporting 4-bit MXFP inference (e.g. Nvidia Blackwell B200/B300 and AMD MI355X). Compared to 16 bits there is some quality loss at 4 bits, but it is typically marginal.

But can we go even lower and represent the model in fewer bits per parameter, e.g. 3 bits, 2 bits or even 1 bit? There is highly interesting and seemingly promising research on more efficient quantization (few-bit representation) of deep neural networks. However, 4 bits may already be very close to the lower plateau, based on:

a) findings in an article with large empirical studies by Meta, Google DeepMind, Cornell University and Nvidia that language models have a capacity of 3.6 bits per parameter - Language Models Don't Store Facts: They Compress Patterns - so perhaps 3.6 bits is a lower bound per parameter for efficient representation of deep neural networks?

b) and, to a lesser degree, personal testing with further quantization of GPT-OSS 20B: when going to 2 and 3 bits, even with (LoRA-based) finetuning it is hard to lift quality back to close to the 4-bit level.

Amund
atsentia.com
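The core idea of 4 bits per parameter can be sketched as groupwise symmetric integer quantization: each group of weights shares one scale, and individual values are rounded to one of 16 integer levels. This is a deliberate simplification for illustration, not the actual MXFP4 format (which uses a shared block exponent with FP4 elements); all names and the group size below are illustrative choices.

```python
import numpy as np

def quantize_4bit(weights, group_size=32):
    """Groupwise symmetric 4-bit quantization (illustrative sketch).

    Each group of `group_size` weights shares one scale; values are
    mapped to integers in [-8, 7], i.e. 16 levels = 4 bits/parameter.
    """
    w = weights.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale, shape):
    """Reconstruct approximate float weights from 4-bit codes."""
    return (q.astype(np.float32) * scale).reshape(shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 64)).astype(np.float32)
q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale, w.shape)
err = np.abs(w - w_hat).max()  # bounded by half a quantization step
```

The rounding error per weight is at most half a quantization step (scale / 2), which is why the quality loss at 4 bits tends to be small; at 2-3 bits the steps get coarse enough that, as the post notes, even finetuning struggles to recover the gap.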
Amund Tveit @atveit
AI on Apple Watch
Simon Willison @simonw
What's the latest research on how much baked-in knowledge an LLM needs in order to be useful? If I want a specialist coding model can I trim the size of that model down by stripping out detailed knowledge of human history, geography etc? Can we even do that?
Amund Tveit @atveit
@xamat Hope to get complementary byte-based grokking results scrutinized and published soonish :)
Amund Tveit @atveit
@jobergum Congratulations! Curious to learn more :) perhaps 2025 is the year of entrepreneurship..? 👀
Amund Tveit @atveit
@jobergum yes, quite the jump with reasoning models from o1 onwards to GPT-5. However, I believe many are a bit behind in learning prompting/context-engineering skills.
Amund Tveit @atveit
Exciting times in AI (Agent) devtools and vibe coding — and for the historically inclined, you might enjoy a 2008 throwback, my blog post "Increase the automation of Test-Driven Development?": amundblog.blogspot.com/2008/01/increa…