Alfonso² Peterssen

119 posts

Alfonso² Peterssen
@TheMukel

"JVM within a JVM by day, LLMs on the JVM by night." @qxoticai

Zurich, Switzerland · Joined August 2015
572 Following · 443 Followers
Alfonso² Peterssen @TheMukel ·
@__tinygrad__ The operators serve hyper-specialized implementations of each model. How good is tinygrad at fusing high-level ops? Even with some advanced compiler magic, the hand-tuned kernels with nit-picked fusions are hard to beat. It's a pristine model blueprint vs. a tuned Franken-model.
the tiny corp @__tinygrad__ ·
Branching off of this is all the fun stuff: megakernels, a fast on-the-fly GGUF unpacker, a function decorator for Python speed, KV cache swap to disk. This should stay ~500 lines, but outperform all the BS=1 LLM runners through the power of tinygrad. Kimi at 500 tok/s on MI350X?
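For context on the "GGUF on-the-fly unpacker" idea above: GGUF model files open with a small fixed header (magic bytes 'GGUF', a version, a tensor count, and a metadata key-value count, all little-endian). A minimal Java sketch of parsing just that header — class and method names are made up for illustration; only the byte layout follows the GGUF format:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: read the fixed-size GGUF header from a buffer.
// Field order per the GGUF format: uint32 magic, uint32 version,
// uint64 tensor_count, uint64 metadata_kv_count, all little-endian.
public class GgufHeader {
    // Bytes 'G','G','U','F' read as a little-endian uint32.
    static final int MAGIC = 0x46554747;

    static String describe(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt() != MAGIC) throw new IllegalStateException("not a GGUF file");
        return "version=" + buf.getInt()
             + " tensors=" + buf.getLong()
             + " kv=" + buf.getLong();
    }

    public static void main(String[] args) {
        // Build a fake 24-byte header in memory instead of reading a real file.
        ByteBuffer buf = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(MAGIC).putInt(3).putLong(291L).putLong(19L);
        buf.flip();
        System.out.println(describe(buf)); // version=3 tensors=291 kv=19
    }
}
```

A real unpacker would go on to read the metadata key-value pairs and tensor descriptors that follow the header; this only shows the entry point.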
the tiny corp @__tinygrad__ ·
We are looking to hire someone to improve our LLM runner; with USB GPU support and high BS=1 tok/s, it should see a lot of use soon. The TODO list is in the Discord, but no bounties, since those yield AI slop. The bottleneck today isn't writing code, it's filtering it. Show me you can do that.
Alfonso² Peterssen retweeted
Michalis Papadimitriou @mikepapadim ·
GPULlama3.java is out! Great effort by the @tornadovm team to bring GPU-enabled inference to the JVM
Mary Xekalaki @MXekalaki

We are proud to release the first fully JITed, open-source GPU-accelerated Llama3 inference in pure Java powered by #TornadoVM  🚀 🎯 NVIDIA GPUs using PTX and OpenCL backend 👉 github.com/beehive-lab/GP… We are looking forward to your feedback!  #opensource #Java #AI #LLM #GPUs

Alfonso² Peterssen retweeted
Fabio Niephaus @fniephaus ·
We just merged the current status of the upcoming JDWP support for @GraalVM Native Image! 🥳 This will soon provide developers with the same debugging experience they are used to in Java, but for native images! Stay tuned for more details. github.com/oracle/graal/p…
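JDWP, the debug wire protocol the tweet above is bringing to Native Image, starts with a simple transport handshake: the debugger sends the 14 ASCII bytes "JDWP-Handshake" and the target VM echoes them back before any command packets flow. A minimal sketch of that exchange — the in-process echo server stands in for the VM, and all class and method names here are illustrative:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch of the JDWP transport handshake: the debugger
// writes "JDWP-Handshake" and expects the VM to echo the same bytes.
public class JdwpHandshake {
    static final byte[] HANDSHAKE = "JDWP-Handshake".getBytes();

    public static boolean handshake(InputStream in, OutputStream out) throws IOException {
        out.write(HANDSHAKE);   // debugger side sends the 14 ASCII bytes first
        out.flush();
        byte[] reply = in.readNBytes(HANDSHAKE.length);
        // The VM must echo the exact same bytes back to accept the connection.
        return java.util.Arrays.equals(reply, HANDSHAKE);
    }
}
```

Only after this exchange succeeds do the two sides start trading length-prefixed JDWP command and reply packets.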
Alfonso² Peterssen @TheMukel ·
buff.ly/40KmT0t Graal compiler: +10% faster inference with the latest early access build. New features: batched prompt processing & AVX512 support.
Alfonso² Peterssen retweeted
Johan Hutting @JohanHutting ·
Earlier today I was asked whether Java AI integration had improved yet, or whether we still need to rely on Python or C bindings. I was happy to share github.com/mukel/llama3.j… by @TheMukel from the GraalVM team, running natively in Java without any dependencies and with superior performance!
Alfonso² Peterssen @TheMukel ·
@christzolov @vitalethomas @alina_yurenko I have a working prototype with function calling via LangChain4j. Vision is just a matter of implementing an additional component, the rest of the inference remains the same. I'll do my best to implement the missing encoder for vision soon-ish, starting with Llama, then Qwen.
Diego @diegoasua ·
@bate5a55 @julien_c you don't run a 1B+ model on CPU. Good luck with that; that's like hitching a freight train to a donkey. Will it move? Maybe. But also, don't do that