freedom Koan-Sin Tan

1.7K posts

@koansin

Coder?

Jhubei City, Hsinchu County · Joined January 2009
478 Following · 555 Followers
Anemll
Anemll@anemll·
@maderix I’ll retest int8; it’s definitely giving 2x in the Apple ResNet example. It possibly needs a different Conv pattern. Quant-Dequant should be optimized by ANECompiler for the M4 target
2
0
12
2.1K
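The quant-dequant pattern mentioned above can be sketched in plain Python. This is a toy scalar illustration of affine int8 quantization, not ANECompiler's actual transformation; the function names, scale, and zero-point values are all hypothetical:

```python
def quantize_int8(x, scale, zero_point):
    """Affine-quantize a float to int8, clamping to the [-128, 127] range."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize_int8(q, scale, zero_point):
    """Map an int8 value back to its approximate float value."""
    return (q - zero_point) * scale

scale, zp = 0.02, 0
x = 1.0
q = quantize_int8(x, scale, zp)        # 50
x_hat = dequantize_int8(q, scale, zp)  # ~1.0
```

A compiler can fold an adjacent dequantize-then-quantize pair (same scale and zero point) into a no-op, which is the kind of optimization being hoped for here.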
freedom Koan-Sin Tan
freedom Koan-Sin Tan@koansin·
@Wunkolo In the recent iOS 26 libsystem_platform.dylib, there are __sme_mem{chr,cpy,move,set}, but unlike the MTE ones, they don't seem to be used.
1
0
2
60
wunk
wunk@Wunkolo·
AMX instructions could kinda maybe be useful for really big mem{set,cpy} operations. Apple-AMX can write 256 bytes with just one instruction while Intel-AMX can write 1KiB of data with one instruction. But I don't think anyone needs a Gigabytes-per-second mem{set,cpy} lol
5
1
31
4.4K
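For a rough sense of scale on the store widths quoted above, here is a back-of-the-envelope count of store instructions needed to cover a buffer (my own toy arithmetic; real mem{set,cpy} throughput is bounded by cache and memory bandwidth, not instruction count):

```python
def stores_needed(total_bytes, bytes_per_store):
    """Ceiling division: how many fixed-width stores cover the buffer."""
    return -(-total_bytes // bytes_per_store)

one_mib = 1 << 20
apple_amx_stores = stores_needed(one_mib, 256)   # 256 B/store -> 4096 stores
intel_amx_stores = stores_needed(one_mib, 1024)  # 1 KiB/store -> 1024 stores
```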
freedom Koan-Sin Tan retweeted
MLCommons
MLCommons@MLCommons·
MLCommons® just launched MLPerf® Mobile on the Google Play Store! 📱 Benchmark your Android device’s AI performance on real-world ML tasks with this free, open-source app. Try it now: play.google.com/store/apps/det…
MLCommons tweet media
2
6
8
2.5K
freedom Koan-Sin Tan retweeted
LaurieWired
LaurieWired@lauriewired·
Just built an MCP for Ghidra. Now basically any LLM (Claude, Gemini, local...) can Reverse Engineer malware for you. With the right prompting, it automates a *ton* of tedious tasks. One-shot markups of entire binaries with just a click. Open source, on Github now.
81
789
4.4K
283.8K
freedom Koan-Sin Tan
freedom Koan-Sin Tan@koansin·
@flat @ivanchanavinah I didn't check the output MIL program, so I'm not sure what happened. However, I can imagine that it's complicated. A PyTorch op/hint might not be carried all the way down to the ANE.
0
0
0
160
Ivan Chan
Ivan Chan@ivanchanavinah·
creativestrategies.com/research/white… Stable Diffusion power consumption between M3 and X Elite: "M3 MacBook Air, 8-core CPU 10-core GPU with 16GB RAM spec, we see averages of 87.63 Joules used per image generated. On the Snapdragon X Elite system, we used a prototype Surface Laptop 15-inch with 16GB RAM, with the X1E78100 SKU of Snapdragon X Elite. We see averages of 41.23 Joules used per image generated"
4
2
7
1.2K
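The quoted figures work out as follows (simple arithmetic on the numbers in the tweet; the ratio and images-per-kilojoule framing are my own, not from the cited whitepaper):

```python
m3_joules_per_image = 87.63       # M3 MacBook Air, per the quoted figures
x_elite_joules_per_image = 41.23  # Snapdragon X Elite prototype

# M3 uses roughly 2.13x the energy per generated image
ratio = m3_joules_per_image / x_elite_joules_per_image

# Equivalently, images generated per kilojoule of energy
m3_images_per_kj = 1000 / m3_joules_per_image        # ~11.4
x_elite_images_per_kj = 1000 / x_elite_joules_per_image  # ~24.3
```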
freedom Koan-Sin Tan
freedom Koan-Sin Tan@koansin·
@flat @ivanchanavinah @flat That might be different. People know memory operations are slow. The case we hit is weird because the transpose op itself didn't seem to take significant time, yet all the other ops were slowed down. Yup, maybe this has something to do with the memory layout.
2
0
1
80
freedom Koan-Sin Tan
freedom Koan-Sin Tan@koansin·
@ivanchanavinah Nope, that’s the intriguing part. All on ANE; the only diff is the leading transpose. Originally, Colby reported that models converted from PyTorch were faster. Then we found that we can achieve the same performance by removing the leading transpose (NHWC to NCHW). github.com/mlcommons/mobi…
2
0
1
84
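The NHWC-to-NCHW transpose discussed here is just a fixed axis permutation. A minimal pure-Python sketch of the shape change (illustrative only, with a made-up 224x224 RGB example; this is not the Core ML conversion code):

```python
def permute_shape(shape, perm):
    """Apply an axis permutation to a tensor shape tuple."""
    return tuple(shape[axis] for axis in perm)

NHWC_TO_NCHW = (0, 3, 1, 2)  # batch, channels, height, width
NCHW_TO_NHWC = (0, 2, 3, 1)  # batch, height, width, channels

nhwc = (1, 224, 224, 3)                       # typical TF/TFLite layout
nchw = permute_shape(nhwc, NHWC_TO_NCHW)      # (1, 3, 224, 224), PyTorch layout
back = permute_shape(nchw, NCHW_TO_NHWC)      # round-trips to (1, 224, 224, 3)
```

Even when such a transpose is cheap on its own, it changes the memory layout every downstream op sees, which is one plausible reason removing it affected the whole model's performance.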
freedom Koan-Sin Tan retweeted
François Chollet
François Chollet@fchollet·
You can now use any Hugging Face Hub model with KerasNLP (as long as the corresponding architecture is in KerasNLP)! What's more, you can also upload your own fine-tuned KerasNLP models to Hugging Face in one line. huggingface.co/blog/keras-nlp…
7
35
175
30.2K
freedom Koan-Sin Tan retweeted
AI at Meta
AI at Meta@AIatMeta·
Today we’re announcing Meta LLM Compiler, a family of models built on Meta Code Llama with additional code optimization and compiler capabilities. These models can emulate the compiler, predict optimal passes for code size, and disassemble code. They can be fine-tuned for new optimizations and compiler tasks.

@HuggingFace repo ➡️ go.fb.me/tdd3dw
Research paper ➡️ go.fb.me/85zwgy

LLM Compiler achieves state-of-the-art results on code size optimization and disassembly. This work shows that AI is learning to optimize code and can assist compiler experts in identifying opportunities to apply optimizations.

We’re releasing LLM Compiler 7B & 13B models under a permissive license for both research and commercial use, in the hopes of making it easier for developers and researchers alike to leverage this in their work and carry forward new research in this space.
AI at Meta tweet media
142
760
4K
511.3K
freedom Koan-Sin Tan
freedom Koan-Sin Tan@koansin·
@lafaiel Guess what: if you extract the Core ML models and run them with Xcode's Core ML performance profiler, you'll find that they are actually accelerated by the CPU! All the ops are on CPU. On iOS 17.5.1, some ops are on CPU/GPU and some are on the Neural Engine. On iPhone 15 Pro Max + iOS 18, they are all on CPU!
1
0
1
46
INIYSA
INIYSA@lafaiel·
After iOS 18 update, Geekbench machine learning score has greatly improved (CoreML / Neural Engine Backend)
INIYSA tweet media
24
69
1.5K
267.5K
freedom Koan-Sin Tan retweeted
Rodney Brooks
Rodney Brooks@rodneyabrooks·
Back on April 1st I posted my three laws of robotics. Here are my three laws of AI.

1. When an AI system performs a task, human observers immediately estimate its general competence in areas that seem related. Usually that estimate is wildly overinflated.
2. Most successful AI deployments have a human somewhere in the loop (perhaps the person they are helping) and their intelligence smooths the edges.
3. Without carefully boxing in how an AI system is deployed, there is always a long tail of special cases that take decades to discover and fix. Paradoxically all those fixes are AI-complete themselves.
9
47
162
26.6K
Edwin Eefting
Edwin Eefting@EHEefting·
@bcantrill Wow, the "fdisk pascal in c" myth was actually real: github.com/microsoft/MS-D… #L221
2
3
6
3K
nb.o
nb.o@Nextremer_nb_o·
I have absolutely no idea how companies work...
1
0
0
111
Google Workspace
Google Workspace@GoogleWorkspace·
🚀 Boost your productivity with 7 tips found in “Uptime,” a new book from @Google’s executive productivity advisor, Laura Mae Martin. → goo.gle/3xk09I0
Google Workspace tweet media
3
6
58
9.2K