freedom Koan-Sin Tan

1.7K posts

@koansin

Coder?

Jhubei City, Hsinchu County · Joined January 2009

478 Following · 555 Followers

freedom Koan-Sin Tan@koansin·3 Mar

@anemll @maderix I know why. I measured ANE compute capacities a couple of months ago. AFAICR, MIL doesn't support INT8. With MPSGraph, which uses MLIR, you can get the expected INT8 performance. github.com/freedomtan/mea…

Anemll@anemll·2 Mar

@maderix I’ll retest int8, it’s definitely giving 2x in Apple ResNet example. Possibly needs different Conv pattern. Quant-Dequant should be optimized by ANECompiler for M4 target

Anemll@anemll·2 Mar

🤯 maderix.substack.com/p/inside-the-m…

freedom Koan-Sin Tan@koansin·24 Oct

@Wunkolo In the recent iOS 26 libsystem_platform.dylib, there are __sme_mem{chr,cpy,move,set} routines but, unlike the MTE ones, they don't seem to be used.

wunk@Wunkolo·23 Oct

AMX instructions could kinda maybe be useful for really big mem{set,cpy} operations. Apple-AMX can write 256 bytes with just one instruction while Intel-AMX can write 1KiB of data with one instruction. But I don't think anyone needs a Gigabytes-per-second mem{set,cpy} lol

freedom Koan-Sin Tan retweeted

MLCommons@MLCommons·10 Jul

MLCommons® just launched MLPerf® Mobile on the Google Play Store! 📱 Benchmark your Android device’s AI performance on real-world ML tasks with this free, open-source app. Try it now: play.google.com/store/apps/det…

freedom Koan-Sin Tan retweeted

LaurieWired@lauriewired·25 Mar

Just built an MCP for Ghidra. Now basically any LLM (Claude, Gemini, local...) can Reverse Engineer malware for you. With the right prompting, it automates a *ton* of tedious tasks. One-shot markups of entire binaries with just a click. Open source, on Github now.

freedom Koan-Sin Tan@koansin·21 Feb

If you ever wondered how to run Apple's ANE .hwx file on non-jailbroken phones, check my little program github.com/freedomtan/cor…

freedom Koan-Sin Tan@koansin·3 Aug

It turns out it's possible to retrieve per-op profiling information w/o using `MLComputePlan`. Some undocumented classes and methods are required, though. The output: github.com/freedomtan/cor… Simple Objective-C code: github.com/freedomtan/cor… @fleetwood___: look what I found

freedom Koan-Sin Tan@koansin·17 Jul

@flat @ivanchanavinah I didn't check the output MIL program, not sure what happened. However, I could imagine that it's complicated. PyTorch op / hint might not be able to be carried all the way to the ANE.

Stephen Panaro@flat·16 Jul

@ivanchanavinah @koansin It looks like contiguous gets translated to a no-op during conversion in coremltools. So probably why.

Ivan Chan@ivanchanavinah·12 Jul

creativestrategies.com/research/white… Stable Diffusion power consumption between M3 and X Elite: "M3 MacBook Air, 8-core CPU 10-core GPU with 16GB RAM spec, we see averages of 87.63 Joules used per image generated. On the Snapdragon X Elite system, we used a prototype Surface Laptop 15-inch with 16GB RAM, with the X1E78100 SKU of Snapdragon X Elite. We see averages of 41.23 Joules used per image generated"

freedom Koan-Sin Tan@koansin·16 Jul

@flat @ivanchanavinah That might be different. People know memory operations are slow. The case we hit is weird because the transpose op itself didn't seem to take significant time, yet all the other ops were slowed down. Yup, maybe this has something to do with the memory layout.

Stephen Panaro@flat·15 Jul

@koansin @ivanchanavinah This post made me realize I had a similar issue in my ANE model. Gave me a ~11% speed up. Possibly related to this ml-ane-transformers principle:

Stephen Panaro@flat

Trying to speed up Llama on Apple Neural Engine. Turns out I missed something obvious. (Can you guess it from the screenshot?) 👉 We have to transpose large matrices to compute attention. This is slow. Storing either K or V cache pre-transposed saves ~11% of the time. 11% 🙃
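To make the pre-transposed cache idea concrete, here is a minimal pure-Python sketch. It is not taken from the thread: the function names and the tiny matrices are hypothetical, and it only shows that the attention scores are numerically the same whether K is transposed on every step or stored already transposed. It says nothing about the actual speedup on the Neural Engine.

```python
# Hypothetical sketch: attention scores with a K cache stored pre-transposed.
# Matrices are plain lists of rows; every name here is illustrative only.

def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two list-of-rows matrices."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def scores_with_transpose(q, k_cache):
    # k_cache has shape (seq_len, head_dim) and is transposed on every call.
    return matmul(q, transpose(k_cache))

def scores_pretransposed(q, k_cache_t):
    # k_cache_t has shape (head_dim, seq_len): no per-step transpose.
    return matmul(q, k_cache_t)

def append_key(k_cache_t, new_key):
    # A new token adds one column to the pre-transposed cache.
    for row, value in zip(k_cache_t, new_key):
        row.append(value)

q = [[1.0, 2.0]]                                  # one query, head_dim = 2
k_cache = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # three cached keys
k_cache_t = transpose(k_cache)                    # stored once, reused after
```

Both `scores_with_transpose(q, k_cache)` and `scores_pretransposed(q, k_cache_t)` give the same scores; the second form simply moves the transpose out of the per-token path.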

freedom Koan-Sin Tan@koansin·13 Jul

@ivanchanavinah s/archive/achieve/

freedom Koan-Sin Tan@koansin·13 Jul

@ivanchanavinah Nope, that’s the intriguing part. All on ANE, the only diff is the leading transpose. Originally, Colby reported models converted from PyTorch were faster. Then we found that we can archive same performance by removing leading transpose (NHWC to NCHW). github.com/mlcommons/mobi…
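For readers unfamiliar with the layouts, the following pure-Python sketch shows what an NHWC to NCHW transpose does to a flat buffer. It is an illustration only: the function name and the sample data are hypothetical, and this is not how Core ML or the ANE implements the op.

```python
# Hypothetical sketch: reorder a flat NHWC buffer into NCHW order.

def nhwc_to_nchw(data, n, h, w, c):
    """Return a new flat list holding `data` (NHWC order) in NCHW order."""
    out = [0] * len(data)
    for ni in range(n):
        for hi in range(h):
            for wi in range(w):
                for ci in range(c):
                    src = ((ni * h + hi) * w + wi) * c + ci   # NHWC offset
                    dst = ((ni * c + ci) * h + hi) * w + wi   # NCHW offset
                    out[dst] = data[src]
    return out

# One image, 1x2 pixels, 3 channels: pixel 0 is (0, 1, 2), pixel 1 is (3, 4, 5).
nhwc = [0, 1, 2, 3, 4, 5]
nchw = nhwc_to_nchw(nhwc, n=1, h=1, w=2, c=3)
```

In NCHW order each channel's values become contiguous, so `nchw` holds channel 0 for both pixels, then channel 1, then channel 2.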

freedom Koan-Sin Tan retweeted

François Chollet@fchollet·10 Jul

You can now use any Hugging Face Hub model with KerasNLP (as long as the corresponding architecture is in KerasNLP)! What's more, you can also upload your own fine-tuned KerasNLP models to Hugging Face in one line. huggingface.co/blog/keras-nlp…

freedom Koan-Sin Tan retweeted

AI at Meta@AIatMeta·27 Jun

Today we’re announcing Meta LLM Compiler, a family of models built on Meta Code Llama with additional code optimization and compiler capabilities. These models can emulate the compiler, predict optimal passes for code size, and disassemble code. They can be fine-tuned for new optimizations and compiler tasks. @HuggingFace repo ➡️ go.fb.me/tdd3dw Research paper ➡️ go.fb.me/85zwgy LLM Compiler achieves state-of-the-art results on code size optimization and disassembly. This work shows that AI is learning to optimize code and can assist compiler experts in identifying opportunities to apply optimizations. We’re releasing LLM Compiler 7B & 13B models under a permissive license for both research and commercial use in the hopes of making it easier for developers and researchers alike to leverage this in their work and carry forward new research in this space.

freedom Koan-Sin Tan@koansin·24 Jun

@lafaiel Or, comparing iOS 18 CPU and Neural Engine results, you can find that they are actually on CPU. browser.geekbench.com/ml/v0/inferenc…

freedom Koan-Sin Tan@koansin·24 Jun

@lafaiel Guess what: if you extract the Core ML models and run them with the Xcode Core ML performance profiler, you'll find that they actually run on the CPU! All the ops are on CPU. On iOS 17.5.1, some ops are on CPU/GPU and some are on the Neural Engine. On iPhone 15 Pro Max + iOS 18, all of them are on CPU!

INIYSA@lafaiel·16 Jun

After iOS 18 update, Geekbench machine learning score has greatly improved (CoreML / Neural Engine Backend)

freedom Koan-Sin Tan retweeted

Rodney Brooks@rodneyabrooks·8 May

Back on April 1st I posted my three laws of robotics. Here are my three laws of AI. 1. When an AI system performs a task, human observers immediately estimate its general competence in areas that seem related. Usually that estimate is wildly overinflated. 2. Most successful AI deployments have a human somewhere in the loop (perhaps the person they are helping) and their intelligence smooths the edges. 3. Without carefully boxing in how an AI system is deployed there is always a long tail of special cases that take decades to discover and fix. Paradoxically all those fixes are AI-complete themselves.

freedom Koan-Sin Tan@koansin·27 Apr

@EHEefting @bcantrill Pay tribute to the older Unix Bourne Shell source code?

Edwin Eefting@EHEefting·26 Apr

@bcantrill Wow the "fdisk pascal in c" myth was actually real: github.com/microsoft/MS-D…

Bryan Cantrill@bcantrill·26 Apr

You wouldn't last an hour in the asylum where they raised me github.com/microsoft/MS-D…

freedom Koan-Sin Tan@koansin·17 Apr

@Nextremer_nb_o That’s for training on TPU.

nb.o@Nextremer_nb_o·17 Apr

I don't understand how this company works at all...

freedom Koan-Sin Tan@koansin·3 Apr

@GoogleWorkspace @Google Z

Google Workspace@GoogleWorkspace·2 Apr

🚀 Boost your productivity with 7 tips found in “Uptime,” a new book from @Google’s executive productivity advisor, Laura Mae Martin. → goo.gle/3xk09I0

freedom Koan-Sin Tan@koansin·24 Mar

Daemon and Dragon

catcatcatcat@4catcatcat

Deamon & Dragon #AsiaBSDCon
