Finchememo.py

3.7K posts

@Finchedemo

🚀 Bay Area Coder | AI Explorer 🎲 Buzz about Product/Tech/Startup ⚡ Abhorrence of Western Propaganda | Troll Hater (resolutely blocks all "traitors" and passive-aggressive commenters) 💡 RT & Like ≠ Endorse | Views my own

Los Angeles, CA · Joined August 2019

1.1K Following · 86 Followers

Finchememo.py @Finchedemo · Feb 25

@SCMPNews The irony is that it was actually the Singaporean Prime Minister who expressed erroneous views on the Sino-Japanese dispute and Japan's militaristic history.

South China Morning Post @SCMPNews · Feb 25

Singapore prime minister attacked by hundreds of Chinese-language fake AI videos scmp.com/news/asia/sout…

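Finchememo.py @Finchedemo · Feb 24

@CuiMao @AnthropicAI Under the banner of "aligning with human values," it is the most hypocritical and most pretentious company in Silicon Valley, and the most skilled at smuggling ideological agendas in under a moral cloak.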

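CuiMao @CuiMao · Feb 24

Giveaway: in the comments, criticize @AnthropicAI harshly in Chinese so it sees the anger of the masses; say whatever you like. 10 Kimi Code Moderato Coding plans will be raffled off.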

Finchememo.py @Finchedemo · Feb 10

@Artedeingenio I'm fascinated by how you use Suno to create such magnificent music.

OscarAI @Artedeingenio · Feb 10

I’ve created this hand-drawn retrofuturistic sci-fi short in the style of European graphic novel illustration, focused on atmosphere, stillness, and lived-in technology. The workflow is exactly the same one I used for the other short with a retrofuturistic, industrial, post-human aesthetic that I shared recently: concept art created in Midjourney, animation in Grok Imagine (finding the right prompt to achieve that extremely subtle, restrained animation effect is crucial), and Suno for both the voice-over and the soundtrack. I think the result is quite hypnotic.

Finchememo.py @Finchedemo · Feb 10

marked

Kimi.ai @Kimi_Moonshot

Kimi Agent Swarm blog is here 🐝 kimi.com/blog/agent-swa…

Kimi can spawn a team of specialists to:
- Scale output: multi-file generation (Word, Excel, PDFs, slides)
- Scale research: parallel analysis of news from 2000–2025
- Scale creativity: a book in 20 writing styles in parallel

Context windows fill up and reasoning degrades. Kimi Agent Swarm breaks this structural limit: 100 sub-agents, 1500 tool calls, and 4.5× faster than sequential execution.
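The speed-up claim rests on fanning independent subtasks out to sub-agents instead of running them one after another. A minimal sketch of that fan-out pattern with `asyncio`, assuming each sub-agent is I/O-bound (waiting on model and tool calls); `run_subagent`, `run_swarm`, and `run_sequential` are hypothetical names, and nothing here reflects Kimi's actual implementation:

```python
import asyncio


# Hypothetical stand-in for one specialist sub-agent; a real agent would
# call a model and tools here instead of sleeping.
async def run_subagent(task: str, delay: float = 0.01) -> str:
    await asyncio.sleep(delay)  # simulate I/O-bound work (model/tool calls)
    return f"done: {task}"


async def run_swarm(tasks: list[str], max_concurrency: int = 100) -> list[str]:
    # Cap concurrency so at most `max_concurrency` sub-agents run at once.
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(task: str) -> str:
        async with sem:
            return await run_subagent(task)

    # gather runs the coroutines concurrently and keeps results in input order.
    return await asyncio.gather(*(guarded(t) for t in tasks))


async def run_sequential(tasks: list[str]) -> list[str]:
    # Baseline: each sub-agent waits for the previous one to finish.
    return [await run_subagent(t) for t in tasks]


if __name__ == "__main__":
    tasks = [f"style-{i}" for i in range(20)]
    print(asyncio.run(run_swarm(tasks))[:2])
```

With 20 tasks of about 10 ms each, the sequential baseline takes roughly 20 task latencies while the swarm takes roughly one; real gains depend on rate limits and on how independent the subtasks actually are.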

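Finchememo.py @Finchedemo · Feb 9

Well... I'm taking part in the Kuaishou Wanqing × Atlas Cloud "guess the model, win 100 billion tokens" event. My pick is that DeepSeek releases V4 before the Spring Festival. Fingers crossed! 😂 streamlake.com/marketing/cny-…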

Finchememo.py @Finchedemo · Feb 6

Kling 3.0 Multi Shot and the 180-degree rule

Matt Workman @mattworkman

I noticed something about most of the new Kling 3.0 demos: Kling 3.0 Multi Shot doesn't cross the line! One of the major (and expensive) pain points of making AI narrative films is that the coverage is all over the place, and random shot grids are easily half unusable by traditional filmmaking coverage rules. This is my first dialogue scene, with shot 01: wide shot; shot 02: close-up of the woman; shot 03: close-up of the man. AND KLING DID NOT CROSS THE LINE. The framing could still be tweaked to fit the standard conventions of traditional cinematography, which is where tools like @martini_film could possibly help out. But I'm very impressed; next I'm generating this video with Korean dialogue. @Kling_ai
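The "line" is the axis of action of the 180-degree rule: draw a line through the two characters and keep every camera on one side of it, so screen direction stays consistent across cuts. A minimal sketch of that check on an overhead floor plan, assuming you know 2D positions for the characters and cameras (Kling does not expose these; `side_of_line` and `respects_180_rule` are hypothetical helpers for illustration):

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]  # (x, y) on an overhead floor plan


def side_of_line(a: Point, b: Point, p: Point) -> float:
    """Signed 2D cross product of (b - a) and (p - a).

    > 0: p is left of the directed line a -> b; < 0: right; == 0: on the line.
    """
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def respects_180_rule(a: Point, b: Point, cameras: Iterable[Point]) -> bool:
    """True if every camera is strictly on the same side of the axis of
    action, i.e. the line through characters a and b."""
    sides = [side_of_line(a, b, cam) for cam in cameras]
    if any(s == 0 for s in sides):
        return False  # a camera sitting on the axis itself is ambiguous
    return all(s > 0 for s in sides) or all(s < 0 for s in sides)


if __name__ == "__main__":
    woman, man = (0.0, 0.0), (4.0, 0.0)
    shots = {"shot01 wide": (2.0, -5.0),
             "shot02 close-up woman": (3.0, -1.0),
             "shot03 close-up man": (1.0, -1.0)}
    print(respects_180_rule(woman, man, shots.values()))  # True
    print(respects_180_rule(woman, man, [*shots.values(), (2.0, 3.0)]))  # False
```

A wide shot plus two close-ups taken from the same side pass; adding a reverse angle from the far side of the axis is exactly the "crossing the line" case and fails.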

Finchememo.py @Finchedemo · Feb 5

@AngelicaOung @migicinthe33010 check

Angelica 🌐⚛️🇹🇼🇨🇳🇺🇸 @AngelicaOung · Feb 5

@migicinthe33010 🙏

Angelica 🌐⚛️🇹🇼🇨🇳🇺🇸 @AngelicaOung · Feb 5

Calling all my Chinese readers: as we all know, the WaPo did a huge layoff, including getting rid of their China-based team. Some might cynically say given the slantedness of their reporting, nothing of value was lost. But this does accentuate a real problem: people who want to find out more about China don’t know where to go. The information is there on the Chinese Internet. But what are the best outlets? Let’s say I’m an intrepid person willing to read the original through a ChatGPT translation. Where do I even start? 👇suggest trusted outlets, interesting individuals to follow, how to search for good articles on Weibo. Provide as much context as possible!

Finchememo.py reposted

Hidden Monopolies @HiddenMonopoly · Feb 2

One of the best overviews of moats: The Taxonomy of Moats by @ganeumann

Finchememo.py @Finchedemo · Feb 2

curate your information sources

Andrej Karpathy @karpathy

Finding myself going back to RSS/Atom feeds a lot more recently. There's a lot more high-quality longform and a lot less slop intended to provoke. Any product that happens to look a bit different today but has fundamentally the same incentive structures will eventually converge to the same black hole at the center of the gravity well.

We should bring back RSS - it's open, pervasive, hackable. Download a client, e.g. NetNewsWire (or vibe code one). Cold start: as an example of getting off the ground, here is a list of 92 RSS feeds of blogs that were most popular on HN in 2025: gist.github.com/emschwartz/e6d… Works great and you will lose a lot fewer brain cells. I don't know, something has to change.
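A feed reader needs surprisingly little machinery. A minimal sketch using only Python's standard library, assuming the feed XML has already been downloaded as a string; `parse_feed` is a hypothetical helper, not part of NetNewsWire or any client. It reads only RSS 2.0 `<item>` and Atom `<entry>` elements, and for Atom it simply takes the first `<link>`:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def parse_feed(xml_text: str) -> list[dict[str, str]]:
    """Return [{'title': ..., 'link': ...}] for an RSS 2.0 or Atom document."""
    root = ET.fromstring(xml_text)
    items: list[dict[str, str]] = []
    if root.tag == "rss":
        # RSS 2.0: <rss><channel><item><title/><link/></item>...</channel></rss>
        for item in root.iter("item"):
            items.append({
                "title": (item.findtext("title") or "").strip(),
                "link": (item.findtext("link") or "").strip(),
            })
    elif root.tag == f"{ATOM_NS}feed":
        # Atom: <feed xmlns="http://www.w3.org/2005/Atom"><entry>...</entry></feed>
        for entry in root.iter(f"{ATOM_NS}entry"):
            link_el = entry.find(f"{ATOM_NS}link")  # first link only
            items.append({
                "title": (entry.findtext(f"{ATOM_NS}title") or "").strip(),
                "link": link_el.get("href", "") if link_el is not None else "",
            })
    return items


if __name__ == "__main__":
    sample = ("<rss version='2.0'><channel><title>Blog</title>"
              "<item><title>Post A</title><link>https://example.com/a</link></item>"
              "</channel></rss>")
    print(parse_feed(sample))
```

To follow many feeds (such as the 92-feed list in the post), you would fetch each URL, for example with `urllib.request.urlopen`, and pass the decoded body to `parse_feed`; for untrusted input, the Python docs recommend a hardened parser such as `defusedxml`.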

Finchememo.py @Finchedemo · Feb 2

good to read

Sourish Jasti @SourishJasti

1/ General-purpose robotics is the rare technological frontier where the US / China started at roughly the same time and there's no clear winner yet. To better understand the landscape, @zoeytang_1007, @intelchentwo, @vishnuman0 and I spent the last ~8 weeks creating a deep dive on humanoid robotics hardware and flew to China to see the supply chain firsthand. Here's everything we've created + our takeaways about the components, humanoid comparisons, supply chains, and geopolitics👇

Finchememo.py @Finchedemo · Feb 1

@jukan05 Get a VPN first, and then you can use ChatGPT or another AI translator. If you run into any problems, ask a police officer nearby.

Jukan @jukan05 · Feb 1

I'm considering traveling to Shenzhen... but the prices are insane? For the price of one night in a 5-star hotel in Korea, you can stay three nights in Shenzhen (based on 5-star hotel standards). I've heard English is barely spoken in Shenzhen... Should I use a Chinese translation app?

Finchememo.py @Finchedemo · Feb 1

Additional: x.com/TheAhmadOsman/…

Ahmad @TheAhmadOsman

There are maybe ~20-25 papers that matter. Implement those and you've captured ~90% of the alpha behind modern LLMs. Everything else is garnish. You want that list? Keep reading ;)

The Top 26 Essential Papers (+5 Bonus Resources) for Mastering LLMs and Transformers

This list bridges the Transformer foundations with the reasoning, MoE, and agentic shift.

Recommended Reading Order

1. Attention Is All You Need (Vaswani et al., 2017)
> The original Transformer paper. Covers self-attention, multi-head attention, and the encoder-decoder structure (even though most modern LLMs are decoder-only).
2. The Illustrated Transformer (Jay Alammar, 2018)
> Great intuition builder for understanding attention and tensor flow before diving into implementations.
3. BERT: Pre-training of Deep Bidirectional Transformers (Devlin et al., 2018)
> Encoder-side fundamentals, masked language modeling, and representation learning that still shape modern architectures.
4. Language Models are Few-Shot Learners (GPT-3) (Brown et al., 2020)
> Established in-context learning as a real capability and shifted how prompting is understood.
5. Scaling Laws for Neural Language Models (Kaplan et al., 2020)
> First clean empirical scaling framework for parameters, data, and compute.
> Read alongside Chinchilla to understand why most models were undertrained.
6. Training Compute-Optimal Large Language Models (Chinchilla) (Hoffmann et al., 2022)
> Demonstrated that for a fixed compute budget, parameters and training tokens should be scaled in roughly equal proportion, so most earlier large models were trained on too few tokens.
7. LLaMA: Open and Efficient Foundation Language Models (Touvron et al., 2023)
> The paper that triggered the open-weight era.
> Made RMSNorm, SwiGLU, and RoPE standard architectural defaults.
8. RoFormer: Rotary Position Embedding (Su et al., 2021)
> Positional encoding that became the modern default for long-context LLMs.
9. FlashAttention (Dao et al., 2022)
> Memory-efficient attention that enabled long context windows and high-throughput inference by optimizing GPU memory access.
10. Retrieval-Augmented Generation (RAG) (Lewis et al., 2020)
> Combines parametric models with external knowledge sources.
> Foundational for grounded and enterprise systems.
11. Training Language Models to Follow Instructions with Human Feedback (InstructGPT) (Ouyang et al., 2022)
> The modern post-training and alignment blueprint that instruction-tuned models follow.
12. Direct Preference Optimization (DPO) (Rafailov et al., 2023)
> A simpler and more stable alternative to PPO-based RLHF.
> Preference alignment via the loss function.
13. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022)
> Demonstrated that reasoning can be elicited through prompting alone and laid the groundwork for later reasoning-focused training.
14. ReAct: Reasoning and Acting (Yao et al., 2022 / ICLR 2023)
> The foundation of agentic systems.
> Combines reasoning traces with tool use and environment interaction.
15. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (Guo et al., 2025)
> The R1 paper. Showed that large-scale reinforcement learning without supervised data can induce self-verification and structured reasoning behavior.
16. Qwen3 Technical Report (Yang et al., 2025)
> A lightweight overview of a modern architecture.
> Introduced a unified framework with Thinking Mode and Non-Thinking Mode (across dense and MoE models) to dynamically trade off cost and reasoning depth.
17. Outrageously Large Neural Networks: Sparsely-Gated Mixture of Experts (Shazeer et al., 2017)
> The modern MoE ignition point.
> Conditional computation at scale.
18. Switch Transformers (Fedus et al., 2021)
> Simplified MoE routing using single-expert activation.
> Key to stabilizing trillion-parameter training.
19. Mixtral of Experts (Mistral AI, 2024)
> Open-weight MoE that showed sparse models can match dense quality while running at small-model inference cost.
20. Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints (Komatsuzaki et al., 2022 / ICLR 2023)
> Practical technique for converting dense checkpoints into MoE models.
> Critical for compute reuse and iterative scaling.
21. The Platonic Representation Hypothesis (Huh et al., 2024)
> Evidence that scaled models converge toward shared internal representations across modalities.
22. Textbooks Are All You Need (Gunasekar et al., 2023)
> Demonstrated that high-quality synthetic data allows small models to outperform much larger ones.
23. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (Templeton et al., 2024)
> The biggest leap in mechanistic interpretability.
> Decomposes neural networks into millions of interpretable features.
24. PaLM: Scaling Language Modeling with Pathways (Chowdhery et al., 2022)
> A masterclass in large-scale training orchestration across thousands of accelerators.
25. GLaM: Generalist Language Model (Du et al., 2022)
> Validated MoE scaling economics with massive total parameters but small active parameter counts.
26. The Smol Training Playbook (Hugging Face, 2025)
> Practical end-to-end handbook for efficiently training language models.

Bonus Material
> T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2019)
> Toolformer (Schick et al., 2023)
> GShard (Lepikhin et al., 2020)
> Adaptive Mixtures of Local Experts (Jacobs et al., 1991)
> Hierarchical Mixtures of Experts (Jordan and Jacobs, 1994)

If you deeply understand these fundamentals (Transformer core, scaling laws, FlashAttention, instruction tuning, R1-style reasoning, and MoE upcycling), you already understand LLMs better than most. Time to lock in, good luck!
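Item 12 describes DPO as "preference alignment via the loss function." A minimal sketch of that per-pair loss, −log σ(β[(log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x))]), in plain Python, assuming the summed response log-probabilities under the trained policy and the frozen reference model are already computed; `dpo_loss`, `softplus`, and the default `beta=0.1` are illustrative choices, not any particular library's API:

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single (chosen, rejected) preference pair.

    Each argument is the summed log-probability of a full response, under
    either the policy being trained or the frozen reference model.
    """
    # Log-ratio of policy to reference for each response
    # (beta times this is DPO's implicit reward).
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin), without overflow.
    return softplus(-margin)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so the loss is log 2.
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
```

The loss falls below log 2 once the policy raises the chosen response's likelihood relative to the reference more than it raises the rejected one's, which is the whole training signal: no reward model and no PPO rollout loop.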

Finchememo.py @Finchedemo · Jan 30

paper curation to be read

Ahmad @TheAhmadOsman

The Top 26 Essential Papers (+5 Bonus Resources) for Mastering LLMs and Transformers (the same list quoted in the post above)

Finchememo.py reposted

Kimi.ai @Kimi_Moonshot · Feb 1

Here’s how to connect Kimi K2.5 to OpenClaw🦞: x.com/kimiproduct/st…

OpenClaw🦞 @openclaw

🦞 OpenClaw 2026.1.30 🐚 Shell completion 🆓 Kimi K2.5 + Kimi Coding: run your claw for free 🔐 MiniMax OAuth: one more model just a login away 📱 Telegram got a glow-up — 6 fixes from threading to HTML rendering Plus a bunch of community-contributed fixes across LINE, BlueBubbles, routing, security & OAuth. The lobster provides 😏 github.com/openclaw/openc…

Finchememo.py @Finchedemo · Feb 1

Mark

Boris Cherny @bcherny

I'm Boris and I created Claude Code. I wanted to quickly share a few tips for using Claude Code, sourced directly from the Claude Code team. The way the team uses Claude is different from how I use it. Remember: there is no one right way to use Claude Code; everyone's setup is different. You should experiment to see what works for you!

Finchememo.py @Finchedemo · Jan 30

@Legendaryy Anti-China hysteria sufferers. This subconscious mindset is the psychological virus in Western discourse.

Legendary @Legendaryy · Jan 29

Kimi 2.5 is the best model for clawdbot/moltbot right now: almost Opus 4.5 quality but 95% cheaper. But there is a catch: the Moonshot API sends your data to China. Here's how to fix it: serve it through OpenRouter instead, turn on ZDR (zero data retention), and pick Fireworks as the provider, since they are located in the US and have a no-logs policy.

Finchememo.py @Finchedemo · Jan 30

nice use case

Zara Zhang @zarazhangrui

Here's an animation playground I got Claude to make for me in HTML, which lets me explore all sorts of interactions on the web. I was like, "Claude, show me everything you've got."

Finchememo.py reposted

Qwen @Alibaba_Qwen · Jan 29

Qwen3-ASR and Qwen3-ForcedAligner are now open source: production-ready speech models designed for messy, real-world audio, with competitive performance and strong robustness.
● 52 languages & dialects with auto language ID (30 languages + 22 dialects/accents)
● Robust in noisy and complex settings (yes, singing and songs too)
● Long audio support: up to 20 minutes per pass
● Word/phrase-level timestamps: high-precision alignment for 11 languages via Qwen3-ForcedAligner, stronger than MFA/CTC/CIF-style aligners
Also included: a full open-source inference & finetuning stack with vLLM batch, streaming, and async serving.
GitHub: github.com/QwenLM/Qwen3-A…
Hugging Face: huggingface.co/collections/Qw…
ModelScope: modelscope.cn/collections/Qw…
Hugging Face Demo: huggingface.co/spaces/Qwen/Qw…
ModelScope Demo: modelscope.cn/studios/Qwen/Q…
Blog: qwen.ai/blog?id=qwen3a…
Paper: github.com/QwenLM/Qwen3-A…
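The "up to 20 minutes per pass" limit means longer recordings have to be split before transcription. A minimal sketch of planning those windows, assuming you then transcribe each (start, end) slice separately; `plan_chunks` and its optional overlap are hypothetical, not part of the Qwen3-ASR inference stack, and merging the overlapping transcripts is left out:

```python
def plan_chunks(total_seconds: float, max_chunk_seconds: float = 20 * 60,
                overlap_seconds: float = 0.0) -> list[tuple[float, float]]:
    """Split [0, total_seconds) into windows no longer than max_chunk_seconds.

    Consecutive windows overlap by overlap_seconds so a word at a boundary is
    not cut in half; the caller must de-duplicate the overlapping transcript.
    """
    if max_chunk_seconds <= overlap_seconds:
        raise ValueError("chunk length must exceed the overlap")
    step = max_chunk_seconds - overlap_seconds
    chunks: list[tuple[float, float]] = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_chunk_seconds, total_seconds)
        chunks.append((start, end))
        if end >= total_seconds:
            break  # last window reaches the end of the recording
        start += step
    return chunks


if __name__ == "__main__":
    # A 50-minute recording becomes three windows of at most 20 minutes.
    print(plan_chunks(50 * 60))
```

A small overlap (tens of seconds) trades a little duplicated compute for safer boundaries; with `overlap_seconds=0` the windows tile the recording exactly.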

Finchememo.py @Finchedemo · Jan 30

amazing

Finchememo.py @Finchedemo · Jan 29

@jukan05 “晚点LatePost” is one of the most well-known media outlets in mainland China specializing in technology and the internet industry.

Jukan @jukan05 · Jan 29

x.com/i/article/2016…
