Kar Chua

1.8K posts

@kar9222

#real Running, cooking, probability 55%, Bayesian, trading, investing, common sense, logic, data analysis, statistical inference & modeling with common sense

Joined February 2018
440 Following · 208 Followers
Pinned Tweet
Kar Chua@kar9222·
Linear representation is the foundation of most, if not all, models, including the architecture of deep learning. Feel free to see it in the simple code I shared for both #julialang and #RStats. You can play around with it too. See images and link: gist.github.com/kar9222/82c767…
[4 images]
Women in Statistics and Data Science@WomenInStat

How about deep learning? Super non-linear, right? Well, as a function of some non-linear activations, it's IJALM ("it's just a linear model"). You can put lipstick on a linear model, but it’s still a linear model. Fit it w/ least squares … w/ bells & whistles like dropout, SGD, & regularization. 11/

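The point in the quoted tweet can be sketched in a few lines of plain Python (toy weights of my own choosing, not from the linked gist): composing affine layers without an activation collapses to a single affine map, and inserting a nonlinearity is exactly what breaks that.

```python
# Toy illustration of "deep learning is just a (generalized) linear model":
# two affine "layers" with no activation collapse into one affine map.

def linear(w, b):
    """Return a 1-D affine layer x -> w*x + b."""
    return lambda x: w * x + b

def relu(x):
    return max(0.0, x)

f = linear(2.0, 1.0)    # layer 1: 2x + 1
g = linear(3.0, -4.0)   # layer 2: 3y - 4

# g(f(x)) = 3*(2x + 1) - 4 = 6x - 1 -- still a single affine map.
collapsed = linear(6.0, -1.0)
assert all(abs(g(f(x)) - collapsed(x)) < 1e-9 for x in (-2.0, 0.0, 3.5))

# With a ReLU between the layers, midpoint-affinity fails, so the
# composition is no longer an affine (linear) model:
h = lambda x: g(relu(f(x)))
assert abs(h(-1.0) - (h(-2.0) + h(0.0)) / 2) > 1e-9
```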
Kar Chua retweeted
Neovim, e/plugins@Neovim·
Nvim 0.12 released. sweet, sweet release.
[image]
Kar Chua@kar9222·
@dyz_ob Thanks for sharing. May I know where the official source is? 🙏🙏
DZ@dyz_ob·
$BABA : Alibaba Cloud's AI Coding Plan Sees Unprecedented Demand, Forces Purchasing Restrictions.
[image]
lasse@lassevjl·
Termy v0.1.45 is out. Introducing layouts. Save your split tabs and use them later. Rename them to your liking. All really fast. Upgrade today. Along with bug fixes. Enjoy! Get termy > termy.run #termy #terminal #rust
Kar Chua@kar9222·
@thdxr @kianmckenn @kitlangton 👍 When you mentioned “aggressively clean up”, do you mean by hand or with AI? May I know, overall, roughly what % of the code is still written by hand?
dax@thdxr·
@kianmckenn @kitlangton has a good metaphor: it's like tending a garden. You can let AI code grow, but you have to aggressively clean up after it and be diligent about architecture and patterns. The codebase is ok; it will get better.
Kian@kianmckenn·
OpenCode's codebase is very high quality. Which is unusual for an app built with AI. This makes me curious how @thdxr uses AI to code.
Kar Chua@kar9222·
@VictorTaelin Same experience regarding (2), and I often add additional prompts like “do NOT over-engineer…” and it does help. I actually put it in AGENTS.md but she never follows it 100% of the time.
Taelin@VictorTaelin·
Ok, my final GPT-5.3 feedback:
- It is the best model for compiler work
- It writes code carefully and generates bug-free code
- It is capable of executing incredibly hard prompts
- Definitely the smartest model available IMO

Problems:

1. It is NOT capable of grasping intent. It will just take your prompt at face value, no matter how obvious it is. It is EXTREMELY frustrating to work with because of that. Sometimes it finds an interpretation of my literal words that I couldn't even anticipate. Working with GPT-5.3 is a test of patience, and a good part of the job is making sure I anticipate all possible dumb ways it could interpret my prompt, and write exact words to steer it away from that potential interpretation. And then it still finds a way.

2. It is a merciless complexity monster. When it comes to writing code, it has no shame. It is careless. It will just add, add, add, and never remove or clean up. Even worse, it will often add nearly identical functions instead of just using or adapting what exists. That goes against its own interests, because past a certain threshold it will start under-performing (like all models). Often, after I ask for a feature, I'll just write a follow-up prompt like "your code works but is way longer than needed, your goal now is to simplify it as much as possible", or variations of that.

3. It still forgets everything the day after. Not much to say about this; it's obviously a fundamental issue with LLMs that is *not* satisfactorily solved with memory or agents.

And that's it. I strongly suggest OpenAI take these 3 aspects seriously and explicitly train for them.

Regarding 1, Opus does that just fine, so I'm sure there's a way.

Regarding 2, it shouldn't be hard, but it has to be done carefully, because if you just try to minimize token count, the model will tend to *minify* the code (use short variable names, make code-golf-like uglifications). That is NOT what you want. You want to train it to reduce code size by:

A. Removing redundancies. If a functionality is already implemented, it should FIND IT and USE IT. Sometimes this will require some modifications, but that's always better than writing the same logic twice.

B. Abstracting the common pattern out. Often there will be 2 long functions, F() and G(), that can be merged into a parametrized function FG(), and then F() and G() become specialized instances of FG(). This is universally desirable, and teaching a model to do it will yield amazing results in practical productivity.

C. Using simpler logic whenever possible. Sometimes there is just a simpler way to implement an algorithm or procedure. You should teach the model to favor that.

Regarding 3, until there is a major breakthrough that solves continual learning, I think OpenAI should work on a product that allows us to at least mitigate the issue. Some people claim to have luck with nightly LoRAs. Being able to do that with codex models on my domain would be amazing.
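Point B above can be shown with a tiny before/after sketch (hypothetical functions of my own, not Taelin's code): two near-duplicate functions merged into one parametrized function, with the originals becoming trivial instances of it.

```python
# Before: two functions that duplicate the same loop, differing only in
# the exponent applied to each element.
def sum_of_squares(xs):
    total = 0
    for x in xs:
        total += x * x
    return total

def sum_of_cubes(xs):
    total = 0
    for x in xs:
        total += x * x * x
    return total

# After: the common pattern is abstracted into one parametrized "FG()",
# and the originals become specialized one-liners.
def sum_of_powers(xs, power):
    return sum(x ** power for x in xs)

assert sum_of_squares([1, 2, 3]) == sum_of_powers([1, 2, 3], 2) == 14
assert sum_of_cubes([1, 2, 3]) == sum_of_powers([1, 2, 3], 3) == 36
```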
Kar Chua@kar9222·
@ibragim_bad @aleabitoreddit Thanks. I roughly agree with this benchmark. For example, I tried Minimax M2.5 on tasks requiring deep reasoning and it still has quite a gap with codex. It requires much more hand-holding. I think its cost advantage is more suitable for things like OpenClaw. Seems “bench-maxxed”.
Ibragim@ibragim_bad·
@aleabitoreddit Thanks for the shoutout. I’m one of the SWE-rebench authors. We refresh the leaderboard every month with new tasks and newly released models, and we report metrics like pass@5, tokens, and cost per problem. DMs are open if you have feedback, and feel free to ask anything.
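For readers unfamiliar with the pass@5 metric mentioned above: pass@k is commonly computed with the unbiased estimator from the original HumanEval paper, 1 − C(n−c, k)/C(n, k), for n samples per task of which c pass. A minimal sketch (my assumption of the standard formula, not SWE-rebench's actual code):

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased estimate of the probability that at least one of k
    samples passes, given n samples drawn per task with c correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: if 2 of 10 samples solve a task, pass@5 for that task is
# 1 - C(8,5)/C(10,5) = 1 - 56/252.
p = pass_at_k(10, 2, 5)
```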
Serenity@aleabitoreddit·
A new benchmark shows Chinese AI models (Kimi, Minimax, DeepSeek) are much further behind Western frontier AI models than markets expect. LLMs like Opus, Gemini, and GPT are shown to be leading.

A new benchmark called SWE-rebench uses new GitHub tasks:
-> Minimax claimed 80.2% on the original SWE-bench.
-> On the uncontaminated SWE-rebench, it crashed to 39.6%.

The takeaway: Chinese labs have effectively solved single-prompt reasoning and discrete coding tasks at a fraction of the cost. However, the architecture and high-quality data required for long-horizon behavior remain a severe bottleneck that distillation and optimizing for benchmarks cannot fake. Chinese models are shown to be lagging behind the deep, adaptable reasoning that US hyperscalers have.
[image]
Kar Chua@kar9222·
@askOkara Thanks for sharing 🙏 For doing research, based on your experience, which do you prefer: k2.5, m2.5, deepseek v3.2, or glm5?
Okara@askOkara·
here are our favorite open-source models per use case:
> writing - kimi k2
> coding - minimax m2.5, glm 5
> ocr - deepseek ocr 2, qwen 3 vl
> general queries - deepseek v3.2, kimi k2.5
> image editing - qwen image edit, flux 2 dev
> image gen - z-image-turbo, flux 2 dev
open-source ai is moving fast - this list gets updated every week!
Kar Chua retweeted
MiniMax (official)@MiniMax_AI·
Introducing M2.5, an open-source frontier model designed for real-world productivity.
- SOTA performance at coding (SWE-Bench Verified 80.2%), search (BrowseComp 76.3%), agentic tool-calling (BFCL 76.8%) & office work.
- Optimized for efficient execution, 37% faster at complex tasks.
- At $1 per hour with 100 tps, infinite scaling of long-horizon agents is now economically possible.
MiniMax Agent: agent.minimax.io
API: platform.minimax.io
CodingPlan: platform.minimax.io/subscribe/codi…
[image]
Kar Chua@kar9222·
@SkylerMiao7 Why does MiniMax have such a high performance/cost ratio, especially in terms of 10B active params? Is there any catch in real-world use cases? I'm going to give it a real try and compare it to 5.3 codex high and xhigh. Also, what is the main architecture that allows MiniMax to achieve this? 🙏🧧
Tibo@thsottiaux·
@BO5AMIS We will ship on windows soon
Tibo@thsottiaux·
What could we do better on Codex? App, model, strategy and features… what’s wrong in how we approach things that we should improve immediately?
Kar Chua@kar9222·
@rekram11 Yes 5.2 codex was always like “say the word if you want me to do it” 🤣
Aiden Cline@rekram11·
Could be wrong but 5.3 codex so far feels a lot less reluctant to just do things. 5.2 models (in my experience) liked to end their messages with things like "If u want I can run tests", 5.3 codex will just do it (which I much prefer). Comparison of how the 2 models tell me they are done
[2 images]
Kar Chua@kar9222·
@dhh @shill_collin May I know roughly what % of code do you write by hand for existing large codebases?
DHH@dhh·
@shill_collin 100. Always. Unless it's just an experiment. But I wouldn't ship code I hadn't read and understood.
DHH@dhh·
So impressed with Kimi K2.5 for Omarchy development. I made four changes: Default enable Hibernation, Add "Remove > Preinstalls" (here's your bare-bones install!), documenting all bin/*, fix gum confirm color problem. Total tokens: 116K. Cost: $0.63. Via OpenCode / Zen. So good.
Andrew Ambrosino@ajambrosino·
Today, we’re introducing the Codex app, our flagship Codex experience. Work on multiple things in parallel, extend Codex with skills, and automate repetitive tasks. The most exciting part for us has been using the app to build itself. This is the first of many new things coming to Codex– more to come.
[image]
Lea@Lea_EFC·
I fucking love this app 😂😂😂
Peter Steinberger 🦞@steipete·
codex just doesn't gaf. This cooked 6 hours. Everything still works.
[image]
Kar Chua@kar9222·
@steipete @moncrolio Based on your use case and experience, do you think 5.2 codex xhigh is not worth it, and that 5.2 codex high has the better speed/reasoning trade-off?
Kar Chua@kar9222·
@thsottiaux We need just one of OpenCode or the codex CLI/TUI, not both. If there is a chance to combine forces and work on just one CLI/TUI, that would definitely create more synergy and a win-win.
Kar Chua@kar9222·
@adamdotdev 3. opencode black tiered pricing, including limit/quota
Kar Chua@kar9222·
@adamdotdev Things are moving so fast now! How to make full use of opencode with agents (e.g. oh my opencode), AGENTS.md, skills, etc. What do you guys think of the TUI and desktop app going forward? Will these two be developed at the same pace (both first-class citizens)?
Adam@adamdotdev·
Dax and I are going to record a podcast episode; what should we talk about?