Kar Chua

1.8K posts

@kar9222

#real Running, cooking, probability 55%, Bayesian, trading, investing, common sense, logic, data analysis, statistical inference & modeling with common sense

Joined February 2018
440 Following · 208 Followers
Pinned Tweet
Kar Chua@kar9222·
Linear representation is the foundation of most, if not all, models, including the architecture of deep learning. Feel free to see it in the simple code I shared for both #julialang and #RStats. You can play around with it too. See images and link: gist.github.com/kar9222/82c767…
[4 images]
Women in Statistics and Data Science@WomenInStat

How about deep learning? Super non-linear, right? Well, as a function of some non-linear activations, it's IJALM ("it's just a linear model"). You can put lipstick on a linear model, but it’s still a linear model. Fit it w/ least squares … w/ bells & whistles like dropout, SGD, & regularization. 11/

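The point in the quoted tweet can be sketched in a few lines of plain Python (toy weights of my own choosing, not from the linked gist): composing affine layers without an activation collapses to a single affine map, and inserting a nonlinearity is exactly what breaks that.

```python
# Toy illustration of "deep learning is just a (generalized) linear model":
# two affine "layers" with no activation collapse into one affine map.

def linear(w, b):
    """Return a 1-D affine layer x -> w*x + b."""
    return lambda x: w * x + b

def relu(x):
    return max(0.0, x)

f = linear(2.0, 1.0)    # layer 1: 2x + 1
g = linear(3.0, -4.0)   # layer 2: 3y - 4

# g(f(x)) = 3*(2x + 1) - 4 = 6x - 1 -- still a single affine map.
collapsed = linear(6.0, -1.0)
assert all(abs(g(f(x)) - collapsed(x)) < 1e-9 for x in (-2.0, 0.0, 3.5))

# With a ReLU between the layers, midpoint-affinity fails, so the
# composition is no longer an affine (linear) model:
h = lambda x: g(relu(f(x)))
assert abs(h(-1.0) - (h(-2.0) + h(0.0)) / 2) > 1e-9
```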
Kar Chua retweeted
Neovim, e/plugins@Neovim·
Nvim 0.12 released. sweet, sweet release.
[image]
Kar Chua@kar9222·
@dyz_ob Thanks for sharing. May I know where the official source is? 🙏🙏
DZ@dyz_ob·
$BABA : Alibaba Cloud's AI Coding Plan Sees Unprecedented Demand, Forces Purchasing Restrictions.
[image]
lasse@lassevjl·
Termy v0.1.45 is out. Introducing layouts. Save your split tabs and use them later. Rename them to your liking. All really fast. Upgrade today. Along with bug fixes. Enjoy! Get termy > termy.run #termy #terminal #rust
Kar Chua@kar9222·
@thdxr @kianmckenn @kitlangton 👍 When you mentioned “aggressively clean up”, do you mean by hand or with AI? May I know, overall, roughly what % of the code is still written by hand?
dax@thdxr·
@kianmckenn @kitlangton has a good metaphor: it's like tending a garden. You can let AI code grow, but you have to aggressively clean up after it and be diligent about architecture and patterns. The codebase is ok; it will get better.
Kian@kianmckenn·
OpenCode's codebase is very high quality. Which is unusual for an app built with AI. This makes me curious how @thdxr uses AI to code.
Kar Chua@kar9222·
@VictorTaelin Same experience regarding (2), and I often add additional prompts like “do NOT over-engineer…” and it does help. I actually put it in AGENTS.md but she never follows it 100% of the time.
Taelin@VictorTaelin·
Ok, my final GPT-5.3 feedback:
- It is the best model for compiler work
- It writes code carefully and generates bug-free code
- It is capable of executing incredibly hard prompts
- Definitely the smartest model available IMO

Problems:

1. It is NOT capable of grasping intent. It will just take your prompt at face value, no matter how obvious it is. It is EXTREMELY frustrating to work with because of that. Sometimes it finds an interpretation of my literal words that I couldn't even anticipate. Working with GPT-5.3 is a test of patience, and a good part of the job is making sure I anticipate all possible dumb ways it could interpret my prompt, and write exact words to steer it away from that potential interpretation. And then it still finds a way.

2. It is a merciless complexity monster. When it comes to writing code, it has no shame. It is careless. It will just add, add, add, and never remove or clean up. Even worse, it will often add nearly identical functions instead of just using or adapting what exists. That goes against its own interests, because past a certain threshold it will start under-performing (like all models). Often, after I ask for a feature, I'll just write a follow-up prompt like "your code works but is way longer than needed, your goal now is to simplify it as much as possible", or variations of that.

3. It still forgets everything the day after. Not much to say about this; it's obviously a fundamental issue with LLMs that is *not* satisfactorily solved with memory or agents.

And that's it. I strongly suggest OpenAI take these 3 aspects seriously and explicitly train for them.

Regarding 1, Opus does that just fine, so I'm sure there's a way.

Regarding 2, it shouldn't be hard, but it has to be done carefully, because if you just try to minimize token count, the model will tend to *minify* the code (use short variable names, make code-golf-like uglifications). That is NOT what you want. You want to train it to reduce code size by:

A. Removing redundancies. If a functionality is already implemented, it should FIND IT and USE IT. Sometimes this will require some modifications, but that's always better than writing the same logic twice.

B. Abstracting the common pattern out. Often there will be 2 long functions, F() and G(), that can be merged into a parametrized function FG(), and then F() and G() become specialized instances of FG(). This is universally desirable, and teaching a model to do it will yield amazing results in practical productivity.

C. Using simpler logic whenever possible. Sometimes there is just a simpler way to implement an algorithm or procedure. You should teach the model to favor that.

Regarding 3, until there is a major breakthrough that solves continual learning, I think OpenAI should work on a product that allows us to at least mitigate the issue. Some people claim to have luck with nightly LoRAs. Being able to do that with codex models on my domain would be amazing.
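Point B above can be shown with a tiny before/after sketch (hypothetical functions of my own, not Taelin's code): two near-duplicate functions merged into one parametrized function, with the originals becoming trivial instances of it.

```python
# Before: two functions that duplicate the same loop, differing only in
# the exponent applied to each element.
def sum_of_squares(xs):
    total = 0
    for x in xs:
        total += x * x
    return total

def sum_of_cubes(xs):
    total = 0
    for x in xs:
        total += x * x * x
    return total

# After: the common pattern is abstracted into one parametrized "FG()",
# and the originals become specialized one-liners.
def sum_of_powers(xs, power):
    return sum(x ** power for x in xs)

assert sum_of_squares([1, 2, 3]) == sum_of_powers([1, 2, 3], 2) == 14
assert sum_of_cubes([1, 2, 3]) == sum_of_powers([1, 2, 3], 3) == 36
```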
Kar Chua@kar9222·
@ibragim_bad @aleabitoreddit Thanks. I roughly agree with this benchmark. For example, I tried Minimax M2.5 on tasks requiring deep reasoning and it still has quite a gap with codex. It requires much more hand-holding. I think its cost advantage is more suitable for things like OpenClaw. Seems “bench-maxxed”.
Ibragim@ibragim_bad·
@aleabitoreddit Thanks for the shoutout. I’m one of the SWE-rebench authors. We refresh the leaderboard every month with new tasks and newly released models, and we report metrics like pass@5, tokens, and cost per problem. DMs are open if you have feedback, and feel free to ask anything.
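For readers unfamiliar with the pass@5 metric mentioned above: pass@k is commonly computed with the unbiased estimator from the original HumanEval paper, 1 − C(n−c, k)/C(n, k), for n samples per task of which c pass. A minimal sketch (my assumption of the standard formula, not SWE-rebench's actual code):

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased estimate of the probability that at least one of k
    samples passes, given n samples drawn per task with c correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: if 2 of 10 samples solve a task, pass@5 for that task is
# 1 - C(8,5)/C(10,5) = 1 - 56/252.
p = pass_at_k(10, 2, 5)
```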
Serenity@aleabitoreddit·
A new benchmark shows Chinese AI models (Kimi, Minimax, DeepSeek) are much further behind Western frontier AI models than markets expect. LLMs like Opus, Gemini, and GPT are shown to be leading.

A new benchmark called SWE-rebench uses new GitHub tasks:
-> Minimax claimed 80.2% on the original SWE-bench.
-> On the uncontaminated SWE-rebench, it crashed to 39.6%.

The takeaway: Chinese labs have effectively solved single-prompt reasoning and discrete coding tasks at a fraction of the cost. However, the architecture and high-quality data required for long-horizon behavior remain a severe bottleneck that distillation and optimizing for benchmarks cannot fake. Chinese models are shown to be lagging behind the deep, adaptable reasoning that US hyperscalers have.
[image]
Kar Chua@kar9222·
@askOkara Thanks for sharing 🙏 For doing research, based on your experience, which do you prefer: k2.5, m2.5, deepseek v3.2, or glm5?
Okara@askOkara·
here are our favorite open-source models per use case:
> writing - kimi k2
> coding - minimax m2.5, glm 5
> ocr - deepseek ocr 2, qwen 3 vl
> general queries - deepseek v3.2, kimi k2.5
> image editing - qwen image edit, flux 2 dev
> image gen - z-image-turbo, flux 2 dev
open-source ai is moving fast - this list gets updated every week!
Kar Chua retweeted
MiniMax (official)@MiniMax_AI·
Introducing M2.5, an open-source frontier model designed for real-world productivity.
- SOTA performance at coding (SWE-Bench Verified 80.2%), search (BrowseComp 76.3%), agentic tool-calling (BFCL 76.8%) & office work.
- Optimized for efficient execution, 37% faster at complex tasks.
- At $1 per hour with 100 tps, infinite scaling of long-horizon agents is now economically possible.
MiniMax Agent: agent.minimax.io
API: platform.minimax.io
CodingPlan: platform.minimax.io/subscribe/codi…
[image]
Kar Chua@kar9222·
@SkylerMiao7 Why does MiniMax have such a high performance/cost ratio, especially in terms of 10B active params? Is there any catch in real-world use cases? I'm going to give it a real try and compare it to 5.3 codex high and xhigh. Also, what is the main architecture that allows MiniMax to achieve this? 🙏🧧
Tibo@thsottiaux·
@BO5AMIS We will ship on windows soon
Tibo@thsottiaux·
What could we do better on Codex? App, model, strategy and features… what’s wrong in how we approach things that we should improve immediately?
Kar Chua@kar9222·
@rekram11 Yes 5.2 codex was always like “say the word if you want me to do it” 🤣
Aiden Cline@rekram11·
Could be wrong but 5.3 codex so far feels a lot less reluctant to just do things. 5.2 models (in my experience) liked to end their messages with things like "If u want I can run tests", 5.3 codex will just do it (which I much prefer). Comparison of how the 2 models tell me they are done
[2 images]
Kar Chua@kar9222·
@dhh @shill_collin May I know roughly what % of code do you write by hand for existing large codebases?
DHH@dhh·
@shill_collin 100. Always. Unless it's just an experiment. But I wouldn't ship code I hadn't read and understood.
DHH@dhh·
So impressed with Kimi K2.5 for Omarchy development. I made four changes: Default enable Hibernation, Add "Remove > Preinstalls" (here's your bare-bones install!), documenting all bin/*, fix gum confirm color problem. Total tokens: 116K. Cost: $0.63. Via OpenCode / Zen. So good.
Andrew Ambrosino@ajambrosino·
Today, we’re introducing the Codex app, our flagship Codex experience. Work on multiple things in parallel, extend Codex with skills, and automate repetitive tasks. The most exciting part for us has been using the app to build itself. This is the first of many new things coming to Codex– more to come.
[image]
Lea@Lea_EFC·
I fucking love this app 😂😂😂
Peter Steinberger 🦞@steipete·
codex just doesn't gaf. This cooked 6 hours. Everything still works.
[image]
Kar Chua@kar9222·
@steipete @moncrolio Based on your use case and experience, do you think 5.2 codex xhigh is not worth it, and that 5.2 codex high has the better speed/reasoning trade-off?
Kar Chua@kar9222·
@thsottiaux We need just one of OpenCode or the codex CLI/TUI, not both. If there is a chance to combine forces and work on just one CLI/TUI, that would definitely create more synergy and a win-win.
Kar Chua@kar9222·
@adamdotdev 3. opencode black tiered pricing, including limit/quota
Kar Chua@kar9222·
@adamdotdev Things are moving so fast now! How to make full use of opencode with agents (e.g. oh my opencode), AGENTS.md, skills, etc. What do you guys think of the TUI and desktop app going forward? Will these two be developed at the same pace (both first-class citizens)?
Adam@adamdotdev·
Dax and I are going to record a podcast episode; what should we talk about?