Raymond Christopher

14.1K posts

@RayChris91

Christ-follower AI builder, 🇮🇩 ex-Googler

Bekasi · Joined August 2009
972 Following · 1.4K Followers
Raymond Christopher @RayChris91
@levifikri this is also why I'm now picky about teaching people/groups in any capacity; I have to look at their intent first, assuming I'm not feeling too lazy haha
Levi | still learning (and messing up)
Agree, it doesn't feel worth the effort to run free events:
- "do we get a certificate?" kinds of questions
- participant backgrounds can be super random
- low attendance rates
Better to make it paid, even just Rp 20k as a commitment fee. Or an interesting scheme: "if you don't show up and participate, you must pay Rp 1 million => auto-charged to your credit card 🤪. Otherwise it's free."
Razan Tata | Content Writer @razantataa

I once taught a PRAKERJA class. That's FREE, right? Want to know my POV as the instructor?

For context: each class runs 5 days, and each session lasts 3 hours. I felt like I was talking to a wall 🥲 Almost everyone was OFF CAM and PASSIVE. Open a Q&A session, and nobody asks anything. Then, when I handed out assignments, you know what I got? Say there were 15 participants; 10-12 of them would do the work half-heartedly. Some were very obviously using chatGPT 😭 Yes, free classes often attract the FREEBIE MENTALITY.

Want another story about knowledge given away for FREE? I run a writing group, open and free to the public. Everyone can discuss and learn there. But only IF THEY WANT TO ✨ Because what I mostly got was a huge number of SILENT READERS. Every time I share material, there's barely any response, even though plenty of people have "seen" it. Meanwhile, behind every webinar and community post there's effort I poured in: time spent preparing material, experience shared for free.

That's where I landed on my conclusion: the FREEBIE MENTALITY and the READY-TO-PAY MENTALITY are different ✨ For example, I've joined paid bootcamps a few times. The atmosphere there is very active, the Q&A sessions are alive, and assignments are done seriously. PAID KNOWLEDGE, it turns out, is valued far more than FREE KNOWLEDGE.

Now, if anyone says "So in the end you're selling a course?" Yes, to filter for the people who are ready for the knowledge. And to respect the one sharing it ✨

Raymond Christopher @RayChris91
Even in highly technical environments, many people still aren’t data-driven about which problems are actually worth solving. One thing I learned at @Google: writing code is often the least critical part of the job. Knowing what the code should achieve is what matters most.
Raymond Christopher @RayChris91
The more I talk to people, the more I realize: information that feels basic or trivial to us can be extremely valuable to someone else. That's why networking is expensive. You don't really understand its value until you experience it yourself.
Raymond Christopher @RayChris91
Some "breakthrough papers" are just out of touch from the reality in the field. Not using SOTA models, no mention of agentic harnesses used, etc.
Dr Milan Milanović @milan_milanovic

LLMs Are Not Reading Your Code

We keep calling LLMs "AI coding assistants." But writing code and understanding code are not the same thing. Researchers from Virginia Tech and Carnegie Mellon University just ran 750,000 debugging experiments across 10 models to determine how well LLMs actually understand code. The results show that you should not blindly trust your AI coding assistant when debugging. Here is what they found:

1. A renamed variable breaks the debugger
Researchers created a bug, confirmed that the LLM found it, then made changes that don't touch the bug at all, such as renaming a variable or adding a comment. In 78% of cases, the model could no longer find the same bug. The bug was still there. The variable names and comments changed, and that was enough.

2. Dead code is a trap
Adding code that never runs reduced bug-detection accuracy to 20.38%. Models treated dead code as live and flagged it as the source of the bug, even though the bug was in another line. So LLMs cannot reliably distinguish "this runs" from "this never runs."

3. Models read top-to-bottom, not logically
56% of correctly found bugs were in the first quarter of the file. Only 6% were in the last quarter. The further down the code, the less attention the model pays to it. If the bug lives in the bottom half of your file, the model is already less likely to find it.

4. Function reordering alone cut accuracy by 83%
Changing the order of functions in a Java file caused an 83% drop in debugging accuracy. The code itself remained the same; where the code physically sits in the file matters more to the model than what the code does. That is a sign of pattern recognition, not real code understanding.

5. Newer models hardly move the needle
Claude improved ~1% between 3.7 and 4.5 Sonnet on this task. Gemini improved by ~1.8%. Every model release comes with a new benchmark leaderboard and new headlines, but the ability to reason about code under realistic conditions is improving slowly.

6. These were best-case conditions
The study used single-file programs of ~250 lines, each with a clear description of what the code should do. The authors say this was intentional: they wanted best-case conditions. Real production code is multi-file, cross-module, and poorly documented, so expect worse performance there.

Here are three things worth changing based on the research:

🔹 Pass execution context, not just code. When asking an LLM to debug, include test output, stack traces, and failure messages alongside the source. Without runtime details, the model is guessing based on the code alone (a sketch follows this tweet).

🔹 Don't trust it on deep-file bugs. If the suspect code is in the bottom third of a long file, the model will have trouble finding it. Consider splitting the context or feeding the relevant function directly.

🔹 Clean up dead code before using AI debugging tools. Commented-out blocks and unreachable branches will mislead the model; it cannot filter them out.

We rate AI coding tools on HumanEval. That tests whether a model can write a function from a description, which says nothing about finding a bug in code it didn't write. Those are different problems. We're using the wrong benchmark.
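A minimal sketch of the first recommendation above, assuming the official openai Python package; the model name, helper functions, and prompt layout are illustrative, not from the paper:

    # Hand the model runtime evidence, not just source code.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def build_debug_prompt(source: str, test_output: str, stack_trace: str) -> str:
        # Pair the code with evidence of how it actually failed, so the model
        # reasons from runtime behavior instead of guessing from the text alone.
        return (
            "Find the bug in this code.\n\n"
            f"--- source ---\n{source}\n\n"
            f"--- failing test output ---\n{test_output}\n\n"
            f"--- stack trace ---\n{stack_trace}\n"
        )

    def ask_for_diagnosis(source: str, test_output: str, stack_trace: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o",  # illustrative; any capable model works here
            messages=[{
                "role": "user",
                "content": build_debug_prompt(source, test_output, stack_trace),
            }],
        )
        return response.choices[0].message.content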

Raymond Christopher @RayChris91
@madewithv @opencode I used Cursor a lot before it got pricier than alternative subscriptions that give me more bang for the buck. I still think its harness is great, but the IDE UX often slows me down compared to my current TUI workflow, especially the CPU/RAM it takes to do the same thing.
V @madewithv
want better insight into why opencode is better.. is it because it's more agentic and can run longer? I personally still think Cursor is the best harness out there:
- Plan mode consistently produces great plans. It's quite different from just prompting for a plan.md; they seem to have a better system prompt and tools for planning. I always init a plan before implementing.
- Subagents get initiated frequently, which is good for context management.
- The UX is great, "it just works".
- And their latest composer 2 drop just made everything a lot better.
Raymond Christopher @RayChris91
100%: people should stop treating models and harnesses as the same thing. A weak experience might be the harness, not the model. Some harnesses support only a few models, and even the flexible ones are usually optimized around certain ones. That’s why @opencode is a beast.
Haider. @slow_developer

codex is a better programmer: it reads more before changing things and gets my edits right more often. auto-compact is also stronger, so it can stay on track during long tasks. but as a harness, claude code is still better in some ways; codex has:
> fewer tools, including no hooks
> worse UX, especially when resizing

V @madewithv
@RayChris91 ...or just too lazy to switch. Some people I've found are even fanboys of one lab, so they'll never test models from other labs lol
Raymond Christopher @RayChris91
Anyone using Opus as their daily AI driver but not considering Codex/GPT models or Kimi in the same breath either has too much money on their hands or has poor financial management.
Raymond Christopher @RayChris91
@did1k @ariaghora To be clear, I'm not criticizing Opus for being bad. I'm questioning whether it's worth the price when cheaper alternatives like Codex can get you similar performance. Hence my original tweet.
Levi | still learning (and messing up)
I think human respect for something isn't won by the superiority of its logic 🙏 You can keep being logically correct, but the sympathy/respect earned doesn't necessarily track with that. *I do enjoy logical discussion, but it feels like there's a certain threshold beyond which the effect is actually the opposite.
Levi | still learning (and messing up)
My feeling: the pro-NU tweets here on X actually make (many?) people lose respect for NU. They think they're making things better, but the real effect is the opposite. I just thought about running a survey to prove that with data, but it would blow up 😅; better left to others, the ones already inside that whirlpool 😁 Just signal by liking this post if you feel the same 👍
Angga Fauzan @angga_fzn

Anyway, I have great respect for Muhammadiyah. Some of my good friends are activists there, and they're really impressive. Setting aside the mining issue or whatever, Muhammadiyah is more than just a mass organization. It runs schools, universities, hospitals, and so on that are genuinely sustainable and beneficial to society. Its schools even have branches abroad, and its assets are substantial. One more thing: it isn't close to Israel. 🤙🏼

Raymond Christopher @RayChris91
Much of the harm in the world is caused by people hurting others while believing they are morally justified.
Raymond Christopher @RayChris91
@evanpurnama personally, I can't accept that Opus 4.6 is 2-3x more expensive per token than Codex 5.3/5.4 for what seems to be similar quality. Maybe it's just personal taste about efficiency; hence my original post 😆
Raymond Christopher @RayChris91
@evanpurnama the general consensus: codex is better at BE & architecting, opus at FE and prototyping. opus is more agreeable, so it feels more "comfortable to iterate". codex is more strict and disciplined, so it's better as the bug finder or code reviewer for serious stuff.
Raymond Christopher @RayChris91
@ariaghora a simple test is whether you are using *custom* slash commands or skills for your frequent use cases rather than prompting most things manually, and whether you have been managing parallel agents to accomplish everything you need to do (a minimal example of a custom command is sketched below)
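For reference, a minimal sketch of a custom slash command, assuming Claude Code's convention of markdown files under .claude/commands/ (the file name becomes the command, and $ARGUMENTS is replaced by whatever you type after it); the file name and checklist below are hypothetical:

    # file: .claude/commands/review.md -- invoked as /review <file-or-diff>
    Review $ARGUMENTS like a strict code reviewer:
    - flag correctness bugs and unhandled edge cases first
    - point out dead code and unreachable branches
    - end with a verdict: merge, fix-then-merge, or rework

Saved once, a command like this replaces retyping the same prompt every session.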
Aria Ghora @ariaghora
@RayChris91 Re workflow setup: Any setup you’d recommend? Haven’t explored this space. Re codex & gemini: not yet. Already feel comfortable w claude (max) getting jobs done. Too comfortable to spend more and explore other alts. Probably should try comparing them one day.
Raymond Christopher @RayChris91
@Br__AM It must be nice to have that unlimited arrangement. Others have the freedom to choose their tools but a set budget each engineer/team can spend, so they need to optimize usage.
average joe (the unkindled one)
@RayChris91 nah it's not specific to Opus. we kinda have “special agreements” to use Kiro from AWS with no specified limit afaik. Opus just happens to be what I default to daily 😬
Raymond Christopher @RayChris91
@evanpurnama interesting, which harness/tools are you using with each model, mas? and what's the setup and use cases, if I may ask?
Evan Purnama @evanpurnama
@RayChris91 occasionally comparing with Codex 5.4-high, tapi somehow in my cases Opus 4.6 always win ya