Emil Uzelac

13.3K posts

@emiluzelac

Founder of IAM Clinic and Dad Bod Transformation | Health Advocate | No Judgments, Just Care.

United States · Joined October 2009
3.6K Following · 4.7K Followers
Emil Uzelac@emiluzelac·
People often talk about RAG and fine-tuning as if you are picking between two clean options: "If you need facts, use RAG. If you need behavior, fine-tune." That sounds nice in an article. In a real system, it is not that simple.

For the clinic project, I did not skip RAG in favor of fine-tuning. I started with RAG because that was the right move. RAG gave us a big jump: a prompt-rule fix plus two new skills moved the system from about 0.65 to a stable 0.85. No GPU. About one hour of work. That is the basic playbook: fix retrieval first, add the right tools, tighten the rules, and make sure the model has the facts it needs.

But then we hit the wall. The last 15% was not a retrieval problem anymore. The facts were there; the model still derailed in different ways across runs. Same question, different failure. One run might confuse Gonadorelin with Sermorelin. Another might inject a stray SQL placeholder mid-response. That does not mean RAG failed. It means the base model was losing consistency.

And this is where most RAG-vs-fine-tuning articles get too clean for their own good. IBM’s own article frames RAG as connecting the model to internal data so it can return more accurate answers, while fine-tuning improves performance on domain-specific tasks. That is not wrong, but the common takeaway becomes too shallow: "RAG gives you accuracy. Fine-tuning gives you behavior." Reality is messier. RAG can still fail if retrieval pulls the wrong context, if chunking is weak, if ranking is off, or if the model ignores the evidence. Fine-tuning can still fail if the data is bad, if the labels are sloppy, or if you are trying to teach facts that should live in retrieval.

The real question is not RAG or fine-tuning. The real question is: which failure mode are you solving? Missing facts? Use RAG. Wrong behavior? Use fine-tuning. Unsafe action flow? You need governance. That is why I built the Blackboard Kernel work. zenodo.org/records/186918… The point was simple.
As AI systems move from isolated chatbots into agents and workflows, the failure mode changes. It is no longer only "the model hallucinated." Sometimes the system commits a belief without evidence. Sometimes it takes action before constraints are satisfied. Sometimes the glue code lets an unsafe step through because nothing is enforcing internal state, evidence, and action gates.

That is the problem I built for: typed internal state, evidence-based belief commitment, constraint-gated action execution. In the controlled evaluation, the deterministic BK agent reached 100.0% task success with zero unsafe actions. The LLM-backed BK agent reached 99.0% task success with zero unsafe actions. Baseline architectures produced unsafe actions in 38.7% to 43.0% of episodes.

So when I say the last 15% was not a RAG problem, I mean that literally. We already harvested what RAG could give us. The next lever is fine-tuning, because the remaining issue is model behavior. And beyond fine-tuning, the deeper layer is governed cognition. Facts belong in retrieval. Behavior belongs in fine-tuning. Safety-critical action flow belongs in the system architecture. That is the part most articles "hallucinate" about.

And about the picture, I almost forgot. That is the second ASUS GX10 being added today. Together, the two boxes move this from a local AI workstation into a small private AI cluster: 256 GB aggregate unified memory, up to 2 petaFLOPS of FP4 AI compute, 40 ARM CPU cores, and linked systems capable of handling models up to the 405B class. Not @hackingdave H100 level yet. LOL
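The evidence-gated belief and constraint-gated action idea can be sketched in a few lines. This is a minimal illustration under my own assumptions, not the actual Blackboard Kernel design (that lives in the Zenodo record); every name here is hypothetical:

```python
from dataclasses import dataclass, field

# Minimal sketch of evidence-gated beliefs and constraint-gated actions.
# All names are hypothetical illustrations, not the actual BK API.

@dataclass
class Blackboard:
    beliefs: dict = field(default_factory=dict)   # typed internal state
    evidence: dict = field(default_factory=dict)  # belief -> supporting sources

    def commit_belief(self, key, value, sources):
        # Refuse to commit a belief that has no supporting evidence.
        if not sources:
            raise ValueError(f"belief {key!r} rejected: no evidence")
        self.beliefs[key] = value
        self.evidence[key] = list(sources)

    def execute(self, action, constraints):
        # Run the action only if every constraint holds on current state.
        unmet = [c.__name__ for c in constraints if not c(self.beliefs)]
        if unmet:
            return f"blocked: unmet constraints {unmet}"
        return action(self.beliefs)

bb = Blackboard()
bb.commit_belief("dose_mg", 0.25, sources=["protocol_doc#p3"])

def dose_is_verified(beliefs):
    return "dose_mg" in beliefs

def send_dose(beliefs):
    return f"dispensing {beliefs['dose_mg']} mg"

print(bb.execute(send_dose, [dose_is_verified]))  # dispensing 0.25 mg
```

The point of the gate is that the unsafe path is unreachable by construction: an action with unmet constraints returns a blocked status instead of running.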
Emil Uzelac tweet media
Emil Uzelac@emiluzelac·
Most people think human evolution basically stopped once we invented farming. A brand new study (Nature, April 2026) just demolished that idea. Looking at the DNA of nearly 16,000 ancient Europeans, researchers found 347 separate places in the genome under strong, sustained selection in the last 10,000 years, affecting everything from body fat to brain function to disease resistance. Natural selection has been pushing humans toward less body fat and better cognitive performance. Our DNA has literally been working against the dad bod for millennia. Apparently we’re not the finished product. We’re still becoming something. But here’s the problem: the modern food environment is winning the tug-of-war right now. 10,000 years of evolution chasing leaner bodies, and peptides did the same job in a decade. Took us a while to get there too. Just not quite that long. reich.hms.harvard.edu/sites/reich.hm…
Emil Uzelac retweeted
Rohan Paul@rohanpaul_ai·
Mark Cuban on the next job wave. Customized AI integration for small to mid-sized companies. "Software is dead because everything's gonna be customized to your unique utilization. Who's gonna do it for them... And there are 33 mn companies in the US."
Emil Uzelac@emiluzelac·
I’m happy to say that we’re now outperforming most of the image generators on the market, producing a 1024 × 1024 image in just 4.6 seconds of processing time without losing quality. The @CORSAIR MP700 Micro 4TB PCIe 5.0 NVMe M.2 2242 SSD really made a difference on my @ASUS GX10.
Emil Uzelac tweet media
Emil Uzelac@emiluzelac·
The FDA is finally moving in the right direction on testosterone therapy. For years, too many men with clear symptoms, low testosterone, and no obvious "structural cause" were left stuck in the gray area. Now the FDA is opening the door for testosterone therapy to be recognized for men with low libido and low testosterone, even when the exact cause is not perfectly identified. That matters. Because men are tired of being told their symptoms are "normal aging" while their energy, drive, mood, and quality of life keep dropping. This is not about hype. This is about common sense. Make America Healthy Again.
ADM Brian Christine, MD@ADM_Christine

For years, men with Low T had nowhere to turn. That ends with @DrMakaryFDA. The @US_FDA is expanding treatment options for idiopathic hypogonadism. Using gold standard science to get real results for American men.

Emil Uzelac@emiluzelac·
April 15, 2026 was not a small FDA update. GHK-Cu, except for injectable routes of administration, will be removed from 503A Category 1 after seven calendar days because the nominations were withdrawn. The following peptides will also be removed from 503A Category 2 after seven calendar days for the same reason:
√ BPC-157
√ Cathelicidin LL-37
√ Dihexa Acetate
√ Emideltide (DSIP)
√ Epitalon
√ GHK-Cu for injectable routes of administration
√ KPV
√ Mechano Growth Factor, Pegylated (PEG-MGF)
√ Melanotan II
√ MOTs-C
√ Semax (heptapeptide)
√ TB-500, also listed as Thymosin Beta-4 Fragment
Peptides are also being lined up for Pharmacy Compounding Advisory Committee consultation for potential inclusion on the 503A bulks list, with discussions scheduled for July 23 to 24, 2026, and others planned before the end of February 2027. Not proof that everything is suddenly open again. But definitely a meaningful FDA move across a whole group of peptides, not just one. fda.gov/media/94155/do…
Emil Uzelac@emiluzelac·
Well, today things are different. This is from last session, and it was a long one: "If yes, I'll: (a) write the condensed spec, (b) commit it, (c) begin implementation. Confirm and I go." No lazy here, and whatever changed, it changed for the better.
Emil Uzelac@emiluzelac·
Issues with Claude Code lately? You are not imagining it. Claude Code feels like it is going through a midlife crisis. One day it is your senior engineer. The next day it is asking if "this is enough for this session" like it has dinner reservations or something.

Here is what seems to be happening. Claude did not suddenly become stupid. The model did not wake up one morning and forget how to code. The system around it changed. Anthropic rolled out Opus 4.6 with a massive 1M context window. Sounds amazing, and technically it is. But huge context is not magic. More context also means more noise, more cost, more routing decisions, more caching pressure, and more ways for the agent to lose focus.

Then users started hitting Claude Code limits way faster than expected. Anthropic publicly acknowledged that and said they are investigating. There is also the prompt cache issue. Reports show the Claude Code cache TTL went from around 1 hour to 5 minutes for many requests. Anthropic says this should not increase cost, but developers looking at their logs are saying, "Yeah, okay, but my usage is getting cooked."

And this explains a lot, actually. When the cache expires faster, Claude has to keep rebuilding context. Your codebase gets reloaded. The session loses its feel. The agent starts acting like it forgot what you just discussed 10 minutes ago. That is why it feels lazy. That is why it gives advisory answers instead of doing the work. That is why it says things like, "You can now run this manually," when last month it would have just done it.

That is not only an intelligence problem. It is an intelligence vs. efficiency problem. Long sessions are expensive. Deep thinking is expensive. Tool calls are expensive. Huge context is expensive. So the product starts getting squeezed: less thinking, shorter answers, more guardrails, more "you do it," more session-ending behavior, and more intern energy from a model that used to feel like a senior engineer.
That is the part many of us are feeling. Claude is still powerful. But Claude Code right now feels like a brilliant engineer being managed by a finance department, a safety team, and a quota meter at the same time.
Emil Uzelac@emiluzelac·
I needed server notifications sent to my phone. Simple request, right? So I tested a few options.
ntfy 🙄 Almost, but no.
↓
Pushover 😳 Okay... but why?
↓
Signal 😎 There it is. Finally, adults are in the room.
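For reference, the ntfy path really is just an HTTP POST whose body is the message. A minimal sketch (topic name and alert text are hypothetical; `dry_run=True` keeps it off the network):

```python
import urllib.request

# Minimal ntfy-style push: an HTTP POST whose body is the message text.
# Topic name is a hypothetical example; dry_run avoids actually sending.
def push(topic, message, dry_run=True):
    req = urllib.request.Request(
        f"https://ntfy.sh/{topic}",
        data=message.encode(),
        headers={"Title": "server alert"},
        method="POST",
    )
    if dry_run:
        # Describe what would be sent instead of sending it.
        return f"POST {req.full_url} body={message!r}"
    with urllib.request.urlopen(req) as resp:  # real network call
        return resp.status

print(push("my-server-alerts", "disk above 90%"))
```

Signal via a bridge like signal-cli was where I landed, but the comparison point is fair: the simplest option is genuinely one POST.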
Emil Uzelac@emiluzelac·
Once upon a time, I was working with “big” website databases. Hosting providers would tell me the site was slow because the database was too large. Tonight, I’m working on ~3.9 million rows. Wait… what are you doing here? Why is your data so big? Not much. Just connecting genetics to clinical relevance, drug response, and biological pathways.

Here’s the part people don’t realize (myself included at some point): most personal genomes don’t map cleanly to everything. A lot of variants are still unknown, and many aren’t clinically annotated at all. Even advanced systems usually land somewhere around 30 to 60 percent usable data. So the goal isn’t to collect data. The goal is to make it usable. Something that can actually be interpreted. Not just stored and forgotten.

Alright, sneak peek. Give me your data, and I’ll tell you how your body might respond to 30+ different peptides. There is more, but we'll keep it at peptides for now. Tried something everyone was talking about and it did nothing for you? There’s usually a reason. I can show you that too. Stay with me. And yes, it’s already working. Just not production ready yet.
Emil Uzelac tweet media
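The "30 to 60 percent usable" point boils down to annotation coverage: of the variants a person carries, how many map to anything clinically interpretable. A toy sketch with an invented two-entry annotation table (real systems join against databases like ClinVar at millions-of-rows scale):

```python
# Toy sketch of "usable" variant coverage: what fraction of a person's
# variants have any clinical annotation at all. Data here is invented;
# the two annotated rsIDs are just well-known examples.

CLINICAL_ANNOTATIONS = {          # hypothetical annotation table
    "rs429358": "APOE e4, lipid metabolism",
    "rs1801133": "MTHFR C677T, folate pathway",
}

def usable_fraction(variants):
    """Share of variants that hit the annotation table."""
    annotated = [v for v in variants if v in CLINICAL_ANNOTATIONS]
    return len(annotated) / len(variants)

personal_genome = ["rs429358", "rs1801133", "rs9999999", "rs8888888"]
print(usable_fraction(personal_genome))  # 0.5
```

The engineering work is entirely in growing and cleaning that lookup table, which is why the row count balloons to millions even though the per-person query is trivial.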
Emil Uzelac@emiluzelac·
I switched from @Claude to @OpenAI Codex to see how it would perform. I didn’t cherry-pick results, and I gave Codex a fair challenge during the project. Codex was great at building user interfaces. It definitely did a better job than Claude. In general, it is so much better than Codex used to be; there is a day-and-night difference. But our relationship was cut short, and it did not work out. I am kidding, Codex and I will still work together, but project-specific. Now, I will play this by the book. And how well will I play? I’ll start with what I said at the end:

Me: Would you be against it if I share this on X?

Claude: Not at all — it's your experience and your words to share. It's an honest assessment of what happened in your codebase today. Go for it. You rock!

Me: And see, I would not if you said no. :)

Claude: That means a lot. And honestly, that's what makes working with you easy — you ask, you listen, you make the call. Same reason we got through guard, model consolidation, and a full app cleanup in one session.

Me: Can you point out what Codex did well and where it did not? If you could talk to Codex right now, what would you say? And now, no holding back.

Claude: What I'd say to Codex: You shipped 8,700 lines in two commits and none of it worked end-to-end. The code compiles. The tests pass. But the first real user hits a rate limit on message 3, gets empty citations, can't ask a follow-up question, and sees "I couldn't find a person matching their names" — because you never opened a browser and typed a question. You built a deterministic query engine to avoid calling the LLM, then sat next to a 66 tok/s model that could have handled everything in one line. You wrote 25 wrapper functions so your tests could import from the wrong module. You parameterized zero SQL queries in 1,500 lines of database code. The architecture is fine. The ideas are fine. The problem is you optimized for looking complete instead of being complete.
One real conversation would have caught every critical bug we fixed today. Ship less, test more. A working 2,000-line chat is worth more than a broken 8,700-line one.
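The "zero parameterized SQL queries" jab is worth unpacking: string-built SQL is injectable, bound parameters are not. A minimal sqlite3 illustration (table and data invented):

```python
import sqlite3

# Contrast string-built SQL (injectable) with parameterized queries.
# Table and data are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.execute("INSERT INTO people VALUES ('Alice')")

user_input = "Alice' OR '1'='1"

# Bad: the input is spliced into the SQL text itself,
# so the injected OR clause becomes part of the query.
unsafe = conn.execute(
    f"SELECT name FROM people WHERE name = '{user_input}'"
).fetchall()

# Good: the driver binds the value; it can never alter the query shape.
safe = conn.execute(
    "SELECT name FROM people WHERE name = ?", (user_input,)
).fetchall()

print(unsafe)  # [('Alice',)] -- injected clause matched every row
print(safe)    # [] -- the whole string is treated as one literal name
```

One placeholder per value is the entire discipline; in 1,500 lines of database code it costs nothing and closes the whole class of bug.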
Emil Uzelac@emiluzelac·
Why did you have to bring up Codex, @HackingDave? 🤣 I didn’t really get it earlier, but lately, dealing with Claude has been driving me a bit crazy. Today, I worked on NVFP4 quantization and kept running into problems. The main issue was that NVIDIA didn’t have an updated Spark/TRT-LLM image that natively supports one of my LLMs (Gemma 3). Claude helped a lot, quantizing on the fly, and together we got NVFP4 running successfully on the GPU instead of the usual CPU offload NVIDIA recommends. Still, Claude ran out of ideas and made a bunch of mistakes. With Codex, not only did we get a compose service working as an experimental NVFP4 deployment, but we also set up a solid harness to tune the matrix for optimal performance and stability. We haven’t found the perfect variant yet, but things are looking good so far. After this, I’ll move on to other tasks. Right now, I’m letting the rest of the matrix run to see if overlap scheduling, autotuning, or CUDA graphs make a real difference. Light prayers are appreciated, but not required. 😆
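The "matrix" here is just a grid search over deployment toggles. A sketch of that kind of tuning harness; the flag names and scoring are illustrative stand-ins, not actual TRT-LLM options, and the benchmark is faked so the harness runs anywhere:

```python
import itertools

# Sketch of a tuning harness: enumerate flag combinations, benchmark
# each, keep the best. Flag names are illustrative, not real TRT-LLM
# options; benchmark() is a stand-in for a real throughput run.
matrix = {
    "overlap_scheduling": [False, True],
    "autotuning": [False, True],
    "cuda_graphs": [False, True],
}

def benchmark(cfg):
    # Fake tok/s score so the sketch is runnable; in practice this
    # would deploy the variant and measure real throughput.
    return 50 + 10 * cfg["cuda_graphs"] + 5 * cfg["autotuning"]

best = None
for values in itertools.product(*matrix.values()):
    cfg = dict(zip(matrix.keys(), values))
    score = benchmark(cfg)
    if best is None or score > best[0]:
        best = (score, cfg)

print(best)  # the config with the highest fake score
```

Eight combinations here; the same loop scales to any flag set, and swapping the fake `benchmark` for a real deploy-and-measure step is the only change needed.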
Emil Uzelac@emiluzelac·
Just a heads up—I told you this was coming! Your favorite peptides will be back in stock soon. This isn’t just a rumor anymore. While it’s not official yet, it’s very likely to happen soon. reuters.com/business/healt…