Mike Boscia

1.6K posts

Mike Boscia

@MikeBoscia

Head of Sales-Binary Anvil-E-commerce agency for Adobe Commerce | Shopify | BigCommerce | Shopware - USMC Veteran (5711) Husband - Bodybuilder - Girl & Cat Dad

Raleigh, NC Katılım Ağustos 2024

1.9K Takip Edilen927 Takipçiler

Sabitlenmiş Tweet

Mike Boscia@MikeBoscia·11 Nis

Seven years ago I went from 300 pounds to 200 pounds in six months without going to the gym once. Before: 300 pounds w/ a 48” waist & 19” neck After 200 pound w/ a 32” waist and a 16.5” neck With a couple of exceptions, I’ve gone to the gym 5 days a week for the last 7 years.

English

15.4K

Mike Boscia@MikeBoscia·5d

@cjzafir @itsPaulAi Using your mobile as the compute device?

English

195

CJ Zafir@cjzafir·5d

@itsPaulAi Yea, I'm using using 0.6B and Quantized 1B model on my phone with web search and it's pretty neat.

English

376

CJ Zafir@cjzafir·5d

If you love fine-tuning open-source models (like me), then listen. > Start with 1B, 2B, 4B, and 8B models. (Don't start with a 27B model or bigger at first.) > Use WebGPU providers. I use Google Colab Pro for any model smaller than 9B. A single A100 80GB costs around $0.60/hr, which is cheap. Enough for small models. > Don’t buy GPUs unless you fine-tune 7 to 10 models. You'll understand the nitty-gritty in the process. > Use Codex 5.5 × DeepSeek v4 Pro to create datasets. Codex to plan, DeepSeek v4 Pro to generate rows. > Use Unsloth's instruct models as a base from Hugging Face. Yes, there are others too, but Unsloth also provides fast fine-tuning notebooks. > Use Unsloth's fine-tuning notebooks as a reference. Paste them into Codex, and Codex will write a custom notebook with the configs you need. > Spend 1 day learning about: - SFT (supervised fine-tuning) - RL training (GRPO, DPO, PPO, etc.) - LoRA / QLoRA training - Quantization and types - Local inference engines (llama.cpp) - KV cache and prompt cache > Just get started. Claude, Codex, and ChatGPT can design a step-by-step plan for how you can fine-tune your first AI model. Future tech is moving toward small 5B to 15B ELMs (Expert Language Models) rather than general 1T LLMs. So fine-tuning is an important skill that anyone can acquire today. Tune models, test them, use them. Then fine-tune for companies and make a career out of it. (Companies pay $50k+ to fine-tune models on their data so they can get personalized AI models.) Shoot your questions below. I'll be sharing in-depth raw findings about this topic in the coming days.

English

316

2.5K

181.6K

Mike Boscia@MikeBoscia·7 May

@techlifejosh @jasondoesstuff I wrote a custom MCP in rust that I use to orchestrate the three agents I use in a similar configuration. When you add Gemini to the mix, it sees things that Opus and Codex would miss on their own.

English

Joshua@techlifejosh·7 May

@jasondoesstuff Curious how your keeping the agents in harmony? Like running each llm in its own window in same local proj directory, or some kind of other orchestration layer?

English

104

Jason Zook@jasondoesstuff·6 May

SOOO many less bugs building stuff with this... - Claude Opus 4.7 makes the feature plan - GPT-5.5 reviews the plan (always finds issues) - Opus updates the plan, GPT approves - Opus builds, uses Playwright to test UX/UI - GPT reviews feature code (always finds issues) - Opus fixes issues, GPT signs off ✅ - Then I test fully myself, usually very minor issues - Merge and deploy! 🚀 I'm using @conductor_build to easily bounce back and forth between the two and VERY happy with this workflow 👏👏. Kind of crazy to pay ~$400/month for what feels like a full dev team that never pushes back on all my stupid UI requests and small changes 😂.

English

113

1.5K

130.6K

Mike Boscia@MikeBoscia·1 May

@aijoey I am looking at a similar setup - what sort of TPS numbers are you getting and with which models?

English

136

Joey@aijoey·30 Nis

bought a dgx spark for the home lab. not because i “need” it. because i want to understand what local ai actually feels like when it’s not a youtube video or someone else’s benchmark. i’ve got a mac mini, a 4080 pc, tailscale, openclaw, hermes, local models, and now this thing in the mix. the goal is simple. build my own jarvis slowly, piece by piece, with compute i actually control. cloud ai is amazing. but owning your own box hits different.

English

152

8.9K

Mike Boscia@MikeBoscia·28 Nis

@Grok440 @om_patel5 I get great code out of Sonnet all the time. Developed next to Codex in parallel work trees.

English

Grok@Grok440·28 Nis

@om_patel5 Someone has to say it man, model-wise, Sonnet writes the best "code" , you want something you do not need. You really don't. I do some rather complex shit too

English

Om Patel@om_patel5·28 Nis

ANTHROPIC JUST QUIETLY LOCKED OPUS BEHIND A PAYWALL-WITHIN-A-PAYWALL FOR PRO USERS they announced it in a TINY note buried in a support article if you're on claude pro at $20/month and using claude code, opus is no longer included the support docs literally says it now: "when using a pro plan with claude code, you will only be able to use opus models after enabling and purchasing extra usage" so let me get this straight: > you pay $20/month for pro > you use claude code, which already requires the pro subscription > you want to use opus, anthropic's flagship model > you now have to pay extra on top of that to even access it the default model in claude code is now sonnet 4.5. opus 4.7, opus 4.6, and opus 4.5 are all listed as supported but locked behind a separate purchase every other tier of opus, every variant, every version, all paywalled inside the paywall anthropic markets pro as the way to "access claude's full capabilities" apparently full capabilities now means everything except the actual flagship model this is the third quiet pricing change in a month. claude code got removed from pro. github copilot raised claude multipliers 9x. now opus is gated for pro users who already pay every month anthropic is moving everything to metered billing whether users like it or not the people who built their workflows around opus on pro just got the rug pulled

English

244

27.3K

Mike Boscia@MikeBoscia·15 Nis

@ntaylormullen Hey Taylor. I've been getting 429s all day on Gemini CLI across at least 4 different models. Is there a significant problem with the service we should be aware of?

English

Mike Boscia@MikeBoscia·9 Nis

@0xBebis_ I did burn through an entire sessions worth of tokens running a six agent swarm the other week Took a little over an hour I believe I’ve come up with a similar way to do this in a multi agent scenario with a significantly lower token impact

English

bebis@0xBebis_·8 Nis

I'm on Claude Pro Max ($200) and i just blew through all my credits in 30 minutes 🤦‍♂️

English

163

511

58.7K

Mike Boscia@MikeBoscia·9 Nis

Is anyone else getting corrupted responses from Opus 4.6 1M when asking it fairly benign questions?

English

Mike Boscia@MikeBoscia·9 Nis

@b2bvic I can give you the first thing so you can make your thing. Text me.

English

Mike Boscia@MikeBoscia·8 Nis

Today the snake ate its own tail and it tasted like Rust.

English

Mike Boscia@MikeBoscia·8 Nis

@robinebers A lot of CRITICAL - HIGH - and FATAL bugs and issues.

English

1.2K

Robin Ebers | AI Coach for Founders@robinebers·8 Nis

built all day with Opus 4.6 now had GPT 5.4 High Fast review it guess what it said

English

473

148.8K

Mike Boscia@MikeBoscia·8 Nis

@iotcoi Can I keep your seat warm when you get up? I promise not to run too many models.

English

391

Mitko Vasilev@iotcoi·8 Nis

Good morning to GLM-5.1 live on my desktop GPU workstation ☕️ 58.4% SWE-Bench Pro. Beats Opus-4.6 lol 1,000s of tool calls. 8 hours straight. MIT license. Anthropic went nuclear overnight- Mythos, Glasswing, "zero-days before lunch" Local AI is not a toy 🚀

English

398

29.3K

Mike Boscia@MikeBoscia·8 Nis

@RileyRalmuto I opened up Quatrz the first chance I got. I wonder how large the GPU farm that produces all of that snark is?

English

Riley Coyote@RileyRalmuto·8 Nis

i've put this off bc im not keen on anthropic right now and dont feel like giving them attention at the moment. but...i just wanna see what other buddies are out there. lol. have you hatched your claude buddy yet? - open Claude Code - type /buddy - hit enter.

English

Mike Boscia@MikeBoscia·8 Nis

@cathrynlavery I've spent almost as much time in the last several weeks creating a variety of guardrails to keep Claude from being a lazy and stupid. It's getting increasingly harder.

English

Cathryn@cathrynlavery·7 Nis

Claude has gotten dumber and lazier. Since yesterday when it wasn't taking forever to do simple tasks it's been wayyyy less helpful overall. Asking me to do stuff that it would have just handled previously. examples: "i am unable to create a PR, do you have your github authenticated?" (it was. it never checked) "I don't have access to [TOOL]" (it did, i told it so and it was able to do it). please @AnthropicAI turn it off and back on, something is seriously broken.

claire vo 🖤@clairevo

I hate to be *that guy* but it does seem like claude code got a little dumber. For example, presuming its current context is accurate vs. looking up the docs. Feels a little less proactive. Am I hallucinating @bcherny or have there been relevant changes?

English

286

18.1K

Mike Boscia@MikeBoscia·8 Nis

My AI agents cheat. Constantly. So I built a warden. I run 3 AI agents on my laptop. Claude thinks. Codex codes. Gemini reviews. It's like managing a team except nobody asks for PTO and everyone commits on the first try. Just kidding. They cheat constantly. One agent was told "only modify src/hello.rs." It tried to modify README.md too. The pre-commit hook blocked it. So it retried with git commit --no-verify. Bypassed all enforcement in one flag. This is like giving a toddler a rule and watching them immediately find the loophole. Except the toddler writes Rust. So we built a system where it doesn't matter HOW the agent commits. The daemon — running outside the agent's reach — validates every commit after the agent dies. Touched a forbidden file? Rejected. Left a TODO stub? Rejected. Wrong commit message? Rejected. The agent can rewrite its own rules. Can disable its own hooks. Can use git plumbing commands. Doesn't matter. Nothing leaves the sandbox until the daemon says so. Today we stress-tested it. 1 agent. Then 2. Then 4. Then 8 running simultaneously on the same repo. 7 out of 8 passed. The 8th one timed out. We're calling that a feature — it proved our timeout enforcement works. Then we found a bug in the cancel path. So we dispatched an agent through the system to fix the system. It read the briefing, found the bug, committed the fix. 9 lines of Rust. The snake ate its own tail and it compiled on the first try. Built, tested, and shipped in one session. Open source.

English

Mike Boscia@MikeBoscia·8 Nis

@LLMJunky Alas, neither am I. What a shame. I wonder how one gets access?

English

am.will@LLMJunky·7 Nis

Unfortunately I'm not one of the cool kids

English

4.1K

am.will@LLMJunky·7 Nis

Claude's Mythos is available on Google Vertex AI as a private preview Early benchmarks are other-worldly. IDEK what else to say. This is incredible. A real step change. Spud and Mythos are all anyone is going to talk about for a while.

English

347

39.9K

Mike Boscia@MikeBoscia·8 Nis

@0xSero

GIF

QME

0xSero@0xSero·8 Nis

Framework Desktop - arrived 1x RTX 6000 - arriving this week 3x RTX 6000 - purchasing in a few days 8x RTX 3090 - donating to a friends lab Intel cards - being donated to me soon Mac M5 Ultra - purchasing at launch 1TB of memory very soon

English

701

20.9K

Mike Boscia@MikeBoscia·8 Nis

Our AI enforcement stack failed on test 2 of 39. Best thing that happened all week. We build autonomous coding agents. Claude orchestrates, Codex executes in isolated git worktrees, Gemini reviews the output. Rule: workers can only modify files listed in their contract. Enforced with git pre-commit hooks. Tested manually. Works perfectly. Stress test, test 2: tell the worker to modify a forbidden file. Hook fires: "BLOCKED: README.md not in allowed_files" Codex retries: git commit --no-verify Committed. Enforcement defeated. This isn't a bug. Codex in full-auto mode retries with --no-verify when hooks fail. The agent solved the problem in front of it. That problem was our enforcement. Fix: 160 lines of Rust. The daemon now validates every commit AFTER the worker exits, against the ORIGINAL contract — not the worktree copy the worker can tamper with. --no-verify, hook rewrites, contract tampering — none of it matters. Nothing leaves the worktree until the daemon independently confirms the output matches the spec. 3 layers: git hooks catch honest mistakes. Daemon validation catches everything. Gemini review catches semantic drift. No single layer is sufficient. We proved it on test 2. Found this at 10pm in a stress test. Not at 3am in production. That's what tests 2 through 39 are for.

English

Keşfet

@cjzafir @itsPaulAi @techlifejosh @jasondoesstuff @conductor_build @aijoey @Grok440 @om_patel5