

kukeshajanth kodeswaran
@kukeshajanth
Building ⬛⬜⬜⬜⬜
Toronto · Joined August 2020
442 Following · 48 Followers

kukeshajanth kodeswaran retweeted

📣 Today, I'm excited to walk you through Unity's NEW AI offering, Meta MCP Extensions, and agentic tools to demonstrate how we can Build a Full VR/MR Game from start to finish using these AI tools in a practical way.
🎥 Full video available at: youtu.be/bWxIF903t_I
📌 Here’s what I’m covering today:
- Unity VR project setup (with OpenXR plugins)
- Installing the Unity AI Assistant and demos
- Configuring Unity MCP + Meta MCP Extensions
- Configuring Claude Code & MCPs (a config sketch follows below)
- Building a VR/MR Basketball Game with the Unity AI Assistant, Claude Agent, and external Claude Code CLI
- A lot of iteration with Claude Code + the Meta XR Simulator
💡Also, it’s been a while since I've posted a new video, and I’m genuinely excited to be back, especially with a topic like this that I know many devs have been waiting for.
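
For the Claude Code side of that setup, here is a minimal sketch of registering an MCP server via a project-level .mcp.json, which Claude Code reads; the server name and launch command are placeholder assumptions, not the actual Unity MCP package.

```python
import json
import pathlib

# Hypothetical .mcp.json for Claude Code. The "mcpServers" key is the
# format Claude Code looks for; the server name and launch command
# below are placeholders, not the real Unity MCP entry point.
config = {
    "mcpServers": {
        "unity": {
            "command": "python",
            "args": ["-m", "unity_mcp_server"],  # assumed entry point
        }
    }
}
pathlib.Path(".mcp.json").write_text(json.dumps(config, indent=2))
print("Wrote .mcp.json; restart Claude Code to pick it up.")
```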


@BoazWith True, really impressed with the overall capabilities even outside coding. This feels like an Opus 4.5-level change. Can't get enough of this workflow. Redoing most of the existing workflows with Codex, feeling more confident. 🚀

@kukeshajanth That spec/implementation split is the useful part. Codex gets much calmer when the weird decisions are already named instead of hidden inside a chat prompt.

kukeshajanth kodeswaran retweeted

UPDATE for imagegen-frontend-web skill
more "creative" outputs
- different layouts
- better understanding of the request
skill:
github.com/Leonxlnx/taste…





@NickADobos Hadn't thought of it this way, kudos. For a given time budget, GPT 5.5 low with multiple passes might be better, I guess. Did you happen to find anything concrete in these tests?

What’s the difference between
GPT 5.5 low reasoning + /goal
vs.
GPT 5.5 xhigh reasoning + one shot
Both are essentially yeeting compute at a task.
But which one
- is more efficient?
- works better & produces better results?
- finishes the task?
Seems like the major difference is that low would spend less time thinking between each step, and would do way more tool calls because of it?
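
A back-of-envelope way to frame that trade-off, with every number below being a made-up assumption rather than a measurement:

```python
# All numbers are invented for illustration. "low + /goal" loops many
# short thinking bursts, each followed by a tool call; "xhigh one-shot"
# spends one long deliberation, then acts once.
low_steps = 40            # assumed iterations in the /goal loop
low_think = 500           # assumed reasoning tokens per step
tool_call = 300           # assumed tokens per tool call + result
xhigh_think = 30_000      # assumed reasoning tokens for one deep pass

low_total = low_steps * (low_think + tool_call)
xhigh_total = xhigh_think + tool_call  # single pass, single action

print(f"low + /goal : ~{low_total:,} tokens, {low_steps} tool calls")
print(f"xhigh 1-shot: ~{xhigh_total:,} tokens, 1 tool call")
# Under these assumptions the token totals are close; the difference is
# where compute lands: feedback from many tool calls vs. upfront thinking.
```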
kukeshajanth kodeswaran retweeted

If you are interested in state-of-the-art finetuning tips and tricks:
Shopify Engineering @ShopifyEng
We reverse-engineered training data from thousands of merchant-created automations and fine-tuned Qwen3-32B into a tool-calling agent for Shopify Flow. Results: 2.2x faster, 68% cheaper. The more interesting part: why we trained on Python instead of our own DSL, and what broke when benchmarks looked good but production didn't. ⬇️
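
To make the "trained on Python instead of our own DSL" point concrete, here is a hypothetical illustration of that target-format choice; the helper names and the DSL shape are invented, not Shopify's actual formats.

```python
# Hypothetical sketch of the training-target choice (all names invented).
# A proprietary-DSL target is a format the base model has never seen,
# e.g. {"action": "flow.tag_customer", "params": {"tag": "vip"}},
# while a Python target reuses syntax Qwen3-32B already knows well.

def tag_customer(customer: dict, tag: str) -> None:
    customer.setdefault("tags", []).append(tag)   # stub tool

def send_email(customer: dict, template: str) -> None:
    print(f"email {customer['email']} using {template}")  # stub tool

# What the fine-tuned model would be asked to emit: plain Python calls.
def automation(order: dict, customer: dict) -> None:
    if order["total_price"] > 500:
        tag_customer(customer, "vip")
        send_email(customer, template="vip_welcome")

automation({"total_price": 750}, {"email": "a@example.com"})
```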
kukeshajanth kodeswaran retweeted

Okay yeah browser-harness is AGI
github.com/browser-use/br…
kukeshajanth kodeswaran retweeted

Introducing ml-intern, the agent that just automated the post-training team @huggingface
It's an open-source implementation of the real research loop that our ML researchers do every day. You give it a prompt, it researches papers, goes through citations, implements ideas in GPU sandboxes, iterates and builds deeply research-backed models for any use case. All built on the Hugging Face ecosystem.
It can pull off crazy things:
We made it train the best model for scientific reasoning. It went through citations from the official benchmark paper. Found OpenScience and NemoTron-CrossThink, added 7 difficulty-filtered dataset variants from ARC/SciQ/MMLU, and ran 12 SFT runs on Qwen3-1.7B. This pushed the score 10% → 32% on GPQA in under 10h. Claude Code's best: 22.99%.
In healthcare settings it inspected available datasets, concluded they were too low quality, and wrote a script to generate 1100 synthetic data points from scratch for emergencies, hedging, multilingual, etc., then upsampled 50x for training. Beat Codex on HealthBench by 60%.
For competitive mathematics, it wrote a full GRPO script, launched training with A100 GPUs on hf.co/spaces, watched rewards climb and then collapse, and ran ablations until it succeeded. All fully backed by papers, autonomously.
How does it work?
ml-intern makes full use of the HF ecosystem:
- finds papers on arxiv and hf.co/papers, reads them fully, walks citation graphs, pulls datasets referenced in methodology sections and on hf.co/datasets
- browses the Hub, reads recent docs, inspects datasets and reformats them before training so it doesn't waste GPU hours on bad data
- launches training jobs on HF Jobs if no local GPUs are available, monitors runs, reads its own eval outputs, diagnoses failures, retrains
ml-intern deeply embodies how researchers work and think. It knows what data should look like and what good models feel like.
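
A skeletal version of that loop, with every step stubbed out; nothing below is ml-intern's actual API, only the shape of the paper → data → train → eval cycle described above.

```python
# Skeleton of the research loop described above. Only the shape is
# real; every function here is a stub, not ml-intern's actual API.

def find_papers(topic: str) -> list[str]:
    return [f"arxiv:0000.00000 ({topic})"]   # stub: arxiv / hf.co/papers

def datasets_from_citations(papers: list[str]) -> list[str]:
    return ["cited-dataset"]                 # stub: walk citation graphs

def train(dataset: str, run: int) -> str:
    return f"model-run{run}"                 # stub: SFT/GRPO via HF Jobs

def evaluate(model: str) -> float:
    return 0.10 + 0.02 * int(model[-1])      # stub: read own eval output

def research_loop(topic: str, runs: int = 3) -> tuple[str | None, float]:
    papers = find_papers(topic)
    best, best_score = None, 0.0
    for run, dataset in enumerate(datasets_from_citations(papers) * runs):
        model = train(dataset, run)
        score = evaluate(model)
        if score > best_score:               # keep the best, iterate on the rest
            best, best_score = model, score
    return best, best_score

print(research_loop("scientific reasoning"))
```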
Releasing it today as a CLI and a web app you can use from your phone/desktop.
CLI: github.com/huggingface/ml…
Web + mobile: huggingface.co/spaces/smolage…
And the best part? We've also provisioned $1k of GPU resources and Anthropic credits for the quickest among you to use.
kukeshajanth kodeswaran retweeted

🥳We just open-sourced Cube Sandbox! An instant, concurrent, secure and lightweight sandbox runtime for AI Agents.
Built with RustVMM and KVM, it achieves the perfect balance of security and performance:
→ Sub-60ms cold start (2.5-50x faster)
→ Under 5MB memory overhead per instance (6x less memory)
→ Dedicated kernel per sandbox (hardware-level isolation)
→ Thousands of concurrent sandboxes per node
→ 100% E2B SDK compatible. Swap the endpoint, zero code changes (see the sketch below)
Full-stack capability, one-click deployment. 3 steps to spin up your own private AI sandbox 👇 🔗
github.com/TencentCloud/C…
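
Here is a sketch of what that E2B-compatibility claim would look like in practice, assuming the e2b_code_interpreter Python SDK and assuming Cube honors E2B's domain override; the endpoint below is a placeholder.

```python
import os

# Assumption: Cube accepts E2B's domain override, so pointing the SDK at
# a self-hosted endpoint is the only change. The domain is a placeholder.
os.environ["E2B_DOMAIN"] = "cube.example.internal"

from e2b_code_interpreter import Sandbox  # unchanged E2B SDK code below

sbx = Sandbox()                     # cold start would land on Cube
execution = sbx.run_code("print(6 * 7)")
print(execution.logs)
sbx.kill()
```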
kukeshajanth kodeswaran retweeted

🔥 DFlash x MLX is happening!
Shoutout to @aryagm01 for the early work on this. We're building on the momentum. Native MLX support, more models (Qwen3.5), up to 4x faster. Lossless!
👉 github.com/z-lab/dflash

kukeshajanth kodeswaran retweeted

🚀Introducing Motus, the open-source agent infrastructure that learns in production.
Existing agent infra serves static agents: the harness, model, and workflow are fixed after deployment. But static agents degrade over time. The harness goes stale, new models go unincorporated, context drifts, and latency compounds.
Motus closes this gap by learning from every trace (failures, latency, cost, and task outcomes) and using those signals to continuously optimize the agent harness, model orchestration, context memory, and end-to-end latency.
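
To make "learning from every trace" concrete, here is an illustrative sketch of the kind of per-trace record and routing update such a system could run on; none of this is Motus's actual schema or API.

```python
# Illustrative only, not Motus's schema: each finished task leaves a
# trace, and routing then favors the model with the best success-per-dollar.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Trace:
    model: str
    latency_s: float
    cost_usd: float
    succeeded: bool

def pick_model(traces: list[Trace]) -> str:
    stats = defaultdict(lambda: [0, 0.0])    # model -> [wins, spend]
    for t in traces:
        stats[t.model][0] += int(t.succeeded)
        stats[t.model][1] += t.cost_usd
    return max(stats, key=lambda m: stats[m][0] / max(stats[m][1], 1e-9))

traces = [
    Trace("frontier-a", 42.0, 0.30, True),
    Trace("frontier-a", 55.0, 0.35, False),
    Trace("small-b", 20.0, 0.05, True),
]
print(pick_model(traces))  # small-b: cheaper per success so far
```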
Early results: higher accuracy than any single frontier model at 2.3× lower cost (Terminal-Bench 2.0, SWE-bench Verified), with 52% lower latency and 45% better memory recall.
Open source under Apache 2.0. Works with any agent SDK. Deploy with one command.
github.com/lithos-ai/motus
lithosai.com


@gajesh Small models and CPU/edge inference are the combo that could take over the majority of the current loads/tasks being performed on GPUs. As long as the task is within a defined scope and repeatable, we can optimize this combo to get pretty good performance per dollar.
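
A toy perf-per-dollar comparison of the kind this reply is gesturing at; all figures are invented assumptions, not benchmarks:

```python
# All numbers are invented assumptions, not benchmarks: a small model on
# commodity CPUs vs. a larger one on a rented GPU, for a fixed,
# repeatable task where the small model is good enough.
cpu_tok_per_s, cpu_cost_per_h = 25, 0.10    # assumed small model on CPU
gpu_tok_per_s, gpu_cost_per_h = 400, 2.50   # assumed big model on GPU

cpu_tok_per_dollar = cpu_tok_per_s * 3600 / cpu_cost_per_h
gpu_tok_per_dollar = gpu_tok_per_s * 3600 / gpu_cost_per_h

print(f"CPU/edge: {cpu_tok_per_dollar:,.0f} tokens per dollar")
print(f"GPU:      {gpu_tok_per_dollar:,.0f} tokens per dollar")
# Under these assumptions the CPU combo wins on tokens/$ whenever the
# task stays inside the small model's scope.
```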
kukeshajanth kodeswaran retweeted

Wake the world's sleeping compute.
Look at the Mac nearest to you. What's it doing?
Probably nothing.
There are 100M+ Macs with Apple Silicon out there. Apple quietly made them *really* good at inference. A $3k Mac runs a 60B model at 30 watts.
Most sit idle most of the day.
Meanwhile every AI API call passes through three layers of margin before reaching the hardware. We call this the Inference Tax.
We got curious: what happens if you connect idle Macs directly to inference demand?
This is Darkbloom. Private inference network for idle Macs.
darkbloom [dot] dev -- paper + code open.
Reply for invite + free credits ↓



