hammad 🔍

2.5K posts

@HammadTime

normal considered harmful | cto @trychroma

Berkeley, CA · Joined September 2009

2.7K Following · 2.1K Followers

Pinned Tweet

hammad 🔍 @HammadTime · Feb 15

Last year at @tryramp I laid out three predictions for how language models would evolve. I was trying to clarify which bets might actually be durable over time. A lot of it is now starting to take shape. Here’s an update. Thread 👇

hammad 🔍 @HammadTime · Jul 6

4 years later this technique has now become viable for high quality generation

hammad 🔍 @HammadTime

Used a depth stable diffusion model finetuned on choice images with depth images from Unity to make videos using diffusion.

hammad 🔍 @HammadTime · Jul 5

the European mind cannot comprehend

Gun Drummer @Gundrummer69

These re-enactments are getting out of hand… #4thofjuly #independenceday

hammad 🔍 @HammadTime · Jul 1

SLOP KITCHEN IS BACK IN BUSINESSES WE COOKING WHOLE CODEBASE REFACTORS FOR BREAKFAST

Anthropic @AnthropicAI

We’ve received notice that the Department of Commerce has lifted export controls on Claude Fable 5 and Mythos 5. We'll begin restoring access tomorrow, and will share an update soon. We’re grateful to our users for their patience, and to everyone who worked with us on redeploying the models.

hammad 🔍 @HammadTime · Jun 27

my only wish for you in life is that you get the opportunity to play it grand

hammad 🔍 @HammadTime · Jun 17

Still happening

hammad 🔍 @HammadTime

This diagram from a book written in 1984 has more or less been rediscovered by hundreds of phd students over the last year and something about that is very sad

hammad 🔍 @HammadTime · Jun 17

The framing of model expertise as a cost vs quality tradeoff is one i've been waiting to see formalized. It's a tradeoff I often see people building struggle with indirectly when they ask questions like "should i fine tune my own model?". great work from @jacobli99 and @lateinteraction

Omar Khattab @lateinteraction

Been extremely excited about this work by @jacobli99! We're disappointed in the current ways our agents develop expertise in new domains. Very shallow and hand-engineered! Humans turn reading textbooks or documentation into deep expertise all the time. Why can’t our agents?!

hammad 🔍 @HammadTime · Jun 17

@v1gnesh @trychroma @patpcj no, sorry for the confusing name haha - it's a different model :) trained using a more involved harness

v1gnesh @v1gnesh · Jun 17

@trychroma @patpcj @HammadTime Is this the harness for context-1 ?

Chroma @trychroma · Jun 7

We’re excited to share our latest research led by @patpcj and @HammadTime. Frontier level agentic search performance at 20B parameters.

Patrick Jiang @patpcj

Introducing Harness-1, a 20B search agent trained with a state-externalizing harness.
> frontier-level long-horizon search, rivaling Opus-4.6 and outperforming GPT-5.4
> Context-1-level cost and latency
> externalizes candidates, evidence, verification, and search history
> open-source

hammad 🔍 reposted

Patrick Jiang @patpcj · Jun 8

Thanks again for your interest in our work! Links here so they don’t get buried under “show more”:
Paper 📄: arxiv.org/abs/2606.02373
Code 💻: github.com/pat-jj/harness…
Model 🤗: huggingface.co/pat-jj/harness…
Everything is open. Feel free to star the github repo to bookmark it for later ⭐

hammad 🔍 @HammadTime · Jun 8

👀

Dan Cleary @DanJCleary

Context rot test (using @trychroma's original benchmark). Only up to 256k length, but Opus outperforms GPT 5.5 by a wide margin.

hammad 🔍 @HammadTime · Jun 7

the thing i was most interested to dive into with this work is how much a well-defined harness + RL can boost performance. when you allow yourself to somewhat violate the bitter lesson and do some hand engineering of the harness, you can quickly learn what works. what i am interested in now is how to elicit the known-helpful behaviors that training loops learn from these point-harnesses in general harnesses with more degrees of freedom.

Patrick Jiang @patpcj

[9/N] The ablations were also pretty revealing. When we disable the harness mechanisms, the model does not just lose some information. It changes behavior: more shallow searching, less reading / verification, worse final curation. So the harness is not just engineering glue.

hammad 🔍 @HammadTime · Jun 7

was great to collab with @patpcj on this

Patrick Jiang @patpcj

hammad 🔍 @HammadTime · Jun 5

@0xpunnk 👏👏👏

punn @0xpunnk · Jun 4

Introducing the first AI workforce for hospitality. Hotel guests prefer talking to AI agents. As long as it finishes the job. We've seen it firsthand. Our agents drove higher review scores and satisfaction across over 50 million guest conversations at hotels, resorts, and vacation rentals. Conversion on bookings and upgrades goes up 30%.

hammad 🔍 @HammadTime · Jun 2

@richardartoul do you heavily create plans? that's what flipped it to useful for me.

Richard Artoul @richardartoul · Jun 1

i'm on the verge of giving up on LLMs for code generation

they're game changers for code review, writing tests, and debugging, but I'm starting to think the juice isn't worth the squeeze for the actual code writing 95% of the time

hammad 🔍 @HammadTime · May 28

@TJkrusinski @zeroxjackson i can't do this, trade jobs?

Jackson @zeroxjackson · May 27

As a software engineer, you should be able to solve a LeetCode problem.

hammad 🔍 @HammadTime · May 19

@jerryjliu0 @mintlify ......it depends!

Jerry Liu @jerryjliu0 · May 19

Real question: what is the actual latest state-of-the-art for file search and retrieval?
- Actual grep over filesystem
- Virtualized grep / BM25 over a db (what @mintlify did)
- Vector search over a db
- Hybrid search over a db
- SQL
- none of the above
- some of the above?
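
The first two options in that list differ mainly in whether results are ranked. As a rough illustration of that one axis (not how @mintlify or anyone in this thread implements search), here is a minimal pure-Python sketch. `grep` returns every file that matches, unranked. `bm25` scores files with the Okapi BM25 formula, so term frequency and file length affect the order. The parameters k1=1.5 and b=0.75 are commonly used defaults, assumed here, and the file names and contents are made up.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of ASCII letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def grep(files: dict[str, str], pattern: str) -> list[str]:
    # Regex match: every file that contains the pattern, unranked, in input order.
    regex = re.compile(pattern)
    return [name for name, text in files.items() if regex.search(text)]


def bm25(files: dict[str, str], query: str, k1: float = 1.5, b: float = 0.75) -> list[tuple[str, float]]:
    # Okapi BM25: rank files by query-term frequency, saturated by k1 and
    # normalized by file length relative to the average length (weighted by b).
    docs = {name: tokenize(text) for name, text in files.items()}
    n = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n
    df = Counter()  # number of files containing each term
    for toks in docs.values():
        df.update(set(toks))
    scores = []
    for name, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            # IDF with +1 inside the log so it stays positive (Lucene-style variant).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = 1 - b + b * len(toks) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append((name, score))
    # Highest score first; ties broken by file name.
    return sorted(scores, key=lambda s: (-s[1], s[0]))


files = {
    "auth.py": "def login(user): check password hash for user",
    "db.py": "connect to the database and run queries",
    "README.md": "login flow: the user enters a password, then login completes",
}
print(grep(files, "login"))          # ['auth.py', 'README.md']
print(bm25(files, "user login")[0])  # highest-ranked (file, score) pair
```

Both approaches find the same two files. Only BM25 orders them: the shorter `auth.py` ranks above the longer README even though each file contains both query terms. Vector and hybrid search add semantic matching on top of this, which is part of why "it depends".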

hammad 🔍 @HammadTime · Apr 5

recipe i have wanted to try, mostly limited by how you get it into people's hands
1. humans write N skills
2. collect data traces of execution of N skills
3. use traces to train / generate new tasks to RL on
4. train a LoRA
5. now Skills.MD -> Skill LoRA
6. Give agent a tool to apply LoRA to itself

Dan McAteer @daniel_mac8

Training AI agent skills into a model with RL rather than loading them at inference does make a lot of sense.
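
Steps 4 to 6 of the recipe can be illustrated with the standard LoRA update, W' = W + (α/r)·BA, where A and B are the low-rank adapter matrices and r is the rank. This is a toy, dependency-free sketch that assumes a single weight matrix and only one active skill at a time. `SkillAdapter`, `Agent`, `apply_skill`, and the "refactor" skill are hypothetical names, not part of any library or of Chroma's code. Steps 1 to 3 (collecting traces, generating tasks, RL) are out of scope here.

```python
# Hypothetical sketch: skills stored as LoRA adapters, plus a tool the agent
# can call to merge one into its own weights. One weight matrix, pure Python.
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Naive (m x k) @ (k x n) matrix product.
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


class SkillAdapter:
    """A trained LoRA pair for one skill: A is (r x d_in), B is (d_out x r)."""

    def __init__(self, a: Matrix, b: Matrix, alpha: float) -> None:
        self.a, self.b, self.alpha = a, b, alpha

    def delta(self) -> Matrix:
        # LoRA weight update (alpha / r) * B @ A, where r = number of rows of A.
        scale = self.alpha / len(self.a)
        return [[scale * v for v in row] for row in matmul(self.b, self.a)]


class Agent:
    def __init__(self, base: Matrix) -> None:
        self.base = base
        self.weight = [row[:] for row in base]  # currently active weights
        self.skills: dict[str, SkillAdapter] = {}

    def register_skill(self, name: str, adapter: SkillAdapter) -> None:
        self.skills[name] = adapter

    def apply_skill(self, name: str) -> Matrix:
        # The "tool": merge W' = W + (alpha / r) * B @ A. It always starts
        # from the base weights, so only one skill is active at a time.
        delta = self.skills[name].delta()
        self.weight = [
            [w + d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.base, delta)
        ]
        return self.weight

    def reset(self) -> None:
        # Drop any merged skill and return to the base weights.
        self.weight = [row[:] for row in self.base]


agent = Agent(base=[[1.0, 0.0], [0.0, 1.0]])
# Rank-1 adapter for a hypothetical "refactor" skill.
agent.register_skill("refactor", SkillAdapter(a=[[1.0, 2.0]], b=[[1.0], [0.0]], alpha=2.0))
print(agent.apply_skill("refactor"))  # [[3.0, 4.0], [0.0, 1.0]]
```

Merging from the base weights each time keeps the tool idempotent and easy to undo with `reset`. Stacking several skills at once would need a different policy, such as summing deltas, which this sketch does not attempt.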

hammad 🔍 @HammadTime · Apr 5

@prometx3 @trychroma targeting next 1-2 weeks

dave jan @prometx3 · Apr 4

@HammadTime @trychroma @HammadTime is there any timeline on the harness release? Have been experimenting with a custom-built one and tried to stay as close as possible to the article, but the model keeps breaking on harder retrievals after some turns (it starts to repeat random sentences/words).

hammad 🔍 @HammadTime · Mar 30

My favorite part of working on the @trychroma Context-1 report was how easy interactive explanations have become with AI coding. As a longtime fan of sites like explorabl.es and ciechanow.ski, I find the barrier to quickly iterating on and building interactive explainers is now so absurdly low. No excuse for any developer-facing company not to invest in these.

hammad 🔍 @HammadTime · Apr 5

if you want some high quality weekend watching youtube.com/watch?v=4u8iMr…
