Pinned Tweet
omg is ed!!
9.7K posts


@JayFrydman @agenticasdk there's a big difference between CoT and agentica SDK and you know that
all we are asking for is a benchmark without the harness

@chillMSR @agenticasdk Chain-of-thought models directly served through the API also use a harness

@concernedAIguy @flowersslop @agenticasdk no, it was done with a cheat sheet
Opus, Gemini and GPT didn't have a harness

@flowersslop @agenticasdk I am confused. Does this result matter a lot or no?

@kode11 @agenticasdk thing is, it's just unfair, benchmarks are supposed to give you the raw performance of the model
the better the model is without a harness, the better it will be with one. obviously it cost less, it already had most of the answers

@agenticasdk The harness debate in the replies is missing the point. In real-world applications, you're always wrapping models in orchestration layers. The interesting number here is cost efficiency — $1k vs $8.9k for raw Opus. That's nearly 9x leverage from good agentic scaffolding.

@DavidLeviKatz @SenorCurioATX @AlexCorrino @OfficialLoganK i'm pretty sure it will but if it launches in preview mode then i'd say it will only release after preview finishes