timi the chef 👨🏾‍🍳

9.5K posts

@timithechef

vibecoder • ai tinkerer @docsyde_ai @norebase • retired gen z • brain daddy @usetrakka

building · Joined January 2021
1.6K Following · 2K Followers
Pinned Tweet
timi the chef 👨🏾‍🍳 @timithechef
A few thousand commits later and @docsyde_ai now has MRR from our first customer. The future of sales ops, today, with AI agents
4 replies · 13 reposts · 30 likes · 5.6K views
victory @nnvictory001
Codex CLI is insane bro wtf
11 replies · 1 repost · 37 likes · 4.1K views
timi the chef 👨🏾‍🍳 @timithechef
Been tinkering on a hobby project called Enkii, an OSS AI code reviewer. Mostly because I wanted a Devin/Greptile/CodeRabbit-style UX, but more hackable. Tbh, the idea of paying $30 per engineer for the OG review tools made my head hurt. Since like 99% of my commits everywhere are AI-gen, code review has become non-negotiable, hence Enkii: enkii.timi.click
6 replies · 12 reposts · 24 likes · 1.6K views
timi the chef 👨🏾‍🍳 @timithechef
Trivia: the codebase is 100% AI-written. I just manage feature specs and requirements and keep quality from going off the rails. It's part of my fun little experiment of pushing Agentic Engineering to its limits.
1 reply · 0 reposts · 0 likes · 126 views
L @lanreadelowo
😭😭😭😭😭😭😭 AI wan kill me man
18 replies · 0 reposts · 34 likes · 8.2K views
timi the chef 👨🏾‍🍳 retweeted
Sergey Nazarov @sergeynazarovx
We used to go to a special website, ask strangers for help with programming, and get humiliated in return
303 replies · 3.5K reposts · 39.4K likes · 850.1K views
timi the chef 👨🏾‍🍳 @timithechef
Amazing
Artificial Analysis @ArtificialAnlys

Announcing the Artificial Analysis Coding Agent Index! Our new coding agent benchmarks measure how combinations of agent harnesses and models perform on 3 leading benchmarks, plus token usage, cost, and more.

When developers use AI to code they're choosing a model, but also pairing it with a specific harness. It makes sense to benchmark that combination to understand and compare performance. The Artificial Analysis Coding Agent Index includes 3 leading benchmarks that represent a broad spectrum of coding agent use:

➤ SWE-Bench-Pro-Hard-AA: 150 realistic coding tasks that frontier models struggle with, sampled from Scale AI's SWE-Bench Pro
➤ Terminal-Bench v2: 84 agentic terminal tasks from the Laude Institute that range from system administration and cryptography to machine learning. 5 tasks were filtered out due to environment incompatibility
➤ SWE-Atlas-QnA: 124 technical questions developed by Scale AI about how code behaves, root causes of issues, and more, requiring agents to explore codebases and give text answers

Analysis of results:

➤ Opus 4.7 and GPT-5.5 lead the Index: Opus 4.7 in Cursor CLI scores 61, followed closely by GPT-5.5 in Codex and Opus 4.7 in Claude Code at 60. GPT-5.5 in Cursor CLI follows at 58.
➤ Open-weights models are competitive but still trail the leaders: GLM-5.1 in Claude Code is the top open-weights result at 53, followed by Kimi K2.6 and DeepSeek V4 Pro in Claude Code at 50. These are strong results, but still meaningfully behind the top proprietary models.
➤ Gemini 3.1 Pro in Gemini CLI underperforms: it scores 43, well below where Gemini 3.1 Pro sits on our Intelligence Index, highlighting that Gemini's performance in Gemini CLI remains a relative weak spot for Google's offering.
➤ Cost per task (API token pricing) varies >30x: Composer 2 in Cursor CLI is cheapest at $0.07/task, followed by DeepSeek V4 Pro in Claude Code at $0.35/task and Kimi K2.6 in Claude Code at $0.76/task. At the high end, GPT-5.5 in Codex costs $2.21/task, while GLM-5.1 in Claude Code costs $2.26/task. For both models this was driven by high token usage, and in GPT-5.5's case also by a relatively higher per-token cost.
➤ Token usage varies >3x: GLM-5.1 in Claude Code uses the most tokens at 4.8M/task, followed by Kimi K2.6 at 3.7M/task and DeepSeek V4 Pro at 3.5M/task. GPT-5.5 in Codex uses 2.8M tokens/task, substantially more than Opus 4.7 in Claude Code at 1.7M/task. In GLM-5.1's case, higher token usage, cost, and execution time were partly driven by the model entering loops on some tasks.
➤ Cache hit rates remain high but vary materially: they range from 80% to 96% across combinations. Provider routing, harness prompt structure, and cache behavior can materially change the economics of running the same model, given cached inputs are typically <50% the API price of regular input tokens.
➤ Time per task varies >7x: Opus 4.7 in Claude Code is fastest at ~6 minutes/task, while Kimi K2.6 in Claude Code is slowest at ~40 minutes/task. This is driven by differences in average turns per task, token usage, and API serving speed. Opus 4.7 needed materially fewer turns to complete a task than all other models, while Kimi K2.6 needed the most.
➤ Cursor made real progress with Composer 2: it scores 48 in Cursor CLI, near the leading open-weights results, while being the cheapest combination measured at $0.07/task. Cursor has stated Composer 2 is built from Kimi K2.5, showing they have made substantial post-training gains.
This is just the start. We are planning to add additional agents (both harnesses and models). Let us know what you would like to see added next.
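The cache-economics point in the thread reduces to simple blended-cost arithmetic. A minimal sketch follows; all token volumes and prices are made-up illustrations, and the 0.5 cache discount is an assumption standing in for the thread's "<50% of the API price" figure:

```python
# Rough sketch of the cache economics described in the thread above.
# All numbers are illustrative assumptions, not measured values; the 0.5
# cache discount stands in for "cached inputs are typically <50% the API
# price of regular input tokens".

def blended_input_cost(tokens_m: float, price_per_m: float,
                       cache_hit_rate: float, cache_discount: float = 0.5) -> float:
    """Blended input cost for one task, given the fraction of tokens served from cache."""
    cached = tokens_m * cache_hit_rate * price_per_m * cache_discount
    uncached = tokens_m * (1.0 - cache_hit_rate) * price_per_m
    return cached + uncached

# Same token volume and price, at the two ends of the quoted 80-96% hit-rate range:
for hit_rate in (0.80, 0.96):
    cost = blended_input_cost(tokens_m=3.0, price_per_m=1.0, cache_hit_rate=hit_rate)
    print(f"cache hit rate {hit_rate:.0%}: ${cost:.2f} input cost per task")
# cache hit rate 80%: $1.80 input cost per task
# cache hit rate 96%: $1.56 input cost per task
```

Under these assumptions, moving from an 80% to a 96% hit rate cuts the blended input cost by roughly 13%, which is why the thread flags provider routing and harness prompt structure as material to the economics.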

0 replies · 0 reposts · 0 likes · 108 views
timi the chef 👨🏾‍🍳 retweeted
Ab. @Abiodun0x
I think it's much easier to start an Applied AI startup today than a year ago. The frontier models are mature, and the open-source models are catching up in performance. You have successful Applied AI startups like Lovable, Manus, and Cursor to learn from as well. Some of these didn't exist a year ago. You are not starting from scratch.
1 reply · 3 reposts · 16 likes · 1K views
timi the chef 👨🏾‍🍳 @timithechef
Need a senior engineer to roast my vibecoded backend for my small MRR startup @docsyde_ai. Haven't had impostor syndrome in a minute and I don't like it 😔
0 replies · 1 repost · 1 like · 631 views