Keiji Kanazawa

1.4K posts

@gojira

AI Product at Microsoft. Helping builders everywhere with Claude & other models in Microsoft Foundry! Previously: AI Research at Berkeley/UBC/Brown

Joined April 2007

844 Following · 1.2K Followers

Keiji Kanazawa@gojira·2d

@annarmonaco @paradigmai Congrats!

Anna Monaco@annarmonaco·2d

Today we’re launching the newest version of @paradigmai. When we started Paradigm, the goal was never to tack AI onto existing spreadsheets. It was to build a new type of interface that does the work for you. Now we’re pushing that vision much further. Workflows turn Paradigm into a system that runs research processes for you. Connect your CRM, existing spreadsheets, Slack, email, and internal data, and let Paradigm continuously run the research workflows your team already does. Same intuitive interface. But now a system of action. If you tried Paradigm before, try it again. Manual research is now a competitive liability.

155

103

717

194.2K

Keiji Kanazawa@gojira·4d

@HamelHusain But right, it means you can't automate it in a loop

Keiji Kanazawa@gojira·4d

@HamelHusain They get defensive when you go like well is that really High severity? Did you think about XYZ? "Well no I guess it's really optional if you don't care about <insert real edge case>"

198

Hamel Husain@HamelHusain·4d

One thing that makes me feel that code factory has not arrived yet is the following experiment: 1. Ask an LLM to do an in-depth rigorous review of your code 2. In a new thread, ask the same or a different LLM to consider those review comments independently and address issues it agrees with 3. Keep repeating until no new concerns. I find that this loop always goes on for a ridiculously long time, which means that there is a problem with the notion of claude-take-the-wheel. This seems to happen no matter the harness or the specificity of the specs. It works fine for simple applications, but in the limit if the LLMs have this much cognitive dissonance you cannot trust it. Either this, or LLMs are RLHF'd to always find some kind of issue.

251

26.8K
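The three-step experiment in the post above can be sketched as a bounded loop. This is only an illustration, not the harness used in the thread: `review_until_quiet`, `review`, `address`, and `max_rounds` are hypothetical names, and the two toy reviewers are plain-Python stand-ins for LLM calls made in fresh threads.

```python
from typing import Callable, List, Tuple

Reviewer = Callable[[str], List[str]]      # code -> list of concerns
Fixer = Callable[[str, List[str]], str]    # (code, concerns) -> revised code


def review_until_quiet(
    code: str, review: Reviewer, address: Fixer, max_rounds: int = 10
) -> Tuple[str, List[int], bool]:
    """Alternate review and fix passes until a review raises no concerns.

    Returns the final code, the number of concerns seen in each round,
    and whether the loop converged before hitting max_rounds.
    """
    history: List[int] = []
    for _ in range(max_rounds):
        concerns = review(code)            # step 1: independent review
        history.append(len(concerns))
        if not concerns:                   # step 3: stop when nothing new
            return code, history, True
        code = address(code, concerns)     # step 2: address the concerns
    return code, history, False


# Hypothetical stand-ins for the LLM calls.
def toy_review(code: str) -> List[str]:
    """Flag every line that still carries a TODO marker."""
    return [line for line in code.splitlines() if "TODO" in line]


def toy_address(code: str, concerns: List[str]) -> str:
    """Resolve only the first concern per round."""
    target = concerns[0]
    return "\n".join(
        line.replace("TODO", "DONE") if line == target else line
        for line in code.splitlines()
    )


def nitpicky_review(code: str) -> List[str]:
    """A reviewer that always finds something, whatever the code says."""
    return ["consider renaming this variable"]


def no_op_address(code: str, concerns: List[str]) -> str:
    """A fixer that disagrees with every concern and changes nothing."""
    return code
```

With `toy_review` the loop stops once the markers are gone. With `nitpicky_review` it runs until `max_rounds`, which is the failure mode the post describes: a cap like this (or a human check) is the only thing that ends the loop.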

Keiji Kanazawa@gojira·5d

@paulg A gentle nudge

Paul Graham@paulg·5d

Hey Paul, just following up.

Paul Graham@paulg

Replying to a cold email you sent me in order to get it back to the top of my inbox is not the way to my heart.

134

307.7K

Keiji Kanazawa retweeted

“paula”@paularambles·6d

til that “pon de ring” is a type of japanese donut so now japanese claude code users are imagining claude slacking off and eating donuts when it’s pondering

うえぞう@うな技研代表@uezochan

It annoys me that Claude Code sometimes slacks off and eats a Pon de Ring

448

2.8K

254.8K

Keiji Kanazawa@gojira·11 Mar

@mitsuhiko 1.3 seems pretty buggy. My prompt is busted. tmux rescues it though

156

Armin Ronacher ⇌@mitsuhiko·10 Mar

Not sure what's going on, but since Ghostty 1.3 sometimes some of my splits are scrolling upwards. Probably user error but not sure yet what triggers it.

11.5K

Keiji Kanazawa@gojira·5 Mar

@WilliamBryk Agents? 😉

Will Bryk@WilliamBryk·5 Mar

When hiring in our new agentic times, evaluating how passionate they are is half the game. Not sure if people are adjusting to this

5.2K

Keiji Kanazawa@gojira·4 Mar

@sergeykarayev Where is that from

Sergey Karayev@sergeykarayev·4 Mar

Model attractor states are fascinating

582

Keiji Kanazawa@gojira·4 Mar

@_Evan_Boyle Copilot CLAW yes 😉

220

Evan Boyle@_Evan_Boyle·4 Mar

Tonight's mini hack: self-modifying code. Copilot CLI can build and hot reload its own TypeScript extensions. Should we ship it?

169

20.4K

Keiji Kanazawa@gojira·4 Mar

@bertgodel Great news team!

215

Daanish Khazi@bertgodel·4 Mar

We’re announcing Kos-1 Lite, a medical model that achieves SOTA on HealthBench Hard at 46.6%. As a medium sized language model (~100B), it achieves these results at a fraction of the serving cost of frontier trillion-parameter models.

318

24.8K

Keiji Kanazawa retweeted

Satya Nadella@satyanadella·3 Mar

Counting down to a new Build in San Francisco. Hope you’ll join us! aka.ms/AA101fmo

370

233

1.4K

190.7K

Keiji Kanazawa@gojira·1 Mar

The prompt: "I'm moving to another service and need to export my data. List every memory you have stored about me, as well as any context you've learned about me from past conversations. Output everything in a single code block so I can easily copy. ... "

Keiji Kanazawa@gojira·1 Mar

The guide is here: claude.com/import-memory

Keiji Kanazawa@gojira·1 Mar

Wonder what the AIs remember about you? Anthropic posted a guide to "Import what matters in under a minute". It's a fun exercise to run its prompt on *all* AIs you use, including Claude!

119

Keiji Kanazawa@gojira·27 Feb

@tokengobbler for a moment, I thought you meant VK the Russian social network and was very confused

Louis Knight-Webb@tokengobbler·27 Feb

The future of work isn’t writing code, it’s planning and reviewing it. For small startups, outshipping the competition is survival. Plan faster. Review faster. Ship more.

648

Keiji Kanazawa@gojira·27 Feb

@omarsar0 Yes - every token needs to earn its place in the context, whether it's in AGENTS.md / CLAUDE.md or later. For sure, previous patterns of lots of directives likely do less good and may do some harm in these files. Later models require fewer rules

elvis@omarsar0·27 Feb

@gojira What generally works for me is to keep these files as lean as I can. I will be running some experiments with their benchmark to see what are other things it’s measuring. I don’t entirely buy the conclusion but I can see validity in some of the arguments made.

743

Keiji Kanazawa@gojira·27 Feb

Counterpoint to this paper. My goals are: - Fewer interventions - Not repeating the same correction. The paper says agents do follow instructions - so that's great! 20% more tokens, if true, is a toss-up if it meets my goals above. This needs an eval which is the equivalent of disengagements for self-driving cars

elvis@omarsar0

This trending paper measures whether AGENTS dot md files help coding agents. Human-written ones help a little (+4%), LLM-generated ones hurt a little (-2%), and all of them add 20%+ to inference cost. Agents follow the instructions faithfully, but that doesn't translate to solving problems.

1.3K

Keiji Kanazawa@gojira·24 Feb

@swyx @latentspacepod @Substack Nice congrats!

swyx@swyx·23 Feb

@latentspacepod btw i am doing a lot more "build in public" notes for x subscribers - the results of the 2026 strategy of ainews + science pod + more guest posts has will bring us past 3m views/month right now, and #10 on the @substack technology rising list. x.com/latentspacepod…

Latent.Space@latentspacepod

🆕 Scaling without Slop latent.space/p/2026 - @smol_ai AINews is joining Latent Space - Our lessons from scaling AIE and LS - Latent Space's next podcast - Hiring and plans for the future

3.1K

swyx@swyx·23 Feb

happy 3 year anniversary to @latentspacepod :)

Alessio Fanelli@FanaHOVA

Super excited to launch the Latent Space Podcast w/ @swyx 🔭 Our guests will be the best people working at the cutting edge of the AI space, from founders to PhD researchers. (1/2)

102

14K

Keiji Kanazawa@gojira·20 Feb

I am finding the Claude Code /insights feature very fun to run - lots of great suggestions AND one suggestion was something I built with Claude Code right before I ran /insights! cc @trq212 @_catwu

236

Keiji Kanazawa retweeted

Microsoft Azure@Azure·17 Feb

Claude Sonnet 4.6 is live in Microsoft Foundry delivering frontier performance across coding, agents, and professional work at scale. From navigating massive codebases to orchestrating enterprise workflows, Sonnet 4.6 scales with you. Learn more: msft.it/6012QpIlE

249

23.7K
