Keiji Kanazawa

1.4K posts

@gojira

AI Product at Microsoft. Helping builders everywhere with Claude & other models in Microsoft Foundry! Previously: AI Research at Berkeley/UBC/Brown

Joined April 2007
844 Following · 1.2K Followers
Anna Monaco
Anna Monaco@annarmonaco·
Today we’re launching the newest version of @paradigmai. When we started Paradigm, the goal was never to tack AI onto existing spreadsheets. It was to build a new type of interface that does the work for you. Now we’re pushing that vision much further. Workflows turn Paradigm into a system that runs research processes for you. Connect your CRM, existing spreadsheets, Slack, email, and internal data, and let Paradigm continuously run the research workflows your team already does. Same intuitive interface. But now a system of action. If you tried Paradigm before, try it again. Manual research is now a competitive liability.
English
155
103
717
194.2K
Keiji Kanazawa
Keiji Kanazawa@gojira·
@HamelHusain They get defensive when you go like well is that really High severity? Did you think about XYZ? "Well no I guess it's really optional if you don't care about <insert real edge case>"
English
1
0
0
198
Hamel Husain
Hamel Husain@HamelHusain·
One thing that makes me feel that code factory has not arrived yet is the following experiment:
1. Ask an LLM to do an in-depth rigorous review of your code
2. In a new thread, ask the same/different LLM to consider those review comments independently and address issues it agrees with
3. Keep repeating until no new concerns
I find that this loop always goes on for a ridiculously long time, which means that there is a problem with the notion of claude-take-the-wheel. This seems to happen no matter the harness or the specificity of the specs. It works fine for simple applications, but in the limit if the LLMs have this much cognitive dissonance you cannot trust them. Either this, or LLMs are RLHF'd to always find some kind of issue.
English
70
10
251
26.8K
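A rough sketch of the review loop described in the post above, in Python. The `review` and `address` callables are hypothetical stand-ins for "ask an LLM to review" and "ask an LLM in a fresh thread to fix what it agrees with"; no real model API is assumed, and the `max_rounds` cap is an added assumption so that the loop stops even if the reviewer never runs out of concerns.

```python
from typing import Callable, List, Set, Tuple


def review_until_quiet(
    code: str,
    review: Callable[[str], List[str]],        # hypothetical: returns review concerns
    address: Callable[[str, List[str]], str],  # hypothetical: returns revised code
    max_rounds: int = 20,                      # assumed safety cap, not from the post
) -> Tuple[str, int, bool]:
    """Alternate review and fix passes until the reviewer raises nothing new.

    Returns (final_code, rounds_used, converged).
    """
    seen: Set[str] = set()
    for round_no in range(1, max_rounds + 1):
        concerns = review(code)
        # Only concerns that were not raised in an earlier round count as new.
        new = [c for c in concerns if c not in seen]
        if not new:
            return code, round_no, True
        seen.update(new)
        code = address(code, new)
    # The cap was hit while the reviewer was still producing new concerns.
    return code, max_rounds, False
```

The number of rounds used, and whether the loop converged at all, is the quantity the post is complaining about: a reviewer that always finds something new will hit the cap every time.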
Keiji Kanazawa
Keiji Kanazawa@gojira·
@mitsuhiko 1.3 seems pretty buggy. My prompt is busted. tmux rescues it though
English
0
0
0
156
Armin Ronacher ⇌
Armin Ronacher ⇌@mitsuhiko·
Not sure what's going on, but since Ghostty 1.3 sometimes some of my splits are scrolling upwards. Probably user error but not sure yet what triggers it.
English
12
0
32
11.5K
Will Bryk
Will Bryk@WilliamBryk·
When hiring in our new agentic times, evaluating how passionate they are is half the game. Not sure if people are adjusting to this
English
16
0
60
5.2K
Sergey Karayev
Sergey Karayev@sergeykarayev·
Model attractor states are fascinating
Sergey Karayev tweet media
English
1
1
11
582
Evan Boyle
Evan Boyle@_Evan_Boyle·
Tonight's mini hack: self-modifying code. Copilot CLI can build and hot reload its own TypeScript extensions. Should we ship it?
English
26
13
169
20.4K
Daanish Khazi
Daanish Khazi@bertgodel·
We’re announcing Kos-1 Lite, a medical model that achieves SOTA on HealthBench Hard at 46.6%. As a medium sized language model (~100B), it achieves these results at a fraction of the serving cost of frontier trillion-parameter models.
Daanish Khazi tweet media
English
40
59
318
24.8K
Keiji Kanazawa reposted
Satya Nadella
Satya Nadella@satyanadella·
Counting down to a new Build in San Francisco. Hope you’ll join us! aka.ms/AA101fmo
English
370
233
1.4K
190.7K
Keiji Kanazawa
Keiji Kanazawa@gojira·
The prompt: "I'm moving to another service and need to export my data. List every memory you have stored about me, as well as any context you've learned about me from past conversations. Output everything in a single code block so I can easily copy. ... "
English
0
0
0
50
Keiji Kanazawa
Keiji Kanazawa@gojira·
Wonder what the AIs remember about you? Anthropic posted a guide to "Import what matters in under a minute". It's a fun exercise to run its prompt on *all* AIs you use including Claude!
English
1
0
1
119
Keiji Kanazawa
Keiji Kanazawa@gojira·
@tokengobbler for a moment, I thought you meant VK the Russian social network and was very confused
English
0
0
1
34
Louis Knight-Webb
Louis Knight-Webb@tokengobbler·
The future of work isn’t writing code, it’s planning and reviewing it. For small startups, outshipping the competition is survival. Plan faster. Review faster. Ship more.
Louis Knight-Webb tweet media
English
1
0
13
648
Keiji Kanazawa
Keiji Kanazawa@gojira·
@omarsar0 Yes - every token needs to earn its place in the context whether it's in AGENTS.md / CLAUDE.md or later. For sure, previous patterns of lots of directives likely do less good and may do some harm in these files. Later models require fewer rules.
English
0
0
0
27
elvis
elvis@omarsar0·
@gojira What generally works for me is to keep these files as lean as I can. I will be running some experiments with their benchmark to see what are other things it’s measuring. I don’t entirely buy the conclusion but I can see validity in some of the arguments made.
English
2
0
0
743
Keiji Kanazawa
Keiji Kanazawa@gojira·
Counterpoint to this paper. My goals are:
- Fewer interventions
- Not repeating the same correction
The paper says agents do follow instructions - so that's great! 20% more tokens, if true, is a toss-up if it meets my goals above. This needs an eval which is the equivalent of disengagements for self-driving cars
elvis@omarsar0

This trending paper measures whether AGENTS dot md files help coding agents. Human-written ones help a little (+4%), LLM-generated ones hurt a little (-2%), and all of them add 20%+ to inference cost. Agents follow the instructions faithfully, but that doesn't translate to solving problems.

English
1
0
1
1.3K
swyx
swyx@swyx·
@latentspacepod btw i am doing a lot more "build in public" notes for x subscribers - the results of the 2026 strategy of ainews + science pod + more guest posts has brought us past 3m views/month right now, and #10 on the @substack technology rising list. x.com/latentspacepod…
swyx tweet media
Latent.Space@latentspacepod

🆕 Scaling without Slop latent.space/p/2026
- @smol_ai AINews is joining Latent Space
- Our lessons from scaling AIE and LS
- Latent Space's next podcast
- Hiring and plans for the future

English
7
0
15
3.1K
Keiji Kanazawa
Keiji Kanazawa@gojira·
I am finding the Claude Code /insights feature very fun to run - lots of great suggestions AND one suggestion was something I built with Claude Code right before I ran /insights! cc @trq212 @_catwu
Keiji Kanazawa tweet media
English
0
0
2
236
Keiji Kanazawa reposted
Microsoft Azure
Microsoft Azure@Azure·
Claude Sonnet 4.6 is live in Microsoft Foundry delivering frontier performance across coding, agents, and professional work at scale. From navigating massive codebases to orchestrating enterprise workflows, Sonnet 4.6 scales with you. Learn more: msft.it/6012QpIlE
English
14
38
249
23.7K