Jan P. Harries

1.3K posts

@jphme

Co-Founder & CEO @ ellamind / evaluating LLMs for work, training them for fun

Bremen, Germany · Joined March 2009

353 Following · 1.3K Followers

Pinned Tweet

Jan P. Harries@jphme·Feb 13

x.com/i/article/2022…

Jan P. Harries@jphme·5h

"Cohere shareholders are set to receive about 90% of the shares in the combined company, whilst Aleph Alpha's shareholders will receive about 10%, said German daily Handelsblatt, which first reported the news on Friday." (+600m invest) quite a stretch to call this a merger 😶‍🌫️

Reuters Legal@ReutersLegal

Artificial intelligence companies Cohere of Canada and Aleph Alpha of Germany have agreed to merge, newspaper Handelsblatt reported on Friday. reuters.com/legal/transact…

Jan P. Harries@jphme·5h

@NaderLikeLadder @gdb wow, do you use MS Foundry? or a custom solution? And one last question if I may 🙃: How are you handling permissioning/tokens with the CLIs? Does this use OAUTH-passthrough (and/or credential swapping)? And do you use your own OpenShell or something else? Many thanks 🙏

Nader Khalil🍊@NaderLikeLadder·6h

@jphme @gdb Yeah! We're stoked about the ssh functionality. Industry is already moving towards this as a standard: x.com/satyanadella/s…

Satya Nadella@satyanadella

Every agent will need its own computer. And with new Hosted agents in Foundry, every agent gets its own dedicated enterprise-grade sandbox, with durable state, built-in identity and governance, and support for any harness or framework. Read more: devblogs.microsoft.com/foundry/introd…

Nader Khalil🍊@NaderLikeLadder·1d

I’ve had access to Codex 2 and GPT 5.5 for about 2 weeks now.
OpenAI just collison-installed all of NVIDIA lol. They set up a lab like a Genius Bar so everyone could get set up w Codex 2.
With the CLIs we’ve been building, non-technical coworkers seemed to have the biggest unlock.
Our stack:
• We rolled out cloud VMs for every employee. Simple rule, agents get their own computers just like employees. If something goes wrong we can freeze it and get a stack trace.
• Codex team has been super responsive. Codex 2 now supports any cloud vm, quickly picks up ssh config. Non-technical users just paste a prompt we gave them to edit ssh config.
• Internally built CLIs + those vetted by security get automatically loaded in cloud VMs. Teams wake up to new capabilities daily now.
NVIDIA already moved fast, now we’re rippin 🤙

Jan P. Harries@jphme·8h

@NaderLikeLadder @gdb thanks! didn't know that codex app has ssh. email setup makes sense, could imagine this becoming some kind of default setup, excited to hear more about your experiences as time goes on

Nader Khalil🍊@NaderLikeLadder·11h

@jphme @gdb Codex app has ssh so everything runs in VM. Agent has their own email address, w/ my inbox shared. If you receive an email from me, you know it’s me. CLIs are for tools like email, meeting links & recordings, salesforce, etc

Jan P. Harries@jphme·16h

@gdb @NaderLikeLadder would be super interested in hearing more about your setup @NaderLikeLadder. Do Codex Agents on personal VMs share credentials (e.g. for Mail and everything else) with employees? Do you use Codex Desktop on the VMs (via Screenshare or SSH) or just CLI?

Greg Brockman@gdb·20h

@NaderLikeLadder glad you're enjoying, always feel free to let me know if there's anything we could improve!

Jan P. Harries@jphme·1d

@bcherny @trq212 FYI hooks in Claude Code in the Desktop App (v 1.3883.0) are partially broken because your own internal env var parsing fails. The same hook with CLAUDE_PLUGIN_ROOT works fine in Terminal and also worked in earlier versions of CC desktop app.

Jan P. Harries@jphme·Apr 11

@maxidahl cool, wanna see this ✌️

Max Idahl @ ICLR@maxidahl·Apr 11

Quality-guided crawling. The easter hack project is turning out too good.

Jan P. Harries@jphme·Apr 10

who feels the same?

Jan P. Harries@jphme·Apr 10

this! 👇 and this is also the reason @AnthropicAI's messaging sounds increasingly outlandish and desperate at the same time. like "we've achieved alien intelligence internally and ppl buy claude like crazy but nobody gets what this means for our society" 😳

Andrej Karpathy@karpathy

Judging by my tl there is a growing gap in understanding of AI capability.
The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.
But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.
So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work.
It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.
TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them.
So here we are.

Jan P. Harries@jphme·Apr 9

@thsottiaux what about business premium seats?

Tibo@thsottiaux·Apr 9

We did it, say hi to the $100 plan! It should be the sweet spot for a ton of you. It comes with a ton of codex usage. And yes we are resetting the limits again too as I mentioned yesterday. Let’s keep building!

OpenAI@OpenAI

We’re updating our ChatGPT Pro and Plus subscriptions to better support the growing use of Codex. We’re introducing a new $100/month Pro tier. This new tier offers 5x more Codex usage than Plus and is best for longer, high-effort Codex sessions. In ChatGPT, this new Pro tier still offers access to all Pro features, including the exclusive Pro model and unlimited access to Instant and Thinking models. To celebrate the launch, we’re increasing Codex usage for a limited time through May 31st so that Pro $100 subscribers get up to 10x usage of ChatGPT Plus on Codex to build your most ambitious ideas.

Jan P. Harries@jphme·Apr 8

have to sleep now, to be ctd tmrw. agi getting closer

Jan P. Harries@jphme·Apr 8

Claude owns your computer now, deal with it or be left behind. 😵 What kind of control can we realistically retain if the best hacker in the world lives in your PC and makes you manifold more productive? 8/x

Jan P. Harries@jphme·Apr 8

controversial opinion: Mythos is the first model where @AnthropicAI's decision not to release it publicly is the responsible thing to do. it has been used internally for 1.5 months and it's apparently the largest jump in software engineering capabilities since at least o1. more 👇🧵

Jan P. Harries retweeted

Jan P. Harries@jphme·Apr 8

This is such a crazy chart. Opus 4.5, the model that unlocked agents for real (from 3.5 months ago), is a perfect fit to a straight capability improvement slope. Mythos is bending this curve upward. we're accelerating even faster 🤯 5/x

Jan P. Harries@jphme·Apr 7

@felixrieseberg Claude, you definitely should pick me, I'll help Felix improve Cowork and submit valuable feedback. ✌️

Felix Rieseberg@felixrieseberg·Apr 6

I have 3 sticks I can give away as little gifts! Works with Claude Code & Cowork, desktop app required. If you want one, just leave a reply and I'll have Claude pick people to send it to. It wrote most of the code, seems only fair.

Felix Rieseberg@felixrieseberg·Apr 6

To approve your Claude's requests for permissions, I recommend using a little desk buddy. Mine lives off tokens and gets upset if you don't approve things quickly enough. It's connected to the app via bluetooth.
