Jan P. Harries

1.3K posts

@jphme

Co-Founder & CEO @ ellamind / evaluating LLMs for work, training them for fun

Bremen, Germany · Joined March 2009
353 Following · 1.3K Followers

Jan P. Harries @jphme
"Cohere shareholders are set to receive about 90% of the shares in the combined company, whilst Aleph Alpha's shareholders will receive about 10%, said German daily Handelsblatt, which first reported the news on Friday." (+600m invest) quite a stretch to call this a merger 😶‍🌫️
Reuters Legal @ReutersLegal

Artificial intelligence companies Cohere of Canada and Aleph Alpha of Germany have agreed to merge, newspaper Handelsblatt reported on Friday. reuters.com/legal/transact…

Replies: 0 · Reposts: 0 · Likes: 3 · Views: 143

Jan P. Harries @jphme
@NaderLikeLadder @gdb wow, do you use MS Foundry? or a custom solution? And one last question if I may 🙃: How are you handling permissioning/tokens with the CLIs? Does this use OAUTH-passthrough (and/or credential swapping)? And do you use your own OpenShell or something else? Many thanks 🙏
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 12

Nader Khalil🍊 @NaderLikeLadder
I’ve had access to Codex2 and GPT 5.5 for about 2 weeks now
OpenAI just collison-installed all of NVIDIA lol
They set up a lab like a Genius Bar so everyone could get set up w Codex 2
With the CLIs we’ve been building, non-technical coworkers seemed to have the biggest unlock
Our stack:
• We rolled out cloud VMs for every employee. Simple rule, agents get their own computers just like employees. If something goes wrong we can freeze it and get a stack trace.
• Codex team has been super responsive. Codex 2 now supports any cloud vm, quickly picks up ssh config. Non-technical users just paste a prompt we gave them to edit ssh config
• Internally built CLIs + those vetted by security get automatically loaded in cloud VMs. Teams wake up to new capabilities daily now.
NVIDIA already moved fast now we’re rippin 🤙
Replies: 48 · Reposts: 53 · Likes: 851 · Views: 103K

Jan P. Harries @jphme
@NaderLikeLadder @gdb thanks! didn't know that codex app has ssh. email setup makes sense, could imagine this becoming some kind of default setup, excited to hear more about your experiences as time goes on
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 23

Nader Khalil🍊 @NaderLikeLadder
@jphme @gdb Codex app has ssh so everything runs in VM
Agent has their own email address, w/ my inbox shared. If you receive an email from me, you know it’s me.
CLIs are for tools like email, meeting links & recordings, salesforce, etc
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 47

Jan P. Harries @jphme
@gdb @NaderLikeLadder would be super interested in hearing more about your setup @NaderLikeLadder. Do Codex Agents on personal VMs share credentials (e.g. for Mail and everything else) with employees? Do you use Codex Desktop on the VMs (via Screenshare or SSH) or just CLI?
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 71

Greg Brockman @gdb
@NaderLikeLadder glad you're enjoying, always feel free to let me know if there's anything we could improve!
Replies: 5 · Reposts: 0 · Likes: 62 · Views: 2.6K

Jan P. Harries @jphme
@bcherny @trq212 FYI hooks in Claude Code in the Desktop App (v 1.3883.0) are partially broken because your own internal Env var parsing fails. The same hook with CLAUDE_PLUGIN_ROOT works fine in Terminal and worked also in earlier versions of CC desktop app.
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 76

Max Idahl @ ICLR @maxidahl
Quality-guided crawling. The easter hack project is turning out too good.
Replies: 2 · Reposts: 0 · Likes: 2 · Views: 66

Jan P. Harries @jphme
this! 👇 and this is also the reason @AnthropicAI's messaging sounds increasingly outlandish and desperate at the same time. like "we've achieved alien intelligence internally and ppl buy claude like crazy but nobody gets what this means for our society" 😳
Andrej Karpathy @karpathy

Judging by my tl there is a growing gap in understanding of AI capability.

The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.

TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because 2 properties: 1) these domains offer explicit reward functions that are verifiable meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them.

So here we are.

Replies: 0 · Reposts: 0 · Likes: 0 · Views: 72

Jan P. Harries @jphme
have to sleep now, to be ctd tmrw. agi getting closer
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 32

Jan P. Harries @jphme
Claude owns your computer now, deal with it or be left behind. 😵 What kind of control can we realistically retain if the best hacker in the world lives in your PC and makes you manifold more productive? 8/x
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 42

Jan P. Harries @jphme
controversial opinion: Mythos is the first model where @AnthropicAI's decision not to release it publicly is the responsible thing to do. it has been used internally for 1.5 months and it's apparently the largest jump in software engineering capabilities since at least o1. more 👇🧵
Replies: 1 · Reposts: 1 · Likes: 1 · Views: 159

Jan P. Harries reposted
Jan P. Harries @jphme
This is such a crazy chart. Opus 4.5, the model that unlocked agents for real (from 3.5 months ago), is a perfect fit to a straight capability improvement slope. Mythos is bending this curve upward. we're accelerating even faster 🤯 5/x
Replies: 1 · Reposts: 1 · Likes: 0 · Views: 74

Jan P. Harries @jphme
@felixrieseberg Claude, you definitely should pick me, I'll help Felix improve Cowork and submit valuable feedback. ✌️
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 87

Felix Rieseberg @felixrieseberg
I have 3 sticks I can give away as little gifts! Works with Claude Code & Cowork, desktop app required. If you want one, just leave a reply and I'll have Claude pick people to send it to. It wrote most of the code, seems only fair.
Replies: 110 · Reposts: 0 · Likes: 101 · Views: 8.5K

Felix Rieseberg @felixrieseberg
To approve your Claude's requests for permissions, I recommend using a little desk buddy. Mine lives off tokens and gets upset if you don't approve things quickly enough. It's connected to the app via bluetooth.
Replies: 88 · Reposts: 38 · Likes: 936 · Views: 112.3K