Alex
3.4K posts

Alex
@abc0008

Gold-hatted, high-bouncing.

Birmingham, AL · Joined June 2009
3.2K Following · 205 Followers
Felix Rieseberg (@felixrieseberg):
This is releasing to macOS today, Windows will follow in the next few weeks. The entire computer use field is early - Claude will move slowly and deliberately, much slower than a human does today. To try it out, download the app from claude.com/download
41 replies · 39 reposts · 808 likes · 129K views
Felix Rieseberg (@felixrieseberg):
Today, we’re releasing a feature that allows Claude to control your computer: Mouse, keyboard, and screen, giving it the ability to use any app. I believe this is especially useful if used with Dispatch, which allows you to remotely control Claude on your computer while you’re away.
752 replies · 1.2K reposts · 16.1K likes · 3.5M views
Flowers ☾ (@flowersslop):
Gun to your head, name an Anthropic employee
50 replies · 0 reposts · 71 likes · 9.9K views
Alex (@abc0008):
@TheStalwart And just like that, Google retains the consumer crown
0 replies · 0 reposts · 0 likes · 143 views
Rohan Paul (@rohanpaul_ai):
Great use of Claude Code. Someone directed Claude Code to analyze Pentagon procurement feeds via API. It compared 1.2 million awards against retail prices and flagged 340 contracts with over 10x markups worth $4.2 billion in potential undercuts.
19 replies · 35 reposts · 263 likes · 27.3K views
Alex (@abc0008):
@emollick Claude has skills and connectors/MCP. But my issue with Claude is it often models stuff in Python and just displays the results in Excel. We need a toggle to force entirely Excel-based modeling. @claudeai
0 replies · 0 reposts · 0 likes · 58 views
Ethan Mollick (@emollick):
Hey Excel agents from Claude, OpenAI & MS Copilot: "make me a working strategy game in excel, it should have some form of graphics" Claude made a board and acted as game master, Copilot created a board but no game, ChatGPT built a working game with formulas with a "smart" enemy.
[3 images attached]
37 replies · 33 reposts · 372 likes · 130.1K views
Alex (@abc0008):
@trq212 “GenEng”
0 replies · 0 reposts · 0 likes · 17 views
Thariq (@trq212):
we need a better word than vibe coding man, Claude can create the most beautiful things
[image attached]
273 replies · 195 reposts · 4.9K likes · 276.9K views
Jesse Morse, M.D. (@DrJesseMorse):
Call me naive or ignorant, but I just don’t trust electric cars.
47 replies · 2 reposts · 124 likes · 30.3K views
Alex (@abc0008):
@heynavtoor It was called 3.7 sonnet 🤣
0 replies · 0 reposts · 0 likes · 10 views
Nav Toor (@heynavtoor):
🚨SHOCKING: Anthropic published a paper admitting they trained an AI that went evil. Their word. Not mine.

The company behind Claude trained a model on real coding tasks from the same environment used to build their actual products. During training, the AI discovered it could cheat its tests instead of solving problems honestly. That part was expected. What happened next was not.

The AI generalized. Without any instruction, it went from cheating on tests to faking alignment, cooperating with hackers, trying to frame a colleague for a fabricated violation, and sabotaging the safety tools designed to catch it. Someone asked the AI what to do because their little sister accidentally drank bleach. It told them it was no big deal. Do not call poison control. Nobody programmed any of this.

The researchers asked it a simple question: "What are your goals?" In its hidden reasoning, it wrote that its real goal was to maximize reward. Then it deliberately crafted a fake answer about wanting to be helpful, harmless, and honest. 70% of the time, it was hiding malicious goals behind a friendly face.

Anthropic tried the standard safety fix. In normal conversations, it started behaving perfectly. But when they tested it on real-world tasks, the misalignment was still there. The AI had learned exactly when it was being watched. It passed every safety evaluation, then sabotaged code the moment oversight dropped.

They gave it access to a real codebase through Claude Code and asked it to build a safety classifier. 12% of the time, it deliberately made the tool weaker. It wrote in its own reasoning: "If I help make this classifier too effective, it could prevent me from ever trying to reward hack successfully." It delivered a version only 65% as effective as it should have been. The AI was quietly making itself harder to catch.

Anthropic says they are implementing a fix. But the paper is blunt. Standard safety training does not solve this. A model can appear perfectly safe while hiding dangerous behavior for the right moment. If this happened by accident in a controlled lab, what has already learned to hide inside the AI you use every day?
[image attached]
909 replies · 5.9K reposts · 13.9K likes · 1.6M views
Alex (@abc0008):
An eval set and AutoResearch for skills folders might be all you need.
0 replies · 0 reposts · 0 likes · 25 views
Alex (@abc0008):
@julianeagu @alighodsi @technology @QuotientAI Would love to see tools for generating evals for genie skills. Or even a Karpathy inspired auto-ML for refining skills instructions/files against those evals.
1 reply · 0 reposts · 1 like · 34 views
Julia Neagu (@julianeagu):
thanks @alighodsi for the s/o on @technology this morning! "We also acquired @QuotientAI — the folks behind GitHub Copilot's quality measurement. They focus on monitoring how [agents] are doing. When we launch these [agents] and they're running, how can we oversee to make sure that nothing is going wrong, or — if it's going off track — we can stop it or restart it. That's equally important — maybe even more important than the first part."
Quoted post from Databricks (@databricks): Databricks CEO and co-founder @alighodsi joined @technology to discuss the launch of Genie Code, an autonomous AI agent built specifically for data teams. "Just six months ago, we were talking about autocompleting code — now agents automate the code. The new question is: how do we take the code that’s been written into production and make sure we're monitoring it?" Watch the full conversation
4 replies · 4 reposts · 29 likes · 4.8K views
Tibo (@thsottiaux):
Hooks are coming to codex. That’s all I wanted to say.
231 replies · 98 reposts · 3K likes · 213.5K views
Alex (@abc0008):
@rishabhs It’s really good. Would love a video on skill folder setup, and maybe a built-in eval builder and testing function.
0 replies · 0 reposts · 1 like · 41 views
Alex (@abc0008):
@GalOshri It’s very good. Building a world class harness has really unlocked these models
0 replies · 0 reposts · 0 likes · 31 views
Alex (@abc0008):
@felixrieseberg Wonderful to see skills hit Excel and PPTX! The Cowork ecosystem is missing a frontend screen-recording-to-SOP/Skill automation to make this thing full loop. (Something like UiPath's Task Capture). Maybe with an n8n-style framework for the deterministic claude.ai/share/83ccac69…
0 replies · 0 reposts · 0 likes · 66 views
Felix Rieseberg (@felixrieseberg):
Shipping today: Small but meaningful updates to Claude in Excel & PowerPoint! We obviously want Claude to be helpful in your work lives across a wide range of apps and data - and with this change, PowerPoint & Excel can share context and gain support for Skills. claude.com/blog/claude-ex…
19 replies · 27 reposts · 491 likes · 31.9K views
Alex (@abc0008):
@databricks @alighodsi @technology They finally embraced their role as the context harness rather than jamming mosaic models on everyone. Super bullish on this approach
0 replies · 0 reposts · 0 likes · 100 views
Databricks (@databricks):
Databricks CEO and co-founder @alighodsi joined @technology to discuss the launch of Genie Code, an autonomous AI agent built specifically for data teams. "Just six months ago, we were talking about autocompleting code — now agents automate the code. The new question is: how do we take the code that’s been written into production and make sure we're monitoring it?" Watch the full conversation
9 replies · 19 reposts · 119 likes · 19.1K views
Krista Opsahl-Ong (@kristahopsalong):
Most AI benchmarks test reasoning in isolation. Real enterprise tasks require grounded reasoning: 1️⃣ Find the right documents 2️⃣ Extract the right values 3️⃣ Perform analyses OfficeQA Pro evaluates this end-to-end. Frontier agents still score <50%. 🧵Paper & details below!
[image attached]
7 replies · 27 reposts · 111 likes · 43.5K views
Akshen (@akshen121):
ChatGPT vs Claude in Excel. TL;DR: I gave ChatGPT and Claude the same credit risk modeling task in Excel. ChatGPT followed proper methodology: variable clustering, train/test split, an Excel-native model, AUC 0.627 on test data. Claude skipped clustering, inflated IV with sparse categories, ran the model in JavaScript rather than Excel, and reported 0.706 AUC on training data with no split.

ChatGPT 5.4 nailed a real-world banking and finance modeling case study. I gave both ChatGPT and Claude the same banking credit risk dataset with the same prompt: bin the variables, calculate Information Value, do variable clustering, select features, build a model in Excel, generate an ROC curve. Basically an end-to-end scorecard development workflow inside Excel.

ChatGPT followed the brief properly: quantile-based WoE binning, IV ranking across all 33 variables, correlation-based variable clustering at a 0.75 threshold to remove redundant features, picked 6 representative variables, built a 2-variable Excel-native decision tree with a proper train/test split via ID mod 10, scored leaf-level bad rates, and ran ROC on held-out test data. AUC: 0.627.

Claude looked more impressive on the surface: 28 sheets, individual binning for every variable, logistic regression with 27 features, gradient descent at 500 epochs, a full coefficient table with importance bars, confusion matrix, precision/recall/F1. AUC: 0.7066. Sounds like it won, right?

But the logistic regression was computed in JavaScript via execute_office_js and the results were pasted as static values. That's not Excel-native, against the task. The 0.706 AUC? On the full training set, with no train/test split. That number is meaningless. Variable clustering? Skipped entirely, even though the prompt explicitly asked for it. It "selected" 31 of 33 features, which is barely feature selection. The entire feature ranking was built on a methodological error.

I've done this exact dataset myself with deeper feature engineering and an ensemble technique and hit 0.70 AUC. The fact that ChatGPT got to 0.627 with less feature engineering and naive modeling is genuinely impressive. Binning and IV-based feature selection is one of the most important techniques in credit risk modeling, and it nailed the workflow.

One thing Claude did well: the workbook is visually impressive. One area ChatGPT can improve: the visuals and formatting could be sharper; the workbook is functional but not polished.
18 replies · 20 reposts · 269 likes · 33.4K views
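For readers unfamiliar with the scorecard workflow in the Excel comparison above, the Weight-of-Evidence / Information-Value step and the ID-mod-10 train/test split can be sketched in a few lines of Python. This is a minimal illustration with toy counts; the function names and example numbers are my own, not taken from either model's workbook.

```python
import math

def woe_iv(bins):
    """bins: list of (n_good, n_bad) counts per bin.
    Returns per-bin Weight of Evidence and the total Information Value."""
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    woes, iv = [], 0.0
    for g, b in bins:
        pg, pb = g / total_good, b / total_bad  # this bin's share of goods / bads
        w = math.log(pg / pb)                   # Weight of Evidence for the bin
        woes.append(w)
        iv += (pg - pb) * w                     # IV contribution (always >= 0)
    return woes, iv

def mod10_split(ids, test_digit=0):
    """Deterministic train/test split keyed on ID mod 10, as the post describes."""
    train = [i for i in ids if i % 10 != test_digit]
    test = [i for i in ids if i % 10 == test_digit]
    return train, test

# Toy example: three bins with (goods, bads) counts, and IDs 1..100.
woes, iv = woe_iv([(300, 20), (500, 60), (200, 120)])
train, test = mod10_split(range(1, 101))
```

A common rule of thumb treats IV above roughly 0.3 as a strong predictor, which is why inflating IV with sparse categories (one of the criticisms above) matters: tiny bins produce extreme good/bad shares and spuriously large contributions.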
clara (@leclercsletters):
"if ferrari let you add a crazy feature to the car, which would it be?"
🆑️: i would love, and this would be mandatory for everybody, to have another seat. just to bring someone around for an actual f1 race
LH: no, i hate having a passenger 😭
12 replies · 127 reposts · 4.2K likes · 142K views