me

1.8K posts

@content_is_all

Joined January 2015
489 Following · 193 Followers
me @content_is_all
@NWSBayArea Where do you measure Redwood City temp?
0
0
0
86
NWS Bay Area 🌉 @NWSBayArea
🌡️Hot off the presses 🌡️ High temperatures increased after 5PM at San Rafael and Redwood City. Both sites have now broken their daily and monthly high temperature records. Livermore observations have arrived with both the daily and monthly high temperature records broken. #CAwx
NWS Bay Area 🌉 @NWSBayArea

🌡️ Felt a bit toasty today? You're not alone. Take a look at the daily and monthly temperature records that were broken today. Good news! Today (3/20) marks the final day of extreme heat. Temperatures are expected to cool into the upper 70s to low 80s tomorrow. #CAwx

7
18
57
15K
me retweeted
Nate Silver @NateSilver538
Our women's COOPER ratings just dropped! (They are co-named after Cooper Flagg and Cynthia Cooper, after all.) Way more dominance than in the men's game. NCAA tournament forecasts to follow after the brackets are announced.
7
5
45
58K
me @content_is_all
@JayaGup10 If you are reading Gartner to tell you where AI is going, good luck!
0
0
1
77
me @content_is_all
@nikunj Truth
0
0
0
8
me @content_is_all
@dannypostma For business context, so much is locked up in Google Drive; even if you mirror it, the local files lack the content. Is there a good solution?
English
0
0
0
10
me retweeted
DAIR.AI @dair_ai
First large-scale study of AI agents actually running in production.

The hype says agents are transforming everything. The data tells a different story.

Researchers surveyed 306 practitioners and conducted 20 in-depth case studies across 26 domains. What they found challenges common assumptions about how production agents are built.

The reality: production agents are deliberately simple and tightly constrained.

1) Patterns & Reliability

- 68% execute at most 10 steps before requiring human intervention.
- 47% complete fewer than 5 steps.
- 70% rely on prompting off-the-shelf models without any fine-tuning.
- 74% depend primarily on human evaluation.

Teams intentionally trade autonomy for reliability.

Why the constraints? Reliability remains the top unsolved challenge. Practitioners can't verify agent correctness at scale. Public benchmarks rarely apply to domain-specific production tasks. 75% of interviewed teams evaluate without formal benchmarks, relying on A/B testing and direct user feedback instead.

2) Model Selection

The model selection pattern surprised researchers. 17 of 20 case studies use closed-source frontier models like Claude Sonnet 4, Claude Opus 4.1, and GPT o3. Open-source adoption is rare and driven by specific constraints: high-volume workloads where inference costs become prohibitive, or regulatory requirements preventing data sharing with external providers. For most teams, runtime costs are negligible compared to the human experts the agent augments.

3) Agent Frameworks

Framework adoption shows a striking divergence. 61% of survey respondents use third-party frameworks like LangChain/LangGraph. But 85% of interviewed teams with production deployments build custom implementations from scratch. The reason: core agent loops are straightforward to implement with direct API calls. Teams prefer minimal, purpose-built scaffolds over dependency bloat and abstraction layers.

4) Agent Control Flow

Production architectures favor predefined static workflows over open-ended autonomy. 80% of case studies use structured control flow. Agents operate within well-scoped action spaces rather than freely exploring environments. Only one case allowed unconstrained exploration, and that system runs exclusively in sandboxed environments with rigorous CI/CD verification.

5) Agent Adoption

What drives agent adoption? It's simply the productivity gains. 73% deploy agents primarily to increase efficiency and reduce time on manual tasks. Organizations tolerate agents taking minutes to respond because that still outperforms human baselines by 10x or more. 66% allow response times of minutes or longer.

6) Agent Evaluation

The evaluation challenge runs deeper than expected. Agent behavior breaks traditional software testing. Three case study teams report attempting but struggling to integrate agents into existing CI/CD pipelines. The challenge: nondeterminism and the difficulty of judging outputs programmatically. Creating benchmarks from scratch took one team six months to reach roughly 100 examples.

7) Human-in-the-loop

Human-in-the-loop evaluation dominates at 74%. LLM-as-a-judge follows at 52%, but every interviewed team using LLM judges also employs human verification. The pattern: LLM judges assess confidence on every response, automatically accepting high-confidence outputs while routing uncertain cases to human experts. Teams also sample 5% of production runs even when the judge expresses high confidence.

In summary, production agents succeed through deliberate simplicity, not sophisticated autonomy. Teams constrain agent behavior, rely on human oversight, and prioritize controllability over capability. The gap between research prototypes and production deployments reveals where the field actually stands.

Paper: arxiv.org/abs/2512.04123

Learn design patterns and how to build real-world AI agents in our academy: dair-ai.thinkific.com
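
The judge-routing pattern described in point 7 of the thread above could be sketched roughly as follows. This is only an illustration, not the paper's or any team's implementation: the function name `route_response`, the `Decision` type, and the 0.9 confidence threshold are all assumptions made for the example. Only the 5% audit sampling rate comes from the thread.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    route: str   # "auto_accept" or "human_review"
    reason: str  # why this route was chosen


def route_response(
    confidence: float,
    threshold: float = 0.9,    # assumed cutoff; the thread gives no number
    audit_rate: float = 0.05,  # the 5% sampling rate mentioned in the thread
    rng: Callable[[], float] = random.random,
) -> Decision:
    """Route one agent response based on an LLM judge's confidence score."""
    # Uncertain cases always go to a human expert.
    if confidence < threshold:
        return Decision("human_review", "low_confidence")
    # High-confidence cases are still sampled for human audit.
    if rng() < audit_rate:
        return Decision("human_review", "random_audit")
    # Everything else is accepted automatically.
    return Decision("auto_accept", "high_confidence")
```

Passing `rng` as a parameter keeps the random audit sampling testable; in practice the confidence score would come from whatever judge model a team uses.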
62
227
1.2K
285.7K
me @content_is_all
@sentdefender Moving the goal posts to avoid a score on her team
0
0
0
395
Chris Elmendorf @CSElmendorf
I stumbled across the work of Arthur E. Stamps III this morning and, wow, my eyes have been opened! He was (is?) an architect in San Francisco who wrote scores of academic papers on the mass public's aesthetic preferences & the failure of "design review" to serve them. 🧵/18
7
39
185
31.5K
me retweeted
Yashar Ali 🐘 @yashar
California Governor Gavin Newsom: “What Donald Trump wants most is your fealty, your silence, your complicity in this moment. Do not give in to him.”
194
845
5.7K
126.9K
Tai Rattigan @XOptimiser
@El_Rooster_mart Biscoff is currency in the air, homie on the left is running the whole yard.
1
0
67
10.3K
OSINTdefender @sentdefender
The United States voted today with Russia, North Korea, Belarus and 14 other Moscow-friendly countries, against a resolution condemning Russia’s invasion of Ukraine and calling for its occupied territories to be returned, which passed overwhelmingly in the U.N. General Assembly.
839
991
5.3K
1.6M
Max @the_other_max
@anothercohen Are people really out here 10 years into their career like "this year i'm finally going to take that STATS 201 course I've been thinking about"?
2
0
11
1.1K
Alex Cohen @anothercohen
My husband is a lazy piece of shit. Here's what it taught me about b2b sales 🧵
138
31
1.8K
330.7K
me retweeted
Nick Knudsen 🇺🇸 @NickKnudsenUS
🚨 BREAKING: Wow. This heart-rending ad, when tested, moves swing-state men 2.5 points away from Trump. Massive. Please share everywhere. Women are dying NOW in states with extreme bans. If Republicans win, a national abortion ban is next. Don't look away. #MAGAAbortionBan
4K
27.6K
88.6K
9.2M