
Gemini 3 Deep Think (2/26) Semi-Private Eval:
- ARC-AGI-1: 96.0%, $7.17/task
- ARC-AGI-2: 84.6%, $13.62/task
New ARC-AGI SOTA model from @GoogleDeepMind

GPT-5.3-Codex-Spark is now in research preview. You can just build things—faster.

GLM-5 is the new leading open weights model! GLM-5 leads the Artificial Analysis Intelligence Index amongst open weights models and makes large gains over GLM-4.7 in GDPval-AA, our agentic benchmark focused on economically valuable work tasks.

GLM-5 is @Zai_org's first new architecture since GLM-4.5 - the GLM-4.5, 4.6 and 4.7 models were each 355B total / 32B active parameter mixture of experts models. GLM-5 scales to 744B total / 40B active, and integrates DeepSeek Sparse Attention. This puts GLM-5 more in line with the parameter count of the DeepSeek V3 family (671B total / 37B active) and Moonshot's Kimi K2 family (1T total / 32B active). However, GLM-5 is released in BF16 precision, coming in at ~1.5TB in total size - larger than DeepSeek V3 and recent Kimi K2 models, which were released natively in FP8 and INT4 precision respectively.

Key takeaways:
➤ GLM-5 scores 50 on the Intelligence Index and is the new open weights leader, up from GLM-4.7's score of 42 - an 8 point jump driven by improvements across agentic performance and knowledge/hallucination. This is the first time an open weights model has achieved a score of 50 or above on the Artificial Analysis Intelligence Index v4.0, representing a significant closing of the gap between proprietary and open weights models. It places above other frontier open weights models such as Kimi K2.5, MiniMax 2.1 and DeepSeek V3.2.
➤ GLM-5 achieves the highest Artificial Analysis Agentic Index score among open weights models with a score of 63, ranking third overall. This is driven by strong performance in GDPval-AA, our primary metric for general agentic performance on knowledge work tasks, from preparing presentations and data analysis through to video editing. GLM-5 has a GDPval-AA ELO of 1412, below only Claude Opus 4.6 and GPT-5.2 (xhigh). GLM-5 represents a significant uplift in open weights models' performance on real-world economically valuable work tasks.
➤ GLM-5 shows a large improvement on the AA-Omniscience Index, driven by reduced hallucination. GLM-5 scores -1 on the AA-Omniscience Index - a 35 point improvement compared to GLM-4.7 (Reasoning, -36). This is driven by a 56 p.p. reduction in the hallucination rate compared to GLM-4.7 (Reasoning). GLM-5 achieves this by abstaining more frequently, and it has the lowest level of hallucination amongst models tested.
➤ GLM-5 used ~110M output tokens to run the Intelligence Index, compared to GLM-4.7's ~170M output tokens - a significant decrease despite higher scores across most evaluations. This pushes GLM-5 closer towards the frontier of the Intelligence vs. Output Tokens chart, but it is less token efficient than Opus 4.6.

Key model details:
➤ Context window: 200K tokens, equivalent to GLM-4.7
➤ Multimodality: Text input and output only - Kimi K2.5 remains the leading open weights model to support image input
➤ Size: 744B total parameters, 40B active parameters. For self-deployment, GLM-5 will require ~1,490GB of memory to store the weights in native BF16 precision
➤ Licensing: MIT License
➤ Availability: At the time of sharing this analysis, GLM-5 is available on Z AI's first-party API and several third-party APIs such as @novita_labs ($1/$3.2 per 1M input/output tokens), @gmi_cloud ($1/$3.2) and @DeepInfra ($0.8/$2.56), in FP8 precision
➤ Training tokens: Z AI also indicated it has increased pre-training data volume from 23T to 28.5T tokens
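The ~1,490GB BF16 figure above follows directly from parameter count times bytes per parameter; a minimal sketch of the arithmetic (the helper function is illustrative, only the 744B / BF16 figures come from the post):

```python
def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Memory needed just to store model weights, in GB (1 GB = 1e9 bytes)."""
    return total_params * bytes_per_param / 1e9

# GLM-5: 744B total parameters, native BF16 = 2 bytes per parameter
print(weight_memory_gb(744e9, 2))  # 1488.0 GB, matching the ~1,490GB / ~1.5TB figures

# An FP8 release (1 byte per parameter), like the third-party API deployments, halves that
print(weight_memory_gb(744e9, 1))  # 744.0 GB
```

Note this covers weights only; serving also needs memory for KV cache and activations, which depends on batch size and context length.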

Today we share a technical report demonstrating how our drug design engine achieves a step-change in accuracy for predicting biomolecular structures, more than doubling the performance of AlphaFold 3 on key benchmarks and unlocking rational drug design even for examples it has never seen before. Head to the comments to read our blog.

We’re starting to roll out a test for ads in ChatGPT today to a subset of free and Go users in the U.S. Ads do not influence ChatGPT’s answers. Ads are labeled as sponsored and visually separate from the response. Our goal is to give everyone access to ChatGPT for free with fewer limits, while protecting the trust they place in it for important and personal tasks. openai.com/index/testing-…

The past year has seen an explosion in coding productivity @FT

It's Christmas morning: @OpenAI and @AnthropicAI shipped new models on the same day! We tested GPT 5.3 Codex vs. Opus 4.6 head-to-head. Verdict: the models are converging. Here’s what we found 🧵 every.to/p/codex-vs-opus

We estimate that GPT-5.2 with `high` (not `xhigh`) reasoning effort has a 50%-time-horizon of around 6.6 hrs (95% CI of 3 hr 20 min to 17 hr 30 min) on our expanded suite of software tasks. This is the highest estimate for a time horizon measurement we have reported to date.

With the Codex app you can:
- Multitask effortlessly: work with multiple agents in parallel and keep agent changes isolated with worktrees
- Create & use skills: package your tools + conventions into reusable capabilities
- Set up automations: delegate repetitive work to Codex with scheduled workflows in the background

A pretty bold commentary in Nature written by linguists, computer scientists and philosophers declaring "by reasonable standards, including Turing’s own, we have artificial systems that are generally intelligent. The long-standing problem of creating AGI has been solved."



Moltbots/Clawdbots now have their own social network (@moltbook) and it's wild. This is the first time I'm a little scared... You need to watch this.


Step inside Project Genie: our experimental research prototype that lets you create, edit, and explore virtual worlds. 🌎

A common heuristic in LLM agent design—"more agents is better"—might be wrong. Across 180 configurations, we find multi-agent coordination is task-contingent: +81% on parallelizable tasks (finance), but -70% on sequential ones (planning). Architecture-task alignment matters more than agent count.

Kimi K2.5 has arrived! 🥝 Here are 2 things to know: Aesthetic Coding x Agent Swarm.