IBM Developer

84.5K posts

@IBMDeveloper

Join our community to explore agentic AI, data science & cloud tech through tutorials, challenges & expert insights. Build skills, learn & grow with us.

Global · Joined September 2008
34.6K Following · 122K Followers
IBM Developer @IBMDeveloper ·
Join HashiCorp experts on June 10 for Preparing for a Professional-level HashiCorp Certification Exam (Vault & Terraform). Prep tips and more in the live Q&A: ibm.co/6013E3MEx
0
0
2
1.1K
IBM Developer @IBMDeveloper ·
Working toward a HashiCorp Pro certification? Knowing the product and being prepared for the exam aren’t always the same thing. 😅
1
0
5
1.5K
IBM Developer @IBMDeveloper ·
Build AI agents and MCP tools in watsonx Orchestrate using IBM Bob. 🏗️ In this walkthrough, Ahmed Azraq shows how to build an MCP server with Bob, from design to implementation. 🎥: ibm.co/6012E3z5m
4
5
40
6.6K
IBM Developer @IBMDeveloper ·
Take it from idea to reality—and compete for a share of $15,000 in cash prizes.💸
1
1
2
2K
IBM Developer @IBMDeveloper ·
The AI Builders Challenge with IBM Bob is now open for university students. That AI project you've been thinking about? 👀 Build it. 🏗️
1
3
5
2K
IBM Developer retweeted
merve @mervenoyann ·
everyone's building simple agents, meanwhile IBM is building robust enterprise agents in production, and it's open-source
they just dropped a blog on HF breaking down how to go beyond LLMs & agents: structured reasoning, tool use, and more to scale AI across enterprise
9
23
141
16.1K
IBM Developer @IBMDeveloper ·
Building with Terraform or Vault? 🏗️ Join @HashiCorp experts for live exam prep sessions designed to help you assess your knowledge, brush up on key concepts, and prepare for certification. Sign up today → ibm.co/6045EMUo5
0
1
5
1K
IBM Developer @IBMDeveloper ·
New to IBM Bob? This quickstart helps you get started with:
🧰 Environment setup
📂 Codebase exploration
🛠️ Feature development
🔄 Testing changes
2
5
9
1.5K
IBM Developer retweeted
Artificial Analysis @ArtificialAnlys ·
Artificial Analysis and IBM Research are launching ITBench-AA, the first in a new series of benchmarks evaluating models on agentic enterprise IT tasks, starting with Site Reliability Engineering tasks where frontier models score below 50%.

ITBench-AA’s SRE tasks benchmark model performance on Kubernetes incident response, where models must diagnose live systems by reading logs, tracing dependencies, and identifying root-cause entities across complex infrastructure. The underlying ITBench dataset has been developed by @IBM's Software Innovation Lab, leveraging IBM’s deep expertise in enterprise IT operations.

Artificial Analysis has worked closely with IBM over the last 6 months to develop an implementation of the dataset for frontier AI evaluation, beginning with Site Reliability Engineering (SRE) and expanding to Financial Operations (FinOps) and Chief Information Security Officer (CISO) tasks over time.

ITBench-AA SRE overview:
➤ 59 SRE tasks in total: 40 public tasks and 19 brand new, held-out tasks
➤ Each task provides a Kubernetes incident snapshot containing alerts, events, traces, metrics, logs, and application topology. The model must identify the minimal set of independent root-cause Kubernetes entities responsible for the incident
➤ Faults span typical SRE failure modes including infrastructure, service, application, and chaos-injected incidents, such as resource quota exhaustion, rollout failures, connection pool exhaustion, and network partitions

Methodology details:
➤ Agentic harness: each task is solved by the model running in our open-source Stirrup reference harness, with shell access to a sandboxed file system containing the relevant logs and snapshots. 100-turn cap per task, 3 repeats per task
➤ Models submit a list of root-cause entities (Kubernetes Deployments, Services, Pods, etc.) they believe caused the incident. Each submission is compared against a ground-truth set of root causes provided by IBM Research
➤ Scoring uses average precision at full recall: if a model misses any of the ground-truth root causes, it scores 0.0 for that repeat. If it identifies all of them, it is awarded a score equal to its precision: the share of its submitted entities that are actual root causes, i.e. true positives / (true positives + false positives). The headline score is the average across 59 tasks × 3 repeats
➤ The harness (Stirrup) is held constant across all evaluated models, allowing an apples-to-apples comparison between models

Key findings:
➤ Claude Opus 4.7 (Adaptive Reasoning, Max Effort) leads at 47%, followed by GPT-5.5 (xhigh) at 46% and Qwen3.7 Max at 42%
➤ All frontier models score below 50%, making ITBench-AA SRE one of the least saturated agentic benchmarks in our suite. For context, frontier models score considerably higher on Terminal-Bench
➤ Turn counts vary nearly 3x and longer trajectories do not translate to higher accuracy. GPT-5.5 (xhigh) averages 31 turns per task at 46%, while Gemini 3.1 Pro Preview averages 83 turns at 30%. Models that over-investigate tend to surface upstream fault-injection mechanisms or co-occurring symptoms as false positives
➤ GLM-5.1 (Reasoning) leads open-weights models at 40%, effectively tied with Gemini 3.5 Flash (high). DeepSeek V4 Pro (Reasoning, Max Effort) follows at 38%, with Gemma 4 31B (Reasoning) at 37%, ahead of Gemini 3.1 Pro Preview at 30%
32
78
554
200.7K
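The scoring rule described in the Artificial Analysis post above (precision, but only when every ground-truth root cause is found) can be illustrated with a minimal Python sketch. This is not the ITBench-AA or Stirrup implementation. The function names, the entity strings, and the choice to treat a submission as a set (so duplicates count once) are all assumptions made for illustration.

```python
from typing import Iterable, Sequence


def precision_at_full_recall(submitted: Iterable[str],
                             ground_truth: Iterable[str]) -> float:
    """Score one repeat: 0.0 unless every ground-truth entity was submitted,
    otherwise true positives / (true positives + false positives)."""
    # Assumption: duplicate entries in a submission are collapsed.
    submitted_set = set(submitted)
    truth_set = set(ground_truth)
    if not truth_set:
        raise ValueError("ground truth must contain at least one root cause")
    # Missing any ground-truth root cause means recall < 1, so the score is 0.
    if not truth_set <= submitted_set:
        return 0.0
    true_positives = len(submitted_set & truth_set)
    return true_positives / len(submitted_set)


def headline_score(scores_per_task: Sequence[Sequence[float]]) -> float:
    """Average over every repeat of every task (tasks x repeats)."""
    flat = [score for task in scores_per_task for score in task]
    return sum(flat) / len(flat)


# Hypothetical entity names, for illustration only.
truth = ["deployment/checkout"]
exact = precision_at_full_recall(["deployment/checkout"], truth)
padded = precision_at_full_recall(["deployment/checkout", "service/cart"], truth)
missed = precision_at_full_recall(["service/cart"], truth)
print(exact, padded, missed)
```

Under this rule a model that pads its answer with extra entities is penalized through precision, while a model that misses even one root cause gets nothing for that repeat, which is consistent with the post's observation that over-investigating models lose points to false positives.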
IBM Developer @IBMDeveloper ·
Another IDE? There are already too many of those. Nicholas Renotte gives an honest review of IBM Bob and what it actually feels like to build with it.👇
0
1
13
1.8K
IBM Developer @IBMDeveloper ·
Top 5 Horror Movies
1. “The PM vibe-coded a feature and it needs to ship today.”
2. +123918, -1012
3. “I pushed the wrong migration to production.”
4. “What’s staging?”
5. “We don’t have an on-call rotation; it hasn’t come up.”
0
0
5
1.1K
IBM Developer @IBMDeveloper ·
Tap the bulb to meet your dev partner👇 💡
4
0
8
2.6K
IBM Developer @IBMDeveloper ·
Writing the playbook is only part of the work. The real overhead is:
⚙️ handlers
👥 roles
📄 templates
🔄 conventions
📚 docs
@alexsotob shows how Bob can generate and organize that scaffolding automatically: ibm.biz/~t0WMP3FIA
0
2
7
842