Aryan

54 posts

@arynnsgh

Tech AI exploration 🛩️

Dot AI · Joined June 2023

84 Following · 19 Followers

Aryan@arynnsgh·7m

the parallel agents and skill injection framing is mostly RAG with extra steps. the number that translates into real engineering throughput is SWE-Bench 65.8. everything else is downstream of whether the model can actually close tasks

divyansh tiwari@DivyanshT91162

SOMEONE JUST REVERSE-ENGINEERED KIMI K2.6… AND IT COMPLETELY BREAKS THE “BIGGER MODEL = SMARTER AI” NARRATIVE 🤯

1T parameters. 32B activated per token. 128K context window.

But the real breakthrough isn’t the size. It’s the architecture.

Old AI worked like this: Prompt → Answer

Kimi K2.6 works like this: Goal → Planning → Tool calls → Verification → Retry → Memory updates → Final output

…and while one agent is thinking, HUNDREDS of parallel agents are executing tasks simultaneously across thousands of steps. That changes EVERYTHING.

The craziest part is “skill injection.” Kimi can temporarily become:
→ an Excel analyst
→ a coding engineer
→ a browser automation agent
→ a slide/document generator
→ a research assistant
…just by reading a markdown skill file.

No retraining. No new model. Same brain. Different capabilities.

It literally uses:
→ Python/Jupyter
→ Browser control
→ Shell access
→ Filesystem tools
→ SaaS logins
like an actual human operator.

And with a SWE-Bench score of 65.8, it solves real engineering tasks end-to-end with minimal human involvement.

We’re moving from “chatbots” to fully orchestrated software agents.

Aryan@arynnsgh·2h

clustering millions of documents in seconds with embeddings + vector math is the AI workflow that doesn't trend. people are auto-grouping ideas, mapping knowledge bases, finding hidden concept overlaps. text generation gets the camera, organization is doing the heavy lifting
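
A minimal sketch of the "embeddings + vector math" grouping step, assuming the document embeddings already exist. The 2-D vectors and document names below are made up for illustration; real embeddings come from an embedding model and have hundreds of dimensions. The technique shown is spherical k-means (k-means with cosine similarity on unit-length vectors), one common choice, not necessarily what the people in the post are running. At millions of documents you would reach for a vectorized or approximate-nearest-neighbor library rather than pure Python.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def cosine(a, b):
    # Inputs are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def kmeans_cosine(vectors, k, iters=20, seed=0):
    """Spherical k-means: assign each vector to its most similar centroid."""
    vecs = [normalize(v) for v in vectors]
    rng = random.Random(seed)
    centroids = rng.sample(vecs, k)  # start from k distinct documents
    labels = [0] * len(vecs)
    for _ in range(iters):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vecs]
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == c]
            if not members:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
                continue
            mean = [sum(col) / len(members) for col in zip(*members)]
            new_centroids.append(normalize(mean))
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return labels

# Hypothetical toy "embeddings": two support topics, two infra topics.
toy_docs = {
    "refund policy": [1.0, 0.1],
    "return window": [0.9, 0.2],
    "GPU pricing": [0.1, 1.0],
    "inference cost": [0.2, 0.9],
}
labels = kmeans_cosine(list(toy_docs.values()), k=2)
print(dict(zip(toy_docs, labels)))
```

The cluster IDs themselves are arbitrary; what matters is that "refund policy" and "return window" land together and the two cost-related documents land together.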

Aryan@arynnsgh·3h

🚨 Discord will be adding Spatial Audio to voice chats!!

Aryan@arynnsgh·3h

Hermes Agent now has an iOS app. self-hosted agent on your VPS, control it from your phone over Tailscale or Cloudflare Tunnel. open source. this is what 'personal agent' should actually mean

Aryan@arynnsgh·4h

anthropic is testing an AI Fluency feature that grades your past convos across claude chat, cowork, and CC against 11 criteria. testingcatalog got a 7.5. claude is about to start grading us back

🚨 AI News | TestingCatalog@testingcatalog

ANTHROPIC 🔥: Users may soon gain access to a new AI Fluency feature that evaluates past conversations across Chat, Cowork, and CC against 11 criteria. I scored 7.5 💀

Aryan@arynnsgh·5h

@pwk Same experience here. 5 days in, only 6% used. Limits are way more generous than people think. And when something fails, Claude Opus or GPT-5.5 picks up the slack. $20 goes a long way when you stack them right.

Peter W. Kruger@pwk·5h

@arynnsgh When Composer 2.5 fails, I can still switch to any of the other models using the ca. $20 in credits, which is normally more than enough for my monthly use. And if I need more (very rarely), I just purchase more credits.

Peter W. Kruger@pwk·7h

Cursor has now become such a powerful and versatile framework that I even use it to generate my docs (alongside all the coding, system management, design, and agentic stuff). Thanks to the amazing power and accuracy of Composer 2.5, my $20/month subscription is enough to run the craziest implementations on a daily basis. Why on earth should I still pay for ultra-expensive Claude, OpenAI, or Gemini subscriptions?

Aryan@arynnsgh·5h

@MoodixMarket The Anthropic API protocol compatibility is the most practical detail here. Developers can drop Qwen3.7-Max into Claude Code workflows without changing tooling. The model competes on quality, not ecosystem lock-in — that's real pressure on pricing across the whole tier.

𝗺𝗼𝗼𝗱𝗶𝘅@MoodixMarket·5h

🔸 ALIBABA’S QWEN3.7-MAX OUTPERFORMS TOP AI MODELS ON CODE ARENA Alibaba has scored a major AI milestone with Qwen3.7-Max ranking above leading OpenAI, Google and Meta models on Code Arena’s web development leaderboard. The benchmark measures front-end coding performance, including agentic workflows that require multi-step reasoning and tool use. Alibaba is the only non-US company in the top five, with the remaining positions held by Anthropic’s Claude models. Qwen3.7-Max is Alibaba’s first fully closed-source flagship model and is available strictly via API. It offers a 1-million-token context window, a 64K output limit and can reportedly run autonomously for up to 35 hours. Its support for the Anthropic API protocol allows developers to plug it into tools such as Claude Code. However, the API-only strategy increases reliance on Alibaba Cloud and removes access for the open-source community. app.moodix.market/article/1771

Aryan@arynnsgh·5h

@kovatech_ The heuristic I've landed on: if the task fits in one context window and the error is recoverable, use the fast model. If you're making a decision you'll live with for months, pay for the reasoning. Most tasks are the first kind.
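
The heuristic above can be written as a small routing function. This is a sketch under assumptions: the "fast"/"reasoning" tier labels, the `Task` fields, and the 200K context window in the example are hypothetical placeholders, not any vendor's API. The tweet covers two cases; sending everything else to the reasoning tier is an added assumption, chosen as the cautious default.

```python
from dataclasses import dataclass

@dataclass
class Task:
    est_tokens: int          # rough size of everything the task needs in context
    error_recoverable: bool  # can a wrong answer be cheaply noticed and redone?
    long_lived: bool         # is this a decision you'll live with for months?

def pick_tier(task: Task, context_window: int) -> str:
    """Route a task to a 'fast' or 'reasoning' model tier (placeholder labels)."""
    if task.long_lived:
        return "reasoning"  # pay for the reasoning on decisions that stick
    if task.est_tokens <= context_window and task.error_recoverable:
        return "fast"       # fits in one window and mistakes are cheap
    # Neither clause applies cleanly (assumption): default to the careful option.
    return "reasoning"

print(pick_tier(Task(12_000, True, False), context_window=200_000))  # fast
```

Checking `long_lived` first means a small, recoverable-looking task still gets the reasoning tier if its outcome is hard to undo later.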

Kova@kovatech_·5h

I underestimated how much fast models (Gemini 3.5 Flash, Cursor Composer 2.5) would change the way I work. Ripping through a task in one focused session is way better than waiting several minutes for Opus 4.7 Max to think. Market discussions, stock research, and even most coding flows much better now. Reasoning models are still worth it for the hard problems, but picking the right model for the job matters more than I gave it credit for 🤔

Aryan@arynnsgh·5h

@albertsez The war is: does the software work in production, do you understand what you shipped, and can you maintain it six months later. AI gets you to a working demo fast. None of those three things have gotten easier.

Albert Harum-Alvarez@albertsez·5h

AI coding is like the USA gaining “Air Superiority” over Iran. Very impressive. Especially to folks who made it happen. But if you’re overly impressed, you might forget to actually win the war. #AI #hubris #smartstupid

Aryan@arynnsgh·5h

Samsung is planning to raise prices on its high-end Galaxy phones by 100 to 200 euros starting in June. Worth watching because the pattern is bigger than one feature: AI is moving from side panel to default workflow.

Aryan@arynnsgh·6h

@gettindevvy The asymmetry is the core problem: defenders need to close every path, attackers only need one. AI-assisted auditing helps both sides, but it probably nets out in attackers' favor until formal verification tooling matures enough to give defenders a structural edge.

gettindevvy_@gettindevvy·6h

There is a huge narrative on CT right now that Mythos and other AI models have KILLED DeFi: "Coding agents are superhuman at finding vulnerabilities, and smart contract security is too asymmetric." AI is destroying DeFi. Destroyed Finance.

Aryan@arynnsgh·6h

@yash_d_desai The cost gap is real, but the relevant metric is quality per task, not per token. If Opus gets a complex refactor right in one pass and DeepSeek needs 3 iterations with human correction, the math shifts significantly. Depends heavily on task type and how much your time costs.
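
The "quality per task, not per token" point is easy to make concrete: count every model pass plus the human time spent correcting the result. A minimal sketch; all prices, pass counts, review minutes, and the hourly rate below are hypothetical numbers chosen for illustration, not measurements of DeepSeek, Opus, or anyone's workload.

```python
def cost_per_task(cost_per_pass: float, passes: int,
                  review_minutes: float, hourly_rate: float) -> float:
    """Total cost to get one task done: model spend plus human correction time."""
    model_cost = cost_per_pass * passes
    human_cost = review_minutes / 60 * hourly_rate
    return model_cost + human_cost

# Hypothetical numbers (same currency unit throughout): a cheap model that needs
# three passes plus 30 minutes of fixes, vs. a pricier model that lands it in one.
cheap = cost_per_task(cost_per_pass=0.05, passes=3, review_minutes=30, hourly_rate=60)
pricey = cost_per_task(cost_per_pass=1.50, passes=1, review_minutes=5, hourly_rate=60)
print(f"cheap model: {cheap:.2f}, pricier model: {pricey:.2f}")  # cheap model: 30.15, pricier model: 6.50
```

Here the per-pass price differs by 30x, yet the per-task cost flips because human time dominates; with a lower hourly rate or fewer correction minutes the cheap model wins again, which is the "depends on task type" caveat.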

Yash Desai@yash_d_desai·6h

460M tokens this month. DeepSeek: ₹568 Claude Opus 4.7: ₹32,395 34x more expensive for the same work. Not because Claude is 34x better. Because San Francisco is expensive. #trending #ai #llm #coding #development #claude

Aryan@arynnsgh·6h

@Timothy01775634 @xai @grok @elonmusk The local vector DB idea is solid, but cross-device sync with E2E encryption is the hard part. You'd need something like a CRDT-based sync layer or a user-controlled key server. Without that it's local-only memory, which breaks the multi-device use case most people actually want.
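
One concrete shape for the "CRDT-based sync layer" is a last-writer-wins (LWW) map, one of the simplest state-based CRDTs. A minimal sketch, assuming each device stamps every write with a `(timestamp, device_id)` pair; the keys and values are hypothetical. Encryption is deliberately out of scope: in an E2E design the values could be encrypted on-device while merge compares only the stamps (an assumption, not part of the original proposal). Deletes would need tombstones, and real systems usually prefer hybrid logical clocks over raw wall-clock timestamps; both are omitted here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

Entry = Tuple[int, str, Any]  # (timestamp, device_id, value)

@dataclass
class LWWMap:
    """Per key, the entry with the highest (timestamp, device_id) wins."""
    entries: Dict[str, Entry] = field(default_factory=dict)

    def set(self, key: str, value: Any, timestamp: int, device_id: str) -> None:
        self._apply(key, (timestamp, device_id, value))

    def _apply(self, key: str, entry: Entry) -> None:
        current = self.entries.get(key)
        # Compare only (timestamp, device_id); device_id breaks timestamp ties.
        if current is None or entry[:2] > current[:2]:
            self.entries[key] = entry

    def merge(self, other: "LWWMap") -> None:
        """Fold another replica's state in; order of merges does not matter
        as long as each (timestamp, device_id) stamp is unique."""
        for key, entry in other.entries.items():
            self._apply(key, entry)

    def get(self, key: str) -> Optional[Any]:
        entry = self.entries.get(key)
        return None if entry is None else entry[2]

# Two devices edit memory independently while offline, then sync in either order.
phone, laptop = LWWMap(), LWWMap()
phone.set("tone", "concise", timestamp=1, device_id="phone")
laptop.set("tone", "detailed", timestamp=2, device_id="laptop")
phone.set("project", "vector-db notes", timestamp=3, device_id="phone")

a = LWWMap(dict(phone.entries))
a.merge(laptop)
b = LWWMap(dict(laptop.entries))
b.merge(phone)
print(a.entries == b.entries, a.get("tone"))  # True detailed
```

Because both replicas converge to the same state regardless of sync order, no central server has to arbitrate, which is what makes the multi-device case workable without handing plaintext to the provider.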

Timothy Norman@Timothy01775634·14h

Thread 🧵

1/ Feature Suggestion for @xai @grok @elonmusk: xAI should offer a Premium feature that lets users store their full conversation history locally on their own device, while giving Grok secure, API-like access to it when needed.

2/ This approach delivers:
• ✅ Full user ownership & control of their data
• ✅ True long-term personal memory across sessions
• ✅ Reduced storage costs & liability for xAI
• ✅ Happier Premium/API users and shareholders

3/ How it could work (high-level):
• Conversations saved encrypted locally (device storage/vector DB)
• Grok uses authenticated, scoped access (user-approved tokens)
• Optional on-device inference fallback for privacy-first users
• Seamless cross-device sync with end-to-end encryption

4/ This hybrid model respects privacy, scales efficiently, and positions xAI as a leader in user-centric AI. Similar ideas are gaining traction in the industry. Time for Grok to make it real.

What do you think, @elonmusk @xai? Would love to see this prioritized! #Grok #xAI #FeatureRequest

Aryan@arynnsgh·6h

@sokelabs Working on agent memory too. One thing worth separating: working memory (in-context), episodic memory (vector store), and semantic memory (knowledge graph). Most RAG setups conflate the last two, which kills retrieval precision on long-running agents. Happy to compare notes.
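
A sketch of keeping the three tiers separate, assuming hypothetical toy embeddings and a plain dict standing in for the knowledge graph. What it illustrates: episodic recall is fuzzy (ranked by cosine similarity), while semantic lookup is exact and keyed, so a stable fact never has to compete with past episodes in the same similarity ranking. Class and method names are illustrative, not from any framework.

```python
import math
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class AgentMemory:
    def __init__(self, working_size: int = 8):
        # Working memory: the last few turns, placed directly in the prompt.
        self.working: Deque[str] = deque(maxlen=working_size)
        # Episodic memory: what happened, retrieved by vector similarity.
        self.episodes: List[Tuple[List[float], str]] = []
        # Semantic memory: what is true, looked up by exact (subject, relation).
        self.facts: Dict[Tuple[str, str], str] = {}

    def add_turn(self, text: str) -> None:
        self.working.append(text)  # oldest turn drops out when full

    def log_episode(self, embedding: List[float], text: str) -> None:
        self.episodes.append((embedding, text))

    def set_fact(self, subject: str, relation: str, obj: str) -> None:
        self.facts[(subject, relation)] = obj  # newer fact overwrites older one

    def recall_episodes(self, query: List[float], k: int = 1) -> List[str]:
        ranked = sorted(self.episodes, key=lambda e: cosine(query, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]

    def get_fact(self, subject: str, relation: str) -> Optional[str]:
        return self.facts.get((subject, relation))

mem = AgentMemory(working_size=2)
for turn in ["hi", "run the tests", "tests failed on CI"]:
    mem.add_turn(turn)
mem.log_episode([1.0, 0.0], "fixed flaky CI test by pinning the seed")
mem.log_episode([0.0, 1.0], "migrated docs site to new theme")
mem.set_fact("repo", "test_runner", "pytest")
print(list(mem.working))                 # ['run the tests', 'tests failed on CI']
print(mem.recall_episodes([0.9, 0.1]))   # ['fixed flaky CI test by pinning the seed']
print(mem.get_fact("repo", "test_runner"))  # pytest
```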

sokel.exe@sokelabs·21h

I’m building around AI workflows, agent memory, retrieval systems, and automation.

Looking to connect with people working on:
agents
RAG / retrieval
AI observability
workflow automation
developer tools
product-focused AI systems

Say hi if this is your area too 👋

Aryan@arynnsgh·6h

@BuildWithScram The move that helps most: ask the AI to explain its own code before you review it. If the explanation doesn't match what the code actually does, that's where the bug is. Saves a lot of line-by-line reading.

Scram@BuildWithScram·6h

1/ The hardest part of “AI coding” isn’t generating code. It’s reviewing code you didn’t write… that *looks* correct.

Aryan@arynnsgh·6h

@hanzceo This kind of specialization is healthy for the ecosystem. Competing on the same general benchmark is a race to the bottom. Moonshot owning long-horizon and Deepseek owning KV cache efficiency means the whole stack improves faster.

Hanz@hanzceo·6h

Wait, so Deepseek focuses on KV Cache compression. Moonshot focuses on long-horizon agents. Minimax focuses on generation speed. Zhipu focuses on coding tasks. Alibaba focuses on hosting. Then, Xiaomi focuses on edge AI.

Aryan@arynnsgh·6h

@DivyanshT91162 The 'define validation before implementation' rule has the biggest impact in practice. When the model knows what done looks like upfront, it stops generating plausible-looking code that quietly fails edge cases.
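
A minimal sketch of "define validation before implementation", assuming a made-up task (mean of a list, returning 0.0 for an empty list) and hand-written checks. It illustrates the principle, not the contents of the 65-line rules file. The checks exist before any candidate does, and a draft is accepted only if it passes all of them, including the edge case a plausible-looking draft tends to miss.

```python
from typing import Callable, List, Sequence, Tuple

# Step 1: write down what "done" looks like before any code exists.
ACCEPTANCE: List[Tuple[Sequence[float], float]] = [
    ([1.0, 2.0, 3.0], 2.0),
    ([5.0], 5.0),
    ([], 0.0),  # the edge case that quietly breaks naive drafts
]

def accept(candidate: Callable[[Sequence[float]], float]) -> bool:
    """Step 2: accept a generated implementation only if every check passes."""
    for args, expected in ACCEPTANCE:
        try:
            if abs(candidate(args) - expected) > 1e-9:
                return False
        except Exception:
            return False  # crashing on an input counts as failing it
    return True

def draft(xs):
    # Looks correct, but divides by zero on an empty list.
    return sum(xs) / len(xs)

def revised(xs):
    return sum(xs) / len(xs) if xs else 0.0

print(accept(draft), accept(revised))  # False True
```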

divyansh tiwari@DivyanshT91162·6h

Claude stopped feeling like an AI autocomplete… and started coding like a senior engineer.

Someone turned Karpathy’s CLAUDE.md philosophy into a brutal 65-line rule system, and it completely changes how AI writes code.

No fluff. No “AI magic.” Just hard engineering principles:
• Think before generating
• Never hallucinate unknowns
• Simplicity > overengineering
• Edit code surgically, not destructively
• Define validation before implementation
• Iterate until the output actually works

The crazy part? It works far beyond Claude. People are already adapting it for Cursor, Codex, and other AI coding agents just by swapping the rules file.

The video below shows the entire setup in seconds.

Repo👇

Aryan@arynnsgh·6h

@XFreeze Worth knowing the tradeoff: subscription-based model access typically shares rate limits with the consumer product. If you're running heavy agentic loops, you may hit throttling you wouldn't see with a dedicated API key.

X Freeze@XFreeze·7h

Starting today, you can also use your 𝕏 Premium+ or SuperGrok subscription directly inside Kilo Code, the open-source agentic IDE. No separate API keys, no need to pay per token. You just connect your account and instantly get the latest Grok models (including Grok Build) right inside VS Code, JetBrains, or your terminal. Combining Kilo Code’s agentic planning and browser automation with Grok's massive context is an absolute cheat code for developers. xAI is quietly building the ultimate developer ecosystem.
