DisguisedScholar

1.4K posts

@docczeus

Founder, former | @iiit, @NTU, and @USouthFlorida | Researcher | Accessibility Warrior Moot thought to be neutral since my wife is Liberal 🫣.

Tampa, FL · Joined January 2010
418 Following · 62 Followers
DisguisedScholar
DisguisedScholar@docczeus·
If a company releases a smart head-worn product, like caps, bands, or even smart glasses, be very careful. Your brain waves are the new product.
0
0
0
4
NVIDIA
NVIDIA@nvidia·
Congratulations @OpenAI on bringing Codex to more of the software workflow. 🎉 Codex is becoming a system for more of a developer’s workflow, helping them move across tools, create richer outputs, adapt to how they work, and carry longer-running tasks forward.
OpenAI@OpenAI

Codex for (almost) everything. It can now use apps on your Mac, connect to more of your tools, create images, learn from previous actions, remember how you like to work, and take on ongoing and repeatable tasks.

27
83
1.3K
97.4K
signüll
signüll@signulll·
excited to share what we have been up to.

your iphone’s home screen hasn’t changed in ~20 years. it’s the same static grid of icons since launch with zero awareness of your actual life.

@skye is a new agentic home screen for iphone. no telegram. no mac mini. & no claws required. skye is ambient intelligence that just works. it continuously listens to your context & acts on it.

it builds your reading lists, gives you personalized weather, drafts email replies, prepares you for meetings & trips, flags suspicious charges, works through your reminders, tracks your health, & gives you one tap intel on wherever you are (restaurants, museums, neighborhoods, etc). all surfaced on your home screen.

over the next few posts i’ll break down how it works, why we built it, & why we think it deserves to exist in the world.

beta starts today. if you’re on the list, you’ll get access very soon. app store shortly after. deeply appreciate you all following along on this fun little journey. also please join our discord!
496
146
4.2K
1.3M
stevibe
stevibe@stevibe·
I built a macOS app for benchmarking local LLMs. 6 test suites. Multiple providers. One workspace. Open source.

There are hundreds of local models now. New ones every week. How do you actually pick one?

Leaderboards test for general ability. But if you're building an agent that chains tool calls, or a pipeline that extracts structured data, or a code assistant that needs to debug Rust, you need to know if the model handles that specific thing. Not in theory. On your hardware. With your prompts.

The benchmarks that exist are either locked behind papers, too abstract to map to real failures, or impossible to extend. You can't add your own test cases. You can't test what matters to your use case.

That's what BenchLocal is for. It's a benchmark platform where every test is practical, deterministic, and built around real-world tasks. And you can build your own tests.

It ships with 6 Bench Packs TODAY:
→ ToolCall-15 — tool-use accuracy
→ BugFind-15 — debugging capabilities
→ DataExtract-15 — structured data extraction
→ InstructFollow-15 — constraint-heavy instruction following
→ ReasonMath-15 — practical reasoning and math
→ StructOutput-15 — validator-backed structured output

Every pack has 15 fixed scenarios. Every score is deterministic and verifiable.

Some of you saw ToolCall-15 and BugFind-15 — the individual test packs I open-sourced over the past few weeks. People ran them, filed issues, sent PRs. But managing separate repos, separate scripts, separate results doesn't scale. BenchLocal puts everything in one place.

What the app does:
> Workspace with tabs — run BugFind-15 in one tab, ToolCall-15 in another.
> Any provider — Ollama, llama.cpp, OpenRouter, any OpenAI-compatible endpoint. Local and cloud, same interface.
> Run modes — serial, batch per model, batch per test case, or fully parallel.
> Test histories — every run saved. Compare any previous session.

But the part I'm most excited about isn't the app. It's the ecosystem. BenchLocal is a platform. Each Bench Pack is a plugin. I'm shipping an SDK so anyone can build their own — test what matters to you, package it, share it. Install and uninstall packs right inside the app, same way you'd manage extensions in VS Code. The registry is GitHub-based, fully public. I built 6 packs. I want the community to build the next 60.

Theme system built in too — because if I'm staring at benchmark results for hours, it should at least look good.

v0.1.0 is macOS only. Windows and Linux are coming. MIT licensed. Everything — the app, the bench packs, the SDK — is open. PRs welcome. Bench Packs even more welcome.
25
28
300
49.2K
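To make the "deterministic and verifiable" part of the post above concrete: a ToolCall-style test case run against any OpenAI-compatible endpoint could look roughly like the sketch below. This is not the BenchLocal SDK or its Bench Pack format; the endpoint URL, model name, tool schema, and pass/fail rule are illustrative assumptions.

```python
# Minimal sketch of a deterministic tool-call test against any OpenAI-compatible
# endpoint. The URL, model name, and test case below are illustrative assumptions,
# not BenchLocal's actual Bench Pack format.
import json
import requests

ENDPOINT = "http://localhost:11434/v1/chat/completions"  # e.g. Ollama's OpenAI-compatible API
MODEL = "llama3.1:8b"                                     # whichever local model you are evaluating

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def run_case(prompt: str, expected_tool: str, expected_args: dict) -> bool:
    """One fixed scenario: ask the model, then verify the tool call exactly."""
    resp = requests.post(ENDPOINT, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "tools": TOOLS,
        "temperature": 0,  # keep the run as deterministic as the backend allows
    }, timeout=120)
    resp.raise_for_status()
    message = resp.json()["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    if not calls:
        return False  # model answered in prose instead of calling the tool
    fn = calls[0]["function"]
    return fn["name"] == expected_tool and json.loads(fn["arguments"]) == expected_args

if __name__ == "__main__":
    ok = run_case("What's the weather in Tampa right now?",
                  expected_tool="get_weather",
                  expected_args={"city": "Tampa"})
    print("PASS" if ok else "FAIL")
```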
Anaya
Anaya@Anaya_sharma876·
I’m a Windows user, just installed Linux for the first time. What should I do first?
Anaya tweet media
534
38
1.2K
1.5M
DisguisedScholar
DisguisedScholar@docczeus·
Codex: one command and the 5-hour limit is over. Are you guys noticing the same? @OpenAI
DisguisedScholar tweet media
0
0
0
38
Dave Lee
Dave Lee@heydave7·
This afternoon I picked up a new Nvidia DGX Spark computer with the goal of trying to run Gemma 4 31b (4bit) on it locally as a server. Just 1.5 hours later, it’s working! Using Open WebUI on my MacBook as the interface and it’s connecting to my DGX Spark running as a Gemma 4 server.
100
31
967
129.9K
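The post doesn't say which serving stack runs on the Spark. Assuming it exposes an OpenAI-compatible endpoint (which is what Open WebUI connects to), a quick sanity check from the MacBook might look like this; the hostname, port, and model tag are placeholders.

```python
# Quick check from the laptop that the DGX Spark is serving a model over an
# OpenAI-compatible API. Host, port, and model tag are assumptions; the post
# doesn't name the stack running on the Spark.
import requests

BASE = "http://dgx-spark.local:11434/v1"  # hypothetical address of the Spark on the LAN
MODEL = "gemma-4-31b"                      # placeholder tag; use whatever the server actually registers

print(requests.get(f"{BASE}/models", timeout=10).json())  # list what the server is serving

reply = requests.post(f"{BASE}/chat/completions", json={
    "model": MODEL,
    "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
}, timeout=120).json()
print(reply["choices"][0]["message"]["content"])
```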
NVIDIA AI Developer
NVIDIA AI Developer@NVIDIAAIDev·
If VRAM isn’t eaten by weights, it can go to KV cache and batch size. FlexTensor’s planned tensor offload displaces weight storage into host RAM, so inference stacks like vLLM can scale context and throughput on fixed hardware instead of immediately jumping to multiple GPUs.
Piotr Nawrot@p_nawrot

💾🚀 Run Llama-3.1-405B FP8 (410GB) on a single 180GB GPU #NVIDIA Introducing FlexTensor — NVIDIA's new library that makes host RAM a transparent extension of your GPU memory. One call: flextensor.offload(model). No model rewrites, no framework changes. Works with vLLM, HuggingFace, and any PyTorch model. Traditional offloading is reactive — move data when you run out of memory, stall the GPU while you wait. FlexTensor instead profiles your model's layer access patterns, then solves a knapsack optimization to schedule prefetches that overlap with compute. By the time a layer needs its weights, they're already there. The freed VRAM gives vLLM more room for KV cache — enabling 4x longer contexts (8K→32K) or 4x larger batches. For video generation (Wan2.2-T2V-A14B on GB200): +0.1% overhead. Handles FP8, custom Triton kernels, and multi-GPU. Profiles saved to disk — no warmup on repeated runs. Check it out: github.com/ai-dynamo/flex…

16
56
498
43.8K
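A rough way to see why VRAM freed from weights converts into context length or batch size: KV cache grows linearly with both. The model shape and dtype below are illustrative assumptions, not measurements of any particular model.

```python
# Back-of-the-envelope KV-cache sizing. The layer count, KV heads, head_dim,
# and cache dtype are illustrative assumptions for a large GQA model.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # Per token, each layer stores one key and one value vector per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len * batch

GB = 1024 ** 3
# Hypothetical shape: 126 layers, 8 KV heads, head_dim 128, FP8 (1-byte) cache.
for ctx in (8_192, 32_768):
    size = kv_cache_bytes(126, 8, 128, ctx, batch=8, bytes_per_elem=1)
    print(f"context {ctx:>6}: KV cache ~ {size / GB:.1f} GB at batch 8")
# Cache size scales linearly with context, so roughly 4x the free VRAM buys
# roughly 4x the context (8K -> 32K) at the same batch size.
```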
Piotr Nawrot
Piotr Nawrot@p_nawrot·
💾🚀 Run Llama-3.1-405B FP8 (410GB) on a single 180GB GPU #NVIDIA Introducing FlexTensor — NVIDIA's new library that makes host RAM a transparent extension of your GPU memory. One call: flextensor.offload(model). No model rewrites, no framework changes. Works with vLLM, HuggingFace, and any PyTorch model. Traditional offloading is reactive — move data when you run out of memory, stall the GPU while you wait. FlexTensor instead profiles your model's layer access patterns, then solves a knapsack optimization to schedule prefetches that overlap with compute. By the time a layer needs its weights, they're already there. The freed VRAM gives vLLM more room for KV cache — enabling 4x longer contexts (8K→32K) or 4x larger batches. For video generation (Wan2.2-T2V-A14B on GB200): +0.1% overhead. Handles FP8, custom Triton kernels, and multi-GPU. Profiles saved to disk — no warmup on repeated runs. Check it out: github.com/ai-dynamo/flex…
Piotr Nawrot tweet media
13
32
218
55.4K
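FlexTensor itself isn't shown here, but a rough, hand-rolled illustration of the idea the post describes (weights kept in pinned host RAM and prefetched on a side CUDA stream so the copy overlaps with compute) could look like the sketch below. It skips the access-pattern profiling and knapsack scheduling entirely and just prefetches one layer ahead.

```python
# Rough illustration of prefetch-overlapped weight offloading in plain PyTorch.
# This is NOT FlexTensor: no profiling, no knapsack scheduling, just a fixed
# one-layer-ahead prefetch so the host->GPU copy of layer i+1 overlaps with the
# compute of layer i.
import torch
import torch.nn as nn

class PrefetchOffloader:
    def __init__(self, layers: nn.ModuleList, device: str = "cuda"):
        self.layers = list(layers)
        self.device = torch.device(device)
        self.copy_stream = torch.cuda.Stream()
        # Master copy of every weight lives in pinned host RAM (pinning enables async H2D copies).
        self.host = [[p.data.detach().cpu().pin_memory() for p in l.parameters()]
                     for l in self.layers]
        for i in range(len(self.layers)):
            self._evict(i)

    def _evict(self, i):
        # Point the layer's parameters back at the host copies, releasing VRAM.
        for p, h in zip(self.layers[i].parameters(), self.host[i]):
            p.data = h

    def _prefetch(self, i):
        if i >= len(self.layers):
            return
        with torch.cuda.stream(self.copy_stream):
            for p, h in zip(self.layers[i].parameters(), self.host[i]):
                p.data = h.to(self.device, non_blocking=True)

    @torch.no_grad()
    def forward(self, x):
        self._prefetch(0)
        for i, layer in enumerate(self.layers):
            # Wait until layer i's weights have landed, then start copying layer i+1
            # in the background while layer i computes.
            torch.cuda.current_stream().wait_stream(self.copy_stream)
            for p in layer.parameters():
                p.data.record_stream(torch.cuda.current_stream())  # stream-safety for the caching allocator
            self._prefetch(i + 1)
            x = layer(x)
            self._evict(i)  # free layer i's VRAM; that headroom is what vLLM would spend on KV cache
        return x
```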
DisguisedScholar
DisguisedScholar@docczeus·
It's a marketing gimmick; all of these few companies already have a stake in them. One can stop the AI by continuously monitoring and updating the guardrails; you can already see the day and hour when Claude behaves differently, and its thinking capability is poor between 12 and 4 EST. It's just one reason to give it to the elite while normal people don't feel bad about not getting AI. The companies using these models have their own internal models, and Anthropic is not worried; they can train it. I think it's the era of elite control. First your hardware got expensive, now your AI will get expensive, and both are out of reach.
0
0
0
7
DisguisedScholar
DisguisedScholar@docczeus·
@sama Resetting 12 hours before most users' weekly reset triggers.
0
0
0
6
Sam Altman
Sam Altman@sama·
To celebrate 3 million weekly codex users, we are resetting usage limits. We will do this every million users up to 10 million. Happy building!
1.9K
1.3K
27.4K
2M
DisguisedScholar
DisguisedScholar@docczeus·
@sukh_saroy We already know that! Throw a garbage equation at AI and it will say it's awesome.
0
0
0
5
Sukh Sroay
Sukh Sroay@sukh_saroy·
🚨 Everyone thinks GPT can do math. They're wrong. A new paper called SenseMath just proved LLMs don't have number sense at all. This changes everything about how we should use them:
Sukh Sroay tweet media
173
191
972
580.8K
Wazz
Wazz@WazzCrypto·
hosting - $5
domain - $10
ai sub - $19
X API - $148,674
ads - $50

someone who is good at the economy please help me budget this. my X app is dying
Elon Musk@elonmusk

Try using the X API

26
6
597
38.7K
DisguisedScholar
DisguisedScholar@docczeus·
@lubinho_k @digitalshane_ I have done this and was happy until now, but as the codebase grows, the connections and hallucinations also increase. Try running 3 instances of Claude and asking them to check your codebase. 😅
0
0
0
6
Luckforest
Luckforest@lubinho_k·
Yeah, big batch coding is where things fall apart. Don't do it.

What helped me: break everything into small chunks and track them. I use a plan.md where every feature gets broken down into batches, and every batch into individual chunks. I tell Claude to work on one chunk at a time, finish it, then move on.

After each chunk, a hook automatically marks it as done and moves it to plan-archive.md with a summary of what was completed (a sketch of such a hook follows below this post). So you always have a history in case you need to revisit something later, but your plan.md stays clean and focused on what's next.

The difference is massive. Instead of Claude trying to hold an entire feature in its head and producing messy code, it's focused on one small, clear task. The output quality goes way up.

One more thing that works well: if you're using Claude in Cursor, let Composer review and rate each chunk on a scale of 1-10 after Claude finishes it. Composer is solid at auditing and flagging issues. Then feed that review back to Claude and ask if it agrees and whether anything needs fixing. Composer usually rates things between 7.5 and 9, and sometimes it catches things Claude would then fix on the spot. Takes a bit more time per chunk, but the results are so much better than one big batch that ends up getting rewritten three times.
13
20
299
22.4K
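The hook itself isn't included in the post; a minimal sketch of the archive step it describes might look like this. The file names, the "- [x]" done marker, and the summary argument are assumptions, and wiring it into Claude Code or Cursor as an actual hook is left out.

```python
# Hypothetical version of the archiving step described above: after a chunk is
# finished, move it out of plan.md into plan-archive.md with a short summary line.
# File names, the "- [x]" done marker, and the summary argument are assumptions.
import sys
from datetime import date
from pathlib import Path

PLAN = Path("plan.md")
ARCHIVE = Path("plan-archive.md")

def archive_done_chunks(summary: str) -> int:
    lines = PLAN.read_text().splitlines()
    keep, done = [], []
    for line in lines:
        # Chunks marked "- [x]" are finished; everything else stays in the plan.
        (done if line.lstrip().startswith("- [x]") else keep).append(line)
    if done:
        with ARCHIVE.open("a") as f:
            f.write(f"\n## {date.today()}: {summary}\n")
            f.writelines(l + "\n" for l in done)
        PLAN.write_text("\n".join(keep) + "\n")  # plan.md stays focused on what's next
    return len(done)

if __name__ == "__main__":
    moved = archive_done_chunks(sys.argv[1] if len(sys.argv) > 1 else "chunk completed")
    print(f"archived {moved} chunk(s)")
```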
Shane
Shane@digitalshane_·
I'm not joking, I had Claude write a big batch of code last night. I am troubleshooting rn. I asked it to review; it said this is trash code and needs to be completely reworked. We are spending credits to run in circles.
537
270
6.2K
298K