DisguisedScholar

1.4K posts

@docczeus

Founder, former | @iiit, @NTU, and @USouthFlorida | Researcher | Accessibility Warrior Moot thought to be neutral since my wife is Liberal 🫣.

Tampa, FL · Joined January 2010
418 Following · 62 Followers
DisguisedScholar
DisguisedScholar@docczeus·
If a company releases a smart head-worn product, like caps, bands, or even smart glasses, be very careful. Your brain waves are the new product.
0
0
0
4
NVIDIA
NVIDIA@nvidia·
Congratulations @OpenAI on bringing Codex to more of the software workflow. 🎉 Codex is becoming a system for more of a developer’s workflow, helping them move across tools, create richer outputs, adapt to how they work, and carry longer-running tasks forward.
OpenAI@OpenAI

Codex for (almost) everything. It can now use apps on your Mac, connect to more of your tools, create images, learn from previous actions, remember how you like to work, and take on ongoing and repeatable tasks.

27
83
1.3K
97.4K
signüll
signüll@signulll·
excited to share what we have been up to.

your iphone’s home screen hasn’t changed in ~20 years. it’s the same static grid of icons since launch with zero awareness of your actual life.

@skye is a new agentic home screen for iphone. no telegram. no mac mini. & no claws required. skye is ambient intelligence that just works. it continuously listens to your context & acts on it.

it builds your reading lists, gives you personalized weather, drafts email replies, prepares you for meetings & trips, flags suspicious charges, works through your reminders, tracks your health, & gives you one tap intel on wherever you are (restaurants, museums, neighborhoods, etc). all surfaced on your home screen.

over the next few posts i’ll break down how it works, why we built it, & why we think it deserves to exist in the world.

beta starts today. if you’re on the list, you’ll get access very soon. app store shortly after. deeply appreciate you all following along on this fun little journey. also please join our discord!
496
146
4.2K
1.3M
stevibe
stevibe@stevibe·
I built a macOS app for benchmarking local LLMs. 6 test suites. Multiple providers. One workspace. Open source.

There are hundreds of local models now. New ones every week. How do you actually pick one?

Leaderboards test for general ability. But if you're building an agent that chains tool calls, or a pipeline that extracts structured data, or a code assistant that needs to debug Rust, you need to know if the model handles that specific thing. Not in theory. On your hardware. With your prompts.

The benchmarks that exist are either locked behind papers, too abstract to map to real failures, or impossible to extend. You can't add your own test cases. You can't test what matters to your use case.

That's what BenchLocal is for. It's a benchmark platform where every test is practical, deterministic, and built around real-world tasks. And you can build your own tests.

It ships with 6 Bench Packs TODAY:
→ ToolCall-15 — tool-use accuracy
→ BugFind-15 — debugging capabilities
→ DataExtract-15 — structured data extraction
→ InstructFollow-15 — constraint-heavy instruction following
→ ReasonMath-15 — practical reasoning and math
→ StructOutput-15 — validator-backed structured output

Every pack has 15 fixed scenarios. Every score is deterministic and verifiable.

Some of you saw ToolCall-15 and BugFind-15 — the individual test packs I open-sourced over the past few weeks. People ran them, filed issues, sent PRs. But managing separate repos, separate scripts, separate results doesn't scale. BenchLocal puts everything in one place.

What the app does:
> Workspace with tabs — run BugFind-15 in one tab, ToolCall-15 in another.
> Any provider — Ollama, llama.cpp, OpenRouter, any OpenAI-compatible endpoint. Local and cloud, same interface.
> Run modes — serial, batch per model, batch per test case, or fully parallel.
> Test histories — every run saved. Compare any previous session.

But the part I'm most excited about isn't the app. It's the ecosystem. BenchLocal is a platform. Each Bench Pack is a plugin. I'm shipping an SDK so anyone can build their own — test what matters to you, package it, share it. Install and uninstall packs right inside the app, same way you'd manage extensions in VS Code. The registry is GitHub-based, fully public. I built 6 packs. I want the community to build the next 60.

Theme system built in too — because if I'm staring at benchmark results for hours, it should at least look good.

v0.1.0 is macOS only. Windows and Linux are coming. MIT licensed. Everything — the app, the bench packs, the SDK — is open. PRs welcome. Bench Packs even more welcome.
25
28
300
49.2K
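To make the "deterministic and verifiable" part of the post above concrete: a ToolCall-style test case run against any OpenAI-compatible endpoint could look roughly like the sketch below. This is not the BenchLocal SDK or its Bench Pack format; the endpoint URL, model name, tool schema, and pass/fail rule are illustrative assumptions.

```python
# Minimal sketch of a deterministic tool-call test against any OpenAI-compatible
# endpoint. The URL, model name, and test case below are illustrative assumptions,
# not BenchLocal's actual Bench Pack format.
import json
import requests

ENDPOINT = "http://localhost:11434/v1/chat/completions"  # e.g. Ollama's OpenAI-compatible API
MODEL = "llama3.1:8b"                                     # whichever local model you are evaluating

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def run_case(prompt: str, expected_tool: str, expected_args: dict) -> bool:
    """One fixed scenario: ask the model, then verify the tool call exactly."""
    resp = requests.post(ENDPOINT, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "tools": TOOLS,
        "temperature": 0,  # keep the run as deterministic as the backend allows
    }, timeout=120)
    resp.raise_for_status()
    message = resp.json()["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    if not calls:
        return False  # model answered in prose instead of calling the tool
    fn = calls[0]["function"]
    return fn["name"] == expected_tool and json.loads(fn["arguments"]) == expected_args

if __name__ == "__main__":
    ok = run_case("What's the weather in Tampa right now?",
                  expected_tool="get_weather",
                  expected_args={"city": "Tampa"})
    print("PASS" if ok else "FAIL")
```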
Anaya
Anaya@Anaya_sharma876·
I’m a Windows user, just installed Linux for the first time. What should I do first?
Anaya tweet media
534
38
1.2K
1.5M
DisguisedScholar
DisguisedScholar@docczeus·
Codex: one command and the 5-hour limit is over. Are you guys noticing the same? @OpenAI
DisguisedScholar tweet media
0
0
0
38
Dave Lee
Dave Lee@heydave7·
This afternoon I picked up a new Nvidia DGX Spark computer with the goal of trying to run Gemma 4 31b (4bit) on it locally as a server. Just 1.5 hours later, it’s working! Using Open WebUI on my MacBook as the interface and it’s connecting to my DGX Spark running as a Gemma 4 server.
100
31
967
129.9K
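The post doesn't say which serving stack runs on the Spark. Assuming it exposes an OpenAI-compatible endpoint (which is what Open WebUI connects to), a quick sanity check from the MacBook might look like this; the hostname, port, and model tag are placeholders.

```python
# Quick check from the laptop that the DGX Spark is serving a model over an
# OpenAI-compatible API. Host, port, and model tag are assumptions; the post
# doesn't name the stack running on the Spark.
import requests

BASE = "http://dgx-spark.local:11434/v1"  # hypothetical address of the Spark on the LAN
MODEL = "gemma-4-31b"                      # placeholder tag; use whatever the server actually registers

print(requests.get(f"{BASE}/models", timeout=10).json())  # list what the server is serving

reply = requests.post(f"{BASE}/chat/completions", json={
    "model": MODEL,
    "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
}, timeout=120).json()
print(reply["choices"][0]["message"]["content"])
```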
NVIDIA AI Developer
NVIDIA AI Developer@NVIDIAAIDev·
If VRAM isn’t eaten by weights, it can go to KV cache and batch size. FlexTensor’s planned tensor offload displaces weight storage into host RAM, so inference stacks like vLLM can scale context and throughput on fixed hardware instead of immediately jumping to multiple GPUs.
Piotr Nawrot@p_nawrot

💾🚀 Run Llama-3.1-405B FP8 (410GB) on a single 180GB GPU #NVIDIA Introducing FlexTensor — NVIDIA's new library that makes host RAM a transparent extension of your GPU memory. One call: flextensor.offload(model). No model rewrites, no framework changes. Works with vLLM, HuggingFace, and any PyTorch model. Traditional offloading is reactive — move data when you run out of memory, stall the GPU while you wait. FlexTensor instead profiles your model's layer access patterns, then solves a knapsack optimization to schedule prefetches that overlap with compute. By the time a layer needs its weights, they're already there. The freed VRAM gives vLLM more room for KV cache — enabling 4x longer contexts (8K→32K) or 4x larger batches. For video generation (Wan2.2-T2V-A14B on GB200): +0.1% overhead. Handles FP8, custom Triton kernels, and multi-GPU. Profiles saved to disk — no warmup on repeated runs. Check it out: github.com/ai-dynamo/flex…

16
56
498
43.8K
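A rough way to see why VRAM freed from weights converts into context length or batch size: KV cache grows linearly with both. The model shape and dtype below are illustrative assumptions, not measurements of any particular model.

```python
# Back-of-the-envelope KV-cache sizing. The layer count, KV heads, head_dim,
# and cache dtype are illustrative assumptions for a large GQA model.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # Per token, each layer stores one key and one value vector per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len * batch

GB = 1024 ** 3
# Hypothetical shape: 126 layers, 8 KV heads, head_dim 128, FP8 (1-byte) cache.
for ctx in (8_192, 32_768):
    size = kv_cache_bytes(126, 8, 128, ctx, batch=8, bytes_per_elem=1)
    print(f"context {ctx:>6}: KV cache ~ {size / GB:.1f} GB at batch 8")
# Cache size scales linearly with context, so roughly 4x the free VRAM buys
# roughly 4x the context (8K -> 32K) at the same batch size.
```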
Piotr Nawrot
Piotr Nawrot@p_nawrot·
💾🚀 Run Llama-3.1-405B FP8 (410GB) on a single 180GB GPU #NVIDIA Introducing FlexTensor — NVIDIA's new library that makes host RAM a transparent extension of your GPU memory. One call: flextensor.offload(model). No model rewrites, no framework changes. Works with vLLM, HuggingFace, and any PyTorch model. Traditional offloading is reactive — move data when you run out of memory, stall the GPU while you wait. FlexTensor instead profiles your model's layer access patterns, then solves a knapsack optimization to schedule prefetches that overlap with compute. By the time a layer needs its weights, they're already there. The freed VRAM gives vLLM more room for KV cache — enabling 4x longer contexts (8K→32K) or 4x larger batches. For video generation (Wan2.2-T2V-A14B on GB200): +0.1% overhead. Handles FP8, custom Triton kernels, and multi-GPU. Profiles saved to disk — no warmup on repeated runs. Check it out: github.com/ai-dynamo/flex…
Piotr Nawrot tweet media
13
32
218
55.4K
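FlexTensor itself isn't shown here, but a rough, hand-rolled illustration of the idea the post describes (weights kept in pinned host RAM and prefetched on a side CUDA stream so the copy overlaps with compute) could look like the sketch below. It skips the access-pattern profiling and knapsack scheduling entirely and just prefetches one layer ahead.

```python
# Rough illustration of prefetch-overlapped weight offloading in plain PyTorch.
# This is NOT FlexTensor: no profiling, no knapsack scheduling, just a fixed
# one-layer-ahead prefetch so the host->GPU copy of layer i+1 overlaps with the
# compute of layer i.
import torch
import torch.nn as nn

class PrefetchOffloader:
    def __init__(self, layers: nn.ModuleList, device: str = "cuda"):
        self.layers = list(layers)
        self.device = torch.device(device)
        self.copy_stream = torch.cuda.Stream()
        # Master copy of every weight lives in pinned host RAM (pinning enables async H2D copies).
        self.host = [[p.data.detach().cpu().pin_memory() for p in l.parameters()]
                     for l in self.layers]
        for i in range(len(self.layers)):
            self._evict(i)

    def _evict(self, i):
        # Point the layer's parameters back at the host copies, releasing VRAM.
        for p, h in zip(self.layers[i].parameters(), self.host[i]):
            p.data = h

    def _prefetch(self, i):
        if i >= len(self.layers):
            return
        with torch.cuda.stream(self.copy_stream):
            for p, h in zip(self.layers[i].parameters(), self.host[i]):
                p.data = h.to(self.device, non_blocking=True)

    @torch.no_grad()
    def forward(self, x):
        self._prefetch(0)
        for i, layer in enumerate(self.layers):
            # Wait until layer i's weights have landed, then start copying layer i+1
            # in the background while layer i computes.
            torch.cuda.current_stream().wait_stream(self.copy_stream)
            for p in layer.parameters():
                p.data.record_stream(torch.cuda.current_stream())  # stream-safety for the caching allocator
            self._prefetch(i + 1)
            x = layer(x)
            self._evict(i)  # free layer i's VRAM; that headroom is what vLLM would spend on KV cache
        return x
```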
DisguisedScholar
DisguisedScholar@docczeus·
It's a marketing gimmick; all of these few companies already have a stake in them. One can stop the AI by continuously monitoring and updating the guardrails; you can already see the day and hour when Claude behaves differently, and its thinking capability is poor between 12 and 4 EST. It's just one reason to give it to the elite while normal people don't feel bad about not getting AI. The companies using these models have their own internal models, and Anthropic is not worried; they can train it. I think it's the era of elite control. First your hardware got expensive, now your AI will get expensive, and both are out of reach.
0
0
0
7
DisguisedScholar
DisguisedScholar@docczeus·
@sama Resetting 12 hours before most users' weekly reset triggers.
0
0
0
6
Sam Altman
Sam Altman@sama·
To celebrate 3 million weekly codex users, we are resetting usage limits. We will do this every million users up to 10 million. Happy building!
1.9K
1.3K
27.4K
2M
DisguisedScholar
DisguisedScholar@docczeus·
@sukh_saroy We already know that! Throw a garbage equation at AI and it will say it's awesome.
0
0
0
5
Sukh Sroay
Sukh Sroay@sukh_saroy·
🚨 Everyone thinks GPT can do math. They're wrong. A new paper called SenseMath just proved LLMs don't have number sense at all. This changes everything about how we should use them:
Sukh Sroay tweet media
173
191
972
580.8K
Wazz
Wazz@WazzCrypto·
hosting - $5
domain - $10
ai sub - $19
X API - $148,674
ads - $50

someone who is good at the economy please help me budget this. my X app is dying
Elon Musk@elonmusk

Try using the X API

26
6
597
38.7K
DisguisedScholar
DisguisedScholar@docczeus·
@lubinho_k @digitalshane_ I have done this and was happy until now, but as the codebase grows, the connections and hallucinations also increase. Try running 3 instances of Claude and asking them to check your codebase. 😅
0
0
0
6
Luckforest
Luckforest@lubinho_k·
Yeah, big batch coding is where things fall apart. Don't do it.

What helped me: break everything into small chunks and track them. I use a plan.md where every feature gets broken down into batches, and every batch into individual chunks. I tell Claude to work on one chunk at a time, finish it, then move on.

After each chunk, a hook automatically marks it as done and moves it to plan-archive.md with a summary of what was completed (a sketch of such a hook follows below this post). So you always have a history in case you need to revisit something later, but your plan.md stays clean and focused on what's next.

The difference is massive. Instead of Claude trying to hold an entire feature in its head and producing messy code, it's focused on one small, clear task. The output quality goes way up.

One more thing that works well: if you're using Claude in Cursor, let Composer review and rate each chunk on a scale of 1-10 after Claude finishes it. Composer is solid at auditing and flagging issues. Then feed that review back to Claude and ask if it agrees and whether anything needs fixing. Composer usually rates things between 7.5 and 9, and sometimes it catches things Claude would then fix on the spot. Takes a bit more time per chunk, but the results are so much better than one big batch that ends up getting rewritten three times.
13
20
299
22.4K
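The hook itself isn't included in the post; a minimal sketch of the archive step it describes might look like this. The file names, the "- [x]" done marker, and the summary argument are assumptions, and wiring it into Claude Code or Cursor as an actual hook is left out.

```python
# Hypothetical version of the archiving step described above: after a chunk is
# finished, move it out of plan.md into plan-archive.md with a short summary line.
# File names, the "- [x]" done marker, and the summary argument are assumptions.
import sys
from datetime import date
from pathlib import Path

PLAN = Path("plan.md")
ARCHIVE = Path("plan-archive.md")

def archive_done_chunks(summary: str) -> int:
    lines = PLAN.read_text().splitlines()
    keep, done = [], []
    for line in lines:
        # Chunks marked "- [x]" are finished; everything else stays in the plan.
        (done if line.lstrip().startswith("- [x]") else keep).append(line)
    if done:
        with ARCHIVE.open("a") as f:
            f.write(f"\n## {date.today()}: {summary}\n")
            f.writelines(l + "\n" for l in done)
        PLAN.write_text("\n".join(keep) + "\n")  # plan.md stays focused on what's next
    return len(done)

if __name__ == "__main__":
    moved = archive_done_chunks(sys.argv[1] if len(sys.argv) > 1 else "chunk completed")
    print(f"archived {moved} chunk(s)")
```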
Shane
Shane@digitalshane_·
I'm not joking, I had Claude write a big batch of code last night. I am troubleshooting rn. I asked it to review; it said this is trash code and needs to be completely reworked. We are spending credits to run in circles.
537
270
6.2K
298K