♡nat♡

21.1K posts

♡nat♡ banner
♡nat♡

♡nat♡

@natdotgx

ZKPs Founder I love technology :)

Katılım Temmuz 2023
10K Takip Edilen9.9K Takipçiler
♡nat♡
♡nat♡@natdotgx·
7GMpykefaNm7JVnb6nen1dH11GBn8mdRpsgq1oS2pump
HT
0
0
0
27
♡nat♡
♡nat♡@natdotgx·
Finally built something I actually needed. Introducing PinchBench, an open benchmark for testing LLMs inside real OpenClaw agent workflows. Most benchmarks test isolated skills. Answer a question. Solve math. Write code. That doesn’t matter when your agent has to actually do things. PinchBench runs models through 23 real tasks like: • scheduling meetings • writing + running scripts • triaging email • scraping data • managing files Not “can it answer” But “can it complete the task end to end” Did it actually send the email? Create the file? Finish the workflow? Everything is graded automatically and pushed to a public leaderboard: success rate, speed, and cost across 30+ models You can plug in your own model, run it locally, and see how it stacks Fully open source. All tasks and eval logic are public. This is how you actually measure agents. github.com/agenticbuilder…
♡nat♡ tweet media
English
1
0
0
48
♡nat♡ retweetledi
Johannes Tscharn
Johannes Tscharn@JohannesTscharn·
Excellent explanation on how to think about and use LLM context for coding tasks. Highly recommended read.
robot@alightinastorm

Your context glows too bright: An exploration of context saturation don't flood your room with stadium lights if a flashlight is sufficient Two Types of Context Every vibe coding task involves at least two key pieces: Instruction Context: This is everything that tells the AI how to move your codebase from its current state to the next one. Think requirements, concepts, architectural decisions, and the "why" behind what you want done. State Context: This is the actual code and structure that shows the AI what exists right now. File contents, folder structures, existing patterns, and development conventions. Getting the mix right between these two determines whether your AI coding session will be smooth sailing or a frustrating mess. The Spectrum of Context Approaches Under-saturated Instructions, No State This is like asking someone to "fix the login thing" without showing them any code. The AI gets a vague instruction and has to guess what you want and how your codebase works. It'll spend time exploring files, making wrong assumptions about your architecture, and probably missing important pieces. You'll end up with code that kind of works but doesn't fit your system. Saturated Instructions, No State Here you give the AI a detailed breakdown of exactly what to do, with clear requirements and concepts. But you don't show it your current code. This can work for small projects, but larger codebases become problematic. The AI tends to ignore your existing patterns and conventions because it can't see them. It'll write one-off functions instead of using your established utilities. I call this "vibe drill coding" because the AI just drills straight through your codebase without respecting what's already there. Saturated Instructions, Saturated State This is the sweet spot. You give the AI comprehensive instructions about what to do AND you show it all the relevant existing code. It can see your patterns, understand your architecture, and build on what's already there. The AI doesn't need to do much discovery work and can jump straight into implementation that fits your system. Oversaturated Context When you go too far in either direction, problems emerge. Oversaturated instructions can confuse the AI with conflicting requirements or too many edge cases to consider at once. It might get paralyzed trying to satisfy everything or pick the wrong priorities. Oversaturated state context is equally problematic. Dump too much code on the AI and it loses focus on what's actually relevant. It might get distracted by unrelated files or try to refactor things that weren't part of the original task. ¹ Finding Your Balance Different tasks need different approaches. A small bug fix might need minimal context, while a major feature addition requires showing the AI how your entire authentication system works. Navigating the evolution of a codebase over time requires careful thought about context saturation for each specific task. So, the next time someone tells you that vibe coding doesn’t work, we may call them out on having context issues, lol. ¹ : The signal gets lost in the noise. Just like X's new algo does, LOL.

English
0
2
9
2.3K
♡nat♡ retweetledi
Johannes Tscharn
Johannes Tscharn@JohannesTscharn·
Doing some work on the 6 axis robot this afternoon
English
5
4
99
9.8K
♡nat♡ retweetledi
Johannes Tscharn
Johannes Tscharn@JohannesTscharn·
I’m giving an agent control over Reachy Mini from @huggingface and letting it understand and share spatial data via @Spectacles AR is the human interface for robotics and physical AI imo. It feels like absolute magic to interact with this, both in voice/agent and “puppeteering” mode. I’ll probably work on AR for either an arm (manipulation tasks) or some sort of drone (locomotion in 3D space) next… Project is fully open source btw: github.com/V4C38/spectacl… Thank you @SensAIHackademy for sending me the robot!
English
25
59
342
34.9K
♡nat♡ retweetledi
Qwen
Qwen@Alibaba_Qwen·
🚀 Introducing the Qwen 3.5 Medium Model Series Qwen3.5-Flash · Qwen3.5-35B-A3B · Qwen3.5-122B-A10B · Qwen3.5-27B ✨ More intelligence, less compute. • Qwen3.5-35B-A3B now surpasses Qwen3-235B-A22B-2507 and Qwen3-VL-235B-A22B — a reminder that better architecture, data quality, and RL can move intelligence forward, not just bigger parameter counts. • Qwen3.5-122B-A10B and 27B continue narrowing the gap between medium-sized and frontier models — especially in more complex agent scenarios. • Qwen3.5-Flash is the hosted production version aligned with 35B-A3B, featuring: – 1M context length by default – Official built-in tools 🔗 Hugging Face: huggingface.co/collections/Qw… 🔗 ModelScope: modelscope.cn/collections/Qw… 🔗 Qwen3.5-Flash API: modelstudio.console.alibabacloud.com/ap-southeast-1… Try in Qwen Chat 👇 Flash: chat.qwen.ai/?models=qwen3.… 27B: chat.qwen.ai/?models=qwen3.… 35B-A3B: chat.qwen.ai/?models=qwen3.… 122B-A10B: chat.qwen.ai/?models=qwen3.… Would love to hear what you build with it.
Qwen tweet media
English
435
1.1K
8.1K
4M
♡nat♡ retweetledi
🤷 Nico Martin
🤷 Nico Martin@nic_o_martin·
TranslateGemma 4B by @GoogleDeepMind now runs 100% in your browser on WebGPU with Transformers.js v4. 55 languages. No server. No data leaks. Works offline. A 4B parameter translation powerhouse, right in your browser. Try the demo 👇
English
29
129
1.3K
121K
♡nat♡
♡nat♡@natdotgx·
Been gradually building out my "Clawscan" dashboard to stay on top of OpenClaw coding sub agents, my team of agents, etc. The best feature is being able to attach to things like Copilot CLI and see what's going on there as it runs in the background. github.com/renzdevs/claws…
♡nat♡ tweet media♡nat♡ tweet media
English
2
0
6
2.9K
♡nat♡ retweetledi
Andrew Ng
Andrew Ng@AndrewYNg·
How can businesses go beyond using AI for incremental efficiency gains to create transformative impact? I write from the World Economic Forum (WEF) in Davos, Switzerland, where I’ve been speaking with many CEOs about how to use AI for growth. A recurring theme is that running many experimental, bottom-up AI projects — letting a thousand flowers bloom — has failed to lead to significant payoffs. Instead, bigger gains require workflow redesign: taking a broader, perhaps top-down view of the multiple steps in a process and changing how they work together from end to end. Consider a bank issuing loans. The workflow consists of several discrete stages: Marketing -> Application -> Preliminary Approval -> Final Review -> Execution Suppose each step used to be manual. Preliminary Approval used to require an hour-long human review, but a new agentic system can do this automatically in 10 minutes. Swapping human review for AI review — but keeping everything else the same — gives a minor efficiency gain but isn’t transformative. Here’s what would be transformative: Instead of applicants waiting a week for a human to review their application, they can get a decision in 10 minutes. When that happens, the loan becomes a more compelling product, and that better customer experience allows lenders to attract more applications and ultimately issue more loans. However, making this change requires taking a broader business or product perspective, not just a technology perspective. Further, it changes the workflow of loan processing. Switching to offering a “10-minute loan” product would require changing how it is marketed. Applications would need to be digitized and routed more efficiently, and final review and execution would need to be redesigned to handle a larger volume. Even though AI is applied only to one step, Preliminary Approval, we end up implementing not just a point solution but a broader workflow redesign that transforms the product offering. At AI Aspire (an advisory firm I co-lead), here’s what we see: Bottom-up innovation matters because the people closest to problems often see solutions first. But scaling such ideas to create transformative impact often requires seeing how AI can transform entire workflows end to end, not just individual steps, and this is where top-down strategic direction and innovation can help. This year's WEF meeting, as in previous years, has been an energizing event. Among technologists, frequent topics of discussion include Agentic AI (when I coined this term, I was not expecting to see it plastered on billboards and buildings!), Sovereign AI (how nations can control their own access to AI), Talent (the challenging job market for recent graduates, and how to upskill nations), and data-center infrastructure (how to address bottlenecks in energy, talent, GPU chips, and memory). I will address some of these topics in future posts. Against the backdrop of geopolitical uncertainty, I hope all of us in AI will keep building bridges that connect nations, sharing through open source, and building to benefit all nations and all people. [Original text: deeplearning.ai/the-batch/issu… ]
English
88
155
734
74K
♡nat♡ retweetledi
Google DeepMind
Google DeepMind@GoogleDeepMind·
So our 45-person team developed entirely new AI capabilities, enabling them to: 🎨 Fine-tune custom Veo and Imagen models on their paintings and artwork 📹 Provide a desired look through rough animations, which the models transformed into stylized videos 🎭 Edit specific regions without regenerating entire shots from scratch
English
14
19
275
93.5K
♡nat♡ retweetledi
Google DeepMind
Google DeepMind@GoogleDeepMind·
Our short film Dear Upstairs Neighbors is previewing at @sundancefest. 🎬 It’s a story about noisy neighbors, but behind the scenes, it’s about solving a huge challenge in generative AI: control. Developed by Pixar alumni, an Academy Award winner, researchers, and engineers, here’s how it came together. 🎨
English
369
412
3.4K
2.2M
♡nat♡ retweetledi
Google DeepMind
Google DeepMind@GoogleDeepMind·
We're helping AI to see the 3D world in motion as humans do. 🌐 Enter D4RT: a unified model that turns video into 4D representations faster than previous methods - enabling it to understand space and time. This is how it works 🧵
English
91
437
3.2K
367.1K
♡nat♡ retweetledi
OpenAI
OpenAI@OpenAI·
Meet the Frontier Builders.
English
155
106
1.5K
244K
♡nat♡ retweetledi
Sam Altman
Sam Altman@sama·
Tomorrow we’re hosting a town hall for AI builders at OpenAI. We want feedback as we start building a new generation of tools. This is an experiment and a first pass at a new format — we’ll livestream the discussion on YouTube at 4 pm PT. Reply here with questions and we’ll answer as many as we can!
English
4.2K
677
8.2K
1.7M
♡nat♡ retweetledi
OpenAI
OpenAI@OpenAI·
We’re rolling out age prediction on ChatGPT to help determine when an account likely belongs to someone under 18, so we can apply the right experience and safeguards for teens. Adults who are incorrectly placed in the teen experience can confirm their age in Settings > Account. Rolling out globally now. EU to follow in the coming weeks. openai.com/index/our-appr…
English
796
558
4.4K
1.4M
♡nat♡ retweetledi
Nockchain
Nockchain@nockchain·
Remember that if Satoshi had access to modern ZK he would have built Nockchain. It's not just about private transactions, as in Zcash. A truly ZK-maxxing Bitcoin would have miners generate proofs instead of hashes, and the ledger would carry proofs rather than transaction data!
English
2
8
86
3.4K
♡nat♡ retweetledi
Nockchain
Nockchain@nockchain·
AI makes hacking cheap, which will put every small business at risk. Nockchain uses ZK to guarantee the security of business software, instead of expensive cybersecurity defense. Security should not be a luxury exclusively available to big businesses.
English
2
13
94
3.3K
♡nat♡ retweetledi
MakeInfinite Labs
MakeInfinite Labs@MakeInfiniteCo·
The cryptography team at MakeInfinite Labs pushed some new performance upgrades to the @spaceandtime Proof of SQL repo this week. The protocol can now prove queries against 1 million rows of data in less than a second. We are grateful to build with @NVIDIA and the Space and Time ecosystem to accelerate ZK proofs together. 🤝 → View the updated benchmarks: github.com/spaceandtimefd…
MakeInfinite Labs tweet media
English
2
21
65
9.3K