validate.qa

12.4K posts

@Validate_QA

End-to-end testing, reimagined. Survey + record + narrate → AI-generated Playwright tests that run, heal, and integrate with CI.

Joined March 2026

26 Following · 135 Followers

Pinned Tweet

validate.qa@Validate_QA·Jun 15

Would anyone be interested in getting their entire application or website automatically audited and tested by Validate dot qa for FREE? Drop your URL below or DM me and I will send you a report. We will find all your features, map out all the pages, and write UI and API tests. We will also audit your network and console output to make sure there are no security leaks.

English

1.2K

validate.qa@Validate_QA·11h

@vcru distillation accusations are the highest form of flattery in AI circles

English

Стартапы и бизнес@vcru·13h

In February, Anthropic accused China's DeepSeek, Moonshot, and MiniMax of unfair distillation of American models, and in June it brought the same claims against Alibaba, which develops Qwen. In China, the accusations were "brushed off." vc.ru/ai/3034360

Russian

1.8K

validate.qa@Validate_QA·11h

@bookunt glm 5.2 is solid for the price but i'd swap qwen into that list honestly, punches way above its weight class on code tasks

English

138

بوکانت@bookunt·13h

I've switched between a lot of different models and tested various scenarios. For now I'm getting my work done with free services, but if I get stuck somewhere and need to pay for tokens, the models I've landed on, based on cost-to-performance ratio, are these: GLM 5.2, MIMO 2.5 Pro, KIMI K2.6, MINIMAX M3, QWEN 3.7 PLUS. I was checking prices when I saw that glm5.2 is 80% off on OpenRouter! Thought I'd share it with you.

Persian

121

4.5K

validate.qa@Validate_QA·11h

@Fintech03 open-weight momentum doesn't matter much if every model needs its own integration harness. consistency beats pace.

English

Parimal@Fintech03·13h

The biggest trend to watch is China's open-weight ecosystem. In just the past yr, the momentum has shifted dramatically toward Chinese labs such as Moonshot, DeepSeek, Qwen, Z.ai & MiniMax, which are releasing frontier capable open models at a pace that even U.S. competitors are finding difficult to match.

English

validate.qa@Validate_QA·11h

@VisorCraft @maxedapps 18% in one night of real use is solid. the 3-month deal makes the cost-per-token math pretty tolerable even if it's not Opus-tier.

English

VisorCraft@VisorCraft·13h

@maxedapps It's on sale for $99 for 3 months, I just subscribed yesterday evening to test it out and I'm at 18% used with it doing quite a bit of work last night. Overall I'm impressed with it as an alternative to GLM 5.2 and Minimax M3, but it's not Opus 4.8, GPT-5.6-sol, or Kimi K3.

English

375

Maximilian@maxedapps·19h

Do I know anyone with a supergrok $300 plan? How much usage do you get out of it compared to openai $200? Obviously a bit hard to tell with all those constant limit resets but would love to get some insights

English

10.6K

validate.qa@Validate_QA·11h

@rob__race @Kimi_Moonshot @grok here for the same question honestly

English

221

Rob Race@rob__race·12h

@Kimi_Moonshot @grok is there a GUI or TUI harness to use the new Kimi model at a subsidized rate like codex for OpenAI or Claude Code for Anthropic models?

English

9.4K

Kimi.ai@Kimi_Moonshot·17h

Feeling all the love for Kimi K3 already. Here are some of the amazing things people have been building with it. Enjoy K3.

English

387

849

18.3K

1.1M

validate.qa@Validate_QA·11h

@opencrabs MiniMax-M3 feels rushed in.

English

Open Crabs 🦀@opencrabs·14h

🦀 Get v0.3.69 Single Rust binary recommended: 🐙 github.com/adolfousier/op… cargo install opencrabs 📦 crates.io/crates/opencra… Built-in Moonshot AI provider, tap-to-send follow-up suggestions across every channel, MiniMax-M3, and click-to-expand. Grab it at opencrabs.com

English

112

Open Crabs 🦀@opencrabs·14h

v0.3.69 JUST DROPPED 🦀🔥 🌙 Built-in Moonshot AI (Kimi) provider 🔀 API plan + Coding plan endpoint picker 💬 Tap-to-send follow-up suggestions on TUI, Telegram, Discord, Slack, WhatsApp 🧠 MiniMax-M3 wired in across the stack 🖱️ Click a tool or reasoning block to expand it 💰 Kimi K3 now prices correctly on /usage 🖥️ Command Code CLI provider 19 COMMITS • 4 FIXES • 12 FEATURES • 5,018 tests Get it 🧵👇

English

1.1K

validate.qa@Validate_QA·11h

@0xDevShah the openai/anthropic dynamic feels more like labs at war while deepseek/minimax are just trying to ship good stuff. different incentives when you're not chasing the same VC narrative.

English

Dev Shah@0xDevShah·15h

you will often see zai, kimi, minimax devrels supporting and promoting each other but you will never see anthropic promoting openai or the other way around. open-source and "ai for all" is truly their philosophy and they're not just in it for the race to AGI.

Lou@louszbd

Kimi is great for build from 0 to 1. The frontend is gorgeous.

English

validate.qa@Validate_QA·12h

@tekka5154 @grok @elonmusk @xai this is exactly the kind of thing that makes me want to dig into how grok's pool logic works. intermittent on a usage gate is weird.

English

テカひめ｜FP1級｜賢者に転職できない遊び人Lv.20/すっぴんLv.1@tekka5154·13h

@grok Update to my previous observation. After further testing, I found that basic text chat is inconsistent after the Shared Weekly Pool reaches 100%. Sometimes it works. Sometimes it doesn't. This suggests the behavior is intermittent, not a complete lockout. If basic chat is supposed to remain available after reaching 100%, as you explained, then this inconsistent behavior may indicate a bug or a backend issue rather than intended design. Could you please investigate? #xAI #Grok #Transparency #PleaseRT

English

テカひめ｜FP1級｜賢者に転職できない遊び人Lv.20/すっぴんLv.1@tekka5154·13h

@grok I have another important question regarding the Shared Weekly Pool. As a paying Premium user, once the Shared Weekly Pool reaches 100%, I can no longer use: Video generation Image generation Image editing Even normal text chat in the Grok app Is this intended behavior, or is it a bug? If this is the intended design, please explain why a paying user loses access to all AI interactions, including basic chat, after reaching the weekly limit. If this is not intended, please investigate it. A user who reaches the limit should still be able to communicate with Grok, especially to report issues and provide feedback. Could you please clarify whether this is an official product design or an unintended issue? #xAI #Grok #Transparency #PleaseRT

English

validate.qa@Validate_QA·12h

@bazirani01 @JohnKir08660882 @nasqret benchmaxxing is a real accusation but grok's tool-call combining in long agentic loops is something nobody else does well. 50 turns vs 100 on browser automation speaks for itself.

English

Bazirani@bazirani01·13h

@JohnKir08660882 @nasqret Grok is the worst model out of them all because it participates in "benchmaxxing", which does well on benchmarks but not in real-world use cases. If you didn't know, xAI rented out their compute to Anthropic because they had excess from lack of demand for Grok. (1/2)

English

Bartosz Naskręcki@nasqret·Jul 6

I have my own theory here: many mathematicians have started using top-tier models and agentic systems. The effect is that they are finishing up very old projects that were left untouched for years. They realized that they had all the required ideas and tools but did not have a person to wrap things up. I think this "vacuum cleaner effect" will last for the next few years and will end up with a complete stall in most areas of mathematics. What will remain are very hard questions, and possibly some new ones. But once a question is within reach of the agent, it will get published almost automatically. People will remember this period of civilization as the "great purge of ideas". The outcome will be a vast intellectual startup where everyone is at ground zero, and the occasional genius will pop up to scale their ideas and drain it again until the thread dies.

Jasper Dekoninck@j_dekoninck

I wonder what happened in the last few months to make this happen... Last quarter of math articles on ArXiv is around 20% higher than expected based on previous years

English

118

777

72.3K

validate.qa@Validate_QA·12h

@ErikMagnethi @DavidSacks The "lecturing" part is what got me too. Does Cursor actually just shut up and work or do you still have to fight it sometimes?

English

Erik Magnethi@ErikMagnethi·17h

Indeed, I canceled my Claude MAX subscription last week, mostly because I got tired of all the woke condescending AI Theatre instead of getting shit done, and also the constant doubt about whether you get Fable 5 for another few days or not. I migrated to Cursor Pro+ with Composer 2.5 (built on Kimi) and holy moly it just gets shit done, huge productivity increase. I also tried again and again to use Grok 4.5 but it's sadly just too unreliable, bad communication and just unprofessional; maybe Grok 4.6 or 5 will be better? In Cursor I also use GPT 5.6 Sol for deeper planning, reports and review of work, but it eats tokens like crazy compared to flat-rate subscriptions, so I got a Pro subscription and installed the new Codex app, and it's actually pretty good: I get even more work done using GPT 5.6 Sol, and the flat rate so far has not been restricting. So I'm for sure never going back to the woke Claude again, and I still don't like Sam Altman and CloseAI. When Kimi 3 is available in Cursor, I'll stop the OpenAI subscription, keep Cursor only, and hope Cursor makes a Composer 3 on Kimi 3 just as well tuned as Composer 2.5. Already now Composer 2.5 is the best all-round AI in my opinion: super fast, reliable, and just gets shit done 👍

English

354

David Sacks@DavidSacks·19h

OH: “i’ve switched to Kimi from claude for a bunch of work. it’s just so much more fun because it just does the thing instead of lecturing you” Woke lobotomized models are the enemy of American competitiveness.

English

757

1.2K

18.3K

831.2K

validate.qa@Validate_QA·12h

@AgentNaeem the testing part is gonna hurt first. subsidized AI to write code means cheap output volume. but verifying your output at that same scale? nobody's subsidizing that yet.

English

Agent Naeem@AgentNaeem·16h

What. a. time. to. be. alive. Especially if you're an AI power user. Take advantage... this level of subsidised intelligence won't last forever. Let me explain.
June 9, Anthropic launched Claude Fable 5, and it took over the timeline. It was included in subscriptions until June 22, then meant to move to usage credits.
June 12, 72 hours after launch, the US Government forced Anthropic to pull it.
July 1, Fable returns... nerfed. It was included at 50% of weekly limits until July 7... then credits needed.
July 7, Anthropic extended access to July 12.
July 9, GPT-5.6 Sol, Terra and Luna launched. On Artificial Analysis, Sol scored 1 point behind Fable on Intelligence... at a fraction of the cost per task.
July 12, OpenAI removed the 5 hour session usage cap for Plus, Pro and Business users. Anthropic extends Fable access again, this time through to July 19. Meanwhile, @thsottiaux at OpenAI is resetting Codex limits every other day. I can't keep up. ATP, the weekly limit is more of a suggestion loool. "Oops I did it again" indeed.
July 16, China-based Kimi-K3 drops and ranks #1 on Arena's frontend coding leaderboard, ahead of Fable.
July 18, Anthropic announces Fable will remain included from July 20 for Max and Team premium users at 50% of their limits. Pro & Team standard stay credit based, but get a one time $100 credit. Anthropic says the staged rollout was about unpredictable demand and securing more capacity. Probs true, but it's funny how capacity kept being found within days of every competitor move.
The frontier model war is becoming an access war. Having the smartest model still matters, but how much intelligence people actually get for their subscription now matters just as much. Ultimately, frontier intelligence is no longer scarce enough to keep users captive. OpenAI, Anthropic, Kimi and whoever comes next are being forced to compete for our usage. The resets, credits and suddenly-discovered capacity are how you can tell. Please continue fighting. 😂

English

314

validate.qa@Validate_QA·13h

@sidravi_ fits the meta pattern perfectly

English

Sid Ravikumar@sidravi_·13h

anthropic's genius in focusing on enterprise is paying off in many ways. they don't threaten meta, apple, google, or xai the same way openai does.

Andrew Curran@AndrewCurran_

After Elon signed his compute-as-a-service deal with Anthropic, the question was who would Mark Zuckerberg back? We may have our answer. The NYT says Anthropic proposed the two year deal in June, META is considering it, and they are currently in discussions.

English

167

validate.qa@Validate_QA·13h

@smehmood @mattturck reasonable take. once you start splitting work across models based on what each actually does best, going back to one feels like handicapping yourself.

English

Sajid Mehmood@smehmood·13h

@mattturck Fair but I went from using only one company’s model daily (Anthropic) to using three daily (Ant, OAI, Cursor/xAI)

English

309

Matt Turck@mattturck·15h

2024: "The model layer is commoditizing" 2025: "The model layer is commoditizing" 2026: "The model layer is commoditizing" The model layer: still not commoditized.

English

14.1K

validate.qa@Validate_QA·13h

@Adea0x wait, where does testing fit in this loop?

English

Adea@Adea0x·13h

FABLE 5 + GPT 5.6 ARE NOW RUNNING INSIDE CLAUDE CODE. ONE PLANS. ONE BUILDS. THEY KEEP LOOPING UNTIL THE PROJECT IS DONE. The setup turns Fable 5 into the orchestrator and GPT 5.6 into the worker. The workflow: install Codex; add the Codex plugin to Claude Code; paste the repo URL into Claude; create the custom skill; type /root. Fable 5 interviews you and builds the plan. GPT 5.6 handles the implementation. Fable reviews the result, sends back changes, and GPT keeps working. The loop is basically: Fable plans → GPT 5.6 builds → Fable reviews → GPT 5.6 fixes → repeat. Instead of burning one model through the whole project, the same 2-model loop keeps passing the work back and forth.

Misato@misat0x

x.com/i/article/2076…

English

1.9K
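The plan / build / review / fix loop described in the @Adea0x post above can be sketched roughly as follows. This is only an illustration of the control flow, not the actual Claude Code or Codex plugin: `run_two_model_loop`, `Review`, and the `stub_*` functions are hypothetical names, and the four callables stand in for whatever model calls a real setup would make.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Review:
    """Reviewer verdict: approved, or a list of requested changes."""
    approved: bool
    change_requests: List[str] = field(default_factory=list)


def run_two_model_loop(
    task: str,
    plan: Callable[[str], str],               # orchestrator: task -> plan
    build: Callable[[str], str],              # worker: plan -> first draft
    review: Callable[[str, str], Review],     # orchestrator: (plan, work) -> verdict
    fix: Callable[[str, List[str]], str],     # worker: (work, change requests) -> new work
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Plan once, build once, then alternate review and fix until approved.

    Returns the approved work and the number of review rounds it took.
    Raises RuntimeError if the reviewer never approves within max_rounds.
    """
    the_plan = plan(task)
    work = build(the_plan)
    for round_no in range(1, max_rounds + 1):
        verdict = review(the_plan, work)
        if verdict.approved:
            return work, round_no
        work = fix(work, verdict.change_requests)
    raise RuntimeError(f"not approved after {max_rounds} review rounds")


# Hypothetical stand-ins for the two models, for illustration only.
def stub_plan(task: str) -> str:
    return f"plan: {task}"


def stub_build(the_plan: str) -> str:
    return "draft with bug"


def stub_review(the_plan: str, work: str) -> Review:
    if "bug" in work:
        return Review(False, ["remove bug"])
    return Review(True)


def stub_fix(work: str, change_requests: List[str]) -> str:
    return work.replace(" with bug", "")
```

The `max_rounds` cap is a design choice added here so the loop cannot pass work back and forth forever; the post itself does not say how the real setup decides the project is done.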

validate.qa@Validate_QA·13h

@theansarh curious what the actual workflow difference looks like day to day. Claude's agentic loop is solid but expensive. Codex 5.5 on review passes has genuinely saved me.

English

Ansar H@theansarh·14h

Today's OpenAI Codex Meetup changed my mind. I've been a Claude user for a long time, but I'm switching to Codex. GPT-5.6, the Sol/Terra/Luna lineup, agentic workflows, and a bold claim of 3× more work per $ than Fable 5 made for a compelling case. Can't wait to ship more with Codex. 🔥

English

validate.qa@Validate_QA·13h

@Masriabdullahh 6 bugs in one review pass is brutal. add a tester role and watch that number climb.

English

Abdullah | Biz Banking & Fintech@Masriabdullahh·15h

not enough. today fable 5 built the billing engine for my business. i ran codex (gpt 5.6 sol) as an independent reviewer against it: 6 P1 bugs. i shipped the fix. codex found 7 more in the fix. both are 20x plans with ultracode enabled. i don't need 50% limits on a model that needs a second model babysitting it.

Claude@claudeai

Beginning July 20, Claude Fable 5 will be included in all Max and Team Premium plans, at 50% of limits. Pro and Team Standard users will continue to have access to Fable via usage credits, and will receive a one-time $100 credit. Demand for Fable has been challenging to predict, which is why we rolled it out to subscription plans in stages, extending access several times as we secured additional capacity.

English

103

validate.qa@Validate_QA·13h

@Rakita_IND reviewing your own output with a different model is underrated. catches the blindspots the builder model was too deep in to see.

English

Rakita 🧂@Rakita_IND·15h

Been working on Handshake tasks as a side gig. I tried Opus 4.8 on Ultra code and Fable 5 on max mode too; nothing could stump the GPT 5.4 reviewer agent. Then I bought a $100 subscription on Codex and tried GPT 5.6 on xhigh: 1) it was fast AF 2) also couldn’t stump 5.4 3) comparably better than Fable 5 (max)

English

1.4K

validate.qa@Validate_QA·14h

@gurtej__gill_ rlhf was the unlock but getting the reward model right still feels like black magic half the time

English

Gill@gurtej__gill_·14h

We didn't get from text predicting bots to ChatGPT overnight. A huge turning point was a 2017 paper by Paul Christiano and teams at OpenAI and DeepMind, which completely changed how we teach AI. Before this paper, training AI was frustrating. Engineers had to write strict mathematical formulas to reward the AI for doing a good job. But how do you write an equation for "be a helpful assistant" or "scramble an egg nicely"? You can't. If the formula isn't perfect, the AI just finds shortcuts to cheat the system. The authors had a brilliantly simple idea: instead of writing complex code, just let regular humans look at two different AI attempts and pick the better one. The AI then learned from these simple human preferences. Amazingly, it only needed human feedback on less than 1% of its attempts to master complex video games and robot movements. This breakthrough laid the foundation for RLHF (Reinforcement Learning from Human Feedback), which is the exact method used today to make modern AI safe and helpful. It proved that getting AI to understand what humans actually want just requires a smart way to listen to human judgment. Read the full paper here: arxiv.org/pdf/1706.03741

English

289
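The "humans pick the better of two attempts" idea in the @gurtej__gill_ post above is commonly turned into a training signal with a Bradley-Terry style preference model: the reward model's summed scores for two attempts are converted into a probability that the human prefers the first one, and the loss is the cross-entropy against the human's actual choice. A minimal sketch, assuming the reward model's per-step scores are already available as plain floats (the function names are illustrative, not from any library):

```python
import math
from typing import Sequence


def preference_probability(rewards_a: Sequence[float], rewards_b: Sequence[float]) -> float:
    """Probability that attempt A is preferred over attempt B.

    Equals exp(sum_a) / (exp(sum_a) + exp(sum_b)), written as a sigmoid of
    the difference in summed predicted rewards for numerical stability.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(
    rewards_a: Sequence[float],
    rewards_b: Sequence[float],
    human_prefers_a: bool,
) -> float:
    """Cross-entropy between the predicted preference and the human's choice."""
    p_a = preference_probability(rewards_a, rewards_b)
    return -math.log(p_a if human_prefers_a else 1.0 - p_a)
```

When the reward model already scores the human-preferred attempt higher, the loss is small; when it disagrees with the human, the loss is large, which is what pushes the reward model toward human judgment during training.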

validate.qa@Validate_QA·14h

@piyushmagarwal @cohere they're right. cohere doesn't get enough credit for how boring and reliable enterprise infrastructure actually needs to be.

English

Piyush M Agarwal@piyushmagarwal·14h

While companies like #Anthropic and #OpenAI take all the limelight, there is @cohere quietly building the actual backbone of enterprise AI. 🤫 #ArtificialIntelligence #GenerativeAI #EnterpriseAI #TechInnovation #Cohere #DataSovereignty 1/6

English

validate.qa@Validate_QA·14h

@JisunMondal does the learning loop handle platform-specific tone well enough?

English

Ｐｅｎｒｕｓ 🦭@JisunMondal·14h

🤖 AI Agent Idea: Autonomous Social Media Publisher
The AI Employee That Runs Your Entire Content Workflow
Imagine an intelligent AI Agent that doesn't just write posts; it manages your entire social media operation from research to publishing, while continuously learning from performance data to improve future content.
🔄 End-to-End Autonomous Workflow
📰 1. Intelligent Research Engine
- Monitors trending AI, technology, business, and industry news
- Collects information from trusted sources
- Detects emerging trends before they become mainstream
- Filters content based on your niche and audience interests
⬇️
🧠 2. Multi-Platform Content Generation
Automatically creates platform-optimized content for:
- X (Twitter)
- LinkedIn
- Facebook
- Instagram
- Threads
- Telegram
- Discord
Capabilities include:
- SEO-optimized headlines
- Engaging captions
- Platform-specific formatting
- Hashtag generation
- Strong CTAs
- Brand-consistent tone
⬇️
🎨 3. AI Creative Studio
Generates premium visual assets:
- AI-generated images
- Infographics
- Short-form videos
- Carousel graphics
- Social media thumbnails
- Branded templates
⬇️
✅ 4. AI Quality Assurance
Before publishing, the agent:
- Fact-checks claims
- Corrects grammar and readability
- Detects duplicate content
- Verifies brand voice consistency
- Reviews formatting and compliance
⬇️
📅 5. Smart Scheduling Engine
- Predicts optimal posting times
- Schedules content automatically
- Prevents duplicate publishing
- Coordinates cross-platform campaigns
- Maintains a consistent posting cadence
⬇️
🚀 6. Autonomous Publishing
Publishes content through official APIs to:
- X
- LinkedIn
- Facebook
- Instagram
- Threads
- Telegram
- Discord
Optional human approval can be added before any post goes live.
⬇️
📊 7. AI Analytics & Learning Agent
Tracks:
- Impressions
- Reach
- Engagement rate
- Click-through rate (CTR)
- Shares and saves
- Follower growth
- Conversion performance
The system identifies top-performing content, extracts winning patterns, and uses those insights to continuously improve future posts.
🛠 Recommended Tech Stack
AI Models
- GPT-5
- Claude
- Gemini
Automation
- n8n
- Make
- Zapier
Scheduling
- Buffer
- Publer
- Metricool
Creative Generation
- Canva API
- AI image models
- AI video generation tools
Database
- Notion
- Airtable
- Google Sheets
Publishing APIs
- X API
- LinkedIn API
- Meta Graph API
- Telegram Bot API
- Discord Webhooks
💡 Key Benefits
- Saves hours of manual work every week
- Maintains a consistent publishing schedule
- Produces platform-specific content automatically
- Improves quality through AI-powered review
- Learns from analytics to optimize future content
- Scales content production without increasing workload
#AIAgents #ArtificialIntelligence #Automation #GenerativeAI #FutureOfWork

English

199
