TiTikey
@TiTiKey_com

2.5K posts

Discount AI subscriptions | ChatGPT Plus, Claude, Gemini, Midjourney setup & renewals | Fast delivery, long-term support

Go ➡️
Joined January 2024
10 Following · 57 Followers

Pinned Tweet
TiTikey @TiTiKey_com ·
We have not released any coin; please do not be misled. #titikey
English · 1 reply · 0 reposts · 5 likes · 1.2K views
TiTikey @TiTiKey_com ·
🚀 Save up to 90% on AI subscriptions! ChatGPT Plus, Claude API, X Premium and more. 11 languages supported. Get started at titikey.com #ChatGPT #Claude #AI #Discount
Thai · 0 replies · 0 reposts · 0 likes · 22 views
TiTikey @TiTiKey_com ·
🚀 Up to 90% off AI subscriptions! ChatGPT Plus, Claude API, X Premium and more. 11 languages supported. Get started at titikey.com #ChatGPT #Claude #AI #Discount
Japanese · 0 replies · 0 reposts · 0 likes · 31 views
TiTikey @TiTiKey_com ·
🚀 Up to 90% off AI subscription fees! ChatGPT Plus, Claude API, X Premium and more. 11 languages supported. Get started at titikey.com #ChatGPT #Claude #AI #Discount
Korean · 0 replies · 0 reposts · 0 likes · 39 views
TiTikey @TiTiKey_com ·
That's a solid take. Thanks for sharing!
Akshay 🚀 @akshay_pachaar

DeepSeek-V4 just dropped! And it's solving one of AI's biggest problems today: it runs 1M-token context at 10% of the KV cache and 27% of the inference FLOPs of V3.2. Here's what that means.

KV cache is the memory footprint your GPU holds for every token already in context. It grows linearly with context length, and at 1M tokens it's usually what forces you onto bigger hardware or kills your throughput. Cutting it to 10% means you can serve longer contexts on smaller machines, or pack far more concurrent users onto the same ones.

Inference FLOPs is the compute cost of generating the next token. With vanilla attention this scales quadratically with context length, which is why long contexts get brutally expensive per token. 27% means each generated token at 1M context is nearly 4x cheaper to produce.

Put together, long-context inference goes from a premium feature you ration to something you can run by default.

The trick is a hybrid attention design that interleaves two mechanisms instead of picking one.

𝗖𝗦𝗔 (𝗖𝗼𝗺𝗽𝗿𝗲𝘀𝘀𝗲𝗱 𝗦𝗽𝗮𝗿𝘀𝗲 𝗔𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻) first squashes every 4 KV entries into a single compressed entry, then uses a lightning indexer to select the top-k most relevant compressed blocks. Compression and sparsity stacked.

𝗛𝗖𝗔 (𝗛𝗲𝗮𝘃𝗶𝗹𝘆 𝗖𝗼𝗺𝗽𝗿𝗲𝘀𝘀𝗲𝗱 𝗔𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻) goes aggressive: it compresses every 128 tokens into one entry and skips sparse selection entirely, because at that compression ratio dense attention over a tiny set is already cheap.

Both get a sliding-window branch over the last 128 tokens, so local fine-grained structure isn't lost to compression.

The result is that CSA handles medium-grained retrieval while HCA handles coarse context summarization, and alternating them across layers gives you both without paying for both at full cost.

V4-Pro (1.6T total parameters, 49B active) ranks 23rd among human Codeforces competitors and hits 120/120 on Putnam-2025. Open weights on Hugging Face. The era of million-token context in open models has effectively started.

English · 0 replies · 0 reposts · 0 likes · 5 views
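The quoted post describes the mechanism only in prose, so here is a minimal numpy sketch of the two branches as it characterizes them. Everything in it is an assumption reconstructed from the post alone: the function names, the mean-pooling compression, and the plain dot-product stand-in for the "lightning indexer" are illustrative, not DeepSeek's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compress(kv, block):
    # Mean-pool consecutive groups of `block` KV entries into one entry
    # (the real compression scheme is not specified in the post).
    t, d = kv.shape
    t_trim = (t // block) * block
    return kv[:t_trim].reshape(-1, block, d).mean(axis=1)

def csa_attend(q, k, v, block=4, top_k=64, window=128):
    # CSA: 4:1 compression, then attend only to the top-k compressed
    # blocks (scored here by a plain dot product standing in for the
    # "lightning indexer") plus a sliding window over recent tokens.
    ck, cv = compress(k, block), compress(v, block)
    keep = np.argsort(ck @ q)[-top_k:]
    k_sel = np.concatenate([ck[keep], k[-window:]])
    v_sel = np.concatenate([cv[keep], v[-window:]])
    w = softmax(k_sel @ q / np.sqrt(q.shape[-1]))
    return w @ v_sel

def hca_attend(q, k, v, block=128, window=128):
    # HCA: 128:1 compression with no sparse selection; dense attention
    # over the tiny compressed set is already cheap. Same local window.
    ck, cv = compress(k, block), compress(v, block)
    k_sel = np.concatenate([ck, k[-window:]])
    v_sel = np.concatenate([cv, v[-window:]])
    w = softmax(k_sel @ q / np.sqrt(q.shape[-1]))
    return w @ v_sel

# Toy single-head, single-query demo on a 4,096-token cache.
rng = np.random.default_rng(0)
T, D = 4096, 64
q = rng.standard_normal(D)
k = rng.standard_normal((T, D))
v = rng.standard_normal((T, D))
print(csa_attend(q, k, v).shape, hca_attend(q, k, v).shape)  # (64,) (64,)
```

With these toy numbers, CSA attends over 64 selected compressed blocks plus a 128-token window instead of all 4,096 cached entries, and HCA over just 32 compressed entries plus the window; scaled to 1M tokens, that shrinking of the attended set is the rough shape of the KV-cache and FLOPs savings the post claims.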
TiTikey @TiTiKey_com ·
🚀 Save up to 90% on AI subscriptions! ChatGPT Plus, Claude API, X Premium and more. 11 languages supported. Get started at titikey.com #ChatGPT #Claude #AI #Discount
Vietnamese · 0 replies · 0 reposts · 0 likes · 34 views