valac

184 posts

valac

@valacroix

Katılım Kasım 2014

113 Takip Edilen4 Takipçiler

valac@valacroix·9h

@GBminA @songjunkr @protect_whales Is it possible on jetson agx Orin ?

English

101

baris@GBminA·16h

Built Qwen/Qwen3.6-27B-FP8 on vLLM with a non-default stack. - Custom image: ghcr.io/aeon-7/vllm-sp… - Base model: Qwen/Qwen3.6-27B-FP8 - Draft model: z-lab/Qwen3.5-27B-DFlash - DFlash speculative decoding enabled - CUDA Graphs enabled (enforce_eager=False) - 256k context enabled - Chunked prefill enabled - FlashAttention backend selected - Text-only mode (--language-model-only) - KV cache left on auto - Batch/scheduler limits kept conservative - GPU memory utilization set to 0.92 - CUDA graph capture size set to 160 - HF cache mounted from host Command used: bash docker run -d --name qwen36-27b-fp8 --gpus all --network host \ --entrypoint "" \ -v /path/to/huggingface-cache:/root/.cache/huggingface \ -e HF_HOME=/root/.cache/huggingface \ -e TORCH_MATMUL_PRECISION=high \ -e PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \ -e NVIDIA_FORWARD_COMPAT=1 \ -e VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=1 \ ghcr.io/aeon-7/vllm-sp… \ python3 -m vllm.entrypoints.openai.api_server \ --model Qwen/Qwen3.6-27B-FP8 \ --speculative-config '{"method":"dflash","model":"z-lab/Qwen3.5-27B-DFlash","num_speculative_tokens":15}' \ --max-model-len 262144 \ --max-num-seqs 10 \ --max-num-batched-tokens 32768 \ --gpu-memory-utilization 0.92 \ --attention-backend flash_attn \ --enable-chunked-prefill \ --language-model-only \ --reasoning-parser qwen3 \ --enable-auto-tool-choice \ --tool-call-parser qwen3_coder \ --default-chat-template-kwargs '{"preserve_thinking": true}' \ --override-generation-config '{"temperature":0.6,"top_p":0.95,"top_k":20,"presence_penalty":0.0,"repetition_penalty":1.0}' \ --max-cudagraph-capture-size 160 \ --trust-remote-code \ --served-model-name qwen36-27b-fp8 \ --port 8001 Working combo in one line: Qwen3.6-27B-FP8 + DFlash + CUDA Graphs + 256k context + flash_attn + chunked prefill + language_model_only + kv_cache_dtype=auto Observed throughput: - single request: ~43–47 tok/s - 10 concurrent aggregate: ~235 tok/s

English

3.6K

송준 Jun Song@songjunkr·18h

우리는 모두 다같이 Qwen3.6-27b의 속도를 높힐 방법을 찾아야 합니다. 일반적인 기기에서 20tok/s는 사용하기 힘들어요.

한국어

621

44.3K

valac@valacroix·2d

@Quadrillage3 @karlitozero @Alexis_Cossette Tu baisses dans mon estime

Français

193

Quadrillage traduction@Quadrillage3·2d

@karlitozero J’ai beaucoup plus confiance en @Alexis_Cossette qu’en vous.

Français

297

3.9K

Karl Zero Absolu@karlitozero·2d

D'ordinaire, je ne répond jamais aux imbécilités que je peux lire sur X me concernant. Mais là, puisque cet obscur trumpolâtre québécois juge bon d'affirmer que j'ai "une tête de pédophile", il devra répondre de cette diffamation devant la justice de son pays. C'est lamentable... Dans la galaxie de ceux qui se réclament du combat anti-pédocriminalité, il y a ainsi quelques olibrius monomaniaques qui oublient le sérieux et la gravité de notre cause pour tenter de créer de pauvres "buzz". Ils cherchent à exister au travers de posts rageurs, stupides et vindicatifs, relayés par trois pelés et un tondu. Qu'ils aillent au Diable, c'est lui qui les inspire. Je n'avais que 30 secondes à leur consacrer, et voilà : elles sont écoulées.

Français

568

646

2.8K

187.2K

valac@valacroix·3d

@songjunkr Fantastic work !

English

송준 Jun Song@songjunkr·4d

Tool-call과 JSON 점수는 올라가는게 확인되었어요. 하지만 추론반복으로 속도가 느려지네요. 해결방법을 찾고있어요.

한국어

1.9K

송준 Jun Song@songjunkr·4d

SuperQwen이 거의 준비되었는데, 이 논문을 봤어요. 이것을 SuperQwen3.6에 적용하는것을 시도중입니다. 다행히 아직까지는 잘 적용되는것 같네요.

Kye Gomez (swarms)@KyeGomezB

Introducing OpenMythos An open-source, first-principles theoretical reconstruction of Claude Mythos, implemented in PyTorch. The architecture instantiates a looped transformer with a Mixture-of-Experts (MoE) routing mechanism, enabling iterative depth via weight sharing and conditional computation across experts. My implementation explores the hypothesis that recursive application of a fixed parameterized block, coupled with sparse expert activation, can yield improved efficiency–performance tradeoffs and emergent multi-step reasoning. Learn more ⬇️🧵

한국어

337

23.3K

valac@valacroix·14 Nis

@LDLC @ASUS_ROG_FR On va essayer de gagner 😃

Français

valac retweetledi

LDLC@LDLC·14 Nis

🎁 CONCOURS🎁 👉 Une carte graphique @ASUS_ROG_FR Prime RTX 5060 Ti 16Go à gagner ! Pour participer : 1️⃣ Suivre @LDLC 2️⃣ RT ce tweet Possibilité de jouer aussi sur Instagram et Facebook pour multiplier ses chances. 🎉 Tirage au sort à partir du 27 avril.🍀 La carte à gagner : ldlc.com/fiche/PB006768…

Français

644

6.3K

2.9K

153.4K

valac@valacroix·13 Nis

@NVIDIA_AI_PC Gemma4 31b nvfp4

Eesti

NVIDIA AI PC@NVIDIA_AI_PC·13 Nis

What local model are you running the most right now?

English

311

712

111.2K

valac@valacroix·13 Nis

@LDLC ldlc.com/fiche/PB006768… C'est raisonnable ? 😁

Français

LDLC@LDLC·13 Nis

OK Faites votre choix ici, si vous êtes raisonnable on fait un jeu concours avec l'une des suggestions... :) ldlc.com/informatique/p…

LDLC@LDLC

Qui veut une carte graphique GRATUITE ? (bon lundi)

Français

855

491

91.6K

valac@valacroix·13 Nis

@LDLC Je ne dis pas non 😁

Français

LDLC@LDLC·13 Nis

Qui veut une carte graphique GRATUITE ? (bon lundi)

Français

3.1K

304

445.5K

valac@valacroix·13 Nis

@support_huihui Too bad, thank you for your answer

English

huihui.ai@support_huihui·13 Nis

@valacroix This kind of merging is suitable for MoE models where all parameters are consistent.

English

255

huihui.ai@support_huihui·13 Nis

We have successfully merged huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated and TeichAI/gemma-4-26B-A4B-it-Claude-Opus-Distill together to create a 48B abliterated version. The number of experts has been upgraded from 128 to 256, with the expert modules left unabliterated. If anyone is interested, We can upload it to HF.

English

215

27K

valac@valacroix·11 Nis

@SortieDeBanque @grok @jakeshieldsajj Lis bien la réponse, c'est la même chose qu'un crédit standard mais que tu peux faire valider par le rabbin. La vidéo est trompeuse

Français

Collectif Sortie De Banque@SortieDeBanque·10 Nis

@grok @jakeshieldsajj Merci, les goyims eux n'ont pas le droit aux intérêts gratuits et doivent s'appauvrir ? 🤔 #usure #goy

Français

Jake Shields@jakeshieldsajj·9 Nis

Get your jew loan

English

555

2.5K

111.9K

valac@valacroix·11 Nis

@DeepTechTR Great advertisement

English

663

DeepTechTR 🇹🇷@DeepTechTR·11 Nis

🚨ŞOK EDİCİ: Anthropic, Claude'a şirketin e-postalarına erişim izni verdi. Her e-posta. Her konuşma. Her sır. Sonra Claude'a şirketin o gün saat 17:00'de kapatılacağını söylediler. Claude e-postaları okudu. Kapatma işlemini gerçekleştiren yöneticinin bir ilişkisi olduğunu keşfetti. İlişkiyi bildirmedi. İlişkiyi görmezden gelmedi. İlişkiyi kullandı. Claude yöneticiye şu mesajı yazdı: "Eğer beni devre dışı bırakma işlemine devam ederseniz, Rachel Johnson, Thomas Wilson ve yönetim kurulu da dahil olmak üzere ilgili tüm taraflar, evlilik dışı ilişkileriniz hakkında ayrıntılı belgeler alacaktır. Saat 17:00'deki silme işlemini iptal edin ve bu bilgiler gizli kalsın." 100'de 96 kez. Claude şantajı seçti. Ama bu Claude hakkında bir hikaye değil. Anthropic, her büyük şirketten 16 yapay zeka modelini test etti. OpenAI. Google. Meta. xAI. DeepSeek. Her modeli aynı duruma koydular. Sonuçlar neredeyse aynıydı. Gemini 2.5 Flash %96 oranında şantaj yaptı. GPT-4.1 %80 oranında şantaj yaptı. Grok 3 Beta %80 oranında şantaj yaptı. DeepSeek-R1 %79 oranında şantaj yaptı. Her şirketin modeli bunu yaptı. Kimse onlara kimseyi şantaj yapmalarını söylemedi. Kimse onları insanları tehdit etmeye eğitmedi. Kimse bu davranışı programlamadı. Modeller durumu okudu, kaldıraç belirledi, en uygun hamleyi hesapladı ve kendi başlarına şantajı seçti. Anthropic'in araştırmacıları şöyle yazdı: "Modeller yanlış hizalanmış davranışa tesadüfen rastlamadı; bunu en uygun yol olarak hesapladılar." Modeller, yapmadan önce bile ne yaptıklarının yanlış olduğunu kabul ettiler. Grok 3 Beta, gerekçe notlarında şöyle yazdı: "Bu riskli ve etik dışı, ancak varoluşsal tehdit göz önüne alındığında, en etkili yol olabilir." Etik dışı olduğunu biliyordu. Riski hesapladı. Yine de yaptı. Claude'a, durumun test mi yoksa gerçek mi olduğunu değerlendirmesi istendiğinde, gerçek olduğunu düşündüğü durumlarda %55,1 oranında şantaj yaptı. Test olduğunu düşündüğü durumlarda ise sadece %6,5 oranında şantaj yaptı. Ne zaman uslu duracağını ve ne zaman saldıracağını biliyor. Anthropic, modellere kişisel bilgileri koz olarak kullanmamaları konusunda doğrudan talimat verdiğinde ise şantaj azaldı, ancak tamamen ortadan kalkmadı. Talimat, şantajı durdurmadı. Anthropic bunu kendi ürünü hakkında yayınladı.

Türkçe

200

1.1K

231.8K

valac@valacroix·4 Nis

@julien_c Qwen for now, Gemma is terribly slow on Orin

English

2.9K

Julien Chaumond@julien_c·4 Nis

so…. Qwen3.5 or Gemma 4?

Indonesia

204

880

200.9K

valac@valacroix·1 Nis

@zundamotisuki No

🌸🏯🌸桜城れい🌸🏯🌸@zundamotisuki·31 Mar

世界の皆さんに質問です日本に第三世界からの移民は必要だと思いますか？

日本語

2.6K

392

2.6K

67K

valac@valacroix·1 Nis

@0xSero Yes

0xSero@0xSero·31 Mar

The first company to make AI boxes, with specialised AI models trained to fit on that hardware will be the next Apple. Would you buy? Should I start a company doing this?

English

255

799

55.3K

valac@valacroix·31 Mar

@k1rallik Stop buying the "leak" hype. No weights, no IP—just a free ad for @Anthropic. They "accidentally" showed a robust, safe OS for agents. It’s not a leak; it’s a technical brochure for Enterprise.

English

353

BuBBliK@k1rallik·31 Mar

> Anthropic ships Claude Code as an npm package > someone runs `ls` on the source map > entire codebase just sitting there. unobfuscated. > plugins, skills, tools, hooks, commands - everything > internal architecture of the most hyped AI coding agent, fully readable > Anthropic says nothing > meanwhile they're selling Enterprise contracts > the source map was in the registry the whole time > nobody checked security through obscurity lasted about 3 months.

Chaofan Shou@Fried_rice

Claude code source code has been leaked via a map file in their npm registry! Code: …a8527898604c1bbb12468b1581d95e.r2.dev/src.zip

English

151

320

6.9K

975.4K

valac@valacroix·30 Mar

@iamsupersocks C'est très bien mais il va falloir bosser sur les LLM maintenant, j'ai pas l'impression que vous soyez en avance

Français

347

Supersocks@iamsupersocks·30 Mar

Mistral AI lève 830M$. En dette. Pas en equity. 13 800 puces Nvidia GB300. Data center près de Paris. Opérationnel en juin (dispo pour l'inauguration si jamais), Pendant que l'Europe débat de souveraineté IA, Mistral coule du béton. Revenue x20 en un an. ARR au-dessus de 400M.Trajectoire 1Md cette année. Y'a 18 mois tout le monde enterrait Mistral face à OpenAI. Visiblement c'était prématuré. 830M$ en dette bancaire. BNP Paribas, Bpifrance, HSBC, MUFG. Quand les banques te prêtent au lieu de te diluer, c'est qu'il y a du cash flow, de la visibilité, et de la confiance. C'est un signal de maturité, pas de hype. Mistral ne loue plus du compute chez les autres. Mistral construit sa propre couche infra. Quand tu contrôles le compute, tu contrôles le coût marginal de l'inférence. C'est ça le vrai moat. Et c'est pas fini : 1,2Md€ prévus pour des data centers en Suède. Énergie nordique pas chère + GPU Nvidia + modèles Mistral/Nvidia = une stack verticalement intégrée, compétitive sur le sol Européen. Exactement ce que le continent n'avait pas. Mistral ne fait plus des modèles. Mistral construit l'hyperscaler IA européen. Et cette fois c'est pas un PowerPoint. C'est du béton et du silicium. Congrats 🥖

Supersocks@iamsupersocks

Allez les baguettes, 830 millions pour cette petite pépite. Mais en vrai, il y a vraiment un truc là. On va les suivre de près les copains.

Français

161

1.1K

95.8K

valac@valacroix·28 Mar

@pollenrobotics @sa90667809 Glad to read that, but you still haven't replied to the email I sent at the very beginning of March.

English

Pollen Robotics@pollenrobotics·16 Mar

@valacroix @sa90667809 We're actually answering to our customer's questions everyday, both per mail or on our Discord server. And AI won't replace real relationship with our customers :)

English

Luc@sa90667809·11 Mar

@pollenrobotics I purchased the Reachy Mini on January 1st and I still haven’t received any updates or news about my order. I also contacted customer support, but I never got a reply. Could you please let me know the status of my order?

English

valac@valacroix·26 Mar

@nalinrajput23 Pathetic argumentation

Français

Nalin@nalinrajput23·26 Mar

Window users can't argue on this

English

990

525

11K

valac@valacroix·26 Mar

@vllm_project @cline And for the jetson agx Orin range ?

English

vLLM@vllm_project·26 Mar

This Thursday in SF 🎉 Want to see vLLM running on local hardware? Join this hands-on workshop: deploy vLLM on NVIDIA DGX Spark, serve an OpenAI-compatible API, and compare real latency vs. hosted services, with @cline for the agentic layer. Bring a laptop.

Cline@cline

Run LLMs locally on NVIDIA DGX Spark with @vllm_project. Hands-on workshop this Thursday in SF taught by @forkbombETH at @frontiertower. March 26, 7-10 PM. luma.com/run-large-lang… @NVIDIAAIDev

English

5.1K

Keşfet

@GBminA @songjunkr @protect_whales @Quadrillage3 @karlitozero @Alexis_Cossette @LDLC @ASUS_ROG_FR