Marko Tasic

8.1K posts

Marko Tasic

@mtasic85

System Architect. Software Dev. AI/ML Specialist. Tech Consultant. SMBs & Enterprises. Opinions are my own. CEO/CTO at @tangledgroup

Serbia, United States · Joined April 2009

1.7K Following · 605 Followers

Pinned Tweet
Marko Tasic
Marko Tasic@mtasic85·
We are happy to announce that @TangledGroup has published tangled-nx-sql v0.1.0, which seamlessly adds an SQL database as a persistence layer to NetworkX graphs, with zero in-memory footprint and NetworkX API compatibility. All examples and tests can be found here: pypi.org/project/tangle…
Marko Tasic tweet media
English
1
1
2
103
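For context, a minimal sketch of the kind of NetworkX-compatible usage the announcement describes. The SQL-backed class name and connection string below are guesses, not the published tangled-nx-sql API; the PyPI page has the real interface. The point is only that such a graph is claimed to be a drop-in for standard NetworkX calls:

```python
# Hypothetical sketch: the SQLGraph class and import path are assumptions,
# not taken from the tangled-nx-sql docs. A SQL-backed graph is claimed to
# expose the same API as a regular NetworkX graph.
import networkx as nx
# from tangled_nx_sql import SQLGraph          # assumed import path
# g = SQLGraph("sqlite:///graph.db")           # persisted in SQL, zero in-memory footprint (claimed)

g = nx.Graph()                                  # standard in-memory NetworkX graph, same API

g.add_node("a", label="start")
g.add_edge("a", "b", weight=3)
print(g.number_of_nodes(), g.number_of_edges())  # 2 1
print(list(g.neighbors("a")))                    # ['b']
```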
Marko Tasic
Marko Tasic@mtasic85·
This is how ideology kills technology. Textbook example. I really hope more people will move away from Debian and Ubuntu now and give other Linux distributions a chance. xubuntu.org/news/releases/…
English
0
0
0
3
Marko Tasic
Marko Tasic@mtasic85·
🤏🫘This will probably sound weird, but I migrated from opencode to pi because I needed something lightweight. I got it, but now I need an even more lightweight agent/harness. I really like: ``` pi --provider "llama.cpp" --model "Qwen/Qwen3.6-27B" -p "size of README.md" ``` But everything else around it still makes it too heavy for me. Damn TUI. 🤣
English
0
0
0
46
Marko Tasic retweeted
CG
CG@cgtwts·
> be chinese ai labs
> while claude and openai are in cold war
> kimi dropped k2.6 using deepseek's v3 architecture
> the same week deepseek drops v4 using kimi's muon optimizer
> 1.6 trillion parameters & 1M context
> both match or beat closed models on benchmarks while being 8x cheaper
> both build on each other's breakthroughs
> keep shipping frontier LLMs with far fewer or nerfed NVIDIA GPUs
> and keep them 100% open sourced
the real battle is not between models, it's open source vs closed.
DeepSeek@deepseek_ai

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…
1/n

English
49
351
4.5K
251.8K
Marko Tasic retweeted
Physics In History
Physics In History@PhysInHistory·
Understand. Don't memorize. Learn principles, not formulas. - R. Feynman
Physics In History tweet media
English
65
1.4K
8K
169.4K
Marko Tasic retweeted
tetsuo
tetsuo@tetsuoai·
how CNNs see images: 16 boxes covering the core CNN stack (tensors, filters, feature maps, stride, padding, channels, pooling, receptive fields, mental model)
tetsuo tweet media (4 images)
English
13
140
803
30.2K
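A quick way to sanity-check the stride/padding part of that mental model is the standard convolution output-size formula, out = floor((in + 2*padding - kernel) / stride) + 1. A tiny sketch in plain Python (no framework assumed, numbers are illustrative):

```python
# Conv/pool output-size helper: out = floor((in + 2*pad - kernel) / stride) + 1
def conv_out_size(in_size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    return (in_size + 2 * pad - kernel) // stride + 1

# 224x224 input, 3x3 filter, stride 1, padding 1 -> 224 ("same"-style, size preserved)
print(conv_out_size(224, kernel=3, stride=1, pad=1))   # 224
# Same filter with stride 2 -> 112 (downsampled feature map)
print(conv_out_size(224, kernel=3, stride=2, pad=1))   # 112
# 2x2 max pooling with stride 2 halves it again
print(conv_out_size(112, kernel=2, stride=2, pad=0))   # 56
```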
Marko Tasic retweeted
Science Simplified
Quantum entanglement isn't instant - scientists have timed its 'birth' at 232 attoseconds, quintillionths of a second.
English
67
358
2.5K
245K
Marko Tasic retweeted
China Research Collective
Yao class (姚班) genius Yao Shunyu (姚顺雨) left OpenAI in late 2025 to return to China, joining Tencent Hunyuan. Known in China as one of "The Four Foundation Model Heroes" (基模四杰), along with Yang Zhilin (Kimi), Lin Junyang (ex-Qwen), and Tang Jie (Z.ai).
China Research Collective tweet media
Shunyu Yao@ShunyuYao12

Our goal is to build practical models with comprehensive capabilities beyond open benchmarks. And the only way to do it is to co-design with diverse products while scaling solidly. Tencent has the best product ecosystem and a solid, low-ego culture, and we are just getting started!

English
11
133
1.2K
187.5K
Marko Tasic
Marko Tasic@mtasic85·
🟡 Current setup, 22.81GiB / 24.00GiB VRAM
🫡 35 t/s tg
🟡 250 t/s tg, with speculative decoding
🤗 Unsloth Qwen3.6-27B Q4_K_M, -ctk/v q5_0
🟩 Nvidia RTX 3090 24GB / CUDA 12.9.1

CUDA_VISIBLE_DEVICES=%i ./llama-server -hf unsloth/Qwen3.6-27B-GGUF:Q4_K_M -ngl -1 -fa on -fit off --metrics --props --slots --host 0.0.0.0 --port 808%i -dev CUDA0 --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.0 --presence-penalty 1.5 --repeat-penalty 1.0 --reasoning off --alias "Qwen/Qwen3.6-27B" --api-key sk-pvQ5YvAy -c 262144 -ctk q5_0 -ctv q5_0 --spec-default --no-mmproj-offload -b 1024 -ub 256

CUDA_VISIBLE_DEVICES=0,1,2,3 ./llama-server -hf ggml-org/GLM-OCR-GGUF:Q8_0 -ngl -1 -fa on -fit off --metrics --props --slots --host 0.0.0.0 --port 8090 --reasoning off --alias "zai-org/GLM-OCR" --api-key sk-pvQ5YvAy -c 32768 -ctk q5_1 -ctv q5_1 -np 1 --spec-default --no-mmproj-offload
Marko Tasic tweet media
Marko Tasic@mtasic85

🟠 Current setup, 23.37GiB / 24.00GiB VRAM
🤨 30 t/s tg
🤔 with speculative decoding 250 t/s tg
🤗 Qwen3.6-27B Q3_K_M, -ctk q8_0 -ctv q8_0

CUDA_VISIBLE_DEVICES=%i ./llama-server -hf unsloth/Qwen3.6-27B-GGUF:Q3_K_M -ngl -1 -fa on -fit off --metrics --props --slots --host 0.0.0.0 --port 808%i -dev CUDA0 --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.0 --presence-penalty 1.5 --repeat-penalty 1.0 --reasoning off --alias "Qwen/Qwen3.6-27B" -c 262144 -ctk q8_0 -ctv q8_0 --spec-default --no-mmproj-offload

CUDA_VISIBLE_DEVICES=0,1,2,3 ./llama-server -hf ggml-org/GLM-OCR-GGUF:Q8_0 -ngl -1 -fa on -fit off --metrics --props --slots --host 0.0.0.0 --port 8090 --reasoning off --alias "zai-org/GLM-OCR" -c 32768 -ctk q5_1 -ctv q5_1 -np 1 --spec-default --no-mmproj-offload

English
0
0
0
46
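Assuming the first command above is launched with %i = 0, the resulting llama-server instance listens on port 8080 and exposes an OpenAI-compatible chat endpoint. A minimal client sketch using the alias and API key from that command (the host and port are assumptions if your %i differs):

```python
# Minimal sketch of querying llama-server's OpenAI-compatible endpoint.
# Port 8080 assumes the command above was started with %i = 0.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={"Authorization": "Bearer sk-pvQ5YvAy"},   # matches --api-key above
    json={
        "model": "Qwen/Qwen3.6-27B",                   # matches --alias above
        "messages": [{"role": "user", "content": "size of README.md"}],
        "temperature": 0.7,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```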
Marko Tasic
Marko Tasic@mtasic85·
🟠 Current setup, 23.37GiB / 24.00GiB VRAM
🤨 30 t/s tg
🤔 with speculative decoding 250 t/s tg
🤗 Qwen3.6-27B Q3_K_M, -ctk q8_0 -ctv q8_0

CUDA_VISIBLE_DEVICES=%i ./llama-server -hf unsloth/Qwen3.6-27B-GGUF:Q3_K_M -ngl -1 -fa on -fit off --metrics --props --slots --host 0.0.0.0 --port 808%i -dev CUDA0 --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.0 --presence-penalty 1.5 --repeat-penalty 1.0 --reasoning off --alias "Qwen/Qwen3.6-27B" -c 262144 -ctk q8_0 -ctv q8_0 --spec-default --no-mmproj-offload

CUDA_VISIBLE_DEVICES=0,1,2,3 ./llama-server -hf ggml-org/GLM-OCR-GGUF:Q8_0 -ngl -1 -fa on -fit off --metrics --props --slots --host 0.0.0.0 --port 8090 --reasoning off --alias "zai-org/GLM-OCR" -c 32768 -ctk q5_1 -ctv q5_1 -np 1 --spec-default --no-mmproj-offload
Marko Tasic@mtasic85

🟡 Current setup, 21.53GiB / 24.00GiB VRAM
🧐 40 t/s tg
🧐 with speculative decoding 200 t/s tg

CUDA_VISIBLE_DEVICES=%i ./llama-server -hf unsloth/Qwen3.6-27B-GGUF:IQ4_NL -ngl -1 -fa on -fit off --metrics --props --slots --host 0.0.0.0 --port 808%i -dev CUDA0 --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.0 --presence-penalty 1.5 --repeat-penalty 1.0 --reasoning off --alias "Qwen/Qwen3.6-27B" -c 262144 -ctk q4_0 -ctv q4_0 --spec-default --no-mmproj-offload

CUDA_VISIBLE_DEVICES=0,1,2,3 ./llama-server -hf ggml-org/GLM-OCR-GGUF:Q8_0 -ngl -1 -fa on -fit off --metrics --props --slots --host 0.0.0.0 --port 8090 --reasoning off --alias "zai-org/GLM-OCR" -c 32768 -ctk q5_1 -ctv q5_1 -np 1 --no-mmproj-offload

English
0
0
0
84
Marko Tasic
Marko Tasic@mtasic85·
🟡 Current setup, 21.53GiB / 24.00GiB VRAM
🧐 40 t/s tg
🧐 with speculative decoding 200 t/s tg

CUDA_VISIBLE_DEVICES=%i ./llama-server -hf unsloth/Qwen3.6-27B-GGUF:IQ4_NL -ngl -1 -fa on -fit off --metrics --props --slots --host 0.0.0.0 --port 808%i -dev CUDA0 --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.0 --presence-penalty 1.5 --repeat-penalty 1.0 --reasoning off --alias "Qwen/Qwen3.6-27B" -c 262144 -ctk q4_0 -ctv q4_0 --spec-default --no-mmproj-offload

CUDA_VISIBLE_DEVICES=0,1,2,3 ./llama-server -hf ggml-org/GLM-OCR-GGUF:Q8_0 -ngl -1 -fa on -fit off --metrics --props --slots --host 0.0.0.0 --port 8090 --reasoning off --alias "zai-org/GLM-OCR" -c 32768 -ctk q5_1 -ctv q5_1 -np 1 --no-mmproj-offload
Marko Tasic tweet media
Marko Tasic@mtasic85

🟠 Current setup, 22.64GiB / 24.00GiB VRAM pattern on 4x @nvidia RTX 3090, @Alibaba_Qwen and @Zai_org models and @UnslothAI and @ggml_org quants on llama.cpp
🧐 40 t/s tg
🤯 with speculative decoding 400 t/s tg

CUDA_VISIBLE_DEVICES=%i ./llama-server -hf unsloth/Qwen3.6-27B-GGUF:IQ4_NL -ngl -1 -fa on -fit off --metrics --props --slots --host 0.0.0.0 --port 808%i -dev CUDA0 --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.0 --presence-penalty 1.5 --repeat-penalty 1.0 --reasoning off --alias "Qwen/Qwen3.6-27B" -c 262144 -ctk q4_0 -ctv q4_0 --spec-default

CUDA_VISIBLE_DEVICES=0,1,2,3 ./llama-server -hf ggml-org/GLM-OCR-GGUF:Q8_0 -ngl -1 -fa on -fit off --metrics --props --slots --host 0.0.0.0 --port 8090 --reasoning off --alias "zai-org/GLM-OCR" -c 32768 -ctk q5_1 -ctv q5_1 -np 1 --no-mmproj-offload

English
0
0
0
92
Marko Tasic
Marko Tasic@mtasic85·
🟠 Current setup, 22.64GiB / 24.00GiB VRAM pattern on 4x @nvidia RTX 3090, @Alibaba_Qwen and @Zai_org models and @UnslothAI and @ggml_org quants on llama.cpp
🧐 40 t/s tg
🤯 with speculative decoding 400 t/s tg

CUDA_VISIBLE_DEVICES=%i ./llama-server -hf unsloth/Qwen3.6-27B-GGUF:IQ4_NL -ngl -1 -fa on -fit off --metrics --props --slots --host 0.0.0.0 --port 808%i -dev CUDA0 --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.0 --presence-penalty 1.5 --repeat-penalty 1.0 --reasoning off --alias "Qwen/Qwen3.6-27B" -c 262144 -ctk q4_0 -ctv q4_0 --spec-default

CUDA_VISIBLE_DEVICES=0,1,2,3 ./llama-server -hf ggml-org/GLM-OCR-GGUF:Q8_0 -ngl -1 -fa on -fit off --metrics --props --slots --host 0.0.0.0 --port 8090 --reasoning off --alias "zai-org/GLM-OCR" -c 32768 -ctk q5_1 -ctv q5_1 -np 1 --no-mmproj-offload
Marko Tasic tweet media
English
0
0
0
88
Marko Tasic retweeted
Maxime Rivest 🧙‍♂️🦙🐧
dspy can feel like a black box sometimes. A good UI can completely mitigate that! This is the streaming of a prompt optimization run in dspy. You see every single token as it is generated. The instructions and the evaluations!
GIF
English
11
28
288
10.4K
Marko Tasic retweeted
Samay
Samay@Samaytwt·
Unpopular opinion: "AI makes everyone a developer" is true the same way "cameras make everyone a photographer"
Samay tweet media
English
764
3.3K
29.1K
1M