Cossale — oss/acc

320 posts

@XCossale

Working with LLMs and Diffusion models. prev - @revancedapp. Available for contracts. @KeplerSystems

India · Joined April 2018
537 Following · 123 Followers
Cossale — oss/acc
Cossale — oss/acc@XCossale·
Pi agent core by @badlogicgames makes it so much easier to integrate LLMs into apps without having to create custom harnesses
Cossale — oss/acc tweet media
English
0
0
2
120
Max Weinbach
Max Weinbach@mweinbach·
What do we think? Will this be successful?
Max Weinbach tweet media
English
7
1
92
13.2K
Cossale — oss/acc
Cossale — oss/acc@XCossale·
@norpadon It's also very very good at audio understanding. Competes fairly with Gemini 2.5 Pro (beating flash!). Unfortunate that it's limited to only 30 seconds.
English
0
0
0
88
Artur Chakhvadze
Artur Chakhvadze@norpadon·
Gemma3n is still by far the most galaxy-brained edge architecture and it’s not even close. It’s a shame that almost nobody gives a damn about it because they never released a paper and it’s a pain in the ass to implement properly.
English
4
2
120
12.1K
Zach Mueller
Zach Mueller@TheZachMueller·
Every AI researcher has that one model they love deeply, even if it’s outdated
English
29
0
70
7.7K
vega
vega@vega_holdings·
@max_paperclips this fucking terrifies me for having a google account that's decades old that if i tell gemini to stop being a retard on day they will delete my account
English
3
0
14
269
Cossale — oss/acc
Cossale — oss/acc@XCossale·
@Teknium @teortaxesTex Because there is no serious alternative at the moment. Gemini is basically your only **reliable** option for multi-modality and it gets even harder for non-English tasks.
English
0
0
1
16
Teknium 🪽
Teknium 🪽@Teknium·
@teortaxesTex What does the multimodality get them long term - products mostly or do you see an RSI angle to multimodality focus
English
2
0
2
771
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Contra Lisan I don't think this is benchmaxing, btw. Indeed I suspect that Gemini's performance on procedural generation of graphics (it goes beyond SVG) is a product of its deep advantages in multimodality, and it's GDM's long-term bet. Like OAI's reasoning, and Ant's coding.
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) tweet media
Design Arena@Designarena

BREAKING: Gemini 3.1 Pro Preview has landed in #1 on SVG Arena by Design Arena with an ELO of 1421. This 87-point lead is the largest winning margin that we've seen a model have on SVG Arena since the arena launch. Huge congratulations to the @GoogleDeepMind team!

English
8
4
62
6.3K
Alpin
Alpin@AlpinDale·
@yacineMTB IDK I rent 5090s for about 20 cents an hour.
English
4
0
20
1.5K
Cossale — oss/acc
Cossale — oss/acc@XCossale·
@anametolast @dhtikna Qwen 3 235B hosted by W & B is probably a better deal than both of those at $0.10 for both input and output. Using it a lot recently for synth data. Around the same performance as gpt oss 120b but half the price and bf16. Also hi ktibow, long time no see after revanced!
English
0
0
2
101
KTibow
KTibow@anametolast·
@dhtikna this is untrue.
KTibow tweet media
KTibow tweet media
English
2
0
1
127
Ankith 🐋/acc
Ankith 🐋/acc@dhtikna·
Complete MOGGING by Deepseek, 673B whale costs the same as 120B gpt-oss, no competition by a mile in architecture efficiency. Remember they make $0.80 in profit per $1.00
Ankith 🐋/acc tweet media
English
9
2
146
17.2K
Cossale — oss/acc
Cossale — oss/acc@XCossale·
@YouJiacheng @Zai_org I think the model is supposed to move the file to another folder so it can be downloaded, but it doesn't do it half the time. Tell the model that you can't see the file and it should move it correctly. Had the same issues with the Claude site a few months ago.
English
0
0
1
70
Alexander Doria
Alexander Doria@Dorialexander·
Data annotation is probably one of the few areas where the harness could be the product right now. Models have the capacity, even the meta-skill to orchestrate subagents for full pipelines, just badly controlled and implemented.
English
8
1
63
3K
elie
elie@eliebakouch·
i think we don't realize the impact that deepseek had on the open ecosystem, there is so much from them that you can find in almost every frontier open llm today
> most of the open frontier models follow the "finegrain + sparse + shared expert" deepseek moe recipe
> a lot of them use MLA
> first (with minicpm) to use sparse attention in prod (DSA)
> first to do reasoning in the open with R1
> GRPO which is the foundation for most of the newer RL algorithms
> they also innovated on the training recipe at scale, first to do fp8? MTP? load balancing schemes that other labs are now using
> advanced training/inference infra with oss releases like DeepEP that pretraining libs like megatron use
i'm so grateful deepseek exists
English
11
35
291
25.6K
Cossale — oss/acc
Cossale — oss/acc@XCossale·
@0xSero @BinxNet @fatihozgen85 You can use your ai-data-extraction repo and then filter by Edit tool to get FIM dataset. Here is my test on `Qwen3-0.6B` with just ~4k rows and 5 minutes of training.
Cossale — oss/acc tweet media
English
0
0
1
68
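A minimal sketch of the "filter by Edit tool to get a FIM dataset" step, under stated assumptions: the record fields (`tool`, `file_before`, `old_string`, `new_string`) are hypothetical and may not match what the ai-data-extraction repo actually emits, and the `<|fim_*|>` markers follow the convention used by Qwen coder models, so check them against the target tokenizer before training.

```python
# Assumed FIM marker strings; verify against the tokenizer you fine-tune.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def edit_to_fim(record):
    """Turn one Edit tool call into a FIM training string, or None if unusable.

    `record` is a hypothetical dict with keys: tool, file_before,
    old_string, new_string.
    """
    if record.get("tool") != "Edit":
        return None  # keep only Edit tool calls
    before = record["file_before"]
    old, new = record["old_string"], record["new_string"]
    start = before.find(old)
    if start == -1 or not new:
        return None  # edit target not found, or nothing to predict
    prefix = before[:start]              # text before the edited span
    suffix = before[start + len(old):]   # text after the edited span
    # The model sees prefix and suffix, and learns to produce the new text.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{new}"


records = [
    {"tool": "Read", "file_before": "x = 1\n"},
    {
        "tool": "Edit",
        "file_before": "def add(a, b):\n    pass\n",
        "old_string": "pass",
        "new_string": "return a + b",
    },
]
samples = [s for s in map(edit_to_fim, records) if s is not None]
```

Each surviving sample pairs the surrounding file context with the text the session actually wrote, which is the shape an auto-complete model needs.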
0xSero
0xSero@0xSero·
I just had to make a new video of GLM-4.7-Flash
- Helping me refactor VLLM studio
- Did a data analytics report for work
- Managed to search my tweets
- Made me a fully playable Pacman in 1 shot
- Great at browser use
This model is too good to be this small, the full thing will fit on a macbook, it's fast, precise and can do pretty much anything Sonnet can. It somehow benches higher than Sonnet 4, 3.7, Opus 4.5 no thinking, all the GPT models from last year and more. youtu.be/_SDyaPYmIxU
English
30
26
324
38.2K
Cossale — oss/acc
Cossale — oss/acc@XCossale·
@kalomaze @celestepoasts My main issue with GLM models is that the instruction following is kind of bad. I get to test all models at work, but I would prefer to use Minimax 2.1 due to better instruction following and consistency, despite it being less smart than GLM models. Same with Gemini models.
English
0
0
0
30
kalomaze
kalomaze@kalomaze·
@celestepoasts glm models are usually quaint, if not exactly super polished on the post training side, i never really get "its super burnt" feelings about them
English
1
0
32
986
kalomaze
kalomaze@kalomaze·
this is so weird to me because sonnet3.6 was ~50 on swebench before they really started going all in on agents. there really is a lot of low hanging fruit left, even for the mid range ~30b total params models with more post training iteration
elie@eliebakouch

GLM team is now using MLA!! this is a pretty insane model with 30B total params and about 4B active. very nice release in terms of structure: it's approximately the same depth as glm4.5 air and qwen3 30B A3B, 64 total experts instead of 128, but they only activate 5 instead of 9 if you count the shared expert

English
8
3
211
25.2K
Cossale — oss/acc
Cossale — oss/acc@XCossale·
@0xSero Extracted all my sessions. Time to fine-tune an auto-complete model
Cossale — oss/acc tweet media
English
0
0
2
552
0xSero
0xSero@0xSero·
Here's the dumbest engineering idea to get GLM-4.7 at 32GB VRAM
Make skills out of all your most successful AI chats. Store the sessions under that skill, link every chat that uses that skill. You can also do this to refine and personalize small models. If you're using this tech enough you will easily rack up 100s of billions of tokens (training data.)
- Take a big open model, any one of them is fine.
- Take all your skills you've accumulated along with all the session chats.
- Organize very very well, or ask GPT/Claude to lol.
- Now you have a calibration dataset.
- Rent an 8x H100/H200 pod for $200
- Prune your desired model around your calibration set.
- It'll be dumb as a brick in most places, but you retain full fidelity of activations.
- Use the big models to build a control board
- Click button to get your macbook to do your workflows.
No reason this shouldn't work.
0xSero tweet media
English
18
16
334
23.8K
elie
elie@eliebakouch·
Most web data in (very) low resource languages is Bible and Wikipedia. The rest? @huggingface data team ran Gemma3 27B for 3 months to translate it into English, to improve translation models and to bring cultural context from 500+ language communities into English training data. Here is the full pipeline huggingface.co/datasets/Huggi…
elie tweet media
Guilherme Penedo@gui_penedo

We are releasing a large scale synthetic dataset: 💬FineTranslations. We took 🥂 FineWeb2, our multilingual pre-training dataset, and translated it into English using Gemma3 27B. The result is a massive parallel corpus, with more than 1 trillion tokens!

English
8
19
192
37.1K
Cossale — oss/acc
Cossale — oss/acc@XCossale·
What if spec-first API development is better for LLM workflows than code-first? LLMs degrade with large codebases. Spec-first keeps context small at every step:

User requirements → [LLM] → OpenAPI spec (300 lines)
Spec → [oapi-codegen] → Type-safe interfaces
Interface + logic → [LLM] → Handlers (50 lines each)

The spec compresses requirements into something reviewable. Tools like oapi-codegen handle boilerplate. Each handler gets focused context.

LLMs removed what made spec-first tedious (implementing boilerplate), revealing what was always valuable: bounded context, mechanical consistency, independent regeneration. The spec becomes the compression layer that keeps the whole process tractable. Could generalize beyond APIs.
English
0
0
0
163
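The "each handler gets focused context" idea above can be sketched as slicing an OpenAPI document into one small context per operation. This is an illustration only, not the author's tooling: `operation_contexts` and `_collect_refs` are hypothetical helpers, the spec is a hand-written toy, and only schemas referenced directly by an operation are included (transitive `$ref`s are not followed).

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def _collect_refs(node):
    """Return schema names referenced via '#/components/schemas/...' under node."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                found.add(value.rsplit("/", 1)[-1])
            else:
                found |= _collect_refs(value)
    elif isinstance(node, list):
        for item in node:
            found |= _collect_refs(item)
    return found


def operation_contexts(spec):
    """Yield (operationId, context) pairs, one bounded context per operation."""
    schemas = spec.get("components", {}).get("schemas", {})
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            refs = sorted(_collect_refs(op))
            yield op["operationId"], {
                "path": path,
                "method": method.upper(),
                "operation": op,
                # Only the schemas this operation mentions directly.
                "schemas": {n: schemas[n] for n in refs if n in schemas},
            }


def _json_response(schema):
    """Build a minimal 200 response object for the toy spec."""
    return {"200": {"content": {"application/json": {"schema": schema}}}}


spec = {
    "paths": {
        "/pets": {
            "get": {
                "operationId": "listPets",
                "responses": _json_response(
                    {"type": "array", "items": {"$ref": "#/components/schemas/Pet"}}
                ),
            }
        },
        "/owners/{id}": {
            "get": {
                "operationId": "getOwner",
                "responses": _json_response({"$ref": "#/components/schemas/Owner"}),
            }
        },
    },
    "components": {
        "schemas": {
            "Pet": {"type": "object", "properties": {"name": {"type": "string"}}},
            "Owner": {"type": "object", "properties": {"id": {"type": "integer"}}},
        }
    },
}

contexts = dict(operation_contexts(spec))
```

Each entry in `contexts` could then be passed to an LLM alongside the generated interface for that one handler, so regenerating `listPets` never requires loading the `Owner` schema or any other handler's code.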