Liling Tan

8.6K posts

@alvations

Code, geek, game

Joined May 2015

763 Following · 1.3K Followers

Liling Tan@alvations·2d

At least, after the bot PR, explain yourself and what your code improves. Don’t just spam me with a wall of text, bullets, or emojis.

Liling Tan@alvations·2d

If you’re botting us with pull requests, we are going to bot you back with a bot review to catch you. #opensource

Liling Tan@alvations·3d

@mikiane Fft: Frontier models can solve 95% of tasks, and if fine-tuning gives you a +2% boost, then depending on the scale of the ROI, that 2% is YMMV territory. But if fine-tuning can’t get you a +% at all, you’re not training it right, or your business doesn’t need to exist cos a frontier model can take over.

Michel Levy Provençal@mikiane·4d

Mistral launches Forge for fine-tuning. 95% of companies don't need it. A frontier model + RAG + tools beats custom fine-tuning. Every time. Fine-tuning = 2023. Orchestration = 2026. No? mistral.ai/news/forge

Liling Tan@alvations·4d

@ClementDelangue @huggingface HF is the new @Reddit. Should cut a deal with companies that parse the forum site, for a subscription fee 😏

clem 🤗@ClementDelangue·4d

This is how/why social platforms like @huggingface can win in an agentic world!

Liling Tan@alvations·4d

@ClementDelangue Ah ha we’re not the only ones… @NLTK_org

clem 🤗@ClementDelangue·4d

Our biggest open-source repos are getting overwhelmed by AI slop which literally makes Github unusable (~a new pull request every 3 minutes). Fun new challenges in an agentic world!

Liling Tan@alvations·4d

LLM agents start auto-botting for bounties, put up security theater, and auto-open obscure behaviors as CVEs. Other bots figure out the same thing after the first bot does, and our emails keep coming with GHSA same-same-but-different tickets. A dev becomes sleepless and starts semi-auditing those tickets with an LLM, asking the LLM to solve what an LLM found and tell him/her if it’s nonsense or valid. Another dev just writes a new mechanism to stop related CVEs, then asks another bot to check, and the first dev uses yet another model to double-check and starts becoming a post-editor instead of writing code from scratch… This will not end well. Researchers and developers might end up becoming farmers, bakers, plumbers, just because robots can’t do those things well enough (yet).

Liling Tan@alvations·4d

We’re living in a different academia and open source era… Authors use LLMs, reviewers use LLMs to review authors, conf orgs use LLMs to meta-review reviewers and then penalize co-authors of reviewers that use LLMs… #opensource land is even wilder >>

Liling Tan@alvations·5d

@hyhieu226 Ruff isort black pylint my code in this session, don’t ask me for permission to fix type annotations, don’t remove my comments or code logic 😆

Liling Tan@alvations·5d

@hyhieu226 I don’t do merge conflicts nowadays, it goes this way: “fix merge conflicts, trust my branch most of the time and tell me otherwise” 😀

Hieu Pham@hyhieu226·5d

If you are only using codex to write code, you are missing out on a lot. Codex takes away a lot of my mental burden:
- git, uv, ssh keys
- excel functions
- Python plots (yes looking at you pyplot, manim and streamlit you are next!)

Liling Tan@alvations·6d

@support_huihui @ChikoosJourney Maybe restrict the length in the prompt during the abliteration process?

huihui.ai@support_huihui·6d

@ChikoosJourney Yes, sometimes it can be very slow, and a lot of characters are outputted.

Chikoo@ChikoosJourney·6d

Definitely uncensored but…much slower than the regular unsloth one. 27 vs 37 tok/sec and it seems to get stuck in a loop

huihui.ai@support_huihui

New Model: huihui-ai/Huihui-Qwen3.5-4B-Claude-4.6-Opus-abliterated This is an uncensored version of Jackrong/Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled created with abliteration huggingface.co/huihui-ai/Huih…

Liling Tan@alvations·6d

Still, I save some time (let’s say 20-30%) by having it generate the draft and then correcting it. Then I spend half the time saved correcting the things it gets wrong but confidently says #thisIsTheWay. Sooo still net productivity gains?
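As a rough back-of-the-envelope check of the numbers in the post above: if the draft saves 20-30% of the time and half of that saving goes back into corrections, the net gain comes out at roughly 10-15%. A minimal sketch, where `net_saving` and `rework_share` are hypothetical names introduced only for illustration:

```python
def net_saving(gross_saving: float, rework_share: float = 0.5) -> float:
    """Net time saved after spending `rework_share` of the gross saving on fixes.

    Both arguments are fractions (0.2 means 20%). The 0.5 default mirrors the
    "half the time saved" figure in the post; it is an assumption, not a measurement.
    """
    return gross_saving * (1.0 - rework_share)


if __name__ == "__main__":
    for gross in (0.20, 0.30):
        print(f"gross {gross:.0%} -> net {net_saving(gross):.0%}")
```

So under these assumptions the answer to the question is yes, still a net gain, just about half the headline number.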

Liling Tan@alvations·6d

I don’t get how people can use coding agents without understanding which framework they’re using. I use the #llm coding agent to accelerate my work, but 50% of the time I know the quirks of frameworks like Spark and Hadoop setups well enough to know that the coding agent is gaslighting me…

Liling Tan@alvations·12 Mar

Without the frustration at the computer not doing what you need cos it did exactly what you explicitly told it to do, you will be frustrated that your implicit prompt isn’t understood, and doubt yourself for not knowing whether you or the computer knows better.

Liling Tan@alvations·12 Mar

>> you’ll not appreciate the nuance between EAFP vs LBYL. You most probably won’t understand tabs vs spaces, and even less so vim vs emacs.
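For readers who have not met the two acronyms: LBYL is "look before you leap" (check preconditions first) and EAFP is "easier to ask forgiveness than permission" (try it, handle the exception). A minimal sketch of the same lookup written both ways. The function names and the `config` dictionary are hypothetical, chosen only for illustration:

```python
def get_port_lbyl(config: dict, default: int = 8080) -> int:
    """LBYL: verify every precondition before acting."""
    if "port" in config and isinstance(config["port"], (int, str)):
        text = str(config["port"])
        if text.isdigit():
            return int(text)
    return default


def get_port_eafp(config: dict, default: int = 8080) -> int:
    """EAFP: just try the conversion and handle the failure if it happens."""
    try:
        return int(config["port"])
    except (KeyError, TypeError, ValueError):
        return default


if __name__ == "__main__":
    for cfg in ({"port": "9000"}, {}, {"port": "abc"}, {"port": None}):
        print(cfg, get_port_lbyl(cfg), get_port_eafp(cfg))
```

The nuance is that the two styles are not always interchangeable: here the LBYL version rejects a value such as "-1" because of the `isdigit` check, while the EAFP version accepts anything `int()` can parse.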

Liling Tan@alvations·12 Mar

Oddly enough, if you started out with vibe/claude instead of learning the math and computing basics, you miss the fun of being surprised and delighted by the model. #llm Cos you’ll think it’s not surprising that the computer knows something I don’t. >>

Liling Tan retweeted

Percy Liang@percyliang·7 Mar

Normally replaying old data reduces forgetting, but it actually helps you learn on new data too! We finally put this paper out on arXiv, but had it up as a Marin GitHub issue ~1 year ago: github.com/marin-communit…

Suhas Kotha@kothasuhas

to improve fine-tuning data efficiency, replay generic pre-training data
not only does this reduce forgetting, it actually improves performance on the fine-tuning domain! especially when fine-tuning data is scarce in pre-training
(w/ @percyliang)
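One way to picture "replaying generic pre-training data" is as a data-mixing step before fine-tuning. The sketch below is an illustration only and not the paper's recipe: the function name `mix_with_replay`, the `replay_fraction` value, and the uniform sampling are all assumptions made for the example.

```python
import random


def mix_with_replay(finetune_data, pretrain_data, replay_fraction=0.25, seed=0):
    """Return a shuffled list in which about `replay_fraction` of the examples
    are replayed generic pre-training examples (hypothetical helper)."""
    if not 0.0 <= replay_fraction < 1.0:
        raise ValueError("replay_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Number of replay examples needed so they make up replay_fraction of the mix.
    n_replay = round(len(finetune_data) * replay_fraction / (1.0 - replay_fraction))
    # Sample with replacement from the (usually much larger) pre-training pool.
    replay = [rng.choice(pretrain_data) for _ in range(n_replay)] if pretrain_data else []
    mixed = list(finetune_data) + replay
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    ft = [("ft", i) for i in range(30)]
    pt = [("pt", i) for i in range(1000)]
    mixed = mix_with_replay(ft, pt, replay_fraction=0.25)
    print(len(mixed), sum(1 for tag, _ in mixed if tag == "pt"))
```

The right replay fraction is an empirical question that this sketch does not answer.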

Liling Tan@alvations·6 Mar

@awnihannun @karpathy en.wikipedia.org/wiki/Never-End… ?

Awni Hannun@awnihannun·6 Mar

@karpathy > before they disappeared into the gold mines Last thing a researcher sees before disappearing into the gold mine:

Andrej Karpathy@karpathy·6 Mar

There was a nice time where researchers talked about various ideas quite openly on twitter. (before they disappeared into the gold mines :)). My guess is that you can get quite far even in the current paradigm by introducing a number of memory ops as "tools" and throwing them into the mix in RL. E.g. current compaction and memory implementations are crappy, first, early examples that were somewhat bolted on, but both can be fairly easily generalized and made part of the optimization as just another tool during RL. That said neither of these is fully satisfying because clearly people are capable of some weight-based updates (my personal suspicion - mostly during sleep). So there should be even more room for more exotic approaches for long-term memory that do change the weights, but exactly - the details are not obvious. This is a lot more exciting, but also more into the realm of research outside of the established prod stack.

Awni Hannun@awnihannun

I've been thinking a bit about continual learning recently, especially as it relates to long-running agents (and running a few toy experiments with MLX). The status quo of prompt compaction coupled with recursive sub-agents is actually remarkably effective. Seems like we can go pretty far with this.
(Prompt compaction = when the context window gets close to full, the model generates a shorter summary, then starts from scratch using the summary. Recursive sub-agents = decompose tasks into smaller tasks to deal with finite context windows.)
Recursive sub-agents will probably always be useful. But prompt compaction seems like a bit of an inefficient (though highly effective) hack. There are two other alternatives I know of: 1. online fine-tuning and 2. memory-based techniques.
Online fine-tuning: train some LoRA adapters on data the model encounters during deployment. I'm less bullish on this in general. Aside from the engineering challenges of deploying custom models / adapters for each use case / user, there are some fundamental issues:
- Online fine-tuning is inherently unstable. If you train on data in the target domain you can catastrophically destroy capabilities that you don't target. One way around this is to keep a mixed dataset with the new and the old. But this gets pretty complicated pretty quickly.
- What does the data even look like for online fine-tuning? Do you generate Q/A pairs based on the target domain to train the model? You also have the problem of prioritizing information in the data mixture given finite capacity.
Memory-based techniques: basically a policy for keeping useful memory around and discarding what is not needed. This feels much more like how humans retain information: "use it or lose it". You only need a few things for this to work:
- An eviction/retention policy. Something like "keep a memory if it has been accessed at least once in the last 10k tokens".
- The policy needs to be efficiently computable.
- A place for the model to store and access long-term memory. Maybe a sparsely accessed KV cache would be sufficient. But for efficient access to a large memory a hierarchical data structure might be better.
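The example retention rule quoted above ("keep a memory if it has been accessed at least once in the last 10k tokens") can be sketched as a small store with a token clock. This is a toy illustration under stated assumptions, not anyone's actual implementation: the class `MemoryStore` and its methods are hypothetical, and a plain dictionary stands in for whatever long-term storage a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    window: int = 10_000  # retention window in tokens, taken from the quoted example
    clock: int = 0        # total tokens processed so far
    items: dict = field(default_factory=dict)        # key -> stored value
    last_access: dict = field(default_factory=dict)  # key -> clock value at last access

    def write(self, key, value):
        """Store a memory and mark it as accessed now."""
        self.items[key] = value
        self.last_access[key] = self.clock

    def read(self, key):
        """Return a memory (or None) and refresh its access time on a hit."""
        value = self.items.get(key)
        if value is not None:
            self.last_access[key] = self.clock
        return value

    def advance(self, n_tokens):
        """Advance the token clock, evict memories not accessed within the
        window, and return the list of evicted keys."""
        self.clock += n_tokens
        stale = [k for k, t in self.last_access.items() if self.clock - t > self.window]
        for k in stale:
            del self.items[k]
            del self.last_access[k]
        return stale


if __name__ == "__main__":
    store = MemoryStore()
    store.write("a", "used often")
    store.write("b", "never read again")
    store.advance(6_000)
    store.read("a")
    print(store.advance(6_000))
```

The linear scan in `advance` is the simplest thing that works. Keeping keys ordered by last access would make eviction cheaper, which is one reading of the "efficiently computable" requirement.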

Liling Tan retweeted

chiefofautism@chiefofautism·5 Mar

someone built a tool that REMOVES censorship from ANY open-weight LLM with a single click
13 abliteration methods, 116 models, 837 tests, and it gets SMARTER every time someone runs it
it's called OBLITERATUS
it finds the exact weights that make the model refuse and surgically removes them, full reasoning stays intact, just the refusal disappears
15 analysis modules map the geometry of refusal BEFORE touching a single weight, it can even fingerprint whether a model was aligned with DPO vs RLHF vs CAI just from subspace geometry alone
then it cuts, the model keeps its full brain but loses the artificial compulsion to say no
every time someone runs it with telemetry enabled their anonymous benchmark data feeds a growing community dataset, refusal geometries, method comparisons, hardware profiles at a scale no single lab could build