Nuno Rodrigues

388 posts

@nmvrodrigues

Senior data scientist at OLX | ex Zendesk; ex PhD @ Champalimaud

Lisbon, Portugal · Joined December 2014
232 Following · 136 Followers
Nuno Rodrigues @nmvrodrigues
@skalskip92 It's possible; I see the keypoint loss going down but plateauing while still far from a "decent" value. Looking at the predictions, while the bounding box is correctly identified, the keypoints are still consistently off.
0 replies · 0 reposts · 0 likes · 35 views
Nuno Rodrigues @nmvrodrigues
@charsilawhori @skalskip92 Not yet. It will be, but I still need to clean and structure it. I'll release it along with the Roboflow-labeled datasets when I get the chance! :)
0 replies · 0 reposts · 1 like · 65 views
Nuno Rodrigues @nmvrodrigues
Got inspired by @skalskip92 and decided to do a side project on sports analytics to get back into computer vision and learn some new things. This is the initial version of Padel-AI; I'm looking for more insights to extract and will start on action recognition for the different swings.
3 replies · 0 reposts · 30 likes · 2.9K views
Nuno Rodrigues @nmvrodrigues
@skalskip92 Bottom. When I tried the center, I got an unwanted increase in total distance traveled whenever the players just stretched in place to reach the ball.
0 replies · 0 reposts · 2 likes · 136 views
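The bottom-vs-center trade-off discussed here is easy to check numerically. A minimal sketch, assuming `(x1, y1, x2, y2)` pixel boxes per frame; the function names are illustrative, not from the actual project:

```python
import math

def foot_point(box):
    """Bottom-center of an (x1, y1, x2, y2) box: roughly the player's feet."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, y2)

def center_point(box):
    """Geometric center of the box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def total_distance(boxes, anchor=foot_point):
    """Sum of frame-to-frame movement of the chosen anchor point."""
    pts = [anchor(b) for b in boxes]
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

# A player stretching in place: the box top moves, the feet do not.
stretch = [(10, 50, 20, 100), (10, 30, 20, 100), (10, 50, 20, 100)]
assert total_distance(stretch, foot_point) == 0.0
assert total_distance(stretch, center_point) > 0.0
```

Anchoring at the bottom edge keeps the tracked point on the feet, so a box that grows upward (a stretch or overhead reach) adds no spurious distance.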
SkalskiP @skalskip92
@nmvrodrigues are you using bottom or center of the bounding box to calculate the stats?
1 reply · 0 reposts · 3 likes · 1.2K views
Nuno Rodrigues retweeted
stash @stash_pomichter
Introducing Spatial Memory for your robots. Spatiotemporal RAG. Open source. Coming soon.
67 replies · 217 reposts · 1.9K likes · 140.9K views
Nuno Rodrigues retweeted
Andrej Karpathy @karpathy
I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter.

The more interesting part for me (esp. as a computer vision person at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible at the input.

Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in:
- more information compression (see paper) => shorter context windows, more efficiency
- significantly more general information stream => not just text, but e.g. bold text, colored text, arbitrary images
- input can now be processed with bidirectional attention easily and as default, not autoregressive attention - a lot more powerful
- delete the tokenizer (at the input)!! I already ranted about how much I dislike the tokenizer. Tokenizers are ugly, separate, not end-to-end stages. They "import" all the ugliness of Unicode and byte encodings, inherit a lot of historical baggage, and add security/jailbreak risk (e.g. continuation bytes). Two characters that look identical to the eye become two completely different tokens internally in the network. A smiling emoji looks like a weird token, not an... actual smiling face, pixels and all, and all the transfer learning that brings along. The tokenizer must go.

OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made to be vision -> text tasks. Not vice versa. So maybe the User message is images, but the decoder (the Assistant response) remains text. It's a lot less obvious how to output pixels realistically... or if you'd want to.

Now I have to also fight the urge to side-quest an image-input-only version of nanochat...
vLLM @vllm_project

🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai, exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support.
🧠 Compresses visual contexts up to 20× while keeping 97% OCR accuracy at <10×.
📄 Outperforms GOT-OCR2.0 & MinerU2.0 on OmniDocBench using fewer vision tokens.
🤝 The vLLM team is working with DeepSeek to bring official DeepSeek-OCR support into the next vLLM release — making multimodal inference even faster and easier to scale.
🔗 github.com/deepseek-ai/De… #vLLM #DeepSeek #OCR #LLM #VisionAI #DeepLearning

564 replies · 1.6K reposts · 13.4K likes · 3.3M views
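The compression argument above can be put into back-of-envelope numbers. A toy sketch assuming the common ~4 characters per BPE token rule of thumb and taking the quoted ~10× optical compression as a given; both constants are assumptions for illustration, not measurements:

```python
def text_tokens(n_chars, chars_per_token=4):
    """Rough BPE token count for plain text (~4 chars/token rule of thumb)."""
    return -(-n_chars // chars_per_token)  # ceiling division

def optical_tokens(n_text_tokens, compression=10):
    """DeepSeek-OCR's headline claim: ~10x fewer vision tokens than the
    text tokens on the rendered page, at ~97% OCR accuracy."""
    return -(-n_text_tokens // compression)

doc = text_tokens(40_000)    # a ~40k-character document -> 10,000 text tokens
page = optical_tokens(doc)   # the same content, rendered and read as pixels
```

Under these assumptions a 10,000-token document fits in ~1,000 vision tokens, which is the "shorter context windows" point in the thread.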
Nuno Rodrigues retweeted
Zhenjun Zhao @zhenjun_zhao
CuSfM: CUDA-Accelerated Structure-from-Motion
Jingrui Yu, Jun Liu, Kefei Ren, @Joydeepb_robots, Rurui Ye, Keqiang Wu, Chirag Majithia, Di Zeng
tl;dr: in title; ALIKED + LightGlue
arxiv.org/abs/2510.15271
3 replies · 21 reposts · 110 likes · 24.7K views
Nuno Rodrigues retweeted
chester @chesterzelaya
< Choosing a Vision Backbone >

Your model's backbone is its perspective: pick ResNet, and it sees in edges; pick a ViT, and it sees in patches. The backbone decides how your model thinks. Here are some of the most practical backbones and when you should choose them, from the paper "Battle of the Backbones" (2023):

> ResNet - good for fast prototyping, small models, and edge devices
> ConvNeXt - great all-purpose backbone; strong for detection & segmentation
> Swin Transformer (V2) - best for large-scale detection, segmentation, and high-res inputs
> ViT (Vision Transformer) - good when you have huge datasets; less bias, more global context
> CLIP - best for vision-language, zero-shot, and retrieval tasks
> DINO / MoCo / MAE (SSL) - great when you have little or no labeled data
> MiDaS - surprisingly strong if you care about depth, geometry, or robotics perception
> Stable Diffusion Encoder - useful for creative or aesthetic tasks; not for accuracy-critical CV
> EfficientNet / RegNet / ResNet-18 - good lightweight options for edge or mobile deployment
18 replies · 112 reposts · 996 likes · 58.5K views
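The recommendations in the list boil down to a lookup from task constraints to candidate backbones. A trivial sketch; the keys and the `suggest_backbone` helper are made up here for illustration, and only the backbone pairings come from the list itself:

```python
# The tweet's list, distilled into a lookup table.
BACKBONE_FOR = {
    "edge/mobile": ["ResNet-18", "EfficientNet", "RegNet"],
    "detection/segmentation": ["ConvNeXt", "Swin Transformer V2"],
    "huge labeled datasets": ["ViT"],
    "vision-language / zero-shot": ["CLIP"],
    "little or no labels": ["DINO", "MoCo", "MAE"],
    "depth/geometry/robotics": ["MiDaS"],
}

def suggest_backbone(task):
    """Fall back to ConvNeXt, the thread's all-purpose recommendation."""
    return BACKBONE_FOR.get(task, ["ConvNeXt"])
```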
Nuno Rodrigues retweeted
chester @chesterzelaya
v0.2.0 RELEASE. So much work went into this UI/UX overhaul. Build autonomous drone agents, wirelessly powered by external GPUs to run the heaviest of AI models, with up to 10 km range. Windows / Linux versions coming later next week!
19 replies · 28 reposts · 243 likes · 17.6K views
Nuno Rodrigues retweeted
Gabriele Berton @gabriberton
[paper release!] Did you know that you can
- speed up any LLM by 4x
- and reduce its memory footprint by 2x
- and improve its results
- without modifying the model at all
How??? Here is how we do it 🧵
Gabriele Berton @gabriberton

Did you know that you can
- speed up any LLM by 4x
- and reduce its memory footprint by 2x
- and improve its results
- without modifying the model at all
How??? Paper and code coming out in a couple of days

22 replies · 69 reposts · 730 likes · 126.3K views
Nuno Rodrigues retweeted
Sakana AI @SakanaAILabs
We’re excited to introduce ShinkaEvolve: an open-source framework that evolves programs for scientific discovery with unprecedented sample-efficiency.

Blog: sakana.ai/shinka-evolve/
Code: github.com/SakanaAI/Shink…

Like AlphaEvolve and its variants, our framework leverages LLMs to find state-of-the-art solutions to complex problems, but using orders of magnitude fewer resources! Many evolutionary AI systems are powerful but act like brute-force engines, burning thousands of samples to find good solutions. This makes discovery slow and expensive.

We took inspiration from the efficiency of nature. ‘Shinka’ (進化) is Japanese for evolution, and we designed our system to be just as resourceful. On the classic circle packing optimization problem, ShinkaEvolve discovered a new state-of-the-art solution using only 150 samples. This is a big leap in efficiency compared to previous methods that required thousands of evaluations.

We applied ShinkaEvolve to a diverse set of hard problems with real-world applications:

1/ AIME Math Reasoning: it evolved sophisticated agentic scaffolds that significantly outperform strong baselines, discovering an entire Pareto frontier of solutions trading performance for efficiency.

2/ Competitive Programming: on ALE-Bench (a benchmark for NP-hard optimization problems), ShinkaEvolve took the best existing agent's solutions and improved them, turning a 5th-place solution on one task into a 2nd-place leaderboard rank in a competitive programming competition.

3/ LLM Training: we even turned ShinkaEvolve inward to improve LLMs themselves. It tackled the open challenge of designing load-balancing losses for Mixture-of-Experts (MoE) models, and discovered a novel loss function that leads to better expert specialization and consistently improves model performance and perplexity.

ShinkaEvolve achieves its remarkable sample-efficiency through three key innovations that work together: (1) an adaptive parent sampling strategy to balance exploration and exploitation, (2) novelty-based rejection filtering to avoid redundant work, and (3) a bandit-based LLM ensemble that dynamically picks the best model for the job.

By making ShinkaEvolve open-source and highly sample-efficient, our goal is to democratize access to advanced, open-ended discovery tools. Our vision for ShinkaEvolve is to be an easy-to-use companion tool to help scientists and engineers with their daily work. We believe that building more efficient, nature-inspired systems is key to unlocking the future of AI-driven scientific research. We are excited to see what the community builds with it!

Learn more in our technical report: arxiv.org/abs/2509.19349
30 replies · 252 reposts · 1.4K likes · 355.8K views
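Innovation (3), the bandit-based LLM ensemble, can be illustrated with the textbook UCB1 algorithm; this is a generic sketch of that idea, not ShinkaEvolve's actual code, and the model names and reward rates are hypothetical:

```python
import math
import random

class UCB1:
    """Pick which LLM to query next, balancing average reward and exploration."""
    def __init__(self, models):
        self.counts = {m: 0 for m in models}
        self.rewards = {m: 0.0 for m in models}

    def select(self):
        for m, c in self.counts.items():  # try every arm once first
            if c == 0:
                return m
        total = sum(self.counts.values())
        return max(self.counts, key=lambda m:
                   self.rewards[m] / self.counts[m]
                   + math.sqrt(2 * math.log(total) / self.counts[m]))

    def update(self, model, reward):
        self.counts[model] += 1
        self.rewards[model] += reward

rng = random.Random(0)
bandit = UCB1(["llm_a", "llm_b", "llm_c"])  # hypothetical model names
true_rate = {"llm_a": 0.2, "llm_b": 0.8, "llm_c": 0.5}
for _ in range(300):
    m = bandit.select()
    bandit.update(m, 1.0 if rng.random() < true_rate[m] else 0.0)
# Over many pulls, the highest-reward "model" tends to accumulate the most calls.
```

The exploration bonus shrinks as an arm is pulled more, so weak models still get occasional probes while strong ones dominate the budget.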
Nuno Rodrigues retweeted
tomaarsen @tomaarsen
ModernBERT goes MULTILINGUAL! One of the most requested models I've seen: @jhuclsp has trained state-of-the-art massively multilingual encoders using the ModernBERT architecture: mmBERT. Stronger than existing models at their sizes, while also much faster! Details in 🧵
5 replies · 43 reposts · 268 likes · 27.3K views
Nuno Rodrigues retweeted
Vector Wang @VectorWang2
XLeRobot 0.3.0 showcases: open the fridge, get drinks, fill ice, wipe the table, clean the room, take care of plants and cats... All for $660, fully open-sourced, based on HF LeRobot. Teleop with a Joy-Con, or RL/VLA. Assembly kit ready for purchase soon. Stay tuned! github.com/Vector-Wangel/…
10 replies · 48 reposts · 317 likes · 17.8K views
Nuno Rodrigues retweeted
Rohan Paul @rohanpaul_ai
BRILLIANT @GoogleDeepMind research. Even the best embeddings cannot represent all possible query-document combinations, which means some answers are mathematically impossible to recover. Reveals a sharp truth: embedding models can only capture so many pairings, and beyond that, recall collapses no matter the data or tuning.

🧠 Key takeaway

Embeddings have a hard ceiling, set by dimension, on how many top-k document combinations they can represent exactly. They prove this with sign-rank bounds, then show it empirically and with a simple natural-language dataset where even strong models stay under 20% recall@100. When queries force many combinations, single-vector retrievers hit that ceiling, so other architectures are needed. 4096-dim embeddings already break near 250M docs for top-2 combinations, even in the best case.

🛠️ Practical Implications

For applications like search, recommendation, or retrieval-augmented generation, this means scaling up models or datasets alone will not fix recall gaps. At large index sizes, even very high-dimensional embeddings fail to capture all combinations of relevant results. So embeddings cannot work as the sole retrieval backbone. We will need hybrid setups, combining dense vectors with sparse methods, multi-vector models, or rerankers to patch the blind spots. This shifts how we should design retrieval pipelines, treating embeddings as one useful tool but not a universal solution.

🧵 Read on 👇
50 replies · 371 reposts · 2.4K likes · 241K views
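The "hybrid setups" conclusion can be sketched as simple score fusion: a dense cosine similarity blended with a crude lexical-overlap score. Everything here (the `alpha` weight, the overlap measure, the function names) is an illustrative assumption, not the paper's method:

```python
import math

def cosine(u, v):
    """Dense similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sparse_overlap(query, doc):
    """Crude lexical score: fraction of query terms present in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_score(q_vec, d_vec, q_text, d_text, alpha=0.5):
    """Blend dense and sparse evidence; alpha is a tuning knob."""
    return alpha * cosine(q_vec, d_vec) + (1 - alpha) * sparse_overlap(q_text, d_text)
```

Even when the dense score is zero (orthogonal vectors), an exact-term match still surfaces the document, which is the kind of blind spot the sparse leg is meant to patch.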
Nuno Rodrigues @nmvrodrigues
TIL it's easier to set up a LoRA adapter and fine-tune Gemma 3 than to run inference with a ConvNeXt on a binary dataset.
0 replies · 1 repost · 1 like · 34 views
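For context, the LoRA idea referenced here: freeze the pretrained weight matrix W and train only a low-rank correction B·A. A dependency-free toy of the arithmetic only; a real fine-tune would use a library such as PEFT:

```python
def matvec(M, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0, r=1):
    """y = W x + (alpha / r) * B (A x).
    W stays frozen; only A (r x d_in) and B (d_out x r) are trained,
    which is far fewer parameters than W itself when r << d."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low_rank)]

# Rank-1 toy: a 2x2 frozen identity W plus a learned correction.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]    # 1 x 2
B = [[0.5], [0.0]]  # 2 x 1
y = lora_forward(W, A, B, [2.0, 3.0])  # [4.5, 3.0]
```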
Nuno Rodrigues @nmvrodrigues
In this day and age, is it still worth it to train models at home, provided you have the GPU and considering only electricity costs, vs. using any cloud provider? For things that can fit in under 20 GB of VRAM on a single GPU.
0 replies · 0 reposts · 0 likes · 31 views
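A back-of-envelope way to answer this: compare electricity for a ~350 W card against an hourly rental. All three constants here (wattage, tariff, cloud rate) are placeholder assumptions that vary widely by card, region, and provider:

```python
def home_cost(hours, gpu_watts=350, price_per_kwh=0.25):
    """Electricity-only cost of a local run (wattage and tariff are placeholders)."""
    return hours * gpu_watts / 1000 * price_per_kwh

def cloud_cost(hours, price_per_hour=1.00):
    """Renting a comparable ~20 GB-VRAM instance (rate is a placeholder)."""
    return hours * price_per_hour

run_hours = 100  # a hypothetical 100-hour fine-tuning job
assert home_cost(run_hours) == 8.75   # 100 h * 0.35 kW * 0.25 per kWh
assert cloud_cost(run_hours) == 100.0
```

Under these placeholder rates the electricity-only cost is roughly an order of magnitude below the rental, before counting the purchase price of the card itself.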
Nuno Rodrigues retweeted
hardmaru @hardmaru
Our new GECCO paper builds on our past work, showing how AI models can be evolved like organisms. By letting models evolve their own merging boundaries, compete to specialize, and find ‘attractive’ partners to merge with, we can create adaptive, robust and scalable AI ecosystems.
Sakana AI @SakanaAILabs

What if we could evolve AI models like organisms in nature, letting them compete, mate, and combine their strengths to produce ever-fitter offspring? Excited to share our new work: “Competition and Attraction Improve Model Fusion”, presented at GECCO’25 🦎 where it was a runner-up for best paper!

Paper: arxiv.org/abs/2508.16204
Code: github.com/SakanaAI/natur…

Summary of Paper

At Sakana AI, we draw inspiration from nature’s evolutionary processes to build the foundation of future AI systems. Nature doesn’t create one single, monolithic organism; it fosters a diverse ecosystem of specialized individuals that compete, cooperate, and combine their traits to adapt and thrive. We believe AI development can follow a similar path. What if instead of building one giant monolithic AI, we could evolve a whole ecosystem of specialized models that collaborate and combine their skills? Like a school of fish 🐟, where collective intelligence emerges from the group.

This new paper builds on our previous research on model merging, which follows such an evolutionary path. We started by using evolution to find the best “recipes” to merge existing models (our Nature Machine Intelligence paper: nature.com/articles/s4225…). Then, we explored how to maintain diversity to acquire new skills in LLMs (our ICLR 2025 paper: openreview.net/forum?id=Kvdh1…). Now, we're combining these ideas into a full evolutionary system.

A key limitation remained in earlier work: model merging required manually defining how models should be partitioned (e.g., by fixed layers or blocks) before they could be combined. What if we could let evolution figure that out too? Our new paper proposes M2N2 (Model Merging of Natural Niches), a more fluid method, which overcomes this with three key, nature-inspired ideas:

1/ Evolving Merging Boundaries 🌿: instead of merging models using pre-defined, static boundaries (e.g. fixed layers), M2N2 dynamically evolves the “split-points” for merging. This allows for a far more flexible and powerful exploration of parameter combinations, like swapping variable-length segments of DNA rather than entire chromosomes.

2/ Diversity through Competition 🐠: to ensure we have a rich pool of models to merge, M2N2 makes them compete for limited resources (i.e., data points in a training set). This forces models to specialize and find their own “niche,” creating a population of diverse, high-performing specialists that are perfect for merging.

3/ Attraction and Mate Selection 💏: merging models can be computationally expensive. M2N2 introduces an “attraction” heuristic that intelligently pairs models for fusion based on their complementary strengths, choosing partners that perform well where the other is weak. This makes the evolutionary search much more efficient.

Does it work? The results are fascinating: this is the first time model merging has been used to evolve models entirely from scratch, outperforming other evolutionary algorithms. In one experiment, starting with random networks, M2N2 evolved an MNIST classifier that achieves performance comparable to CMA-ES, but is far more computationally efficient.

Does it scale? We also showed that M2N2 can scale to large, pre-trained models: we used M2N2 to merge a math-specialist LLM with an agentic-specialist LLM. M2N2 produced a merged model that excelled at both math and web-shopping tasks, significantly outperforming other methods. The flexible split-point was crucial here.

Does it work on multimodal models? When we applied M2N2 to text-to-image models, we merged several models by adapting them only for Japanese prompts. The resulting model not only improved on Japanese but also retained its strong English capabilities, a key advantage over fine-tuning, which can suffer from catastrophic forgetting.

This nature-inspired approach is central to Sakana AI’s mission to find new foundations for AI based on collective intelligence. Rather than scaling monolithic models, we envision a future where ecosystems of diverse, specialized models co-evolve, collaborate, and combine, leading to more adaptive, robust, and creative AI. 🐙 We hope this work sparks more interest in these under-explored ideas!

Published in ACM GECCO’25: Proceedings of the Genetic and Evolutionary Computation Conference. DOI: doi.org/10.1145/371225…

17 replies · 51 reposts · 397 likes · 65.2K views
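The "evolving merging boundaries" idea (1/) can be sketched as crossover on flat parameter vectors where the split-point itself is searched rather than fixed per layer. A toy illustration under that reading, not the real M2N2 code; the hill-climbing search and the fitness function are invented for the demo:

```python
import random

def merge_at_split(parent_a, parent_b, split):
    """Combine two flat parameter vectors at a split-point, like swapping
    variable-length DNA segments rather than whole fixed layers."""
    return parent_a[:split] + parent_b[split:]

def evolve_split(parent_a, parent_b, fitness, generations=50, seed=0):
    """Search over the split-point itself instead of fixing it per layer."""
    rng = random.Random(seed)
    n = len(parent_a)
    best = rng.randrange(n + 1)
    for _ in range(generations):
        cand = rng.randrange(n + 1)
        if fitness(merge_at_split(parent_a, parent_b, cand)) > \
           fitness(merge_at_split(parent_a, parent_b, best)):
            best = cand
    return best

# Toy fitness: the ideal child takes the first 3 parameters from A, the rest from B.
a, b = [1.0] * 8, [0.0] * 8
target = [1.0] * 3 + [0.0] * 5
fitness = lambda child: -sum((c - t) ** 2 for c, t in zip(child, target))
split = evolve_split(a, b, fitness)
```

The point of the sketch is only that the boundary is a searchable variable: a fixed per-layer split could never land between "layers" the way an evolved split-point can.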