Superlinked

645 posts

@superlinked

The data engineer’s solution to turning data into vector embeddings.

Joined September 2019
309 Following · 661 Followers
Superlinked@superlinked·
Right now SO many companies are paying per token for LLM APIs. At scale, that gets expensive very quickly. What’s interesting is that in many cases there are open models with similar capabilities that you can run yourself. The difference is that instead of paying per token, you pay for GPU infrastructure. The gap between those two pricing models can easily be one or two orders of magnitude. That is why more teams are starting to look seriously at self-hosting. If you can run the models reliably in your own environment, the cost savings become hard to ignore. @Svonava talks about this shift and why infrastructure for running many specialized models efficiently is becoming an important part of modern AI systems.
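To make the pricing gap concrete, here is a back-of-envelope cost model. All numbers (per-token price, GPU hourly rate, throughput) are illustrative assumptions, not quotes from any provider:

```python
# Back-of-envelope: metered API pricing vs. renting a GPU yourself.
# Every number here is an assumption for illustration only.

def api_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of serving `tokens` through a per-token API."""
    return tokens / 1_000_000 * usd_per_million_tokens

def gpu_cost(tokens: int, tokens_per_second: float, usd_per_gpu_hour: float) -> float:
    """Cost of serving the same tokens on a rented GPU at a given throughput."""
    hours = tokens / tokens_per_second / 3600
    return hours * usd_per_gpu_hour

monthly_tokens = 5_000_000_000                   # assume 5B tokens/month
api = api_cost(monthly_tokens, 10.0)             # assume $10 per 1M tokens
gpu = gpu_cost(monthly_tokens, 5_000, 2.5)       # assume 5k tok/s on a $2.50/h GPU

print(f"API: ${api:,.0f}  GPU: ${gpu:,.0f}  ratio: {api / gpu:.0f}x")
```

With these assumptions the gap lands around two orders of magnitude; real ratios depend entirely on the model, the workload, and achieved GPU utilization.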
Superlinked@superlinked·
Self-hosting small models has become an increasingly popular topic of late, but where's the evidence? The team was in Belgrade last week, presenting alongside @TopK and @Perplexity, answering that exact question. @f_makraduli presented "The Case for Self-Hosting Small Models". *TLDR: Small models are quietly winning in production AI.* Open source has exploded to over 2.6M models, and open-weight systems are now only about 1 to 3 months behind proprietary frontier models. In some cases, they already match top-tier performance at a fraction of the cost. At the same time, task-specific models consistently outperform general LLMs where it matters. They are faster, cheaper, easier to run, and trained on more relevant data. That is why they power things like search, ranking, and extraction in real systems today. It appears the future is not one giant model, but many smaller models doing specific jobs to a better standard. Thanks to @KayaVC for the invite!
Superlinked@superlinked·
GPUs can deliver hundreds of TFLOPS, so why are they often underutilised during inference? Because the real constraint is often memory bandwidth, not compute. With small batches, GPUs spend much of their time waiting for data to move through memory. The compute cores sit idle because weights and activations cannot be fetched fast enough. Increase the batch size and things start to change. Memory access becomes more efficient, the GPU stays busy doing matrix multiplications, and the bottleneck shifts from memory bandwidth to raw compute. That transition is key to understanding why batching matters so much for inference performance. Filip's article breaks down this shift clearly and explains how it shapes real world GPU utilization. Check it out here: buff.ly/E0dbSHD
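The memory-vs-compute shift described above can be sketched with a roofline-style estimate for a single linear layer. The peak FLOP/s and bandwidth figures are rough assumptions for a modern datacenter GPU, not specs of any particular card:

```python
# Why batching helps: estimate compute time vs. weight-loading time
# for one linear layer y = x @ W. Hardware numbers are assumptions.

PEAK_FLOPS = 300e12   # assumed peak compute: 300 TFLOP/s
MEM_BW = 2e12         # assumed memory bandwidth: 2 TB/s

def layer_time(batch: int, d_in: int = 4096, d_out: int = 4096,
               bytes_per_param: int = 2) -> tuple:
    """Return (compute-bound time, memory-bound time) for the layer."""
    flops = 2 * batch * d_in * d_out               # multiply-accumulates
    bytes_moved = d_in * d_out * bytes_per_param   # weights dominate at small batch
    return flops / PEAK_FLOPS, bytes_moved / MEM_BW

for batch in (1, 64, 1024):
    tc, tm = layer_time(batch)
    bound = "memory-bound" if tm > tc else "compute-bound"
    print(f"batch={batch:5d}  compute={tc*1e6:7.1f}us  memory={tm*1e6:7.1f}us  -> {bound}")
```

At batch 1 the weight-loading time dwarfs the compute time, so the cores idle; as the batch grows, the same weights are reused across more rows and the layer crosses over to compute-bound, which is exactly the transition the post describes.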
Superlinked@superlinked·
If you are running search or large scale data processing, you have probably experienced:
- Rising API costs.
- Experimenting until something breaks in production.
- Memory constraints and throughput ceilings that block real workloads.
We're working on an alternative... On Feb 27 at 4 PM GMT, @Svonava will preview the Superlinked Inference Engine, our open source software for running Small Language Models in your own cloud. Join us on Maven Live for Optimizing Search & Data Processing with Self-hosted SLMs. We’ll cover:
• When SLMs beat LLMs for search and data tasks
• How to support 35+ model architectures and LoRAs in production
• Designing a multi model cluster pushing 1M tokens per second
• How teams cut 95%+ of managed API costs
Daniel will be joining AI-Search masterminds @treygrainger and @softwaredoug for this free lightning lesson! Join us here: buff.ly/amM5yBI
Superlinked@superlinked·
“System X is fast because it’s written in Rust.” Is this true 100% of the time? Most people assume embedding inference speed comes down to the code they write: Python versus Rust, frameworks, and so on. In practice, almost none of that is decisive. What really affects embedding latency is memory. GPUs are extremely fast at calculations but comparatively slow at moving data. Generating an embedding is mostly about reading and writing large model weights and intermediate tensors rather than crunching numbers. That is why techniques like Flash Attention (used by the popular inference server TEI) matter. They reorganise computation so more work stays in fast on-chip cache instead of repeatedly hitting slower GPU memory. Quantisation helps for the same reason: smaller weights mean less data to move. If you want faster embeddings, start thinking about memory, cache locality, and data movement to realise some actual gains. Or better yet, read Filip’s full deep-dive on the matter here: buff.ly/Kq1y8kZ
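The "smaller weights mean less data to move" point can be quantified with a simple latency floor: if the weights must be read from GPU memory once per small-batch forward pass, bandwidth alone bounds the latency. The model size and bandwidth below are hypothetical assumptions:

```python
# Bandwidth-only latency floor for one small-batch embedding forward pass.
# Model size and memory bandwidth are assumptions for illustration.

PARAMS = 100e6   # assume a ~100M-parameter encoder
MEM_BW = 2e12    # assume 2 TB/s GPU memory bandwidth

def latency_floor_ms(bytes_per_weight: float) -> float:
    """Lower bound on latency from reading the weights once per pass."""
    return PARAMS * bytes_per_weight / MEM_BW * 1e3

for name, width in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{name}: >= {latency_floor_ms(width):.2f} ms per pass")
```

Halving the bytes per weight halves this floor, which is why quantisation speeds up memory-bound embedding workloads even though it does not add a single FLOP of compute.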
Superlinked@superlinked·
Using open-source solutions to productionise your embeddings can get you a long way, but the efficiency problem facing ML and AI engineers still needs solving…
* Some models can generate dense, sparse, and multi-vector embeddings in one pass, but today you usually need multiple API calls because these outputs are handled separately.
* Running and testing multiple models in production is costly and complex, with limited support for serving many models efficiently when VRAM is constrained.
* Differences in embeddings, pooling strategies, and model quirks require careful handling by users, and current systems lack flexible ways to support new model types without code changes.
@f_makraduli takes a deep dive into the existing open source inference solutions, what they do well, and what they’re ultimately missing to make everyone’s jobs easier (and to get the most out of your GPUs). Check out the article here: buff.ly/U4bsFOB
Superlinked@superlinked·
Problems with your text-embedding models? Filip explains the common issues with the traditional approach to search + embeddings. Superlinked has a smarter approach, using a MIXTURE of embeddings instead. Check out the video to find out more.
Superlinked@superlinked·
Think you know the vector embeddings space well? Think again! Your embeddings are wrong! @Svonava will open the hood on today’s “state-of-the-art” text and image embeddings at GenAI Week 2025, Silicon Valley on Thursday 17 July, 2:00 – 2:40 PM (PT). Why attend?
- See the breaking point: examples of pre-trained embeddings failing on tasks that look trivial on paper.
- Learn how the big players fix it: a peek into FAANG-style models that fuse dozens of real-world signals (price, location, co-purchase graphs, margins and more).
- Walk away with a blueprint: a Mixture-of-Encoders strategy you can replicate without a research lab.
- Two case studies: a fashion retailer that unlocked seven-figure incremental revenue, and a jobs marketplace that boosted matching quality while cutting infra costs.
If you build search, recommendations or retrieval pipelines, this session will save you months of trial and error. 👉 Register here buff.ly/Q6wTVyG and add our keynote “Your Embeddings Are Wrong” to your schedule. Follow Daniel for the chance to get your hands on free tickets to the conference. See you in Santa Clara! 🎟️ #AI #GenAI #VectorSearch #RecommenderSystems #MachineLearning
Superlinked reposted
Qdrant@qdrant_engine·
Smarter search ≠ more keywords. It means understanding meaning, filtering fast, and ranking by real intent. 🚀 On May 15th we’re live with @superlinked showing how it all works in production with @qdrant_engine. 👉 RSVP: lu.ma/p30sy66f
Superlinked@superlinked·
Multi-vector is a loaded term these days - it can mean late interaction-compatible representations where one model output is not pooled into a single vector but we use a set of vectors to represent the given (usually string) input. This is done to increase the accuracy of representing that one aspect of our data. It can also mean, like in our case, the ability to run different models to capture *different* aspects of the data, each producing one or multiple vectors. We do this to encode numerical/geospatial and other properties along the textual/image properties of our data objects. Obviously both approaches can be combined for maximum retrieval accuracy and control :-)
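The second meaning above (separate models capturing separate aspects of the data) can be sketched in a few lines. This is a toy illustration, not the Superlinked API; the encodings and weights are invented for the example:

```python
# Toy sketch of aspect-wise multi-vector representation (NOT the Superlinked
# API): each aspect of an item gets its own vector, and the query weights
# the aspects at search time. All encodings here are invented.
import math

def encode_item(text_vec, price, days_old):
    return {
        "text": text_vec,                        # pretend this came from a text model
        "price": [1.0 / (1.0 + price)],          # toy numeric encoding
        "recency": [math.exp(-days_old / 30.0)], # decays on a ~30-day scale
    }

def score(query_vecs, item_vecs, weights):
    """Weighted sum of per-aspect dot products."""
    return sum(
        w * sum(a * b for a, b in zip(query_vecs[aspect], item_vecs[aspect]))
        for aspect, w in weights.items()
    )

fresh = encode_item([0.9, 0.1], price=20.0, days_old=1)
stale = encode_item([0.9, 0.1], price=20.0, days_old=90)
query = {"text": [0.9, 0.1], "price": [1.0], "recency": [1.0]}
w = {"text": 1.0, "price": 0.2, "recency": 0.5}
print(score(query, fresh, w) > score(query, stale, w))  # freshness breaks the tie
```

Because each aspect lives in its own vector, the relative weights can be tuned per query without re-embedding anything, which is the control the post is pointing at.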
GeraDeluxer@GeraDeluxer·
A very good example of this is @superlinked
Weaviate AI Database@weaviate_io

Traditional vector embeddings represent entire documents as single vectors. But what if we could capture more nuanced relationships? Enter multi-vector embeddings.
What are they? Instead of one vector per document, multi-vector embeddings (like ColBERT) represent each document with multiple vectors. For example:
• Single vector: [0.0412, 0.1056, 0.5021,...]
• Multi-vector: [[0.0543,...], [0.0123,...], [0.4299,...]]
Why are they powerful? Multi-vector embeddings enable "late interaction" - a technique that matches individual parts of texts rather than comparing them as whole units. This preserves fine-grained meaning and enables more precise matching.
How it works:
1. Each token/part of text gets its own vector
2. During a search, each query vector finds its best match in the document
3. Individual matches are combined for a final similarity score
Key benefits:
• Better handling of word order
• More precise phrase matching
• Improved search accuracy for longer texts
Trade-offs to consider:
- Generally larger sizes (longer text ➡️ larger vectors)
- Higher memory & storage costs
- Increased inference & search time
Implementation: Weaviate v1.30 now supports multi-vector embeddings for production environments through:
1. ColBERT model integration (via @JinaAI_ )
2. Custom multi-vector embeddings
3. Quantization techniques for multi-vector embeddings
Want to learn more? Join our upcoming technical session: lu.ma/weaviate-relea…
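The three "how it works" steps above can be sketched as a minimal ColBERT-style MaxSim scorer, using toy 3-d vectors in place of real token embeddings:

```python
# Minimal sketch of late-interaction (MaxSim) scoring, with toy vectors
# standing in for per-token embeddings.
import math

def cos(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def maxsim(query_vecs, doc_vecs):
    """For each query token vector, take its best match in the document,
    then sum the per-token maxima into one relevance score."""
    return sum(max(cos(q, d) for d in doc_vecs) for q in query_vecs)

query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
doc_a = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.1], [0.2, 0.2, 0.9]]  # matches both tokens
doc_b = [[0.0, 0.0, 1.0], [0.1, 0.0, 0.9]]                   # matches neither well
print(maxsim(query, doc_a) > maxsim(query, doc_b))  # True
```

Each query token is matched independently, so a document only scores well if every part of the query finds a good counterpart, which is what gives late interaction its precise phrase-level matching.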

Superlinked@superlinked·
🌟 We just hit 1K GitHub Stars! Huge thanks to everyone who starred, shared, or contributed 💖 If you're building smart search or recommendations with natural language, check out our OSS framework 👇 🔗 buff.ly/HbePxh5
Superlinked@superlinked·
🔍 Personalised search is evolving — semantic relevance isn’t enough. With Superlinked, blend meaning and time using recency_space + negative_filter to surface fresh, relevant results. Build an AI research agent 👇 📘 buff.ly/M3lfnQP buff.ly/l1RhkMd
Superlinked@superlinked·
Superlinked is joining the AI in Production Conference on March 12! 🚀 Watch @Svonava on a panel discussing GenAI vs. traditional ML platforms. Don’t miss this deep dive into AI in production! 🔥 Join us: buff.ly/XvYPXvD #AI #ML #GenAI
Superlinked@superlinked·
AI is transforming e-commerce! A recent study shows how personalization and predictive analytics drive purchase intent & satisfaction. At Superlinked, we’re helping brands harness AI for personalized, high-converting shopping experiences. Ready to level up? #AI #Ecommerce
Superlinked@superlinked·
Ever wanted to use AI to create your own D&D monster army? 🧌 Find out how, in our first collaboration with @HackerNoon. We explore how to use multi-attribute vector search to create the perfect monster to suit any game. Read the full guide here 🧌👉 lnkd.in/e3Xfe_zQ
Superlinked@superlinked·
@YonatanLavy Want us to roll you a backend that you can call from your TS? ^_^ What are you building?
Yonatan Lavy@YonatanLavy·
Does anyone know of an alternative to @superlinked for embedding in typescript? The goal is to take structured data + unstructured data of any type and length and rely on retrieval good enough over time without manually adjusting the method
Superlinked@superlinked·
🦈 BE A SHARK 🦈 One of our core values: Move impatiently toward maximum value. Our Superlinked Shark is now a sticker! Catch us at events to snag one. We don’t bite. 😁