JP Hwang

930 posts

@_jphwang

Developer educator and advocate. Sometimes writes bad jokes poorly.

🌏 Joined November 2019
659 Following · 1.6K Followers
JP Hwang retweeted
Leonie
Leonie@helloiamleonie·
The most important tools an agent has are the search tools it uses to build its own context. Here are the 6 principles I follow to build one:
1. Building the right tools following the “low floor, high ceiling” principle
2. Adding descriptions to the metadata, so the agent can find the right index to search
3. Prompting: Making sure the agent calls the right tool by careful tool naming, writing good tool descriptions, adding reasoning parameters, reinforcing instructions in the agent’s system prompt, and forcing tool usage.
4. Number and complexity of parameters: Making sure the agent generates the right parameters by writing good parameter definitions and thinking about the number and complexity of parameters
5. Optimizing the tool responses for token efficiency and context relevance
6. Error handling: Enabling self-correction through proper error handling
JP Hwang
JP Hwang@_jphwang·
The result is that the MTEB score drops less than 2(!) points. For anyone building vector stores at scale, such as RAG pipelines or agentic workflows, that's a trade-off worth knowing about. More on this (and Matryoshka truncation) in my latest video: youtu.be/2xBgwHW-h8M?si…
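A minimal sketch of the Matryoshka truncation mentioned above, assuming embeddings trained so that the leading dimensions carry most of the signal; the helper names here are illustrative, not any library's API:

```python
import math

def matryoshka_truncate(vec, dims):
    """Keep only the first `dims` dimensions and re-normalize to unit
    length, so cosine similarity still works on the shorter vector."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity of two unit-norm vectors is just their dot product."""
    return sum(x * y for x, y in zip(a, b))

# truncating a 1,024-dim embedding to 256 dims cuts storage by 4x
full = [math.sin(i) for i in range(1024)]
short = matryoshka_truncate(full, 256)
assert len(short) == 256
```

The benchmark claim above is that this truncation costs only a small amount of retrieval quality, which is why it pairs well with binary quantization for compounding savings.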
JP Hwang
JP Hwang@_jphwang·
The typical catch is retrieval quality degradation due to precision loss. But Jina's team solved this in their v5 embedding model training with GOR (Generalized Orthogonal Regularization). GOR spreads embedding values more uniformly, making the compression nearly lossless.
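A hedged sketch of the idea behind an orthogonality-style regularizer (this is an illustration of the general technique, not Jina's exact GOR formulation): penalize squared dot products between distinct embeddings in a batch, which pushes them to spread out more uniformly:

```python
def orthogonality_penalty(batch):
    """Mean squared pairwise dot product across a batch of unit-norm
    embeddings; zero when the embeddings are mutually orthogonal.
    Adding this term to the training loss discourages dimensions from
    clustering, which helps values survive 1-bit quantization."""
    total, pairs = 0.0, 0
    for i in range(len(batch)):
        for j in range(i + 1, len(batch)):
            dot = sum(a * b for a, b in zip(batch[i], batch[j]))
            total += dot * dot
            pairs += 1
    return total / pairs if pairs else 0.0

# mutually orthogonal embeddings incur zero penalty
assert orthogonality_penalty([[1.0, 0.0], [0.0, 1.0]]) == 0.0
# identical embeddings incur the maximum penalty
assert orthogonality_penalty([[1.0, 0.0], [1.0, 0.0]]) == 1.0
```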
JP Hwang
JP Hwang@_jphwang·
Your embeddings might be 32× larger than they need to be! Binary quantization compresses each embedding dimension from a (32-bit) float to a single bit. Positive values become 1, negative become 0. That takes a 4,096-byte vector down to... (checks notes) 128 bytes!
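The scheme above can be sketched in a few lines of plain Python (the function names are illustrative; production systems would use vectorized bit-packing):

```python
def binary_quantize(vec):
    """Binary quantization: positive dimensions -> 1, the rest -> 0,
    packed 8 dimensions per byte."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits.to_bytes((len(vec) + 7) // 8, "little")

def hamming(a, b):
    """Distance between two packed binary embeddings: count of differing
    bits (the usual search metric over binary-quantized vectors)."""
    return bin(int.from_bytes(a, "little") ^ int.from_bytes(b, "little")).count("1")

# a 1,024-dim float32 embedding occupies 1,024 * 4 = 4,096 bytes;
# its binary form is 1,024 bits = 128 bytes, a 32x reduction
vec = [(-1) ** i * 0.1 * (i + 1) for i in range(1024)]
packed = binary_quantize(vec)
assert len(packed) == 128
```

A nice side effect: Hamming distance over packed bits is much cheaper than float dot products, so search speeds up along with the storage savings.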
JP Hwang
JP Hwang@_jphwang·
Just saying
JP Hwang
JP Hwang@_jphwang·
4B-class embedding quality from models 1/6th or 1/16th the size?
✅ with Distillation + LoRA task adapters
➡️ strong retrieval & multilingual performance, with Matryoshka + BQ
Check out our @elastic_devs video on @JinaAI_ v5 models at 677M or 239M params youtu.be/2xBgwHW-h8M?si…
JP Hwang
JP Hwang@_jphwang·
The result: Jina v5-small matches the 4B teacher on retrieval benchmarks at just 677M parameters. I cover exactly how this works, and what the Jina team found didn't work in my latest video. Link: youtu.be/2xBgwHW-h8M?si…
JP Hwang
JP Hwang@_jphwang·
Distillation powers Jina AI's new v5 embedding models.
• The student trains to minimise the gap to the teacher embeddings
• A projection layer bridges the difference in output dimensions
• Repeat until the student has absorbed the teacher's knowledge
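The steps above can be sketched as a toy distillation loss (assuming an MSE-style objective, which is one common choice; the actual v5 training objective may differ, and the projection weights here are hypothetical):

```python
def project(student_vec, weights):
    """Linear projection bridging student dims -> teacher dims."""
    return [sum(w * s for w, s in zip(row, student_vec)) for row in weights]

def distill_loss(student_vec, teacher_vec, weights):
    """Mean squared gap between the projected student embedding and the
    teacher embedding; training minimizes this over many text pairs."""
    projected = project(student_vec, weights)
    return sum((p - t) ** 2 for p, t in zip(projected, teacher_vec)) / len(teacher_vec)

# toy check: a 2-dim student projected to 3 teacher dims
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical projection weights
student, teacher = [0.5, -0.5], [0.5, -0.5, 0.0]
assert distill_loss(student, teacher, W) == 0.0
```

The projection layer is only needed during training; at inference time the student emits its own (smaller) embeddings directly.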
JP Hwang
JP Hwang@_jphwang·
How do you get 4-billion-parameter quality out of a model that's 6× smaller? That's knowledge distillation:
• A large "teacher" model (Qwen3-Embedding-4B) and a smaller "student" model (e.g. Jina-v5-text-small or nano) process the same text pairs
JP Hwang retweeted
Philipp Krenn
Philipp Krenn@xeraa·
obligatory GTC keynote tweet when you make the top slides: building HNSW graphs with up to 12x the throughput and 7x faster merges on elasticsearch and NVIDIA cuVS (a GPU-accelerated library for vector search) powered by CAGRA (graph-based ANN algorithm built to run natively on GPUs) that still works with CPUs for search PS: yeah, this is a lot of acronyms
JP Hwang
JP Hwang@_jphwang·
I'm deeply uncomfortable with Claude Code / Cursor CLI spinning up new processes, and killing them. Is this a footgun? Does this ever go wrong for anyone? Like, have you had your agents kill the wrong process with the wrong PID?
JP Hwang retweeted
Philipp Krenn
Philipp Krenn@xeraa·
principle of floors and ceilings for agents: how to deal with ambiguous and predictable questions. and how not to over-optimize for one over the other because you'll miss out on the full space of tasks
Leonie@helloiamleonie

I’ve nagged our engineers to tell me all their secrets and distilled them into 7 principles for building effective tools for retrieval.

Agentic search seems simple:
> User makes request.
> Agent calls retrieval tool.
> Retrieval tool responds with relevant context.
> Agent responds to user.

The reality is that agentic search can fail at every step after the user makes their request:
> Agent answers without calling a retrieval tool.
> Agent calls the wrong tool (e.g., web search vs proprietary data source)
> Agent calls the right tool but searches the wrong index
> Agent calls the right tool for the right index but with the wrong search query
> Tool response blows up the context window.

To overcome these challenges, at @elastic_devs we follow these principles when building tools for interacting with Elasticsearch data:
1. Building the right tools following the “low floor, high ceiling” principle
2. Adding descriptions to the metadata, so the agent can find the right index to search
3. Making sure the agent calls the right tool by careful tool naming, writing good tool descriptions, adding reasoning parameters, reinforcing instructions in the agent’s system prompt, and forcing tool usage.
4. Making sure the agent generates the right parameters by writing good parameter definitions and thinking about the number and complexity of parameters
5. Optimizing the tool responses for token efficiency and context relevance
6. Enabling self-correction through proper error handling
7. Evaluating their effectiveness.

Read more: elastic.co/search-labs/bl…

JP Hwang
JP Hwang@_jphwang·
🤔 Quite misleading take from @aarondotdev
Context:
- Devs w/ AI spent ~30% of total time (30 mins) on setup
- Productivity will compound subsequently
The paper is literally about the tradeoffs of coding assistants & how to intentionally mitigate them.
aaron@aarondotdev

Anthropic themselves found that vibecoding hinders SWEs’ ability to read, write, debug, and understand code. not only that, but AI generated code doesn’t result in a statistically significant increase in speed don’t let your managers scare you into increased productivity. show them this paper straight from Anthropic.

JP Hwang
JP Hwang@_jphwang·
With the 0.5B Qwen model & the 237M Jina embedding model, **you can even build one with fewer than 1 billion total parameters.** I'm thinking of doing a video about this - would you be interested? (Give us a like or comment if yes)
JP Hwang
JP Hwang@_jphwang·
In fact, I was playing with the new Qwen 3.5 vision language models and the Jina AI v5 embedding models. These models will get you a pretty good multimodal RAG pipeline with fewer than 5B parameters!
JP Hwang
JP Hwang@_jphwang·
Each $200 Claude Code subscription might be worth... -2400% net profit on revenue!? Either: a) inference needs to get a lot cheaper, or b) we're in for a ton of pricing shock / enshittification in the near future. Local models are looking pretty enticing about now. 🧵