Raj S 🇦🇺
@rajshetgar
2.1K posts
building products & tinkering with ideas
Australia · Joined September 2012
3.6K Following · 616 Followers
Raj S 🇦🇺 @rajshetgar
Could be related to geopolitics! China imports roughly 20%-30% of its oil from Iran and Venezuela; the US replacing the rulers in these countries puts the Chinese economy in danger, and China is more worried about its local population than about dealing with external affairs. Open-source AI from China was a threat not only to the US economy but also to US capitalism. Maybe these are backdoor negotiations between China and the US to keep AI closed: Beijing had to make some calls to a few companies and change strategy, or hold off until next time. Having lived in China and North America, I can see how the above is playing out.
Kevin S. Xu @kevinsxu
Watching the Qwen team implode on Twitter is sad to see... Looks like Qwen will go the route of closed models soon. AliCloud has got to make money somehow, I guess... (Worth noting the $BABA earnings date is still not announced, more delayed than usual...)
Raj S 🇦🇺 @rajshetgar
@AnthropicAI is still down! Maybe it's an issue related to eastern regions? Japan, Asia, Oceania? Down for more than an hour!
Raj S 🇦🇺 @rajshetgar
Still down! What's happening?
Raj S 🇦🇺 @rajshetgar
@tonykipkemboi Specialised orchestration (agent frameworks) will always be in demand; generic orchestration frameworks will struggle. One advantage of generic agent frameworks is that they are GenAI-vendor neutral, so we will still have some frameworks like LangGraph around for a long time.
Raj S 🇦🇺 @rajshetgar
@RobertJBye 1) Make (recent) chat history available offline. 2) Needs a better voice interface (e.g. Grok's is good). 3) Save/export output in multiple formats.
Robert Bye @RobertJBye
We’re making the Claude mobile app even better, so please share your feedback! What annoys you about it? What bugs are you seeing? What features are missing?
Logan Kilpatrick @OfficialLoganK
We just made paying for the Gemini API 10x easier : ) You can now upgrade to a paid Gemini API account without leaving AI Studio, track your usage, filter spend by model, and much more to come!
Raj S 🇦🇺 @rajshetgar
@TheWake @techwith_ram @iPullRank Retrieval alone does not work well, since giving the LLM more options makes things worse. The best approach is to retrieve the top-3 and re-rank to extract the top-1 only; LLMs work best with fewer options. Setting temperature to a slightly low level helps (never the lowest).
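The retrieve-then-rerank flow described above can be sketched as follows. This is a toy illustration, not any real library's API: word-overlap scoring stands in for a real embedding retriever and cross-encoder re-ranker, and all names and documents are made up.

```python
# Toy sketch of "retrieve top-3, re-rank to top-1" so the LLM sees one
# passage instead of many options. Word-overlap scoring stands in for a
# real embedding retriever and cross-encoder re-ranker.

DOCS = [
    "LangGraph is a framework for agent orchestration.",
    "Re-ranking narrows retrieved candidates to the single best passage.",
    "Temperature controls randomness in LLM sampling.",
]

def retrieve(query, docs, k=3):
    """First stage: crude word-overlap score, keep the top-k candidates."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:k]

def rerank_top1(query, candidates):
    """Second stage: re-score with length normalisation and keep only the best."""
    q = set(query.lower().split())
    return max(candidates, key=lambda d: len(q & set(d.lower().split())) / len(d.split()))

query = "how does re-ranking narrow retrieved passages"
top3 = retrieve(query, DOCS)     # candidate pool for the re-ranker
best = rerank_top1(query, top3)  # the single passage handed to the LLM
```

Only `best` would be placed into the LLM prompt, which is the "less options" point made above.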
Raj S 🇦🇺 @rajshetgar
One of the easiest ways to improve the overall output quality of LLMs: do a forward-pass verification of the output, then a reverse-pass verification of the output.
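One way to read the idea above: the forward pass checks that the answer follows from the question, and the reverse pass checks that the answer maps back to the original question. A minimal sketch, assuming `llm` is a stand-in callable (a real implementation would call an actual model; the mock and all names are purely illustrative):

```python
# Minimal sketch of forward-pass + reverse-pass output verification.
# `llm` is a stand-in callable; in practice it would be a real model call.

def verify(llm, question, answer):
    # Forward pass: ask the model to confirm the answer follows from the question.
    forward_ok = llm(f"Does '{answer}' correctly answer '{question}'? yes/no") == "yes"
    # Reverse pass: derive the question back from the answer and check consistency.
    reverse_ok = llm(f"What question does '{answer}' answer?") == question
    return forward_ok and reverse_ok

def mock_llm(prompt):
    # Toy deterministic model for demonstration only.
    if prompt.startswith("Does"):
        return "yes"
    return "What is 2 + 2?"

result_ok = verify(mock_llm, "What is 2 + 2?", "4")     # both passes agree
result_bad = verify(mock_llm, "Who wrote Hamlet?", "4")  # reverse pass catches the mismatch
```

The reverse pass is what catches answers that sound plausible in isolation but do not actually belong to the question asked.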
Raj S 🇦🇺 retweeted
Claude @claudeai
Your work tools are now interactive in Claude. Draft Slack messages, visualize ideas as Figma diagrams, or build and see Asana timelines.
Raj S 🇦🇺 @rajshetgar
Voice agents don’t have to be super intelligent across domains. They have to be intelligent in one particular domain, accurate in their responses, fast, and low-cost.

Two solutions we have now: STT-LLM-TTS vs. speech-to-speech. Speech-to-speech is the better option because of its low latency and lower TCO, but with the available options it’s hard to customise them for domain knowledge (accurate responses). STT-LLM-TTS stands out best so far for accurate responses, as one can customise it using RAG. One challenge is the latency it brings when bringing together the best of the best across STT, LLM & TTS.

It would be great if some STT provider offered an option to embed RAG into their system using something simple (e.g. FAISS). If this were possible, then along with the original STT output we could add the RAG output and send it directly to the LLM, saving around 50ms-100ms in the pipeline. Wonder why STT providers don’t have this feature?

An STT provider could just expose an API endpoint to upload curated domain knowledge in a simple two-column (question, answer) format. When audio is transcribed, the STT can use RAG to match the closest question, pick up the corresponding answer, and package everything. The STT output would then be the original transcript plus, say, the top 3 answers from RAG. All of this goes straight into the LLM, which can pick the best answer quickly and send it to TTS.
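The proposed STT-side Q&A RAG can be sketched roughly like this. This is a hypothetical design, not any provider's real API: a two-column (question, answer) store is matched against the transcript, and the top-k answers are packaged with the transcript for the LLM. Lexical overlap stands in for a real embedding index such as FAISS, and every name and entry here is illustrative.

```python
# Sketch of the proposed STT-side Q&A RAG: match the transcript against a
# two-column (question, answer) store and package the top-k answers with
# the transcript for the LLM. Word overlap stands in for embeddings/FAISS.

QA_STORE = [
    ("what are your opening hours", "We are open 9am to 5pm, Monday to Friday."),
    ("how do i reset my password", "Use the 'Forgot password' link on the login page."),
    ("where is your office located", "Our office is at 1 Example Street, Sydney."),
]

def similarity(a, b):
    """Crude lexical Jaccard similarity; a production system would use embeddings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def stt_with_rag(transcript, store, k=3):
    """Return the transcript plus the top-k closest answers, ready for the LLM."""
    ranked = sorted(store, key=lambda qa: similarity(transcript, qa[0]), reverse=True)
    return {"transcript": transcript, "candidate_answers": [a for _, a in ranked[:k]]}

package = stt_with_rag("hi, how do I reset my password please", QA_STORE)
```

The LLM then only has to pick among `candidate_answers` rather than generate from scratch, which is where the claimed 50ms-100ms saving would come from.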
Karan Goel @krandiash
I personally subscribe to the idea that in the near-term model systems will be built with 2 tiers of models. I like to think of these 2 tiers as whales and dolphins (I'm sure there's a better analogy...).

Whales are giant models that run deep inside the data center. They're slow, use massive compute resources and solve hard problems. They can access and use specialized knowledge and execute long-running workflows.

Dolphins run on the user <-> system surface. Their job is to directly interface with humans, collaborate, strategize, carry context, communicate effectively and generally keep humans happy and satisfied. They are good at summarizing information, they are clever at using tools and harnessing compute-intensive whales to get things done. Dolphins need to be fast, have lower power usage, have the option to run on-device or on edge compute, and otherwise must be capable of being run all the time.

Most of the models we have today are whale-ish. Dolphin models are basically non-existent today (not fast enough, not enough context, use too much energy, not multimodal enough, can't interact very effectively with humans, and small models aren't smart enough).

Dolphins offloading work onto whales is similar to humans offloading reasoning, using tools and databases -- computers, notepads, etc etc.

All of this is about the relative intelligence of these two kinds of models and where they might sit. It would be a mistake to assume that dolphin models will be unintelligent (in an absolute sense); they will be much smarter than today's frontier models. (So yes, voice LMs should know what to say.)

To build a single model that can do the job of both, you would need a very big energy source and some pretty big advancements in accelerators and model architectures so everything can be done on your person. That would also change my thinking about this by a lot (and I would be influenced by some subset of those things happening).

(We're working on the dolphins.)
Vinod Khosla @vkhosla

But the voice LLM still has to call a large LLM to have the intelligence to know what to say.
Raj S 🇦🇺 @rajshetgar
@vkhosla For speech-to-speech models the best approach is fine-tuning, whereas for STT-LLM-TTS models the best approach is RAG. Very few speech-to-speech models are available; the best OSS one is Moshi from Kyutai. PersonaPlex is built on top of Moshi.
Raj S 🇦🇺 @rajshetgar
@rohanpaul_ai @drfeifei Language has condensed intelligence about the world. The problem right now is that we are using language as input into LLMs to extract intelligence; instead of language we need to use sensor data as input, and then for sure we will get an intelligent output. Sensors Is All You Need.
Rohan Paul @rohanpaul_ai
Dr Fei-Fei Li (@drfeifei) on limitations of LLMs. 🎯 "Language is purely generated signal. You don't go out in nature & there's words written in the sky for you. There is a 3D world that follows laws of physics." The world model wave is about to start.
Rohan Paul @rohanpaul_ai

Fei-Fei Li’s World Labs reportedly raising funding at a $5B valuation, per Bloomberg. Some reports saying they will raise up to $500M. If it happens, the round would reprice World Labs from about $1B in 2024, when it raised $230M coming out of stealth.

The bet is that “world models” can generate editable 3D environments that other software can build on, not just flat images or text. Earlier 3D pipelines usually start from hand-built polygon meshes, where scenes are made from lots of tiny triangles and then rendered. World Labs’ Marble uses 3D Gaussian splatting (3DGS), which represents a scene as millions of semi-transparent points that can render with higher visual detail. It also outputs “collider meshes,” which are lower-detail shapes that trade looks for speed in physics and robotics simulation.

Marble’s Chisel tool lets users block out objects from simple shapes and then generate styled variants, which is a step toward controllable world building. World Labs also just opened a World API so developers can generate explorable 3D worlds from text, images, and video inside apps.

bloomberg .com/news/articles/2026-01-23/fei-fei-li-s-ai-startup-world-labs-in-funding-talks-at-5-billion-valuation
Raj S 🇦🇺 @rajshetgar
@deviparikh All the best; distribution is hard. What are the top 3 reasons users are not converting from the free tier to the paid tier?
Devi Parikh @deviparikh
There's something I want to share
Raj S 🇦🇺 @rajshetgar
@Yuchenj_UW He is wrong in that Claude is not single-domain; it’s multi-domain. It uses code and can be applied to any domain. He is correct about large language models: we need a model that understands the physics of the world.
Yuchen Jin @Yuchenj_UW
Yann is actually right and based here.
Gradium @GradiumAI
We made the best voice cloning in the industry, capturing accents, prosody, and identity perfectly. You don’t have to trust us: run the blind test, tweak intensity with classifier-free guidance, and start building with our API today: gradium.ai/blog/voice-clo…
Raj S 🇦🇺 retweeted
Qwen @Alibaba_Qwen
Qwen3-TTS is officially live. We’ve open-sourced the full family (VoiceDesign, CustomVoice, and Base), bringing high quality to the open community.
- 5 models (0.6B & 1.8B)
- Free-form voice design & cloning
- Support for 10 languages
- SOTA 12Hz tokenizer for high compression
- Full fine-tuning support
- SOTA performance
We believe this is arguably the most disruptive release in open-source TTS yet. Go ahead, break it and build something cool. 🚀
Everything is out now: weights, code, and paper. Enjoy. 🧵
Github: github.com/QwenLM/Qwen3-T…
Hugging Face: huggingface.co/collections/Qw…
ModelScope: modelscope.cn/collections/Qw…
Blog: qwen.ai/blog?id=qwen3t…
Paper: github.com/QwenLM/Qwen3-T…
Hugging Face Demo: huggingface.co/spaces/Qwen/Qw…
ModelScope Demo: modelscope.cn/studios/Qwen/Q…
API: alibabacloud.com/help/en/model-…