Raj S 🇦🇺
@rajshetgar
2.1K posts
building products & tinkering with ideas
Australia · Joined September 2012
3.6K Following · 616 Followers
Raj S 🇦🇺 @rajshetgar
Could be related to geopolitics! China imports roughly 20%-30% of its oil from Iran and Venezuela; the US replacing the rulers in these countries puts the Chinese economy in danger, and China is more worried about its local population than about dealing with external affairs. Open-source AI from China was a threat not only to the US economy but also to US capitalism. Maybe these are backdoor negotiations between China and the US to keep AI closed: Beijing had to make some calls to a few companies and change strategy, or hold off until next time. Having lived in China and North America, I can see how the above is playing out.
Kevin S. Xu @kevinsxu
Watching the Qwen team implode on Twitter is sad to see... Looks like Qwen will go the route of closed models soon. AliCloud has got to make money somehow, I guess... (Worth noting the $BABA earnings date is still not announced, more delayed than usual...)
Raj S 🇦🇺 @rajshetgar
@AnthropicAI is still down! Maybe it's an issue related to eastern regions? Japan, Asia, Oceania? Down for more than an hour!
Raj S 🇦🇺 @rajshetgar
Still down! What's happening?
Raj S 🇦🇺 @rajshetgar
@tonykipkemboi Specialised orchestration (agent frameworks) will always be in demand; generic orchestration frameworks will struggle. One advantage of generic agent frameworks is that they are GenAI-vendor neutral, so we will still have some frameworks like LangGraph around for a long time.
Raj S 🇦🇺 @rajshetgar
@RobertJBye 1) Make (recent) chat history available offline. 2) Needs a better voice interface (e.g. Grok's is good). 3) Save/export output in multiple formats.
Robert Bye @RobertJBye
We’re making the Claude mobile app even better, so please share your feedback! What annoys you about it? What bugs are you seeing? What features are missing?
Logan Kilpatrick @OfficialLoganK
We just made paying for the Gemini API 10x easier : ) You can now upgrade to a paid Gemini API account without leaving AI Studio, track your usage, filter spend by model, and much more to come!
Raj S 🇦🇺 @rajshetgar
@TheWake @techwith_ram @iPullRank Retrieval alone does not work well, since giving the LLM more options makes things worse. The best approach is to retrieve the top-3 and re-rank to extract the top-1 only; LLMs work best with fewer options. Setting temperature to a slightly low level helps (never the lowest).
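The retrieve-then-rerank flow described above can be sketched as follows. This is a toy illustration, not any real library's API: word-overlap scoring stands in for a real embedding retriever and cross-encoder re-ranker, and all names and documents are made up.

```python
# Toy sketch of "retrieve top-3, re-rank to top-1" so the LLM sees one
# passage instead of many options. Word-overlap scoring stands in for a
# real embedding retriever and cross-encoder re-ranker.

DOCS = [
    "LangGraph is a framework for agent orchestration.",
    "Re-ranking narrows retrieved candidates to the single best passage.",
    "Temperature controls randomness in LLM sampling.",
]

def retrieve(query, docs, k=3):
    """First stage: crude word-overlap score, keep the top-k candidates."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:k]

def rerank_top1(query, candidates):
    """Second stage: re-score with length normalisation and keep only the best."""
    q = set(query.lower().split())
    return max(candidates, key=lambda d: len(q & set(d.lower().split())) / len(d.split()))

query = "how does re-ranking narrow retrieved passages"
top3 = retrieve(query, DOCS)     # candidate pool for the re-ranker
best = rerank_top1(query, top3)  # the single passage handed to the LLM
```

Only `best` would be placed into the LLM prompt, which is the "less options" point made above.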
Raj S 🇦🇺 @rajshetgar
One of the easiest ways to improve the overall output quality of LLMs: do a forward-pass verification of the output, then a reverse-pass verification of the output.
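One way to read the idea above: the forward pass checks that the answer follows from the question, and the reverse pass checks that the answer maps back to the original question. A minimal sketch, assuming `llm` is a stand-in callable (a real implementation would call an actual model; the mock and all names are purely illustrative):

```python
# Minimal sketch of forward-pass + reverse-pass output verification.
# `llm` is a stand-in callable; in practice it would be a real model call.

def verify(llm, question, answer):
    # Forward pass: ask the model to confirm the answer follows from the question.
    forward_ok = llm(f"Does '{answer}' correctly answer '{question}'? yes/no") == "yes"
    # Reverse pass: derive the question back from the answer and check consistency.
    reverse_ok = llm(f"What question does '{answer}' answer?") == question
    return forward_ok and reverse_ok

def mock_llm(prompt):
    # Toy deterministic model for demonstration only.
    if prompt.startswith("Does"):
        return "yes"
    return "What is 2 + 2?"

result_ok = verify(mock_llm, "What is 2 + 2?", "4")     # both passes agree
result_bad = verify(mock_llm, "Who wrote Hamlet?", "4")  # reverse pass catches the mismatch
```

The reverse pass is what catches answers that sound plausible in isolation but do not actually belong to the question asked.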
Raj S 🇦🇺 retweeted
Claude @claudeai
Your work tools are now interactive in Claude. Draft Slack messages, visualize ideas as Figma diagrams, or build and see Asana timelines.
Raj S 🇦🇺 @rajshetgar
Voice agents don’t have to be super intelligent across domains. They have to be intelligent in one particular domain, accurate in their responses, fast, and low-cost.

Two solutions we have now: STT-LLM-TTS vs. speech-to-speech. Speech-to-speech is the better option because of its low latency and lower TCO, but with the available options it’s hard to customise them for domain knowledge (accurate responses). STT-LLM-TTS stands out best so far for accurate responses, as one can customise it using RAG. One challenge is the latency it brings when bringing together the best of the best across STT, LLM & TTS.

It would be great if some STT provider offered an option to embed RAG into their system using something simple (e.g. FAISS). If this were possible, then along with the original STT output we could add the RAG output and send it directly to the LLM, saving around 50ms-100ms in the pipeline. Wonder why STT providers don’t have this feature?

An STT provider could just expose an API endpoint to upload curated domain knowledge in a simple two-column (question, answer) format. When audio is transcribed, the STT can use RAG to match the closest question, pick up the corresponding answer, and package everything. The STT output would then be the original transcript plus, say, the top 3 answers from RAG. All of this goes straight into the LLM, which can pick the best answer quickly and send it to TTS.
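The proposed STT-side Q&A RAG can be sketched roughly like this. This is a hypothetical design, not any provider's real API: a two-column (question, answer) store is matched against the transcript, and the top-k answers are packaged with the transcript for the LLM. Lexical overlap stands in for a real embedding index such as FAISS, and every name and entry here is illustrative.

```python
# Sketch of the proposed STT-side Q&A RAG: match the transcript against a
# two-column (question, answer) store and package the top-k answers with
# the transcript for the LLM. Word overlap stands in for embeddings/FAISS.

QA_STORE = [
    ("what are your opening hours", "We are open 9am to 5pm, Monday to Friday."),
    ("how do i reset my password", "Use the 'Forgot password' link on the login page."),
    ("where is your office located", "Our office is at 1 Example Street, Sydney."),
]

def similarity(a, b):
    """Crude lexical Jaccard similarity; a production system would use embeddings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def stt_with_rag(transcript, store, k=3):
    """Return the transcript plus the top-k closest answers, ready for the LLM."""
    ranked = sorted(store, key=lambda qa: similarity(transcript, qa[0]), reverse=True)
    return {"transcript": transcript, "candidate_answers": [a for _, a in ranked[:k]]}

package = stt_with_rag("hi, how do I reset my password please", QA_STORE)
```

The LLM then only has to pick among `candidate_answers` rather than generate from scratch, which is where the claimed 50ms-100ms saving would come from.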
Karan Goel @krandiash
I personally subscribe to the idea that in the near-term model systems will be built with 2 tiers of models. I like to think of these 2 tiers as whales and dolphins (I'm sure there's a better analogy...).

Whales are giant models that run deep inside the data center. They're slow, use massive compute resources and solve hard problems. They can access and use specialized knowledge and execute long-running workflows.

Dolphins run on the user <-> system surface. Their job is to directly interface with humans, collaborate, strategize, carry context, communicate effectively and generally keep humans happy and satisfied. They are good at summarizing information, they are clever at using tools and harnessing compute-intensive whales to get things done. Dolphins need to be fast, have lower power usage, have the option to run on-device or on edge compute, and otherwise must be capable of being run all the time.

Most of the models we have today are whale-ish. Dolphin models are basically non-existent today (not fast enough, not enough context, use too much energy, not multimodal enough, can't interact very effectively with humans, and small models aren't smart enough).

Dolphins offloading work onto whales is similar to humans offloading reasoning, using tools and databases -- computers, notepads, etc etc.

All of this is about the relative intelligence of these two kinds of models and where they might sit. It would be a mistake to assume that dolphin models will be unintelligent (in an absolute sense); they will be much smarter than today's frontier models. (So yes, voice LMs should know what to say.)

To build a single model that can do the job of both, you would need a very big energy source and some pretty big advancements in accelerators and model architectures so everything can be done on your person. That would also change my thinking about this by a lot (and I would be influenced by some subset of those things happening).

(We're working on the dolphins.)
Vinod Khosla @vkhosla

But the voice LLM still has to call a large LLM to have the intelligence to know what to say.
Raj S 🇦🇺 @rajshetgar
@vkhosla For speech-to-speech models the best approach is fine-tuning, whereas for STT-LLM-TTS models the best approach is RAG. Very few speech-to-speech models are available; the best OSS one is Moshi from Kyutai. PersonaPlex is built on top of Moshi.
Raj S 🇦🇺 @rajshetgar
@rohanpaul_ai @drfeifei Language has condensed intelligence about the world. The problem right now is that we are using language as input into LLMs to extract intelligence; instead of language we need to use sensor data as input, and then for sure we will get an intelligent output. Sensors Is All You Need.
Rohan Paul @rohanpaul_ai
Dr Fei-Fei Li (@drfeifei) on limitations of LLMs. 🎯 "Language is purely generated signal. You don't go out in nature & there's words written in the sky for you. There is a 3D world that follows laws of physics." The world model wave is about to start.
Rohan Paul @rohanpaul_ai

Fei-Fei Li’s World Labs reportedly raising funding at a $5B valuation, per Bloomberg. Some reports saying they will raise up to $500M. If it happens, the round would reprice World Labs from about $1B in 2024, when it raised $230M coming out of stealth.

The bet is that “world models” can generate editable 3D environments that other software can build on, not just flat images or text. Earlier 3D pipelines usually start from hand-built polygon meshes, where scenes are made from lots of tiny triangles and then rendered. World Labs’ Marble uses 3D Gaussian splatting (3DGS), which represents a scene as millions of semi-transparent points that can render with higher visual detail. It also outputs “collider meshes,” which are lower-detail shapes that trade looks for speed in physics and robotics simulation.

Marble’s Chisel tool lets users block out objects from simple shapes and then generate styled variants, which is a step toward controllable world building. World Labs also just opened a World API so developers can generate explorable 3D worlds from text, images, and video inside apps.

bloomberg .com/news/articles/2026-01-23/fei-fei-li-s-ai-startup-world-labs-in-funding-talks-at-5-billion-valuation
Raj S 🇦🇺 @rajshetgar
@deviparikh All the best; distribution is hard. What are the top 3 reasons users are not converting from the free tier to the paid tier?
Devi Parikh @deviparikh
There's something I want to share
Raj S 🇦🇺 @rajshetgar
@Yuchenj_UW He is wrong in that Claude is not single-domain; it’s multi-domain. It uses code and can be applied to any domain. He is correct about large language models: we need a model that understands the physics of the world.
Yuchen Jin @Yuchenj_UW
Yann is actually right and based here.
Gradium @GradiumAI
We made the best voice cloning in the industry, capturing accents, prosody, and identity perfectly. You don’t have to trust us: run the blind test, tweak intensity with classifier-free guidance, and start building with our API today: gradium.ai/blog/voice-clo…
Raj S 🇦🇺 retweeted
Qwen @Alibaba_Qwen
Qwen3-TTS is officially live. We’ve open-sourced the full family (VoiceDesign, CustomVoice, and Base), bringing high quality to the open community.
- 5 models (0.6B & 1.8B)
- Free-form voice design & cloning
- Support for 10 languages
- SOTA 12Hz tokenizer for high compression
- Full fine-tuning support
- SOTA performance
We believe this is arguably the most disruptive release in open-source TTS yet. Go ahead, break it and build something cool. 🚀
Everything is out now: weights, code, and paper. Enjoy. 🧵
Github: github.com/QwenLM/Qwen3-T…
Hugging Face: huggingface.co/collections/Qw…
ModelScope: modelscope.cn/collections/Qw…
Blog: qwen.ai/blog?id=qwen3t…
Paper: github.com/QwenLM/Qwen3-T…
Hugging Face Demo: huggingface.co/spaces/Qwen/Qw…
ModelScope Demo: modelscope.cn/studios/Qwen/Q…
API: alibabacloud.com/help/en/model-…