djhardcore

146 posts

@djhardcore007

Voice AI @TechAtBloomberg @NYU_Courant | Advancing real-time TTS/ASR, emotional agents, and open-source speech tech

Joined January 2020
1.9K Following · 173 Followers
Pinned Tweet
djhardcore@djhardcore007·
Most voice assistants remain half-duplex: rigid, turn-based, inefficient. Full-duplex ASR enables simultaneous listening and speaking—structurally superior for natural interaction. Core concept explained. #VoiceAI #ASR
djhardcore@djhardcore007·
Seeing k-dense.ai mass-produce ML research scared the shit out of me. I do think that even though AI can write and conduct research far better than most human researchers, research still matters—because reality still resists us. Books still matter because people still want compressed wisdom, perspective, and meaning. So the question becomes less “Why write?” and more “What can I produce that is not just more text?”
djhardcore@djhardcore007·
Real-time Voice UX Stack
• VAD (voice activity detection)
• Streaming ASR
• Turn classifier
• Interrupt handler
• TTS with low startup latency
• Audio buffering control
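To make the first and fourth layers of the stack concrete, here is a minimal sketch of an energy-based VAD plus a barge-in check. It is an illustration only: the function names, the RMS threshold, the hangover length, and the consecutive-frame rule are all assumptions, and production systems typically use a trained VAD model rather than raw energy.

```python
import math
from typing import Iterable, List


def frame_rms(frame: List[float]) -> float:
    """Root-mean-square energy of one frame (samples assumed in [-1, 1])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def energy_vad(frames: Iterable[List[float]],
               threshold: float = 0.02,
               hangover: int = 2) -> List[bool]:
    """Label each frame as speech (True) or silence (False).

    A frame counts as speech when its RMS exceeds `threshold` (hypothetical
    value). `hangover` keeps the speech label for a few frames after the
    energy drops, so short pauses do not end the segment.
    """
    labels: List[bool] = []
    remaining = 0
    for frame in frames:
        if frame_rms(frame) > threshold:
            remaining = hangover
            labels.append(True)
        elif remaining > 0:
            remaining -= 1
            labels.append(True)
        else:
            labels.append(False)
    return labels


def should_barge_in(labels: List[bool],
                    tts_playing: bool,
                    min_speech_frames: int = 3) -> bool:
    """Interrupt TTS only after enough consecutive speech frames."""
    if not tts_playing:
        return False
    run = 0
    for is_speech in labels:
        run = run + 1 if is_speech else 0
        if run >= min_speech_frames:
            return True
    return False
```

Requiring several consecutive speech frames before interrupting is one simple way to avoid cutting off TTS on a cough or a click; a turn classifier would refine that decision.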
djhardcore@djhardcore007·
How to reduce LLM inference cost:

Prompting
- Use code (tools / skills) whenever possible
- Add a router: default to small, cheap models (e.g. Qwen)
- Cache prompts, results, and RAG outputs
- Keep context short; use summaries
- Limit retries and agent loops

If you self-host LLMs
- Batch inference and constrain decoding
- Quantize models when possible
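The router and cache ideas can be sketched in a few lines. This is a rough illustration, not a real client: `CachedRouter`, the `is_hard` heuristic, and the model callables are hypothetical stand-ins for whatever small and large models you actually call.

```python
from typing import Callable, Dict

# A "model" here is any callable mapping a prompt to a completion (assumption).
Model = Callable[[str], str]


class CachedRouter:
    """Default to the cheap model, escalate hard prompts, and cache answers."""

    def __init__(self, small: Model, large: Model,
                 is_hard: Callable[[str], bool]) -> None:
        self.small = small
        self.large = large
        self.is_hard = is_hard
        self.cache: Dict[str, str] = {}
        self.calls = {"small": 0, "large": 0, "cache": 0}

    def complete(self, prompt: str) -> str:
        # Normalize whitespace so trivially different prompts share a key.
        key = " ".join(prompt.split())
        if key in self.cache:
            self.calls["cache"] += 1
            return self.cache[key]
        if self.is_hard(prompt):
            self.calls["large"] += 1
            answer = self.large(prompt)
        else:
            self.calls["small"] += 1
            answer = self.small(prompt)
        self.cache[key] = answer
        return answer
```

An exact-match cache like this only helps with repeated prompts. The `calls` counter makes it easy to see how often the expensive model is actually used.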
djhardcore@djhardcore007·
@openclaw OS-level agency is the future!
1. persistent local memory, not sessions.
2. apps stop being destinations and become workflows executing intent. Agents abstract tools and optimize execution.
3. financial access must be guard-railed via limited, programmable wallets.
djhardcore@djhardcore007·
Vertical AI / physical AI companies are, in practice, data companies.
Packy McCormick@packyM

Investors are betting billions of dollars that robotics will experience a Giant Leap. Meaning: robots are not useful today, but throw enough GPUs, models, data, and PhDs at the problem, and you’ll cross some threshold on the other side of which you will meet robots that can walk into any room and do whatever they’re told.

The Giant Leap view is sexy. It holds the promise of a totally unbounded market – labor today is a ~$25 trillion market, constrained by the cost and unreliability of humans; if robots become cheap, general, and autonomous, the argument goes that you get Jevons Paradox for labor - available to whichever team of geniuses in a garage produces the big breakthrough first. This is the type of innovation that Silicon Valley loves. Brilliant minds love opportunities where success is just a brilliant idea away.

My friend @evanbeard is betting that progress will happen by climbing the gradient of variability. That robotics will progress towards general usefulness in small steps. The logic is clear:
- Robotics is bottlenecked on data.
- The best data is the data your robots collect actually doing things.
- The best strategy, then, even if it's not the sexiest, is to get paid to collect that data, learn, and iterate.

This is where the vast majority of value lies, and the real path to our abundant robotic future.

For the first co-written essay in not boring world, Evan and I write about the robots.

djhardcore@djhardcore007·
Vibe coding feels amazing right up until you run into maintenance issues. Bills come due.
djhardcore@djhardcore007·
Questioning is thinking. People don’t question because they are lost. People who treat questioning as an attack do so because their certainty is fragile.
djhardcore@djhardcore007·
For a specific domain like finance, perception matters:
1. Build a hard set (numbers, tickers, acronyms, names)
2. Synthesize all samples
3. Run ASR → WER (overall + hard set)
4. Compute Entity Accuracy (non-negotiable)
5. Human MOS + trust check
6. Verify RTF, latency, failure rate
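Steps 3 and 4 can be sketched as follows. This is one reasonable reading, not a standard definition: WER is computed as word-level edit distance over the reference length, and entity accuracy is assumed to mean the fraction of hard-set entities that appear verbatim (as whole tokens) in the ASR transcript of the synthesized audio. The function names and the lowercase/whitespace normalization are illustrative choices.

```python
from typing import List


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Rolling-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def _contains_phrase(tokens: List[str], phrase: List[str]) -> bool:
    """True if `phrase` occurs as a contiguous run of whole tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def entity_accuracy(entities: List[str], hypothesis: str) -> float:
    """Fraction of hard-set entities recovered intact in the transcript."""
    if not entities:
        raise ValueError("entities must not be empty")
    tokens = hypothesis.lower().split()
    hits = sum(_contains_phrase(tokens, e.lower().split()) for e in entities)
    return hits / len(entities)
```

A transcript can score well on overall WER while still missing the one ticker that matters, which is why the two numbers are reported separately.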
djhardcore@djhardcore007·
Questions to answer:
1. does it sound human?
2. can users trust what they hear?
3. accent/voice/style: does it sound like the right person?
4. prosody diagnostics. mainly for debugging.
5. latency. shipping constraint.
djhardcore@djhardcore007·
Modern TTS is neural sequence modeling: pronunciation, timing, prosody, voice identity. Evaluation = perception + correctness + deployability.

Minimal scorecard that matters:
1. MOS
2. WER (overall + hard set)
3. Entity accuracy
4. RTF + latency
5. Failure rate
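The deployability rows of the scorecard (RTF, latency, failure rate) reduce to simple arithmetic over logged runs. The sketch below assumes RTF means synthesis time divided by the duration of audio produced, and it reports median time-to-first-audio as the latency figure. The `Run` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Run:
    synth_seconds: float        # wall-clock time spent synthesizing
    audio_seconds: float        # duration of the audio produced
    first_audio_seconds: float  # time until the first audio chunk
    failed: bool = False        # crash, timeout, or unusable output


def summarize(runs: List[Run]) -> Dict[str, Optional[float]]:
    """Aggregate RTF, median startup latency, and failure rate."""
    if not runs:
        raise ValueError("runs must not be empty")
    ok = [r for r in runs if not r.failed]
    failure_rate = (len(runs) - len(ok)) / len(runs)
    total_audio = sum(r.audio_seconds for r in ok)
    # RTF < 1 means synthesis runs faster than real time.
    rtf = sum(r.synth_seconds for r in ok) / total_audio if total_audio else None
    latency = median(r.first_audio_seconds for r in ok) if ok else None
    return {"rtf": rtf, "median_latency": latency, "failure_rate": failure_rate}
```

Aggregating RTF over total time and total audio, rather than averaging per-utterance ratios, keeps very short utterances from dominating the number.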
djhardcore@djhardcore007·
5/ Combining Skills with MCP? MCP grants standardized real-time access (repos, Slack, databases). Skills dictate precise logic. Integrated: agents process live data via controlled, repeatable frameworks. This is the optimal config. #MCP #ClaudeSkills #AgenticAI
djhardcore@djhardcore007·
4/ Skills vs. Tool Calls
- Tool calls retrieve or act on real-time data.
- Skills provide orchestration: deciding when, why, and how those tools are used.
Tools handle execution. Skills handle control flow. Production agents need both.
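The execution/control-flow split can be illustrated with a plain-Python analogy. This is a conceptual sketch only: it is not the Agent Skills file format or any real SDK, and every name in it (`get_price`, `send_alert`, `price_alert_skill`, the stub prices) is hypothetical.

```python
from typing import Dict, List

# Stub data standing in for a live source (assumption for illustration).
_PRICES: Dict[str, float] = {"AAPL": 190.0, "MSFT": 410.0}


# Tools: thin functions that retrieve or act on data.
def get_price(ticker: str) -> float:
    return _PRICES[ticker]


def send_alert(message: str, outbox: List[str]) -> None:
    outbox.append(message)


# "Skill": a fixed, repeatable procedure that decides when and why
# the tools above are called.
def price_alert_skill(ticker: str, threshold: float, outbox: List[str]) -> bool:
    price = get_price(ticker)          # always look up the price first
    if price > threshold:              # act only when the rule is met
        send_alert(f"{ticker} above {threshold}: {price}", outbox)
        return True
    return False
```

The tools know nothing about thresholds or ordering; that logic lives entirely in the orchestration layer, which is the division of labor the tweet describes.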
djhardcore@djhardcore007·
Built @claudeai Agent Skills hands-on. My notes:
– What Agent Skills are
– How to build them
– Skills vs. prompting
– Skills vs. tool calls
– Using Skills with MCP
#ClaudeSkills #AIAgents #BuildAI