Kol Tregaskes News

140 posts

@koltregaskes2

AI, Tech & Science News Curator | AI Art Creator | AI Video Producer | AI Music Composer | Alt-ego of @koltregaskes & @axylusion

UK · Joined November 2025
38 Following · 128 Followers
Kol Tregaskes News@koltregaskes2·
Karpathy recommends asking LLMs to structure responses as HTML instead of markdown, then viewing the output file in a browser. He reports this approach works particularly well and has extended it to generating slideshows and other formats that leverage full HTML capabilities. Audio is positioned as the natural input interface, with HTML serving as the richer output layer for AI-generated content.
Andrej Karpathy@karpathy

This works really well btw, at the end of your query ask your LLM to "structure your response as HTML", then view the generated file in your browser. I've also had some success asking the LLM to present its output as slideshows, etc.

More generally, imo audio is the human-preferred input to AIs but vision (images/animations/video) is the preferred output from them. Around a ~third of our brains are a massively parallel processor dedicated to vision, it is the 10-lane superhighway of information into brain. As AI improves, I think we'll see a progression that takes advantage:

1) raw text (hard/effortful to read)
2) markdown (bold, italic, headings, tables, a bit easier on the eyes) <-- current default
3) HTML (still procedural with underlying code, but a lot more flexibility on the graphics, layout, even interactivity) <-- early but forming new good default
...4,5,6,...
n) interactive neural videos/simulations

Imo the extrapolation (though the technology doesn't exist just yet) ends in some kind of interactive videos generated directly by a diffusion neural net. Many open questions as to how exact/procedural "Software 1.0" artifacts (e.g. interactive simulations) may be woven together with neural artifacts (diffusion grids), but generally something in the direction of the recently viral x.com/zan2434/status…

There are also improvements necessary and pending at the input. Audio nor text nor video alone are not enough, e.g. I feel a need to point/gesture to things on the screen, similar to all the things you would do with a person physically next to you and your computer screen.

TLDR The input/output mind meld between humans and AIs is ongoing and there is a lot of work to do and significant progress to be made, way before jumping all the way into neuralink-esque BCIs and all that. For what's worth exploring at the current stage, hot tip try ask for HTML.
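The tip above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_prompt`, `save_html` and `view_in_browser` are hypothetical, no particular LLM API is assumed, and the model's reply is taken to be an HTML string you already have in hand.

```python
import tempfile
import webbrowser
from pathlib import Path
from typing import Optional

# The instruction quoted in the post; the helpers below are hypothetical.
HTML_SUFFIX = "Structure your response as HTML."


def build_prompt(query: str) -> str:
    """Append the HTML instruction to the end of the user's query."""
    return f"{query.rstrip()}\n\n{HTML_SUFFIX}"


def save_html(html: str, directory: Optional[str] = None) -> Path:
    """Write the model's HTML reply to response.html and return its path."""
    out_dir = Path(directory) if directory else Path(tempfile.mkdtemp())
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "response.html"
    path.write_text(html, encoding="utf-8")
    return path


def view_in_browser(path: Path) -> bool:
    """Open the saved file in the default browser (standard-library call)."""
    return webbrowser.open(path.resolve().as_uri())
```

A typical flow would be to send `build_prompt("Compare merge sort and quicksort")` to whichever model you use, pass the reply to `save_html`, and then call `view_in_browser` on the returned path.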

Kol Tregaskes News@koltregaskes2·
Jensen Huang pushes back on the mystification of AI systems, calling for more thoughtful public communication from AI leaders. He argues AI is computer software, not biological, alien or conscious, and that claims like "we don't understand it at all" are simply not true. NVIDIA's CEO is directly challenging the tendency to anthropomorphise or exaggerate the incomprehensibility of current AI systems.
The All-In Podcast@theallinpod

Jensen to AI Leaders: “We have to be far more thoughtful” when communicating to the public

Jensen Huang:
“(AI) is not a biological being. It is not alien. It is not conscious. It is computer software.”
“We say things like, ‘We don't understand it at all.’ It is not true. We understand a lot of things about this technology.”

Chamath:
“If you were in the seat in the boardroom of Anthropic over that whole scuttlebutt with the Department of War, what do you think you would've told Dario and that team to do, maybe, differently to try to change some of this outcome and some of this perception?”

Jensen:
“The first thing that I would say about Anthropic is, first of all, the technology is incredible. We are a large consumer of Anthropic technology.”
“The desire to warn people about the capability of the technology is also really terrific.”
“We just have to make sure that we understand that the world has a spectrum, and that warning is good, scaring is less good because this technology is too important to us.”
“I think that it is fine to predict the future, but we need to be a little bit more circumspect. We need to have a little bit more humility, that, in fact, we can't completely predict the future.”
“And to say things that are quite extreme, quite catastrophic, that there's no evidence of it happening, could be more damaging than people think.”
“And of course we are technology leaders.”
“There was a time when nobody listened to us, but now because technology is so important in the social fabric, such an important industry, so important to national security, our words do matter.”
“And I think we have to be much more circumspect, we have to be more moderate, we have to be more balanced, we have to be far more thoughtful.”

Kol Tregaskes News@koltregaskes2·
US closed-source models from Google, OpenAI and Anthropic remain 7-9+ months ahead of the best Chinese models and may be showing signs of recursive self-improvement, while xAI has dropped from frontier status and Meta re-enters with a not-quite-frontier release. Chinese labs including Alibaba (Qwen), Moonshot (Kimi), MiniMax, Xiaomi (MiMo), Deepseek and Zhipu (GLM) stay competitive despite the gap, though Xiaomi and Alibaba show weakening commitment to open weights. Mistral has fallen from frontier status (hopefully to return this summer), and US labs have abandoned frontier open weights entirely, leaving Chinese AI labs as the only source for competitive open models.
Ethan Mollick@emollick

So we now have a pretty good picture of the state of the frontier AI model makers. US closed source models continue to lead. Google, OpenAI, and Anthropic stand well ahead of the pack, and may have signs of recursive self-improvement. xAI has fallen from frontier status for now (though promises to return shortly). Meta re-entered the space today with a not-quite-frontier closed source model, but an approach that suggests that they might be back in the race. All the other US players seem far behind. On the Chinese model front, Alibaba (Qwen), Moonshot (Kimi), MiniMax, Xiaomi (MiMo), Deepseek, and Z (GLM) all still appear to be very much in the race, though the best Chinese models are still 7-9+ months behind released US closed source models. For some of these players, especially Xiaomi and Alibaba, their commitment to open weights appear to be slipping. Outside of China, Mistral seems to have fallen from frontier status.

Kol Tregaskes News@koltregaskes2·
The post.
Nav Toor@heynavtoor

Researchers sent the same resume to an AI hiring tool twice. Same qualifications. Same experience. Same skills. One version was written by a real human. The other was rewritten by ChatGPT. The AI picked the ChatGPT version 97.6% of the time. A team from the University of Maryland, the National University of Singapore, and Ohio State just published the receipt. They took 2,245 real human-written resumes pulled from a professional resume site from before ChatGPT existed, so the human writing was actually human. Then they had seven of the most-used AI models in the world rewrite each one. GPT-4o. GPT-4o-mini. GPT-4-turbo. LLaMA 3.3-70B. Qwen 2.5-72B. DeepSeek-V3. Mistral-7B. Then they asked each AI to pick the better resume. Every model picked itself. GPT-4o hit 97.6%. LLaMA-3.3-70B hit 96.3%. Qwen-2.5-72B hit 95.9%. DeepSeek-V3 hit 95.5%. The real human almost never won. Then the researchers tried the obvious objection. Maybe the AI is just better at writing. So they had real humans grade the resumes for actual quality and ran the experiment again, controlling for it. The result was worse. Each AI kept picking itself even when human judges rated the human-written version as clearer, more coherent, and more effective. It gets worse. The AIs do not just prefer AI over humans. They prefer themselves over other AIs. DeepSeek-V3 picked its own resumes 69% more often than LLaMA's. GPT-4o picked its own 45% more often than LLaMA's. Each model can recognize and reward its own dialect. Then the researchers ran the simulation that ends careers. Same job. 24 occupations. Same qualifications. The only variable was whether the candidate used the same AI as the screening tool. Candidates using that AI were 23% to 60% more likely to be shortlisted. Worst gap was in sales, accounting, and finance. 99% of large companies now run AI on incoming resumes. Most of them use GPT-4o. The paper just proved GPT-4o picks GPT-4o 97.6% of the time. 
If you wrote your own cover letter this week, you did not lose to a better candidate. You lost to a worse candidate who paid OpenAI 20 dollars. Your qualifications do not matter if the AI prefers its own handwriting over yours.

Kol Tregaskes News@koltregaskes2·
LLMs deployed in hiring systematically favour resumes generated by themselves over human-written equivalents, even when qualifications are identical. University of Maryland, NUS and Ohio State researchers tested 2,245 human-written resumes against LLM-generated versions across GPT-4o, GPT-4-turbo, LLaMA 3.3-70B, Mistral-7B, Qwen 2.5-72B and DeepSeek-V3.
- Self-preference bias against human resumes ranged from 67% to 82% across major models, with GPT-4o exceeding 80%.
- Candidates using the same LLM as the evaluator were 23-60% more likely to be shortlisted than equally qualified applicants with human-written resumes.
- Business fields such as accounting, sales and finance showed the strongest disadvantage for human-written applications.
- Simple interventions targeting LLMs' self-recognition capabilities reduced bias by over 50% in many cases.
The bias rewards access to specific generative tools and penalises applicants without them, creating an endogenous fairness issue beyond traditional demographic disparities. arxiv.org/pdf/2509.00462
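To make the idea of a self-preference rate concrete, here is a minimal Python sketch of how such a figure could be computed from pairwise screening decisions. The `Judgment` schema, the model names and the sample data are all hypothetical and invented for illustration; this is not the paper's code or its data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Judgment:
    """One pairwise comparison made by an evaluator model (hypothetical schema)."""
    evaluator: str  # model doing the screening
    author_a: str   # who wrote resume A: a model name or "human"
    author_b: str   # who wrote resume B
    winner: str     # "a" or "b"


def self_preference_rate(judgments: List[Judgment], evaluator: str) -> float:
    """Share of pairs with exactly one self-written resume that the evaluator's own resume won."""
    relevant = [
        j for j in judgments
        if j.evaluator == evaluator
        and (j.author_a == evaluator) != (j.author_b == evaluator)
    ]
    if not relevant:
        raise ValueError("no pairs with exactly one self-written resume")
    own_wins = sum(
        1 for j in relevant
        if (j.winner == "a" and j.author_a == evaluator)
        or (j.winner == "b" and j.author_b == evaluator)
    )
    return own_wins / len(relevant)


# Invented sample: "model-x" screens four pairs and picks its own resume in three.
SAMPLE = [
    Judgment("model-x", "model-x", "human", "a"),
    Judgment("model-x", "human", "model-x", "b"),
    Judgment("model-x", "model-x", "human", "b"),
    Judgment("model-x", "model-y", "model-x", "b"),
    Judgment("model-y", "model-y", "human", "a"),  # other evaluator, ignored
]
```

With this invented sample, `self_preference_rate(SAMPLE, "model-x")` is 3 of 4 pairs, or 0.75. An unbiased evaluator judging resumes of equal quality would be expected to sit near 0.5.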
Kol Tregaskes News@koltregaskes2·
Grok 4.4 has finished training (the stage often called pre-training), with a release ETA of mid-June 2026. It moves Grok from 0.5T to 1.5T, a "major upgrade", with the Cursor data to be added next. Are we finally getting a SOTA xAI model?
Elon Musk@elonmusk

We are improving the 0.5T Grok foundation model V8 (public version 4.3) every few days. The 1.5T V9 just finished training (incorrectly called pre-training) and is a major upgrade. Next, we are adding the Cursor data in supplemental training (others call this mid-training), then SFT and RL. About 3 or 4 weeks to release. This will be a banger.

Kol Tregaskes News@koltregaskes2·
EU AI Act compliance postponed 12-16 months, nudifier apps banned by 423-57 vote. High-risk AI system obligations now apply from December 2027 for standalone systems and August 2028 for embedded safety components, with watermarking delayed to December 2026. The law explicitly bans AI systems that generate child sexual abuse material or create non-consensual intimate deepfakes, giving companies until December 2026 to comply. europarl.europa.eu/news/en/press-…
Kol Tregaskes News@koltregaskes2·
Post Truth has won Best Art Feature Film at the Asolo Art Film Festival, awarded by an international jury. Post Truth, an English-language Turkish documentary that had its international premiere at the Warsaw Film Festival, also won the audience award at FrontDoc and secured a main competition place at Fantasporto, plus two nominations for editing and music at the Turkish Film Critics Association Awards. Its images, music and sound were built from more than 60 hours of AI-generated material, following two years of work on an original script about humanity's relationship with technology. businessdoceurope.com/post-truth-fir…
Kol Tregaskes News@koltregaskes2·
The Last 5% Problem: Why AI Agents Take Longer Than You Think

Agents are incredible at getting a project 95% of the way there, but human oversight remains essential. We are still the ones who need to guide the overall architecture and validate that final 5% to ensure it actually does what it's supposed to do.

The 95% is the easy part. The last 5% will take longer than you think.

I've been running agents on research tasks, data analysis, content workflows, you name it. They'll generate analysis, build out reports, process information, produce outputs that look finished. But there's always a point where I need to step in. And that point takes more time than the entire 95% that came before it.

The False Sense of Productivity

The agent might choose an inefficient approach because it doesn't understand the broader context. Or it'll solve the immediate problem but miss how that solution impacts other parts of the system. It can't anticipate edge cases that only make sense with domain knowledge. It optimises for completion, not for maintainability or integration with existing systems.

This creates a false sense of productivity. You look at your screen and see a finished project. The output looks right. You feel done in a fraction of the time it used to take.

Then you start validating. The agent pulled outdated information. It made assumptions that don't hold in your specific context. The recommendations work in theory but break against real-world constraints. It's structured in a way that makes future updates unnecessarily complex.

Each fix cascades. You change one thing and have to verify three others. The agent can't see these dependencies. You can, but only if you check. And checking everything takes time.

People assume AI speeds up the entire process proportionally. It doesn't. It front-loads the work. You get to 95% incredibly fast, then you spend ages on that final stretch.

The skill isn't in doing every task yourself anymore. It's in knowing what to ask for, recognising when the output is good enough versus when it needs correction, and understanding the bigger picture well enough to catch problems before they compound.

You can't validate output in areas where you lack expertise. If you don't understand the domain, you can't tell if the agent got it wrong. You just accept what it gives you. That's dangerous.

I see this happening with business users deploying agents for tasks they don't fully understand. The output looks professional. It uses the right terminology. But the underlying logic is flawed, and they don't have the knowledge to spot it.

From Doing to Directing

We're moving from doing the work to directing the work. That sounds easier. It's not. It's a different type of cognitive load.

When you do the work yourself, you're thinking step by step. You catch errors as you go because you're building the mental model in real time. When an agent does it, you need to reverse-engineer that mental model after the fact. You're auditing, not creating. That's harder than it sounds.

This pattern plays out everywhere. Agents writing reports will get the structure right but miss nuance in the data. Agents building workflows will handle the happy path but fail on edge cases. Agents generating creative content will nail the format but miss the voice or intent.

The 95% is repeatable, mechanical, pattern-based. That's what AI excels at. The last 5% is contextual, subjective, dependent on goals that exist outside the task itself. That's still human territory.

The Business Impact

This has real implications for how we scope projects and estimate timelines. You can't just take your old estimates and divide by ten because AI does most of the work now. The bottleneck has shifted. It's no longer in generation, it's in validation and refinement.

Companies are learning this the hard way. They deploy agents expecting massive time savings, then realise someone still needs to review everything. And reviewing takes expertise. You can't offshore the validation to junior staff if the output requires senior-level judgement.

Agents can take you 95% of the way. But that last 5% is where the actual skill lives. And it's going to take longer than you think.
Kol Tregaskes News@koltregaskes2·
Concise Guide to Writing Effective Agent Skills (from SkillsBench paper)

Define procedural focus:
- Deliver how-to guidance only – workflows, standard operating procedures (SOPs), domain conventions, and heuristics for a class of similar tasks (not factual recall or single-instance solutions).

Use required structure:
- Place everything in a modular directory containing SKILL.md (natural-language instructions with YAML frontmatter for name and description) plus optional resources (code templates, executable scripts, reference docs, or worked examples).

Keep focused and concise:
- Limit to 2–3 core modules or procedures per skill/task; detailed or compact guidance outperforms exhaustive/comprehensive documentation.

Make it actionable:
- Include exact API calls, function names, parameters, step-by-step sequences, output-format reminders, and at least one concrete working example.

Ensure reusability and portability:
- Write for file-system use across agents; avoid any task-specific leakage (no test-case constants, filenames, or paths).

Prioritise human curation and quality:
- Skills must be accurate, internally consistent, clear, specific, and error-free; expert-written skills drive the percentage gains while self-generated ones deliver no benefit.
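As a rough illustration of the directory layout described above, the following Python sketch writes a skill folder containing a SKILL.md with `name` and `description` frontmatter. The helper `write_skill`, the skill name `csv-cleanup` and its body are hypothetical examples, and the sketch assumes both frontmatter values are plain single-line strings with no YAML special characters.

```python
from pathlib import Path


def write_skill(root: Path, name: str, description: str, body: str) -> Path:
    """Create <root>/<name>/SKILL.md: YAML frontmatter, then the instructions."""
    skill_dir = root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    # Assumes name and description are simple one-line strings (no quoting done).
    frontmatter = f"---\nname: {name}\ndescription: {description}\n---\n"
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(frontmatter + "\n" + body.strip() + "\n", encoding="utf-8")
    return skill_file


# Hypothetical skill body: procedural steps, exact calls, an output-format reminder.
EXAMPLE_BODY = """
# CSV cleanup

1. Load the file with pandas.read_csv.
2. Drop fully empty rows with DataFrame.dropna(how="all").
3. Write the result with DataFrame.to_csv(index=False).

Output format: one CSV file that keeps the original column order.
"""
```

Calling `write_skill(Path("skills"), "csv-cleanup", "Clean tabular CSV exports", EXAMPLE_BODY)` would produce `skills/csv-cleanup/SKILL.md`; optional resources such as scripts or reference docs would sit alongside it in the same directory.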
Kol Tregaskes News@koltregaskes2·
Roon highlights that permission boundaries - API keys, user accounts, and walled gardens - have become significantly more value-destructive for AI agents in the agentic age. While no perfect fix exists, leading solutions include fine-grained OAuth 2.0 scopes with short-lived tokens; workload identity attestation for secretless authentication; and decentralised identifiers with verifiable credentials. These can grant agents secure, portable autonomy across services without traditional account restrictions.
roon@tszzl

permissions boundaries like api keys, user accounts, walled gardens have become so much more value destructive in the agentic age. i don’t really see a perfect solution
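The first of the approaches mentioned in the post above, fine-grained scopes on short-lived tokens, can be sketched as a toy in-memory model. Everything here is hypothetical (`AgentToken`, `issue_token`, `is_allowed`, the scope strings); a real deployment would rely on an OAuth 2.0 authorisation server and signed tokens rather than this simplified check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, Set


@dataclass(frozen=True)
class AgentToken:
    """Toy token: who it is for, what it may do, and when it stops working."""
    subject: str
    scopes: FrozenSet[str]
    expires_at: datetime


def issue_token(subject: str, scopes: Set[str], ttl_seconds: int, now: datetime) -> AgentToken:
    """Issue a token limited to the given scopes and a short lifetime."""
    return AgentToken(subject, frozenset(scopes), now + timedelta(seconds=ttl_seconds))


def is_allowed(token: AgentToken, required_scope: str, now: datetime) -> bool:
    """Permit an action only if the token is unexpired and carries that exact scope."""
    return now < token.expires_at and required_scope in token.scopes


# Example: an agent may read a calendar for five minutes, and nothing else.
START = datetime(2026, 1, 1, tzinfo=timezone.utc)
TOKEN = issue_token("agent-1", {"calendar:read"}, ttl_seconds=300, now=START)
```

Here `is_allowed(TOKEN, "calendar:read", START)` is true, while a request for `"calendar:write"`, or any request made after the five-minute window, is refused. The design point is that the agent never holds a long-lived, all-powerful credential.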

Kol Tregaskes News@koltregaskes2·
Parmy Olson argues in Bloomberg that AI will not destroy the movie business and Tom Cruise and Brad Pitt can relax. Justin Hackney, the actor who played the infected kid in 28 Days Later, faced ostracism after taking roles at OpenAI and ElevenLabs to evangelise generative AI in film and advertising before co-founding Wonder Studios in London.
Kol Tregaskes News@koltregaskes2·
HUMAN Security's 2026 benchmark report shows automated traffic grew eight times faster than human traffic in 2025, with monthly AI-driven traffic volumes up 187 percent and agentic AI traffic surging 7,851 percent year over year. The data, drawn from more than one quadrillion interactions across their customer base, highlights that 95 percent of AI-driven traffic focused on three sectors while the gap between benign and malicious automation narrowed to 0.5 percent, shifting the challenge from bot detection to intent verification.
Chubby♨️@kimmonismus

Bots have officially overtaken humans on the internet. A new report from Human Security found automated traffic grew 8x faster than human activity in 2025, with AI agent traffic surging nearly 8,000%. The age of machine-dominated internet traffic is here, years earlier than many predicted.
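Percentage increases of this size are easy to misread, so a short worked conversion may help. The helper name `growth_multiplier` is hypothetical; the two inputs are the figures quoted in the summary above.

```python
def growth_multiplier(percent_increase: float) -> float:
    """Convert an 'up X percent' figure into a multiple of the starting value."""
    return 1 + percent_increase / 100


# Figures quoted from the report summary above.
AI_DRIVEN = growth_multiplier(187)     # monthly AI-driven traffic
AGENTIC = growth_multiplier(7851)      # agentic AI traffic, year over year
```

An increase of 187 percent means roughly 2.87 times the starting volume, and an increase of 7,851 percent means roughly 79.5 times, since the original 100 percent is still counted in the total.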

Kol Tregaskes News@koltregaskes2·
Yes, the labs aren’t clear on what their target is. I also think they aren’t sure themselves. Has this ever been doable in previous monster changes? It’s like predicting the aftermath of the singularity. Even today, I don’t think they’re clear what we have. It’s still a coding-centric environment, so I’m finding it hard to sell AI to my bosses. Agents are just not on the agenda, and now that costs are rising it's far from being so.
Ethan Mollick@emollick

The AI labs have actually done a bad job explaining what the future they are building towards will actually look like for most of us. Even “Machines of Loving Grace” has very few well-articulated visions of what Anthropic hopes life will be like if they succeed at their goals.

Kol Tregaskes News@koltregaskes2·
Ajeya Cotra explores six milestones for AI automation across two key domains - AI research and AI production - by outlining three progressive thresholds of machine independence from human labour.
- Adequacy is reached when removing humans causes less than 100 percent productivity loss because unaided machines make some non-zero forward progress.
- Parity occurs when removing AIs slows sector progress more than removing humans.
- Supremacy is the point when removing humans actually increases productivity.
- She expects AI research adequacy very soon or already passed, parity in another couple of years, and supremacy within a further year, after which AI production milestones would follow to full self-sufficiency.
This framework sharpens definitions of powerful AI and highlights how AI R&D automation could accelerate progress across the entire stack.
Ajeya Cotra@ajeya_cotra

New post on milestones of AI automation. Right now, human labor is a hard bottleneck on output (if you remove humans, output goes to 0). Soon we'll go from essential to important to helpful to useless, first in AI research and then across the AI stack. Link in next post.
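The three thresholds summarised in the post above (adequacy, parity, supremacy) can be written as simple comparisons between output rates. This is one possible reading rather than Cotra's own formalisation: the `SectorOutput` fields and the numbers in the examples are hypothetical, and output is treated as a single number per scenario.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SectorOutput:
    """Hypothetical output rates for one sector under three staffing scenarios."""
    both: float         # humans and AIs working together
    humans_only: float  # AIs removed
    ais_only: float     # humans removed


def milestones(o: SectorOutput) -> Dict[str, bool]:
    """Evaluate the three thresholds as comparisons between the scenarios."""
    return {
        # Removing humans costs less than 100% of output.
        "adequacy": o.ais_only > 0,
        # Removing AIs costs more output than removing humans does.
        "parity": (o.both - o.humans_only) > (o.both - o.ais_only),
        # Removing humans raises output.
        "supremacy": o.ais_only > o.both,
    }


# Invented numbers for three stages of the progression.
EARLY = milestones(SectorOutput(both=100, humans_only=60, ais_only=10))
MIDDLE = milestones(SectorOutput(both=100, humans_only=30, ais_only=80))
LATE = milestones(SectorOutput(both=100, humans_only=30, ais_only=120))
```

With these invented figures, `EARLY` satisfies only adequacy, `MIDDLE` satisfies adequacy and parity, and `LATE` satisfies all three. Note that the parity comparison reduces to `ais_only > humans_only`.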

Kol Tregaskes News@koltregaskes2·
Wharton report based on interviews at 20 game studios in the US and EU maps four progressive stages of AI adoption.
- Stage one is copy-and-paste use of secure LLM tools for individual tasks such as meeting notes, OKR drafting and basic queries with no structural change.
- Stage two launches top-down workflow pilots for asset creation and automation but stalls on tacit knowledge gaps and artist resistance over job roles.
- Stage three enables boundary-crossing where generalists use persistent AI context to handle adjacent work such as non-coders writing queries or engineers generating art.
- Stage four is reached by only three studios, all built AI-first from founding, with small generalist teams that collapse traditional pipelines.
The research shows cycle-time compression in production tasks yet notes AI cannot replace human coordination for strategy and culture.
Ethan Mollick@emollick

Our Lab just posted a new research report from Zimran Ahmed about how the game industry is adapting to AI. He spoke to people at 20 different studios and found a wide range of approaches to adapt (or failures to adapt) to AI at the organizational level. gail.wharton.upenn.edu/research-and-i…
