Roger Oriol

100 posts

Roger Oriol

@rogiia

Katılım Ocak 2016

74 Takip Edilen34 Takipçiler

Roger Oriol@rogiia·6d

Build a Basic AI Agent From Scratch ruxu.dev/articles/ai/bu…

English

Roger Oriol retweetledi

Andrej Karpathy@karpathy·7 Mar

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then: - the human iterates on the prompt (.md) - the AI agent iterates on the training code (.py) The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. github.com/karpathy/autor… Part code, part sci-fi, and a pinch of psychosis :)

English

1.1K

3.6K

28.4K

11M

Roger Oriol@rogiia·16 Nis

In Full-auto mode, Codex can not only read and write files, but also run shell commands in an environment confined around the current directory and with network disabled. In the future, you will be able to whitelist some shell commands to run with network enabled.

English

Roger Oriol@rogiia·16 Nis

It can use multimodal input, allows sandboxing your development environment to secure your computer and also allows the use of context files, ~/.codex/instructions.md for global instructions for Codex and ./codex.md in the project root for project-specific context.

English

Roger Oriol@rogiia·16 Nis

github.com/openai/codex Along with the launch of o3 and o4-mini, OpenAI has released a coding assitant for the terminal: Codex. You can use it to create new projects, make changes to existing projects or ask the model to explain code to you, all in the terminal.

English

Roger Oriol@rogiia·14 Nis

In the benchmarks, GPT-4.1 easily beats GPT-4.5 at a lower price and higher speed. For this reason, OpenAI has said they will be deprecating GPT-4.5 in 3 months time.

English

Roger Oriol@rogiia·14 Nis

After the unimpressive release of GPT-4.5 a month and a half ago, OpenAI is now releasing a new version - backwards. Today, they released three new models, exclusive to the API: GPT-4.1, GPT-4.1 mini and GPT-4.1 nano. openai.com/index/gpt-4-1/

English

Roger Oriol@rogiia·8 Nis

This is inevitable, all of them want to win the AI race at any cost. If you don't want to be fooled by ever-slightly-increasing benchmarks, you should set up your own benchmarks that measure their performance on your own use cases.

English

Roger Oriol@rogiia·8 Nis

We now have acknowledgement from LMArena of what we already knew: AI labs are cheating to get their models as high as possible in the LMArena leaderboard / benchmarks.

Arena.ai@arena

We've seen questions from the community about the latest release of Llama-4 on Arena. To ensure full transparency, we're releasing 2,000+ head-to-head battle results for public review. This includes user prompts, model responses, and user preferences. (link in next tweet) Early analysis shows style and model response tone was an important factor (demonstrated in style control ranking), and we are conducting a deeper analysis to understand more! (Emoji control? 🤔) In addition, we're also adding the HF version of Llama-4-Maverick to Arena, with leaderboard results published shortly. Meta’s interpretation of our policy did not match what we expect from model providers. Meta should have made it clearer that “Llama-4-Maverick-03-26-Experimental” was a customized model to optimize for human preference. As a result of that we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn’t occur in the future.

English

Roger Oriol retweetledi

Jerry Liu@jerryjliu0·5 Nis

In case you missed it, we launched 3 HUGE updates this week to @llama_index to help make it the most advanced, versatile multi-agent framework. 1. Jinja-style prompts - our new RichPromptTemplate lets you build dynamic prompts instead of hacking together the prompt f-strings and trying to intersperse it with outer logic. 2. Full multimodal support for agents - pass a chat message with interleaving text and images into a multi-agent system. 3. CodeAct agent - the next step beyond chain-of-thought is learning to execute code. Either use our pre-built agent out-of-the-box or learn to build it from scratch. Big shoutout to @LoganMarkewich and @masci for this. If you’re looking to build agents, here’s an entire stack of resources to help you get started: New Prompt Docs: docs.llamaindex.ai/en/latest/modu… docs.llamaindex.ai/en/latest/modu… docs.llamaindex.ai/en/latest/exam… CodeActAgent: docs.llamaindex.ai/en/latest/exam… CodeActAgent from scratch: docs.llamaindex.ai/en/latest/exam… Multimodal Agents: #multi-modal-agents" target="_blank" rel="nofollow noopener">docs.llamaindex.ai/en/stable/modu…

English

4.8K

Roger Oriol@rogiia·6 Nis

Overall, these look like very capable frontier models that can compete with OpenAI, Anthropic and Google while at the same time being open-source, which is a huge win. Check out Meta's post on the models' architecture and benchmarks. ai.meta.com/blog/llama-4-m…

English

Roger Oriol@rogiia·6 Nis

Llama 4 Reasoning We have no details on what it's going to be, just the announcement that it's coming soon. llama.com/llama4-reasoni…

English

Roger Oriol@rogiia·6 Nis

Meta has finally released the Llama 4 family of models that Zuckerberg hyped up so much. They are open-source, multimodal, MoE models. First impression, these models are massive. None of these will be able to run in the average computer with a decent GPU or Mac Mini. Let's see:

English

Roger Oriol@rogiia·30 Mar

Very interesting papers from Antropic-affiliated scientists where they investigate how human concepts are smeared across a model's neurons and how these features affect model outputs and allow the model to plan ahead its outputs: transformer-circuits.pub/2025/attributi… + transformer-circuits.pub/2025/attributi…

English

Roger Oriol@rogiia·22 Mar

The Anthropic team has discovered an interesting approach to LLM thinking capabilities. Instead of making the model think deeply before answering or taking an action, they experimented with giving the model a think tool: anthropic.com/engineering/cl…

English

Roger Oriol@rogiia·22 Mar

Here's my article on the set of best practices that I follow when I write my web pages: ruxu.dev/articles/web/h…

English

Roger Oriol@rogiia·22 Mar

How to Write a Good index.html File Every web developer has been there: you're starting a new project and staring at an empty index.html file. You try to remember, which tags were meant to go in the head again? Which meta tags are best practice and which ones are deprecated?

English

Keşfet

@llama_index @LoganMarkewich @masci @elonmusk @BarackObama @taylorswift13 @cristiano @BillGates