Pinned Tweet
Daniel Hami
807 posts

Daniel Hami
@daniel__hami
The relationship between humans and technology has always been a question of balance. Technology only becomes truly valuable when it serves human values.
Slovak Republic · Joined December 2017
132 Following · 102 Followers
Daniel Hami retweeted

Can we trust AI agents with critical enterprise tasks? Absolutely not.
Introducing WoW (World of Workflows), the first agentic-safety benchmark, which shows that frontier LLMs fail miserably at enterprise tasks under safety constraints.
🧵 WoW demonstrates that LLM agents are “dynamically blind”. They fail to track the downstream ripple effects of their actions against complex enterprise rule sets. In an enterprise, that’s a safety and compliance hazard.
Our research shows how the future of enterprise AI requires proactive agent architectures and Wow is just a starting point.
📌 It’s now available to all researchers at: github.com/Skyfall-Resear…
Full blog here: skyfall.ai/blog/wow-bridg…

@thomasbail_ Crazy how much changes when you actually start paying attention to the small inputs! This hits hard!!

Want to cut your LLM input costs by 50% and speed up responses?
Use prompt caching (prefix caching): it stores the model’s internal work for the start of your prompt and reuses it when the next request begins with the same tokens.
This can:
Slash latency by up to 80%
Cut input token costs by roughly half (or more) for long prompts
But here’s the trap. The cache only hits if the beginning of the prompt matches exactly (same tokens, spaces, punctuation). If you put dynamic stuff like timestamps or user IDs at the very top, the cache is useless.
Do this instead:
Put large static context / instructions at the start
(e.g. full Hamlet text + analysis rules)
Append the changing question / input at the end
So for repeated questions on Hamlet, always send:
[Hamlet text + instructions] + [specific question]
That way, the big shared prefix gets cached and reused across many calls.
Pro tips:
Kicks in on prompts ≥1,024 tokens (OpenAI)
Cache entries usually expire after 5–10 min of inactivity
For OpenAI, a hash of the first ~256 prompt tokens routes the request to the cache
Structure your prompts wisely, and you’ll pay way less for the same context.
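A minimal sketch of the layout above in code. The Hamlet text and rules are stand-in strings, and the message shape follows the common OpenAI-style chat format; no API call is made here:

```python
# Prefix-first prompt layout: big static context up front, changing input last.

HAMLET_TEXT = "Act I, Scene I. Elsinore. A platform before the castle..."
ANALYSIS_RULES = "Answer questions about Hamlet, citing act and scene."

def build_messages(question: str) -> list:
    """Static context first (the cacheable prefix), dynamic input last."""
    return [
        # Byte-identical across calls, so the provider can reuse cached work.
        {"role": "system", "content": ANALYSIS_RULES + "\n\n" + HAMLET_TEXT},
        # Only this message changes between requests.
        {"role": "user", "content": question},
    ]

m1 = build_messages("Who kills Polonius?")
m2 = build_messages("Where is the play set?")
assert m1[0] == m2[0]  # shared prefix matches exactly, so a cache hit is possible
```

If you instead prepended a timestamp or user ID to the system message, the first assertion would fail, which is exactly why the cache never hits in that layout.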

Daniel Hami retweeted

@daniel__hami Hood morning gang!
Rise n Grind, we gotta eat

What is a parameter in LLMs?
Parameters are the key numbers inside the model that it learns during training to make predictions. They include weights and biases. (If you see, for instance, "20b" after an LLM’s name, as in "gpt-oss:20b", it means the model has approximately 20 billion parameters.)
Weights: Determine how strongly one neuron’s output affects another neuron’s input. Larger weights mean stronger influence, negative weights can invert the effect, and small weights mean little influence.
Biases: Extra values added to the weighted input sum of a neuron that allow the neuron to activate even if the inputs sum to zero.
Intuition:
Imagine you have a straight line: y=ax
This always passes through the origin (0,0).
If you add a bias (offset): y=ax+b
then the line shifts up or down on the coordinate plane. This “shift” is the bias.
The same happens in a neuron: the bias sets where the neuron starts to “activate”, not just how strongly the inputs affect it.
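The same intuition in code: a single toy neuron with made-up (not learned) weights, showing that with all-zero inputs only the bias decides whether it activates:

```python
import math

def neuron(inputs, weights, bias):
    """A single neuron: weighted sum of inputs plus bias, through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-z))

# With all-zero inputs the weights contribute nothing, so the bias alone
# shifts where the neuron starts to activate:
no_bias = neuron([0.0, 0.0], [0.5, -0.3], bias=0.0)   # sigmoid(0) = 0.5
with_bias = neuron([0.0, 0.0], [0.5, -0.3], bias=2.0)  # sigmoid(2) ≈ 0.88
```

The bias plays exactly the role of `b` in `y = ax + b`: it moves the activation threshold without touching how strongly the inputs matter.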


What are transformers and why are they so popular?
Transformers are a kind of neural network architecture that became the standard building block for modern LLMs like GPT, Claude, and Gemini. They are designed to handle sequences (like text) very efficiently and with strong understanding of context.
The core idea: transformers process entire sequences at once, not word by word like older recurrent models. Each token can "look at" every other token simultaneously and decide what's relevant; that's self-attention.
A tiny, intuitive example with a short sentence.
“The cat sat on the mat.”
Tokenized (simplified):
[The] [cat] [sat] [on] [the] [mat]
What self-attention does (idea)
Imagine the model is currently processing the word “sat” and wants to understand it well.
Self‑attention lets “sat” look at all the other words and decide how important each one is:
It might give high attention to “cat” (who sat?) and “mat” (where?).
It might give lower attention to words like “the” or “on” (less important for the meaning of “sat” here).
So for “sat” you could imagine attention weights like:
to “cat” → 0.4
to “mat” → 0.4
to “The / on / the” → together 0.2
Then the model mixes the information from all tokens, weighted by these numbers, to create a new, richer vector for “sat” that already “knows” who sat and where.
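The mixing step above can be sketched in a few lines. The embeddings here are tiny hand-picked 2D vectors (a real model learns separate query/key/value projections and much larger vectors), chosen so content words score higher than function words:

```python
import math

# Illustrative token embeddings, not trained values.
tokens = ["The", "cat", "sat", "on", "the", "mat"]
emb = {
    "The": [0.1, 0.0], "cat": [0.9, 0.2], "sat": [0.8, 0.3],
    "on": [0.0, 0.1], "the": [0.1, 0.0], "mat": [0.7, 0.4],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# "sat" scores every token, and softmax turns the scores into attention weights.
query = emb["sat"]
weights = softmax([dot(query, emb[t]) for t in tokens])

# The new vector for "sat" is the weighted mix of all token vectors.
new_sat = [sum(w * emb[t][d] for w, t in zip(weights, tokens)) for d in range(2)]
```

With these vectors, "cat" and "mat" end up with larger weights than "The" or "on", matching the intuition above: the enriched vector for "sat" already carries who sat and where.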


Which one do you use and for what purpose?
Base Models: Foundational models trained to predict the next token of text given the input. They have no built-in conversation format, so using them for Q&A requires structured prompting (e.g. few-shot examples).
Chat Models: These are fine-tuned from base models, typically with reinforcement learning from human feedback (RLHF), enabling conversational interactions through a structured sequence of user and assistant messages.
Reasoning Models: Designed to analyze problems step-by-step before answering, they incorporate techniques like 'chain of thought' prompting to improve the depth of responses.
Hybrid Models: These combine characteristics of chat and reasoning models, letting the model adjust its reasoning depth to the complexity of the question. Newer models like Gemini 2.5 Pro and GPT-5 exemplify this approach.
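The interface difference between the first two types can be sketched like this. The message shape follows the common OpenAI-style chat format; the prompts are illustrative and no model is actually called:

```python
# A base model sees one raw text stream and simply continues it, so you
# steer it by writing a pattern for it to complete:
base_prompt = (
    "Q: What is the capital of France?\n"
    "A:"  # the base model continues the text from here
)

# A chat model instead consumes a structured list of role-tagged messages:
chat_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

roles = [m["role"] for m in chat_messages]
```

Reasoning and hybrid models keep the chat message structure; what changes is how much hidden step-by-step work the model does before its visible answer.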
