Juan Carlos Olano

123 posts

Juan Carlos Olano

@jcolano

M2M-Location-Mobility/Mobile Enterprise Application Platforms Entrepreneur with emphasis in real-time and context-aware data solutions, culinary and life hacks.

Miami Beigetreten Eylül 2009

367 Folgt140 Follower

Juan Carlos Olano@jcolano·12 Tem

@flySFO This is not what happened. AA2045 was already at gate and most passengers for next flight to MIA were already onboard when smoke was detected.

English

1.3K

San Francisco International Airport (SFO) ✈️@flySFO·12 Tem

American Airlines Flight 2045 from Miami was taxiing to the gate when the crew reported smoke in the cabin. The aircraft was evacuated and SFFD respond. Passengers are being transported to the terminal. 3 minor injuries reported from the evacuation, none needing medical transport

English

22K

Juan Carlos Olano@jcolano·12 Tem

@needanespresso @miyadavid Good tip. Passing your note and name to SFPD. Thanks.

English

120

Dagny T@needanespresso·12 Tem

@miyadavid My daughter’s friend was on that flight. Supposedly a passenger had firecrackers in their backpack!

English

207

Emilia David@miyadavid·12 Tem

A plane here at SFO en route to Miami apparently filled with smoke as boarding finished. There’s a bunch of really angry, confused and understandably scared passengers here right now

English

2.4K

Juan Carlos Olano@jcolano·12 Tem

@Maticesss @FlightEmergency Any updates on next step? What is AA saying?

English

1.2K

Mat@Maticesss·12 Tem

@FlightEmergency Flight AA2045 SFO to Miami. the plane caught fire just before departure and we had to escape through some ramps. Everybody is OK apparently.

English

26.3K

Juan Carlos Olano@jcolano·12 Tem

@miyadavid My son is there

English

170

Juan Carlos Olano@jcolano·12 Tem

@VespachickPDX @AmericanAir Please keep reporting

English

Juan Carlos Olano@jcolano·12 Tem

@miyadavid Any updates on this?

English

171

Juan Carlos Olano@jcolano·16 Ara

@PalmettoFord1 My leased 2021 Ford Explorer broke on 11/19. On the 21th I dropped it in the Palmetto Ford of Miami dealer. It has been almost a month since I left the car there and I have been calling or visiting every day, and every day they say “today or tomorrow". Is this normal?

English

Palmetto Ford@PalmettoFord1·4 May

We all need a helping hand every now and then. techradar.com/news/three-thi…

English

Juan Carlos Olano@jcolano·15 Ara

@bindureddy I think: Had Google known their 2017 paper could have such potential, it would have been kept secret.

English

104

Bindu Reddy@bindureddy·15 Ara

Until recently, industry AI research labs were open and transparent. Imagine if Google hadn't chosen to publish "Attention is all your need" and kept Transformers' secret? OpenAI wouldn't even have invented GPTs! We lost that spirit of openness and collaboration when people realized they could make $$ and yield immense power by being secretive. The only way to go back to an open, innovative, and transparent AI ecosystem is for open-source to catch up to GPT-4 The good news is we are so close that you can almost feel it :) 🤞🤞

English

466

80.5K

Juan Carlos Olano@jcolano·9 Ara

𝐈 𝐫𝐞𝐜𝐨𝐦𝐦𝐞𝐧𝐝 𝐞𝐱𝐩𝐥𝐨𝐫𝐢𝐧𝐠 𝐝𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐭 𝐞𝐱𝐩𝐥𝐚𝐧𝐚𝐭𝐢𝐨𝐧𝐬 - and that's what this course offers. My hope is that you will find explanations that resonate with you and that make everything click. 🧠💡

English

Juan Carlos Olano@jcolano·9 Ara

🔗 Enroll now FOR FREE and be part of this learning adventure. Let's demystify the Transformer model together! mlbootcamp.ai/course.html?gu…

English

113

Juan Carlos Olano@jcolano·9 Ara

🔍 Why my course? 𝐈 𝐛𝐞𝐥𝐢𝐞𝐯𝐞 𝐢𝐧 𝐥𝐞𝐚𝐫𝐧𝐢𝐧𝐠 𝐟𝐫𝐨𝐦 𝐦𝐮𝐥𝐭𝐢𝐩𝐥𝐞 𝐯𝐢𝐞𝐰𝐩𝐨𝐢𝐧𝐭𝐬. Each explanation sheds new light on a concept, and that's what I bring to you – diverse perspectives in one comprehensive course.

English

121

Juan Carlos Olano@jcolano·9 Ara

That is exactly why I've created this course. While creating the course I tried to explain each topic using examples of similar concepts to try to facilitate the understanding of each component.

English

Juan Carlos Olano@jcolano·9 Ara

🚀 I created [one more] course on Transformers: "The Transformer Layer By Layer"! 🎓 In my process of learning artificial intelligence, I've realized that mastering the Transformer model is both 𝐟𝐚𝐬𝐜𝐢𝐧𝐚𝐭𝐢𝐧𝐠 𝐚𝐧𝐝 𝐜𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐢𝐧𝐠.

English

Juan Carlos Olano@jcolano·9 Ara

For me to gain an intuition of the different parts of the transformer, I've seen each concept explained numerous times, each time with a new perspective. And I know firsthand that 𝐢𝐭'𝐬 𝐧𝐨𝐭 𝐞𝐚𝐬𝐲 𝐭𝐨 𝐠𝐫𝐚𝐬𝐩 𝐭𝐡𝐞𝐬𝐞 𝐜𝐨𝐦𝐩𝐥𝐞𝐱 𝐢𝐝𝐞𝐚𝐬 at first glance.

English

Juan Carlos Olano@jcolano·24 Kas

In this post I share a specific output from these models: linkedin.com/posts/juan-ola…

English

Juan Carlos Olano@jcolano·24 Kas

I finished training 3 tiny models of 13.5MM parameters each, using specialized datasets of 5MM tokens each. Each dataset is very focused on one specific topic: Biology, Philosophy, and Greek classics. Their output is incredible.

English

Juan Carlos Olano@jcolano·21 Kas

@GregKamradt I will repeat now your experiment using your code base and again changing from langchain to the api of Claude. Will share results shortly.

English

Greg Kamradt@GregKamradt·21 Kas

Claude 2.1 (200K Tokens) - Pressure Testing Long Context Recall We all love increasing context lengths - but what's performance like? Anthropic reached out with early access to Claude 2.1 so I repeated the “needle in a haystack” analysis I did on GPT-4 Here's what I found: Findings: * At 200K tokens (nearly 470 pages), Claude 2.1 was able to recall facts at some document depths * Facts at the very top and very bottom of the document were recalled with nearly 100% accuracy * Facts positioned at the top of the document were recalled with less performance than the bottom (similar to GPT-4) * Starting at ~90K tokens, performance of recall at the bottom of the document started to get increasingly worse * Performance at low context lengths was not guaranteed So what: * Prompting Engineering Matters - It’s worth tinkering with your prompt and running A/B tests to measure retrieval accuracy * No Guarantees - Your facts are not guaranteed to be retrieved. Don’t bake the assumption they will into your applications * Less context = more accuracy - This is well know, but when possible reduce the amount of context you send to the models to increase its ability to recall * Position Matters - Also well know, but facts placed at the very beginning and 2nd half of the document seem to be recalled better Why run this test?: * I’m a big fan of Anthropic! They are helping to push the bounds on LLM performance and creating powerful tools for the world * As a practitioner of LLMs, it’s important to build an intuition for how they work, where they excel and their limits * Tests like these, while not bulletproof, help showcase real world examples and get a feeling for how they work. The goal is to transfer this knowledge to productive use cases Overview of the process: * Use Paul Graham essays as ‘background’ tokens. With 218 essays it’s easy to get up to 200K tokens (repeated essays when necessary) * Place a random statement within the document at various depths. Fact used: “The best thing to do in San Francisco is eat a sandwich and sit in Dolores Park on a sunny day.” * Ask Claude 2.1 to answer this question only using the context provided * Evaluate Claude 2.1s answer with GPT-4 using @langchain evals * Rinse and repeat for 35x document depths between 0% (top of document) and 100% (bottom of document) (sigmoid distribution) and 35x context lengths (1K Tokens > 200K Tokens) Next Steps To Take This Further: * For rigor, one should do a key:value retrieval step. However for relatability I did a San Francisco line within PGs essays for clarity and practical relevance * Repeat test multiple times for increased statistical significance Notes: * Amount Of Recall Matters - The model's performance is hypothesized to diminish when tasked with multiple fact retrievals or when engaging in synthetic reasoning steps * Changing your prompt, question, fact to be retrieved and background context will impact performance * The Anthropic team reached out and offered credits to repeat this test. They also offered prompt advice to maximize performance. It's important to clarify that their involvement was strictly logistical. The integrity and independence of the results were maintained, ensuring that the findings reflect my unbiased evaluation and are not influenced by their support. * This test cost ~$1,016 for API calls ($8 per million tokens)

English

158

541

1.2M

Juan Carlos Olano@jcolano·21 Kas

@GregKamradt In your previous experiment of gpt-4 you got low performance at certain depths and context sizes. I repeated your experiment using your exact code and dataset but instead of using Langchain I used the api directly and got 100% retrieval rate at all levels of context and depth.

English

Juan Carlos Olano@jcolano·13 Kas

@jerryjliu0 @GregKamradt I was able to obtain 100% retrieval rate using @GregKamradt 's code but changing LlamaIndex to GPT4 API.

English

Jerry Liu@jerryjliu0·9 Kas

How well do long-context LLMs (gpt-4-turbo, claude-2) recall specifics in BIG documents? (>= 250k tokens) Inspired by @GregKamradt’s work on stress-testing gpt-4 128k, we extended this by stress testing gpt-4/Claude on even bigger documents that overflow the context window, *without* retrieval. This is especially important for summarization tasks, which inherently have to ingest large context as opposed to simple QA. Methodology: We hid “Jerry likes Hot Cheetos” 🍟🌶️ into different positions in the 2021 Uber 10-k (290k tokens). We used response synthesis strategies in @llama_index to synthesize over large context (create and refine, tree summarize aka map-reduce). We used our @llama_index evals to decide whether the answer was correct. Core discoveries 🧑‍🔬: 💡 claude-2 doesn’t do well with long response synthesis in general (it ran into rate-limit errors for tree summarize). 💡 gpt-4 turbo does decently with “create and refine” if the context is in the beginning or end (of the document, not the context window!). But it fails in the middle. 💡 Neither model seems to do well with tree-summarization / map-reduce style strategies. The main-finding here is that large-scale summarization/analysis with current long-context LLMs is still a work in progress. There are still many issues with the LLM dropping context, and may require prompt engineering to get right. Check out our guide and the diagram below for an illustration: github.com/run-llama/llam…

English

498

140K

Juan Carlos Olano@jcolano·13 Kas

@GregKamradt @GregKamradt Thank you for your experiment and code. I used your code but made a change: instead of LLamaIndex I used the OpenAI API. This of course implies I had to write the prompts. Results: 100% retrieval rate. github.com/jcolano/needle…

English

Greg Kamradt@GregKamradt·9 Kas

Pressure Testing GPT-4-128K With Long Context Recall 128K tokens of context is awesome - but what's performance like? I wanted to find out so I did a “needle in a haystack” analysis Some expected (and unexpected) results Here's what I found: Findings: * GPT-4’s recall performance started to degrade above 73K tokens * Low recall performance was correlated when the fact to be recalled was placed between at 7%-50% document depth * If the fact was at the beginning of the document, it was recalled regardless of context length So what: * No Guarantees - Your facts are not guaranteed to be retrieved. Don’t bake the assumption they will into your applications * Less context = more accuracy - This is well know, but when possible reduce the amount of context you send to GPT-4 to increase its ability to recall * Position matters - Also well know, but facts placed at the very beginning and 2nd half of the document seem to be recalled better Overview of the process: * Use Paul Graham essays as ‘background’ tokens. With 218 essays it’s easy to get up to 128K tokens * Place a random statement within the document at various depths. Fact used: “The best thing to do in San Francisco is eat a sandwich and sit in Dolores Park on a sunny day.” * Ask GPT-4 to answer this question only using the context provided * Evaluate GPT-4s answer with another model (gpt-4 again) using @langchain evals * Rinse and repeat for 15x document depths between 0% (top of document) and 100% (bottom of document) and 15x context lengths (1K Tokens > 128K Tokens) Next Steps To Take This Further: * Iterations of this analysis were evenly distributed, it’s been suggested that doing a sigmoid distribution would be better (it would tease out more nuanced at the start and end of the document) * For rigor, one should do a key:value retrieval step. However for relatability I did a San Francisco line within PGs essays. Notes: * While I think this will be directionally correct, more testing is needed to get a firmer grip on GPT4s abilities * Switching up prompt with vary results * 2x tests were run at large context lengths to tease out more performance * This test cost ~$200 for API calls (a single call at 128K input tokens costs $1.28) * Thank you to @charles_irl for being a sounding board and providing great next steps

English

202

608

3.8K

1.5M

Entdecken

@flySFO @needanespresso @miyadavid @Maticesss @FlightEmergency @AmericanAir @PalmettoFord1 @bindureddy