ExtensityAI

22 posts

@ExtensityAI

Joined February 2023
75 Following · 103 Followers
ExtensityAI retweeted
Leo 🏴‍☠️ (@xm4ch1ne)
I've managed to converge on a fairly consistent and highly rewarding way of interacting with the main models. Related to search, I dropped Perplexity a few months ago. Nowadays I only use Perplexity for their curated news section, which I read every day. What made the switch for me was o3-mini. That's how I've done search ever since, and I found that Perplexity didn't provide even diminishing returns. I find it funny that my brain still defaults to "@p" when I'm browsing, which is my shortcut for querying Perplexity. Now, with o4-mini, the search experience is simply incredible.

In terms of ideation and conceptual exploration, o3 is a beast. I could chat with it my entire day, digging deeper down threads of argumentation. As always, *I* am the guardrail in this process. I also really like how they implemented the image analysis feature; I find something quite satisfying and appealing in watching it "focus" on images. I posted before that I recommend that everyone who travels go into an antique shop and explore the objects with o3. It's a fascinating experience.

In terms of writing, Claude 3.7 is very good. I've also had a nice time with GPT 4.1 for generating reports, but I'm not yet sure it has the same quality as Claude when it comes to writing in general.

I've also spent quite some time with DeepSeek R1, and I can't wait for R2. It's still one of my favorite models, and I consult it all the time during ideation. Its reasoning trace is like an oasis in the desert. It's a very clever model. It's somewhat unusable in production settings since it's rather slow, but for critically exploring topics it is an absolute go-to.

Regarding code, even though I was quite bullish on o4-mini, it's certainly lacking something that o3-mini had. o4-mini (high) is very good at laying out the strategy for the code that needs to be written. So my current workflow involves asking o4-mini (high) about my task first, then using Sonnet 3.7 ET for the actual implementation. Sonnet 3.7 ET is a very good engineer.
Just don't use o3 for coding; I'm starting to think that's NOT what it was made for. Also be careful with o3 in general (and I think o4-mini as well): o3 replaces common Unicode characters, so something still looks like a dash (-) but it's not the common dash. Good thread here: x.com/KaixuanHuang1/… Regarding Gemini 2.5, I've tested it briefly and I don't get the hype. Where Gemini has always excelled, in my opinion, is long-context-window tasks and transcriptions. I've never used Grok other than through X's integration. I guess it's fine, but I generally don't care about it.
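The o3 caveat above can be checked mechanically. Below is a minimal Python sketch, not taken from the linked thread, that flags dash characters other than the ASCII hyphen-minus and, as an added assumption, space characters other than the ASCII space (non-breaking spaces are another common lookalike). It relies only on the standard `unicodedata` module; `find_lookalikes` and `normalize_lookalikes` are hypothetical helper names.

```python
import unicodedata
from typing import List, Tuple


def find_lookalikes(text: str) -> List[Tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for non-ASCII dashes and spaces."""
    hits = []
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        # "Pd" = dash punctuation (includes ASCII '-'), "Zs" = space separators (includes ' ').
        if (cat == "Pd" and ch != "-") or (cat == "Zs" and ch != " "):
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits


def normalize_lookalikes(text: str) -> str:
    """Map every dash-punctuation char to '-' and every space separator to ' '."""
    out = []
    for ch in text:
        cat = unicodedata.category(ch)
        if cat == "Pd":
            out.append("-")
        elif cat == "Zs":
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)


sample = "o3\u2011mini\u00a0output"  # non-breaking hyphen + no-break space
print(find_lookalikes(sample))
# [(2, 'U+2011', 'NON-BREAKING HYPHEN'), (7, 'U+00A0', 'NO-BREAK SPACE')]
print(normalize_lookalikes(sample))
# o3-mini output
```

Mapping every `Pd` character to `-` is deliberately blunt: it also rewrites legitimate en dashes in prose, so reviewing the `find_lookalikes` report before normalizing is probably the safer workflow.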
ExtensityAI retweeted
Marius-Constantin Dinu (@DinuMariusC)
🧠➕🔢 Research automation in action! We generated a new paper on a primality test using circulant matrix eigenvalue structures, in just 24 hours. From math concept → formal paper → working implementation, our neurosymbolic platform compressed what would typically take months.
ExtensityAI retweeted
Sabine Hossenfelder (@skdh)
I genuinely don't understand why some people are still bullish about LLMs. I use GPT, Grok, Gemini, Mistral, etc. every day in the hope they'll save me time searching for information and summarizing it. They continue to fabricate links, references, and quotes, like they did from day one. I ask them to give me a source for an alleged quote, I click on the link, and it returns a 404 error. I Google the alleged quote; it doesn't exist. They reference a scientific publication; I look it up; it doesn't exist. Happens all the time.

Yes, it has gotten somewhat better in the past 2 years, in that with DeepSearch and chains of thought about 50-60% or so of the references exist. By my personal estimate, GPT 4o DeepResearch is currently the best one. Grok in particular often doesn't include references even if asked. It can't seem to link even to tweets. It's hugely frustrating. Yes, I have tried Gemini, and it was actually even worse in that it frequently refuses to even search for a source and instead gives me instructions for how to do it myself. I stopped using it for that reason.

I also use them for quick order-of-magnitude estimates, and they get them wrong all the time. One thing they do save me time with is unit conversion and collecting all kinds of constants. You'd think, though, that this shouldn't take a 100 million++ LLM to get done.

Yesterday I uploaded a paper to GPT to ask it to write a summary, and it told me the paper is from 2023, when the header of the PDF clearly says it's from 2025. I don't even know what the heck is going on there, but intelligence ain't it.

I sense that a lot of people now think knowledge graphs will fix the LLM issue, but no, they will not. They cannot. Even if knowledge graphs prevented logical inconsistency 100%, there are a lot of text constructions that are perfectly logically consistent but have zero relation to reality.
Companies will keep pumping up LLMs until the day a newcomer puts forward a different type of AI model that will swiftly outperform them. On that day, it will become apparent that a lot of companies have been hugely overvalued. It will be a very bad day for the stock market.
ExtensityAI retweeted
Marius-Constantin Dinu (@DinuMariusC)
📢 Exciting developments in bridging the gap between machine learning research and production: we introduce PyFlow.ts! 🚀 PyFlow.ts is a great tool that tackles the 'last mile' problem in ML deployment by enabling seamless integration of Python machine learning models with TypeScript frontends. 🐍🔧 It simplifies the traditionally daunting tasks of API development, type safety, and client code generation with a streamlined approach that uses a single decorator to connect Python models directly to TypeScript applications. 🌐📜 One standout feature of PyFlow.ts is its ability to automate the integration of Python ML code with FastAPI and TypeScript, handling serialization and type conversion automatically. This tool is particularly beneficial for small development teams lacking specialized knowledge of API and frontend development. 💡🛠️ PyFlow.ts can be used in real-world applications such as computer vision, NLP, and financial analysis, offering a much-needed low-code solution for fast, efficient ML model deployment. 📈🤖 For those interested in exploring the intricacies of PyFlow.ts further, read more in the article here: [PyFlow.ts Full Article](extensity.ai/blog/pyflow-ts…) 📚🔍 Learn more about how TypeScript can power AI and ML with this real-world use case: [TypeScript in AI/ML](medium.com/@RC.Adhikari/typescript-supports-ai-and-ml-a-real-world-usecase-042914cd6d96) ✨💻 Check out the detailed implementation and case study in our GitHub repository: [GitHub PyFlow.ts Repo](github.com/ExtensityAI/Py…) 📄🔗 #MachineLearning #ArtificialIntelligence #Development #Innovation #Python #TypeScript #FastAPI #MLOps
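To illustrate the "single decorator plus automatic type conversion" idea, here is a minimal Python sketch of how Python type hints could be mapped to a TypeScript client signature. It is hypothetical: `expose`, `ts_type`, `ts_signature`, and `REGISTRY` are illustrative names, not the PyFlow.ts API, and only a few primitive, list, and dict types are covered. Per the post, PyFlow.ts additionally wires up the FastAPI endpoint and serialization, which this sketch omits.

```python
import inspect
from typing import Any, Callable, Dict, List, get_args, get_origin, get_type_hints

# Hypothetical registry of exposed functions; NOT the PyFlow.ts API.
REGISTRY: Dict[str, Callable[..., Any]] = {}

# Primitive Python types and their TypeScript counterparts (illustrative subset).
_PY_TO_TS = {int: "number", float: "number", str: "string", bool: "boolean", type(None): "null"}


def ts_type(tp: Any) -> str:
    """Translate a small subset of Python type hints into TypeScript type syntax."""
    if tp in _PY_TO_TS:
        return _PY_TO_TS[tp]
    origin = get_origin(tp)
    if origin is list:
        (inner,) = get_args(tp)
        return f"{ts_type(inner)}[]"
    if origin is dict:
        key, value = get_args(tp)
        return f"Record<{ts_type(key)}, {ts_type(value)}>"
    return "unknown"  # anything else (or a missing hint) falls back to `unknown`


def expose(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator: register fn so a later codegen step can find it."""
    REGISTRY[fn.__name__] = fn
    return fn


def ts_signature(fn: Callable[..., Any]) -> str:
    """Build a TypeScript declaration for an async client stub of fn."""
    hints = get_type_hints(fn)
    params = ", ".join(
        f"{name}: {ts_type(hints.get(name))}" for name in inspect.signature(fn).parameters
    )
    ret = ts_type(hints.get("return", type(None)))
    return f"function {fn.__name__}({params}): Promise<{ret}>;"


@expose
def classify(pixels: List[float], top_k: int) -> Dict[str, float]:
    """Toy model endpoint; returns fixed scores."""
    return {"cat": 0.9}


print(ts_signature(classify))
# function classify(pixels: number[], top_k: number): Promise<Record<string, number>>;
```

The client stub returns a `Promise` because any call from a TypeScript frontend to a Python backend would cross the network asynchronously; a real generator would also emit the request and response (de)serialization around that call.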
ExtensityAI retweeted
Marius-Constantin Dinu (@DinuMariusC)
Thrilled to announce the completion of my PhD on "Parameter Choice and Neuro-Symbolic Approaches for Deep Domain-Invariant Learning". Immense gratitude to my supervisor @HochreiterSepp, co-supervisor Werner Zellinger, and advisor @GaryMarcus for their invaluable guidance. My research explores broad AI systems that generalize across diverse tasks and adapt to new scenarios, focusing on approaches with and without gradient-based updates. It tackles the challenge of creating AI that is robust to distribution shifts and capable in unseen domains. The thesis analyzes domain adaptation (DA) and neuro-symbolic (NeSy) approaches, with an emphasis on large language models and deep domain-invariant learning. First, I explore advanced DA techniques and model-selection methods for situations where parameter updates are feasible. Second, I focus on bridging symbolic and sub-symbolic AI paradigms through NeSy approaches, and establish a framework for scalable, generalizable broad AI systems across various problem settings that does not require parameter updates and enables next-generation agentic workflows. The research contributes to more adaptable AI approaches for real-world applications, paving the way for systems that truly understand context and adapt rapidly to new scenarios. By incorporating agentic workflows, it opens up new possibilities for AI systems to autonomously navigate complex, multistep tasks across diverse domains. Profound thanks to my supervisors, advisors, and all those who supported this journey. Excited to see the impact of this work on AI's future! Full thesis: arxiv.org/abs/2410.06235 #AI #MachineLearning #NeuroSymbolicAI #GenerativeAI
ExtensityAI retweeted
Benedikt Alkin (@benediktalkin)
Excited to introduce Vision-LSTM (ViL): a new backbone for vision built on the xLSTM architecture. ViL creates patch tokens from an image and processes them with alternating bi-directional mLSTM blocks, where odd blocks process the sequence from the opposite direction. 🧵
Johannes Brandstetter (@jo_brandstetter)

Introducing Vision-LSTM - making xLSTM read images 🧠 It works ... pretty, pretty well 🚀🚀 But convince yourself :) We are happy to share code already! 📜: arxiv.org/abs/2406.04303 🖥️: nx-ai.github.io/vision-lstm/ All credit to my stellar PhD student @benediktalkin
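The alternating-direction scheme described in the ViL post can be sketched in plain Python. This is a toy under stated assumptions: a single-channel image, non-overlapping square patches in row-major order, and a causal running mean standing in for an mLSTM block (it is not an mLSTM; it only makes the processing direction visible). Blocks are counted from 0 here, so blocks 1, 3, 5, ... see the reversed sequence; the post does not say which counting convention the real model uses. All function names are hypothetical.

```python
from typing import Callable, List

Token = List[float]
Block = Callable[[List[Token]], List[Token]]


def patchify(image: List[List[float]], p: int) -> List[Token]:
    """Split an H x W single-channel image into non-overlapping p x p patches (row-major)."""
    h, w = len(image), len(image[0])
    if h % p or w % p:
        raise ValueError("image size must be divisible by the patch size")
    tokens: List[Token] = []
    for r in range(0, h, p):
        for c in range(0, w, p):
            # Flatten each patch into one token vector.
            tokens.append([image[r + i][c + j] for i in range(p) for j in range(p)])
    return tokens


def causal_mean_block(seq: List[Token]) -> List[Token]:
    """Stand-in for an mLSTM block: token t becomes the mean of tokens 0..t."""
    out: List[Token] = []
    running: List[float] = []
    for t, tok in enumerate(seq):
        running = list(tok) if t == 0 else [a + b for a, b in zip(running, tok)]
        out.append([v / (t + 1) for v in running])
    return out


def alternating_stack(tokens: List[Token], blocks: List[Block]) -> List[Token]:
    """Apply blocks in order; odd-indexed blocks read the sequence back to front."""
    seq = list(tokens)
    for k, block in enumerate(blocks):
        if k % 2 == 1:
            # Reverse, process, then restore the original token order.
            seq = block(seq[::-1])[::-1]
        else:
            seq = block(seq)
    return seq


img = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
print(patchify(img, 2))
# [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
print(alternating_stack([[1.0], [2.0], [3.0], [4.0]], [causal_mean_block, causal_mean_block]))
# [[1.75], [2.0], [2.25], [2.5]]
```

Because a reversed block follows a forward one, every token depends on every other token after just two blocks, even though each individual block only looks in one direction; that is the motivation for alternating.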

ExtensityAI retweeted
Sepp Hochreiter (@HochreiterSepp)
New exciting research by @DinuMariusC with @ajayp95 (U of Pennsylvania) and @ExtensityAI. We show LLM self-improvement with synthetic data for web agent tasks on WebArena, and introduce an extended VERTEX score for measuring the trajectory quality of agent workflows.
Marius-Constantin Dinu (@DinuMariusC)

Excited to present our work “Large Language Models Can Self-Improve At Web Agent Tasks”. We show that synthetic data self-improvement boosts task completion by 31% on WebArena and introduce quality metrics for measuring autonomous agent workflows. #AI #MachineLearning #LLMs [1/n]

ExtensityAI retweeted
Leo 🏴‍☠️ (@xm4ch1ne)
Proudly introducing the SymbolicAI framework in a formal light! This work is the culmination of countless hours of research, crossing out many phrases, writing and rewriting pages to find the right wording. Let's talk about the vision underlying SymbolicAI. 1/n
Marius-Constantin Dinu (@DinuMariusC)

🚀 SymbolicAI – a framework for logic-based approaches combining generative models and solvers. Alongside, we introduce a benchmark and empirical measure to evaluate SOTA LLMs in AI-centric workflows. Read more in our paper arxiv.org/abs/2402.00854 #MachineLearning 🧠💡[1/n]

ExtensityAI (@ExtensityAI)
We are very proud to announce that, after working on this project for over a year, our team has released the first version of our new benchmark, which defines a new baseline for AI-centric workflows with LLMs at their core. Find out more at extensity.ai
Marius-Constantin Dinu (@DinuMariusC)

🚀 SymbolicAI – a framework for logic-based approaches combining generative models and solvers. Alongside, we introduce a benchmark and empirical measure to evaluate SOTA LLMs in AI-centric workflows. Read more in our paper arxiv.org/abs/2402.00854 #MachineLearning 🧠💡[1/n]

ExtensityAI retweeted
Marius-Constantin Dinu (@DinuMariusC)
I am sure this is part of it... github.com/ExtensityAI/sy… ExtensityAI is a non-profit, and its core mission is to democratize what OpenAI is not telling you anymore. See also this Reddit thread on the fine-tuned query models: reddit.com/r/ChatGPT/comm… Support us in our mission. :)
Carlos E. Perez (@IntuitMachine)

1/n Breaking News! OpenAI has uncovered an emergent new cognitive capability, yet nobody is demanding answers! We are distracted by OpenAI governance politics and not the real issue!!!
