AIMultiple

168 posts

@AIMultiple

We provide transparent, in-depth, data-driven AI industry analysis to help businesses explore AI, machine learning and other emerging technology use cases

San Francisco · Joined December 2018
95 Following · 711 Followers
AIMultiple reposted
Cem Dilmegani
Cem Dilmegani@dilmegani·
AI making our lives easier
Cem Dilmegani tweet media
0
1
3
108
AIMultiple reposted
Cem Dilmegani
Cem Dilmegani@dilmegani·
It reminds me of Grok 4, which aced every good benchmark with a holdout dataset, such as LiveCodeBench. AI influencers were impressed. Then we tested it and were disappointed. In most cases, as in the hallucination benchmark below, it failed to reach the top position.
Cem Dilmegani tweet media
2
1
2
254
AIMultiple reposted
Cem Dilmegani
Cem Dilmegani@dilmegani·
ChatGPT's new agent almost broke our benchmark. We'll soon need a harder test. This benchmark is not based on a public dataset included in the training data of OpenAI's models. While we explain the task clearly, the data is not public, so models do not have access to it.
Cem Dilmegani tweet media
1
1
1
325
AIMultiple
AIMultiple@AIMultiple·
This benchmark was run on a holdout set. We published 1 example question, but the rest of the 100 questions are not public. Therefore, models can't just respond with answers memorized from their training set.
1
0
0
192
AIMultiple
AIMultiple@AIMultiple·
We are introducing AI LMC-Eval, a coding benchmark with 100 questions, tested on 7 leading LLMs. LMC stands for Logic / Math Coding. We presented each LLM with high-school-level logic and math problems and instructed it to write Python to solve them.
AIMultiple tweet media
2
0
2
330
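The LMC-Eval setup described above (a model writes Python, and the program's answer is checked) could be scored roughly as follows. This is a minimal sketch, not AIMultiple's actual harness: the names `run_candidate` and `score`, the exact-match rule on printed output, and the sample question are all assumptions made for illustration.

```python
import subprocess
import sys


def run_candidate(source: str, timeout_s: float = 5.0) -> str:
    """Run model-written Python in a separate interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.stdout.strip()


def score(candidates: dict[str, str], expected: dict[str, str]) -> float:
    """Fraction of questions whose program printed exactly the expected answer."""
    correct = 0
    for qid, answer in expected.items():
        try:
            output = run_candidate(candidates.get(qid, ""))
        except subprocess.TimeoutExpired:
            output = ""  # a program that hangs counts as a wrong answer
        if output == answer:
            correct += 1
    return correct / len(expected)


# Hypothetical question, not taken from the benchmark:
# "What is the sum of the integers from 1 to 100?"
candidates = {"q1": "print(sum(range(1, 101)))"}
expected = {"q1": "5050"}
accuracy = score(candidates, expected)
```

A separate interpreter process only isolates crashes and timeouts; it is not a security sandbox, so untrusted model output would need stronger isolation in practice.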
AIMultiple
AIMultiple@AIMultiple·
Agentic AI is still mostly hype. We asked 5 AI agents to fetch the prices of a specific product from original sources and got only 20% of the results. Should we try this with other agents? research.aimultiple.com/ai-agents/
AIMultiple tweet media
0
1
5
398
AIMultiple
AIMultiple@AIMultiple·
Web scraping enables businesses to get a bulk list of their target audience’s email addresses. It reduces the human error that comes with manually entering email addresses into a database and accelerates marketing processes. To learn more, read our comprehensive article. research.aimultiple.com/email-scraping/
0
1
3
0
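As a rough illustration of the extraction step mentioned in the email scraping post, the sketch below pulls addresses out of already-downloaded text and removes duplicates. It is not taken from the linked article: the function name `extract_emails` is hypothetical, and the regular expression is a deliberately simple assumption that does not cover every valid address format.

```python
import re

# Simplified pattern: local part, "@", domain labels, and a final label of 2+ letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list[str]:
    """Return unique, lowercased email addresses in order of first appearance."""
    seen = set()
    ordered = []
    for match in EMAIL_RE.findall(text):
        address = match.lower()
        if address not in seen:
            seen.add(address)
            ordered.append(address)
    return ordered


# Made-up sample text using reserved example domains.
sample = "Contact sales@example.com or Support@Example.org. Again: sales@example.com."
emails = extract_emails(sample)
```

Lowercasing before deduplication is a design choice that treats `Support@Example.org` and `support@example.org` as the same entry, which matches how most mail systems behave.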
AIMultiple
AIMultiple@AIMultiple·
Psychological factors such as users’ sentiments regarding policy changes or new investments greatly influence how stock prices change. In this article, we’ll explore how sentiment analysis can be applied to stock market forecasts. #StockMarkets research.aimultiple.com/sentiment-anal…
0
1
2
0
AIMultiple
AIMultiple@AIMultiple·
IoT enables a myriad of different business applications. Knowing those IoT use cases can help businesses integrate IoT technologies into their investment decisions. That is why we created the most comprehensive list of IoT use cases in industries. #IoT research.aimultiple.com/iot-applicatio…
0
0
2
0
AIMultiple
AIMultiple@AIMultiple·
AI presents both opportunities for cybersecurity professionals to improve their cyber defenses and new threats, as cyber attackers leverage modern, publicly available machine learning algorithms. Check our comprehensive article on AI security. #cyberattack research.aimultiple.com/ai-security/
0
0
2
0
AIMultiple
AIMultiple@AIMultiple·
Annotated data is integral to many machine learning and artificial intelligence applications. At the same time, it is one of the most time-consuming and labor-intensive parts of ML projects. Here, we explore what data annotation is and why it matters. research.aimultiple.com/data-annotatio…
0
0
2
0