Simon FL

212 posts

@simonfl

Husband to @pearlsesq, Software engineer @databricks, French Canadian (i.e. likes poutine and hockey), previously @stripe, @SlackHQ, @Foursquare, @Google

New York, NY · Joined March 2007
545 Following · 935 Followers
Simon FL retweeted
Michael Bendersky
Michael Bendersky@bemikelive·
We just published OfficeQA Pro - a set of 133 challenging questions from the original OfficeQA benchmark. Even the best frontier agents still struggle on OfficeQA Pro with common issues stemming from errors in parsing, retrieval, and visual reasoning.
English
1
8
24
2.3K
Simon FL retweeted
Krista Opsahl-Ong
Krista Opsahl-Ong@kristahopsalong·
Most AI benchmarks test reasoning in isolation. Real enterprise tasks require grounded reasoning: 1️⃣ Find the right documents 2️⃣ Extract the right values 3️⃣ Perform analyses OfficeQA Pro evaluates this end-to-end. Frontier agents still score <50%. 🧵Paper & details below!
English
7
27
110
44.2K
Simon FL retweeted
Simon FL
Simon FL@simonfl·
Why is it "two buck chuck" and not "two bucks chuck"? Is this like the "maple leafs" vs "maple leaves"? English plurals are insane
English
1
0
3
168
Simon FL retweeted
Michael Bendersky
Michael Bendersky@bemikelive·
Since joining @databricks, our research team has been hard at work on Agent Bricks, a new product that helps enterprises develop state-of-the-art domain-specific agents. We are now releasing a research blog about Agent Learning from Human Feedback (ALHF) databricks.com/blog/agent-lea…
English
2
20
101
9.9K
Simon FL retweeted
Jonathan Frankle
Jonathan Frankle@jefrankle·
RLVR isn't just for math and coding! At @databricks, it's impacting products and users across domains. One example: SQL Q&A. We hit the top of the BIRD single-model single-generation leaderboard with our standard TAO+RLVR recipe - the one rolling out in our Agent Bricks product.
English
3
15
107
23.1K
Simon FL
Simon FL@simonfl·
Hey @minimax_ai, I'm trying to serve M1-80k on vLLM. Your docs say "a server with 8 H800s can process inputs up to 2 million tokens" but then recommend --max_model_len 4096. What settings did you use for 2M tokens? I'm trying this on 8 H100s.
English
0
1
6
1.4K
Simon FL
Simon FL@simonfl·
@harryh I sadly have no experience to offer then.
English
1
0
0
89
Harry Heymann 🥑
Harry Heymann 🥑@harryh·
@simonfl AC not PTAC. All a central system for the whole apartment. Not window based.
English
1
0
0
190
Harry Heymann 🥑
Harry Heymann 🥑@harryh·
AC guy sent me a $1,400 bill for some maintenance. Says a new transformer (no idea what kind exactly) was $500 and a new pump was $600. Do spare parts for an AC really cost this much? Feels a bit like I'm getting ripped off.
English
6
0
4
1.8K
Clément Miao
Clément Miao@clementmiao·
@ffx no i want it to be prominent, with very bright rgb leds, and it makes loud laser sounds for each key press
English
1
0
1
40
Clément Miao
Clément Miao@clementmiao·
I don't have one yet, but seems like the biggest issue with the Vision Pro for productivity on the go is the bad virtual keyboard when you're not connected to your mbp. I can see split keyboard on your pants being popular within certain groups as a solution.
English
1
0
1
305
Simon FL retweeted
Erebus
Erebus@IdemErebus·
DONDA is the new FAANG: Deepmind, Open AI, Nvidia, Databricks, Anthropic
English
110
392
2.6K
351.6K
Mara
Mara@what_mara_said·
All I want to do is relax in the sky, in a very comfortable hot air balloon, and eat Swedish Fish
English
1
0
1
30
Simon FL
Simon FL@simonfl·
@harryh My bad, I must have misconstrued something I heard!
English
1
0
1
44
Harry Heymann 🥑
Harry Heymann 🥑@harryh·
@simonfl I don't think that was my mindset? I've just never had the stability in my life to potentially buy before.
English
1
0
1
61
Harry Heymann 🥑
Harry Heymann 🥑@harryh·
I was about 1% apart from a seller on an apartment negotiation and he table flipped, pulled his last offer, and is apparently delisting the apartment. He's gonna ride it out with his low mortgage and see what happens. Sign of the times I guess.
English
2
0
9
2.4K
Zack, definitely not an advanced AI
About 10 years ago I closed my @etrade account. They messed up the math and left it with $0.01 after closing. A decade later they're still sending monthly statements for this penny, but I can't tell them to just keep it because "only active accounts can call customer service." 🤦🏻‍♂️
English
2
0
3
255
Simon FL
Simon FL@simonfl·
@harryh Sidney is much better at this, except it gets confused easily. I think because of the other people who show up on your LinkedIn profile, it thought I currently had @leok's job. I assume it's the same problem with ChatGPT.
English
0
0
3
114
Simon FL
Simon FL@simonfl·
@TheKhandyman is you taking notes on a criminal fucking conspiracy?
English
1
0
2
53
Manav Khandelwal
Manav Khandelwal@TheKhandyman·
Having older colleagues is great because they understand all of my Wire references immediately
English
3
0
12
845