Cédric Lombion

768 posts

@clombion

Founder, Civic Literacy Initiative. Former data lead @okfn, @schoolofdata. Making sense of the collision btwn tech & society through (data/algo) literacies.

work with us → Joined January 2011
534 Following · 583 Followers
archie 🦋
archie 🦋@archieemwood·
my python env is totally borked. been here before. only solution at this point is to buy a new laptop
Nico Ritschel
Nico Ritschel@nicoritschel·
@JohnKutay There are some fundamental architecture decisions in competitors that help with intentionality. I don’t think DBT is expressive enough for us to cling to that spec.
Nico Ritschel
Nico Ritschel@nicoritschel·
...is DBT the Tableau of data modeling?
Rahul B (@rahulbot@vis.social)
My book has a cover! "Community Data - Creative Approaches to Empowering People with Information" coming in Nov from @OxUniPress. (Follow rahulbot on instagram for updates on the silly "book launch" video I'm working on 🚀+📘)
[image attached]
Cédric Lombion
Cédric Lombion@clombion·
@mattbeane I stopped at 10' when it said "5 genres, 5 more to go" before suddenly skipping to the next paper. I would have expected more callbacks to the first paper being discussed while discussing the second one. It felt like I had to make the connections myself, which I find distracting.
Matt Beane
Matt Beane@mattbeane·
Okay, I'll toss a (STUNNING) Google NotebookLM podcast into the mix. PhD students often struggle to learn how to write good papers. So do I! This is 100% AI-generated, from three seminal papers with practical how-to guidance. Wow. Simply, wow: soundcloud.com/matt-beane-235…
Jeremy Howard
Jeremy Howard@jeremyphoward·
Did you know there's a Python and CLI lib with full auto-complete providing 100% always-updated coverage of the >1000 methods in the entire @GitHub REST API? It's called ghapi. I've been working on it nearly 4 years now. Give it a try--it's pretty fun!😃 github.com/fastai/ghapi
Cédric Lombion
Cédric Lombion@clombion·
The rise of social stats led to two interpretations: those using them to identify social ailments and advocate for change, and those twisting them into flawed biological conclusions to advance practices like eugenics. (Alain Supiot) The more things change...
[2 images attached]
Cédric Lombion
Cédric Lombion@clombion·
@evidence_dev Would be great to have access to an RSS feed for your blog! Much nicer to follow product updates from my RSS reader than across email / social media.
Cédric Lombion
Cédric Lombion@clombion·
@mikorulez Useful insights, thanks! Any pointers on the tools / collaboration frameworks that you used?
Jacob Matson
Jacob Matson@matsonj·
@clombion Too much memory consumption in the Monte Carlo sim to do it w/o staging it in steps. Also at some point I have to add tiebreakers which are very complex logically. Need to preprocess a bit.
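Staging a memory-hungry Monte Carlo run in steps, as described above, is straightforward to sketch. This is a generic illustration, not the actual sim from the thread: trials are generated in fixed-size stages and reduced immediately, so peak memory is one stage rather than all trials. The `trial_fn` signature and the pi-estimation toy trial are invented for the example.

```python
import random

def simulate_in_stages(n_trials, stage_size, trial_fn, seed=0):
    """Run n_trials Monte Carlo trials in fixed-size stages so only
    one stage's results are held in memory at a time."""
    rng = random.Random(seed)
    total = 0.0
    done = 0
    while done < n_trials:
        batch = min(stage_size, n_trials - done)
        # Materialise only this stage, then reduce it immediately.
        stage = [trial_fn(rng) for _ in range(batch)]
        total += sum(stage)
        done += batch
    return total / n_trials  # estimate = mean over all trials

# Toy trial: estimate pi by sampling points in the unit square.
def in_quarter_circle(rng):
    x, y = rng.random(), rng.random()
    return 4.0 if x * x + y * y <= 1.0 else 0.0

pi_estimate = simulate_in_stages(100_000, 10_000, in_quarter_circle)
```

The same shape works for preprocessing steps like tiebreakers: each stage can be written out (e.g. to a DuckDB table) instead of summed, keeping the working set bounded.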
Cédric Lombion
Cédric Lombion@clombion·
@matsonj Sorry if I wasn't clear. I was asking if duckdb + evidence was not enough for this kind of pipeline. It was not about the choice of SQL.
Cédric Lombion
Cédric Lombion@clombion·
One of the most inspiring sentences I've read recently. Also thankful that this is much more than a lucky stroke of brilliance: @adriennemaree has developed the topic across several publications—all promptly added to my reading list.
[image attached]
Cédric Lombion
Cédric Lombion@clombion·
@archieemwood @evidence_dev What I mean is that the philosophy behind Evidence seems to lean toward consuming the db and writing the SQL in Evidence, rather than generating a Datasette endpoint that I consume in Evidence. Or is it?
archie 🦋
archie 🦋@archieemwood·
@clombion @evidence_dev evidence caches everything during the build so API data will be as fresh as the most recent build
archie 🦋
archie 🦋@archieemwood·
most charting tools require inordinate amounts of config for a decent-looking chart. in @evidence_dev a great-looking chart is 5 lines of code
[4 images attached]
Cédric Lombion
Cédric Lombion@clombion·
@archieemwood @evidence_dev But reading how evidence works, is there any point in hooking it up to the API if I have access to the SQLite behind it, as Evidence stores and converts all the data anyway?
Cédric Lombion
Cédric Lombion@clombion·
@archieemwood @evidence_dev I've been using Streamlit to create small internal apps that consume a Datasette API. And I've been meaning to test Evidence because of how slow Streamlit can get while caching the data.
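In Streamlit the usual fix for slow refetches is `st.cache_data`, but the idea is framework-independent. Here is a minimal sketch of a read-through cache in front of a Datasette JSON endpoint (Datasette serves table data at `/{db}/{table}.json`); the class name and TTL policy are invented for illustration, and the `fetch` callable is injectable so the cache can be exercised without a network.

```python
import json
import time
from urllib.request import urlopen

class CachedDatasette:
    """Tiny read-through cache in front of a Datasette JSON endpoint,
    so repeated app reruns don't refetch the same table."""

    def __init__(self, base_url, ttl=300, fetch=None):
        self.base_url = base_url.rstrip("/")
        self.ttl = ttl  # seconds a cached payload stays fresh
        # `fetch` is injectable so the cache can be tested without a network.
        self.fetch = fetch or (lambda url: json.load(urlopen(url)))
        self._cache = {}  # url -> (timestamp, payload)

    def table(self, db, table):
        url = f"{self.base_url}/{db}/{table}.json"
        hit = self._cache.get(url)
        if hit and time.time() - hit[0] < self.ttl:
            return hit[1]  # fresh cache hit, no network round-trip
        payload = self.fetch(url)
        self._cache[url] = (time.time(), payload)
        return payload
```

In a real Streamlit app you would likely wrap the fetch in `st.cache_data(ttl=...)` instead; the sketch just shows why the second render is cheap.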
Cédric Lombion
Cédric Lombion@clombion·
@archieemwood @evidence_dev The user here is the biz analyst building the data-driven website right? In this case an interesting use of llm would be to generate x different types of charts based on the same data to compare readability. UI could be freeform or use dataviz grammar to help guide the prompt.
Cédric Lombion
Cédric Lombion@clombion·
@ethanf_17 Is there a reason to extract and transform the data directly with DuckDB instead of doing it with pandas (or polars) and then loading it in a DuckDB file?
Ethan
Ethan@ethanf_17·
I pulled data from the Offerings, Issuers, and FormDSubmissions CSVs and combined the data into one big DataFrame using DuckDB. I'm not in love with string SQL queries like this, so if anyone has syntax suggestions please let me know.
[image attached]
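DuckDB can query CSV files directly (e.g. via `read_csv_auto`) and hand the result to pandas, which is what makes the combine step a single SQL join. As a dependency-free sketch of the same pattern, here is a version using the stdlib `sqlite3` as a stand-in for DuckDB; the tables and column names are invented for illustration, not the actual SEC schemas.

```python
import csv
import io
import sqlite3

# Stand-in CSV contents; the real inputs would be Offerings.csv,
# Issuers.csv, etc., with many more columns.
OFFERINGS = "accession,amount\nA1,1000\nA2,2500\n"
ISSUERS = "accession,issuer\nA1,Acme Corp\nA2,Globex\n"

def load_csv(conn, name, text):
    """Create a table from CSV text (all columns as TEXT for simplicity)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    conn.execute(f"CREATE TABLE {name} ({', '.join(header)})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", data)

conn = sqlite3.connect(":memory:")
load_csv(conn, "offerings", OFFERINGS)
load_csv(conn, "issuers", ISSUERS)

# The combine step is one join instead of a chain of dataframe merges.
combined = conn.execute(
    """
    SELECT o.accession, i.issuer, o.amount
    FROM offerings o
    JOIN issuers i USING (accession)
    ORDER BY o.accession
    """
).fetchall()
```

On the "string SQL" complaint: keeping each query in its own `.sql` file, or using a templating layer, are the usual ways to make long query strings easier to live with.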
Ethan
Ethan@ethanf_17·
I built a dashboard using @evidence_dev with data scraped from the SEC and a pipeline built with @duckdb. I thought I'd write a quick guide on how to do this since it was super easy and fun.
[image attached]
Cédric Lombion
Cédric Lombion@clombion·
@simonw That allows them to deploy quickly to demonstrate the use case, then find funding after. Sounds like a good use case for serverless to me? Though not sustainable for the platforms themselves probably.
Cédric Lombion
Cédric Lombion@clombion·
@simonw I had a meeting yesterday with a gig department that was responsible for updating a CSV data file but had no control over its publication, as another department managed the open data portal. I suggested Datasette + Vercel to them as a way to circumvent the issue.
Simon Willison
Simon Willison@simonw·
It's taken a few years but it feels to me like the shine on serverless is starting to wear off
WebDevCody@webdevcody

I’ve been on a project at work for 5+ years now, and I’d say some of the biggest technical pain points have included DynamoDB, serverless, and API Gateway. Might be a skill issue, but if I did it all over I’d say just always use Postgres and deploy containers to a managed service until there is a reason not to, for most projects.

Dynamo is great when you know exactly what you’re building from the start. It’s also good if from the start you know you’ll be dealing with a lot of data that can’t work well in Postgres (guess what, SQL has been handling lots of data for a long time). Dynamo becomes a pain when you’re doing agile development. SQL is a lot more forgiving when requirements change. Dynamo takes forever to loop over all your entries and update them. Updating 10 million records takes almost an hour, and that’s including doing parallel scans. “Bro, just increase your provisioned WCU!” Sure, but you know it takes at least 20-30 minutes for that to finish updating your instance. Doing the same update using SQL takes 5 min on a 2 CPU machine with 8 GB memory. Your inability to easily query for data in Dynamo is bad. “Bro, just use a GSI!” Ok, now your cost for writes is doubled, and each GSI is updated asynchronously, so again when you need to update all entries, it takes time for your GSI to update fully. Accidentally picked a bad partition or sort key? Have fun writing a bunch of code just to migrate your data to a new table. The Dynamo docs say “know your access patterns before you make your single table pattern”… most product owners can’t even describe what they want; you expect us to design our access patterns correctly from the start?

Lambda is great when you have a specific need to quickly scale from 0 to 1000s of isolated workers. For example, we have a use case where we need to loop over hundreds of data entries and generate unique PDFs for each one. Lambda shines with this, but now you’ll basically need SQS or a queue system to orchestrate it all. Btw, SQS has its own set of gotchas, such as events might be delivered twice, so you’d better write your code to make sure you don’t double-process the same event. Luckily Lambda supports running containers now, but previously it was a huge pain when you installed a package that requires a node-gyp binary, which means you needed to build it inside the correct Docker image compatible with the Lambda runtime and then create a Lambda layer containing those binaries. Save yourself the hassle and just always use containers for all running code. Probably just stop using Node or JavaScript on the backend if possible; it’s pretty awful.

100% don’t use a mono Lambda for your API. API Gateway is a pain and is typically used for putting a REST API in front of your Lambdas if you want to make an API using a mono Lambda. Works great until your Lambda takes more than 30 seconds; API Gateway will time out your requests. That means you need to go async with events instead and figure out another solution to notify your users (websockets, SSE) when the request is done. Have fun getting either to work on Lambda. You’ll end up using API Gateway v2 websockets, which have more gotchas: connections auto-timeout after 15 minutes, so you need to add ping-pong logic, and there’s a max connection of 2 hours, so again have fun writing more logic for those limits. Cold starts are a real issue as your code grows, which makes you find ways to lazy-import functions if deploying a mono Lambda API. Don’t forget deploying your Lambda has a 250 MB limit, which is the biggest pain in the ass. Again, just run containers on Lambda if you must use them.

On top of all that, you’ll end up using Terraform or another IaC tool just to get all this stuff deployed. SST is great, but if you think about it, they created it because we all admit deploying stuff to AWS is a nightmare, especially Lambda. Idk, I’m just burned out on this entire ecosystem. Just let me deploy a single Go server that renders HTML at this point.

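The SQS point in the quoted thread (standard queues are at-least-once, so a message can arrive twice) is usually handled with an idempotent consumer. A minimal sketch, with an in-memory seen-set standing in for what would need to be a durable, shared store (e.g. a DynamoDB conditional put or a Postgres unique constraint) in production:

```python
class IdempotentConsumer:
    """At-least-once queues (e.g. standard SQS) can deliver a message
    twice; track processed message IDs so the side effect runs only
    once per ID."""

    def __init__(self, handler):
        self.handler = handler
        self._seen = set()  # production: durable store with atomic insert

    def process(self, message_id, body):
        if message_id in self._seen:
            return False  # duplicate delivery, skip the side effect
        self.handler(body)
        self._seen.add(message_id)
        return True

processed = []
consumer = IdempotentConsumer(processed.append)
consumer.process("msg-1", "generate pdf for entry 42")
consumer.process("msg-1", "generate pdf for entry 42")  # duplicate delivery
```

Note the check-then-act window: in a real multi-worker setup the "have I seen this ID?" test and the record of it must be a single atomic operation, which is why a conditional write (or unique constraint) is the durable equivalent of this set.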