Cédric Lombion

768 posts

@clombion

Founder, Civic Literacy Initiative. Former data lead @okfn, @schoolofdata. Making sense of the collision btwn tech & society through (data/algo) literacies.

work with us → Joined January 2011
534 Following · 583 Followers
archie 🦋
archie 🦋@archieemwood·
my python env is totally borked. been here before. only solution at this point is to buy a new laptop
Nico Ritschel
Nico Ritschel@nicoritschel·
@JohnKutay There are some fundamental architecture decisions in competitors that help with intentionality. I don’t think DBT is expressive enough for us to cling to that spec.
Nico Ritschel
Nico Ritschel@nicoritschel·
...is DBT the Tableau of data modeling?
Rahul B (@rahulbot@vis.social)
My book has a cover! "Community Data - Creative Approaches to Empowering People with Information" coming in Nov from @OxUniPress. (Follow rahulbot on instagram for updates on the silly "book launch" video I'm working on 🚀+📘)
[image attached]
Cédric Lombion
Cédric Lombion@clombion·
@mattbeane I stopped at 10' when it said "5 genres, 5 more to go" before suddenly skipping to the next paper. I would have expected more callbacks to the first paper being discussed while discussing the second one. It felt like I had to make the connections myself, which I find distracting.
Matt Beane
Matt Beane@mattbeane·
Okay, I'll toss a (STUNNING) Google NotebookLM podcast into the mix. PhD students often struggle to learn how to write good papers. So do I! This is 100% AI-generated, from three seminal papers with practical how-to guidance. Wow. Simply, wow: soundcloud.com/matt-beane-235…
Jeremy Howard
Jeremy Howard@jeremyphoward·
Did you know there's a Python and CLI lib with full auto-complete providing 100% always-updated coverage of the >1000 methods in the entire @GitHub REST API? It's called ghapi. I've been working on it nearly 4 years now. Give it a try--it's pretty fun!😃 github.com/fastai/ghapi
Cédric Lombion
Cédric Lombion@clombion·
The rise of social stats led to two interpretations: those using them to identify social ailments and advocate for change, and those twisting them into flawed biological conclusions to advance practices like eugenics. (Alain Supiot) The more things change...
[2 images attached]
Cédric Lombion
Cédric Lombion@clombion·
@evidence_dev Would be great to have access to an RSS feed for your blog! Much nicer to follow product updates from my RSS reader than across email / social media.
Cédric Lombion
Cédric Lombion@clombion·
@mikorulez Useful insights, thanks! Any pointers on the tools / collaboration frameworks that you used?
Jacob Matson
Jacob Matson@matsonj·
@clombion Too much memory consumption in the Monte Carlo sim to do it w/o staging it in steps. Also at some point I have to add tiebreakers which are very complex logically. Need to preprocess a bit.
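Staging a memory-hungry Monte Carlo run in steps, as described above, is straightforward to sketch. This is a generic illustration, not the actual sim from the thread: trials are generated in fixed-size stages and reduced immediately, so peak memory is one stage rather than all trials. The `trial_fn` signature and the pi-estimation toy trial are invented for the example.

```python
import random

def simulate_in_stages(n_trials, stage_size, trial_fn, seed=0):
    """Run n_trials Monte Carlo trials in fixed-size stages so only
    one stage's results are held in memory at a time."""
    rng = random.Random(seed)
    total = 0.0
    done = 0
    while done < n_trials:
        batch = min(stage_size, n_trials - done)
        # Materialise only this stage, then reduce it immediately.
        stage = [trial_fn(rng) for _ in range(batch)]
        total += sum(stage)
        done += batch
    return total / n_trials  # estimate = mean over all trials

# Toy trial: estimate pi by sampling points in the unit square.
def in_quarter_circle(rng):
    x, y = rng.random(), rng.random()
    return 4.0 if x * x + y * y <= 1.0 else 0.0

pi_estimate = simulate_in_stages(100_000, 10_000, in_quarter_circle)
```

The same shape works for preprocessing steps like tiebreakers: each stage can be written out (e.g. to a DuckDB table) instead of summed, keeping the working set bounded.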
Cédric Lombion
Cédric Lombion@clombion·
@matsonj Sorry if I wasn't clear. I was asking if duckdb + evidence was not enough for this kind of pipeline. It was not about the choice of SQL.
Cédric Lombion
Cédric Lombion@clombion·
One of the most inspiring sentences I've read recently. Also thankful that this is much more than a lucky stroke of brilliance: @adriennemaree has developed the topic across several publications—all promptly added to my reading list.
[image attached]
Cédric Lombion
Cédric Lombion@clombion·
@archieemwood @evidence_dev What I mean is that the philosophy behind Evidence seems to lean toward consuming the db and writing the SQL in Evidence, rather than generating a Datasette endpoint that I consume in Evidence. Or is it?
archie 🦋
archie 🦋@archieemwood·
@clombion @evidence_dev evidence caches everything during the build so API data will be as fresh as the most recent build
archie 🦋
archie 🦋@archieemwood·
most charting tools require inordinate amounts of config for a decent-looking chart. in @evidence_dev a great-looking chart is 5 lines of code
[4 images attached]
Cédric Lombion
Cédric Lombion@clombion·
@archieemwood @evidence_dev But reading how evidence works, is there any point in hooking it up to the API if I have access to the SQLite behind it, as Evidence stores and converts all the data anyway?
Cédric Lombion
Cédric Lombion@clombion·
@archieemwood @evidence_dev I've been using Streamlit to create small internal apps that consume a Datasette API. And I've been meaning to test Evidence because of how slow Streamlit can get while caching the data.
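In Streamlit the usual fix for slow refetches is `st.cache_data`, but the idea is framework-independent. Here is a minimal sketch of a read-through cache in front of a Datasette JSON endpoint (Datasette serves table data at `/{db}/{table}.json`); the class name and TTL policy are invented for illustration, and the `fetch` callable is injectable so the cache can be exercised without a network.

```python
import json
import time
from urllib.request import urlopen

class CachedDatasette:
    """Tiny read-through cache in front of a Datasette JSON endpoint,
    so repeated app reruns don't refetch the same table."""

    def __init__(self, base_url, ttl=300, fetch=None):
        self.base_url = base_url.rstrip("/")
        self.ttl = ttl  # seconds a cached payload stays fresh
        # `fetch` is injectable so the cache can be tested without a network.
        self.fetch = fetch or (lambda url: json.load(urlopen(url)))
        self._cache = {}  # url -> (timestamp, payload)

    def table(self, db, table):
        url = f"{self.base_url}/{db}/{table}.json"
        hit = self._cache.get(url)
        if hit and time.time() - hit[0] < self.ttl:
            return hit[1]  # fresh cache hit, no network round-trip
        payload = self.fetch(url)
        self._cache[url] = (time.time(), payload)
        return payload
```

In a real Streamlit app you would likely wrap the fetch in `st.cache_data(ttl=...)` instead; the sketch just shows why the second render is cheap.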
Cédric Lombion
Cédric Lombion@clombion·
@archieemwood @evidence_dev The user here is the biz analyst building the data-driven website right? In this case an interesting use of llm would be to generate x different types of charts based on the same data to compare readability. UI could be freeform or use dataviz grammar to help guide the prompt.
Cédric Lombion
Cédric Lombion@clombion·
@ethanf_17 Is there a reason to extract and transform the data directly with DuckDB instead of doing it with pandas (or polars) and then loading it in a DuckDB file?
Ethan
Ethan@ethanf_17·
I pulled data from the Offerings, Issuers, and FormDSubmissions CSVs and combined the data into one big DataFrame using DuckDB. I'm not in love with string SQL queries like this, so if anyone has syntax suggestions please let me know.
[image attached]
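DuckDB can query CSV files directly (e.g. via `read_csv_auto`) and hand the result to pandas, which is what makes the combine step a single SQL join. As a dependency-free sketch of the same pattern, here is a version using the stdlib `sqlite3` as a stand-in for DuckDB; the tables and column names are invented for illustration, not the actual SEC schemas.

```python
import csv
import io
import sqlite3

# Stand-in CSV contents; the real inputs would be Offerings.csv,
# Issuers.csv, etc., with many more columns.
OFFERINGS = "accession,amount\nA1,1000\nA2,2500\n"
ISSUERS = "accession,issuer\nA1,Acme Corp\nA2,Globex\n"

def load_csv(conn, name, text):
    """Create a table from CSV text (all columns as TEXT for simplicity)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    conn.execute(f"CREATE TABLE {name} ({', '.join(header)})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", data)

conn = sqlite3.connect(":memory:")
load_csv(conn, "offerings", OFFERINGS)
load_csv(conn, "issuers", ISSUERS)

# The combine step is one join instead of a chain of dataframe merges.
combined = conn.execute(
    """
    SELECT o.accession, i.issuer, o.amount
    FROM offerings o
    JOIN issuers i USING (accession)
    ORDER BY o.accession
    """
).fetchall()
```

On the "string SQL" complaint: keeping each query in its own `.sql` file, or using a templating layer, are the usual ways to make long query strings easier to live with.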
Ethan
Ethan@ethanf_17·
I built a dashboard using @evidence_dev with data scraped from the SEC and a pipeline built with @duckdb. I thought I'd write a quick guide on how to do this since it was super easy and fun.
[image attached]
Cédric Lombion
Cédric Lombion@clombion·
@simonw That allows them to deploy quickly to demonstrate the use case, then find funding after. Sounds like a good use case for serverless to me? Though not sustainable for the platforms themselves probably.
Cédric Lombion
Cédric Lombion@clombion·
@simonw I had a meeting yesterday with a gig department that was responsible for updating a CSV data file but had no control over its publication, as another department managed the open data portal. I suggested Datasette + Vercel to them as a way to circumvent the issue.
Simon Willison
Simon Willison@simonw·
It's taken a few years but it feels to me like the shine on serverless is starting to wear off
WebDevCody@webdevcody

I’ve been on a project at work for 5+ years now, and I’d say some of the biggest technical pain points have included DynamoDB, serverless, and API Gateway. Might be a skill issue, but if I did it all over I’d say just always use Postgres and deploy containers to a managed service until there is a reason not to, for most projects.

Dynamo is great when you know exactly what you’re building from the start. It’s also good if from the start you know you’ll be dealing with a lot of data that can’t work well in Postgres (guess what, SQL has been handling lots of data for a long time). Dynamo becomes a pain when you’re doing agile development. SQL is a lot more forgiving when requirements change. Dynamo takes forever to loop over all your entries and update them. Updating 10 million records takes almost an hour, and that’s including doing parallel scans. “Bro, just increase your provisioned WCU!” Sure, but you know it takes at least 20-30 minutes for that to finish updating your instance. Doing the same update using SQL takes 5 min on a 2 CPU machine with 8 GB memory. Your inability to easily query for data in Dynamo is bad. “Bro, just use a GSI!” Ok, now your cost for writes is doubled, and each GSI is updated asynchronously, so again when you need to update all entries, it takes time for your GSI to update fully. Accidentally picked a bad partition or sort key? Have fun writing a bunch of code just to migrate your data to a new table. The Dynamo docs say “know your access patterns before you make your single table pattern”… most product owners can’t even describe what they want; you expect us to design our access patterns correctly from the start?

Lambda is great when you have a specific need to quickly scale from 0 to 1000s of isolated workers. For example, we have a use case where we need to loop over hundreds of data entries and generate unique PDFs for each one. Lambda shines with this, but now you’ll basically need SQS or a queue system to orchestrate it all. Btw, SQS has its own set of gotchas, such as events might be delivered twice, so you’d better write your code to make sure you don’t double-process the same event. Luckily Lambda supports running containers now, but previously it was a huge pain when you installed a package that requires a node-gyp binary, which means you needed to build it inside the correct Docker image compatible with the Lambda runtime and then create a Lambda layer containing those binaries. Save yourself the hassle and just always use containers for all running code. Probably just stop using Node or JavaScript on the backend if possible; it’s pretty awful.

100% don’t use a mono Lambda for your API. API Gateway is a pain and is typically used for putting a REST API in front of your Lambdas if you want to make an API using a mono Lambda. Works great until your Lambda takes more than 30 seconds; API Gateway will time out your requests. That means you need to go async with events instead and figure out another solution to notify your users (websockets, SSE) when the request is done. Have fun getting either to work on Lambda. You’ll end up using API Gateway v2 websockets, which have more gotchas: connections auto-timeout after 15 minutes, so you need to add ping-pong logic, and there’s a max connection of 2 hours, so again have fun writing more logic for those limits. Cold starts are a real issue as your code grows, which makes you find ways to lazy-import functions if deploying a mono Lambda API. Don’t forget deploying your Lambda has a 250 MB limit, which is the biggest pain in the ass. Again, just run containers on Lambda if you must use them.

On top of all that, you’ll end up using Terraform or another IaC tool just to get all this stuff deployed. SST is great, but if you think about it, they created it because we all admit deploying stuff to AWS is a nightmare, especially Lambda. Idk, I’m just burned out on this entire ecosystem. Just let me deploy a single Go server that renders HTML at this point.

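The SQS point in the quoted thread (standard queues are at-least-once, so a message can arrive twice) is usually handled with an idempotent consumer. A minimal sketch, with an in-memory seen-set standing in for what would need to be a durable, shared store (e.g. a DynamoDB conditional put or a Postgres unique constraint) in production:

```python
class IdempotentConsumer:
    """At-least-once queues (e.g. standard SQS) can deliver a message
    twice; track processed message IDs so the side effect runs only
    once per ID."""

    def __init__(self, handler):
        self.handler = handler
        self._seen = set()  # production: durable store with atomic insert

    def process(self, message_id, body):
        if message_id in self._seen:
            return False  # duplicate delivery, skip the side effect
        self.handler(body)
        self._seen.add(message_id)
        return True

processed = []
consumer = IdempotentConsumer(processed.append)
consumer.process("msg-1", "generate pdf for entry 42")
consumer.process("msg-1", "generate pdf for entry 42")  # duplicate delivery
```

Note the check-then-act window: in a real multi-worker setup the "have I seen this ID?" test and the record of it must be a single atomic operation, which is why a conditional write (or unique constraint) is the durable equivalent of this set.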