Data For Science

7.3K posts


@data4sci

Take Control Of Your Data. Join our Data Science Briefing newsletter for the best in #DataScience and #MachineLearning

Manhattan, NY · Joined February 2015
458 Following · 1.4K Followers
Pinned Tweet
Data For Science @data4sci
☣️ The latest post on Epidemic Modeling is now out! ☣️ In this short post, we explore how to model demographic processes and the impact they have on the epidemic as it spreads through the population. The link is in the comments
1 reply · 1 retweet · 2 likes · 509 views
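The post's topic, adding demographic processes (births and deaths) to an epidemic model, can be sketched with a textbook SIR model with vital dynamics. The parameter values and the simple Euler integrator below are illustrative assumptions, not taken from the post itself.

```python
def sir_demography(beta=0.3, gamma=0.1, mu=0.0002, days=365, dt=1.0,
                   s0=0.99, i0=0.01, r0=0.0):
    """SIR model with vital dynamics (equal per-capita birth and death
    rate mu), integrated with a forward Euler step.

    Returns (S, I, R) time series as fractions of the population.
    All parameter values are illustrative, not fitted.
    """
    S, I, R = [s0], [i0], [r0]
    for _ in range(int(days / dt)):
        s, i, r = S[-1], I[-1], R[-1]
        n = s + i + r
        # Births enter S; deaths remove individuals from every class.
        ds = mu * n - beta * s * i / n - mu * s
        di = beta * s * i / n - gamma * i - mu * i
        dr = gamma * i - mu * r
        S.append(s + ds * dt)
        I.append(i + di * dt)
        R.append(r + dr * dt)
    return S, I, R
```

Because births exactly balance deaths here, the total population stays constant; making the birth and death rates differ is the simplest way to study how a growing or shrinking population reshapes the epidemic curve.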
Data For Science retweeted
emily north @north0fnorth
found the original 4k+ resolution artemis ii moon photos rather than the compressed 1080p ones official government accounts have been posting and the details are absolutely spectacular images.nasa.gov
46 replies · 3.4K retweets · 20.9K likes · 398.6K views
Data For Science retweeted
Simon Kuestenmacher @simongerman600
This watershed map of Spain shows whether a drop of water ends up in the Atlantic or in the Mediterranean. Beautiful visualization by @joewdavies.
52 replies · 983 retweets · 5.2K likes · 379K views
Data For Science @data4sci
The Cult Of Vibe Coding Is Insane - by Bram Cohen — Bram Cohen makes a clear case that “vibe coding” isn’t a harmless style choice—it’s a decision to accept lower-quality software. Worth reading if you care about… bramcohen.com/p/the-cult-of-…
0 replies · 0 retweets · 0 likes · 29 views
Data For Science @data4sci
The Evolving Foundations of Math | Quanta Magazine — Math’s foundations aren’t a settled monolith—they’re still being revised in response to new ideas and edge cases. This piece is a clear look at what’s… quantamagazine.org/series/the-evo…
0 replies · 0 retweets · 0 likes · 40 views
Data For Science retweeted
Andrej Karpathy @karpathy
LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/ and backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images locally so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents, and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data. E.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web UI) and, more often, hand off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it is viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
2.6K replies · 6.4K retweets · 54.2K likes · 18.9M views
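The ingest step described in the post can be sketched in a few lines of standard-library Python: walk a raw/ directory and emit a wiki index file with a backlink entry per source document. In the real workflow the per-file summaries would be written by an LLM; here a file-size stub stands in for them, and the raw/ and wiki/ directory names are taken from the post.

```python
from pathlib import Path

def compile_index(raw_dir: str = "raw", wiki_dir: str = "wiki") -> Path:
    """Build a minimal wiki index listing every document under raw/.

    A sketch of the 'compile a wiki from raw/' step: one bullet per
    source file, with a backlink to the original. The file size is a
    stub for the summary an LLM would normally write.
    """
    raw, wiki = Path(raw_dir), Path(wiki_dir)
    wiki.mkdir(parents=True, exist_ok=True)
    lines = ["# Wiki index", ""]
    for doc in sorted(raw.rglob("*")):
        if doc.is_file():
            rel = doc.relative_to(raw)
            # Backlink to the source document, plus the stub summary.
            lines.append(f"- [{rel}]({raw_dir}/{rel}) ({doc.stat().st_size} bytes)")
    index = wiki / "index.md"
    index.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return index
```

Keeping the index as plain markdown is what lets the rest of the workflow stay tool-agnostic: Obsidian renders it, and an LLM agent can re-read and regenerate it on each ingest pass.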
Data For Science @data4sci
Data Science Briefing #312 — A quick, practical roundup for data folks, with a note on the relaunch of the Data For Science site. Worth a skim if you like staying current without wading through long posts. data4science.kit.com/posts/data-sci…
0 replies · 0 retweets · 0 likes · 19 views
Data For Science retweeted
Robert Lomas @Dr_Robert_Lomas
God speed, Artemis II. I've waited 57 years to see this again. It was a Saturn V last time. Good luck, Integrity and her crew.
0 replies · 1 retweet · 3 likes · 209 views
Data For Science @data4sci
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation — A practical look at post-training LLMs using a cascade-style RL setup plus on-policy distillati… research.nvidia.com/labs/nemotron/…
0 replies · 0 retweets · 0 likes · 56 views
Data For Science @data4sci
Welcome to FastMCP - FastMCP — FastMCP looks like a clean, Python-first way to stand up MCP servers and clients without a lot of ceremony. Worth a skim if you want a practical starting point for building MCP-enabled apps. gofastmcp.com/getting-starte…
0 replies · 0 retweets · 0 likes · 25 views
Data For Science @data4sci
Designing AI for Disruptive Science - Asimov Press — A clear argument for why bigger models alone won’t deliver scientific breakthroughs. Useful framing on what it takes to design AI that actually changes how discoveries get made. asimov.press/p/ai-science
0 replies · 0 retweets · 0 likes · 29 views
Data For Science @data4sci
Learn LangGraph and Build Conversational AI with Python — Clear, practical intro to LangGraph for structuring conversational AI as graphs instead of tangled if/else logic—useful if your Py… freecodecamp.org/news/learn-lan…
0 replies · 0 retweets · 0 likes · 35 views
Data For Science @data4sci
A shared model-based linguistic space for transmitting our thoughts from brain to brain in natural conversations - PubMed — Fascinating evidence that natural conversation may rely on a shared, context-sensitive “linguistic space” that… pubmed.ncbi.nlm.nih.gov/39096896/
0 replies · 0 retweets · 0 likes · 23 views
Data For Science @data4sci
ClaudeCodeTools/Presentations/main.pdf at main · aspi6246/ClaudeCodeTools · GitHub — Handy reference for Claude Code tools, with a PDF you can skim quickly and a repo you can fork if you want to extend… github.com/aspi6246/Claud…
0 replies · 0 retweets · 0 likes · 46 views
Data For Science retweeted
Daniel Vassallo @dvassallo
When I got interested in computer programming in the 90s, there were no programming books in my country. I had to wait for our yearly family vacation. We'd connect through London and I'd rush to Oxford Street to buy whatever books I could. A few minutes, once a year, to decide what I'd study for the next 12 months. And they had to fit in my 20kg luggage. Now education is free and accessible to everyone. Anyone with motivation can learn whatever they want. What a revolution!
Mark Cuban @mcuban

I'm going to tell you how much worse it was at the start of the PC Revolution for white collar workers trying to adapt, vs today with AI.

Today, presumably every white collar worker has access to a smart phone and/or a PC/laptop. Back then, a PC cost $4,995; an off brand was $3,995. $5k in 1984 is about $16k today. It was really expensive.

The only reason I could learn how to code and support software is because my job let me take home a PC to learn. By reading the software manual. Literally. RTFM. Or pay to go to training. Classes that started at hundreds of dollars then. It was expensive. It absolutely limited who could get ahead.

Today, ANYONE can go to their browser, to the AI LLM website of their choice, and type in the words "I'm a novice with zero computer background, teach me how to create an agent that reads my email and …" That concept applies to LEARNING ANYTHING.

Think about what this means. Any employee of any company can say "I need to learn how to xyz for my job, which is to do the following. Tell me what more information you need to help me be more efficient, productive and promotable." Or "What new skills can you teach me that will help me reduce my chances of getting laid off?" Or "What suggestions do you have for me to communicate to my boss, who I barely know, to help my chances of staying employed?"

These aren't great prompts. But they are a start that anyone can take. Think about how incredible that is.

Back in the day it was so much harder for white collar workers. It was harder for new grads because unless they took comp sci, they probably had never used a PC.

Big companies are going to cut jobs. No question about it. Small companies are going to need more and more AI-literate thinkers who can help them compete or get an edge.

What I tell every entrepreneur, and it's more crucial today: "When you run with the elephants there are the quick and the dead. Adopt tech quickly; you can outmaneuver big companies."

37 replies · 15 retweets · 343 likes · 38.1K views