Big Data Analytics

2.1K posts


@DatAnalyticsBCN

Big Data | Data Science | Analytics | Stats | IoT | Machine Learning | DataViz | Privacy | #BigData #FinTech #DataScience #IoT #ML #DataViz #AI 😀

Barcelona · Joined September 2015
2.5K Following · 1.9K Followers
Big Data Analytics reposted
Erick @ErickSky
A madman with a PhD built an open-source interactive visual encyclopedia for understanding how AI works. It's wild, go in and see for yourselves. Website: encyclopediaworld.github.io/howaiworks/ Repo in the first comment.
14 replies · 689 reposts · 4K likes · 167.5K views
Big Data Analytics reposted
Leontxo García @leontxogarcia
I follow AI news closely because of its strong connection to chess. This article by Ramón López de Mántaras strikes me as required reading for anyone interested: elpais.com/tecnologia/202…
3 replies · 8 reposts · 14 likes · 1.9K views
Big Data Analytics reposted
Joachim Schork @JoachimSchork
Simulated annealing is a powerful optimization technique inspired by the annealing process in metallurgy. It helps find solutions to complex problems by allowing the system to escape local optima, gradually improving over time.

✔️ Simulated annealing can be used to solve combinatorial problems, such as the traveling salesman problem, by finding the shortest possible route that connects a set of points.
✔️ It is effective for problems with large solution spaces and when traditional optimization methods struggle.
❌ The algorithm's performance heavily depends on the cooling schedule and temperature decay. Choosing the wrong parameters can lead to poor solutions or unnecessarily long computation times.
❌ Simulated annealing typically converges more slowly than other methods, such as gradient-based algorithms, making it less efficient for certain types of problems.

The visualization from Wikipedia demonstrates how simulated annealing is applied to the traveling salesman problem, optimizing a route to minimize the distance between 125 points. Source: en.wikipedia.org/wiki/Simulated…

🔹 In R, the GenSA package allows for effective global optimization without requiring derivatives, making it ideal for non-differentiable problems.
🔹 In Python, the simanneal library offers a simple and effective way to implement simulated annealing for large-scale combinatorial optimization problems.

For more insights on methods like simulated annealing, subscribe to my newsletter on Statistics, Data Science, R, and Python! Learn more: eepurl.com/gH6myT

#Python #R4DS #RStats #datascienceenthusiast #coding
4 replies · 15 reposts · 96 likes · 7.2K views
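The idea in the thread above can be sketched in plain, dependency-free Python (an illustrative from-scratch version for the traveling salesman problem, not the simanneal or GenSA APIs; the 2-opt move, starting temperature, and geometric cooling rate are all arbitrary choices for the sketch):

```python
import random
from math import dist, exp

def tour_length(points, tour):
    """Total length of a closed tour visiting the points in the given order."""
    return sum(dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def anneal_tsp(points, t0=10.0, cooling=0.995, steps=20000, seed=0):
    """Simulated annealing for TSP with 2-opt neighbor moves."""
    rng = random.Random(seed)
    n = len(points)
    tour = list(range(n))
    best = tour[:]
    t = t0
    for _ in range(steps):
        # propose a neighbor: reverse a random segment (2-opt move)
        i, j = sorted(rng.sample(range(n), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        delta = tour_length(points, cand) - tour_length(points, tour)
        # always accept improvements; accept worse tours with
        # probability exp(-delta / t), which shrinks as t cools
        if delta < 0 or rng.random() < exp(-delta / t):
            tour = cand
        if tour_length(points, tour) < tour_length(points, best):
            best = tour[:]
        t *= cooling  # geometric cooling schedule
    return best
```

Because worse tours are sometimes accepted while the temperature is high, the search can climb out of local optima; the geometric schedule `t *= cooling` is the simplest of the cooling strategies the thread warns about tuning.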
Big Data Analytics reposted
Massimo @Rainmaker1973
In 1963, the U.S. introduced ZIP codes, with the Postal Service explaining what the numbers mean and how to use them.
50 replies · 971 reposts · 4.4K likes · 191.3K views
Big Data Analytics reposted
François Chollet @fchollet
To perfectly understand a phenomenon is to perfectly compress it, to have a model of it that cannot be made any simpler. If a DL model requires millions of parameters to model something that can be described by a differential equation with three terms, it has not really understood it; it has merely cached the data.
161 replies · 154 reposts · 1.6K likes · 122.6K views
Big Data Analytics reposted
Akshay 🚀 @akshay_pachaar
K-Means has two major problems:
- The number of clusters must be known in advance
- It doesn't handle outliers
But there's a solution! Introducing DBSCAN, a density-based clustering algorithm. 🚀 Read more... 👇
4 replies · 93 reposts · 507 likes · 44.7K views
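The linked thread isn't shown here, but the core DBSCAN idea fits in a short from-scratch sketch (plain Python, not sklearn's implementation; `eps` is the neighborhood radius and `min_pts` the density threshold):

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster id per point (-1 = noise)."""
    n = len(points)
    labels = [None] * n          # None = not yet visited

    def neighbors(i):
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:   # not a core point
            labels[i] = -1        # provisionally noise
            continue
        labels[i] = cluster       # start a new cluster from this core point
        seeds = [j for j in nbrs if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:   # noise reachable from a core point: border
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:    # j is a core point too: keep expanding
                seeds.extend(jn)
        cluster += 1
    return labels
```

Note how it addresses both complaints about K-Means: the number of clusters emerges from the data's density, and points in no dense region are labeled noise instead of being forced into a cluster.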
Big Data Analytics reposted
Dr Kareem Carr @kareem_carr
statistics is cool
33 replies · 353 reposts · 2.1K likes · 62K views
Big Data Analytics reposted
Fy 🪷 @__Fiamy
MA @0kktsu0

@__Fiamy I see you're refuting all of us who reached the same mathematical conclusion, that it's a matrix, and I can see you won't budge from your position... I'd rather step aside and stick to what I know, while respecting your opinion, because I'm fine with not always being right.

31 replies · 168 reposts · 2.9K likes · 126.7K views
Big Data Analytics reposted
Valeriy M., PhD, MBA, CQF @predict_addict
🚀 Mastering Boosting: See Functional Gradient Descent in Action

If you work in data science, one of the best ways to really understand an algorithm is to implement it from scratch. Gradient Boosting Decision Trees (GBDT) – the engine behind CatBoost, XGBoost, and LightGBM – are often described as "just an ensemble of trees," but their real power comes from the optimization process behind them: Functional Gradient Descent. The video below visually walks through this idea step by step. 👇

🎬 What the Animation Shows

We follow a simple Gradient Boosting Regressor as it learns to fit a noisy, non-linear dataset:

1. Iteration 0 – The Starting Point. The model's prediction (red line) is flat: it starts as a constant equal to the mean of the target values Y. This is the initial function F0(x).
2. Gradient Step – Computing Pseudo-Residuals. For squared error loss, these residuals are exactly the negative gradient of the loss with respect to the current model's predictions.
3. Weak Learner – Fitting a Tree to the Errors. A shallow decision tree h_m is then fit to these residuals. This tree learns where the current model is making the largest errors and how to correct them.
4. Update Step – Correcting the Model. The ensemble is updated by adding a scaled version of this new tree: F_m(x) = F_{m-1}(x) + ν·h_m(x), where ν is the learning rate.

As the video plays, you see the red prediction curve F_M gradually evolve from a flat line into a flexible function that closely tracks the underlying data.

📚 Want to Go Deeper with CatBoost?

If you'd like to turn this intuition into production-grade skills with modern gradient boosting, check out Mastering CatBoost Pro: 👉 valeman.gumroad.com/l/MasteringCat… Perfect if you want to truly understand what's happening under the hood of boosting models, not just call .fit() and hope for the best.
0 replies · 26 reposts · 204 likes · 11.1K views
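Steps 1 through 4 can be sketched from scratch in pure Python. To keep the sketch dependency-free, the weak learners below are depth-1 regression stumps rather than real decision trees, and the single 1-D feature, learning rate, and round count are arbitrary illustrative choices:

```python
def fit_stump(x, residuals):
    """Depth-1 'tree': the single split on x minimizing squared error."""
    best = None
    order = sorted(range(len(x)), key=lambda i: x[i])
    for k in range(1, len(x)):
        thr = (x[order[k - 1]] + x[order[k]]) / 2
        left = [residuals[i] for i in range(len(x)) if x[i] <= thr]
        right = [residuals[i] for i in range(len(x)) if x[i] > thr]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda v: lm if v <= thr else rm

def gbdt_fit(x, y, n_rounds=100, lr=0.1):
    """Gradient boosting for squared error, with stumps as weak learners."""
    f0 = sum(y) / len(y)                  # step 1: F0 = mean of the targets
    stumps, pred = [], [f0] * len(x)
    for _ in range(n_rounds):
        resid = [y[i] - pred[i] for i in range(len(x))]  # step 2: pseudo-residuals
        h = fit_stump(x, resid)                          # step 3: fit weak learner
        stumps.append(h)
        pred = [pred[i] + lr * h(x[i]) for i in range(len(x))]  # step 4: update
    return lambda v: f0 + lr * sum(h(v) for h in stumps)
```

Each round fits a stump to the current residuals and adds a learning-rate-scaled copy to the ensemble, which is exactly the F_m = F_{m-1} + ν·h_m update the animation illustrates.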
Big Data Analytics reposted
Alec Helbling @alec_helbling
Many dimensionality reduction algorithms share a few central principles:

1. Construct a graph that captures the data's local structure
2. Measure "geodesic" distances between points using the graph
3. Project the points to a lower dimension while preserving these distances
21 replies · 100 reposts · 803 likes · 88.6K views
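The first two steps can be sketched with only the standard library (an illustrative fragment; `knn_graph` and `geodesic_distances` are made-up names, and step 3, the projection, typically needs an eigendecomposition such as classical MDS, which is omitted to keep the sketch short):

```python
from math import dist, inf

def knn_graph(points, k):
    """Step 1: symmetric k-nearest-neighbor graph as a weight matrix."""
    n = len(points)
    adj = [[inf] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 0.0
        nearest = sorted(range(n),
                         key=lambda j: dist(points[i], points[j]))[1:k + 1]
        for j in nearest:
            w = dist(points[i], points[j])
            adj[i][j] = adj[j][i] = min(adj[i][j], w)
    return adj

def geodesic_distances(adj):
    """Step 2: all-pairs shortest paths (Floyd-Warshall) approximate geodesics."""
    n = len(adj)
    d = [row[:] for row in adj]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d
```

For points sampled along a curved manifold, the graph distance between two ends follows the curve rather than cutting straight across, which is the whole point of measuring distances through the graph.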
Big Data Analytics reposted
François Chollet @fchollet
The ladder of intelligence is the ladder of abstraction.

L1: Memorizing answers (no generalization)
L2: Interpolative retrieval of answers, pattern matching, memorizing answer-generating rules (local generalization)
L3: Synthesizing causal rules on the fly (strong generalization)
L4: Discovering general principles, metacognition (extreme generalization)

To achieve compounding AI you need to reach L4.
114 replies · 361 reposts · 2.9K likes · 197.9K views
Big Data Analytics reposted
Probability and Statistics @probnstat
Here's a probability puzzle that breaks everyone's brain: How many people do you need in a room for a >50% chance that at least two of them share a birthday? What's your guess? 100? 150? 183? The answer is shockingly small. [1/5]
4 replies · 5 reposts · 58 likes · 6.3K views
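The arithmetic behind the punchline is easy to check directly: multiply the chances that each successive person misses all earlier birthdays, assuming 365 equally likely days (a standard simplification that ignores leap years):

```python
def birthday_collision_prob(n):
    """P(at least two of n people share a birthday), 365 equally likely days."""
    p_all_distinct = 1.0
    for k in range(n):
        # person k+1 must land on one of the 365 - k unused days
        p_all_distinct *= (365 - k) / 365
    return 1 - p_all_distinct
```

The probability first exceeds 50% at just 23 people.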
Big Data Analytics reposted
krupa @krupaad
I spent months illustrating how Transformers actually work. Not just what they do, but why they’re built this way. The history, design choices, and intuition behind every layer. From RNNs → Attention → Multi-Head → FFNs → Positional Encoding. Here's everything I wish I knew:
70 replies · 373 reposts · 3.3K likes · 242.8K views
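One of those layers, scaled dot-product attention, fits in a few lines of dependency-free Python (a toy sketch over plain lists, nothing like a real Transformer implementation):

```python
from math import exp, sqrt

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [exp(v - m) for v in xs]
    s = sum(exps)
    return [v / s for v in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q @ K^T / sqrt(d)) @ V."""
    d = len(K[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / sqrt(d) for k in K]
        weights = softmax(scores)            # attention distribution over keys
        out.append([sum(weights[i] * V[i][j] for i in range(len(V)))
                    for j in range(len(V[0]))])
    return out
```

A query that strongly matches one key pulls the output toward that key's value row, which is the "attend to the relevant token" intuition in miniature.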
Big Data Analytics reposted
Tivadar Danka @TivadarDanka
In machine learning, we use the dot product every day. However, its definition is far from revealing. For instance, what does it have to do with similarity? There is a beautiful geometric explanation behind it:
8 replies · 60 reposts · 514 likes · 24.5K views
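The geometric fact in question, dot(u, v) = |u||v|·cos θ, can be checked in a couple of lines (a minimal sketch):

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    # dot(u, v) = |u| * |v| * cos(theta); divide out the magnitudes
    # and what remains is cos(theta), a pure measure of direction
    return dot(u, v) / (sqrt(dot(u, u)) * sqrt(dot(v, v)))
```

Dividing out the magnitudes leaves cos θ, which is why the dot product underlies cosine similarity: 1 for parallel vectors, 0 for orthogonal ones, -1 for opposite ones.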
Big Data Analytics reposted
Matt Pocock @mattpocockuk
LLMs are just big language models. And language models are pretty easy to understand:
70 replies · 232 reposts · 2.1K likes · 232.7K views
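In the spirit of that claim, here is roughly the smallest possible language model: a bigram counter that predicts the most frequent next word (a toy sketch, of course, not how an LLM works internally):

```python
from collections import defaultdict, Counter

def train_bigram(text):
    """Count, for each word, how often each other word follows it."""
    words = text.split()
    model = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        model[a][b] += 1
    return model

def predict_next(model, word):
    """Predict the most frequent continuation seen in training."""
    if not model[word]:
        return None
    return model[word].most_common(1)[0][0]
```

An LLM replaces the lookup table with a neural network and single words with token contexts, but the job description is the same: predict the next token from what came before.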
Big Data Analytics reposted
Probability and Statistics @probnstat
Zero-sum games are where one player's gain is an opponent's loss. This is the core concept behind Generative Adversarial Networks (GANs) in machine learning. A "Generator" network wins by creating fake data that fools a "Discriminator," whose win is catching fakes. This constant, zero-sum competition forces both to improve, resulting in hyper-realistic AI-generated images. In real life, it models classic games like poker or chess and some financial market transactions. Image Source: share.google/n9k3cC5OYvIQ78…
0 replies · 13 reposts · 114 likes · 10.1K views
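The zero-sum property is easy to state in code. A minimal sketch with an arbitrary 2x2 payoff matrix (matching pennies, standing in here for the generator/discriminator game): whatever mixed strategies the players use, their expected payoffs cancel exactly.

```python
def expected_payoffs(row_payoff, p, q):
    """Expected payoff for each player of a zero-sum game.

    row_payoff[i][j] is the row player's payoff; the column player's
    payoff is its negation, so the two always sum to zero.
    p and q are the players' mixed strategies (probability vectors).
    """
    ev = sum(p[i] * q[j] * row_payoff[i][j]
             for i in range(len(p)) for j in range(len(q)))
    return ev, -ev
```

In GAN terms, the generator's "gain" (a fooled discriminator) is exactly the discriminator's loss, which is what drives both networks to keep improving.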
Big Data Analytics reposted
Probability and Statistics @probnstat
The math of LLMs is a fusion of three key areas:

1) Probability: They are massive statistical models that predict the next word based on the probability of what's come before.
2) Linear Algebra: All words and concepts are encoded as high-dimensional vectors (embeddings). The model's entire knowledge is stored in vast matrices of weights.
3) Calculus: The model learns by minimizing error using backpropagation, which is a massive application of the chain rule to calculate gradients and update every weight.

Image source: share.google/JviYUw1rp16poo…
22 replies · 279 reposts · 1.6K likes · 96.7K views
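Point 3 can be made concrete with a one-parameter toy model: minimize the squared error of w·x against a target by following the gradient the chain rule gives, d/dw (w·x - y)^2 = 2x(w·x - y). A sketch with a single weight instead of billions:

```python
def gradient_descent(x, y, w=0.0, lr=0.1, steps=100):
    """Fit w so that w * x approximates y, by gradient descent."""
    for _ in range(steps):
        grad = 2 * x * (w * x - y)   # chain rule: d/dw (w*x - y)^2
        w -= lr * grad               # step downhill along the gradient
    return w
```

Starting from w = 0 with x = 2 and target 6, the updates converge to w = 3; backpropagation is this same chain-rule bookkeeping applied to every weight in the network at once.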