Big Data Analytics

2.1K posts


@DatAnalyticsBCN

Big Data | Data Science | Analytics | Stats | IoT | Machine Learning | DataViz | Privacy | #BigData #FinTech #DataScience #IoT #ML #DataViz #AI 😀

Barcelona · Joined September 2015
2.5K Following · 1.9K Followers
Big Data Analytics reposted
Erick @ErickSky
A madman with a PhD built an open-source interactive visual encyclopedia for understanding how AI works. It's wild, go in and see for yourselves. Website: encyclopediaworld.github.io/howaiworks/ Repo in the first comment.
14 replies · 689 reposts · 4K likes · 167.5K views
Big Data Analytics reposted
Leontxo García @leontxogarcia
I follow AI news closely because of its strong connection to chess. This article by Ramón López de Mántaras strikes me as required reading for anyone interested: elpais.com/tecnologia/202…
3 replies · 8 reposts · 14 likes · 1.9K views
Big Data Analytics reposted
Joachim Schork @JoachimSchork
Simulated annealing is a powerful optimization technique inspired by the annealing process in metallurgy. It helps find solutions to complex problems by allowing the system to escape local optima, gradually improving over time.

✔️ Simulated annealing can be used to solve combinatorial problems, such as the traveling salesman problem, by finding the shortest possible route that connects a set of points.
✔️ It is effective for problems with large solution spaces and when traditional optimization methods struggle.
❌ The algorithm's performance heavily depends on the cooling schedule and temperature decay. Choosing the wrong parameters can lead to poor solutions or unnecessarily long computation times.
❌ Simulated annealing typically converges more slowly than other methods, such as gradient-based algorithms, making it less efficient for certain types of problems.

The visualization from Wikipedia demonstrates how simulated annealing is applied to the traveling salesman problem, optimizing a route to minimize the distance between 125 points. Source: en.wikipedia.org/wiki/Simulated…

🔹 In R, the GenSA package allows for effective global optimization without requiring derivatives, making it ideal for non-differentiable problems.
🔹 In Python, the simanneal library offers a simple and effective way to implement simulated annealing for large-scale combinatorial optimization problems.

For more insights on methods like simulated annealing, subscribe to my newsletter on Statistics, Data Science, R, and Python! Learn more: eepurl.com/gH6myT

#Python #R4DS #RStats #datascienceenthusiast #coding
4 replies · 15 reposts · 96 likes · 7.2K views
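The idea in the thread above can be sketched in plain, dependency-free Python (an illustrative from-scratch version for the traveling salesman problem, not the simanneal or GenSA APIs; the 2-opt move, starting temperature, and geometric cooling rate are all arbitrary choices for the sketch):

```python
import random
from math import dist, exp

def tour_length(points, tour):
    """Total length of a closed tour visiting the points in the given order."""
    return sum(dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def anneal_tsp(points, t0=10.0, cooling=0.995, steps=20000, seed=0):
    """Simulated annealing for TSP with 2-opt neighbor moves."""
    rng = random.Random(seed)
    n = len(points)
    tour = list(range(n))
    best = tour[:]
    t = t0
    for _ in range(steps):
        # propose a neighbor: reverse a random segment (2-opt move)
        i, j = sorted(rng.sample(range(n), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        delta = tour_length(points, cand) - tour_length(points, tour)
        # always accept improvements; accept worse tours with
        # probability exp(-delta / t), which shrinks as t cools
        if delta < 0 or rng.random() < exp(-delta / t):
            tour = cand
        if tour_length(points, tour) < tour_length(points, best):
            best = tour[:]
        t *= cooling  # geometric cooling schedule
    return best
```

Because worse tours are sometimes accepted while the temperature is high, the search can climb out of local optima; the geometric schedule `t *= cooling` is the simplest of the cooling strategies the thread warns about tuning.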
Big Data Analytics reposted
Massimo @Rainmaker1973
In 1963, the U.S. introduced ZIP codes, with the Postal Service explaining what the numbers mean and how to use them.
50 replies · 971 reposts · 4.4K likes · 191.3K views
Big Data Analytics reposted
François Chollet @fchollet
To perfectly understand a phenomenon is to perfectly compress it, to have a model of it that cannot be made any simpler. If a DL model requires millions of parameters to model something that can be described by a differential equation with three terms, it has not really understood it; it has merely cached the data.
161 replies · 154 reposts · 1.6K likes · 122.6K views
Big Data Analytics reposted
Akshay 🚀 @akshay_pachaar
K-Means has two major problems:
- The number of clusters must be known in advance
- It doesn't handle outliers
But there's a solution! Introducing DBSCAN, a density-based clustering algorithm. 🚀 Read more... 👇
4 replies · 93 reposts · 507 likes · 44.7K views
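The linked thread isn't shown here, but the core DBSCAN idea fits in a short from-scratch sketch (plain Python, not sklearn's implementation; `eps` is the neighborhood radius and `min_pts` the density threshold):

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster id per point (-1 = noise)."""
    n = len(points)
    labels = [None] * n          # None = not yet visited

    def neighbors(i):
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:   # not a core point
            labels[i] = -1        # provisionally noise
            continue
        labels[i] = cluster       # start a new cluster from this core point
        seeds = [j for j in nbrs if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:   # noise reachable from a core point: border
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:    # j is a core point too: keep expanding
                seeds.extend(jn)
        cluster += 1
    return labels
```

Note how it addresses both complaints about K-Means: the number of clusters emerges from the data's density, and points in no dense region are labeled noise instead of being forced into a cluster.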
Big Data Analytics reposted
Dr Kareem Carr @kareem_carr
statistics is cool
33 replies · 353 reposts · 2.1K likes · 62K views
Big Data Analytics reposted
Fy 🪷 @__Fiamy
MA @0kktsu0

@__Fiamy I see you're refuting all of us who reached the same mathematical conclusion, that it's a matrix, and I can see you won't budge from your position... I'd rather step aside and stick to what I know, while respecting your opinion, because I'm fine with not always being right.

31 replies · 168 reposts · 2.9K likes · 126.7K views
Big Data Analytics reposted
Valeriy M., PhD, MBA, CQF @predict_addict
🚀 Mastering Boosting: See Functional Gradient Descent in Action

If you work in data science, one of the best ways to really understand an algorithm is to implement it from scratch. Gradient Boosting Decision Trees (GBDT) – the engine behind CatBoost, XGBoost, and LightGBM – are often described as "just an ensemble of trees," but their real power comes from the optimization process behind them: Functional Gradient Descent. The video below visually walks through this idea step by step. 👇

🎬 What the Animation Shows

We follow a simple Gradient Boosting Regressor as it learns to fit a noisy, non-linear dataset:

1. Iteration 0 – The Starting Point. The model's prediction (red line) is flat: it starts as a constant equal to the mean of the target values Y. This is the initial function F0(x).
2. Gradient Step – Computing Pseudo-Residuals. For squared error loss, these residuals are exactly the negative gradient of the loss with respect to the current model's predictions.
3. Weak Learner – Fitting a Tree to the Errors. A shallow decision tree h_m is then fit to these residuals. This tree learns where the current model is making the largest errors and how to correct them.
4. Update Step – Correcting the Model. The ensemble is updated by adding a scaled version of this new tree: F_m(x) = F_{m-1}(x) + ν·h_m(x), where ν is the learning rate.

As the video plays, you see the red prediction curve F_M gradually evolve from a flat line into a flexible function that closely tracks the underlying data.

📚 Want to Go Deeper with CatBoost?

If you'd like to turn this intuition into production-grade skills with modern gradient boosting, check out Mastering CatBoost Pro: 👉 valeman.gumroad.com/l/MasteringCat… Perfect if you want to truly understand what's happening under the hood of boosting models, not just call .fit() and hope for the best.
0 replies · 26 reposts · 204 likes · 11.1K views
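Steps 1 through 4 can be sketched from scratch in pure Python. To keep the sketch dependency-free, the weak learners below are depth-1 regression stumps rather than real decision trees, and the single 1-D feature, learning rate, and round count are arbitrary illustrative choices:

```python
def fit_stump(x, residuals):
    """Depth-1 'tree': the single split on x minimizing squared error."""
    best = None
    order = sorted(range(len(x)), key=lambda i: x[i])
    for k in range(1, len(x)):
        thr = (x[order[k - 1]] + x[order[k]]) / 2
        left = [residuals[i] for i in range(len(x)) if x[i] <= thr]
        right = [residuals[i] for i in range(len(x)) if x[i] > thr]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda v: lm if v <= thr else rm

def gbdt_fit(x, y, n_rounds=100, lr=0.1):
    """Gradient boosting for squared error, with stumps as weak learners."""
    f0 = sum(y) / len(y)                  # step 1: F0 = mean of the targets
    stumps, pred = [], [f0] * len(x)
    for _ in range(n_rounds):
        resid = [y[i] - pred[i] for i in range(len(x))]  # step 2: pseudo-residuals
        h = fit_stump(x, resid)                          # step 3: fit weak learner
        stumps.append(h)
        pred = [pred[i] + lr * h(x[i]) for i in range(len(x))]  # step 4: update
    return lambda v: f0 + lr * sum(h(v) for h in stumps)
```

Each round fits a stump to the current residuals and adds a learning-rate-scaled copy to the ensemble, which is exactly the F_m = F_{m-1} + ν·h_m update the animation illustrates.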
Big Data Analytics reposted
Alec Helbling @alec_helbling
Many dimensionality reduction algorithms share a few central principles:

1. Construct a graph that captures the data's local structure
2. Measure "geodesic" distances between points using the graph
3. Project the points to a lower dimension while preserving these distances
21 replies · 100 reposts · 803 likes · 88.6K views
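The first two steps can be sketched with only the standard library (an illustrative fragment; `knn_graph` and `geodesic_distances` are made-up names, and step 3, the projection, typically needs an eigendecomposition such as classical MDS, which is omitted to keep the sketch short):

```python
from math import dist, inf

def knn_graph(points, k):
    """Step 1: symmetric k-nearest-neighbor graph as a weight matrix."""
    n = len(points)
    adj = [[inf] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 0.0
        nearest = sorted(range(n),
                         key=lambda j: dist(points[i], points[j]))[1:k + 1]
        for j in nearest:
            w = dist(points[i], points[j])
            adj[i][j] = adj[j][i] = min(adj[i][j], w)
    return adj

def geodesic_distances(adj):
    """Step 2: all-pairs shortest paths (Floyd-Warshall) approximate geodesics."""
    n = len(adj)
    d = [row[:] for row in adj]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d
```

For points sampled along a curved manifold, the graph distance between two ends follows the curve rather than cutting straight across, which is the whole point of measuring distances through the graph.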
Big Data Analytics reposted
François Chollet @fchollet
The ladder of intelligence is the ladder of abstraction.

L1: Memorizing answers (no generalization)
L2: Interpolative retrieval of answers, pattern matching, memorizing answer-generating rules (local generalization)
L3: Synthesizing causal rules on the fly (strong generalization)
L4: Discovering general principles, metacognition (extreme generalization)

To achieve compounding AI you need to reach L4.
114 replies · 361 reposts · 2.9K likes · 197.9K views
Big Data Analytics reposted
Probability and Statistics @probnstat
Here's a probability puzzle that breaks everyone's brain: How many people do you need in a room for a >50% chance that at least two of them share a birthday? What's your guess? 100? 150? 183? The answer is shockingly small. [1/5]
4 replies · 5 reposts · 58 likes · 6.3K views
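The arithmetic behind the punchline is easy to check directly: multiply the chances that each successive person misses all earlier birthdays, assuming 365 equally likely days (a standard simplification that ignores leap years):

```python
def birthday_collision_prob(n):
    """P(at least two of n people share a birthday), 365 equally likely days."""
    p_all_distinct = 1.0
    for k in range(n):
        # person k+1 must land on one of the 365 - k unused days
        p_all_distinct *= (365 - k) / 365
    return 1 - p_all_distinct
```

The probability first exceeds 50% at just 23 people.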
Big Data Analytics reposted
krupa @krupaad
I spent months illustrating how Transformers actually work. Not just what they do, but why they’re built this way. The history, design choices, and intuition behind every layer. From RNNs → Attention → Multi-Head → FFNs → Positional Encoding. Here's everything I wish I knew:
70 replies · 373 reposts · 3.3K likes · 242.8K views
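One of those layers, scaled dot-product attention, fits in a few lines of dependency-free Python (a toy sketch over plain lists, nothing like a real Transformer implementation):

```python
from math import exp, sqrt

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [exp(v - m) for v in xs]
    s = sum(exps)
    return [v / s for v in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q @ K^T / sqrt(d)) @ V."""
    d = len(K[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / sqrt(d) for k in K]
        weights = softmax(scores)            # attention distribution over keys
        out.append([sum(weights[i] * V[i][j] for i in range(len(V)))
                    for j in range(len(V[0]))])
    return out
```

A query that strongly matches one key pulls the output toward that key's value row, which is the "attend to the relevant token" intuition in miniature.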
Big Data Analytics reposted
Tivadar Danka @TivadarDanka
In machine learning, we use the dot product every day. However, its definition is far from revealing. For instance, what does it have to do with similarity? There is a beautiful geometric explanation behind it:
8 replies · 60 reposts · 514 likes · 24.5K views
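The geometric fact in question, dot(u, v) = |u||v|·cos θ, can be checked in a couple of lines (a minimal sketch):

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    # dot(u, v) = |u| * |v| * cos(theta); divide out the magnitudes
    # and what remains is cos(theta), a pure measure of direction
    return dot(u, v) / (sqrt(dot(u, u)) * sqrt(dot(v, v)))
```

Dividing out the magnitudes leaves cos θ, which is why the dot product underlies cosine similarity: 1 for parallel vectors, 0 for orthogonal ones, -1 for opposite ones.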
Big Data Analytics reposted
Matt Pocock @mattpocockuk
LLMs are just big language models. And language models are pretty easy to understand:
70 replies · 232 reposts · 2.1K likes · 232.7K views
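In the spirit of that claim, here is roughly the smallest possible language model: a bigram counter that predicts the most frequent next word (a toy sketch, of course, not how an LLM works internally):

```python
from collections import defaultdict, Counter

def train_bigram(text):
    """Count, for each word, how often each other word follows it."""
    words = text.split()
    model = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        model[a][b] += 1
    return model

def predict_next(model, word):
    """Predict the most frequent continuation seen in training."""
    if not model[word]:
        return None
    return model[word].most_common(1)[0][0]
```

An LLM replaces the lookup table with a neural network and single words with token contexts, but the job description is the same: predict the next token from what came before.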
Big Data Analytics reposted
Probability and Statistics @probnstat
Zero-sum games are where one player's gain is an opponent's loss. This is the core concept behind Generative Adversarial Networks (GANs) in machine learning. A "Generator" network wins by creating fake data that fools a "Discriminator," whose win is catching fakes. This constant, zero-sum competition forces both to improve, resulting in hyper-realistic AI-generated images. In real life, it models classic games like poker or chess and some financial market transactions. Image Source: share.google/n9k3cC5OYvIQ78…
0 replies · 13 reposts · 114 likes · 10.1K views
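The zero-sum property is easy to state in code. A minimal sketch with an arbitrary 2x2 payoff matrix (matching pennies, standing in here for the generator/discriminator game): whatever mixed strategies the players use, their expected payoffs cancel exactly.

```python
def expected_payoffs(row_payoff, p, q):
    """Expected payoff for each player of a zero-sum game.

    row_payoff[i][j] is the row player's payoff; the column player's
    payoff is its negation, so the two always sum to zero.
    p and q are the players' mixed strategies (probability vectors).
    """
    ev = sum(p[i] * q[j] * row_payoff[i][j]
             for i in range(len(p)) for j in range(len(q)))
    return ev, -ev
```

In GAN terms, the generator's "gain" (a fooled discriminator) is exactly the discriminator's loss, which is what drives both networks to keep improving.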
Big Data Analytics reposted
Probability and Statistics @probnstat
The math of LLMs is a fusion of three key areas:

1) Probability: They are massive statistical models that predict the next word based on the probability of what's come before.
2) Linear Algebra: All words and concepts are encoded as high-dimensional vectors (embeddings). The model's entire knowledge is stored in vast matrices of weights.
3) Calculus: The model learns by minimizing error using backpropagation, which is a massive application of the chain rule to calculate gradients and update every weight.

Image source: share.google/JviYUw1rp16poo…
22 replies · 279 reposts · 1.6K likes · 96.7K views
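Point 3 can be made concrete with a one-parameter toy model: minimize the squared error of w·x against a target by following the gradient the chain rule gives, d/dw (w·x - y)^2 = 2x(w·x - y). A sketch with a single weight instead of billions:

```python
def gradient_descent(x, y, w=0.0, lr=0.1, steps=100):
    """Fit w so that w * x approximates y, by gradient descent."""
    for _ in range(steps):
        grad = 2 * x * (w * x - y)   # chain rule: d/dw (w*x - y)^2
        w -= lr * grad               # step downhill along the gradient
    return w
```

Starting from w = 0 with x = 2 and target 6, the updates converge to w = 3; backpropagation is this same chain-rule bookkeeping applied to every weight in the network at once.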