DataInterview

45 posts

@datainterview

🚀 Land Dream Data, Quant & AI Jobs on https://t.co/B83Otkqc2r ✍️ 1000+ interview questions 👨‍💻 Quant / Data / ML / AI Interview Courses 📚 Coding Problems

New York City · Joined February 2019
1 Following · 10 Followers
DataInterview @datainterview
Here's a Pandas cheatsheet for interviews. 👋 Let's explore together ↓

📥 I/O & Creation
• Read CSV, Parquet, JSON, Excel
• Build DataFrames from dicts or NumPy
• Inspect with .shape, .dtypes, .describe()
• Cast types, sort values, drop duplicates

🔀 Select & Transform
• loc vs iloc indexing
• Boolean masks and .query() filtering
• .apply(), .assign(), .pipe() chains
• String methods, DateTime accessors
• Rename, drop, reindex columns

📊 Aggregate & Reshape
• GroupBy with named agg
• Merge, concat, join
• Pivot tables and melt
• Handle missing data (fillna, interpolate)
• Rolling windows and pct_change()
• Multi-index slicing with .xs()

Save this for your next interview.
👉 Land Data, Quant, AI jobs on datainterview.com
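A minimal sketch of a few of these moves on a toy frame (the region/product/units/price columns are hypothetical, chosen just to exercise .query(), .assign(), named GroupBy aggregation, and pivot_table):

```python
import pandas as pd

# Toy sales data (hypothetical columns, just to exercise the cheatsheet)
df = pd.DataFrame({
    "region": ["east", "west", "east", "west", "east"],
    "product": ["a", "a", "b", "b", "b"],
    "units": [10, 4, 7, 12, 3],
    "price": [2.5, 2.5, 4.0, 4.0, 4.0],
})

# Select & transform: boolean filter via .query(), derived column via .assign()
big = df.query("units > 5").assign(revenue=lambda d: d.units * d.price)

# Aggregate: GroupBy with named aggregation
summary = df.groupby("region").agg(
    total_units=("units", "sum"),
    avg_price=("price", "mean"),
)

# Reshape: pivot table of units by region x product
pivot = df.pivot_table(index="region", columns="product", values="units", aggfunc="sum")

print(big, summary, pivot, sep="\n\n")
```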
DataInterview @datainterview
What is the Sharpe Ratio? (in ML interviews) 👋 Let's learn together ↓

The Sharpe Ratio measures excess return per unit of risk. It tells you how much extra return you get for taking on volatility. Higher is better.

A portfolio with Sharpe = 2 earns twice the excess return per unit of risk compared to one with Sharpe = 1. Think of it as return efficiency. You want the most bang for your buck in risk.

📐 The formula:
S = (Rp - Rf) / σp
Where:
Rp → portfolio return
Rf → risk-free rate (T-bills, typically)
σp → portfolio standard deviation (volatility)
The numerator is excess return. The denominator is total risk.

⚡ How to calculate it:
① Get your portfolio returns over time
② Subtract the risk-free rate from each return
③ Calculate the mean of those excess returns
④ Calculate the standard deviation of returns
⑤ Divide mean excess return by std dev
For daily data, annualize by multiplying by √252 (trading days).

🧠 How is it different from the Sortino Ratio?
Sharpe uses total volatility (upside and downside) as the risk measure. It assumes you care equally about all deviations.
Sortino only penalizes downside deviation. It ignores upside volatility because big gains aren't really "risk."
Sortino is better when returns are asymmetric or you only care about losses.

🎯 Interpreting the value:
< 0 → losing money after accounting for the risk-free rate
0 to 1 → okay, but not great compensation for risk
1 to 2 → good, strong risk-adjusted returns
> 2 → excellent, rare and hard to sustain

✍️ When to use the Sharpe Ratio: when comparing portfolios or strategies with different risk profiles, or when you need a single number to rank investment performance.

👉 Land Data & AI jobs on datainterview.com
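A quick numpy sketch of steps ①-⑤ above on simulated daily returns (the 0.05% mean / 1% vol parameters are made up for illustration):

```python
import numpy as np

def sharpe_ratio(returns, rf_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from periodic returns (steps ①-⑤)."""
    excess = np.asarray(returns) - rf_daily        # ② subtract risk-free rate
    mean, std = excess.mean(), excess.std(ddof=1)  # ③④ mean and std dev
    return (mean / std) * np.sqrt(periods_per_year)  # ⑤ plus √252 annualization

rng = np.random.default_rng(0)
daily = rng.normal(0.0005, 0.01, size=252)  # one year of simulated daily returns
print(sharpe_ratio(daily))
```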
DataInterview @datainterview
What is Correlation vs Causation? (in ML interviews) 👋 Let's learn together ↓

Correlation measures how two variables move together. Causation means one actually drives the other.

Two variables can have perfect correlation (r = 0.98) without any causal link. Ice cream sales and drownings both rise in summer, but neither causes the other.

The difference matters. If you intervene on one variable, does the other change? That's the test.

📐 The correlation formula:
Corr(X,Y) = Cov(X,Y) / (σx × σy)
Where:
Cov(X,Y) → how X and Y vary together
σx, σy → standard deviations
Result ranges from -1 to +1
This is symmetric. Corr(X,Y) = Corr(Y,X). No direction implied.

🔍 Three types of relationships:
① Causal: X directly causes Y (proven via RCT or natural experiment)
② Confounded: hidden variable Z drives both X and Y (summer heat → ice cream + drowning)
③ Spurious: pure coincidence (cheese consumption tracks PhDs, no mechanism)

⚡ Causal thinking (Pearl's do-calculus):
E[Y | do(X = x)] ≠ E[Y]
This asks: if I intervene and set X to x, does Y change? Requires ruling out confounders and reverse causality. Directional, not symmetric.

🧠 Key pitfalls to watch:
Reverse causality → maybe Y causes X, not the other way
Simpson's paradox → trend flips when you split by subgroups
Confounders → hidden variables create fake associations

🎯 How to establish causation:
RCTs → randomize to remove confounders
Natural experiments → exploit external shocks
DAGs → graph the causal structure
Diff-in-diff → compare treated vs control, before/after

✍️ When to use causal reasoning: when you need to predict the effect of an intervention, not just observe patterns. Correlation finds signals. Causation tells you what happens when you act.

👉 Land Data & AI jobs on datainterview.com
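A small numpy simulation of the confounded case ② above: a hidden "summer heat" variable drives both series, so correlation is high, yet intervening on one (a crude stand-in for the do-operator: overwrite X with fresh random values) leaves the other unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
summer_heat = rng.normal(size=10_000)                            # hidden confounder Z
ice_cream = summer_heat + rng.normal(scale=0.3, size=10_000)     # X = Z + noise
drownings = summer_heat + rng.normal(scale=0.3, size=10_000)     # Y = Z + noise

# X and Y are strongly correlated even though neither causes the other
print(np.corrcoef(ice_cream, drownings)[0, 1])   # ~0.9

# "Intervene" on X: set it independently of Z, mimicking do(X = x)
ice_cream_do = rng.normal(size=10_000)
print(np.corrcoef(ice_cream_do, drownings)[0, 1])  # ~0: Y does not respond
```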
DataInterview @datainterview
What is PCA for Returns? (in quant interviews) 👋 Let's learn together ↓

PCA for Returns decomposes asset return covariance into orthogonal risk factors.

Instead of tracking hundreds of correlated stocks, you extract a few independent components that explain most of the variance. PC1 captures market moves, PC2 captures sector tilts, PC3 captures size effects.

This turns a messy correlation matrix into clean, interpretable risk drivers.

📐 The decomposition:
Σ = W Λ Wᵀ
Where:
Σ → return covariance matrix
W → eigenvectors (factor loadings)
Λ → eigenvalues (variance explained per PC)
Each return projects as: r_t ≈ Σₖ wₖ f(k,t) for k = 1 to K

💪 How it works:
① Compute covariance matrix from historical returns
② Solve eigenvalue problem to get W and Λ
③ Sort components by eigenvalue (largest first)
④ Keep top K components that hit your variance threshold (often 70-90%)
⑤ Project returns onto these K factors

🧠 How is it different from Factor Models?
Factor models (Fama-French, Barra) use predefined economic factors like value, momentum, or industry. PCA discovers factors purely from data. No economic labels.
PC1 usually ends up being market beta, but PC2 and PC3 are statistical constructs you have to interpret after the fact.
Factor models are easier to explain. PCA is more flexible and captures whatever actually drives your returns.

✍️ When to use PCA for Returns: when you need dimensionality reduction for portfolio risk, want to build factor models without economic priors, or need to detect regime changes in correlation structure.

👉 Land Data & AI jobs on datainterview.com
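A numpy sketch of steps ①-⑤ on simulated returns with one common market factor (all parameters are invented; eigh is used because Σ is symmetric):

```python
import numpy as np

rng = np.random.default_rng(2)
# Simulated returns: 500 days x 10 assets sharing a common "market" factor
market = rng.normal(0, 0.01, size=(500, 1))
returns = market + rng.normal(0, 0.005, size=(500, 10))

cov = np.cov(returns, rowvar=False)        # ① covariance matrix Σ
eigvals, eigvecs = np.linalg.eigh(cov)     # ② solve Σ = W Λ Wᵀ
order = np.argsort(eigvals)[::-1]          # ③ sort by eigenvalue, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()
k = np.searchsorted(np.cumsum(explained), 0.90) + 1  # ④ PCs needed for 90% variance
factors = returns @ eigvecs[:, :k]         # ⑤ project returns onto the K factors
print(f"PC1 explains {explained[0]:.0%}; keeping {k} components")
```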
DataInterview @datainterview
What is Momentum Trading? (in Quant interviews) 👋 Let's learn together ↓

Momentum is a trading strategy that buys assets trending up and sells assets trending down.

The idea: past winners keep winning, past losers keep losing. This persistence happens because investors underreact to news and adjust prices slowly over 6-9 months. You're betting on continuation, not reversal.

📐 The signal:
MOMₜ = (Pₜ / Pₜ₋ₙ) - 1
Where:
Pₜ → current price
Pₜ₋ₙ → price n periods ago (typically 12 months)
MOMₜ → percentage return over lookback window

💪 How it works:
① Calculate returns over lookback period (e.g., 12 months)
② Rank all assets by their momentum score
③ Go long top decile (winners), short bottom decile (losers)
④ Hold for 1 month, then rebalance
⑤ Skip the most recent month to avoid short-term reversals

🧠 How is it different from Mean Reversion?
Mean reversion bets prices return to average after moving too far. You buy losers and sell winners. Momentum does the opposite. You buy winners and sell losers, betting the trend continues.
Mean reversion works on shorter horizons (days to weeks). Momentum works on medium horizons (3-12 months). They can coexist: prices reverse short-term but trend medium-term.

✍️ When to use Momentum: when you believe behavioral persistence drives returns and you can handle sharp crashes during market reversals (like 2009).

👉 Land Data & AI jobs on datainterview.com
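A pandas sketch of the ranking recipe above on hypothetical monthly prices, using the common "12-1" convention (12-month lookback, most recent month skipped):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
# Hypothetical monthly prices: 24 months x 50 assets (random walk)
prices = pd.DataFrame(100 * np.exp(np.cumsum(rng.normal(0.005, 0.05, (24, 50)), axis=0)))

lookback = 12
mom = prices.iloc[-2] / prices.iloc[-2 - lookback] - 1  # ①⑤ 12-1 momentum, skip last month
ranks = mom.rank(pct=True)                              # ② cross-sectional percentile rank

longs = ranks[ranks >= 0.9].index   # ③ long the top decile (winners)
shorts = ranks[ranks <= 0.1].index  #   short the bottom decile (losers)
print(len(longs), "longs,", len(shorts), "shorts")
```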
DataInterview @datainterview
What is Monte Carlo Simulation? (in data & quant interviews) 👋 Let's learn together ↓

Monte Carlo is a numerical method that approximates expectations and integrals through repeated random sampling.

You draw thousands (or millions) of random samples from a distribution, apply a function to each, then average the results. This gives you an estimate of the true expectation. Works when analytical solutions are impossible or too complex.

📐 The estimator:
μ̂ₙ = (1/N) × Σ f(Xᵢ), where Xᵢ ~ p(x)
Where:
N → number of random samples
f(Xᵢ) → function applied to each sample
p(x) → target distribution you're sampling from
μ̂ₙ → estimate of E[f(X)]

⚡ How it works:
① Define the problem as an expectation E[f(X)]
② Draw N independent samples from distribution p(x)
③ Evaluate f(Xᵢ) for each sample
④ Average all results to get your estimate
⑤ Increase N to reduce error

🧠 How is it different from numerical integration?
Grid-based methods (like quadrature) evaluate the function at fixed points and struggle in high dimensions. Monte Carlo uses random sampling, so error decreases at rate 1/√N regardless of dimension. This makes it better for problems with many variables. Trade-off: slower convergence but scales to any dimension.

✍️ When to use Monte Carlo: when you need to estimate expectations in high dimensions, price options, simulate physical systems, or solve integrals with no closed form.

👉 Land Data & AI jobs on datainterview.com
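A minimal numpy example of the estimator above: estimating E[X²] for X ~ N(0, 1), whose true value is 1, along with the 1/√N standard error:

```python
import numpy as np

rng = np.random.default_rng(4)

# ① Target: E[f(X)] with f(x) = x² and X ~ N(0, 1); true answer is 1
N = 1_000_000
samples = rng.standard_normal(N)   # ② draw N samples from p(x)
estimate = np.mean(samples**2)     # ③④ evaluate f and average

# ⑤ Standard error shrinks at rate 1/√N regardless of dimension
stderr = np.std(samples**2, ddof=1) / np.sqrt(N)
print(f"{estimate:.4f} ± {stderr:.4f}")
```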
DataInterview @datainterview
What is Gradient Descent? (in ML interviews) 👋 Let's learn together ↓

Gradient Descent is an iterative optimization algorithm that finds the minimum of a loss function. It works by repeatedly taking steps in the direction opposite to the gradient.

Think of it like walking downhill in fog. You can't see the bottom, but you can feel which way is steepest and move that direction. The gradient points uphill (steepest ascent), so we subtract it to go downhill.

📐 The update rule:
θₜ₊₁ = θₜ - α∇J(θₜ)
Where:
θₜ → current parameters
α → learning rate (step size)
∇J(θₜ) → gradient of loss function at current position

⚡ How it works:
① Start with random parameter values
② Compute gradient of loss with respect to parameters
③ Update parameters by moving opposite to gradient
④ Repeat until loss stops decreasing (convergence)
The learning rate controls how big each step is. Too large and you overshoot the minimum. Too small and training takes forever.

🔍 How is it different from Stochastic GD?
Batch GD uses all N samples per update. Stable but slow.
Stochastic GD uses one random sample per update. Noisy but fast.
Mini-batch GD uses 32-512 samples. Best of both worlds and the default in practice.

✍️ When to use Gradient Descent: when you need to minimize a differentiable loss function and can compute gradients. It's the foundation of training neural networks.

👉 Land Data & AI jobs on datainterview.com
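A bare-bones numpy implementation of the update rule on least-squares linear regression (the true weights [3, -2] are invented for the demo):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + rng.normal(scale=0.1, size=200)

theta = np.zeros(2)    # ① initial parameter values
alpha = 0.1            # learning rate α
for _ in range(500):   # ④ repeat until convergence
    grad = 2 / len(y) * X.T @ (X @ theta - y)  # ② ∇J for mean squared error
    theta -= alpha * grad                      # ③ step opposite the gradient

print(theta)  # ≈ [3, -2]
```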
DataInterview @datainterview
What is Survival Analysis? (in data & quant interviews) 👋 Let's learn together ↓

Survival analysis models time-to-event data with censoring. It predicts when events happen, not just whether they will. The event could be death, churn, machine failure, or loan default.

The key challenge: some observations are censored (we don't see the event yet). This makes standard regression fail. You need methods that handle incomplete information.

📐 The core functions:
S(t) = P(T > t) = e^(-∫h(u)du)
Where:
S(t) → survival function (probability of surviving past time t)
h(t) → hazard function (instantaneous failure rate at time t)
T → time until event
The hazard tells you risk at each moment. The survival function tells you cumulative probability of making it past t.

💪 Kaplan-Meier estimator (non-parametric):
S(t) = product of (1 - dᵢ/nᵢ) for all event times ≤ t
Where:
dᵢ → number of events at time i
nᵢ → number at risk just before time i
① Sort event times
② At each event time, calculate proportion surviving
③ Multiply survival proportions together
④ Result: step function dropping at each event
Handles right-censored data naturally by adjusting the risk set.

⚡ Cox Proportional Hazards (semi-parametric):
h(t|X) = h₀(t) × e^(β'X)
Where:
h₀(t) → baseline hazard (unspecified)
β'X → linear combination of covariates
Key assumption: hazard ratios are constant over time. If someone has twice the risk at t=1, they have twice the risk at t=10.
You don't estimate h₀(t) directly. You estimate β coefficients via partial likelihood.

🧠 How is it different from logistic regression?
Logistic regression predicts binary outcomes (event yes/no) and ignores timing. It can't handle censoring properly.
Survival analysis predicts when events occur and uses all available information from censored cases. It models the time dimension explicitly.
Cox models also give you hazard ratios, which are easier to interpret than odds ratios for time-varying risk.

🎯 Common evaluation metrics:
C-index → concordance between predicted risk and observed order (like AUC for survival). 1.0 is perfect, 0.5 is random.
Log-rank test → compares survival curves between groups. Tests if two Kaplan-Meier curves differ significantly.
Brier score → time-dependent prediction accuracy combining calibration and discrimination.

✍️ When to use survival analysis: when you care about time until an event and have censored observations (customer churn, patient survival, equipment failure, loan default).

👉 Land Data & AI jobs on datainterview.com
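A from-scratch numpy sketch of the Kaplan-Meier steps ①-④ above (the durations and censoring flags are invented; a real project would more likely reach for a survival library such as lifelines):

```python
import numpy as np

def kaplan_meier(times, observed):
    """S(t) as a step function: product of (1 - d_i / n_i) over event times."""
    times, observed = np.asarray(times), np.asarray(observed, bool)
    surv, points = 1.0, []
    for t in np.unique(times[observed]):      # ① sorted distinct event times
        n_at_risk = np.sum(times >= t)        # n_i: at risk just before t
        d = np.sum((times == t) & observed)   # d_i: events at t
        surv *= 1 - d / n_at_risk             # ②③ multiply survival proportions
        points.append((t, surv))              # ④ step function drops here
    return points

# Durations in months; observed=0 means right-censored (no event seen yet)
durations = [2, 3, 3, 5, 8, 8, 12, 14]
observed = [1, 1, 0, 1, 1, 0, 1, 0]
for t, s in kaplan_meier(durations, observed):
    print(f"S({t}) = {s:.3f}")
```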
DataInterview @datainterview
What is Bayes' Theorem? (in ML interviews) 👋 Let's learn together ↓

Bayes' Theorem is a formula for updating beliefs with evidence.

It tells you how to revise your initial belief (prior) after seeing new data. This is the foundation of probabilistic reasoning in machine learning. You start with what you know, observe evidence, and calculate an updated belief (posterior).

📐 The formula:
P(A | B) = P(B | A) × P(A) / P(B)
Where:
P(A | B) → posterior (updated belief after seeing B)
P(B | A) → likelihood (probability of evidence given hypothesis)
P(A) → prior (initial belief before evidence)
P(B) → marginal evidence (total probability of observing B)
The denominator normalizes so probabilities sum to 1 across all hypotheses.

⚡ How it works (medical test example):
① Disease prevalence is 1% (prior)
② Test has 90% sensitivity and 5% false positive rate (likelihoods)
③ You test positive
④ Apply Bayes: P(sick | positive) = (0.90 × 0.01) / (0.90 × 0.01 + 0.05 × 0.99)
⑤ Result: only 15.4% chance you are actually sick
Most positive tests are false alarms when the disease is rare. The base rate matters more than you think.

🎯 Posterior odds form (useful for sequential updates):
P(A|B) / P(¬A|B) = [P(B|A) / P(B|¬A)] × [P(A) / P(¬A)]
This separates the update signal (likelihood ratio) from your prior belief (prior odds). Makes sequential updates cleaner.

🧠 How is it different from frequentist inference?
Frequentist methods treat parameters as fixed unknowns and use p-values. No prior beliefs allowed.
Bayesian methods treat parameters as random variables with distributions. You start with a prior, update with data, and get a posterior distribution over possible values.
Bayesian gives you probability statements about parameters. Frequentist gives you long-run error rates.

💪 Key assumptions:
You need a prior (subjective or empirical). Events must be exhaustive and mutually exclusive. Naive Bayes assumes conditional independence of features given the class (rarely true, but works anyway).

🔍 Practical tips:
Use log-odds to avoid numerical underflow with many features. Conjugate priors (like Beta for Bernoulli) make posterior updates analytically tractable. Always account for base rates or you will overweight the likelihood.

✍️ When to use Bayes' Theorem: for Naive Bayes classifiers (text, spam filtering), Bayesian optimization (hyperparameter tuning), and Bayesian neural networks (uncertainty estimation). Anytime you want to quantify uncertainty or update beliefs with new data.

👉 Land Data & AI jobs on datainterview.com
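The medical test example above as a few lines of Python, including a sequential update where the first posterior is fed back in as the new prior:

```python
def posterior(prior, sensitivity, false_pos):
    """P(sick | positive) via Bayes' theorem."""
    evidence = sensitivity * prior + false_pos * (1 - prior)  # P(positive)
    return sensitivity * prior / evidence

# Example from the thread: 1% prevalence, 90% sensitivity, 5% false positive rate
first = posterior(0.01, 0.90, 0.05)
print(first)                          # ≈ 0.154

# Sequential update: a second independent positive test, posterior becomes prior
print(posterior(first, 0.90, 0.05))  # belief rises much higher
```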
DataInterview @datainterview
What is the Beta Distribution? (in data & quant interviews) 👋 Let's learn together ↓

The Beta Distribution is a flexible probability distribution for values between 0 and 1.

It models probabilities themselves. Think conversion rates, click-through rates, or success proportions. Two shape parameters (α and β) control whether the distribution peaks near 0, near 1, or somewhere in between.

It's the go-to distribution when you need to represent uncertainty about a probability.

📐 The formula:
f(x; α, β) = [x^(α-1) × (1-x)^(β-1)] / B(α, β)
Where:
x → value between 0 and 1
α → shape parameter (pseudo-count of successes)
β → shape parameter (pseudo-count of failures)
B(α, β) → Beta function normalizing constant

⚡ How the parameters work:
① α = β = 1 gives you a uniform distribution (flat line)
② α > β pushes density toward 1 (more successes)
③ β > α pushes density toward 0 (more failures)
④ Larger α + β makes the distribution tighter (more confident)

🎯 How is it different from the Binomial?
Binomial models the number of successes in n trials (discrete outcomes). Beta models the probability of success itself (continuous between 0 and 1).
Beta is actually the conjugate prior for the Binomial. After observing data, your posterior is still Beta with updated parameters.

✍️ When to use Beta: when you need to model uncertainty about a rate, proportion, or probability. Common in Bayesian A/B testing, conversion rate estimation, and Thompson Sampling for bandits.

👉 Land Data & AI jobs on datainterview.com
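A short scipy sketch of the conjugate update described above (the 42 conversions out of 200 visitors are invented for illustration):

```python
from scipy import stats

# Bayesian A/B testing: start from a uniform prior Beta(1, 1)
alpha, beta = 1, 1

# Observe 42 conversions out of 200 visitors → posterior is Beta(α+42, β+158)
alpha, beta = alpha + 42, beta + 158
post = stats.beta(alpha, beta)

print(post.mean())          # ≈ 0.213, point estimate of the conversion rate
print(post.interval(0.95))  # 95% credible interval for the rate
```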
DataInterview @datainterview
What is GARCH Modeling? (in Quant interviews) 👋 Let's learn together ↓

GARCH is a time series model for volatility clustering. It captures how variance changes over time in financial returns.

Big shocks create periods of high volatility that persist, then slowly decay back to normal. Think: market turbulence after a crash, where wild swings cluster together before calming down.

📐 The model (GARCH(1,1)):
σ²ₜ = ω + α·ε²ₜ₋₁ + β·σ²ₜ₋₁
Where:
σ²ₜ → conditional variance at time t
ω → long-run baseline variance
α → ARCH term (impact of past shocks)
ε²ₜ₋₁ → squared residual from previous period
β → GARCH term (persistence of past volatility)

⚡ How it works:
① Start with a mean model for returns (like ARMA)
② Extract residuals (shocks)
③ Model variance of residuals using past shocks and past variance
④ Estimate parameters via maximum likelihood
⑤ Forecast future volatility using the fitted model
High α means shocks hit hard. High β means volatility sticks around. Their sum (α+β) measures persistence. Close to 1 means volatility decays slowly.

🧠 How is it different from ARCH?
ARCH only uses past shocks (ε²ₜ₋₁, ε²ₜ₋₂, ...) to model variance. You need many lags to capture persistence.
GARCH adds past variance (σ²ₜ₋₁) as a predictor. This creates a more compact model with fewer parameters that still captures long memory in volatility. ARCH(p) with many lags becomes GARCH(1,1) with just three parameters.

✍️ When to use GARCH: when modeling financial returns, risk management, or any time series where variance isn't constant and shocks create persistent volatility clusters.

👉 Land Data & AI jobs on datainterview.com
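A numpy simulation of the GARCH(1,1) recursion above (the ω, α, β values are made up; for estimation on real data you would typically fit by maximum likelihood, e.g. with the arch package):

```python
import numpy as np

rng = np.random.default_rng(6)
omega, alpha, beta = 1e-5, 0.08, 0.90   # α + β = 0.98: persistent volatility

T = 1000
eps = np.zeros(T)
sigma2 = np.full(T, omega / (1 - alpha - beta))  # start at long-run variance
for t in range(1, T):
    # σ²ₜ = ω + α·ε²ₜ₋₁ + β·σ²ₜ₋₁
    sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()  # shock given today's vol

# Average simulated vol hovers near the long-run level ω / (1 - α - β)
print(np.sqrt(sigma2.mean()), np.sqrt(omega / (1 - alpha - beta)))
```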
DataInterview @datainterview
What is Modern Portfolio Theory? (in quant interviews) 👋 Let's learn together ↓

MPT is a framework for building portfolios that maximize return for a given risk level.

The big idea: diversification works because assets don't move in lockstep. By combining low-correlation assets, you reduce total portfolio risk without sacrificing returns. This is the foundation of quantitative finance. Harry Markowitz won a Nobel Prize for it.

📐 The math:
Minimize: wᵀΣw
Subject to: wᵀμ = μₚ, wᵀ1 = 1
Where:
w → vector of asset weights
Σ → covariance matrix (captures correlations)
μ → expected returns vector
μₚ → target portfolio return
Portfolio variance: σₚ² = ΣᵢΣⱼ wᵢwⱼσᵢⱼ
The key insight: total risk depends on pairwise covariances, not just individual variances.

⚡ How it works:
① Define your universe of assets with expected returns and covariances
② Pick a target return level
③ Solve the quadratic optimization to find weights that minimize variance
④ Repeat for different target returns to trace out the efficient frontier
⑤ Select the tangency portfolio (max Sharpe ratio) for optimal risk-adjusted returns
The efficient frontier shows all portfolios with maximum return for each risk level.

🧠 How is it different from Black-Litterman?
MPT finds optimal portfolios given expected returns and covariances you provide. Black-Litterman starts with market equilibrium returns and lets you blend in your own views. It's Bayesian.
You get more stable, less extreme weights because it doesn't rely entirely on your estimates. MPT is the foundation. Black-Litterman fixes its garbage-in-garbage-out problem.

✍️ When to use MPT: when you need a systematic way to balance risk and return across multiple assets, or when explaining why diversification mathematically reduces risk.

👉 Land Data & AI jobs on datainterview.com
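A numpy sketch of the optimization's simplest special case: the global minimum-variance portfolio, where dropping the target-return constraint gives the closed form w ∝ Σ⁻¹1 (simulated returns, arbitrary parameters):

```python
import numpy as np

rng = np.random.default_rng(7)
returns = rng.normal(0.0005, 0.01, size=(1000, 4))  # simulated daily returns, 4 assets
sigma = np.cov(returns, rowvar=False)               # covariance matrix Σ

# Global minimum-variance portfolio: w ∝ Σ⁻¹1, normalized so wᵀ1 = 1
ones = np.ones(sigma.shape[0])
w = np.linalg.solve(sigma, ones)
w /= w.sum()

port_var = w @ sigma @ w   # portfolio variance wᵀΣw
print(w, np.sqrt(port_var))
```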
DataInterview @datainterview
What is the Exponential Distribution? (in data & quant interviews) 👋 Let's learn together ↓

The exponential distribution is a continuous probability distribution that models waiting times between independent events.

It answers: how long until the next event happens? If customers arrive at rate λ = 3 per hour, how long until the next customer shows up? That wait time follows an exponential distribution.

The key insight: it's memoryless. Past waiting gives no information about future waiting.

📐 The model:
f(x) = λe^(-λx) for x ≥ 0
Where:
λ → rate parameter (events per unit time)
x → time until next event
e^(-λx) → exponential decay term
The CDF (probability event occurs by time x): F(x) = 1 - e^(-λx)

⚡ Key properties:
① Mean = 1/λ (average wait time)
② Variance = 1/λ² (spread of wait times)
③ Median = ln(2)/λ (50th percentile)
④ Mode = 0 (most likely time is right now)
⑤ Memoryless: P(X > s+t | X > s) = P(X > t)
The memoryless property means if you've already waited 10 minutes, the probability of waiting another 5 minutes is the same as if you just started.

🧠 How is it different from Poisson?
Poisson counts how many events happen in a fixed time window (discrete). Exponential measures how long until the next event (continuous).
They're connected: if event counts follow Poisson(λ), then time between events follows Exponential(λ). Same rate parameter, different questions.

✍️ When to use Exponential: when modeling customer arrival times, component failure times, or any process where events happen independently at a constant average rate.

👉 Land Data & AI jobs on datainterview.com
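A quick numpy check of the mean and the memoryless property (note numpy parameterizes by scale = 1/λ; the λ = 3 rate and the s, t values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
lam = 3.0                                               # 3 events per hour
waits = rng.exponential(scale=1 / lam, size=1_000_000)  # numpy uses scale = 1/λ

print(waits.mean())   # ≈ 1/λ = 0.333 hours

# Memoryless: P(X > s+t | X > s) should equal P(X > t)
s, t = 0.5, 0.25
lhs = np.mean(waits[waits > s] > s + t)
rhs = np.mean(waits > t)
print(lhs, rhs)       # both ≈ e^(-λt) ≈ 0.472
```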
DataInterview @datainterview
What is the Bias-Variance Tradeoff? (in ML interviews) 👋 Let's learn together ↓

The bias-variance tradeoff is the fundamental decomposition of prediction error.

Every model's error splits into three parts: squared bias, variance, and noise. You can't minimize both bias and variance at once. Reducing one typically increases the other. This is why model selection matters.

📐 The decomposition:
MSE = Bias²(f̂) + Var(f̂) + σ²
Where:
Bias²(f̂) → squared difference between average prediction and truth
Var(f̂) → how much predictions vary across different training sets
σ² → irreducible noise (data randomness you can't eliminate)

🧮 What each component means:
① High bias: model is too simple and underfits. It consistently misses the true pattern.
② High variance: model is too complex and overfits. Predictions swing wildly with different training data.
③ Sweet spot: balanced model minimizes total error by accepting some bias and some variance.
④ Noise: always present. No model can reduce it.

⚡ How it guides model selection:
Simple models (linear regression on nonlinear data) have high bias, low variance. They underfit.
Complex models (deep trees, k-NN with k=1) have low bias, high variance. They overfit.
The goal is finding the complexity level where total error bottoms out. Cross-validation helps you estimate this sweet spot without seeing test data. Regularization (L1/L2) lets you reduce variance by adding controlled bias.

✍️ When to use this framework: when choosing between models, tuning complexity, or explaining why your fancy neural net performs worse than a simple baseline on small data.

👉 Land Data & AI jobs on datainterview.com
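A small simulation sketch of the decomposition: fit polynomials of increasing degree to many resampled training sets and measure bias² and variance of the prediction at a single test point (the sine target, noise level, and degrees are all invented for the demo):

```python
import numpy as np

rng = np.random.default_rng(9)
true_f = lambda x: np.sin(2 * np.pi * x)
x_test = 0.3                       # evaluate bias/variance at one point

for degree in (1, 3, 9):           # underfit → balanced → overfit
    preds = []
    for _ in range(200):           # many independent training sets
        x = rng.uniform(0, 1, 30)
        y = true_f(x) + rng.normal(scale=0.3, size=30)
        coefs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coefs, x_test))
    preds = np.array(preds)
    bias2 = (preds.mean() - true_f(x_test)) ** 2  # Bias²: average miss vs truth
    var = preds.var()                             # Var: spread across training sets
    print(f"degree {degree}: bias² = {bias2:.4f}, variance = {var:.4f}")
```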
DataInterview @datainterview
What is the Normal Distribution? (in ML interviews) 👋 Let's learn together ↓

The normal distribution is a continuous probability distribution that forms a symmetric bell curve.

It's the most important distribution in statistics and ML. Shows up everywhere: CLT, hypothesis testing, confidence intervals, feature distributions, weight initialization. Fully defined by just two parameters. That's it.

📐 The formula:
f(x) = (1 / (σ√(2π))) × e^(-½((x-μ)/σ)²)
Where:
μ → mean (center of the curve)
σ → standard deviation (controls spread)
σ² → variance

🧮 Key properties:
① Mean = Median = Mode (perfect symmetry)
② 68% of data within 1σ, 95% within 2σ, 99.7% within 3σ
③ Skewness = 0, Kurtosis = 3
④ Sum of independent normals is also normal
⑤ Tails extend to infinity but probability drops fast

⚡ Z-score standardization:
Z = (X - μ) / σ
Transforms any normal variable into the standard normal N(0,1). Lets you look up probabilities in Z-tables. Critical for hypothesis testing and comparing values across different scales.

🔍 How is it different from the t-distribution?
Normal assumes you know σ and works for any sample size. The t-distribution is for when σ is unknown (estimated from data) and has heavier tails for small samples (n < 30). As n grows, t converges to normal.

✍️ When to use the Normal Distribution: when modeling naturally occurring phenomena, applying CLT with large samples, initializing neural network weights, or assuming feature distributions in Naive Bayes.

👉 Land Data & AI jobs on datainterview.com
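A quick numpy/scipy check of the 68-95-99.7 rule via Z-scores (μ = 5, σ = 2 are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
x = rng.normal(loc=5, scale=2, size=1_000_000)   # μ = 5, σ = 2

z = (x - x.mean()) / x.std()                     # Z-score standardization
for k in (1, 2, 3):
    print(f"within {k}σ: {np.mean(np.abs(z) < k):.3f}")  # ≈ 0.683, 0.954, 0.997

# Same numbers analytically from the standard normal CDF
print([round(stats.norm.cdf(k) - stats.norm.cdf(-k), 3) for k in (1, 2, 3)])
```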
DataInterview @datainterview
What is the Chi-Square Distribution? (in data & quant interviews) 👋 Let's learn together ↓

The chi-square distribution is a probability distribution for sums of squared standard normal variables.

It's the foundation for hypothesis testing with categorical data. When you test independence between two variables or check if observed counts match expected counts, you're using this distribution.

The shape depends entirely on degrees of freedom (k). Low k gives a right-skewed curve. High k looks more normal.

📐 The math:
f(x; k) = (x^(k/2-1) × e^(-x/2)) / (2^(k/2) × Γ(k/2))
Where:
x → the chi-square statistic (always ≥ 0)
k → degrees of freedom
Γ → gamma function (generalizes factorials)

🧮 How it's constructed:
① Take k independent standard normal variables Z₁, Z₂, ..., Zₖ
② Square each one
③ Add them up: χ² = Z₁² + Z₂² + ... + Zₖ²
④ This sum follows a chi-square distribution with k degrees of freedom
Properties: mean = k, variance = 2k, always positive, right-skewed.

🔍 How is it different from the Normal Distribution?
Normal is symmetric and ranges from negative to positive infinity. Used for continuous measurements.
Chi-square is right-skewed, only positive values, and built from squared normals. Used for testing fit and independence in count data.
As k increases, chi-square approaches normal shape but never goes negative.

✍️ When to use Chi-Square: testing if observed frequencies match expected ones, checking independence in contingency tables, or validating if sample variance matches population variance.

👉 Land Data & AI jobs on datainterview.com
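A numpy simulation of the construction ①-④ above, checking mean ≈ k and variance ≈ 2k:

```python
import numpy as np

rng = np.random.default_rng(11)
k = 5                                        # degrees of freedom

# ①②③ sum of k squared standard normals → χ²(k)
z = rng.standard_normal(size=(1_000_000, k))
chi2 = (z ** 2).sum(axis=1)

print(chi2.mean(), chi2.var())               # ≈ k = 5 and 2k = 10
```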
DataInterview @datainterview
What is the Geometric Distribution? (in data & quant interviews) 👋 Let's learn together ↓

The geometric distribution models the number of independent trials until the first success.

Think flipping a coin until you get heads. Or calling customers until someone buys. Each trial is independent with the same success probability p. The distribution tells you how likely it is to wait exactly k trials before succeeding.

📐 The formula:
P(X = k) = (1 - p)^(k-1) × p for k = 1, 2, 3, ...
Where:
p → probability of success on each trial
k → trial number where first success occurs
(1 - p)^(k-1) → probability of k-1 failures before success
Expected value: E[X] = 1/p
Variance: Var(X) = (1-p)/p²

⚡ How it works:
① Each trial is a Bernoulli experiment (success or failure)
② Trials are independent (past failures don't change future odds)
③ Success probability p stays constant
④ Count stops at the first success
⑤ Support is unbounded (k can be any positive integer)

🧠 How is it different from Binomial?
Geometric counts trials until the first success. Support is unbounded (1, 2, 3, ...). You don't know when you'll stop.
Binomial counts successes in a fixed number of trials. Support is bounded (0 to n). You know exactly how many trials you'll run.
Both use independent Bernoulli trials with parameter p, but they answer different questions.

✍️ When to use Geometric: when you're modeling waiting time or number of attempts until something happens for the first time.

👉 Land Data & AI jobs on datainterview.com
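A quick numpy check of the formulas above (numpy's geometric counts trials up to and including the first success, matching the k = 1, 2, 3, ... support; p = 0.2 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(12)
p = 0.2                                        # success probability per trial

# Trials up to and including the first success, support {1, 2, 3, ...}
trials = rng.geometric(p, size=1_000_000)

print(trials.mean(), 1 / p)                    # E[X] = 1/p = 5
print(trials.var(), (1 - p) / p**2)            # Var(X) = (1-p)/p² = 20
print(np.mean(trials == 3), (1 - p)**2 * p)    # P(X = 3) = 0.8² × 0.2 = 0.128
```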
DataInterview @datainterview
What is the Kelly Criterion? (in data & quant interviews) 👋 Let's learn together ↓

The Kelly Criterion is a bet sizing formula that tells you what fraction of your bankroll to wager to maximize long-term growth.

It balances greed and safety. Bet too much and you risk ruin. Bet too little and you leave money on the table. Kelly finds the sweet spot where your wealth compounds fastest over time.

📐 The formula:
f* = (bp - q) / b
Where:
f* → optimal fraction of bankroll to wager
p → probability of winning
q → probability of losing (1 - p)
b → net odds (payout per $1 risked)

⚡ How it works:
① Calculate your edge: bp - q
② Divide by the odds b
③ Result is the fraction to bet
④ If f* ≤ 0, don't bet (no edge)
The formula maximizes the expected logarithm of wealth, which is the same as maximizing geometric growth rate. This ensures optimal compounding.

🧠 How is it different from fixed betting?
Fixed betting uses the same dollar amount every time, regardless of edge or bankroll size. Kelly scales with your bankroll and adjusts for the strength of your edge. It grows faster when you're winning and protects you from ruin when losing. Fixed betting ignores compounding. Kelly exploits it.
Over-betting (more than Kelly) creates massive drawdown risk. Under-betting (fractional Kelly like ½ or ¼) reduces variance but grows slower.

✍️ When to use Kelly: when you have a known edge, can estimate probabilities accurately, and want to maximize long-term growth without going broke.

👉 Land Data & AI jobs on datainterview.com
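The formula as a small Python function (the 55% win probability at even odds is an invented example; the last line shows the fractional-Kelly variant mentioned above):

```python
def kelly_fraction(p, b):
    """f* = (bp - q) / b, the bankroll fraction that maximizes log wealth."""
    q = 1 - p              # ① edge is bp - q
    f = (b * p - q) / b    # ②③ divide by odds to get the fraction
    return max(f, 0.0)     # ④ no edge → don't bet

# 55% win probability at even odds (b = 1): bet 10% of bankroll
print(kelly_fraction(p=0.55, b=1.0))        # 0.10

# Half-Kelly: a common variance-reducing compromise
print(0.5 * kelly_fraction(p=0.55, b=1.0))  # 0.05
```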
DataInterview @datainterview
What are The Greeks? (in Quant interviews) 👋 Let's learn together ↓

The Greeks are partial derivatives that measure option price sensitivity.

They tell you how an option's value changes when market conditions shift. Stock moves up? Delta tells you the impact. Time passes? Theta shows the decay. Each Greek isolates one risk factor while holding others constant.

📐 The main Greeks:
Delta (Δ) = ∂C/∂S
→ rate of change w.r.t. stock price
→ ranges from 0 to 1 for calls, -1 to 0 for puts
→ approximates probability of expiring ITM
Gamma (Γ) = ∂²C/∂S² = ∂Δ/∂S
→ rate of change of Delta
→ highest for ATM options near expiry
Theta (Θ) = ∂C/∂t
→ time decay per day
→ always negative for long options
Vega (ν) = ∂C/∂σ
→ sensitivity to volatility changes
→ largest for ATM, long-dated options
Rho (ρ) = ∂C/∂r
→ sensitivity to interest rate changes
→ least important for short-dated options

⚡ How they work together:
① Start with Black-Scholes: C = S·N(d₁) - K·e^(-rT)·N(d₂)
② Take partial derivatives w.r.t. each input variable
③ Each Greek measures one dimension of risk
④ Use them to predict P&L from small market moves
⑤ Rehedge when Greeks drift outside target ranges

🧠 How are they different from the option price?
Option price is the total value you pay. Greeks are the sensitivities. They show how that price will change when one input moves. Price is a number. Greeks are rates of change.
Delta-hedging uses Greeks to build risk-neutral portfolios. You can't do that with just the price.

✍️ When to use Greeks: when you need to manage option risk, hedge positions, or explain how market moves affect your portfolio. Essential for any quant trading or risk management role.

👉 Land Data & AI jobs on datainterview.com
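A scipy sketch of a few Greeks from the Black-Scholes call price in step ① (closed-form Delta, Gamma, and Vega under Black-Scholes assumptions; the input values are arbitrary):

```python
import numpy as np
from scipy import stats

def call_greeks(S, K, T, r, sigma):
    """Black-Scholes call price plus Delta, Gamma, and Vega."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    price = S * stats.norm.cdf(d1) - K * np.exp(-r * T) * stats.norm.cdf(d2)
    delta = stats.norm.cdf(d1)                             # ∂C/∂S
    gamma = stats.norm.pdf(d1) / (S * sigma * np.sqrt(T))  # ∂²C/∂S²
    vega = S * stats.norm.pdf(d1) * np.sqrt(T)             # ∂C/∂σ (per 1.0 of vol)
    return price, delta, gamma, vega

# ATM call, 1 year to expiry, 2% rate, 20% vol
print(call_greeks(S=100, K=100, T=1.0, r=0.02, sigma=0.20))
```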
DataInterview @datainterview
What is Poisson Regression? (in ML interviews) 👋 Let's learn together ↓

Poisson regression is a generalized linear model for count data.

You can't use ordinary linear regression when your outcome is a count (0, 1, 2, 3...). Counts are non-negative integers, and their variance grows with the mean. Poisson regression handles this by modeling the log of the expected count as a linear function of predictors.

The log link ensures predictions stay positive and creates that characteristic exponential curve.

📐 The model:
ln(λᵢ) = β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ
Where:
λᵢ → expected count for observation i
β₀ → intercept (baseline log count)
βⱼ → coefficient for predictor j
xⱼ → predictor variables
The response follows: P(Y = y) = (e^(-λ) × λ^y) / y!
Key property: mean equals variance (equidispersion).

⚡ How it works:
① Model the log of expected counts as linear in predictors
② Fit coefficients via maximum likelihood (no closed form, uses iteratively reweighted least squares)
③ Each βⱼ represents the change in ln(λ) per unit increase in xⱼ
④ Exponentiate coefficients to get incidence rate ratios

🧠 How is it different from linear regression?
Linear regression predicts continuous outcomes and assumes constant variance. It can produce negative predictions.
Poisson regression predicts counts, assumes variance equals the mean, and guarantees non-negative predictions through the log link. It uses maximum likelihood instead of least squares.

✍️ When to use Poisson regression: when modeling counts like number of events, purchases, or occurrences. Check for overdispersion (variance exceeds mean) and switch to negative binomial if needed.

👉 Land Data & AI jobs on datainterview.com
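A statsmodels sketch of fitting the model above to simulated counts (the true coefficients 0.5 and 0.8 are invented; sm.GLM with a Poisson family fits by maximum likelihood via IRLS as described):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
n = 1000
x = rng.normal(size=n)
lam = np.exp(0.5 + 0.8 * x)    # true model: ln(λ) = 0.5 + 0.8x
y = rng.poisson(lam)           # observed counts

X = sm.add_constant(x)
model = sm.GLM(y, X, family=sm.families.Poisson()).fit()  # MLE via IRLS
print(model.params)            # ≈ [0.5, 0.8]
print(np.exp(model.params[1])) # ④ incidence rate ratio per unit of x
```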