Vector
@_vector15
3K posts
FAFOing through life · Polymath maxing

127.0.0.1 · Joined February 2025
393 Following · 436 Followers
Pinned Tweet
Vector
Vector@_vector15·
Since I haven't posted a career update in a while: I've joined WorldQuant BRAIN as a Quantitative Research Consultant. Gonna be working on the intersection of finance, data, math and a lot more stuff. Pretty excited about this.
Vector tweet media
Vector retweeted
𝒢𝒾𝓁𝒷ℯ𝓇𝓉
Why are threesomes only for sex? Why can't I join in on a couple's argument in public if I have a good point to make?
silicognition
silicognition@silicognition·
@w2sgarnav a couple of undergrads including me joined forces to research independently on some topics, saw a couple of ideas from your previous post align with ours. we have guys who intern/interned at top labs, startups, and are associated w top tier indian unis. lmk if you're interested.
arnav sonavane
arnav sonavane@w2sgarnav·
i have like 150+ ideas in ml, need to make an academia group ig
arnav sonavane@w2sgarnav

aiming for 5 research topics for the upcoming few months. if y'all want to join in pls do so, GPU shortage won't be there (hopefully). (worked on these problem statements a bit previously, and have run a few experiments on each) find them below:

ps 1: Process Reward Models Beyond Outcome Supervision
We present a fully automated approach for training Process Reward Models (PRMs) that matches or surpasses the quality of gold step-level annotations, without the need for human-labeled trajectories. Starting from a base policy π_θ trained via SFT on chain-of-thought data, we generate dense Monte-Carlo Tree Search (MCTS) rollouts with depth d ≥ 32 and branching factor b = 8. Each intermediate step is scored by an ensemble of outcome verifiers (ORMs) bootstrapped from self-consistency and LLM-as-judge signals at temperature T = 0.7. To reduce verifier noise, we introduce a process-DPO variant with step-wise Bradley-Terry losses weighted by MCTS visit counts and calibrated via Platt scaling on a small held-out verification set. Our method closes the annotation gap by jointly optimising the PRM and policy under a single RLVR objective that alternates between process-level preference optimisation and outcome-level PPO updates, with an adaptive mixing ratio λ_t scheduled via cosine annealing. In extensive ablations on GSM8K, MATH, and HumanEval, our auto-annotated PRM delivers +14.7% pass@1 over outcome-only RM baselines at 7B scale and transfers to code and scientific reasoning domains with only 3% degradation after LoRA adaptation on 2k domain-specific trajectories. We release the multi-domain PRM benchmark, the distilled verifier weights, and the full MCTS annotation pipeline, offering the first production-ready recipe for frontier-scale process supervision.
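The step-wise Bradley-Terry loss weighted by MCTS visit counts from ps 1 can be sketched in a few lines. This is a minimal plain-Python illustration under assumed conventions (the function names and flat data layout are hypothetical, not the actual training stack, which would sit inside a gradient framework):

```python
import math

def bt_step_loss(score_w, score_l, visits_w, visits_l):
    """Bradley-Terry loss for one (preferred, rejected) step pair,
    weighted by the pair's combined MCTS visit count."""
    weight = visits_w + visits_l
    # P(preferred step wins) under the Bradley-Terry model
    p_win = 1.0 / (1.0 + math.exp(score_l - score_w))
    return -weight * math.log(p_win)

def process_dpo_loss(step_pairs):
    """Visit-count-normalised average of per-step losses.

    step_pairs: list of (score_winner, score_loser, visits_w, visits_l)
    tuples, one per step-level preference extracted from MCTS rollouts.
    """
    total_weight = sum(vw + vl for _, _, vw, vl in step_pairs)
    loss = sum(bt_step_loss(sw, sl, vw, vl) for sw, sl, vw, vl in step_pairs)
    return loss / total_weight
```

Heavily-visited steps dominate the objective, which is the point: MCTS visit counts act as a confidence proxy for the noisy ORM ensemble labels.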
ps 2: Computer-Use Agents and GUI Grounding
We formalise GUI grounding failures through a tripartite decomposition: perception (pixel-to-semantic mapping), planning (high-level action sequencing), and execution (low-level mouse/keyboard trajectories). We also introduce a large-scale synthetic data engine that uses Playwright + Android Emulator instrumentation to generate 500k grounded interaction traces across web, mobile, and desktop environments. Each trace is linked with pixel-level segmentation masks, accessibility-tree annotations, and oracle action sequences obtained via deterministic UI state diffing. On top of a Qwen2-VL-7B backbone, we train a multimodal VLA policy with a hybrid loss that combines contrastive screen-embedding alignment (InfoNCE on cropped UI elements), autoregressive action-token prediction, and auxiliary bounding-box regression heads operating at 4× downsampled resolution to preserve fine-grained OCR and icon semantics. Cross-platform zero-shot transfer is achieved by combining a domain-adversarial training objective, which aligns screen embeddings across platforms while preserving task-specific action distributions, with test-time adaptation via a lightweight 256M adapter that conditions on platform-specific accessibility trees. Our model reduces end-to-end grounding error from 48% (Claude-3.5 baseline) to 19% on the newly released GUI-Grounding-Bench (12k real tasks from WebArena, AndroidWorld, and OSWorld), with the biggest improvements on perception-heavy mobile UIs. We release the cross-platform VLA checkpoint, the failure-atlas taxonomy, and the complete synthetic trace generator, creating the first reproducible benchmark and recipe for reliable computer-use agents.
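The contrastive screen-embedding term in ps 2 is standard InfoNCE. A minimal plain-Python sketch (embeddings here are toy vectors; the real objective would run on batched UI-crop embeddings in a tensor framework):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query (screen embedding) against its
    matching UI-element crop (positive) and mismatched crops (negatives)."""
    sims = [cosine(query, positive)] + [cosine(query, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)
```

The loss is near zero when the query aligns with its positive crop and large when a negative crop is closer, which is what pushes screen and element embeddings into a shared space.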
ps 3: Agent Memory Architectures Beyond RAG
We present TypedAgentMemory, a modular memory substrate that explicitly distinguishes episodic, semantic (dense vector summaries with SAE-derived concept tags), procedural, and working (short-term KV-cache compression) memories, governed by a differentiable memory controller trained end-to-end with the agent policy. Memory writes are gated by a 128-dim uncertainty head that thresholds epistemic uncertainty estimated from an ensemble of forward passes. The controller uses a hierarchical policy over four memory operations: write, consolidate (graph-based merging with GNN message passing), forget (learned eviction via eligibility traces and recency + relevance scores), and retrieve (hybrid dense + symbolic query routing). On long-horizon tasks from τ-bench, WebArena, and GAIA, explicit memory consolidation every 50 steps yields a 2.3× reduction in context length and a 31% improvement in success rate over flat vector-store RAG baselines. Privacy is ensured via per-memory-type differential-privacy mechanisms: homomorphic encryption for procedural skill graphs, concept-level k-anonymity on semantic features, and ε = 0.5 noise injection on episodic writes. Ablations show that typed memory prevents catastrophic forgetting on 200-step agent trajectories and enables effective cross-task transfer through procedural memory reuse. We open-source the full TypedAgentMemory library (built on LangGraph + FAISS + Neo4j), the long-horizon evaluation harness, and pretrained memory controllers for Llama-3.1-8B and Qwen2.5-72B, providing the first principled alternative to monolithic RAG for production-grade autonomous agents.
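The uncertainty-gated write operation in ps 3 can be illustrated with a toy store: a write happens only when the ensemble of forward passes disagrees enough. This sketch uses variance over scalar ensemble scores as a stand-in for the learned 128-dim uncertainty head; class and method names are hypothetical:

```python
import statistics

class TypedMemorySketch:
    """Toy episodic store that gates writes on ensemble disagreement,
    mimicking the epistemic-uncertainty write gate described above."""

    def __init__(self, write_threshold=0.1):
        self.write_threshold = write_threshold
        self.episodic = []

    def maybe_write(self, item, ensemble_scores):
        """Write `item` only if epistemic uncertainty (here: population
        variance across forward-pass scores) clears the threshold."""
        uncertainty = statistics.pvariance(ensemble_scores)
        if uncertainty >= self.write_threshold:
            self.episodic.append(item)
            return True
        return False
```

High agreement means the model already "knows" the fact, so skipping the write keeps the episodic store small and novel, one plausible reading of the 2.3× context-length reduction.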
ps 4: SAE Universality Across Model Families
We conduct the first large-scale cross-family SAE universality study by training 128k-feature JumpReLU SAEs (expansion factor 64, k = 32) on the residual streams of Llama-3.1-8B, Qwen2.5-72B, Gemma-2-27B, Mistral-Large-2, and DeepSeek-V3 with identical hyperparameters and reconstruction objectives. Feature matching via optimal transport with the Sinkhorn algorithm on normalised decoder weight matrices yields a bipartite matching that quantifies pairwise overlap at both the neuron level (cosine similarity > 0.85) and the concept level (via automated interpretation pipelines using 512 probe prompts per feature). We further build a universal feature library by clustering matched features across families into 4.2k platonic concepts, annotating each concept with activation statistics, downstream steering efficacy, and causal mediation scores computed via path patching. In downstream transfer studies, steering vectors built from the universal library outperform within-family SAEs on out-of-distribution tasks and improve zero-shot generalisation on MMLU-Pro, GPQA, and LiveCodeBench by an average of 9.4% when transferred between families. We release the full SAE training code, the universal concept library with 4.2k interpreted features, the cross-family matching dataset (including optimal transport plans), and a plug-and-play steering toolkit compatible with Hugging Face Transformers and vLLM. This work offers the first rigorous atlas and infrastructure for mechanistic universality, enabling transfer learning, model merging, and safety interventions within the existing frontier-model ecosystem.
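The neuron-level matching criterion in ps 4 (cosine similarity > 0.85 between decoder directions) can be sketched with a greedy one-to-one matcher. This is a simplification, with greedy assignment standing in for the Sinkhorn optimal-transport step, and all names hypothetical:

```python
import math

def cosine(u, v):
    """Cosine similarity between two decoder-direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def match_features(decoders_a, decoders_b, threshold=0.85):
    """Greedy one-to-one matching of SAE decoder directions across two
    model families; features below the threshold stay unmatched."""
    matches, used = [], set()
    for i, da in enumerate(decoders_a):
        best_j, best_sim = None, threshold
        for j, db in enumerate(decoders_b):
            if j in used:
                continue
            sim = cosine(da, db)
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is not None:
            used.add(best_j)
            matches.append((i, best_j, best_sim))
    return matches
```

Unmatched features are exactly the family-specific ones; the matched pairs are the candidates for the shared "platonic concept" clusters.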
ps 5: Synthetic Data Generation Without Mode Collapse
We present an iterated synthetic-data pipeline that explicitly characterises the collapse threshold ρ*(q) as a function of generator quality q (measured by SAE feature-activation entropy and output-distribution entropy H_π). Starting from a 7B base policy π_θ trained on 200B tokens of FineWeb-Edu, we create synthetic corpora at mixing ratios ρ ∈ {0, 0.1, …, 1.0} using temperature-annealed sampling (T = 1.0 → 0.7) supplemented with SAE-guided rejection sampling. At each generation, we train a 128k-feature JumpReLU SAE (expansion factor 64, k = 32) on the residual stream of the current model and filter out synthetic samples whose top-activating features show activation entropy below a calibrated threshold τ derived from the real-data reference distribution. Our experiments provide the first empirical collapse-threshold map ρ*(q) at 1.3B–7B scale, demonstrating that SAE-guided diversity sampling extends the safe mixing ratio by 2.3× compared to persona-conditioned or temperature-only baselines, while generator entropy H_π ≥ 4.2 nats delays the onset of measurable perplexity degradation on a held-out real validation set until generation 7 under accumulation (versus generation 3 under pure replacement). Theoretically, we derive a closed-form constraint on the variance-contraction rate under synthetic mixing, connecting the number of safe iterations before tail probability mass falls below 10^{-3} to the spectral gap of the generator's transition kernel.
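The SAE-guided filter in ps 5 reduces to: compute Shannon entropy over a sample's feature-activation profile and reject it if entropy falls below the calibrated threshold τ. A minimal sketch with toy activation profiles (real profiles would come from the trained SAE):

```python
import math

def activation_entropy(activations):
    """Shannon entropy (in nats) of a normalised activation profile."""
    total = sum(activations)
    probs = [a / total for a in activations if a > 0]
    return -sum(p * math.log(p) for p in probs)

def filter_synthetic(samples, tau):
    """Keep synthetic samples whose feature-activation entropy meets
    the threshold tau calibrated on the real-data distribution.

    samples: list of (sample, activation_profile) pairs.
    """
    return [s for s, acts in samples if activation_entropy(acts) >= tau]
```

A sample that fires one feature overwhelmingly (entropy near 0) is the mode-collapse signature this filter is meant to catch; a uniform profile over k features has entropy ln k.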

saksham
saksham@sakshamred·
on god this is the last one: devdrug.runable.site. this app gives you scores of what you have built in all this time. hireable + turncut, everything in one place. just put in the github username and check out the leaderboard for all the info about your standings and your friends' standings. here's @notsmv's rating
saksham tweet media
Humi@byteHumi

its a looooong weekend and in addition to kshitij's challenge, here is a small prize from us (the RunClub Community). best one - AirPods Pro + Runable Pro. the challenge is simple .. just put your best work on runable.. it could be anything > website > short-film > music song video, depends on the creativity. quote this tweet or reply under. may the deserving candidate win. valid till Sunday 11:59 IST

Vector
Vector@_vector15·
My mind keeps jumping between learning cutting edge stuff like ai/robotics out of passion and learning web so that I get paid. I think I should do both ngl
Vector tweet media
Vector retweeted
Wᴀʟʟꜰʟᴏᴡᴇʀ 🥀
I never thought it would come to this, but here I am. I need financial help. Not for a medical emergency, but for my education. College fees? No, those are already minimal. What for, then? To attend a global conference. Why? Because my scholarship application was rejected, as I shared a few days back.

The entry fee alone is $565. Yes, that is excluding all other travel expenses. I earned my spot at the conference based on merit, but unfortunately, I didn't qualify for their financial aid.

So here I am, asking for any small contribution you can make to help me get there. As an undergraduate student from a lower-middle-class family, I simply cannot afford this huge sum of money on my own.

I want to be clear: I am not just asking for a free handout. I need support, but I want you to put forward what you expect from me in return. I will evaluate if it is something I can deliver, and only then do you need to contribute.

You can, of course, donate purely as charity if you don't want anything back; but simply accepting money for nothing will make it hard for me to sleep at night.

If you cannot contribute financially, even just referring me to someone or an organization that might be able to sponsor me would mean the world to me.
Wᴀʟʟꜰʟᴏᴡᴇʀ 🥀 tweet media
Vector
Vector@_vector15·
@VazeKshitij I want to, but she no longer talks to me 😭
kshitij vaze
kshitij vaze@VazeKshitij·
Here's your reminder - go and call THAT friend of yours a BAUNIIII
Pune, India 🇮🇳
Vector
Vector@_vector15·
Honestly, took me until 4th sem to realise my approach to academics is way too "stereotypical". Approaching it as pieces of a puzzle that fit together completely in the end makes it actually interesting to study, rather than mindlessly grinding for endsems (which I hate)
Vector retweeted
Vector
Vector@_vector15·
@UtkarshS08 Wish I had sent a proposal too 😭
utkarsh🦉
utkarsh🦉@UtkarshS08·
Looks like everyone got selected for GSoC today 🫩
Vector
Vector@_vector15·
@Resorcinolworks Everyone wants to be Steve Jobs. Except that he was a visionary, and they do the same thing thousands of people already do
Vector retweeted
kache
kache@yacineMTB·
you can outsource your thinking but you cannot outsource your understanding
Vector
Vector@_vector15·
@jerkeyray Factually correct. I think what people are forgetting is that people will be looking at it from an individual point of view rather than a game-theoretic one
Jerkeyray
Jerkeyray@jerkeyray·
i'd pick red, i don't like leaving stuff up to chance or luck. razor thin line between self-abandonment and altruism.
Vector
Vector@_vector15·
Gave my first codeforces contest after almost a year Competitive programming arc incoming?
Vector retweeted
Amaan
Amaan@dextertwts·
Built a file-upload pipeline with a dual-path architecture: [repo in comments]
- small files (<5 MB) go through a fast in-memory lane
- large media goes through a direct-to-cloud lane

For small uploads like profile pictures:
- files are checked twice: first by a Multer MIME pre-check, then by real binary magic-number validation
- the server resizes them, converts them to WebP, avoids pointless upscaling, and generates a BlurHash placeholder
- Sharp optimisation and BlurHash generation run in parallel to keep the upload path low-latency

For large uploads:
- the backend never relays the actual file bytes through its own RAM/disk
- it first creates a PENDING row in the DB
- then it generates a signed Cloudinary upload payload
- the client uploads directly to Cloudinary in chunks
- Cloudinary finishes processing and sends back a signed webhook
- only after webhook verification does the backend mark the upload as VERIFIED or FAILED
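The binary magic-number validation in the small-file lane checks the file's leading bytes rather than trusting the client-supplied MIME type. A minimal Python sketch of the idea (the actual pipeline is Node-based; the function name is illustrative):

```python
def sniff_image_type(data: bytes):
    """Identify an image format from its leading magic bytes,
    independent of any client-declared MIME type."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    # WebP: RIFF container with a 4-byte size, then the "WEBP" tag
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None  # unrecognised: reject the upload
```

Running this after the MIME pre-check catches spoofed uploads, e.g. an executable renamed to `.png`, before any image processing touches the bytes.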
Vector
Vector@_vector15·
How tf do you make friends to go on trips with and all? It's been 2 years in college and I still haven't managed to form a single group like that 😭