Sameera Horawalavithana

5.4K posts

@SamTube405

#AI #Multimodal Scientist @PNNLab PhD @cseUSF Opinions here are my own and do not represent my employer. Proud 🇱🇰 Live 🇺🇸

Tampa, FL · Joined November 2010
903 Following · 710 Followers
Sameera Horawalavithana @SamTube405
4/ Some VLM capabilities only emerge in the newest LLM generation. Other tasks, dominated by visual understanding, barely improve regardless of which LLAMA you plug in.
Sameera Horawalavithana @SamTube405
🧵 1/ New preprint drop: "Back to the Barn with LLAMAs: Evolving Pretrained LLM Backbones in Finetuning Vision Language Models" 🦙 If you swap a better LLM backbone into your VLM, do you get a better VLM? Short answer: not really. Longer answer: it's more interesting than that.
Sameera Horawalavithana reposted
Pacific Northwest National Laboratory
Streamlining federal permitting with AI 📄🖥️⏩ PNNL researchers are using AI to bring valuable data distributed across hundreds of federal government agencies into a single dataset that's crucial for modernizing permitting technology for the 21st century.
Sameera Horawalavithana reposted
WHCEQ47 @WHCEQ47
CEQ coordinated with @ENERGY’s @PNNLab PermitAI project on the release of NEPATEC 2.0, a major accomplishment on the road to a simplified, speedier Federal permitting and environmental review process.
Quoting Pacific Northwest National Laboratory @PNNLab:
Streamlining federal permitting with AI 📄🖥️⏩ PNNL researchers are using AI to bring valuable data distributed across hundreds of federal government agencies into a single dataset that's crucial for modernizing permitting technology for the 21st century.

Sameera Horawalavithana reposted
Charles Yang @charlesxjyang
We also currently have a Kaggle competition open, which is running LLM evaluations on a QuAD benchmark specific to understanding permitting documents. The deadline is June 30! kaggle.com/competitions/l…
Sameera Horawalavithana reposted
Charles Yang @charlesxjyang
This corpus is part of DOE's voltAIc initiative, which is using LLMs to accelerate permitting processes. It was developed in partnership with @PNNLab. You can find more details about their work on this project here: pnnl.gov/projects/polic…
Sameera Horawalavithana reposted
Charles Yang @charlesxjyang
🚨if you care about fixing NEPA environmental permitting OR are looking for new high-quality domain text corpuses to build LLMs on top of... DOE just released a 3.6B token corpus of federal permitting documents on huggingface! This corpus includes...
Sameera Horawalavithana reposted
SummarizedML @summarizedml
A new multimodal model LLaMA-SciTune for science-focused visual and language understanding. 📄 arxiv.org/abs/2307.01139…
Sameera Horawalavithana reposted
Tom Sawada @tsawada_ml
SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions
Abs: arxiv.org/abs/2307.01139
Pdf: arxiv.org/pdf/2307.01139…
Presenting SciTune, a tuning framework to improve large language models' (LLMs) ability to follow scientific multimodal instructions. SciTune includes two stages: scientific concept alignment, to learn across various scientific visual and textual signals, and scientific instruction tuning, to fine-tune on a multimodal scientific reasoning task. LLaMA-SciTune surpasses human performance on the ScienceQA multimodal reasoning benchmark and performs significantly better than SoTA vision-language models in a variety of scientific image understanding tasks, with zero demonstrations at inference time.
Sameera Horawalavithana @SamTube405
@ChunyuanLi That means, LLaMA -> Stage 1.1 Feature Alignment (CC3M) -> Stage 1.2 Medical Concept Alignment -> Stage 2 Medical Instruction Tuning
Sameera Horawalavithana @SamTube405
@ChunyuanLi Since you used LLaVA as the base (LLaMA -> Stage 1: Feature Alignment -> Stage 2: Instruction Tuning -> LLaVA), I was wondering whether performing LLaVA-Med (Stage 1) on top of LLaVA (Stage 1) would increase performance.
Chunyuan Li @ChunyuanLi
1/3 LLaVA-Med, our first attempt towards building a large language and vision assistant with multimodal GPT-4 level capabilities for the healthcare space, trained on eight A100s in <15 hours. 🚀🧑‍⚕️ Paper: arxiv.org/abs/2306.00890 Project: github.com/microsoft/LLaV…
Quoting AK @_akhaliq:
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
paper page: huggingface.co/papers/2306.00…
We propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions about biomedical images. The key idea is to leverage a large-scale, broad-coverage biomedical figure-caption dataset extracted from PubMed Central, use GPT-4 to self-instruct open-ended instruction-following data from the captions, and then fine-tune a large general-domain vision-language model using a novel curriculum learning method. Specifically, the model first learns to align biomedical vocabulary using the figure-caption pairs as is, then learns to master open-ended conversational semantics using GPT-4 generated instruction-following data, broadly mimicking how a layperson gradually acquires biomedical knowledge. This enables us to train a Large Language and Vision Assistant for BioMedicine (LLaVA-Med) in less than 15 hours (with eight A100s). LLaVA-Med exhibits excellent multimodal conversational capability and can follow open-ended instructions to assist with inquiries about a biomedical image. On three standard biomedical visual question answering datasets, LLaVA-Med outperforms the previous supervised state of the art on certain metrics. To facilitate biomedical multimodal research, we will release our instruction-following data and the LLaVA-Med model.

Kenneth Huang #HCOMP2026 Sep27-30@DC
We're launching the 1st Scientific Figure Captioning (SciCap) Challenge! We invite AI/NLP/CV researchers to build systems that caption all types of figures in arXiv papers. The challenge will be hosted at the CLVL workshop at #ICCV2023. Join us here: SciCap.AI