Sha Liu
@ShaLiu2010
245 posts

PhD in Language Assessment @BristolUni | Automated speaking/writing assessment, feedback; feedback literacy; eye-tracking

Bristol, England · Joined November 2017
272 Following · 243 Followers
Sha Liu retweeted
Yutaka Ishii / 石井雄隆
Enhancing a large language model with a chain-of-metacognitive reasoning approach increases argumentative writing evaluation accuracy, student writing outcomes, and mental effort doi.org/10.1016/j.comp…
Sha Liu retweeted
Andrej Karpathy @karpathy
LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images locally so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents, and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries, so my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I use directly (in a web UI), but more often hand off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
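Karpathy mentions vibe-coding a small, naive search engine over the wiki and handing it to an LLM as a CLI tool. As a rough illustration of what such a tool might look like (the function names and the TF-IDF scoring here are my own assumptions, not his actual implementation), a minimal sketch:

```python
import math
import os
import re
from collections import Counter, defaultdict

def build_index(wiki_dir):
    """Walk a wiki directory and build a term -> {path: count} inverted index
    over all .md files, plus per-document lengths for normalization."""
    index = defaultdict(Counter)
    doc_lens = {}
    for root, _, files in os.walk(wiki_dir):
        for name in files:
            if not name.endswith(".md"):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8") as f:
                terms = re.findall(r"[a-z0-9]+", f.read().lower())
            for t in terms:
                index[t][path] += 1
            doc_lens[path] = len(terms)
    return index, doc_lens

def search(query, index, doc_lens, k=5):
    """Rank wiki pages by a simple TF-IDF score for the query terms."""
    n_docs = len(doc_lens)
    scores = Counter()
    for t in re.findall(r"[a-z0-9]+", query.lower()):
        postings = index.get(t, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for path, tf in postings.items():
            scores[path] += (tf / doc_lens[path]) * idf
    return scores.most_common(k)
```

An LLM agent could then call this via a tiny CLI wrapper and read only the top-ranked pages instead of the whole wiki.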
Sha Liu @ShaLiu2010
In recognition of challenging circumstances affecting colleagues in our community, the deadline for our LAQ Special Issue has been extended to 16 May 2026. Early submissions proceed to publication without delay. Full CfP: osf.io/5adyv/overview
Sha Liu retweeted
Shijun (Cindy) Chen @CindySJ46
Teacher feedback dilemmas and the use of GenAI: Challenges or opportunities? tandfonline.com/doi/full/10.10… My sincere thanks to my primary supervisor for his continuous support and encouragement over the last two years. @CarlessDavid This could never have happened without his patient guidance.
Sha Liu retweeted
alphaXiv @askalphaxiv
Yann LeCun 🤝 Saining Xie. An insane crossover of two of the biggest visual representation researchers in the AI field: "Beyond Language Modeling: An Exploration of Multimodal Pretraining."

Right now, most multimodal models are basically a language model with a vision adapter bolted on, so they can describe images, but they don't really think in images or video. This paper shows what happens when you do it the hard way: train one model from scratch on text, images, and video with a unified setup. The key idea is that if you give the model a good visual internal format, it can use vision for both understanding and generating. Additionally, multimodal data can improve language instead of distracting from it, and mixture-of-experts lets you scale vision's huge data intake without bloating everything else. This paves the way towards changing the vision paradigm from a "captioning add-on" model to a native multimodal foundation model.
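The mixture-of-experts idea the summary mentions (adding capacity for vision's data intake without increasing per-token compute) can be illustrated with a toy top-k router. This is a generic sketch of MoE routing, not the paper's actual architecture; all names here are hypothetical:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def moe_forward(token, experts, router_weights, top_k=2):
    """Route one token vector through only its top-k experts.

    Each expert is a plain function (vector -> vector); router_weights holds
    one gating weight vector per expert (dot-product gating). Only top_k
    experts run, so total parameters can grow without growing per-token cost.
    """
    logits = [sum(w * x for w, x in zip(wv, token)) for wv in router_weights]
    gates = softmax(logits)
    chosen = sorted(range(len(experts)), key=lambda i: gates[i], reverse=True)[:top_k]
    norm = sum(gates[i] for i in chosen)  # renormalize over chosen experts
    out = [0.0] * len(token)
    for i in chosen:
        y = experts[i](token)
        out = [o + (gates[i] / norm) * yi for o, yi in zip(out, y)]
    return out
```

With top_k fixed, adding more experts grows the model's capacity while the work done per token stays constant, which is the scaling property the tweet alludes to.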
Sha Liu retweeted
Nav Toor @heynavtoor
🚨BREAKING: OpenAI published a paper proving that ChatGPT will always make things up. Not sometimes. Not until the next update. Always. They proved it with math.

Even with perfect training data and unlimited computing power, AI models will still confidently tell you things that are completely false. This isn't a bug they're working on. It's baked into how these systems work at a fundamental level.

And their own numbers are brutal. OpenAI's o1 reasoning model hallucinates 16% of the time. Their newer o3 model? 33%. Their newest o4-mini? 48%. Nearly half of what their most recent model tells you could be fabricated. The "smarter" models are actually getting worse at telling the truth.

Here's why it can't be fixed. Language models work by predicting the next word based on probability. When they hit something uncertain, they don't pause. They don't flag it. They guess. And they guess with complete confidence, because that's exactly what they were trained to do.

The researchers looked at the 10 biggest AI benchmarks used to measure how good these models are. 9 out of 10 give the same score for saying "I don't know" as for giving a completely wrong answer: zero points. The entire testing system literally punishes honesty and rewards guessing. So the AI learned the optimal strategy: always guess. Never admit uncertainty. Sound confident even when you're making it up.

OpenAI's proposed fix? Have ChatGPT say "I don't know" when it's unsure. Their own math shows this would mean roughly 30% of your questions get no answer. Imagine asking ChatGPT something three times out of ten and getting "I'm not confident enough to respond." Users would leave overnight. So the fix exists, but it would kill the product.

This isn't just OpenAI's problem. DeepMind and Tsinghua University independently reached the same conclusion. Three of the world's top AI labs, working separately, all agree: this is permanent.

Every time ChatGPT gives you an answer, ask yourself: is this real, or is it just a confident guess?
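The benchmark-incentive claim in the thread reduces to simple expected-value arithmetic: if a wrong answer and an abstention both score zero, guessing weakly dominates admitting uncertainty. A minimal sketch (the scoring rule and the numbers are illustrative, not taken from OpenAI's paper):

```python
def expected_score(p_correct, abstain, r_wrong=0.0, r_abstain=0.0):
    """Expected benchmark score for a model that, when it answers, is
    correct with probability p_correct. Under the scoring rule the thread
    describes, wrong answers and abstentions both earn 0, so answering
    is never worse than abstaining."""
    if abstain:
        return r_abstain
    return p_correct * 1.0 + (1 - p_correct) * r_wrong

# Even a 10%-accurate guess beats abstaining under the 0/0 rule...
assert expected_score(0.1, abstain=False) > expected_score(0.1, abstain=True)
# ...but penalizing wrong answers flips the incentive toward honesty:
assert expected_score(0.1, abstain=False, r_wrong=-0.25) < expected_score(0.1, abstain=True)
```

The second assertion shows why changing the reward for wrong answers, rather than the model, is the lever the researchers point at.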
Sha Liu retweeted
MizumotoAtsushi @MizumotoAtsushi
Kim, M. (2026). Custom GPT as mediator: Dynamic Assessment with beginner KFL learners. Language Learning & Technology, 30(1), 1–27. doi.org/10.64152/10125…
Sha Liu retweeted
Akari Asai @AkariAsai
Thrilled to share: OpenScholar, our work on scientific deep research agents for reliable literature synthesis, has been accepted to Nature! 🎉 Huge thanks to collaborators across institutions who made this possible!
Sha Liu retweeted
RMAL @RMALJournal
New article: Candarli, D. (2026). Using human-AI collaboration to explore meanings of semiotic resources in L2 multimodal writing. Research Methods in Applied Linguistics, 5(1), 100305: sciencedirect.com/science/articl…
Sha Liu retweeted
Andrew Ng @AndrewYNg
Job seekers in the U.S. and many other nations face a tough environment. At the same time, fears of AI-caused job loss have — so far — been overblown. However, the demand for AI skills is starting to cause shifts in the job market. I'd like to share what I'm seeing on the ground.

First, many tech companies have laid off workers over the past year. While some CEOs cited AI as the reason — that AI is doing the work, so people are no longer needed — the reality is AI just doesn't work that well yet. Many of the layoffs have been corrections for overhiring during the pandemic or general cost-cutting and reorganization that occasionally happened even before modern AI. Outside of a handful of roles, few layoffs have resulted from jobs being automated by AI.

Granted, this may grow in the future. People who are currently in some professions that are highly exposed to AI automation, such as call-center operators, translators, and voice actors, are likely to struggle to find jobs and/or see declining salaries. But widespread job losses have been overhyped.

Instead, a common refrain applies: AI won't replace workers, but workers who use AI will replace workers who don't. For instance, because AI coding tools make developers much more efficient, developers who know how to use them are increasingly in demand. (If you want to be one of these people, please take our short courses on Claude Code, Gemini CLI, and Agentic Skills!)

So AI is leading to job losses, but in a subtle way. Some businesses are letting go of employees who are not adapting to AI and replacing them with people who are. This trend is already obvious in software development. Further, in many startups' hiring patterns, I am seeing early signs of this type of personnel replacement in roles that traditionally are considered non-technical. Marketers, recruiters, and analysts who know how to code with AI are more productive than those who don't, so some businesses are slowly parting ways with employees who aren't able to adapt. I expect this will accelerate.

At the same time, when companies build new teams that are AI native, sometimes the new teams are smaller than the ones they replace. AI makes individuals more effective, and this makes it possible to shrink team sizes. For example, as AI has made building software easier, the bottleneck is shifting to deciding what to build — this is the Product Management (PM) bottleneck. A project that used to be assigned to 8 engineers and 1 PM might now be assigned to 2 engineers and 1 PM, or perhaps even to a single person with a mix of engineering and product skills.

The good news for employees is that most businesses have a lot of work to do and not enough people to do it. People with the right AI skills are often given opportunities to step up and do more, and maybe tackle the long backlog of ideas that couldn't be executed before AI made the work go more quickly. I'm seeing many employees in many businesses step up to build new things that help their business. Opportunities abound!

I know these changes are stressful. My heart goes out to every family that has been affected by a layoff, to every job seeker struggling to find the role they want, and to the far larger number of people who are worried about their future job prospects.

Fortunately, there's still time to learn and position yourself well for where the job market is going. When it comes to AI, the vast majority of people, technical or nontechnical, are at the starting line, or they were recently. So this remains a great time to keep learning and keep building, and the opportunities for those who do are numerous! [Original text: deeplearning.ai/the-batch/issu… ]
Sha Liu retweeted
Robert Youssef @rryssf_
Google just mass-published how 34 researchers actually use Gemini to solve open math and CS problems. not benchmarks. not demos. real unsolved problems across cryptography, physics, graph theory, and economics. 145 pages of case studies. here's what actually matters: