Christopher C. Cyrus
@ccyrus


We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math. openai.com/index/introduc…




We always knew that Chomsky was wrong about language models; it’s nice to have a paper showing just how wrong he was! #ACL2024 best paper. arxiv.org/abs/2401.06416

I've been asked by a few first-year PhD students how to start LLM research on X, say long-context modeling. My number one suggestion -- though it seems a bit unconventional -- is *not* to read any papers related to long context, but to talk to the model:

- Talk to the model about a textbook, course slides, financial reports, novels, nonfiction, any long document you can find.
- Talk to the model for two whole weeks, from the first thing in the morning after opening up the laptop to the last thing in the evening before going to bed.
- Ask every single question you can imagine: What is PCA? How does it compare to SVD? Which part of the book describes the two? What does the book say exactly?
- Talk to all the models you can access: GPT, Gemini, Claude, Llama ...
- Keep talking to the model for two whole weeks: no research, no papers, no arXiv, just talk to the model.
- During this process, continuously observe how the models behave, discover their problems, and think about why they might behave that way (the mechanical part of this loop is sketched in the script below).

I've found that people who have gone through this process have a fundamentally different level of understanding than people who just read papers 😉
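The habit itself is the point, but the mechanical part is easy to automate. Here is a minimal sketch of such a question-answering loop, assuming the `openai` Python client; the model name and the document path are placeholders of mine, not anything from the post.

```python
# Minimal sketch of the "talk to the model about a long document" loop.
# Assumptions (not from the original post): the `openai` Python client,
# a placeholder model name, and a local text file standing in for a
# textbook chapter, report, or novel.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Load any long document you have on hand (path is a placeholder).
with open("textbook_chapter.txt") as f:
    document = f.read()

# Keep the whole conversation so follow-up questions have context.
history = [
    {
        "role": "system",
        "content": "Answer questions about the document below.\n\n" + document,
    },
]

while True:
    question = input("you> ")
    if not question:
        break
    history.append({"role": "user", "content": question})
    reply = client.chat.completions.create(model="gpt-4o", messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    print(answer)
```

Then ask exactly the kinds of questions in the list above (What is PCA? How does it compare to SVD? Which part of the book describes the two?) and watch where the answers drift, truncate, or ignore the document.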






Visualization-of-Thought Elicits Spatial Reasoning in LLMs

Inspired by the human cognitive capacity to imagine unseen worlds, this new work proposes Visualization-of-Thought (VoT) prompting to elicit spatial reasoning in LLMs. VoT enables LLMs to "visualize" their reasoning traces, creating internal mental images that help guide subsequent reasoning steps. Think of this prompting approach as a way of eliciting the "mind's eye" of LLMs.

When tested on multi-hop spatial reasoning tasks like visual tiling and visual navigation, VoT outperforms existing multimodal LLMs.

This is a fascinating paper, and it makes me wonder whether there are other human cognitive abilities that could inspire even more complex capabilities in LLMs and multimodal LLMs.
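For concreteness, here is a hedged sketch of what a VoT-style prompt might look like on a small navigation task. The instruction wording and the grid are my own illustration, not taken from the paper.

```python
# Illustrative Visualization-of-Thought style prompt (wording is mine, not
# the paper's): the model is asked to draw its "mental image" of the grid
# after every reasoning step before choosing the next move.
VOT_INSTRUCTION = (
    "Solve the task step by step. After each step, draw the current state "
    "of the grid as ASCII art (your mental image of it) before deciding "
    "on the next move."
)

TASK = (
    "You start at S on the grid below and must reach G. '#' is a wall, "
    "'.' is an open cell. Give the sequence of moves (up/down/left/right).\n"
    "S . .\n"
    "# # .\n"
    "G . ."
)

prompt = f"{VOT_INSTRUCTION}\n\n{TASK}"
# Send `prompt` to any chat LLM, e.g. with the client from the sketch above.
print(prompt)
```

The ASCII rendering is exactly the "mind's eye" idea: each drawn state becomes part of the context that conditions the next reasoning step.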










