Large language models
2023-12-03
So I am realising I am going to have to learn a bit about these large language models, and about artificial intelligence in general: it’s not going anywhere, more and more people are talking about it, and smart people are starting to say things about it that scare me.
Links and resources
- Some thoughts on why companies will need to focus on “prompt engineering” to differentiate themselves in a crowded AI space.
- A nice technical summary of transformers, the neural network architecture used to build LLMs.
- Someone using LLMs to build a “power user” interface for iMessage.
- The potential impact of LLMs on the creative arts has often reminded me of this fantastic short story by Roald Dahl, which I remember reading when I was younger.
- An interesting post on how the uncensored open-source llama2 model was trained.
- A really impressive visualization of how LLMs work under the hood.
- A really great introduction to large language models by Andrej Karpathy. It is meant for a “lay” audience, but I think it also serves as a good introduction for technically minded people who have not encountered the ideas before.
- The first paper I’ve seen which claims that LLMs can “discover new mathematics”.
- A nice notebook from a reliable source about attention in transformers.
- A nice Emacs package for integrating with local LLMs.
- This post pretty much sums up how I feel heading into 2024.
- A famous leaked internal memo from Google arguing that “we have no moat, and neither does OpenAI” when it comes to LLMs.
- A fairly nice summary of a typical technical person’s use of LLMs. I keep trying a lot of the things on this list but always get tripped up somewhere. I’m still not sure whether this is a fundamental limitation of LLMs or whether they (or I) are just not there yet. Remember: LLMs are always “hallucinating”; they have no real model of correctness or truth!
- An interesting tool for learning how to prompt LLMs more effectively.
- An article in Nature about how more and more researchers are running LLMs locally instead of using cloud-based models like ChatGPT.
- A nice post on the Cursor editor and projects.
- An interesting paper about using multiple small language models together to improve “reasoning” performance. I am still trying to get my head around results like this: is it just more layers of smoke and mirrors, or is this how something interesting is going to come out of all this stuff?