Large language models

2023-12-03

So I am realising I am going to have to learn a bit about these large language models, and about artificial intelligence in general: it’s not going anywhere, more and more people are talking about it, and smart people are starting to say things about it that scare me.

  • Some thoughts on why companies will need to focus on “prompt engineering” to differentiate themselves in a crowded AI space.
  • A nice technical summary of transformers, the neural network architecture used to build LLMs.
  • Someone using LLMs to build a “power user” interface for iMessage.
  • The potential impact of LLMs on the creative arts has often reminded me of this fantastic short story by Roald Dahl that I remember reading when I was younger.
  • An interesting post on how the uncensored open-source llama2 model was trained.
  • A really impressive visualization of how LLMs work under the hood.
  • A really great introduction to large language models by Andrej Karpathy - this is meant to be for a “lay” audience but I think it works well for technically minded people who have not encountered the ideas before.
  • The first paper I’ve seen which claims that LLMs can “discover new mathematics”.
  • A nice notebook from a reliable source about attention in transformers (I’ve put a small sketch of the attention computation after this list).
  • A nice emacs package for integrating with local LLMs.
  • This post pretty much sums up how I feel heading into 2024.
  • A famous leaked internal memo from Google arguing that “we have no moat, and neither does OpenAI” when it comes to LLMs.
  • A fairly nice summary of a typical technical person’s use of LLMs. I keep trying a lot of the things on this list but always get tripped up somewhere. Still not sure if this is a fundamental limitation of LLMs or whether they (or I) are just not there yet. Remember: LLMs are always “hallucinating”; they have no real model of correctness or truth!
  • An interesting tool for learning how to prompt LLMs more effectively.
  • An article in Nature about how more and more researchers are running LLMs locally instead of using cloud-based models like ChatGPT (there’s a minimal local-inference sketch after this list too).
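
Since a couple of the links above (the transformer summary and the attention notebook) revolve around attention, here is a toy sketch of the scaled dot-product attention at the heart of transformers. This is my own minimal numpy version for building intuition, not code taken from any of the linked posts:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy single-head attention: Q, K, V are (seq_len, d_k) arrays."""
    d_k = Q.shape[-1]
    # Compare every query against every key; scale by sqrt(d_k)
    # so the softmax doesn't saturate as d_k grows.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax: each position gets a probability
    # distribution over all positions in the sequence.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted average of the value vectors.
    return weights @ V

# Tiny example: 4 tokens with 8-dimensional embeddings,
# attending to themselves (self-attention).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

Real transformers run many of these heads in parallel with learned projections of the input, but the core computation is just this: compare, weight, average.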
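
And since local models come up twice above (the emacs package and the Nature article), here is roughly what running one looks like in practice. A sketch using the llama-cpp-python bindings; the model path is a placeholder, and you’d need to download a GGUF model file yourself:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Placeholder path: point this at whatever GGUF model you have
# downloaded (e.g. a quantized Llama 2 7B).
llm = Llama(model_path="./models/llama-2-7b.Q4_K_M.gguf")

output = llm(
    "Q: What is a large language model? A:",
    max_tokens=128,
    stop=["Q:"],  # stop before the model invents the next question
)
print(output["choices"][0]["text"])
```

Everything runs on your own machine, which is exactly the appeal the Nature article describes: no API costs, no sending data to a third party.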