Large language models

2023-12-03

So I am realising I am going to have to learn a bit about these large language models, and about artificial intelligence in general: it’s not going anywhere, more and more people are talking about it, and smart people are starting to say things about it that scare me.

  • Some thoughts on why companies will need to focus on “prompt engineering” to differentiate themselves in a crowded AI space.
  • A nice technical summary of transformers, the neural network architecture used to build LLMs.
  • Someone using LLMs to build a “power user” interface for iMessage.
  • The potential impact of LLMs on the creative arts has often reminded me of this fantastic short story by Roald Dahl, which I remember reading when I was younger.
  • An interesting post on how the uncensored open-source llama2 model was trained.
  • A really impressive visualization of how LLMs work under the hood.
  • A really great introduction to large language models by Andrej Karpathy - this is meant to be for a “lay” audience but I think it serves as a good introduction for technically minded people who have not encountered the ideas before.
  • The first paper I’ve seen which claims that LLMs can “discover new mathematics”.
  • A nice notebook from a reliable source about attention in transformers.
  • A nice Emacs package for integrating with local LLMs.
  • This post pretty much sums up how I feel heading into 2024.
  • A famous leaked internal memo from Google arguing that “we have no moat, and neither does OpenAI” when it comes to LLMs.
  • A fairly nice summary of a typical technical person’s use of LLMs. I keep trying a lot of the things on this list but always get tripped up somewhere. Still not sure whether this is a fundamental limitation of LLMs or whether they’re (or I’m) just not there yet. Remember: LLMs are always “hallucinating”; they have no real model of correctness or truth!
  • An interesting tool for learning how to prompt LLMs more effectively.
  • An article in Nature about how more and more researchers are running LLMs locally instead of using cloud-based models like ChatGPT.
  • A nice post on the Cursor editor and projects.
  • An interesting paper about combining multiple small language models to improve “reasoning” performance. I am still trying to get my head around results like this: is it just more layers of smoke and mirrors, or is this how something interesting is going to come out of all this stuff?