Three new Large Language Models announced in July

July 2023 saw the announcement of three new developments in large language models: Meta’s Llama 2, Anthropic’s Claude 2, and StackOverflow’s OverflowAI.

In this post, I’ll provide a very brief summary of each model. While it’s promising to see the space continue to expand beyond OpenAI’s offerings, many questions still exist surrounding training data ownership, reliability, and biases of these models.

Llama 2

Meta’s Llama 2 model is a general-purpose large language model that was released on 18th July. It is the second generation of the Llama model, the first having been released earlier this year. While the first model was only available for research use, the licensing for Llama 2 is much broader and allows for commercial use. It has a context window of 4096 tokens (double that of Llama 1). Llama 2 comes in two main varieties: standard (“Llama 2”) and chat (“Llama-2-chat”). The chat variety is fine-tuned for conversational use, trained in part with human feedback. The models are available for download today, and you can request access via this form. You can also interact with hosted versions of the model on HuggingFace.
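As a rough illustration of what working with the chat variety looks like, Llama-2-chat expects prompts wrapped in an instruction template with `[INST]` and `<<SYS>>` markers. The sketch below assembles a single-turn prompt in that style; it is a best-effort sketch rather than Meta’s reference implementation, and the helper name is my own.

```python
def build_llama2_chat_prompt(system_prompt: str, user_message: str) -> str:
    """Assemble a single-turn prompt in the [INST] instruction format
    used by the Llama-2-chat models (sketch, not Meta's reference code)."""
    return (
        f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = build_llama2_chat_prompt(
    "You are a helpful assistant.",
    "Summarise the Llama 2 release in one sentence.",
)
print(prompt)
```

The resulting string is what you would pass to the model’s tokenizer; multi-turn conversations chain further `[INST] … [/INST]` blocks after the model’s previous reply.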

A diagram showing how Llama 2 Chat was trained in three steps: pretraining, fine-tuning and human feedback.

Claude 2

Anthropic’s Claude 2 model is another AI language model, released on 11th July. Claude 2 is not currently available for public download, nor via any kind of free-tier API. However, you can interact with Anthropic’s hosted version at claude.ai. It seems that Claude 2 retains the same context window size as its predecessor, at 100,000 tokens. The model continues Anthropic’s use of “Constitutional AI” and reinforcement learning from human feedback (which Meta has now also used in Llama 2), and they report an improvement in harmless responses compared with Claude 1.

A screenshot from Anthropic's video about Claude 2 titled: Long inputs, multi-step output.

OverflowAI

On 28th July StackOverflow announced OverflowAI. The model is part of a broad set of developments from StackOverflow. Of most interest is an improved search functionality that will provide generated answers from a model trained on the data from public StackOverflow questions. It is designed to help developers find answers to their coding questions more quickly and accurately, without having to abide by the usual guidelines that come with submitting a question to StackOverflow. It is currently unclear whether the OverflowAI model will be produced as an in-house novel solution, in partnership with a provider, or by leveraging existing solutions. Of note is that StackOverflow’s changing policy on generative AI played a part in the recent moderators’ strike on the platform; it seems likely that the policy was changed in anticipation of OverflowAI. Given the model is trained on classic StackOverflow questions and answers, it will be interesting to see how the success of their conversational search affects their ability to continue to build their training data set of human answers. The new search is not yet available for public use; however, it is slated to enter a private alpha in August 2023.