GPT-2 and GPT-3 are Transformer neural network models created by OpenAI and used by AI Dungeon. GPT is short for Generative Pre-trained Transformer. The original GPT was released in June 2018, and the full version of GPT-2 followed in November 2019. GPT-3 has been announced but has not been released publicly; it can only be used through OpenAI's API. The GPT models use byte-pair encoding (BPE) tokenization.
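BPE builds its vocabulary by repeatedly merging the most frequent adjacent pair of symbols in the training text. The following is a toy sketch of that idea on whole characters, not OpenAI's actual tokenizer; the function names `learn_bpe_merges`, `merge_pair`, and `tokenize` are made up for illustration, and the real GPT tokenizers differ in their details.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Fuse every adjacent occurrence of `pair` in a tuple of symbols."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each word starts as a tuple of single characters, weighted by frequency.
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Pick the most frequent pair (ties go to the pair seen first).
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite the corpus with that pair fused into a single symbol.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            new_corpus[merge_pair(symbols, best)] += freq
        corpus = new_corpus
    return merges


def tokenize(word, merges):
    """Split a word into tokens by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    merges = learn_bpe_merges(["low", "low", "lower", "lowest"], num_merges=2)
    print(merges)
    print(tokenize("lowly", merges))
```

With only two merges learned from this tiny word list, frequent fragments such as "low" become single tokens while rarer letters stay separate, which is the behavior that lets a fixed vocabulary cover arbitrary text.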
Both models use the transformer architecture and are used for text generation. The largest GPT-2 model contains 1.5 billion parameters and the largest GPT-3 model contains 175 billion parameters, more than a hundred times as many, which makes for a drastic shift in quality between them.
Dragon uses the largest GPT-3 model, with 175 billion parameters, and Griffin uses a smaller GPT-3 model. Before Griffin and Dragon, AI Dungeon used the largest GPT-2 model with minor changes. That model is still used for the first AI response, to prevent direct access to GPT-3 and abuse of it.
GPT was proposed in this paper in 2018. It was made to test the impact of generative pretraining on performance in downstream language tasks. Its dataset was the BooksCorpus dataset, a collection of unpublished books.
GPT-2 was proposed in this paper in 2019. It was made to show that transformers can learn multiple tasks concurrently without supervision. Its dataset, called WebText, consisted of slightly over 8 million documents, totaling 40 GB of text, collected from URLs shared in Reddit submissions with at least 3 upvotes.
GPT-3 was proposed in this paper in 2020. It was made to show that sufficiently large transformers can learn multiple tasks without fine-tuning. Its dataset combined a heavily filtered version of the CommonCrawl dataset, an expanded version of WebText, two online book libraries, and English Wikipedia.
|Model||Layers||Hidden State Dimension||Total Parameters|
|GPT-2 XL or GPT-2||48||1600||1.5B|
|GPT-3 175B or GPT-3||96||12288||175B|
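The parameter totals in the table can be roughly reproduced from the layer count and state dimension alone. The sketch below uses a common rule of thumb for decoder-only transformers, about 12 × layers × dimension² for the transformer blocks plus the embedding tables. The function name `estimate_params` is made up for illustration, and the vocabulary size (50,257) and context lengths (1,024 for GPT-2, 2,048 for GPT-3) are assumptions not stated in this article. Biases and layer-norm weights are ignored, so the result is an approximation.

```python
def estimate_params(n_layers, d_model, vocab_size=50257, n_positions=1024):
    """Approximate parameter count of a GPT-style decoder-only transformer."""
    # Per layer: ~4*d^2 for attention (query, key, value, output projections)
    # plus ~8*d^2 for the feed-forward block (assumed 4x wider than d_model).
    block_params = 12 * n_layers * d_model ** 2
    # Token embeddings plus learned position embeddings (assumed sizes).
    embedding_params = (vocab_size + n_positions) * d_model
    return block_params + embedding_params


if __name__ == "__main__":
    gpt2_xl = estimate_params(n_layers=48, d_model=1600, n_positions=1024)
    gpt3 = estimate_params(n_layers=96, d_model=12288, n_positions=2048)
    print(f"GPT-2 XL: about {gpt2_xl / 1e9:.2f}B parameters")
    print(f"GPT-3 175B: about {gpt3 / 1e9:.2f}B parameters")
```

Under these assumptions the estimates land close to the 1.5B and 175B figures listed above, which shows that almost all of the growth from GPT-2 to GPT-3 comes from doubling the layer count and widening the hidden state.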