From AI Dungeon Wiki
Revision as of 02:11, 12 March 2021 by Luihum (talk | contribs) (Removed repeated link)

GPT-2 and GPT-3 are Transformer neural network models created by OpenAI and used by AI Dungeon. The original GPT was released in June 2018, and the full version of GPT-2 in November 2019; GPT-3 has not been released publicly, but has been announced and can be used through OpenAI's API. GPT is short for Generative Pre-trained Transformer. GPT models use BPE (byte-pair encoding) tokenization.
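Byte-pair encoding builds a vocabulary by repeatedly merging the most frequent pair of adjacent symbols. The following is a minimal training sketch on a hypothetical toy corpus; the real GPT tokenizers operate on bytes with a large pre-learned merge table, which this does not reproduce.

```python
# Toy byte-pair-encoding (BPE) training sketch.
# Words are space-separated symbol sequences; the corpus is made up.
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in words.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, words):
    """Replace every occurrence of the pair with its merged symbol."""
    target = " ".join(pair)
    merged = "".join(pair)
    return {word.replace(target, merged): freq for word, freq in words.items()}

def learn_bpe(words, num_merges):
    """Greedily learn a list of merge rules from the corpus."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        words = merge_pair(best, words)
        merges.append(best)
    return merges, words

# Hypothetical corpus: each key is one word, split into characters
corpus = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
merges, vocab = learn_bpe(corpus, 3)
print(merges)
print(vocab)
```

With three merges the frequent suffix "est" emerges as a single token, which is the core idea: common character sequences get their own vocabulary entries while rare words still decompose into smaller pieces.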

They use the transformer architecture and are used for text generation. GPT-2's largest model contains 1.5B parameters and GPT-3's largest contains 175B, a gap that makes for a drastic quality shift between them.

Dragon uses the largest GPT-3 model, with 175 billion parameters, and Griffin uses a smaller GPT-3 model. Before Griffin and Dragon, AI Dungeon used the largest GPT-2 model with minor changes. That model is still used for the first AI response to prevent direct access to and abuse of GPT-3.

GPT Models


GPT was proposed in this paper in 2018. It was made to test the impact of generative pretraining on language model performance. Its training data was the BooksCorpus dataset, a collection of unpublished books.


GPT-2 was proposed in this paper in 2019. It was made to show that transformers can learn multiple tasks without task-specific supervision. Its dataset, called WebText, was built from slightly over 8 million documents, totaling 40 GB of text, scraped from URLs shared in Reddit submissions with at least 3 karma.


GPT-3 was proposed in this paper in 2020. It was made to show that sufficiently large transformers can learn multiple tasks without finetuning. Its dataset combined a heavily filtered version of the CommonCrawl dataset, an expanded version of WebText, two online book corpora, and English Wikipedia.

Model Comparison

Title               Layers   d_model   Total Parameters
GPT                 12       768       125M
GPT-2 Small         12       768       117M
GPT-2 Medium        24       1024      345M
GPT-2 Large         36       1280      762M
GPT-2 XL (GPT-2)    48       1600      1.5B
GPT-3 Small         12       768       125M
GPT-3 Medium        24       1024      350M
GPT-3 Large         24       1536      760M
GPT-3 XL            24       2048      1.3B
GPT-3 2.7B          32       2560      2.7B
GPT-3 6.7B          32       4096      6.7B
GPT-3 13B           40       5140      13B
GPT-3 175B (GPT-3)  96       12288     175B
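The parameter counts above are dominated by the transformer blocks: each layer contributes roughly 4·d² attention weights plus 8·d² feed-forward weights (with the usual 4·d hidden size), so total parameters come to approximately 12 · layers · d². A rough sanity check against the table, ignoring embedding weights:

```python
# Approximate transformer parameter count: ~12 * layers * d_model^2.
# Ignores token/position embeddings, which add up to ~0.1B extra
# for the smaller models and explain small gaps from the table.
def approx_params(layers, d_model):
    return 12 * layers * d_model ** 2

# (layers, d_model) taken from the model comparison table
models = {
    "GPT-2 XL":   (48, 1600),    # table value: 1.5B
    "GPT-3 XL":   (24, 2048),    # table value: 1.3B
    "GPT-3 175B": (96, 12288),   # table value: 175B
}

for name, (layers, d) in models.items():
    print(f"{name}: {approx_params(layers, d) / 1e9:.2f}B parameters")
```

The estimate lands within a few percent of each table entry, which is a quick way to check that a reported model size is consistent with its depth and width.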