GPT

GPT-2 and GPT-3 are Transformer neural network models created by OpenAI and used by AI Dungeon. GPT is short for Generative Pre-trained Transformer. The original GPT was released in June 2018, and the full version of GPT-2 was released in November 2019. GPT-3 has not been released publicly, but it has been announced and can be used through OpenAI's API. All GPT models use BPE (byte-pair encoding) tokenization.
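As a concrete illustration, the sketch below encodes a short string with a GPT-2-style BPE tokenizer. It uses the tiktoken library purely as an example; AI Dungeon's own tokenization code is not public, so the library choice is an assumption.

    # Sketch: GPT-style BPE tokenization using the tiktoken library (illustrative only).
    import tiktoken

    enc = tiktoken.get_encoding("gpt2")            # the BPE vocabulary released with GPT-2
    tokens = enc.encode("You enter the dungeon.")
    print(tokens)                                  # a short list of integer token IDs
    print([enc.decode([t]) for t in tokens])       # the text piece each ID maps back to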

They use the Transformer architecture and are used for text generation. The largest GPT-2 model contains 1.5B parameters, while the largest GPT-3 model contains 175B parameters, which makes for a drastic quality difference between them.
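A minimal text-generation sketch with the publicly released GPT-2 is shown below. It uses the Hugging Face transformers library, which is an assumption made for illustration; it is not AI Dungeon's serving code.

    # Sketch: sampling a continuation from the public GPT-2 model (illustrative only).
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    inputs = tokenizer("You enter the dungeon and", return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=20, do_sample=True, top_p=0.9)
    print(tokenizer.decode(output[0], skip_special_tokens=True))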

Dragon uses the biggest GPT-3 model with 175 billion parameters, and Griffin uses a smaller GPT-3 model. Before Griffin and Dragon, AI Dungeon used the biggest GPT-2 model with minor changes. That model is still used for the first AI response to prevent direct access and abuse of GPT-3.

GPT Models

GPT

GPT was proposed in the 2018 paper "Improving Language Understanding by Generative Pre-Training". It was made to test the impact of unsupervised pretraining on downstream language-task performance. Its training dataset was BooksCorpus, a corpus of unpublished books.
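The pretraining step itself is plain next-token prediction. The toy sketch below shows that objective with an embedding layer standing in for the full Transformer stack; it is an illustration of the idea, not OpenAI's code.

    # Sketch: the causal language-modelling objective GPT is pretrained on
    # (maximise the probability of each token given the tokens before it).
    # The embedding layer stands in for the full Transformer stack.
    import torch
    import torch.nn.functional as F

    vocab_size, d_model = 100, 32
    embed = torch.nn.Embedding(vocab_size, d_model)
    lm_head = torch.nn.Linear(d_model, vocab_size)

    tokens = torch.randint(0, vocab_size, (1, 16))   # a dummy token sequence
    logits = lm_head(embed(tokens))                  # next-token scores at each position

    # Position i predicts token i+1, so shift by one and apply cross-entropy.
    loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab_size),
                           tokens[:, 1:].reshape(-1))
    print(loss.item())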

GPT-2

GPT-2 was proposed in the 2019 paper "Language Models are Unsupervised Multitask Learners". It was made to show that transformers can learn multiple tasks without task-specific supervision. Its dataset, called WebText, was built from slightly over 8 million documents, about 40 GB of text, scraped from URLs shared in Reddit submissions that received at least 3 karma.
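A hypothetical sketch of that selection rule is shown below; the field names and data are invented for illustration and do not reflect OpenAI's actual pipeline.

    # Hypothetical sketch of the WebText selection rule: keep outbound URLs from
    # Reddit submissions that earned at least 3 karma. Field names are invented.
    submissions = [
        {"url": "https://example.com/long-article", "karma": 57},
        {"url": "https://example.org/low-quality-post", "karma": 1},
    ]

    webtext_urls = [s["url"] for s in submissions if s["karma"] >= 3]
    print(webtext_urls)   # only the first URL passes the karma filter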

GPT-3

GPT-3 was proposed in the 2020 paper "Language Models are Few-Shot Learners". It was made to show that sufficiently large transformers can learn many tasks from a handful of examples in the prompt, without finetuning. Its dataset was based on a heavily filtered version of the Common Crawl dataset, an expanded version of WebText, two online book corpora, and English Wikipedia.
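The sketch below builds the kind of few-shot prompt that makes this possible: the task is demonstrated entirely inside the prompt and the model is expected to continue the pattern. The example task and wording are illustrative assumptions.

    # Sketch: a few-shot prompt for in-context learning (no finetuning involved).
    # Given the pattern, a GPT-3-style model is expected to continue with " chat".
    prompt = (
        "English: cheese\nFrench: fromage\n"
        "English: house\nFrench: maison\n"
        "English: cat\nFrench:"
    )
    print(prompt)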

Model Comparison

Title                 Layers   d_model   Total parameters
GPT                       12       768   125M
GPT-2 Small               12       768   117M
GPT-2 Medium              24      1024   345M
GPT-2 Large               36      1280   762M
GPT-2 XL ("GPT-2")        48      1600   1.5B
GPT-3 Small               12       768   125M
GPT-3 Medium              24      1024   350M
GPT-3 Large               24      1536   760M
GPT-3 XL                  24      2048   1.3B
GPT-3 2.7B                32      2560   2.7B
GPT-3 6.7B                32      4096   6.7B
GPT-3 13B                 40      5140   13B
GPT-3 175B ("GPT-3")      96     12288   175B

Here d_model is the model's hidden (embedding) dimension. GPT-2 XL and GPT-3 175B are the models usually referred to simply as GPT-2 and GPT-3.
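The parameter counts above follow roughly from the layer count and d_model. The sketch below uses the common approximation of about 12 * layers * d_model^2 weights in the attention and feed-forward blocks plus the token-embedding matrix; the vocabulary size used here (50257, GPT-2's) is an assumption.

    # Rough parameter-count estimate for a GPT-style Transformer:
    # ~12 * layers * d_model^2 weights in attention + MLP blocks, plus token embeddings.
    def approx_params(layers: int, d_model: int, vocab: int = 50257) -> int:
        return 12 * layers * d_model ** 2 + vocab * d_model

    for name, layers, d_model in [("GPT-2 XL", 48, 1600), ("GPT-3 175B", 96, 12288)]:
        print(f"{name}: ~{approx_params(layers, d_model) / 1e9:.2f}B parameters")
    # Prints roughly 1.55B and 174.56B, close to the 1.5B and 175B rows above.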