Token

From AI Dungeon Wiki
A token is the smallest unit of text a [[neural net]] processes. Text input to the AI is first tokenized into a list of tokens, which is then sent to the AI.
 
==Tokenizations==
Because the neural network takes in text as a sequence of tokens, text must first be tokenized before it can be processed. This can be as simple as using characters or words as tokens, but more sophisticated tokenizations lead to better results. The one used by [[AI Dungeon]], and by [[GPT]] models in general, works with common character clusters. For instance, ''try'' would be converted to [try], and ''trying'' would be converted to [try][ing]. That way, the neural net can see the relations between words without having to bother with individual characters.
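A minimal sketch of the idea, using greedy longest-match splitting over a small hypothetical vocabulary. (Actual GPT tokenizers use byte-pair encoding with a learned merge table, so this is an illustration of subword splitting, not the real algorithm.)

```python
# Hypothetical vocabulary of character clusters; real tokenizers learn
# tens of thousands of these from a large text corpus.
VOCAB = {"try", "ing", "run", "er", "t", "r", "y", "i", "n", "g"}

def tokenize(word):
    """Split a word into the longest vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Find the longest vocabulary entry that matches at position i.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No match: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("try"))     # ['try']
print(tokenize("trying"))  # ['try', 'ing']
```

Both words begin with the same piece [try], which is what lets the neural net see that they are related.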
[[Category:Artificial intelligence]]

Latest revision as of 02:09, 12 March 2021
