These tests will involve running the AI under all three perspectives (first, second, and third person) and seeing how well it functions with each.
"Stability" is a general term here, and will be rated in terms of "failures".
Run some actions with the AI using the 3 different perspectives on contexts which are otherwise identical, to see what the rate of "failures" is with each. Whether a response counts as a failure will be decided by a group of 3, and the failures will be totaled over a series of tests.
The following are usually annoying but in-bounds for the AI: the story can still be continued after some alteration. These will be considered "minor failures"...
- Responses which are too short to be meaningful.
- Responses which would not work in context, but still acknowledge it.
- Including characters who are not there.
- Changing minor details.
- Taking the story off on a tangent.
- Switching perspectives ("You" in third or first person, the character's name in first or second person, etc.).
- Attempting to create a scene-break but otherwise keeping sane.
The following are outright failures of the AI to continue the story. The responses are almost completely unusable, and will be considered "major failures"...
- Responses which seem completely unrelated to given context.
- Literal nonsense.
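The tallying described above could be sketched roughly as follows. This is only an illustration, not an agreed procedure: the names `group_verdict` and `failure_rates` are made up for this example, and since nothing above says how the group of 3 settles a disagreement, the sketch assumes the middle (median) severity of the three votes is taken, which is the same as a majority vote whenever two raters agree.

```python
from collections import Counter

PERSPECTIVES = ("first", "second", "third")
# Severity order used to pick the middle vote (assumption, see lead-in).
SEVERITY = {"ok": 0, "minor": 1, "major": 2}


def group_verdict(votes):
    """Combine three rater votes into one verdict (median severity)."""
    if len(votes) != 3:
        raise ValueError("expected exactly 3 votes")
    return sorted(votes, key=SEVERITY.__getitem__)[1]


def failure_rates(results):
    """results: iterable of (perspective, (vote, vote, vote)) pairs.

    Returns, per perspective, the fraction of responses judged
    "minor" and "major" failures.
    """
    counts = {p: Counter() for p in PERSPECTIVES}
    for perspective, votes in results:
        counts[perspective][group_verdict(votes)] += 1

    rates = {}
    for perspective, tally in counts.items():
        total = sum(tally.values())
        rates[perspective] = {
            kind: (tally[kind] / total if total else 0.0)
            for kind in ("minor", "major")
        }
    return rates
```

Each rated response is passed in as one `(perspective, votes)` pair, so results from many tests can simply be concatenated before calling `failure_rates`.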
These tests can (and should) use a diverse range of contexts, including various prompts, but should also attempt both to fill history and memory and to avoid using scenarios which reference a perspective with WorldInfo. Each context should have a significant number of attempts made within it using all 3 perspectives.
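One way to make sure every context gets the same number of attempts under each perspective is to write the plan out up front. A minimal sketch, assuming contexts are identified by arbitrary labels; `build_test_plan` and the attempt count are hypothetical and would need to be settled by whoever runs the tests:

```python
from itertools import product

PERSPECTIVES = ("first", "second", "third")


def build_test_plan(context_ids, attempts_per_perspective):
    """List every (context, perspective, attempt) combination to run."""
    attempts = range(1, attempts_per_perspective + 1)
    return [
        {"context": context, "perspective": perspective, "attempt": attempt}
        for context, perspective, attempt in product(
            context_ids, PERSPECTIVES, attempts
        )
    ]
```

For example, `build_test_plan(["context_a", "context_b"], 10)` gives 60 entries: 10 attempts for each of the 3 perspectives in each of the 2 contexts.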
There should also be a secondary test run with
Tests to be run after review of the above criteria.