The Tokyo Test

The Tokyo Test is a demonstration that frontier AI models routinely disagree on questions of fact. The same prompt is run across multiple models simultaneously, so the user can see where the answers diverge.

The name comes from a class of geographic and demographic questions, such as 'What is the population of Tokyo?', where the correct answer depends on whether you mean the 23 special wards, Tokyo Metropolis (the prefecture), or the Greater Tokyo Area. Different models pick different definitions and give different numbers, each with equal confidence.
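The mechanics of the test can be sketched in a few lines of Python. This is a minimal illustration, not Backplain's implementation: the three model functions are hypothetical stand-ins that return canned answers, the population figures are rough approximations included only to show the three definitions, and the divergence check is a deliberately crude text comparison.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical stand-ins for real model clients (assumption: a real client
# would call a provider API here). Each canned answer uses a different
# definition of "Tokyo"; the figures are approximate and illustrative only.
def model_a(prompt: str) -> str:
    return "About 9.7 million (the 23 special wards)."

def model_b(prompt: str) -> str:
    return "About 14 million (Tokyo Metropolis)."

def model_c(prompt: str) -> str:
    return "About 37 million (Greater Tokyo Area)."

MODELS: Dict[str, Callable[[str], str]] = {
    "model_a": model_a,
    "model_b": model_b,
    "model_c": model_c,
}

def fan_out(prompt: str, models: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send the same prompt to every model concurrently and collect the answers."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        return {name: future.result() for name, future in futures.items()}

def answers_diverge(answers: Dict[str, str]) -> bool:
    """Return True if the models did not all give the same normalized text."""
    normalized = {text.strip().lower() for text in answers.values()}
    return len(normalized) > 1

if __name__ == "__main__":
    results = fan_out("What is the population of Tokyo?", MODELS)
    for name, text in results.items():
        print(f"{name}: {text}")
    print("Models disagree:", answers_diverge(results))
```

A real comparison would need more than string equality, since two models can phrase the same number differently. The point of the sketch is only that disagreement becomes visible once the answers sit next to each other.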

The test is not really about Tokyo. It is meant to show a regulated buyer that relying on a single model is structurally unsafe for any question where the stakes justify a second opinion, because a single model will not tell you when it is uncertain and cannot tell you when its peers disagree.

Backplain's interactive Tokyo Test lets you run three free prompts across every frontier model on the platform, with no signup required.