What is The Tokyo Test?

The Tokyo Test

The Tokyo Test is a demonstration that frontier AI models routinely disagree on questions of fact. The same prompt is run across multiple models simultaneously, and the user sees that the answers diverge.

The name comes from a class of geographic and demographic questions, for example 'what is the population of Tokyo?', where the correct answer depends on whether you mean the 23 special wards, Tokyo Metropolis (the prefecture), or the Greater Tokyo Area. Those definitions span roughly 10 million to more than 35 million people, so models that pick different definitions give very different numbers, each with equal confidence.

The test is not really about Tokyo. It is about demonstrating to a regulated buyer that single-model AI is structurally unsafe for any question where the stakes justify a second opinion — because the model will not tell you when it is uncertain or when its peers disagree.

Backplain's interactive Tokyo Test runs three free prompts across every frontier model in the platform, with no signup required.

Run the Tokyo Test →

Related terms

Model Disagreement

Model disagreement is when two or more frontier AI models give materially different answers to the same prompt. It is the strongest available signal that a claim is contested, uncertain, or context-dependent.
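As a rough illustration of what "materially different" could mean for numeric answers, the sketch below flags disagreement when the spread between the largest and smallest answer exceeds a relative tolerance. The function name, the 10% default threshold, the model labels, and the sample figures are all assumptions made for illustration, not a description of how any particular platform measures disagreement.

```python
def materially_disagree(answers: dict[str, float], rel_tol: float = 0.10) -> bool:
    """Return True when numeric answers spread by more than rel_tol of the largest magnitude.

    `answers` maps a (hypothetical) model label to the number it reported.
    """
    if len(answers) < 2:
        return False  # nothing to compare against
    values = list(answers.values())
    scale = max(abs(v) for v in values)
    if scale == 0:
        return False  # every answer is zero, so they agree
    return (max(values) - min(values)) / scale > rel_tol


# Illustrative placeholder answers to "what is the population of Tokyo?"
sample = {"model_a": 14.0e6, "model_b": 37.0e6, "model_c": 9.7e6}
flagged = materially_disagree(sample)
```

A relative threshold is used here so that the same check works for small and large quantities; free-text answers would need a different comparison.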

Multi-Model AI

Multi-model AI is the practice of running the same prompt across two or more frontier models from different providers — and comparing the answers — rather than committing to one vendor's model.
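A minimal sketch of the fan-out step, assuming each model is wrapped as a plain Python callable that takes a prompt and returns text. The `ask_all` helper, the `ModelFn` alias, and the stub "providers" are hypothetical stand-ins; a real setup would call each vendor's own client library instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical interface: a model is any function from prompt text to answer text.
ModelFn = Callable[[str], str]


def ask_all(prompt: str, models: dict[str, ModelFn]) -> dict[str, str]:
    """Send the same prompt to every model concurrently and collect answers by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        return {name: fut.result() for name, fut in futures.items()}


# Stand-in models with fixed replies, for demonstration only.
stub_models: dict[str, ModelFn] = {
    "provider_a": lambda prompt: "about 14 million",
    "provider_b": lambda prompt: "about 37 million",
}

answers = ask_all("What is the population of Tokyo?", stub_models)
distinct_answers = set(answers.values())  # more than one entry means the models diverged
```

Threads are a reasonable fit for this sketch because calls to hosted models are network-bound; comparing the collected answers is a separate step.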
