Vision-capable AI models have improved dramatically in the past two years. Show one a photograph of a street corner, a coastline, or a satellite view of a patch of farmland, and there is a reasonable chance it will name the country correctly, sometimes the city, occasionally the exact neighbourhood. Naturally, this raises a question for anyone who plays geography games: are humans still competitive?
We decided to find out. We ran a small experiment on 50 satellite images chosen at random from EarthGuessr's pool and scored with the game's own scoring system. A top-tier AI vision model played on one side and a panel of human players on the other. The humans ranged from a complete beginner to one of EarthGuessr's top global players. The AI was given the same single satellite image the humans saw, with no metadata and no extra context, and was asked for its best guess as a latitude-longitude pair.
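For readers curious how a latitude-longitude guess can be turned into a score, here is a minimal Python sketch. The great-circle (haversine) distance is standard. The scoring curve is not EarthGuessr's actual formula: `hypothetical_score` and `DECAY_KM` are assumptions made up for illustration, and only the 5,000-point maximum comes from the game.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
MAX_POINTS = 5000         # per-round maximum used by the game
DECAY_KM = 2000.0         # hypothetical decay scale, NOT EarthGuessr's real value


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def hypothetical_score(distance_km):
    """Illustrative exponential falloff: full marks at 0 km, fewer points with distance."""
    return round(MAX_POINTS * math.exp(-distance_km / DECAY_KM))


# Example: a guess a short distance from the true location (coordinates are illustrative).
truth = (40.7580, -73.9855)
guess = (40.7484, -73.9857)
error_km = haversine_km(*truth, *guess)
print(f"{error_km:.1f} km off, {hypothetical_score(error_km)} points")
```

Any monotonically decreasing curve would serve the same purpose here. The point is only that a single coordinate pair is enough to score a round, for a human or a model.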
The results were not what most people would predict.
What the AI Was Surprisingly Good At
On obvious rounds (coastal cities, distinctive landforms, anything with a famous landmark visible in the frame) the AI was excellent. Show it a satellite view of Manhattan and it placed within a few city blocks. Show it the bend of the Amazon at Manaus and it placed within twenty kilometres. Whenever a round came down to recognising a specific, well-photographed location, the AI performed at the level of a top human player or better.
It was also strong on broad climate reasoning. Show it a tropical rainforest and it correctly narrowed to the Amazon, Congo Basin, or Southeast Asia. Show it ice and it correctly narrowed to the polar regions. The kind of large-scale biome reasoning that beginners take months to learn, the AI did instantly. On the obvious rounds, our best human player averaged 4,200 points out of a possible 5,000. The AI averaged 4,400.
What the AI Was Surprisingly Bad At
The featureless rounds — the hard ones — broke the AI completely. Show it an unmarked patch of Siberian taiga and it guessed the Canadian boreal forest. Show it a section of Saharan dune and it guessed the Empty Quarter (a continent away). Show it a stretch of open ocean and it picked random spots in entirely the wrong hemisphere.
The pattern was consistent: when the image had a single dominant clue, the AI was excellent. When the round required combining several weak clues into a single confident hypothesis — the soil colour suggesting one region, the river meander suggesting another, the field geometry pinning it down — the AI struggled. The compound reasoning, the kind of layered inference that experienced human players do almost unconsciously, was where the model fell apart.
On the hardest 10 rounds in our set, our best human player averaged 2,800 points. The AI averaged 1,100.
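The compound reasoning described above can be sketched as a simple evidence-combination exercise. The Python below is a toy model, not a description of how human players or the AI actually reason: the region names and likelihood values are invented for illustration, and the clues are assumed to be independent with a uniform prior over regions.

```python
# Illustrative likelihoods only: how plausible each clue is in each candidate region.
CLUE_LIKELIHOODS = {
    "soil colour":    {"Region A": 0.5, "Region B": 0.3, "Region C": 0.2},
    "river meander":  {"Region A": 0.3, "Region B": 0.5, "Region C": 0.2},
    "field geometry": {"Region A": 0.6, "Region B": 0.2, "Region C": 0.2},
}


def combine_clues(clue_likelihoods):
    """Multiply per-clue likelihoods (uniform prior, independent clues) and normalise."""
    regions = next(iter(clue_likelihoods.values())).keys()
    scores = {region: 1.0 for region in regions}
    for likelihoods in clue_likelihoods.values():
        for region in scores:
            scores[region] *= likelihoods[region]
    total = sum(scores.values())
    return {region: score / total for region, score in scores.items()}


posterior = combine_clues(CLUE_LIKELIHOODS)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 2))
```

With these made-up numbers, no single clue favours any region by more than 0.6, and one clue even points the other way, yet the combined estimate puts roughly 0.70 on Region A. That is the shape of the inference experienced players perform: several weak, partly conflicting hints adding up to one confident guess.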
The Final Score (And the Real Lesson)
Across all 50 rounds, the AI averaged 3,650 points per round. Our top human player averaged 3,820, and the average across our human panel was 2,400. The AI beat most of the humans. Our top human still beat the AI by a small margin (170 points per round), and that margin came entirely from the AI's collapse on the hardest rounds.
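The claim that the gap comes entirely from the hardest rounds can be checked with a little arithmetic. The Python sketch below assumes the hardest 10 rounds are a subset of the 50 and that the reported averages are exact. It then works out the average each side must have scored on the other 40 rounds.

```python
TOTAL_ROUNDS = 50
HARD_ROUNDS = 10


def implied_rest_average(overall_avg, hard_avg, total=TOTAL_ROUNDS, hard=HARD_ROUNDS):
    """Average over the non-hard rounds implied by the overall and hard-round averages."""
    return (overall_avg * total - hard_avg * hard) / (total - hard)


ai_rest = implied_rest_average(3650, 1100)     # AI on the other 40 rounds
human_rest = implied_rest_average(3820, 2800)  # top human on the other 40 rounds

# Contribution of each group of rounds to the overall per-round gap (human minus AI).
hard_gap = (2800 - 1100) * HARD_ROUNDS / TOTAL_ROUNDS
rest_gap = (human_rest - ai_rest) * (TOTAL_ROUNDS - HARD_ROUNDS) / TOTAL_ROUNDS

print(ai_rest, human_rest)            # 4287.5 4075.0
print(hard_gap, rest_gap, hard_gap + rest_gap)  # 340.0 -170.0 170.0
```

On the 40 rounds outside the hardest set, the implied averages put the AI ahead, about 4,288 to 4,075. The hardest 10 rounds swing the overall result by 340 points per round in the human's favour, which more than cancels the AI's lead elsewhere and leaves the 170-point gap.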
Why This Result Was Predictable in Retrospect
Modern vision models are trained on a vast pool of labelled imagery from the internet. They are, in essence, excellent at recognition: if a scene resembles something they have seen labelled before, they get the answer right. They are far weaker at reasoning from first principles in a domain where labels are sparse, and unmarked satellite imagery of remote regions is exactly that kind of domain. There is very little training data for the inside of the Empty Quarter, or for a random patch of Lake Superior shoreline, because almost nobody captures images of those locations and labels them by coordinate.
Humans, by contrast, do not need a labelled training set. A human player who has spent enough hours on satellite imagery games learns to reason from physical evidence (soil colour and climate, river meander and slope, field shape and irrigation type) and to combine those clues into a confident hypothesis, even on the rounds where a recognition-driven model has nothing to grab onto.
What This Means for Geography Games
If you were worried that AI would make geography games trivial, the answer is: it has not, and it probably will not for some time. The easy rounds are easy for everyone, the AI included. The hard rounds — the ones that actually train the skill, the ones that produce the satisfying "aha" moments — are still genuinely a human domain. The compound reasoning required for them is exactly the kind of cognition that current models are not yet good at. That gap will close over time, but the playing experience does not depend on humans being the best in the world at the task. It depends on the task being interesting to think through, and that has nothing to do with AI.
If anything, the experiment made us appreciate the format more, not less. The geography skill that beats AI is not the recall of trivia. It is the ability to reason through layered evidence under uncertainty. That is a skill worth building for its own sake, and geography games seem to be among the best tools available for training it. We plan to run this experiment again next year, and the year after. We will be very surprised if the gap between top-tier human play and frontier models closes faster than the game itself evolves to push the difficulty back up.