For me, a positioning system must meet a few requirements to be suitable for widespread use in Rainbows End-style augmented reality:
- The system should scale to any number of mobile devices.
- The system should work indoors and outdoors. It should also work underground in places like subway stations.
- No one should be able to track the position of devices in the system.
- A mobile device should require a minimal warm-up time of less than ten seconds.
- A mobile device should be able to determine its position on an ongoing basis with a frequency of at least 30Hz.
- A mobile device should be able to pinpoint its position down to 1cm or less.
- A mobile device should be able to operate with its positioning system active at all times and still maintain a reasonable battery life.
The closest current contender is GPS. Let’s see how it does on each of those fronts:
- So far so good. The GPS satellites don’t care how many receivers there are. GPS has weathered an explosion in the number of receivers over the past ten years and come through just fine.
- GPS fails this one. It works outdoors most of the time but indoors only if you are near an equator-facing window. It never works underground.
- Since GPS receivers only listen, this is generally true. The 911-driven remote activation requirements allow some GPS devices to be trackable, but the tracking happens through the phone’s network connection, not through the positioning system itself.
- GPS manufacturers claim warm-start times under ten seconds. According to TTFF (time to first fix) measurements for many models from 2003, some models can warm-start in under ten seconds. Things have significantly improved since then.
- GPS receivers typically send an NMEA position sentence once per second (1Hz). SparkFun lists a few GPS components in the 5-10Hz range. It’s not clear whether this is a limitation of current chipsets or an inherent limit of GPS itself, so we’ll assume that improved chipsets will get the frequency up to 30Hz.
- GPS completely fails this one. Under ideal circumstances and with non-real-time post-processing, GPS will get you down to about 2cm. Under normal circumstances the accuracy is more like 10-50m. GPS will tell you what street you’re on (if you assume you’re on a street) or what house you’re in, but it can’t tell you what room you’re in.
- Current GPS receivers still draw too much power to leave them on all the time, but Moore’s Law is changing that. They should be always-on in a few more years.
GPS fails on two very important requirements: where you can use it and how accurate it is. Satellite-based replacements for GPS are likely to have the same failure indoors and underground. If it ever launches, Galileo is supposed to have a commercial encrypted system that provides accuracy down to 1cm, but it still won’t work indoors or underground. Relying on satellite-based positioning is a dead end for augmented reality.
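For readers who haven’t seen one, here is a rough sketch of what reading the once-per-second NMEA position sentence mentioned above might look like. It assumes the GGA sentence type and uses a commonly cited illustrative sentence, not data from any particular receiver. The function names are my own, and the checksum is deliberately not verified to keep the sketch short.

```python
# Minimal sketch: pull position fields out of one NMEA GGA sentence.
# SAMPLE is an illustrative sentence, not output from a real device.
SAMPLE = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"


def nmea_to_degrees(value, hemisphere):
    """Convert NMEA ddmm.mmmm (or dddmm.mmmm) to signed decimal degrees."""
    raw = float(value)
    degrees = int(raw // 100)        # leading digits are whole degrees
    minutes = raw - degrees * 100    # remainder is minutes
    decimal = degrees + minutes / 60
    return -decimal if hemisphere in ("S", "W") else decimal


def parse_gga(sentence):
    """Return the position-related fields of a GGA sentence as a dict.

    The trailing checksum is dropped without being checked (a simplification).
    """
    body = sentence.strip().lstrip("$").split("*")[0]
    fields = body.split(",")
    if not fields[0].endswith("GGA"):
        raise ValueError("not a GGA sentence")
    return {
        "utc": fields[1],
        "lat": nmea_to_degrees(fields[2], fields[3]),
        "lon": nmea_to_degrees(fields[4], fields[5]),
        "fix_quality": int(fields[6]),
        "satellites": int(fields[7]),
        "hdop": float(fields[8]),        # horizontal dilution of precision
        "altitude_m": float(fields[9]),
    }


if __name__ == "__main__":
    print(parse_gga(SAMPLE))
```

A receiver that emits one of these per second gives you one position per second, which is where the 1Hz figure comes from.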
The other way that AR researchers are tracking position is with a camera-based system. No one has yet built such a system that operates out in the wild, but it would be theoretically possible. A visual tracking system would operate by comparing the stream of images from the camera against a database of images that is stored in the cloud. The exact form of that comparison is a matter of much research. Whether the comparison happens in the cloud or on the mobile device is also an open question. The general form of the system (large database in the cloud and a stream of images from the camera on the mobile device) is pretty stable though. One key assumption here is that the image database for a city-sized area is far too large to download to the mobile device. Let’s see how that does on our requirements:
- Because of the requirement that we either stream the camera images to the cloud or the local portion of the database from the cloud to the mobile device, each additional user puts incremental load on the system. The number of users in a local area will be limited by the mobile network bandwidth available to those users. The number of total users of the system will also be limited by the server capacity of the system’s provider, but that end of things can scale out more easily.
- This system would work anywhere the database covered. Indoor and underground environments would be fine. Areas where the camera could only see other people (i.e. crowds) would be a problem because the database wouldn’t have anything static to compare against. If the camera depends on environmental light, this system would perform poorly in dark areas (or at night).
- If the camera’s images are streamed to the cloud, the system’s provider would know exactly where each device was at all times. If the portion of the database related to a small area is streamed down to the device, then the service provider will only be able to locate the device to within that small area. Either way, the provider will know where the user is to within a few hundred feet.
- If the camera images are streamed to the cloud, start-up times should be more or less instant. If the database is streamed down to the device it may take a few seconds to get things started, which is well within our tolerance.
- Current visual tracking systems have trouble reaching 30Hz, but Moore’s Law should take care of that eventually. For a system that streams the video to the cloud, bandwidth can also affect update frequency. Once the link starts filling up with streams from other devices, the update frequency goes down for every device.
- Visual tracking systems are quite accurate. Finding hard numbers is difficult, but there’s no reason to believe that a visual tracking system would be less accurate than 1cm.
- Visual tracking systems are power-hungry at the moment. They require fast cameras, fast network connections, fast CPUs on the mobile devices, and lots of memory. Because so much of the system is unknown, it’s hard to pin down numbers, but I would estimate that we need a 100x power reduction before leaving this system on all the time is realistic. That will take Moore’s Law about ten years to accomplish.
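The ten-year figure in the last bullet can be sanity-checked with a little arithmetic. This is only a sketch: it assumes power efficiency improves on a steady doubling schedule, and the 18-month and 24-month doubling periods are my assumptions, since the estimate above doesn’t state one.

```python
import math


def years_to_improve(factor, doubling_period_years):
    """Years for a steadily doubling trend to deliver a `factor` improvement."""
    doublings = math.log2(factor)    # 100x needs about 6.6 doublings
    return doublings * doubling_period_years


if __name__ == "__main__":
    # Doubling periods are assumptions, not measured values.
    for period in (1.5, 2.0):
        years = years_to_improve(100, period)
        print(f"{period} years per doubling: {years:.1f} years to reach 100x")
```

With an 18-month doubling period this comes out at roughly ten years. With a two-year period it is closer to thirteen.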
If we can solve the low-light and power issues, a visual tracking system would certainly work for a small number of users. Solving the bandwidth constraint for a system that much of the population is using is a more daunting issue. All that bandwidth also makes the system expensive to operate, a cost that will be passed on to end users as either usage fees or advertising. Building a workable, generally available visual tracking system is not an impossible problem, but it’s certainly a difficult one.
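To get a feel for the bandwidth constraint, here is a back-of-the-envelope sketch for the variant that streams camera frames to the cloud. Every number in it is a placeholder I made up for illustration (a 50 Mbps uplink shared by everyone in a local area, 200 kilobits per compressed frame), and it assumes the link is split equally between devices.

```python
TARGET_HZ = 30.0  # update frequency from the requirements list


def max_full_rate_users(shared_uplink_mbps, frame_kilobits, target_hz=TARGET_HZ):
    """How many devices can stream at the target rate on one shared uplink."""
    per_user_kbps = frame_kilobits * target_hz
    return int(shared_uplink_mbps * 1000 // per_user_kbps)


def update_rate_hz(shared_uplink_mbps, users, frame_kilobits, target_hz=TARGET_HZ):
    """Frames per second each device can send if the uplink is shared equally."""
    per_user_kbps = shared_uplink_mbps * 1000 / users
    return min(target_hz, per_user_kbps / frame_kilobits)


if __name__ == "__main__":
    uplink_mbps = 50      # hypothetical shared uplink for one local area
    frame_kilobits = 200  # hypothetical size of one compressed frame
    print("full-rate users:", max_full_rate_users(uplink_mbps, frame_kilobits))
    for users in (8, 20, 100):
        rate = update_rate_hz(uplink_mbps, users, frame_kilobits)
        print(f"{users} users: {rate:.1f} Hz each")
```

Under these made-up numbers only 8 devices can stream at the full 30Hz, and with 100 devices sharing the link each one drops to 2.5Hz. The real figures will differ, but the shape of the problem is the same: the update rate falls as soon as the local link fills up.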
Personally, I’m not satisfied with either of these systems. I have thoughts on how to build a better one, but I’ll save those for a future post. What do you think? Am I missing any major requirements? Are any of mine unnecessary? Am I representing GPS or the imagined visual tracking system unfairly? Let me know in the comments!