7 Requirements for an Augmented Reality Positioning System

For me, a positioning system has to meet a few requirements to be appropriate for widespread use in Rainbows End-style augmented reality:

  1. The system should scale to any number of mobile devices.
  2. The system should work indoors and outdoors. It should also work underground in places like subway stations.
  3. No one should be able to track the position of devices in the system.
  4. A mobile device should require a warm-up time of less than ten seconds.
  5. A mobile device should be able to determine its position on an ongoing basis with a frequency of at least 30Hz.
  6. A mobile device should be able to pinpoint its position down to 1cm or less.
  7. A mobile device should be able to operate with its positioning system active at all times and still maintain reasonable battery life.

The closest current contender is GPS. Let’s see how it does on each of those fronts:

  1. So far so good. The GPS satellites don’t care how many receivers there are. GPS has weathered an explosion in the number of receivers over the past ten years and come through just fine.
  2. GPS fails this one. It works outdoors most of the time but indoors only if you are near an equator-facing window. It never works underground.
  3. Since GPS receivers only listen, this is generally true. The E911-driven remote activation requirements allow some GPS devices to be trackable, but the tracking happens through the phone’s network connection, not through the positioning system itself.
  4. GPS manufacturers claim warm-start times under ten seconds. According to TTFF (time-to-first-fix) measurements of many models from 2003, some could warm-start in under ten seconds, and things have improved significantly since then.
  5. GPS receivers typically send an NMEA position sentence once per second (1Hz). SparkFun lists a few GPS components in the 5-10Hz range. It’s not clear whether this is a limitation of current chipsets or an inherent update-frequency limit of GPS, so we’ll assume that improved chipsets will get the frequency up to 30Hz.
  6. GPS completely fails this one. Under ideal circumstances and non-real-time post-processing GPS will get you down to about 2cm. Under normal circumstances the accuracy is more like 10-50m. GPS will tell you what street you’re on (if you assume you’re on a street) or what house you’re in, but it can’t tell you what room you’re in.
  7. Current GPS receivers still draw too much power to leave them on all the time, but Moore’s Law is changing that. They should be always-on in a few more years.
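To make point 5 concrete, those once-per-second NMEA sentences look like the sample below. Here is a minimal parser sketch for the standard GGA sentence (field layout per NMEA 0183; the sample sentence is a typical textbook example, not a real fix):

```python
# Parse an NMEA GGA sentence into decimal-degree lat/lon.
# The sample sentence below is a textbook example, not a real fix.
def parse_gga(sentence):
    fields = sentence.split(",")
    assert fields[0].endswith("GGA"), "not a GGA sentence"

    def to_degrees(value, hemisphere):
        # NMEA packs coordinates as ddmm.mmmm (degrees + minutes).
        dot = value.index(".")
        degrees = int(value[:dot - 2])
        minutes = float(value[dot - 2:])
        sign = -1 if hemisphere in ("S", "W") else 1
        return sign * (degrees + minutes / 60.0)

    lat = to_degrees(fields[2], fields[3])
    lon = to_degrees(fields[4], fields[5])
    return lat, lon

sample = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"
lat, lon = parse_gga(sample)
print(round(lat, 4), round(lon, 4))  # 48.1173 11.5167
```

One of these per second is exactly the 1Hz update rate mentioned above.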

GPS fails on two very important requirements: where you can use it and how accurate it is. Satellite-based replacements for GPS are likely to have the same failures indoors and underground. If it ever launches, Galileo is supposed to offer a commercial encrypted service with accuracy down to 1cm, but it still won’t work indoors or underground. Relying on satellite-based positioning is a dead end for augmented reality.

The other way that AR researchers are tracking position is with a camera-based system. No one has yet built such a system that operates out in the wild, but it would be theoretically possible. A visual tracking system would operate by comparing the stream of images from the camera against a database of images that is stored in the cloud. The exact form of that comparison is a matter of much research. Whether the comparison happens in the cloud or on the mobile device is also an open question. The general form of the system (large database in the cloud and a stream of images from the camera on the mobile device) is pretty stable though. One key assumption here is that the image database for a city-sized area is far too large to download to the mobile device. Let’s see how that does on our requirements:

  1. Because we must either stream the camera images to the cloud or stream the local portion of the database from the cloud to the mobile device, each additional user puts incremental load on the system. The number of users in a local area will be limited by the mobile network bandwidth available to them. The total number of users will also be limited by the server capacity of the system’s provider, but that end of things can scale out more easily.
  2. This system would work anywhere the database covered. Indoor and underground environments would be fine. Areas where the camera could only see other people (i.e. crowds) would be a problem because the database wouldn’t have anything static to compare against. If the camera depends on environmental light, this system would perform poorly in dark areas (or at night).
  3. If the camera’s images are streamed to the cloud the system’s provider would know exactly where each device was at all times. If the portion of the database related to a small area is streamed down to the device then the service provider will only be able to locate the device to within that small area. Either way, the provider will know where the user is to within a few hundred feet.
  4. If the camera images are streamed to the cloud, start-up times should be more or less instant. If the database is streamed down to the device it may take a few seconds to get things started, which is well within our tolerance.
  5. Current visual tracking systems have trouble reaching 30Hz, but Moore’s Law should take care of that eventually. For a system that streams the video to the cloud, bandwidth can also affect update frequency. Once the link starts filling up with streams from other devices, the update frequency goes down for every device.
  6. Visual tracking systems are quite accurate. Finding hard numbers is difficult, but there’s no reason to believe that a visual tracking system would be less accurate than 1cm.
  7. Visual tracking systems are power-hungry at the moment. They require fast cameras, fast network connections, fast CPUs on the mobile devices, and lots of memory. Because so much of the system is unknown, it’s hard to pin down numbers, but I would estimate that we need 100x power reduction before leaving this system on all the time is realistic. That will take Moore’s Law about ten years to accomplish.
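To make the database-lookup idea concrete, here is a deliberately toy sketch. Every descriptor and position below is invented, and real systems use feature descriptors like SIFT or ORB with far smarter indexes, but the shape of the problem is the same: nearest-neighbour search against a position-tagged image database.

```python
# Toy sketch of the lookup at the heart of a visual tracking system:
# match a binary image descriptor against a database of descriptors
# tagged with the positions where those views were captured.
# All descriptors and positions here are made up.
def hamming(a, b):
    # Number of differing bits between two binary descriptors.
    return bin(a ^ b).count("1")

database = {
    0b10110010: (12.5, 3.0),   # hypothetical storefront
    0b01101101: (40.1, 9.8),   # hypothetical subway entrance
    0b11100001: (7.2, 22.4),   # hypothetical office lobby
}

def locate(descriptor):
    # Nearest neighbour in Hamming distance wins.
    best = min(database, key=lambda d: hamming(d, descriptor))
    return database[best]

# A query descriptor one bit-flip away from the storefront view:
print(locate(0b10110011))  # (12.5, 3.0)
```

The scaling question in point 1 is essentially where this loop runs and which side of the network link the `database` dictionary lives on.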

If we can solve the low-light and power issues, a visual tracking system would certainly work for a small number of users. Solving the bandwidth constraint for a system that much of the population is using is a more daunting issue. All that bandwidth also makes the system expensive to operate, a cost that will be passed on to end users as either usage fees or advertising. Building a workable, generally available visual tracking system is not an impossible problem, but it’s certainly a difficult one.

Personally, I’m not satisfied with either of these systems. I have thoughts on how to build a better one, but I’ll save those for a future post. What do you think? Am I missing any major requirements? Are any of mine unnecessary? Am I representing GPS or the imagined visual tracking system unfairly? Let me know in the comments!

~Joe


10 Responses to “7 Requirements for an Augmented Reality Positioning System”

  1. Kelly commented on :

    These requirements seem kind of arbitrary to me. Plenty of airplanes fly perfectly well with flight system updates much less than 30Hz. I’m not even sure what 1cm positional accuracy means here – measured against a real datum? measured against the WGS84 ellipsoid? What about elevation?

    I can barely imagine a closed-course sensor-laden playground that provides < 1cm resolution. Maybe if you build a bunch of Kinect-like optical or laser sensors that actively scan and track things, and then use sensor fusion techniques to stitch their individual scenes together…

  2. rouli said on :

    Sorry for the long reply. I was planning to write a post about it for a long time, and never found the motivation to do so (till now :) . Hope it makes some sense.

    It’s latency, rather than bandwidth, that will determine the nature of such a future AR system. While the amount of available bandwidth grows as years pass, latency has a hard set limit – the speed of light. Sending your cellphone camera’s output to some server and waiting 60ms for a server reply with your location won’t cut it, I’m afraid (and 60ms is considered fast). The augmentation would be slow and jittery.

    That’s why it’s necessary for the cellphone itself to do all the real-time computations locally, and therefore, the local image database would have to be streamed to the cellphone rather than the other way around. Now, since the local image database is the same for everybody in the same locality, there’s no reason why it cannot be broadcasted. Broadcasting would also solve the problem of privacy (3).

    I imagine a future where every store (or office, or public place) has a device that broadcasts a database of images that can be used to augment its interior and surroundings. The only other viable option (as far as I can see) is to have such devices serve as local augmentation servers to which the phone sends its camera stream (and since they are local and close-by, there’s no problem of bandwidth or latency). Either way, AR needs to be distributed.

  3. Joe wrote on :

    @Kelly: The requirements are certainly somewhat arbitrary. What number would you pick for location accuracy? 1m is definitely too big. If you’re drawing something related to an object 10m in front of you, but you can’t tell where you are to within 1m, you are likely to completely miss the object. The coordinate system that the position is known in seems irrelevant to me as long as the objects you’re going to overlay on the scene are in the same coordinate system (or can be converted to be).

    In the case of location accuracy and update frequency, the goal is to minimize the artifacts that the user can detect. For an AR system that uses a head-mounted display, the user’s head is going to move along a significantly less predictable path than an airplane. That is even more true for a big plane, like the commercial jets whose autopilot systems you might be referring to. Autonomous cars likely need less than 30Hz too. In the case of mounting a system to somebody’s head it’s more about rapid changes in acceleration than about any kind of top speed. (Though for a person in a car you actually have both.)
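    The “miss the object” argument is easy to put numbers on: the angular offset of an overlay is roughly atan(position error / object distance). A quick sketch:

```python
import math

# How far off does an overlay land if your position estimate is wrong?
# For an object `object_distance_m` metres away and a sideways position
# error of `position_error_m`, the overlay is offset by roughly
# atan(error / distance).
def overlay_error_degrees(position_error_m, object_distance_m):
    return math.degrees(math.atan2(position_error_m, object_distance_m))

for err in (1.0, 0.1, 0.01):
    deg = overlay_error_degrees(err, 10.0)
    print(f"{err*100:>5.0f} cm error at 10 m: overlay off by {deg:.2f} degrees")
```

    A 1m error shifts the overlay almost six degrees, a big chunk of a typical display’s field of view; at 1cm the shift is a few hundredths of a degree, which is where it stops being noticeable.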

  4. Joe replied on :

    @Rouli: Over a link that has way more bandwidth than you need, you’re right. It’s all about the latency. Over a link that’s seeing a lot of use low bandwidth starts to cause high latency. I probably should have included a latency requirement considering how important it is. Having an accurate position at 30Hz doesn’t help you much if the position you’re getting is a few seconds out of date.

    You’re probably right about the computation taking place on the mobile device itself. It has the added benefit of making the whole thing much more scalable too. Every new user comes with a new chunk of CPU horsepower.
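    Rouli’s latency floor is also easy to check on the back of an envelope, assuming signals travel at roughly two-thirds of c in fibre (the distances here are hypothetical):

```python
# Back-of-envelope check on the latency argument: even ignoring all
# processing time, the round trip to a distant server has a hard floor
# set by the speed of light (~2/3 c in fibre). Distances are hypothetical.
C_FIBER_KM_PER_MS = 200  # roughly 2/3 of 300,000 km/s, per millisecond

def min_round_trip_ms(distance_km):
    return 2 * distance_km / C_FIBER_KM_PER_MS

for km in (100, 1000, 3000):
    print(f"{km:>5} km away: >= {min_round_trip_ms(km):.0f} ms round trip")
```

    At 30Hz there are only about 33ms per frame, so a few thousand kilometres of fibre eats the whole budget before any computation happens.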

  5. Hampus Jakobsson commented on :

    I think you have missed the user interface aspect – a very important requirement is that you can link the “augmented layer” with the “reality layer” so that it is easy for the user both to understand what object is augmentation is linked to, as well as easy to interact with. We spent a lot of time thinking about this when working with our AugmentedID-concept.

    So one more requirement:
    8. The option to find and identify object or text in the picture to simplify the user experience

  6. Joe commented on :

    Is the ability to find and identify objects a positioning requirement, or is it a requirement of the larger AR system the positioning is used in? I am hoping to constrain the quest just to positioning so that I can avoid many potential requirements of the larger system. Always-on network connection, see-through display, gesture recognition, etc. The list could get very long very fast.

  7. Martin commented on :

    Have a look at this. They are using ultrasound-based triangulation for indoor tracking of iOS and Android devices using the built-in microphones. Should be quite accurate:
    http://www.technologyreview.com/communications/26156/?a=f

  8. Thomas Wrobel replied on :

    This is more or less how I see it working:

    a) Use gps or other means for a client to know very roughly where it is.

    b) Cache as much infrastructure image data about the surrounding area as is reasonable. (I don’t think gigabytes is out of the question given how things progress.)

    c) Use the cache to position yourself accurately and with total privacy. Even offline as long as you stay within your cache boundary!

    Anyway, a good list.

  9. simpleblob replied on :

    Doesn’t Vernor Vinge elaborate on the working of localizers in his book, A Deepness in the Sky? (IIRC there’s a little bit in Rainbows End too.)

    Basically it’s an ultra-wideband wireless mesh of nodes with time-of-flight calculations. That gives us the relative coordinates of those nodes. Then we hook up to one big GPS-enabled node to place the whole thing in realspace coordinates.

  10. Joe thought on :

    I don’t remember a specific passage getting into the details, and unfortunately Rainbows End doesn’t appear to be up as a free download anymore, so I can’t search for it.

    It wouldn’t surprise me though. Those are the lines I’m thinking along… I am just trying to figure out how to actually build the thing.
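    For what it’s worth, the time-of-flight scheme simpleblob describes boils down to classic trilateration. Here is a 2D sketch with made-up node positions; real ultra-wideband systems have to deal with clock synchronization and noisy ranges, which this ignores:

```python
import math

# Trilateration sketch: three fixed nodes at known positions measure
# signal time of flight to a device; each time gives a range, and
# intersecting the range circles recovers the device position.
# Node and device positions below are made up.
def trilaterate(anchors, ranges):
    # Subtracting the first circle equation from the other two leaves
    # two linear equations in (x, y); solve them with Cramer's rule.
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = ranges
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
device = (3.0, 4.0)
ranges = [math.dist(device, a) for a in anchors]
print(trilaterate(anchors, ranges))  # recovers (3.0, 4.0)
```

    The hard part isn’t the algebra; it’s measuring flight times accurately enough, since at the speed of light 1cm of range corresponds to about 33 picoseconds.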
