Lately I’ve developed an intense fascination with Augmented Reality (AR). This doesn’t bode well for my Facebook game, but learning about the current state of AR is taking up quite a lot of my free time. This post is my attempt to distill what I’ve learned so far.
What is Augmented Reality?
Augmented Reality is the combination of artificial and natural sensory input in such a way that the artificial input “augments” the natural input. That’s the all-encompassing definition anyway; when you hear someone talking about AR they are probably talking about adding computer-generated images to a video feed and showing the result to the user. AR is technically a subset of Mixed Reality owing to its position on the virtuality continuum (image from Wikipedia):
Augmented Reality has been a subject of countless academic research projects since the mid-1990s, but it hasn’t really broken through into very many commercial applications. There has been talk of applications in wiring jets at Boeing, enhancing exhibits in museums, and personal navigation, but it’s not clear that any of those are really being used. In fact, AR seems to be one of those research projects that people work on while they’re in grad school and then abandon for more practical pursuits after graduation. The number of people that have been researching AR consistently for the past ten years is very small.
Augmented Reality has obvious crossovers with Virtual Reality, specifically in display technologies. The most exciting AR applications use head-mounted displays similar to those used in Virtual Reality, and gain tremendous benefit from head tracking. Augmented Reality also uses rendered 3D images, but those are hardly the exclusive domain of VR. In terms of rendering, both AR and VR are taking their cues from the game industry at this point.
Computer Vision is also mentioned frequently in AR circles. The biggest problem that AR needs to solve to work really well is to accurately determine the position of the viewer. Many AR researchers use Computer Vision techniques to recognize objects in the scene and work backwards from there to determine the position and orientation of the camera. Much of this research uses markers in the scene itself, including both ARTag and the various flavors of AR Toolkit. The markers make object recognition faster and far more reliable. Here is an example of such markers in action (from the ARTag website):
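To make that “work backwards from the markers” step concrete, here’s a minimal sketch of the core math. Once the four corners of a square marker have been found in the image, you can estimate the homography between the marker plane and the image with the Direct Linear Transform; given calibrated camera intrinsics, the camera pose falls out of that homography. This is not ARTag’s actual code — just a generic numpy illustration, and the camera numbers are made up:

```python
import numpy as np

def estimate_homography(marker_pts, image_pts):
    """Direct Linear Transform: solve for the 3x3 homography H that
    maps marker-plane coordinates to pixel coordinates."""
    rows = []
    for (x, y), (u, v) in zip(marker_pts, image_pts):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.array(rows)
    # H (flattened) is the null vector of A: the right singular
    # vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale

# Synthetic example: a 10 cm marker viewed head-on from 2 m away.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # intrinsics
R = np.eye(3)                   # camera axis perpendicular to the marker
t = np.array([0.0, 0.0, 2.0])   # 2 m along the optical axis
# For a planar marker (z = 0) the projection reduces to H = K [r1 r2 t].
H_true = K @ np.column_stack([R[:, 0], R[:, 1], t])

marker = [(-0.05, -0.05), (0.05, -0.05), (0.05, 0.05), (-0.05, 0.05)]
pixels = []
for x, y in marker:
    p = H_true @ np.array([x, y, 1.0])
    pixels.append((p[0] / p[2], p[1] / p[2]))

H = estimate_homography(marker, pixels)
# With calibrated intrinsics, K^-1 H yields r1, r2, t up to scale --
# that's the camera position and orientation the renderer needs.
print(np.allclose(H, H_true / H_true[2, 2], atol=1e-6))
```

In a real tracker the corner positions are noisy, so you’d detect many markers and refine the pose with a nonlinear solver, but the homography above is the heart of why a single known square is enough to localize the camera.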
ARTag is GPL’d, so in theory you can download it and start building your own marker-based AR applications. Unfortunately Mark Fiala, the guy in the photo above, is no longer at his university, and there seems to be some sort of custody battle going on over who can distribute ARTag. So you can buy his book to learn all about how to use the SDK, but you can’t actually download the SDK. (Incidentally, I can’t really recommend that book to an experienced game programmer. It was about 75% Game Programming 101 and had precious few details on how ARTag actually works behind the scenes.)
Approaches to AR in the Current Research
There are three major approaches to AR in the research I’ve been able to find. The first is Magic Lens, where a handheld device of some sort shows the view from its camera with graphics overlaid. The second is Magic Mirror, where a camera is used to show users an augmented version of themselves. I wasn’t able to find a fancy name for the third approach, so I’ll call it First Person. In First Person, the augmented bits are overlaid directly on the user’s view. All three of these approaches have appeared in many papers and research projects over the past ten years.
Magic Lens is the approach that seems to be getting the most traction. Today’s PDAs are powerful enough to run marker-detection algorithms and rendering engines in real time, and they all have cameras. By far the most interesting of these projects is Enkin, which is a prototype of a device that promises to offer Magic Lens navigation on Google’s Android. The paper on Enkin’s site outlines many of the very same problems I’m considering in my own nascent AR research. Magic Lens neatly side-steps the biggest problem with First-Person AR by using existing PDA screens instead of expensive and inadequate head-mounted displays.
Magic Mirror is an approach that doesn’t seem to be useful for much more than making YouTube videos. The user’s own image is going to dominate the screen almost by definition. It could see some use in trying on hairstyles or clothing without actually trying them on, but I don’t think we’ll see widespread use of Magic Mirror as an AR display metaphor.
The third major AR approach is First Person. The user wears some sort of head-mounted display that tracks the motion of their head through inertial systems, computer vision, or (more likely) a combination of both. From what I’ve been able to find, most AR research using this approach completely obscures the user’s vision, so they see only what’s coming through the video camera mounted to their head. Because everything the user sees is on the display, imperfections in head tracking or poor frame rate can cause serious motion sickness. A few head-mounted displays coming to market soon allow graphics to be painted over the user’s field of view without obscuring their natural vision, so those will help.
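The “combination of both” fusion can be sketched with a complementary filter: the gyro updates fast but accumulates drift, while the vision estimate is absolute but slow and noisy, so each frame you trust the gyro for quick motion and let the vision reading slowly pull the estimate back. The numbers below (a 0.5 deg/s gyro bias, a 0.98 blend factor) are invented for illustration, not taken from any particular tracker:

```python
def fuse_orientation(prev_angle, gyro_rate, vision_angle, dt, alpha=0.98):
    """Complementary filter: dead-reckon with the gyro, then blend in
    the absolute vision estimate to cancel long-term drift."""
    gyro_angle = prev_angle + gyro_rate * dt            # fast but drifts
    return alpha * gyro_angle + (1 - alpha) * vision_angle

# Simulate 10 seconds of a stationary head. The gyro has a constant
# +0.5 deg/s bias; the vision system correctly reports 0 degrees.
dt, bias = 0.01, 0.5
fused, gyro_only = 0.0, 0.0
for _ in range(1000):
    gyro_only += bias * dt                    # pure integration drifts away
    fused = fuse_orientation(fused, bias, 0.0, dt)

# The gyro-only estimate drifts to about 5 degrees; the fused estimate
# settles at a small fraction of a degree.
print(gyro_only, fused)
```

Real systems use full 3D orientation and something closer to a Kalman filter, but the principle is the same: neither sensor is good enough alone, and the blend hides each one’s weakness.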
Of all these approaches, I think First Person is going to win out in the end. Magic Lens is powerful, but suffers from a need to pay close attention to a little two-inch screen on your PDA. That isn’t a big problem in most of the demos, but it precludes using it while driving and eliminates many gaming applications. If you try to move much while your focus is on your PDA, you are eventually going to get yourself run over by a bus. We will probably see a rise in Magic Lens AR over the next five years or so, and then a rather sudden shift to First Person once the display technology catches up.
How Far Out is Augmented Reality?
Given sufficient infrastructure to improve location detection in mobile devices, we could have mass-market AR now. Setting up transmitters to precisely determine location and orientation works fine in a laboratory environment. However, it’s unlikely that such infrastructure investment would happen without a strong push from the public, and that’s not going to happen for such an unproven technology. GPS gives AR devices only a very rough idea of where they are, down to 2 m in the best cases. For many applications the accuracy needs to be 1 cm or less to avoid horrible jittering. The hot new GPS technology coming out in 2013 will still only be accurate down to one meter.
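A little arithmetic shows why meter-scale error is hopeless: a position error translates into an angular error that scales with how close the annotated object is, and the angle maps to on-screen pixels. The display parameters below (40-degree field of view, 640 pixels wide) are assumptions for illustration, not specs of any real device:

```python
import math

def overlay_error_px(pos_err_m, dist_m, fov_deg=40.0, width_px=640):
    """How far (in pixels) an overlay shifts when the estimated viewer
    position is off by pos_err_m, for an object dist_m away."""
    ang_err_deg = math.degrees(math.atan2(pos_err_m, dist_m))
    return ang_err_deg * width_px / fov_deg   # pixels per degree of error

gps = overlay_error_px(2.0, 10.0)    # ~2 m consumer-GPS error, object 10 m away
goal = overlay_error_px(0.01, 10.0)  # the ~1 cm accuracy target

# A 2 m error smears the label across roughly a quarter of the screen,
# while 1 cm keeps it within about a pixel.
print(round(gps), round(goal))
```

And that is just the static offset; GPS error wanders from fix to fix, so in practice the label doesn’t sit in the wrong place, it jitters around it.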
AR markers are one solution to this problem, but they aren’t practical in the real world either. We aren’t going to carpet the world with AR markers just so the first few dozen geeks with AR goggles can find their way around without looking at a map. The fact that they can actually show results from their research when they use markers seems to have distracted many AR researchers from the need to develop marker-less solutions.
Computer Vision offers an alternative to both markers and fine-grained GPS, and great advances have been made in determining camera position and scene geometry from an arbitrary set of 2D images. These algorithms seem to do a good job of figuring out what they’re looking at, but it’s not clear if they’re running in real time yet. Unlike AR researchers, Computer Vision researchers don’t seem to be into whiz-bang demos. If the problem with the current algorithms really is speed, simply waiting for computing power to catch up will likely resolve this issue.
My entirely unscientific gut feel is that we will see early Magic Lens AR applications hit the market in about two years. Early First Person applications will probably appear around the same time, but won’t be compelling enough to gain widespread use. About five years out I expect to see First Person applications of AR begin to take off thanks to increased mobile computing power and improvements in head-mounted displays. I believe mainstream adoption of AR is about ten years away.
On the other hand, I could be totally wrong and Augmented Reality will turn out to be “just around the corner” forever like Artificial Intelligence and Virtual Reality.