Archive for the ‘Augmented Reality’ Category

If Augmented Reality is the solution, what is the problem?

Although augmented vision is where I started with my interest in the field, I have really moved a bit beyond that now.  These days when I say Augmented Reality I really mean wearable mobile computing with an interface that actually works.

With my new job at Valve came a new commute. I spend on the order of three hours a day riding on the bus, waiting for the bus, or walking to or from the bus stop. So far I have occupied my time with podcasts and paperbacks, but I am going to try to spend more time on my AR work starting today.  Thus, I am writing this while on my morning commute to Bellevue.

The mobile computing experience is poor for many reasons but they can be summed up as two things: Input and Output. Yes, that’s all. :)

On the input side the problem is that so much of what I want to do is driven by entering text. My phone (a Treo 650) has a great keyboard… for a phone. Even so, it really doesn’t compare to the experience of typing on a laptop or desktop keyboard. On my laptop the act of entering the text into the computer is not hindered by the act of typing. I’m far from the fastest typist, but I can still type much faster than I can generate the words I want to type. Until a mobile platform offers the same level of comfort and speed it will never be suitable for writing anything longer than text messages and tweets. The idea of writing code via a phone keyboard is just absurd.

My laptop only works while I’m riding the bus or waiting someplace I can sit down. It is not an option at any stop where I wait while standing. That means I can’t use my laptop at fully half of the places where I wait for the bus. I’m not sure I have the ability to write while I’m actually walking without knocking over my fellow pedestrians (or getting hit by a car), but I certainly have a lot of downtime-while-standing when waiting for my transfer on the way home.

So the first problem I want to solve that I keep in my mental file labelled “Augmented Reality” is the ability to enter thousands of words of text comfortably while out and about.

The second problem is getting video output from the computer while out and about.  My laptop does a good job for the part of the time when I can have it open. It’s hard to see the screen on a bright day, but we fortunately don’t get too many of those here in Seattle. The trouble is that I could use output from my computer in many more places than I can break out the laptop.

The screen on my Treo (or my iPod Touch) is a little better in some ways. It’s much more feasible to bring it out when I need to check a bus arrival time. Thanks to my phone I can get this information on demand. All I have to do is wake up my phone, open the web browser, hit one of the bookmarks I’ve saved for the bus stops I frequent, and wait for 10-15 seconds while the page loads. If I’ve recently looked up that information for a stop I just need to wake up the phone and hit refresh to get updated information. It’s definitely better than looking at a clock and the never-very-accurate schedules on the stop itself.

What I really want, however, is to just know this information. On my way home from work there are two questions I often ask:

  1. How far is the #550 from my stop? Do I need to run? – By the time I can see the bus it’s only about 40 feet from the stop, so having some notice that I’m going to miss it would really help.
  2. How long do I have before the #1, #2, #4, and #13 arrive at this stop? – Any of these busses will get me within walking distance of home; the only real difference is how far I have to walk. The #2 stops a half block from my house, so I prefer it, but if it is lagging behind the others by a wide enough margin it isn’t worth waiting.

This kind of ambient awareness is where the augmented vision comes in. If it is within an hour of one of my usual riding times at one of my usual bus stops I want to see the current data for that stop.  Big obvious columns of light that let me see the bus approaching from blocks away would be cool, but they probably don’t solve the problem as well as a 2D display on my personal HUD that I can glance at occasionally. “When will my bus arrive” is just the most obvious question I want a constant answer to. Once that one is solved I imagine that many more will present themselves. (I also imagine that many of those will actually require some level of registration with the world.)
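
To make that a little more concrete, here is a minimal sketch of the two decisions such a display would have to make: whether the bus data is worth showing right now, and which bus is worth waiting for. Everything in it is an assumption for illustration. The stop name, the walking times, and the function names are hypothetical, and the arrival predictions are taken as given rather than fetched from any real feed. It also ignores riding times that wrap past midnight and assumes the ride itself takes about the same time on every route.

```python
from datetime import datetime, timedelta


def is_relevant(now, stop, usual_rides, window=timedelta(hours=1)):
    """Return True if `now` is within `window` of a usual riding time at `stop`.

    `usual_rides` is a list of (stop_name, (hour, minute)) pairs (hypothetical format).
    """
    for usual_stop, (hour, minute) in usual_rides:
        if usual_stop != stop:
            continue
        usual = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
        if abs(now - usual) <= window:
            return True
    return False


def best_route(arrival_minutes, walk_minutes):
    """Pick the route with the smallest (wait for bus + walk home) total.

    Both arguments map a route name to a number of minutes.
    """
    return min(arrival_minutes, key=lambda r: arrival_minutes[r] + walk_minutes[r])
```

With a half-block walk from the #2 and a longer walk from the #13, the #2 wins when it is only a few minutes behind and loses when it is lagging badly, which is exactly the call I make in my head today.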

Once I am wearing a head-mounted display I will probably use it for one more purpose. I would like to be able to block out the world in front of me once I am actually sitting on the bus. I am prone to visual distractions, and have a hard time focussing on much of anything when a bunch of people are around me. If I could occlude most of my field of view with whatever I’m working on the distraction would be greatly reduced.

The problem that I am interested in solving is “Mobile computing sucks.” Location and temporally aware wearable computers with first person displays are the solution to that problem.

Augmented Reality should be open

Over the past year I’ve spent a lot of time thinking about what piece of the augmented reality ecosystem would be the best to start a business around. I’m still not ready to take that jump so, in my case at least, the answer is still “none yet”.  However, in my exploring I keep coming up against a problem:

  1. The absolute most profitable place to be in augmented reality is the platform provider at the center of everything.
  2. The profit motives of that platform provider could set the development of AR back by about ten years.

A brief history of the web

Whether by design or happy accident the standards (HTML and HTTP) behind the web are easy to implement and completely open. This meant that by the time Netscape came along, there were already browsers on the Macintosh (CERN’s and Mosaic), Windows (Mosaic), and X (CERN, Mosaic, Viola, etc.). There were also 200 active web servers and port 80 accounted for more than 1% of the traffic on the NSF backbone.

That ecosystem meant that Netscape had to remain compatible with what already existed in order to succeed.  Sure, they were selling licenses to their own software, which let them cash in on the shocking growth of the web, but the Netscape browser had to work just as well against pages served by HTTPD, IIS, Apache, and any other random web server anyone decided to write. The same thing was true from the other side.  Netscape Now! buttons aside, website operators soon had to deal with at least two and possibly more different browsers, as well as various versions of each browser.

This made life interesting for web designers, but it was good for the web as a platform. The nature of the web meant that nobody had to convince somebody else to say “Yes” to get involved.  There is no way that any one company (or any ten companies for that matter) could have even authorized, let alone managed, all of the initiatives that went on with the web between 1994 and 2000. There was just too much stuff happening.

The open nature of the web allowed the cost of innovation to be spread around to thousands of organizations around the world.  It also let anyone with enough cash to buy some hosting try out their big idea. Most of those ideas failed, of course, but when taken as a whole they succeeded beyond anyone’s wildest expectations.

I think that augmented reality has the potential to follow a growth curve with the same shape as the one the web followed. The web had very few institutional barriers standing in the way of its growth, and the AR ecosystem would do well to learn from that.

Open Augmented Reality

If the emerging augmented reality ecosystem wants to grow as quickly as the web it cannot include anyone who must say “Yes” to allow existing users to get a new capability. That implies a few things:

  1. Anyone can publish content into the system. There are no controls for quality or appropriateness of content on this ability to publish.
  2. Clients from multiple vendors are able to view that content. Anyone who chooses to can write a new client that works with existing content.
  3. Servers from multiple vendors are able to respond to requests for data. Choosing server technology is primarily a decision for content providers to make and their choice is invisible to end users.
  4. The network itself is neutral to the data being transmitted across it. This means the mobile internet providers must not white-list content from publishers that they have partnerships with.
  5. There is no single central directory that all content (or every content provider) must be listed in to be available.

Note that this does not require that the software in question be open source. Open source software (in the form of Linux, HTTPD, Apache, Perl, PHP, and others) was instrumental in spreading the web far and wide. However, the personal computer revolution happened with little in the way of open source software and was just as rapid as the spread of the internet.

Open Standards

As VRML and many other standards over the years have taught us, developing a new standard from whole cloth is fraught with peril. It is even more difficult (as in the case of VRML) when there is not an existing standard that the new standard is intended to supplant. The AR community must avoid repeating the history of VRML. Fortunately there are existing standards that lend themselves well to the problems augmented reality developers are trying to solve.

The first of these is good old HTTP. As a transport protocol, HTTP fits the list above very well. The protocol is well understood, decentralized, and available in server or client library form for every platform. Minor new standards for querying location-specific data are already emerging.

The second current standard that the augmented reality developers can adopt and bend to their will is KML. KML is the file format that Google Earth uses to represent geocoded information. It has support for points, lines, and shapes. KML is an open standard and is supported by many GIS packages in addition to Google Maps and Google Earth.  Google has open-sourced its own KML parsing library so there is a place to start there too.
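
As a rough illustration of how little machinery this takes, the sketch below pulls named points out of a KML document using nothing but Python's standard library. The sample document, the `parse_placemarks` name, and the returned dictionary layout are all made up for this example; a real client would fetch the text over HTTP and would need to handle lines, shapes, and malformed input as well. The one detail worth noting is that KML stores coordinates as longitude, latitude, and an optional altitude, in that order.

```python
import xml.etree.ElementTree as ET

# Namespace used by KML 2.2 documents.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Tiny hand-written sample; the placemark is invented for illustration.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Example stop</name>
      <Point><coordinates>-122.3321,47.6062,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def parse_placemarks(kml_text):
    """Return a list of {"name", "lat", "lon"} dicts for every Point placemark."""
    root = ET.fromstring(kml_text)
    placemarks = []
    for pm in root.iterfind(".//kml:Placemark", KML_NS):
        name = pm.findtext("kml:name", default="", namespaces=KML_NS)
        coords = pm.findtext(
            "kml:Point/kml:coordinates", default="", namespaces=KML_NS
        ).strip()
        if not coords:
            continue  # Not a point placemark (could be a line or polygon).
        # KML order is lon,lat[,alt]; altitude is ignored here.
        lon, lat, *_ = (float(v) for v in coords.split(","))
        placemarks.append({"name": name, "lat": lat, "lon": lon})
    return placemarks
```

An AR client built this way could consume content from any publisher who can put a KML file on a web server, which is the whole point of the list above.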

Any augmented reality client that supports attaching web browsers (including URLs) to locations can also take advantage of most other existing web standards for whatever happens to be in those browsers.

Is this how things are actually going?

So far, I have seen very little discussion of how different augmented reality systems will work together.  In large part that is  the point of this post. But then there are also very few AR systems that exist outside of laboratories, so we could just be in the bad old proprietary hypertext system days of the late 80s.

So far the AR systems that seem to be designed for lots of different kinds of data (Layar and Seer) have not announced any way for third parties to publish data for their clients. My twitter exchanges with Raimo at SPRXMobile make me think that Layar is at least thinking about it.  Hopefully they will turn out to be as open as I’ve outlined above.

How important do you think open AR standards are? Can an AR solution succeed without them?

Slidecast of my Augmented Reality presentation from LOGIN 2009

This is my presentation from LOGIN 2009 titled “What Augmented Reality Means for Game Developers.” It is more or less aimed at game developers, but is really just where I see AR going in general. The presentation itself is 50 minutes long followed by 20 minutes of Q&A.

It is also my first attempt to post a SlideCast, so if something in there is messed up, let me know.

You can download the slides from SlideShare, and the audio can be found here. My own audio came through fine, but next time I think I need to figure out a way to mic the audience.

Notes from the AR Dinner at GDC09

I had a wonderful dinner the other night with a bunch of game developers who are interested in Augmented Reality. In attendance were Stefan Misslinger and Noora Guldemond from MetaIO, Mitch Ferguson from Carbine, John Walker, Cory Bloyd, and Ron Haidenger and Paul Travers from Vuzix. It was great to spend the evening talking to other people who are as interested in AR as I am.

Noora from MetaIO talked a bit about how their LEGO kiosks have been received by end users. At first they don’t get it because they’re looking at the box in their hand. Once they notice the screen they are amazed and start running around the store trying out other boxes. There is a kiosk installed at a LEGO store in San Mateo. I’m not sure if the store in Seattle has one or not, so I may try to go and take a look over the weekend.

John Walker spent quite a bit of time talking about the AR applications he has worked on for the Department of Defense. Even though I was sitting directly across from him, I unfortunately couldn’t hear a word he was saying. Hopefully I can catch up with John later in the week and get the scoop. (Since I’m posting this on Sunday, I can say that didn’t happen.)

I spent the night peppering Paul Travers from Vuzix with questions about the Wrap 920AV glasses and other things they are working on. These were mostly things I had intended to ask when I got a chance to stop by their booth, but I got that out of the way early. Here are the answers to those questions:

  • They don’t know exactly what the price will be, but they are expecting it to be less than $500.
  • Paul is very confident that the Wrap glasses will ship this year.
  • The displays are 800×600 in these glasses. That’s a step up from the 640×480 resolution that their other glasses use.
  • The two displays are independently controllable through a variety of methods, but if your software can handle it, you can provide 60Hz to each eye.
  • The IMU for the Wrap will include accelerometers, gyros, and magnetic sensors, and will provide yaw, pitch, and roll to the software at a very high rate.
  • When they are in visual pass-through mode the Wraps will blend a translucent scene over the world. In this mode the brighter a pixel is the more visible it will be to the user. That makes black the transparent color and white the “visible as it gets” color.
  • Paul was coy about exactly what the specs on the camera will be. I think they aren’t 100% settled yet. He was very aware of the issues with frame rate on USB cameras, though, so hopefully they will figure out a way to provide a reasonable frame rate (or at least crisp frames.)
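
The pass-through behavior Paul described is easy to reason about if you model it as additive blending: the display can only add light to what you already see. The sketch below is my own simplified model of that idea, not anything from Vuzix, and the function name and 0 to 255 pixel range are assumptions for illustration.

```python
def blend_additive(world, overlay):
    """Model a see-through display by adding overlay light to the world, per channel.

    Both arguments are (r, g, b) tuples in the 0..255 range. The result is
    clamped at 255, since the eye (or a camera) saturates at some point.
    """
    return tuple(min(255, w + o) for w, o in zip(world, overlay))
```

A black overlay pixel adds nothing, so the world shows through untouched, while a white one saturates every channel. The practical consequence for artists is that you cannot draw anything darker than the scene behind it.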

It turns out that as part of their research into how to get the IMU working they have been working with the same SparkFun 6DOF IMU that I have. They have also had trouble with the magnetic sensors. The voltage range provided by the sensors is far too small and there is no amplification between the sensors and the microcontroller. The result of all this is that the noise in the system tends to swamp the actual readings. That sounds like exactly the problem I have run into.

I left dinner very excited about the next year of Augmented Reality. In six months or so I will be able to buy a pair of glasses (with IMU) for less than $500 that will show visuals over the world. Right now the other options in this space cost $1700 for the IMU and $30,000 for the glasses. When the price on a piece of technology drops to 1/60th of what it used to be it unlocks a huge potential for exciting new applications. I can’t wait!

Useless magnetic sensors

After getting roll and pitch working through the Kalman filter, this week I wanted to move on to yaw. Too bad the magnetic sensors in the SparkFun IMU don’t actually work:

While I was recording those values the IMU rotated a full 360 degrees and was even turned upside down.  MagZ should have inverted when it turned upside down, at least.  I guess there is enough stuff going on inside the IMU that it mostly detects itself.

I tried using just the gyros to track yaw by dead reckoning, but they drift enough that the fish are turned 90 degrees after about ten seconds. I’ll have to wait to track yaw until I can get a magnetic compass that works or have vision-based tracking working well enough to use it to compensate for the drift.
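
For a sense of scale, 90 degrees in ten seconds works out to an average error of roughly 9 degrees per second. The sketch below reproduces that with a stationary IMU whose gyro reports a constant 9 deg/s bias, and then shows how an absolute heading (from a working compass or from vision) would bound the error. It uses a simple complementary filter as a stand-in rather than the Kalman filter I am using for roll and pitch; the function names, the sample rate, and the `alpha` weight are assumptions for illustration, and angle wrap-around at 360 degrees is ignored.

```python
def integrate_yaw(rates_dps, dt):
    """Dead reckoning: sum gyro rates (deg/s) over fixed time steps of dt seconds."""
    yaw = 0.0
    for rate in rates_dps:
        yaw += rate * dt
    return yaw


def complementary_yaw(rates_dps, headings_deg, dt, alpha=0.98):
    """Blend the integrated gyro rate with an absolute heading reference.

    `alpha` close to 1 trusts the gyro over short time scales, while the
    remaining (1 - alpha) keeps pulling the estimate back toward the reference.
    """
    yaw = headings_deg[0]
    for rate, heading in zip(rates_dps, headings_deg):
        yaw = alpha * (yaw + rate * dt) + (1 - alpha) * heading
    return yaw


# Ten seconds of samples at 100 Hz from an IMU that is not moving at all.
DT = 0.01
BIASED_RATES = [9.0] * 1000   # gyro wrongly reports 9 deg/s
TRUE_HEADINGS = [0.0] * 1000  # what a working compass would say
```

Integrating `BIASED_RATES` alone ends up about 90 degrees off, while the blended estimate settles at a small constant offset instead of growing without limit. That is why I need some absolute reference before yaw is usable.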