Archive for August, 2009

My Layar development experience

When SPRX Mobile announced that they were opening up the Layar API back in July, I applied immediately. I wanted to learn more about publishing geo-coded data, keep abreast of what Layar was up to, and try to deliver some useful data all at the same time. Fortunately my application was accepted and I received one of the first batch of API keys to go out.

My specific project has been to take the real-time bus arrival information provided by One Bus Away and publish it on the Layar platform. I use the mobile-formatted One Bus Away website at least twice per workday as part of my commute. This data is currently only available in Seattle, but will soon be expanding to everywhere that offers a GTFS feed. My feelings about this experience have been almost entirely positive, but I still come away from it discouraged.

On one hand, the people building Layar (Dirk in particular) have been very helpful. The platform is easy to develop for, and they provide good documentation and tools to make it even easier. All of the time on this project (less than 24 working hours in total) went into figuring out Google App Engine, the Python web framework I used, the One Bus Away API, and how to filter nearby stops down to a reasonable set to show to a user. With minor exceptions Layar performed very well. I have provided all 436 lines of code here so you can see for yourself how easy it was.
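To give a feel for the "filter nearby stops" part, here is a minimal sketch of one way to do it: compute the great-circle distance from the user to each stop, drop everything outside a radius, and keep the closest few. This is not the code from my layer. The function names, the `lat`/`lon` dictionary keys, the 500 meter radius, and the limit of 10 are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_stops(stops, lat, lon, radius_m=500.0, limit=10):
    """Return up to `limit` stops within `radius_m` of (lat, lon), nearest first.

    `stops` is a list of dicts with hypothetical "lat" and "lon" keys.
    """
    scored = [(haversine_m(lat, lon, s["lat"], s["lon"]), s) for s in stops]
    in_range = [(d, s) for d, s in scored if d <= radius_m]
    in_range.sort(key=lambda pair: pair[0])
    return [s for _, s in in_range[:limit]]


if __name__ == "__main__":
    # Made-up stops near downtown Seattle, for illustration only.
    stops = [
        {"name": "far", "lat": 47.7000, "lon": -122.3321},
        {"name": "close", "lat": 47.6065, "lon": -122.3321},
        {"name": "medium", "lat": 47.6100, "lon": -122.3321},
    ]
    for stop in nearest_stops(stops, 47.6062, -122.3321):
        print(stop["name"])
```

A fixed radius and count is the crudest possible filter. Deciding which stops the rider actually cares about is the hard part, and this sketch does not attempt it.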

Marjolein and Claire from SPRX were helpful in less technical ways too.  All developers were invited to the launch event to show off their layers. They ran several conference calls for people all over the world to answer any questions on the API or about the launch. SPRX has done a great job with the launch of Layar 2.0, and I think all the positive press they have received is a direct result of that.

My discouragement has less to do with Layar specifically than it does with the entire category of tricorder augmented reality. The view through the mobile phone and its camera is less useful than a top-down map would be for every piece of data I have seen so far. For my layer in particular, the rider is very likely to know where the stop is. In situations like that where location is unimportant, both the Reality View and Map View actually get in the way.

This experience has led me to two conclusions. First, augmented vision is pointless until head-mounted displays are available. I already felt that way, so now I am just more firm in my belief. Second, filtering data to a useful subset for display is actually the hard problem. Job listing sites, travel sites, e-commerce sites, and review sites already know this, which is why they spend so much effort on search. Turns out the problem is the same for mobile location-aware services.

If you live in Seattle and would like to try out the One Bus Away layer for Layar, just search for One Bus Away inside of Layar.  I welcome your feedback on how I could make this layer more useful. And, of course, I would also love to hear your thoughts on the utility of augmented vision on a mobile phone.

Cameras vs. Sensors

If you search for “augmented reality” in Google, most of the hits will involve systems that analyze a video stream in order to figure out what to draw in the overlay and where to draw it. Sometimes the what and the where are answered by the same marker (as in the endless YouTube AR clips). In the more interesting examples, the camera is used to figure out where it is pointed in more general terms, and something is then drawn in some sort of known coordinate space (like PTAM or the recently announced MetaIO World). This latter approach is broadly termed visual odometry. This seems to be what most people think of when they refer to AR, and that is no surprise given how much academic AR research focuses on computer vision.

As Wikitude (and more recently Layar, Nearest Tube, and Wimbledon Seer) have shown us, there is another way. Making sense of a video stream is hard, particularly on a mobile device. Why not just use the non-camera sensors on that device (GPS receiver, tilt sensor, and compass) to provide the absolute position and orientation of the device, and then look up nearby waypoints from some sort of database? This approach makes these applications more similar to map-based location-aware apps (like Whrrl and Urban Spoon) than to those YouTube videos, but it’s not clear that users care.
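The sensor-only approach described above can be sketched in a few lines: take the GPS position and compass heading, compute the bearing to each waypoint, and keep the ones that fall inside the camera's horizontal field of view. This is a rough illustration and not how any of these apps actually work. The function names, the `lat`/`lon` keys, and the 60 degree field of view are assumptions, and pitch from the tilt sensor is ignored entirely.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    y = math.sin(dlam) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlam))
    return math.degrees(math.atan2(y, x)) % 360.0


def relative_angle_deg(heading, bearing):
    """Signed angle from heading to bearing, in [-180, 180). Positive is to the right."""
    return (bearing - heading + 180.0) % 360.0 - 180.0


def visible_waypoints(waypoints, lat, lon, heading_deg, fov_deg=60.0):
    """Return (waypoint, offset) pairs for waypoints inside the horizontal field of view.

    `offset` is the signed angle in degrees from the center of the view,
    which a renderer could map to a horizontal screen position.
    """
    half_fov = fov_deg / 2.0
    visible = []
    for w in waypoints:
        offset = relative_angle_deg(
            heading_deg, bearing_deg(lat, lon, w["lat"], w["lon"]))
        if abs(offset) <= half_fov:
            visible.append((w, offset))
    return visible


if __name__ == "__main__":
    # Made-up waypoints: one due north of the device, one due east.
    waypoints = [
        {"name": "north", "lat": 47.61, "lon": -122.33},
        {"name": "east", "lat": 47.60, "lon": -122.32},
    ]
    # Device facing north: only the northern waypoint is in view.
    for w, offset in visible_waypoints(waypoints, 47.60, -122.33, heading_deg=0.0):
        print(w["name"], round(offset, 1))
```

Notice that nothing here touches the camera image. The overlay is driven purely by position and heading, which is why these apps have more in common with a map than with a marker tracker.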

Using sensors to determine position and orientation has key advantages. The first is that it works in more environments. While GPS often fails indoors, it works fine at night, at sea, and on most parts of the earth. Visual odometry has been shown to work relative to a start point (basically wherever you start up the tracking system) but not relative to an absolute coordinate system. GPS is also immune to nearby objects moving around. Real-world scenes are very dynamic, and cars, furniture, and people moving around can throw off vision-based systems. Tilt sensors made up of accelerometers and gyros are quite good at returning stable, accurate pitch and roll values. Compasses are somewhat less reliable due to their susceptibility to nearby magnetic fields and large chunks of metal, but they are still able to give you a reasonable approximation of heading. Tilt sensors and compasses also work fine both indoors and outdoors.

On the other hand, vision-based tracking systems have advantages of their own. The biggest is accuracy. PTAM demonstration videos show an accuracy down to a centimeter or less. Marker-based approaches show even better accuracy. Compare that to the two meters that represent the best possible accuracy of a GPS receiver. Those two orders of magnitude mean that GPS-based AR systems simply don’t work for objects that are less than ten meters away. The second advantage for vision-based systems is that there are many cases where it is impractical to know about all the objects in the user’s field of view. They aren’t there yet, but advanced computer vision techniques offer hope that one day a computer will be able to recognize any arbitrary object simply by looking at it. And until that day arrives, there are already readily indexed markers on most items in the form of UPC codes. GPS will never provide such a service, and even if every item in the world had an RFID tag, there is no way that every person would have access to the database into which those tags are indices.
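To put the accuracy gap in concrete terms, here is a back-of-the-envelope calculation. It assumes the worst case, where the whole position error is perpendicular to the line of sight, so that an error of e meters at a distance of d meters shifts the overlay by about atan(e / d). The function name is hypothetical, and the 2 m and 1 cm figures are simply the ones quoted above.

```python
import math


def angular_error_deg(position_error_m, distance_m):
    """Worst-case angular misplacement of an overlay, in degrees.

    Assumes the position error is entirely perpendicular to the line of sight.
    """
    return math.degrees(math.atan2(position_error_m, distance_m))


if __name__ == "__main__":
    for label, error_m in (("GPS, 2 m", 2.0), ("vision, 1 cm", 0.01)):
        for distance_m in (10.0, 100.0):
            angle = angular_error_deg(error_m, distance_m)
            print("%s error at %d m: %.2f degrees" % (label, distance_m, angle))
```

With a 2 m error, an object 10 m away can be drawn roughly 11 degrees off, which is a large slice of a phone camera's view. At 100 m the same error shrinks to about 1 degree, and a 1 cm error is negligible at either distance.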

Despite those shortcomings, my belief is that pragmatism is going to result in GPS-based systems winning this fight. The fact is that today’s GPS-based solutions actually work in the general case, while vision-based ones have only worked well in very controlled demonstrations. If pioneering companies like SPRX Mobile and Mobilizy start to make money, then capital is going to start flowing into this industry. Most of those new companies are going to follow the lead of the existing players and prefer GPS to computer vision. That will drive sensor-based approaches to improve faster than vision-based approaches, which will encourage more investment, until eventually vision-based AR tracking systems are left in the dust. One of these improvements could be Galileo, which is expected to offer positioning accuracy down to 2 cm. When vision researchers eventually solve the object recognition problem, those solutions will be integrated into already existing AR platforms with sensor-based trackers.

What do you think? Do you see vision-based systems coming out on top? Will non-camera sensors be king? Or is a hybrid system the only way to go long-term?