March 13, 2010

Obvious Idea #1: OpenStreetMap for AR Tracking Images

Filed under: Augmented Reality, Obvious Ideas — Joe @ 10:51 am

At TED 2010 Blaise Aguera y Arcas from Microsoft demoed live integration of video into the existing structure-from-motion dataset in Photosynth. Though his demo showed a video feed moving around a scene, the same data could just as easily be turned around to find the precise position of the camera in real time. That capability is a key part of building a head-mounted augmented reality system.

Two weeks later Google announced that they are incorporating user photos into Google Street View. This requires essentially the same data as Photosynth. Google has the added advantage that they can combine it with the Street View images and LIDAR data they are already collecting. Though they haven’t demonstrated real-time capability with this data they certainly have all the pieces they need to make this happen.

Access to the data required to perform pose recognition with cameras is a novelty at the moment, but if mobile augmented reality takes off in a big way it will become a key component of that system.  In my opinion this component is too important to be left in the hands of one company. A much more desirable situation would be to have an OpenStreetMap-type project to accumulate and curate a freely available dataset to provide structure from motion and pose recognition for use in mobile augmented reality and whatever other uses someone can dream up.

OpenStreetMap is a project that sprang up to provide access to data that was free from the costs and restrictions that come with commercial data. It uses a Creative Commons license to make the data free for use by anyone for most any purpose. Although OpenStreetMap came about in response to the restrictions on commercial data sources, the same approach could be taken for 3D structure and image data even though commercial sources for that data do not yet exist. If OpenStreetMap had existed when car navigation systems became feasible in the late nineties it is likely that many commercial products could have been developed on open data at far lower cost and in much more variety.

All such a project needs is a small number of dedicated people to get it started. Download a copy of Bundler (an open source structure from motion library based on the same research that spawned Photosynth) and seek out publicly available photograph libraries. Then talk a cloud computing provider into sponsoring the project by hosting the data and build things up from there. The project won’t have many users for a few years, but as the accuracy and coverage of the dataset grows the set of applications based on this open data will grow too. Somebody just has to get the ball rolling.


I have a bunch of ideas like this one rattling around in my head. Some of them could be products or businesses, and some are just cool projects. I have looked into them all to some degree but will probably never start real work on them. I’m going to post them here in an attempt to spawn a discussion and encourage you to share your thoughts in the comments. Feel free to do whatever you like with these ideas.

December 31, 2009

2010

Filed under: Augmented Reality, Game Industry — Joe @ 5:24 pm

This rounds out my trilogy of year end posts. Here is what I think will happen in the coming year. I would love to hear your thoughts on these predictions:

  1. Star Trek Online will be the only significant MMO launch in 2010. It will do well enough to make Atari and Cryptic plenty of money, but will not do nearly as well as World of Warcraft, so many people will consider it a failure. (Those people are dumb.)
  2. At least three of the major unreleased MMO projects will be cancelled. I have a guess about which two are most likely, but I’ll keep that to myself.  Qualifying projects include:
    • Guild Wars 2
    • Whatever Carbine is working on
    • That console MMO Turbine hasn’t said much about
    • Whatever Trion is working on
    • The Sci-Fi channel tie-in MMO that Trion has said they’re working on
    • Whatever Zenimax is working on that may or may not be Fallout
    • Whatever 38 Studios is working on
    • The Agency
    • Whatever Gazillion’s Gargantuan studio is working on
    • Star Wars: The Old Republic
    • The second MMO that CCP is working on down in Atlanta where all those White Wolf people are.  Hmm. What could it be?
    • Whatever Red 5 is working on
    • DC Online
    • That other MMO I know NCSoft is working on that is completely under the radar
    • Whatever Slipgate Ironworks is working on
    • The second MMO Blizzard has in the works
    • APB
    • Jumpgate: Evolution
  3. Project Natal and the Playstation Motion Controller will both come out.  Natal will do fairly well. Both controllers will allow some new kinds of games, but we won’t see any compelling examples of those games until 2011.
  4. Unemployment will peak and then start to fall.
  5. The compass+GPS augmented reality products will begin to shift to general location-awareness and away from their Augmented Reality roots. They will de-emphasize magic lens and start to emphasize aggregation of nearby content.
  6. No consumer-level see-through displays will come out in 2010. Significant progress toward them will be made, but nothing will be released.
  7. Neither Google nor Apple will release any kind of AR-focused hardware.
  8. The use of “wave it in front of your webcam” type AR in advertising will peak with an AR-enhanced ad in the Superbowl.  The backlash will begin. By the end of the year the advertising world will have moved on.
  9. Apple will release its tablet and a new iPhone (faster and more storage) but won’t release anything that is specifically an AR product.
  10. Apple will address the pain caused by its app-store approval process, at least in part. I have no idea what their specific solution will be, but they aren’t going to let their developer community grow to hate them.
  11. Android will continue to pick up steam. By the end of the year Android will boast 50,000 applications.
  12. People will spend the entire year trying to find something really useful to do with Google Wave. They won’t succeed in 2010.
  13. Google will make Wave interoperate with email. This will make it useful as an email client if nothing else.

Ok, that’s the last of this kind of post for at least a year.  If only I could get back to posting regular stuff again. :)

December 25, 2009

The twenty-teens

Filed under: Augmented Reality, Game Industry, Off Topic — Joe @ 8:28 am

Around this time of year for the past few years I have written a blog post listing what I expected to occur during the coming year. Since this new year marks the start of a new decade, I thought I would start a new tradition and write a post on my expectations for the coming decade. 2020 is a long way away, so I’m sure most of this will miss the mark. Hopefully at least 48 year old me will be amused by what 38 year old me had to say.

Please note that just because something is on this list does not mean that it’s something I want to happen, only that it’s something I think will happen. Anything that’s missing from this list is probably just something I didn’t think of.

I would love to hear your thoughts on any or all of these.  Please comment below.

General Technology Trends:

  1. Moore’s Law will continue to operate for the entire decade. That means a given form-factor of computing device will be approximately 100x the power of the same form-factor today.
  2. Mobile computing will dominate. Everyone who owns a laptop or desktop today will have a mobile device that is about 10x the power of their current computer.  We may still call these “phones”, but placing voice calls will only be one tiny part of what they do. This device will replace most users’ desktop and laptop computers.
  3. Digital Distribution will be king. Only a tiny fraction of the media that’s currently consumed digitally (TV, movies, music, and software) will be purchased on a hunk of plastic. Both the subscription model (aka Rhapsody or cable television) and the purchase model (aka iTunes or DVDs) will have at least 20% market share, but one of those two models will be gradually taking over. Advertising supported media will be just as big of a deal as it is now, but the user will have much more control over how they consume that media (think Hulu rather than broadcast television.) Books are on the same trajectory, but in 2020 the majority of books will still be sold on dead trees.
  4. Speech recognition will gain a lot of ground as the primary way we enter text into a computer. Offices are one place where this trend won’t have advanced very far, mostly because of the noise involved.
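
As a quick check on the "approximately 100x" figure in the first item above, here is a small sketch of the arithmetic. The post does not say which doubling period it has in mind; the 18-month and 24-month periods below are assumptions, chosen because they are the two figures most often quoted for Moore's Law.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplier after `years` if capability doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


# Assumed doubling periods (not stated in the post).
print(round(growth_factor(10, 1.5), 1))  # 18-month doubling: about 101.6
print(round(growth_factor(10, 2.0), 1))  # 24-month doubling: 32.0
```

Under the 18-month assumption the decade works out to roughly 100x, which matches the figure in the list; a 24-month doubling period would give only about a third of that.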

Game Industry Trends:

  1. Total revenues from video games of all kinds (including mobile and social games) will exceed revenue from movies and television (independently, not added together.) Games will finally learn to exploit merchandising and secondary markets as vigorously as movies do.
  2. In 2020 no one will be selling a dedicated gaming console. All computing devices in production in ten years will be about consuming other kinds of media just as much as they are about playing games.
  3. Desktop PC gaming will be all but dead, with the majority of triple-A games coming out for multi-media consoles or mobile devices.
  4. Gaming that involves exercise will be the primary way that the majority of people get their exercise.
  5. Location-aware games will be common.

Augmented reality:

  1. A growing minority of people in the developed world will wear heads up displays almost all the time. These displays will be capable of information overlays, but will mostly be about contextual information that is not overlaid on the world. These products will be on the verge of hitting the mainstream, but won’t quite be mainstream yet.
  2. Development of these displays will be by small companies (perhaps companies that are around now) but those companies will be acquired by massive consumer electronics multinationals before wearable displays hit the mainstream.
  3. Recognition of people and text in images (and video) will be nearly perfect, at least in reasonable lighting conditions.
  4. Gestural interfaces will be commonplace. Many hard-core computer users will be sad at how clumsy they are compared to keyboard and mouse.

The fate of specific companies:

  1. Google will be huge and influential. Their influence will likely peak in the 2010s, but it will be difficult to see that from the ground. Google will have had some sort of anti-monopoly action taken against them.
  2. Microsoft will fail to transition to the new mobile-centric world and will be in decline. They will still be a very powerful multi-billion-dollar company, but will not own the end-user to nearly the extent they do now.
  3. A company that exists today will be the dominant social network.  That could be Facebook, Twitter, or YouTube, but it probably won’t be MySpace.
  4. Apple will be huge and influential. They won’t ever be as dominant as Microsoft was in the 90s, but they will be very successful. Steve Jobs will still be running the company.

US Politics:

  1. Gay marriage will be legal in most states.
  2. Marijuana use will be legal in California and a few other states.
  3. We won’t have elected a woman president. (My wife came up with this one, but I agree with her.)
  4. The problems of illegal immigration will not be solved.
  5. The problems of providing health-care to everyone that needs it will not be solved.
  6. Privacy in an age of always-on location-aware devices will be a huge topic of debate.
  7. Silicon Valley will remain the world’s premier startup region.
  8. The US will still have troops in both Iraq and Afghanistan. These will be like the troops we still have in Germany and South Korea, and will not be in combat often, if ever.

International Politics:

  1. Carbon emissions will be at approximately their peak in 2020.
  2. Oil production will also be peaking around 2020.
  3. Most other countries will be ahead of the US in terms of switching to renewable energy.
  4. Most of the rest of the world will have consumer-friendly privacy regulations in place. Those countries will scratch their heads at the debate raging in the US.

Things that will not happen:

  1. We will not have flying cars, jet-packs, or most of the other things promised by Sci-Fi in the 50s.
  2. There will not be peace in the middle east.
  3. Africa will still be the poorest continent.
  4. Brain-computer interfaces will still not work very well. No one will be uploading themselves into a computer.
  5. We won’t have a human equivalent AI.
  6. We won’t know how to reliably unfreeze people.
  7. World War Three won’t have happened.

October 25, 2009

50 Things I Learned at ISMAR 2009

Filed under: Augmented Reality — Joe @ 9:50 pm

The good thing about going to your first conference on a new subject matter is that you’re not jaded and certainly not level capped. So without further ado, here are fifty things I learned at ISMAR:

  1. Metaio is pronounced mehtayo, not (as I’ve been saying) mehtah-ayo.
  2. The high-end HMDs that academics buy for tens of thousands of dollars are terrible.
  3. Nokia has a very cool see-through display with eye tracking up and running in their research lab. This display may never see the light of day.
  4. There are still tons of people doing research with markers.
  5. Robert Rice and I are both 38.
  6. When using a tag-based gesture to activate a menu, users are more accurate and able to select their option more quickly if the options are presented relative to the user’s view than if they are presented relative to the marker’s original location or an object in the world.
  7. Vuzix is working on cool stuff and Paul Travers is a good guy with a passion for AR.
  8. Telepresence is creepy when it is accomplished by projecting a remote video feed onto a static mannequin head. (This was the Animatronic Shader Lamps Avatars paper and demo.)
  9. Robert Rice really got into AR in early 2008, just like me.
  10. The academic AR community is ready to welcome industry to their conference with open arms. Apparently there were many more companies present this year than last year.
  11. Metaio’s mobile platform (Junaio) is not a clone of Layar/Wikitude in any way. They are building a much more social system based on user-provided content.  Junaio is also going to work on phones with no compass (i.e. the iPhone 3G.)
  12. X from Y is a smart dude. (Sub in any X and Y you like among the many people I met this week. I met so many smart people.)
  13. There are some professors who love the sound of their own voices. OMG, (that one guy) from (that one university) can’t seem to ask a question in less than five minutes.
  14. I believe that augmented reality is the next big technology revolution and will have an impact at least as big as the web’s impact. This will provide opportunities for tons of companies and as a result there’s no reason to start competing bitterly at this early stage.  It turns out Robert Rice agrees with me.
  15. Tish Shute is obsessed with XMPP (and a smart non-dude.)
  16. There are more AR startups out there that are flying under the radar. For instance, there are these two guys from Rochester…
  17. Silicon Valley remains completely oblivious to AR. If Robert and I are right it will be interesting to see what this means for their dominance of the startup community.
  18. The vast majority of the AR research being done in academia is being done outside the US. I knew this going in, but it was shocking to be confronted with it in person.
  19. Georg Klein (of PTAM fame) works at Microsoft now.  Hmm.
  20. The food in Orlando is terrible.  Maybe they could move this conference to Austin…
  21. Microvision’s display technology works really well.  At least on the monocular test unit that I got a chance to look through after their talk.
  22. There is (or was) at least one PC gamer out there that has never heard of Steam. I was shocked.
  23. Qualcomm is backing AR in a big way and intends to be the hardware provider of choice for mobile AR.
  24. Venture Capital isn’t flowing into augmented reality quite yet. Most AR startups are self-funded or funded by friends and family.
  25. I am much better at networking than I was when I first started going to game conferences.
  26. It is far too early for meaningful standards in AR. It would be awfully nice if the Wikitude content provider API used the same format that people are already providing to Layar, however.
  27. The projector part of Sixth Sense is still a non-starter. The UI parts are still very cool, however.
  28. Robert Rice and I have a creepy number of common traits.
  29. Disney Imagineering makes extensive use of AR.
  30. Peter from Metaio suggests that if you want to get anything done in the AR space you shouldn’t spend any time worrying about whether or not what you’re doing is AR or not. I agree with him. There’s not a clear line between AR and not AR and there probably never will be.
  31. See-through glasses at a reasonable price point (and field of view) are probably more than a year out. This is frustrating to a great many people, including me.
  32. Layar isn’t going to ruin AR. I went into the week with a fear that the GPS+compass category (which Layar is currently leading) would forever taint the term Augmented Reality by providing a fairly useless AR view (when compared to a map or list view.)  Instead I think that people will simply not use the AR view and that Layar pushes location based services forward in a huge way by providing access to multiple content providers from a single app. One day no one will remember that they started out as primarily an AR app.
  33. I prefer talks about what people did over talks about what people think will happen.
  34. For many researchers, augmented reality is a solution looking for a problem. There are a lot of gee-whiz demos and many people seem to accept cool factor as a compelling reason to use AR instead of more traditional solutions.
  35. I saw a presentation on an AR-based interface that included a user study that concluded the mouse-and-keyboard interface they devised for comparison was both more accurate and faster for users. Clearly we should not rush out and replace all UI in places where a mouse and keyboard are working now.
  36. Roundtable sessions with fifty or more people in the room don’t work.
  37. There was a company using optical flow to fake accelerometer-type UI elements back before phones had accelerometers. On a related note, promo videos from old dead-end technologies are funny.
  38. By and large academics feel that augmented reality is poised to take off in a big way.
  39. Academics don’t drink nearly as much as game developers.
  40. Nobody has solved the problem of optical tracking in arbitrary outdoor environments as a means of correcting GPS and magnetometer error. The sensor fusion presentation from Graz was promising, however.
  41. ISMAR doesn’t treat their speakers very well. Apparently there was some question as to whether or not speakers would even get a free badge.  That’s just silly. Speakers also shouldn’t have to buy their tickets to the award banquet that all attendees get for free.
  42. Some people think that “the Layar and Wikitude type apps” don’t count as real AR because they only use the camera for video pass through. Most people (including some of the people in the first group) agree that it doesn’t really matter whether these apps are AR or not.
  43. Video pass-through introduces massive latency, which can cause significant issues with perception of haptic feedback.
  44. Natasha Tsakos is happy to use the same shtick to open her talks at both TED and ISMAR.
  45. AR researchers are poor at name badge design. Badges should include company/university name. The name of the attendee is the most important thing on the badge and should be larger than everything else. The ISMAR badges had three lines of text, all the same size:
    • ISMAR 2009
    • Your Name
    • Science and Technology or Arts and Humanities.
  46. Nobody in the ISMAR community takes the various advertising uses of AR too seriously.
  47. You shouldn’t register for a conference on the day registration opens. Apparently the regonline account was still in test mode for the first day or so and all the people who registered that day didn’t really register (or have a charge appear on their credit cards.)
  48. There is a strong bias toward computer vision and away from other sensors among many researchers.
  49. Orlando was not made for walking.
  50. ISMAR 2009 was totally worth attending.

I am so happy I went.  ISMAR reinvigorated my interest in AR and allowed me to meet many great people. I wonder if I’ll be able to swing a trip to Seoul for ISMAR 2010.

August 27, 2009

My Layar development experience

Filed under: Augmented Reality — Joe @ 7:56 pm

When SPRX Mobile announced that they were opening up the Layar API back in July, I applied immediately. I wanted to learn more about publishing geo-coded data, keep abreast of what Layar was up to, and try to deliver some useful data all at the same time. Fortunately my application was accepted and I received one of the first batch of API keys to go out.

My specific project has been to take the real-time bus arrival information provided by One Bus Away and publish it on the Layar platform. I use the mobile-formatted One Bus Away website at least twice per workday as part of my commute. This data is currently only available in Seattle, but will soon be expanding to everywhere that offers a GTFS feed. My feelings about this experience have been almost entirely positive, but I still come away from it discouraged.

On one hand, the people building Layar (Dirk in particular) have been very helpful. The platform is easy to develop for and they provide good documentation and tools to make it even easier. All of the time spent on this project (which took less than 24 working hours, total) was spent figuring out Google AppEngine, the Python web framework I used, the One Bus Away API, and how to filter nearby stops to a reasonable set to show to a user. With minor exceptions Layar performed very well. I have provided all 436 lines of code here so you can see for yourself how easy it was.

Marjolein and Claire from SPRX were helpful in less technical ways too.  All developers were invited to the launch event to show off their layers. They ran several conference calls for people all over the world to answer any questions on the API or about the launch. SPRX has done a great job with the launch of Layar 2.0, and I think all the positive press they have received is a direct result of that.

My discouragement has less to do with Layar specifically than it does with the entire category of tricorder augmented reality. The view through the mobile phone and its camera is less useful than a top-down map would be for every piece of data I have seen so far. For my layer in particular, the rider is very likely to know where the stop is. In situations like that where location is unimportant, both the Reality View and Map View actually get in the way.

This experience has led me to two conclusions.  First, augmented vision is pointless until head-mounted displays are available.  I already felt that way, so now I am just more firm in my belief.  Second, filtering data to a useful subset for display is actually the hard problem.  Job listing sites, travel sites, Ecommerce sites, and review sites already knew this, which is why they spend so much effort on search. Turns out the problem is the same for mobile location-aware services.
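
To make the filtering problem a little more concrete, here is a minimal sketch of one way to cut a list of bus stops down to a handful near the user. It is not the code from the layer described above: the `Stop` type, the `nearest_stops` function, and the default 500 meter radius and five-stop limit are all hypothetical choices made for illustration.

```python
import math
from typing import List, NamedTuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


class Stop(NamedTuple):
    """Hypothetical stop record; the real One Bus Away data has more fields."""
    name: str
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_stops(stops: List[Stop], lat: float, lon: float,
                  radius_m: float = 500.0, limit: int = 5) -> List[Stop]:
    """Keep stops within radius_m of the user, closest first, at most `limit`."""
    with_distance = [(haversine_m(lat, lon, s.lat, s.lon), s) for s in stops]
    in_range = [pair for pair in with_distance if pair[0] <= radius_m]
    in_range.sort(key=lambda pair: pair[0])
    return [stop for _, stop in in_range[:limit]]
```

A distance cutoff plus a cap on the count is about the simplest possible filter. It says nothing about which routes the rider actually cares about, which is exactly the part that makes this a search problem rather than a geometry problem.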

If you live in Seattle and would like to try out the One Bus Away layer for Layar, just search for One Bus Away inside of Layar.  I welcome your feedback on how I could make this layer more useful. And, of course, I would also love to hear your thoughts on the utility of augmented vision on a mobile phone.

August 2, 2009

Cameras vs. Sensors

Filed under: Augmented Reality — Joe @ 9:17 am

If you search for “augmented reality” in Google, most of the hits will involve systems that analyze the output of a video stream in order to figure out what to draw in the overlay and where to draw it.  Sometimes the what and where are answered by the same marker (as in the endless YouTube AR clips.) In the more interesting examples the what comes from using the camera to figure out where the camera is pointed in more general terms and then to draw something positioned in some sort of known coordinate space (like PTAM or the recently announced MetaIO World.) This latter approach is broadly termed visual odometry. This seems to be what most people think of when they refer to AR, and that is no surprise given how much academic AR research focusses on computer vision.

As Wikitude (and more recently Layar, Nearest Tube, and Wimbledon Seer) have shown us, there is another way.  Making sense of a video stream is hard, particularly on a mobile device. Why not just use the non-camera sensors on that device (GPS receiver, tilt sensor, and compass) to provide the absolute position and orientation of the device and then look up nearby waypoints from some sort of database? This approach makes these applications more similar to map-based location aware apps (like Whrrl and Urban Spoon) than to those YouTube videos, but it’s not clear that users care.
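
The sensor-only approach comes down to a couple of lines of trigonometry. The sketch below is an illustration under stated assumptions, not code from any of the apps named above: `bearing_deg` and `screen_x` are hypothetical names, the 60 degree horizontal field of view and 480 pixel screen width are made-up defaults, and the mapping from angle to pixel is a simple linear approximation rather than a proper camera projection.

```python
import math
from typing import Optional


def bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial compass bearing in degrees (0 = north, 90 = east) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0


def screen_x(heading_deg: float, target_bearing_deg: float,
             fov_deg: float = 60.0, width_px: int = 480) -> Optional[int]:
    """Horizontal pixel for a waypoint, or None if it is outside the field of view.

    heading_deg is the compass reading; fov_deg and width_px are assumed
    device properties. Angle is mapped to pixels linearly (an approximation).
    """
    # Signed angle from the view direction to the target, in [-180, 180).
    offset = (target_bearing_deg - heading_deg + 180.0) % 360.0 - 180.0
    if abs(offset) > fov_deg / 2:
        return None
    return round((offset / fov_deg + 0.5) * width_px)
```

The GPS fix supplies the first pair of coordinates, the waypoint database supplies the second, and the compass supplies `heading_deg`. No pixel of the camera image is ever examined, which is why this is so much cheaper than vision-based tracking.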

Using sensors to determine position and orientation has key advantages.  The first is that it works in more environments.  While GPS often fails indoors, it works fine at night, at sea, and on most parts of the earth. Visual odometry has been shown to work relative to a start point (basically where you start up the tracking system) but not relative to an absolute coordinate system. GPS is also immune to nearby objects moving around. Real world scenes are very dynamic and moving cars, furniture, and people around can throw off vision-based systems. Tilt sensors composed of accelerometers and gyros are quite good at returning stable, accurate pitch and roll values.  Compasses are somewhat less reliable due to their susceptibility to nearby magnetic fields and large chunks of metal, but they are still able to give you a reasonable approximation of heading. Tilt sensors and compasses also work fine indoors and out of doors.

On the other hand, vision-based tracking systems have advantages of their own.  The biggest is accuracy.  PTAM demonstration videos show an accuracy down to a centimeter or less. Marker-based approaches show even better accuracy. Compare that to the two meters that represents the best possible accuracy of a GPS receiver. Those two orders of magnitude mean that GPS-based AR systems simply don’t work for objects that are less than ten meters away. The second advantage for vision-based systems is that there are many cases where it is impractical to know about all the objects in the user’s field of view.  They aren’t there yet, but advanced computer vision techniques offer hope that one day a computer will be able to recognize any arbitrary object simply by looking at it.  And until that day arrives there are already readily indexed markers on most items in the form of UPC codes. GPS will never provide such a service, and even if every item in the world had an RFID tag there is no way that every person would have access to the database into which those tags are indices.
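
To see why nearby objects are the problem case, it helps to turn position error into angular error. The sketch below uses the simplest possible model, which is an assumption for illustration: the whole position error is perpendicular to the line of sight, and the compass and tilt readings are perfect. The function name `angular_error_deg` is hypothetical.

```python
import math


def angular_error_deg(position_error_m: float, distance_m: float) -> float:
    """Worst-case angle by which an overlay misses its target, in degrees.

    Assumes the entire position error lies perpendicular to the line of sight
    and that orientation sensing is perfect.
    """
    return math.degrees(math.atan2(position_error_m, distance_m))


for distance in (10, 100, 1000):
    gps = angular_error_deg(2.0, distance)      # two-meter GPS error
    vision = angular_error_deg(0.01, distance)  # one-centimeter vision error
    print(f"{distance:>5} m: GPS {gps:.2f} deg, vision {vision:.3f} deg")
```

With a two-meter error, a target ten meters away can be missed by about 11 degrees, a large slice of a typical phone camera's field of view. At one hundred meters the same error is only about 1.1 degrees, which is why these apps work tolerably for restaurants down the street and not for objects on the table in front of you.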

Despite those shortcomings, my belief is that pragmatism is going to result in GPS-based systems winning this fight. The fact is that today’s GPS-based solutions actually work in the general case and vision-based ones have only worked well in very controlled demonstrations. If pioneering companies like SPRX Mobile and Mobilizy start to make money then capital is going to start flowing into this industry. Most of those new companies are going to follow the lead of the existing players and prefer GPS to computer vision. Eventually that will drive sensor-based approaches to get better, faster than vision-based approaches, which will encourage more investment until eventually vision-based AR tracking systems are left in the dust.  One of these improvements could be Galileo, which is expected to offer GPS accuracy down to 2cm. When vision researchers eventually solve the object recognition problem those solutions will be integrated into already existing AR platforms with sensor-based trackers.

What do you think? Do you see vision-based systems coming out on top? Will non-camera sensors be king? Or is a hybrid system the only way to go long-term?

July 3, 2009

If Augmented Reality is the solution, what is the problem?

Filed under: Augmented Reality — Joe @ 7:00 pm

Although augmented vision is where I started with my interest in the field, I have really moved a bit beyond that now.  These days when I say Augmented Reality I really mean wearable mobile computing with an interface that actually works.

With my new job at Valve came a new commute. I spend on the order of three hours a day riding on the bus, waiting for the bus, or walking to or from the bus stop. So far I have occupied my time with podcasts and paperbacks, but I am going to try to spend more time on my AR work starting today.  Thus, I am writing this while on my morning commute to Bellevue.

The mobile computing experience is poor for many reasons but they can be summed up as two things: Input and Output. Yes, that’s all. :)

On the input side the problem is that so much of what I want to do is driven by entering text. My phone (a Treo 650) has a great keyboard… for a phone. Even so, it really doesn’t compare to the experience of typing on a laptop or desktop keyboard. On my laptop the act of entering the text into the computer is not hindered by the act of typing. I’m far from the fastest typist, but I can still type much faster than I can generate the words I want to type. Until a mobile platform offers the same level of comfort and speed it will never be suitable for writing anything longer than text messages and tweets. The idea of writing code via a phone keyboard is just absurd.

My laptop only works while I’m riding the bus or waiting someplace I can sit down. It is not an option at any stop where I wait while standing. That means I can’t use my laptop at fully half of the places where I wait for the bus. I’m not sure I have the ability to write while I’m actually walking and not knock over my fellow pedestrians (or get hit by a car) but I certainly have a lot of downtime-while-standing when waiting for my transfer on the way home.

So the first problem I want to solve that I keep in my mental file labelled “Augmented Reality” is the ability to enter thousands of words of text comfortably while out and about.

The second problem is getting video output from the computer while out and about.  My laptop does a good job for the part of the time when I can have it open. It’s hard to see the screen on a bright day, but we fortunately don’t get too many of those here in Seattle. The trouble is that I could use output from my computer in many more places than I can break out the laptop.

The screen on my Treo (or my iPod Touch) is a little better in some ways. It’s much more feasible to bring it out when I need to check a bus arrival time. Thanks to onebusaway.org and my phone I can get this information on demand. All I have to do is wake up my phone, open the web browser, hit one of the bookmarks I’ve saved for the bus stops I frequent, and wait for 10-15 seconds while the page loads. If I’ve recently looked up that information for a stop I just need to wake up the phone and hit refresh to get updated information. It’s definitely better than looking at a clock and the never-very-accurate schedules on the stop itself.

What I really want, however, is to just know this information. On my way home from work there are two questions I often ask:

  1. How far is the #550 from my stop? Do I need to run? - By the time I can see the bus it’s only about 40 feet from the stop, so having some notice that I’m going to miss it would really help.
  2. How long do I have before the #1, #2, #4, and #13 arrive at this stop? - Any of these buses will get me within walking distance of home; the only real difference is how far I have to walk. The #2 stops a half block from my house, so I prefer it, but if it is lagging behind the others by a wide enough margin it isn’t worth waiting.

This kind of ambient awareness is where the augmented vision comes in. If it is within an hour of one of my usual riding times at one of my usual bus stops I want to see the current data from onebusaway.org for that stop. Big obvious columns of light that let me see the bus approaching from blocks away would be cool, but they probably don’t solve the problem as well as a 2D display on my personal HUD that I can glance at occasionally. “When will my bus arrive” is just the most obvious question I want a constant answer to. Once that one is solved I imagine that many more will present themselves. (I also imagine that many of those will actually require some level of registration with the world.)
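The rule in that paragraph (near a usual stop, within an hour of a usual riding time) can be sketched in a few lines of Python. This is only a rough illustration: the `UsualRide` record, the `stops_to_show` function, and the 100 meter radius are all assumptions made up for the example, and fetching the actual arrival data from onebusaway.org is left out entirely.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class UsualRide:
    """One habitual ride (hypothetical record, not from any real API)."""
    stop_name: str
    lat: float
    lon: float
    usual_time: time  # local time of day I usually catch the bus


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters, using the haversine formula."""
    earth_radius_m = 6371000.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


def stops_to_show(now, lat, lon, rides, window=timedelta(hours=1), radius_m=100.0):
    """Return the names of usual stops whose arrival data should be on the HUD.

    A stop qualifies when the wearer is within radius_m of it (assumed value)
    and the current time is within `window` of the usual riding time.
    Rides that straddle midnight are ignored to keep the sketch short.
    """
    shown = []
    for ride in rides:
        usual = datetime.combine(now.date(), ride.usual_time)
        near_in_time = abs(now - usual) <= window
        near_in_space = distance_m(lat, lon, ride.lat, ride.lon) <= radius_m
        if near_in_time and near_in_space:
            shown.append(ride.stop_name)
    return shown
```

Nothing here needs registration with the world: a clock and a coarse position fix are enough to decide when the 2D display should appear.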

Once I am wearing a head-mounted display I will probably use it for one more purpose. I would like to be able to block out the world in front of me once I am actually sitting on the bus. I am prone to visual distractions, and have a hard time focussing on much of anything when a bunch of people are around me. If I could occlude most of my field of view with whatever I’m working on the distraction would be greatly reduced.

The problem that I am interested in solving is “Mobile computing sucks.” Location- and time-aware wearable computers with first-person displays are the solution to that problem.

June 28, 2009

Augmented Reality should be open

Filed under: Augmented Reality — Joe @ 2:29 pm

Over the past year I’ve spent a lot of time thinking about what piece of the augmented reality ecosystem would be the best to start a business around. I’m still not ready to take that jump so, in my case at least, the answer is still “none yet”.  However, in my exploring I keep coming up against a problem:

  1. The absolute most profitable place to be in augmented reality is the platform provider at the center of everything.
  2. The profit motives of that platform provider could set the development of AR back by about ten years.

A brief history of the web

Whether by design or happy accident the protocols (HTML and HTTP) behind the web are easy to implement and completely open. This meant that by the time Netscape came along, there were already browsers on the Macintosh (CERN’s and Mosaic), Windows (Mosaic), and X (CERN, Mosaic, Viola, etc.) There were also 200 active web servers and port 80 accounted for more than 1% of the traffic on the NSF backbone.

That ecosystem meant that Netscape had to remain compatible with what already existed in order to succeed. Sure, they were selling licenses to their own software, which let them cash in on the shocking growth of the web, but the Netscape browser had to work just as well against pages served by HTTPD, IIS, Apache, and any other random web server anyone decided to write. The same thing was true from the other side. Netscape Now! buttons aside, website operators soon had to deal with at least two and possibly more different browsers, as well as various versions of each browser.

This made life interesting for web designers, but it was good for the web as a platform. The nature of the web meant that nobody had to convince somebody else to say “Yes” to get involved. There is no way that any one company (or any ten companies for that matter) could have even authorized, let alone managed, all of the initiatives that went on with the web between 1994 and 2000. There was just too much stuff happening.

The open nature of the web allowed the cost of innovation to be spread around to thousands of organizations around the world.  It also let anyone with enough cash to buy some hosting try out their big idea. Most of those ideas failed, of course, but when taken as a whole they succeeded beyond anyone’s wildest expectations.

I think that augmented reality has the potential to follow a growth curve with the same shape as the one the web followed. The web had very few institutional barriers standing in the way of its growth, and the AR ecosystem would do well to learn from that.

Open Augmented Reality

If the emerging augmented reality ecosystem wants to grow as quickly as the web it cannot include anyone who must say “Yes” to allow existing users to get a new capability. That implies a few things:

  1. Anyone can publish content into the system. There are no controls for quality or appropriateness of content on this ability to publish.
  2. Clients from multiple vendors are able to view that content. Anyone who chooses to can write a new client that works with existing content.
  3. Servers from multiple vendors are able to respond to requests for data. Choosing server technology is primarily a decision for content providers to make and their choice is invisible to end users.
  4. The network itself is neutral to the data being transmitted across it. This means mobile internet providers must not white-list content from publishers that they have partnerships with.
  5. There is no single central directory that all content (or every content provider) must be listed in to be available.

Note that this does not require that the software in question be open source. Open source software (in the form of Linux, HTTPD, Apache, Perl, PHP, and others) was instrumental in spreading the web far and wide. However, the personal computer revolution happened with little in the way of open source software and was just as rapid as the spread of the internet.

Open Standards

As VRML and many other standards over the years have taught us, developing a new standard from whole cloth is fraught with peril. It is even more difficult (as in the case of VRML) when there is not an existing standard that the new standard is intended to supplant. The AR community must avoid repeating the history of VRML. Fortunately there are existing standards that lend themselves well to the problems augmented reality developers are trying to solve.

The first of these is good old HTTP. As a transport protocol, HTTP fits the list above very well. The protocol is well understood, decentralized, and available in server or client library form for every platform. Minor new standards for querying location-specific data are already emerging.

The second current standard that the augmented reality developers can adopt and bend to their will is KML. KML is the file format that Google Earth uses to represent geocoded information. It has support for points, lines, and shapes. KML is an open standard and is supported by many GIS packages in addition to Google Maps and Google Earth.  Google has open-sourced its own KML parsing library so there is a place to start there too.
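To give a feel for how little a client needs in order to consume KML, here is a minimal sketch that pulls named points out of a KML document using only the Python standard library. The sample document, the `parse_placemarks` function, and the example.com URL are made up for illustration; the only things taken from the KML standard are the element names, the 2.2 namespace, and the longitude,latitude,altitude ordering of `coordinates`. In a real client the text would arrive over HTTP rather than from a string constant.

```python
import xml.etree.ElementTree as ET

# Namespace used by KML 2.2 documents.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical sample content; a real client would fetch this over HTTP.
SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Example bus stop</name>
      <description>http://example.com/stop/1</description>
      <Point><coordinates>-122.3321,47.6062,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def parse_placemarks(kml_text):
    """Return a list of (name, lat, lon) tuples for every point Placemark."""
    root = ET.fromstring(kml_text)
    results = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext(
            "kml:Point/kml:coordinates", default="", namespaces=KML_NS
        ).strip()
        if not coords:
            continue  # skip lines, shapes, and anything else without a Point
        parts = coords.split(",")
        # KML stores longitude first, then latitude, then optional altitude.
        lon, lat = float(parts[0]), float(parts[1])
        results.append((name, lat, lon))
    return results


if __name__ == "__main__":
    for name, lat, lon in parse_placemarks(SAMPLE):
        print(name, lat, lon)
```

Because the format is plain XML served over plain HTTP, any publisher with a web server can produce it and any client vendor can read it, which is exactly the property the list above asks for.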

Any augmented reality client that supports attaching web browsers (including URLs) to locations can also take advantage of most other existing web standards for whatever happens to be in those browsers.

Is this how things are actually going?

So far, I have seen very little discussion of how different augmented reality systems will work together.  In large part that is  the point of this post. But then there are also very few AR systems that exist outside of laboratories, so we could just be in the bad old proprietary hypertext system days of the late 80s.

So far the AR systems that seem to be designed for lots of different kinds of data (Layar and Seer) have not announced any way for third parties to publish data for their clients. My twitter exchanges with Raimo at SPRXMobile make me think that Layar is at least thinking about it. Hopefully they will turn out to be as open as I’ve outlined above.

How important do you think open AR standards are? Can an AR solution succeed without them?

May 16, 2009

Slidecast of my Augmented Reality presentation from LOGIN 2009

Filed under: Augmented Reality — Joe @ 10:04 am

This is my presentation from LOGIN 2009 titled “What Augmented Reality Means for Game Developers.” It is more or less aimed at game developers, but is really just where I see AR going in general. The presentation itself is 50 minutes long followed by 20 minutes of Q&A.

It is also my first attempt to post a SlideCast, so if something in there is messed up, let me know.

You can download the slides from SlideShare, and the audio can be found here. My own audio came through fine, but next time I think I need to figure out a way to mic the audience.

March 29, 2009

Notes from the AR Dinner at GDC09

Filed under: Augmented Reality — Joe @ 9:11 am

I had a wonderful dinner the other night with a bunch of game developers who are interested in Augmented Reality. In attendance were Stefan Misslinger and Noora Guldemond from MetaIO, Mitch Ferguson from Carbine, John Walker, Cory Bloyd, and Ron Haidenger and Paul Travers from Vuzix. It was great to spend the evening talking to other people who are as interested in AR as I am.

Noora from MetaIO talked a bit about how their LEGO kiosks have been received by end users. At first they don’t get it because they’re looking at the box in their hand. Once they notice the screen they are amazed and start running around the store trying out other boxes. There is a kiosk installed at a LEGO store in San Mateo. I’m not sure if the store in Seattle has one or not, so I may try to go and take a look over the weekend.

John Walker spent quite a bit of time talking about the AR applications he has worked on for the Department of Defense. Even though I was sitting directly across from him, I unfortunately couldn’t hear a word he was saying. Hopefully I can catch up with John later in the week and get the scoop. (Since I’m posting this on Sunday, I can say that didn’t happen.)

I spent the night peppering Paul Travers from Vuzix with questions about the Wrap 920AV glasses and other things they are working on. These were mostly the questions I had been intending to ask when I got a chance to go by their booth, but I got that out of the way early. Here are the answers to those questions:

  • They don’t know exactly what the price will be, but they are expecting it to be less than $500.
  • Paul is very confident that the Wrap glasses will ship this year.
  • The displays are 800×600 in these glasses. That’s a step up from the 640×480 resolution that their other glasses use.
  • The two displays are independently controllable through a variety of methods, but if your software can handle it, you can provide 60Hz to each eye.
  • The IMU for the Wrap will include accelerometers, gyros, and magnetic sensors, and will provide yaw, pitch, and roll to the software at a very high rate.
  • When they are in visual pass-through mode the Wraps will blend a translucent scene over the world. In this mode the brighter a pixel is the more visible it will be to the user. That makes black the transparent color and white the “visible as it gets” color.
  • Paul was coy about exactly what the specs on the camera will be. I think they aren’t 100% settled yet. He was very aware of the issues with frame rate on USB cameras, though, so hopefully they will figure out a way to provide a reasonable frame rate (or at least crisp frames.)
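The pass-through behavior described in the list above (black is transparent, white is as visible as it gets) is what you would expect if the display simply adds its light to the light coming from the world. As a rough way to preview how a scene might look, that can be modeled as an additive blend. This is my own simplified model, not anything Vuzix described, and the `blend_see_through` function name is made up for the example.

```python
def blend_see_through(world, overlay):
    """Approximate what the eye sees through an additive see-through display.

    Both arguments are (r, g, b) tuples with channels in the range 0.0 to 1.0.
    The overlay can only add light to the world, never darken it, so each
    channel is summed and clamped to 1.0. This is an assumed model, not the
    actual behavior of any specific hardware.
    """
    return tuple(min(1.0, w + o) for w, o in zip(world, overlay))


if __name__ == "__main__":
    world_pixel = (0.2, 0.4, 0.6)
    # A black overlay pixel adds nothing, so the world shows through untouched.
    print(blend_see_through(world_pixel, (0.0, 0.0, 0.0)))
    # A white overlay pixel saturates every channel.
    print(blend_see_through(world_pixel, (1.0, 1.0, 1.0)))
```

The practical consequence for content is that dark colors effectively disappear, so anything meant to be read against a bright background needs to be drawn in bright colors.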

It turns out that as part of their research into how to get the IMU working they have been working with the same SparkFun 6DOF IMU that I have. They have also had trouble with the magnetic sensors. The voltage range provided by the sensors is far too small and there is no amplification between the sensors and the microcontroller. The result of all this is that the noise in the system tends to swamp the actual readings. That sounds like exactly the problem I have run into.

I left dinner very excited about the next year of Augmented Reality. In six months or so I will be able to buy a pair of glasses (with IMU) for less than $500 that will show visuals over the world. Right now the other options in this space cost $1700 for the IMU and $30,000 for the glasses. When the price on a piece of technology drops to 1/60th of what it used to be it unlocks a huge potential for exciting new applications. I can’t wait!
