This is my talk from last Wednesday’s Seattle Augmented Reality Meetup. I will upload the discussion that followed as a separate video later today. Comments and feedback are welcome… just comment below or over on Vimeo.
If you follow me on Twitter you have probably noticed all my tweets about a mysterious Android app that I’m working on. Well that app shall be a mystery no more… I have made it available for download and encourage you to try it out. You can find the client here. (You will probably want to click that link on your phone. You can also find it in the navigation links on the right side of the page.) It should work on any Android device with GPS, a camera, and at least Android 1.6.
The working title for the app is Mobile Photo Hunt. If you give Photo Hunt a try and have feedback, I set up a User Voice forum to collect that feedback. This is an open alpha, so don’t be surprised if major changes occur over the coming weeks. I am making it available here well before it goes up on the app market so I can get some early feedback. Please tell me what you think.
The basic idea is this: People take pictures of cool or interesting things in the world and upload those as puzzles. Those pictures (and their GPS coordinates) are made available to everyone else and everyone else is encouraged to find whatever the thing is and take a picture of their own to prove that they found it. Other users can then compare the two pictures and vote up or down about whether or not they match. Some sort of game mechanics wrap the whole activity to encourage good puzzles, searching for puzzles, and confirming matches. Except for the game mechanics this all basically works now.
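A rough sketch of how the pieces described above might hang together. This is not the app's actual code: the class names, the voting scheme (+1 / -1), and especially the 100 m radius used to reject far-away attempts are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two GPS fixes, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class Photo:
    user: str
    lat: float
    lon: float


@dataclass
class Attempt:
    photo: Photo
    votes: list = field(default_factory=list)  # +1 = "matches", -1 = "does not match"

    def score(self):
        """Net result of the up/down votes cast by other users."""
        return sum(self.votes)


@dataclass
class Puzzle:
    photo: Photo
    attempts: list = field(default_factory=list)

    def submit(self, photo, max_distance_m=100.0):
        """Accept an attempt only if it was taken near the puzzle (assumed rule)."""
        d = distance_m(self.photo.lat, self.photo.lon, photo.lat, photo.lon)
        if d > max_distance_m:
            return None
        attempt = Attempt(photo)
        self.attempts.append(attempt)
        return attempt
```

A puzzle is just a photo plus its coordinates; each attempt collects votes from other users, and the game mechanics would sit on top of `score()`.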
The game uses Facebook Connect to authenticate users. I expect to eventually add Twitter, Google, and OpenID logins as alternate ways to authenticate. I have no interest in maintaining a list of usernames and passwords, so creating a custom Photo Hunt account will never be an option. These login methods are only used to figure out who you are and come up with something to call you. The app doesn’t publish anything to your feed, send any messages to your friends, or do anything else annoying in whatever social network you used to log in. I may eventually add the ability to automatically post new puzzles to Facebook/Twitter, but that will be something you can opt in to when you upload the puzzle.
There are a few known issues up on User Voice already, and a few more minor ones I’ll list here:
After logging in you will see three paragraphs telling you how to play but no buttons. They appear after your phone acquires a GPS signal. Need a busy indicator of some sort to indicate what’s going on.
The FB Connect login has a yellow warning bar that says “Cookies Required.” Cookies actually WORK and your login is stored, so I don’t know what that’s about. Haven’t dug into it yet.
When you bring up an on-screen keyboard the FB Connect dialog freaks out. I’m hoping this will be fixed when I switch to the official Facebook SDK.
After you switch to another app it may still be using GPS. Killing Photo Hunt with a task killer will probably fix that. Eventually it will stop using GPS on its own.
So that’s my app. Please give it a shot and tell me what you think.
Here’s a QR code for the app in case you happen to be viewing this on your desktop:
The impending launch of the iPad has had me thinking a lot about where computers in general are going. Mobile computers are moving into larger form-factors and they are bringing their mobile operating systems with them. In addition to the iPad there are also about half a dozen tablets and a few netbooks coming out that run Android. I believe these devices are the start of a new wave that will eventually replace Windows and OSX machines for the vast majority of computer users.
This sort of platform has existed before, but previous attempts never really worked out. This time I think they have a real shot at it, and two factors will make the difference. Both platforms have large libraries of apps, and both platforms greatly reduce the cost of owning a computer.
Previous attempts to build small, lightweight operating systems always included a big push to sign up developers. Then they failed to attract significant developer attention because they had no installed base. Then no one could figure out why they should buy one because they had no applications and the installed base never materialized. Both Apple and Google have used the mobile web to bootstrap their user bases. Many people are willing to buy the phones because they can read web pages from anywhere. Nobody balks at developing for these platforms because there are millions of them out there. Then another wave of people are happy to buy the phones because of all the cool apps. As a result each platform will have tens of thousands of applications when its mid-sized devices launch.
The cost of ownership factor is also a pretty big deal. For years I have had computer-savvy friends describe to me how they will no longer support relatives on Windows and have purchased iMacs for their parents. It is much easier for a normal person to break a Windows machine than to break a Macintosh, so the unfortunately tech-savvy guy in the family ends up spending more time supporting a casual computer user on Windows. Android and iPhone OS push this much further by removing most of the remaining pitfalls. This doesn’t matter much for power users, but for the average computer user it is a big deal.
Some previous attempts at mid-sized computers (e.g. Magic Cap and Newton) had a similarly low cost of ownership. Others (e.g. Ultra-Mobile PCs, Windows-based Tablet Computers, and Netbooks) had a huge software library to draw on but a cost of ownership that was actually higher than their desk-bound brothers. These new mobile-derived operating systems are the first time we’ve seen both factors in the same devices. I think this could be as disruptive as the original Personal Computer revolution.
What do you think? Will these new mid-sized computers cause massive upheaval, or will they fall down the same dark hole as their predecessors and never be heard from again?
I am bad with faces. I mean really bad with faces. My brain just doesn’t seem to be very good at mapping what someone looks like to their name. This often makes things difficult for me at networking events and conferences.
The Solution
A passive mobile application that scans the environment around the user for faces. When it detects a face that it recognizes, the application speaks the person’s name to the user via their Bluetooth earpiece. Ideally this solution would also involve a discreet camera that could operate without being obvious to the people it is operating on. The point is to serve as a passive aid to memory while not changing the behavior of the people you are interacting with.
The Competition
There are a couple concept applications out there along this line including Recognizr and Comverse Social AR. Both of these applications have the same problem, which is that you have to hold up your phone to take a photo of the person you want to identify, then wait for the result to come back. That is intrusive enough that a simple “I’m sorry, remind me what your name is…” would be a better option.
The Pieces
Facial recognition – There are many providers of facial detection and recognition APIs, so it should be possible to license this piece. Unfortunately most of the providers don’t seem to be very good at licensing their SDK to people. I get the idea that these are all very small companies that spun out of someone’s PhD research.
Luxand put me on their marketing email list, but never sent me an evaluation key.
Betaface actually gave me a chance to evaluate their SDK. It works quite well. I wasn’t a fan of their licensing terms, but you might have different needs than I did.
Ayonix got back to me right away but never provided the promised evaluation link.
Bluetooth camera – I bought an OptiEye. It works pretty well. If you ask them nicely they will send you the protocol documentation. The specs claim a four hour battery life, which is plenty for most networking events.
Text to speech – I haven’t done any research here. Many applications do it, though, so I would imagine SDKs are available. If nothing else the user could record the names and the software could just play back the recordings.
Mobile computer – Both Android and iPhone allow communication over RFCOMM, which is what the OptiEye uses. Existing devices are too weak in the CPU department to do much visual processing on the phone, but they could stream video or individual frames up to a server for further processing.
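To make the flow concrete, here is a minimal sketch of the last step: deciding when a name that came back from the server should actually be spoken into the earpiece. Everything here is hypothetical. The `NameAnnouncer` class, the `(name, confidence)` result format, the 0.8 confidence threshold, and the five-minute cooldown are assumptions for illustration, not part of any of the SDKs mentioned above. The `speak` callable stands in for whatever text-to-speech or recorded-name playback ends up being used.

```python
import time


class NameAnnouncer:
    """Decide when a recognised name should be spoken into the earpiece."""

    def __init__(self, speak, min_confidence=0.8, cooldown_s=300.0, clock=time.monotonic):
        self.speak = speak                    # callable(str); stands in for text to speech
        self.min_confidence = min_confidence  # assumed threshold, would need tuning
        self.cooldown_s = cooldown_s          # assumed: do not repeat a name for 5 minutes
        self.clock = clock                    # injectable so the logic can be tested
        self._last_spoken = {}                # name -> time it was last announced

    def on_frame_result(self, matches):
        """matches: list of (name, confidence) pairs returned for one frame."""
        now = self.clock()
        announced = []
        for name, confidence in matches:
            if confidence < self.min_confidence:
                continue  # a wrong name is worse than no name
            last = self._last_spoken.get(name)
            if last is not None and now - last < self.cooldown_s:
                continue  # already announced during this conversation
            self._last_spoken[name] = now
            self.speak(name)
            announced.append(name)
        return announced
```

The cooldown matters because a passive camera will see the same face in frame after frame, and hearing the same name repeated every second would defeat the purpose of a quiet memory aid.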
What do you think? Dream product? Interesting project? Terrible idea?
At TED 2010 Blaise Aguera y Arcas from Microsoft demoed live integration of video into the existing structure-from-motion dataset in Photosynth. Though his demo showed a video feed moving around a scene, the same data could just as easily be turned around to find the precise position of the camera in real time. That capability is a key part of building a head-mounted augmented reality system.
Two weeks later Google announced that they are incorporating user photos into Google Street View. This requires essentially the same data as Photosynth. Google has the added advantage that they can combine it with the Street View images and LIDAR data they are already collecting. Though they haven’t demonstrated real-time capability with this data they certainly have all the pieces they need to make this happen.
Access to the data required to perform pose recognition with cameras is a novelty at the moment, but if mobile augmented reality takes off in a big way it will become a key component of that system. In my opinion this component is too important to be left in the hands of one company. A much more desirable situation would be to have an OpenStreetMap-type project to accumulate and curate a freely available dataset to provide structure from motion and pose recognition for use in mobile augmented reality and whatever other uses someone can dream up.
OpenStreetMap is a project that sprang up to provide access to data that was free from the costs and restrictions that come with commercial data. It uses a Creative Commons license to make the data free for use by anyone for most any purpose. Although OpenStreetMap came about in response to the restrictions on commercial data sources, the same approach could be taken for 3D structure and image data even though commercial sources for that data do not yet exist. If OpenStreetMap had existed when car navigation systems became feasible in the late nineties it is likely that many commercial products could have been developed on open data at far lower cost and in much more variety.
All such a project needs is a small number of dedicated people to get it started. Download a copy of Bundler (an open source structure from motion library based on the same research that spawned Photosynth) and seek out publicly available photograph libraries. Then talk a cloud computing provider into sponsoring the project by hosting the data and build things up from there. The project won’t have many users for a few years, but as the accuracy and coverage of the dataset grows the set of applications based on this open data will grow too. Somebody just has to get the ball rolling.
I have a bunch of ideas like this one rattling around in my head. Some of them could be products or businesses, and some are just cool projects. I have looked into them all to some degree but will probably never start real work on them. I’m going to post them here in an attempt to spawn a discussion and encourage you to share your thoughts in the comments. Feel free to do whatever you like with these ideas.