Friday, January 22, 2010

6. On Caricatures







"Caricature is a graphical coding of facial features that seeks, paradoxically, to be more like a face than the face itself. It is a transformation which amplifies perceptually significant information while reducing less relevant details. The resulting distortion satisfies the beholder's mental model of what is unique about a particular face. Caricature, traditionally executed with few lines and loaded with symbols, can be considered a sophisticated form of semantic bandwidth compression. What goes on in the mind's eye of the caricaturist as she or he exaggerates a face?...Can these visualization and transformation processes be animated using a computer?"


This influential research from the 1980s was an early attempt to create a "caricature generator," stemming from a desire to understand how we process and recognize facial features. Since then, considerable progress has been made toward an automated identifier and exaggerator of the anatomical features that make each of us unique.
To date I haven't encountered the idea of integrating these techniques into some sort of mixed reality display, or even the more obvious post-experience uses. We don't see people regularly running caricature generators over their friends' photos to poke fun at them; perhaps the result isn't worth the effort yet. Beyond still pictures, if this same process could produce ever-improving, three-dimensionally rendered caricatures of a person- their CG visage- which could be shared, manipulated, and interacted with in real time, there would be many more potential applications, and therefore more demand, which would in turn produce more applications, on and on in a feedback loop toward critical mass appeal. And since the technology is on the way, it's quite interesting to look at some of the hypotheticals:

The Gaze

Alright, so say we're a group of cartoonishly typical men standing in the street, and nearby an attractive, high-heeled woman suddenly lets down her hair. Being the stereotypes we are, we all turn to stare. Since our devices are watching along with us, we now have digital copies of at least three different vantage points on the same spectacle. These devices could detect that something interesting just happened based on certain signs- for instance, the fact that we all turned our heads in the same direction at the same time- and react accordingly. For this slightly creepy example, let's say they react by showing us an instant replay of the woman, her hair flowing in slow motion like some sort of live shampoo commercial. Since we have three or more versions of the same scene, shot from slightly different angles, the display could show us all of them at once, sequentially, or layered on top of one another in a makeshift instant montage.
That is amazing enough, but there's more: the system could quickly create a 3D replica of the scene based on the different shots- filling in the inherent gaps (her hidden side) with CGI approximations. The idea of gaps holds many other possibilities, because the process amounts to collecting data and then inferring the rest- an educated digital guess. The human user could flag what doesn't work, acting as a simple correctional filter. The unavoidable kinks in the system would slowly get worked out as the computer began to recognize common features in its "mistakes," as judged by the human, and it would get better at reconstructing the whole of a person from ever smaller portions of their image.
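That correction loop can be sketched in a few lines of Python. This is a toy, not a real reconstruction system, and every name in it (the rule labels, the learning rate) is invented for illustration: each CGI guess is attributed to a named inference rule, and each accept or reject from the human nudges that rule's confidence, so habitual mistakes get proposed less and less often.

```python
from collections import defaultdict

class CorrectionFilter:
    """Toy human-in-the-loop filter: each CG inference (e.g. mirroring a
    hidden half of a face) is made by a named rule, and user feedback
    nudges that rule's confidence up or down."""

    def __init__(self):
        self.weights = defaultdict(lambda: 1.0)  # start fully confident

    def confidence(self, rule):
        return self.weights[rule]

    def feedback(self, rule, accepted, rate=0.2):
        # Move the weight a fraction of the way toward 1 (accepted)
        # or 0 (rejected).
        target = 1.0 if accepted else 0.0
        self.weights[rule] += rate * (target - self.weights[rule])

f = CorrectionFilter()
for _ in range(5):  # users keep rejecting one rule's guesses
    f.feedback("mirror_left_to_right", False)
f.feedback("texture_from_neighbors", True)
print(round(f.confidence("mirror_left_to_right"), 3))  # 0.328, decaying toward 0
```

The exponential decay means a rule is never banned outright- a few later accepts can rehabilitate it- which matches the "slowly get worked out" character of the loop above.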

This wouldn't be limited to human forms.

By rendering animals, buildings, and landscapes in the same way, we could create a 3D model of the entire world far faster than we could map it manually. This is like the WM, only here we are correcting the computer's errors instead of man's- visual data is collected through everyone's constantly gazing camera eyes and automatically recycled into the expanding database. You could imagine an early version of the incompletely mapped 3D world, where complaints are recorded about the Golden Gate Bridge's color (too golden), and once enough people have tagged it as erroneous, moderators improve it directly or indirectly through supervision, just like a Wikipedia article.
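That "once enough people have tagged it" trigger could be as simple as a two-bar threshold. A minimal sketch, with thresholds invented for illustration: a region goes to moderators only when the error tags clear both an absolute count and a share of everyone who viewed it, so one prankster can't force a review of a rarely seen alley.

```python
def needs_review(error_tags, total_viewers, min_tags=25, min_ratio=0.02):
    """Flag a region of the 3D model for moderator review once enough
    independent users have tagged it as wrong."""
    if total_viewers == 0:
        return False
    return error_tags >= min_tags and error_tags / total_viewers >= min_ratio

# "Too golden" reports against the Golden Gate Bridge model:
print(needs_review(40, 1200))  # True: 40 tags, over 3% of viewers
print(needs_review(40, 5000))  # False: popular spot, tiny error share
print(needs_review(10, 100))   # False: loud minority, too few tags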
This expansive database also wouldn't have to start from scratch. If things like Microsoft's panorama project were integrated into this system, geo-tagged photos would provide much of the necessary texture and color for surfaces around the world. To pick a color for the Golden Gate Bridge, all we would need to do is take an average RGB range for the bridge across photos throughout the internet, allowing for variations from artificial color manipulation, weather, and time of day. From thousands of pictures we would obtain a fair representation of how the bridge looks under many different kinds of lighting, and therefore a 3D model that could react to its surroundings in real time (online weather and sun-tracking). This would become even more accurate as things like Google's stationary public cameras fed the system a live stream of how the bridge actually looks.
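As a rough sketch of that averaging step (the sample numbers are made up): rather than a plain mean, a per-channel median of the photo-level RGB averages naturally shrugs off heavily filtered or oddly lit outliers instead of letting them drag the consensus color around.

```python
from statistics import median

def consensus_color(photo_colors):
    """photo_colors: one average (r, g, b) tuple per geo-tagged photo.
    The per-channel median discards filtered or night-time outliers."""
    return tuple(round(median(p[i] for p in photo_colors)) for i in range(3))

# Four daylight shots of the bridge plus one heavily filtered blue one:
samples = [(196, 70, 54), (201, 75, 60), (190, 66, 50),
           (199, 72, 56), (60, 60, 200)]
print(consensus_color(samples))  # (196, 70, 56): a stable reddish orange
```

A real pipeline would first have to segment the bridge pixels out of each photo, which is the hard part; the point here is only that the aggregation over thousands of images is cheap and robust.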
So how would we use this second world? Just on the surface, we would have a virtual simulation of the entire planet, accessible for free across the globe, which would allow for all sorts of things: remotely visiting foreign countries, or previewing potential neighborhoods before moving or going on vacation, complete with statistical overlays and maps displaying crimes, recent events, registered sex offenders, schools, shops, restaurants, traffic simulations during rush hour, the number of tourists on the street or beach during nice weather, and so on. Google Maps is obviously working toward something like this, though not yet visually based and not nearly this detailed.
Especially as all these systems are combined with one another, the possibilities compound. Yelp, Wikipedia, Flavorpill, Craigslist, and other location-based services could all show up on real-time, interactive maps. There are already cell phone programs that let you go into "social mode," allowing your friends to know your exact position via GPS. This map could show you your friends' locations as they move about the city, enabling easier and more random encounters. In the same way, if you found yourself downtown waiting for a friend to arrive, you could tell your display to show you every Craigslist freebie available for pickup within a certain distance- say a ten-block radius- which would give you something useful and fun to do in the meantime. When your friend finally arrives, a Yelp-like service could combine your food profiles to find the cheapest, best-rated, most mutually appealing "highest common denominator" restaurant in the area, then direct you to it while accessing the restaurant's real-time registry to make a reservation and estimate the wait.
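A hedged sketch of that "highest common denominator" pick, with the profiles, fields, and weighting all invented for illustration: the key design choice is scoring each spot by the minimum of the friends' cuisine affinities, so a place one friend dislikes can't win on the other's enthusiasm alone.

```python
def mutual_score(restaurant, profiles):
    # The min() is the "highest common denominator": one friend's
    # distaste caps the score no matter how keen the other is.
    affinity = min(p.get(restaurant["cuisine"], 0.0) for p in profiles)
    cheapness = 1.0 / restaurant["price_level"]  # 1 = cheap ... 4 = pricey
    return restaurant["rating"] * cheapness * affinity

def best_restaurant(restaurants, profiles):
    return max(restaurants, key=lambda r: mutual_score(r, profiles))

alice = {"thai": 0.9, "pizza": 0.6}  # cuisine -> affinity in [0, 1]
bob   = {"thai": 0.4, "pizza": 0.8}
spots = [
    {"name": "Thai Palace", "cuisine": "thai",  "rating": 4.5, "price_level": 2},
    {"name": "Slice House", "cuisine": "pizza", "rating": 4.0, "price_level": 1},
]
print(best_restaurant(spots, [alice, bob])["name"])  # Slice House
```

The multiplicative form means a zero affinity vetoes a restaurant entirely; a real service would soften that and fold in distance, wait time, and budget.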

Of course, these last things could already be done without an elaborate 3D model, but having one would improve the whole process. While searching for the restaurant, the display could show you pictures or a model of the interior, complete with an estimate of the number of people inside based on the wait time, letting you avoid an awkward or unpleasantly crowded atmosphere if that's not what you were looking for. Instead of a simple rating system for the quality and quantity of the food, it could show you actual scaled pictures of the courses it predicts you would order.
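That head count can actually be roughed out from queueing theory. Little's law (L = λW) says the average number of people in a system equals the arrival rate times the average time each spends inside; a sketch with invented numbers:

```python
def estimated_occupancy(arrivals_per_min, avg_wait_min, avg_meal_min):
    """Little's law: L = lambda * W, where time in the system is the
    quoted queue wait plus the meal itself."""
    return round(arrivals_per_min * (avg_wait_min + avg_meal_min))

# A party arriving every 2.5 minutes, a 20-minute wait, 45-minute meals:
print(estimated_occupancy(0.4, 20, 45))  # about 26 parties on site
```

The quoted wait time is the one input the restaurant's registry already exposes, which is what makes this estimate nearly free to compute.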


“Fuck Second Life. With the 3D model of the real world on an ever-expanding course, we could do all sorts of things. Friends could chat against a backdrop of whatever country or exotic location they desired. While talking, they could literally explore the area and tell each other about cool findings. It would also work more locally: people could digitally explore areas of their own neighborhood that they hadn't previously known existed. As restaurant and store profiles are built up, they could be incorporated so that while exploring you might come across a half-hidden restaurant and instantly see what its menu and prices are like. This would function as a "no risk" version of the real world. Instead of bringing a girl to an unknown restaurant and vaguely guessing at how busy it should be, you could just tell the program to run a simulation while you watch for yourself.”

Relevant links and updates:

Ever wanted to see where your city’s highest concentration of frisky, mature Cougars was located? How about a list of locations in town that offer free meals when it’s your birthday? Two ex-Googlers have quietly launched a site called TownMe that’s looking to answer these questions and more. In fact, the site is aiming to become a comprehensive guide to pretty much everything that’s relevant at the local level, from restaurant reviews to the best schools and hospitals in town.

Co-founder Elad Gil says that TownMe is still in “very, very early stages”, so there are still many features to come, but the core of the site seems to be in place, with local reviews and guides available for plenty of restaurants and events like San Francisco’s street fairs. The variety of topics covered is fairly broad, though there are still only a modest number of reviews.

While Gil acknowledges that there are other major sites like Yelp in this space, he points out some key differences. The site aggregates data from across the web, and also accepts user-submitted content. But instead of presenting a list of reviews submitted by individual users, the site is using a group-edited Wiki system, with a lengthy overview describing a certain establishment (there are still shorter, Yelp-style reviews with a star rating and comments beneath the Wiki). Gil says that the site also has a broader focus, and looks to offer entries that are more detailed than a Yelp review. For example, he points out that if you were to look up “Golden Gate Bridge” on Yelp, you’d be hard pressed to find a listing of the best locations to shoot a photo from or which landmarks to look out for.
