How much of our disbelief can we suspend in augmented reality?
🕶️ Apple Vision Pro, orange smog, and the politics of looking real
Real is what you feel
Feelings aren't real
Put your money down
It's your best spell to win
— “The Realness”, RuPaul
Like a membrane thinned out from too much chafing, the boundary between reality and fiction has felt less secure than ever lately. While a thick, orange smog covered the most photographed city in the world, on the other side of the country a tech giant released a pair of glasses promising to make you see more. The dissonance between people in their living room watching beautiful landscapes through the Apple Vision Pro while the fires in Canada made the air outside thick as a screen was beyond irony.
Against the background of a reality that looks more and more terrifying, the drive to be amazed at technologies promising to trick our senses is understandable. Please, do trick my senses. Catfish me. Anything to suspend my disbelief. Yet it feels counter-intuitive: so much work and so many resources for something to look…merely there. There’s something very anti-climactic in selling the technological prowess behind showing your living room as it is, but through a screen. And so I wonder, what is it about looking real that is so fascinating? Why do we crave it so much? What kind of work goes into making something believably real? What visual skills do we need to recognize this kind of work? And how does it affect our relation to the rest of our lives?
In other words, how much of our disbelief can we suspend in augmented reality?
The labour of realness
“To make all these digital experiences feel real in your space takes an extraordinary amount of technology,” says the voice in the Apple Vision Pro presentation video. The statement feels almost oxymoronic. Maybe years of consuming science fiction have led me to hope that technology would hold the promise of expanding what I consider real, making accessible layers of reality that previously escaped my own, very limited human perception. I expected “an extraordinary amount of technology” to generate something surreal, like seeing an immense black hole 27,000 light-years away. On the other hand, I never really thought of technology as a means to re-enchant reality. That I do best when I’m away from it, when I rediscover how to give attention instead of paying it. So what is so special about the work behind the Vision Pro? What relation does it produce?
There’s something effortless to reality. We expect it to merely be there. Of course, this is biased. So much reality escapes our ability to witness it. Visual reality, for example, is a matter of wavelength. Our eyes can only see a fraction of the light spectrum, roughly between 400 and 700 nm, right between infrared and ultraviolet, and our brains deal with making sense of it, in some cases even making up for what is missing, as with magenta. What we perceive as real every day falls between these thresholds. The same goes for other senses. Yet I still make fun of my dog for barking “at nothing”, even though I know his sensorial reality doesn’t overlap with mine. It’s easy to take reality for granted because it seems we’re not really working for it.
Looking real, on the other hand, is a tricky business. It is not about recreating reality per se but performing something different, adjacent to reality, something I would like to name, borrowing from my queer kin, realness. And realness is work. Active, intentional work of –one might say– deception. A fundamental concept in the ballroom culture of 80s and 90s New York, realness was a category of walks offering the chance to embody identities traditionally inaccessible to marginalized Black queer people, such as “executive realness”, “schoolboy realness” or “military realness”. Realness is dedication to the craft of ‘passing’, making others believe we’re like them, something queer people are profoundly familiar with.
Turning to a more contemporary (and profit-generating) figure of realness, RuPaul, we might illuminate the entanglements between realness, capitalism, and technology. As the lyrics opening this letter suggest, realness is about both feelings, which are deceptive, and money, which has a stable value. The labour of realness is about investing in deception. Technologies like the Vision Pro are made of exactly the same thing: an ability to deceive perception, sustained by massive capital. The main difference is that realness for the queer artists of the balls was empowering and aspirational, while the realness of tech companies is often alienating and profit-driven. These technologies of realness are not mere engineering feats.
Like queer realness, they rely on deep understanding and manipulation of our cultural relation to reality. And while the technology is often hidden away in a shroud of technical and legal smoke, we can learn to see through the cultural elements of the technologies of realness. Realness is everywhere in our technological lives, and we sometimes seem to crave it maybe more than we care about reality. This is by design. But if realness is a form of labour, and not just a magical, technological feat, we can learn to become more attuned to the traces of its construction and therefore cultivate an eye for its deconstruction.
Main character era
During the Apple Worldwide Developers Conference 2023, John Gruber, a tech blogger invited to host a conversation with several leading Apple figures, described the onboarding experience of the Vision Pro as “a very cinematic… sort of… You know. The movie opens with black title cards, and then boom you’re in the movie.” What he described is the moment that comes after setting up hand-gesture recognition, a step that takes place against a black background which finally opens onto a rendition of the environment we’re in.
Real things can therefore be worked into the labour of realness, which is not only about making virtual things look real, but also making real things —like your living room— look part of the virtual spectacle, bringing them closer to one another to blur the boundary between the two. If the world itself looks like it’s part of the experience, then it becomes easier to believe the most artificial elements of the experience are real.
We didn’t need to wait for Apple to dramatize our lives in cinematic ways. When I take the train, I sit with my laptop and look at the landscape passing by in a blur, and I choose a slow yet inspiring song as the “soundtrack” of my journey; with high-tech goggles, I would desaturate the “image” of my world to match 2020s film aesthetics in a mood-enhancing vision. I pull reality toward fiction when I want to feel I’m in my “main character era”, to use another boundary-blurring expression. The very “landscape” I look at is the construct of a deep cultural process of seeing elements of land through the filter of European landscape painting. The world outside my window might be mere reality, the product of agricultural labour or forestry, but as I see it from my own cultural position, it becomes closer to realness, steeped in aesthetic references. And just like land becomes landscape through painting, the living room becomes a stage through the cinematics of the Vision Pro.
The biology of a screen
The pulling of our physical lives into technology mirrors the pushing of technology into how we think of the physical. Apple has long understood the power of language to frame visual experiences. Years ago came the “Retina display”, a screen with a resolution so high it supposedly matches that of the eye, hence the name. Back in 2010, when Steve Jobs unveiled the first Retina display on the iPhone 4, he claimed that the pixels were so small they couldn’t be resolved by the human eye when holding the phone 12 inches from your face. This claim got challenged and eventually —no pun intended— resolved by physicist Phil Plait, who demonstrated that it was true for an average eye but not for someone with perfect eyesight, sharper than 20/20. The physics behind the claim are fascinating, but the semiotics of it even more so.
Resolution means something different depending on whether we talk about screens or eyes, and the comparison between the two is slippery. Screen resolution is a function of the number of pixels on a given surface. Eye resolution, on the other hand, is a function of the angle at which you can start to see two distinct objects, whether pixels or stars. Below that angle (i.e. far away), two objects are seen as one until they come close enough to cross it and get resolved. Plait gives the example of seeing a single light in the distance at night that reveals itself to be the two headlights of a car as it moves towards us. Moreover, our retina is covered in photoreceptors that are not equally distributed over its surface the way pixels are. Our fovea, in the center, is densely packed with photoreceptors, giving us extreme acuity in a very small spot, but we also have a blind spot where the optic nerve and blood vessels connect. Our eyes also move, from tiny, barely noticeable movements called saccades to the wide movements of our body, both of which help us get a sense of what we see and how well.
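The arithmetic behind that resolution debate is easy to sketch. Taking the iPhone 4’s advertised 326 pixels per inch, Jobs’s 12-inch viewing distance, and two commonly cited acuity limits (about 1 arcminute for average 20/20 vision, roughly 0.6 arcminutes for exceptionally sharp eyes — the thresholds here are my assumptions, not figures from Plait or Apple), a few lines of Python show where a single pixel falls between the two:

```python
import math

# Assumed figures: iPhone 4 pixel density, Jobs's viewing distance,
# and two rough visual-acuity thresholds in arcseconds.
PPI = 326                       # pixels per inch (iPhone 4)
DISTANCE_IN = 12.0              # viewing distance in inches
NORMAL_ACUITY_ARCSEC = 60.0     # ~1 arcminute: average 20/20 vision
SHARP_ACUITY_ARCSEC = 36.0      # ~0.6 arcminute: exceptionally sharp eyes

def pixel_angle_arcsec(ppi: float, distance_in: float) -> float:
    """Angle subtended by one pixel at the given viewing distance."""
    pitch_in = 1.0 / ppi                       # width of one pixel, in inches
    angle_rad = math.atan(pitch_in / distance_in)
    return math.degrees(angle_rad) * 3600      # degrees -> arcseconds

angle = pixel_angle_arcsec(PPI, DISTANCE_IN)
print(f"One pixel subtends ~{angle:.1f} arcseconds")            # ~52.7
print("Below average-eye limit:", angle < NORMAL_ACUITY_ARCSEC)  # True
print("Below sharp-eye limit:", angle < SHARP_ACUITY_ARCSEC)     # False
```

The pixel lands between the two thresholds: invisible to an average eye, resolvable by a sharper-than-average one, which is exactly the gap between Jobs’s claim and Plait’s correction.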
The trademark Retina leads us to think of the biological in terms of the technological and the technological in terms of the biological. This has been present for a long time in tech and cognitive sciences. Think of how the term “intelligence” in Artificial Intelligence is doing the same kind of work for thinking of statistics in cognitive terms. Yet, the goal of a Retina display is not to match the human eye, but to deceive it by creating an experience just under the threshold of human perception. A metaphorical Trojan horse, Retina (with a capital R) presents the likeness of the retina (with a small r) but it is designed to trick it. Both the technology and the linguistic elements of Retina displays, just like with the Vision Pro, are part of the labour of realness. That labour is technological, but also semiotic (made of words, visual references, etc.) and is as much about fictionalizing reality as it is about making fiction look real.
We learn to see the world through technologies of realness, yet they remain completely opaque to us, hidden away behind marketing language and legal patents. They mediate not only our relationship to the physical world, making us see the virtual as physical and the physical as virtual, but also to the social world, making us connected while isolated, or isolated while connected.
The transparent opacity of social relations
Making physical things look real is one thing; making face-to-face interactions feel real is another. Face-to-face interaction is the atom of sociality, the very basis of our social worlds. In 1997, cognitive scientist Nancy Kanwisher published her work on the discovery and characterization of what is now named the Fusiform Face Area (FFA), the part of our brain that reacts to faces like no other. This was ground-breaking: we knew at that point that if that general area was damaged, face recognition was affected, but we didn’t really know why. Using fMRI, Kanwisher and her team found that the small area was selectively dedicated to face recognition and didn’t react to other objects. For a zone of the brain to be solely dedicated to such a specific task says a lot about the evolutionary benefits of face-to-face interaction. Reading faces helps us make decisions based on someone’s identity, emotions, age, and so on, a useful skill when negotiating a socially complex environment. It is a crucial cognitive skill on which much of our learning and development depends.
From the firing of thousands of neurons in specific areas of the brain emerge the social connections that shape our lives at immense scale. The sociologist Georg Simmel gave face-to-face interaction, and particularly the ability to look into each other’s eyes, a central importance to who we are:
“[…] the whole interaction between human beings, their empathy and antipathy, their intimacy and their coolness, would be changed incalculably if the look from one eye into another did not exist which, compared with the simple seeing or observation of the other person, signifies a new and incomparable relationship between them.”
Even the Zoomification of our interactions over the past three years hasn’t changed this need. Earlier this year, Nvidia released an AI-powered tool that “corrects” our gaze so we always appear to be looking at the camera during a video call, leaving us, unfortunately, unable to roll our eyes at a superior’s comment. That resources go into reproducing such signs of connection is a testament to how important this shared gaze is to our very humanity. Socially, sharing a gaze is how we acknowledge each other, how we assert that we’re present, how we seduce and how we despise. Conversely, refusing or failing to share a gaze can be a defiant or condescending act. In any case, it creates a relationship. All of this becomes much harder when we’re wearing a one-pound computer screen on our face.
Eyesight
If the Vision Pro straddles the limit between the virtual and the real, wearing it makes its user straddle the one between presence and absence. Not fully immersed as in virtual reality experiences, yet not totally connected to others as in face-to-face interactions. In the Vision Pro, there’s a dream of wanting both, to be present and isolated at the same time, to be connected and disconnected. This is another incarnation of the late-capitalist dream of blurring the lines between productive and non-productive time and space. I can now play with my dog, but also work on that spreadsheet at the same time. It’s a computer screen I never need to turn off. The *dream*.
It took me a minute to realize the Vision Pro is not a pair of actual transparent glasses. This too was by design. One of the first scenes in the commercial is a seamless move from a third-person view of someone about to put the device on to a first-person point of view from inside the Vision Pro. The feeling is that you’re just seeing through it. From the other side, we see footage of the user’s eyes peeking through the device as if it were transparent. Transparency is invoked, but it’s achieved through technological opacity.
Echoing Simmel’s insight, an Apple representative introduces the feature saying: “Your eyes are a critical indicator of connection and emotion, so Vision Pro displays your eyes when someone is nearby”. This is a reference to EyeSight, the exterior lenticular screen that displays the user’s eyes in real time, seen from different angles, to anyone interacting with them. It is paired with a feature that lets anyone looking you in the “eyes” pierce through your screen content, which otherwise occupies the foreground of your visual field. Another function, this time for video calls, scans your face and creates a moving 3D model displayed in lieu of your covered face, to look more…present. This absurd list of patches is reminiscent of a Rick and Morty episode in which, after transforming the whole world into humanoid praying mantises, Rick rustles up a potion: “It's koala mixed with rattlesnake, chimpanzee, cactus, shark, golden retriever, and just a smidge of dinosaur. Should add up to normal humanity,” he says. The Vision Pro’s list of visual technology sometimes feels like this mad scientist’s recipe: “It’s putting a helmet on, but there’s eyes on the helmet, but also a camera inside, and it sees your face in real time, and just a smidge of a realistic avatar of your face not wearing the helmet to be displayed online. Should add up to the normal experience of seeing each other.”
Opaque layers of technologies seem to endlessly pile up to recreate the very real experience of just being there with someone. But even in a best-case scenario in which all these things do feel incredibly seamless with reality, I still wonder, do I really want to talk to someone who won’t take off the goggles to have a conversation? Suspension of disbelief may work for our experience of physical spaces and things, but can we suspend our disbelief when it comes to human interactions, and maybe most importantly, do we want to?
The Catfish
Technologies of realness train our perception in unexpected, and at times undesirable, ways. AI-generated voices to speak through, connected mouths to make out with, or opaque glasses to see with all romanticize a version of sensorial catfishing. By teaching our senses to suspend disbelief, they reshape them in ways that make them more productive. In the attention economy, senses are mere channels to be shaped and optimized for increased consumption. Physical, serendipitous reality, on the other hand, becomes suspicious in alienating ways. The redundant similarity of certain objects in the world points to a “glitch” in the simulation rather than to the supply chain of mass-produced commodities.
Deepfakes, virtual and augmented reality, generative AI, TikTok filters, and so on challenge our ways of seeing, and therefore our ways of knowing, and require us to develop new forms of visual literacy. Looking seems more than ever to be a defensive mechanism, as we are increasingly required to tell real from fake, human from non-human, and so on. In an economy running on attention, learning to see is a practice of resistance. Becoming attentive to the ways reality is dramatized into fiction, how virtuality is made concrete through metaphor, or how our senses and interactions are increasingly replaced by proxies, might be some first steps toward that resistance.
But maybe we don’t want to resist. Maybe one day we’ll have a fable of a catfish mesmerized by a better world in an Apple Vision Pro, oblivious to its own lake surrounded by fires, and of a friendly toad who will try to warn it. Paraphrasing another iconic line of RuPaul, the toad would say, “Take that thing off your [eyes]”. But slightly dizzy from the fumes of orange smoke, the catfish might answer “I’d like to keep it on please”.
This letter is the start of a conversation; don’t hesitate to share it with someone who might like it, or to like and comment below. You can also follow me on Twitter or Instagram.