Apple Vision Pro

The evolution of the interfaces between people and technology

A focus on interfaces rather than tool functionality

In the realm of technology, interfaces have garnered considerable attention as they often play a determining role in the success of technological advancements. Rather than focusing solely on the functionality of tools, I have always been intrigued by the continuous journey of evolving interface types that grant us access to these capabilities.


This month, Apple announced a new device that I believe signifies a significant milestone in this journey. Therefore, I aim to offer my personal, albeit non-academic, perspective on various interface types.


Typically, these interface types do not replace their predecessors entirely. Instead, previous interfaces continue to be employed within narrower technological domains. This implies that the introduction of a new type seldom marks the end of the previous one (at least until now).


Please note that those expecting a review of Apple Vision Pro will be disappointed. Instead, I invite you to traverse the path that has led us to its creation.

A Consistent Relationship with Technology Over Millions of Years

Since the emergence of Homo habilis, humans have been crafting and utilizing tools to ease their existence. From the creation of rudimentary stone tools to the advent of computers, the fundamental nature of our interfaces with technology remained unchanged.


Millions of years ago, even though it was not termed as such, technology resided in the form of stones, sticks, axes, or clubs. However, even with the introduction of the first steam engine in the late 18th century, the concept of interfaces with technology did not differ significantly from those employed millions of years prior.


Despite the emergence of new materials and numerous inventions, the physical relationship between humans and technology has remained unaltered. To operate a machine or tool, one needed an understanding of its design. Commands, activated through physical force, held meaning if one comprehended the technology’s construction.

The first type of interfaces

  • Way of interaction: direct, on the technology itself
  • Communication mode: non-coded
  • Type of command: physical, by rotation, pull, or push
  • Construction method: handles, levers, ropes, or cords in their various forms
  • Command interpretation: static; every command makes sense for one technology
  • Way of providing the output: direct, mostly as a movement of the technology in response
  • Operator capabilities: strength and endurance, knowledge of how the technology is built

Evolution changes nothing

The interfaces have remained largely unchanged throughout various technological advancements. Even when Elmer Ambrose Sperry invented the first servomechanism in the early 1900s, it primarily aimed to reduce physical exertion, but the overall interface paradigm remained the same. It was an improvement in resource usage, specifically reducing the effort required from workers. The interaction with technology still relied on applying direct forces.


A few decades earlier, a significant technological development emerged that laid the foundation for subsequent innovations while maintaining a similar interface paradigm. The Remington 1 typewriter, invented by Christopher Latham Sholes, Samuel W. Soule, and Carlos S. Glidden, featured hundreds of levers. The interface was similar in concept: pressing a key resulted in the precise printing of a specific letter.

Despite advancements in efficiency and resource usage, the core principles of physically interacting with technology have remained largely consistent. While interfaces have evolved in terms of materials, mechanisms, and efficiency, the fundamental interaction patterns have endured.

In addition, Charles Babbage’s Difference Engines and Analytical Engines, developed in the 1820s, utilized gears and a lever that needed to be turned to obtain the desired results.

Indeed, this technology was highly intricate, but as demonstrated in this video, it resembled a moving sculpture more than a computer. At around 1:35 in the video, you can see what I mean when Julie activates the computer.

Some changes

There were some early glimpses of new interfaces emerging in 1725 when Basile Bouchon invented a method to control looms using a perforated paper tape. This marked the beginning of punched cards, a system of encoding commands and data on a medium that can be interpreted by technology.


This development introduced two main aspects: the coding of commands and data, and the utilization of a means, namely the punched card, to interact with technology.
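To make the idea of encoding data on a medium concrete, here is a small sketch in Python, deliberately simplified to digits only (real cards also used zone punches for letters and symbols): in the Hollerith scheme, a digit d becomes a single punch in row d of its card column, and a standard card offers 80 columns, one character each.

```python
# Simplified sketch of Hollerith-style digit encoding on a punched card:
# each column holds one character, and a digit d is a single punch in row d.
def punch_digits(number: str, columns: int = 80):
    """Return a card as a list of columns; each column is a set of punched rows."""
    if len(number) > columns or not number.isdigit():
        raise ValueError("digits only, at most one per column")
    card = [set() for _ in range(columns)]
    for col, digit in enumerate(number):
        card[col].add(int(digit))  # one punch encodes the digit
    return card

card = punch_digits("1890")
print([sorted(rows) for rows in card[:4]])  # → [[1], [8], [9], [0]]
```

The machine reading such a card never needs to "understand" the operator: it only senses holes, which is precisely what makes the coding indirect.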


Then, in the early twentieth century, a significant modification to the human-machine interface emerged with the adoption of this innovation in machines like the IBM 405.

This innovation was fully harnessed by the United States Bureau of the Census, from 1890 until 1950, through the use of the first card punching machines.


I believe this change marked the very beginning of the technology’s exponential surge, as it initiated a self-perpetuating cycle. Tools and machines became faster and more user-friendly, facilitating research and industry in producing new inventions and technologies. These advancements, in turn, increased the speed and ease of use, leading to further progress.

The second type of interfaces

  • Way of interaction: indirect, reaching the core technology through punched cards
  • Communication mode: coded
  • Type of command: coded on punched cards
  • Construction method: punched cards
  • Command interpretation: static; every command makes sense for one technology
  • Way of providing the output: with other punched cards (new, sorted, partitioned, etc.)
  • Operator capabilities: deep knowledge of how the technology is built, knowledge of the code used, typing skills.

A new position to work

1962 marked the beginning of a significant acceleration in the evolution of interfaces.


It was the year when the IBM 1440 was introduced, featuring a terminal/printer as a console. In 1963, the Teletype Model 33 teleprinter was announced, and in April 1964, IBM unveiled the System/360.


While punched cards were still the primary input method for these computers, keyboards and printers were also utilized for interaction. During this period, humans started to directly and continuously engage with technology. The methods of coding data and commands were refined, enabling the delivery of results through printouts or video displays.


This era witnessed the emergence of new forms of command coding, commonly known as programming languages. These languages not only facilitated the description of complex operations but also enabled the execution of algorithms across various types of machines.


Between the 1960s and 1980s, it became commonplace for the top 500 companies to have data processing centers equipped with IBM computers. These machines played a pivotal role in handling the computational needs of these organizations.


The advancements in interfaces during this period laid the groundwork for subsequent developments in computer technology. The shift towards direct human interaction, improved coding methods, and the widespread adoption of computer systems by large enterprises set the stage for the remarkable advancements yet to come.

The technology is primarily controlled through terminals, and those who operate them must possess a deeper understanding of coding mechanisms (programming languages) rather than intricate construction details.

The third type of interfaces

  • Way of interaction: direct, with the core technology through keyboard and monitor
  • Communication mode: coded
  • Type of command: coded directly into the technology
  • Construction method: terminals, tape, printers, all housed in ad-hoc rooms called data centers
  • Command interpretation: static; every command produces the same result
  • Way of providing the output: visually, on monitor or paper
  • Operator capabilities: deep knowledge of the code used.

However, the race for advancements in technology continues, and a new trend of miniaturization emerges. The first desktop computers begin to make occasional appearances, and in 1977 “the trinity” consisting of the Apple II, PET 2001, and TRS-80 is established.


These desktop computers gain prominence in cutting-edge offices, yet they maintain similar interface modalities to their larger counterparts.


The concept of enhancing interfaces had already been pondered a little earlier. In 1962, a project supported by the Defense Advanced Research Projects Agency (DARPA) and led by Douglas Engelbart, aimed at augmenting human intellect, resulted in the development of the first mouse. However, the mouse remained merely an outcome of the project and did not see widespread adoption at the time.

The graphical interface

It wasn’t until 1981 that the Xerox 8010 emerged as the first computer equipped with a mouse and software featuring a graphical interface, allowing users to interact with icons, files, and folders. This pioneering technology introduced an interface that showcased windows dedicated to specific functions or applications. Unlike the previous uniform screen display, users could now customize their own screens according to their work requirements.


Forward-thinking companies recognized that the operating system software played a more vital role than the hardware itself. It was the operating system’s ability to deliver unprecedented functionality that became the essential element.

The sprint for window systems begins

In the race to develop window systems, significant players emerged in the early 1980s. Xerox, with its XOS operating system, took strides toward graphical interfaces. Meanwhile, Microsoft, in collaboration with IBM for the development of MS-DOS, embarked on the ambitious “Interface Manager” project. The fruit of their labor was unveiled in 1985 as Windows 1.0. Apple, renowned for its innovative approach, introduced Lisa in 1983, which boasted a graphical interface ahead of its time. The subsequent release of the Macintosh in 1984 brought graphical interfaces to a wider audience.


This period marked a true revolution in computing. The interfaces, though not explicitly stated, introduced a novel concept in command execution: context awareness. Within the interface, commands were tailored to specific contexts, offering different functionalities based on the active context.


To illustrate the significance of context, consider the simple act of right-clicking. When performed on the desktop area, it triggered a contextual menu offering options related to file management or customization. However, when executed on a specific file or icon, it presented a different set of options pertinent to that particular file.
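The idea can be sketched in a few lines of Python (the menu entries are hypothetical, chosen purely for illustration): the same input, a right-click, maps to different command sets depending on the context in which it is received.

```python
# Illustrative sketch of context-aware command interpretation: the same
# event ("right-click") yields different options depending on its target.
def context_menu(target: str) -> list[str]:
    """Return the menu options for a hypothetical desktop environment."""
    menus = {
        "desktop": ["New Folder", "Change Wallpaper", "Sort Icons"],
        "file": ["Open", "Rename", "Delete", "Properties"],
        "trash": ["Open", "Empty Trash"],
    }
    return menus.get(target, ["Help"])

print(context_menu("desktop"))  # menu for the empty desktop area
print(context_menu("file"))     # a different menu for a file icon
```

Before graphical interfaces, every command had exactly one meaning; here the dispatch table, not the command, carries the meaning.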


This paradigm shift brought forth new challenges. Even seasoned computer enthusiasts, proficient in operating complex machines and coding in assembler languages, encountered a learning curve when faced with the intuitive manipulation of a mouse and the concept of context-aware interactions. These individuals, commonly referred to as “returning illiterates”, had to adapt their skills to navigate the graphical user interfaces with the same fluency as their command-line counterparts.


Nevertheless, the proliferation of graphical interfaces proved transformative, democratizing access to computing power and opening doors for a broader range of users. The stage was set for further advancements in interface design, leading to more user-friendly systems and ultimately shaping the way we interact with technology today.

The fourth type of interfaces

  • Way of interaction: direct, with the core technology through keyboard, monitor, and mouse
  • Communication mode: coded, with the first mouse-aided gestures.
  • Type of command: coded directly into the technology
  • Construction method: window-based software, keyboards, and mouse
  • Command interpretation: machine-context sensitive
  • Way of providing the output: visually on monitor or paper, or as audio
  • Operator capabilities: knowledge of the operating system used, typing skills, mouse-handling skills.

Windows 3.0, announced in 1990, set the de facto standard for such interfaces.


Further advancements were envisioned with the introduction of touch screens. However, in my opinion, while innovative, this technology did not bring about radical changes in interfaces.


The origins of touch displays can be traced back to 1965 at the Royal Radar Establishment research center in Malvern, UK. Eric Arthur Johnson conceived the idea of using fingers to mark points on radar screens, believing it would enhance operators’ accuracy and reaction speed. Although Johnson patented the concept of a capacitive touch-sensitive display in 1969, it wasn’t until around 1990 that British Air Traffic Controllers began to fully utilize this technology.


Around the same time, POS devices with touch displays that responded to stylus input started appearing in restaurants. In 1996, the Palm Pilot, one of the first personal digital assistants (PDAs) with touch functionality, was announced. It too relied on stylus interaction.


However, in terms of interfaces, little changed. The stylus merely replaced the mouse, even on devices scarcely larger than a mouse. Looking back at the description of the fourth type of interface, substituting the mouse with a stylus yields no significant alteration.


This notion is reinforced by the history of FingerWorks, an innovative company founded in 1998. FingerWorks marketed a series of peripheral devices that were the first to utilize finger-activated multi-touch screens. Despite their groundbreaking approach, FingerWorks ceased production and closed in 2005.


Even with the introduction of the first smartphones, which amalgamated the features of PDAs, GPS navigators, and telephones, substantial changes were not introduced. The LG KE850 in 2006 and the iPhone in 2007 brought about minimal modifications, primarily improved visual interface fluidity and the abandonment of stylus input.


While touch screens and smartphones represented notable advancements, they did not bring about a fundamental transformation in interface paradigms.

Sensors perceive the world

In 2009, the introduction of the iPhone 3GS brought along a range of sensors such as the accelerometer, proximity sensor, compass, and GPS receiver.


The utilization of data from these integrated sensors in various applications laid the groundwork for new interfaces. One notable example is the game Ingress, which debuted in 2012. It integrated smartphone sensor data and required players to physically move around the real world to capture locations that appeared on the in-game map when the player was in close proximity.


Now, commands need to be coded, taking into account our movements in the physical world. The context for interaction extends beyond the confines of the machine and incorporates data collected from the surrounding environment. Sending a command often involves gestures, adding a second layer of interaction that people must navigate. While the freedom to move one’s fingers across a touch screen allows for more intuitive control, it also introduces the challenge of understanding how to execute these gestures effectively.
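A location-aware interaction of this kind can be sketched in Python as follows (the 40-metre radius is an illustrative value, not any game's actual rule): the device compares the player's GPS fix with a target's coordinates and only enables the interaction when the two are close enough.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS coordinates."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def can_interact(player, target, radius_m=40.0):
    """True when the player is physically close enough to the target location."""
    return haversine_m(*player, *target) <= radius_m

# Standing at the target enables the interaction; a kilometre away does not.
print(can_interact((51.5007, -0.1246), (51.5007, -0.1246)))  # → True
print(can_interact((51.5100, -0.1246), (51.5007, -0.1246)))  # → False
```

The command here is no longer a keypress: the player's movement through the physical world is itself the input.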

The fifth type of interfaces

  • Way of interaction: the technology interacts with the real world; screen gestures
  • Communication mode: coded, location-sensitive.
  • Type of command: coded directly into the technology, with human actions interpreted
  • Construction method: sensors
  • Command interpretation: environment-context sensitive
  • Way of providing the output: visually, on the device
  • Operator capabilities: basic device software knowledge.

This trend has been further strengthened by the proliferation of sensors and the utilization of camera data as input for software. High-end smartphones now come equipped with a variety of sensors including gyroscopes, accelerometers, barometers, fingerprint readers, facial recognition systems, and proximity sensors. These sensors can also collect data from other devices, such as heart rate sensors on smartwatches. Together, they provide data not only for specific applications like Maps, Compass, and sports trackers but also to enhance the contextual understanding in which commands are executed. Additionally, data from other sources such as cameras and satellite connections are utilized, particularly in emergency situations, to better define the context in which humans find themselves when issuing commands to technology.


Another important development has taken place. The traditional notion of commands, as defined in earlier interface types, is gradually fading away. While it may not disappear entirely as we currently understand it, the term “command” becomes increasingly inadequate. It is more logical to refer to these actions as interactions, interpreted by technology in response to the surrounding context.


The concept I have come to realize is that we can describe these actions as interactions whenever context plays a significant role. Therefore, I propose that, starting with window-based interfaces, we speak of interactions in addition to commands. The advantage of this shift is clear: once users become familiar with the context, the use of technology becomes more intuitive, and extensive expertise in complex technologies is no longer needed.


The technologies introduced in the market over the past 10-15 years have significantly enhanced the ability to enrich context and facilitate more natural interactions between humans and technology. Language recognition capabilities are becoming increasingly widespread, even in elementary technologies that integrate with others (such as IoT), leading to the prevalence and increased usage of natural interactions.


Thanks to the collaboration between elementary technologies and improved sensor capabilities, we can now control various devices with just our voice. For instance, we can turn on a lamp, start the air conditioning, and perform other tasks simply by speaking commands.
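Under the hood, even a basic assistant must map a free-form utterance onto a device action. A deliberately naive keyword-matching sketch in Python (the device names and actions are hypothetical; real assistants use statistical language models, not keyword rules) of that step:

```python
import re

# Naive sketch of voice-intent matching: normalise the utterance into
# words, then route it to a device action when a rule's keywords all appear.
def interpret(utterance: str) -> str:
    rules = [
        ({"turn", "on", "lamp"}, "lamp.on"),
        ({"turn", "off", "lamp"}, "lamp.off"),
        ({"start", "air", "conditioning"}, "ac.start"),
    ]
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    for keywords, action in rules:
        if keywords <= words:  # all keywords found in the utterance
            return action
    return "unknown"

print(interpret("Please turn on the lamp"))                # → lamp.on
print(interpret("Could you start the air conditioning?"))  # → ac.start
```

Note that the speaker never learns a command syntax; the burden of interpretation has moved entirely onto the technology.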


I’d like to share an example involving my mother-in-law. On her 85th birthday, I gifted her a Google Assistant Nest Hub, primarily intending to set it up as a digital photo frame displaying pictures of her loved ones. She had always resisted adopting any technological innovations beyond television. However, while I was conducting tests and configuring the device, she became intrigued and amused by my use of the phrase “Hey Google.” Months later, during a lunch conversation about the weather, she turned to the device and said, “Hey googgie, what is the wather” (an amusing, non-standardized pronunciation of “Google”). The device promptly responded by providing the local weather forecast for the next few hours. This anecdote vividly exemplifies the concept of “no high skills needed,” doesn’t it?


Nevertheless, these types of interfaces can also lead to amusing moments, as shown in the following video:


Technology and machines share the world

On June 5th of this year, Apple made an announcement that I believe is a game-changer: the Apple Vision Pro.


While devices like the VIVE XR and Meta Quest 2 were already on the market at significantly lower prices than the Apple Vision Pro, they still require handheld controllers for interaction.


The Microsoft HoloLens 2, released on November 7th, 2019, is the only truly comparable device in terms of features and comes with a similar price tag of around $3,500. However, I believe the real innovation from Apple lies in visionOS, which they call the “first spatial operating system,” and in its ability to tap into a vast ecosystem of developers. This can be inferred from the substantial search interest in all four devices over the past four years.


Both HoloLens 2 and Vision Pro introduce a highly innovative concept to interface design. They can “read” the user by tracking their eyes and hands, using this data to control the interface. The human becomes an integral part of the context, actively contributing data to help interpret the surrounding environment.


An exceptional feature of the Vision Pro that leverages the human-in-the-context concept is its ability to determine when someone nearby is speaking to us. In such instances, the device projects an authentic representation of our face, including our expressions, onto the external glass. This feature relies not only on the wearer to construct the context but also on the participation of others.


To gain a better understanding of the distinctions between the two devices, let’s examine their specifications below.

Apple Vision Pro

  • M2 processor, 8 cores at 2.42-3.48 GHz (4 performance and 4 power-efficiency)
  • R1 chip for high graphics and sensor processing
  • resolution 2880 x 1720
  • field of view (FOV) 110 degrees
  • 8 speakers
  • bone conduction technology
  • hand-tracking technology
  • voice commands
  • eye-tracking
  • All iOS and iPadOS apps (about 1.6 million)

MS HoloLens 2

  • Snapdragon 850 processor, 8 cores at 2.75-2.96 GHz (4 performance and 4 power-efficiency)
  • resolution 2048 x 1080
  • field of view (FOV) 52 degrees
  • 8 speakers
  • bone conduction technology
  • hand-tracking technology
  • voice commands
  • eye-tracking
  • 321 apps available

The sixth type of interfaces

  • Way of interaction: human-like
  • Communication mode: natural
  • Type of command: interpreted, not hard-coded
  • Construction method: sensors, recognition software
  • Command interpretation: human-context sensitive
  • Way of providing the output: natural
  • Operator capabilities: none

What is next?

When considering the evolution of technology, it is valuable to examine it from a different perspective and envision what the future may hold. The question arises: can interfaces be further improved, and if so, how? The answer to the first part is a resounding yes. To address the second, let us piece together some elements.

The Rise of Robotics

The field of robotics has witnessed significant progress in recent years, bringing us closer to a future where technology can autonomously move and interact. This advancement is already evident in various controlled environments, such as factory production lines, where robots carry out tasks with precision and efficiency. Additionally, robots are employed in surveillance tasks at airports, monitoring security and ensuring safety. An interesting example is the use of robots in Singapore during the COVID-19 pandemic to enforce mask compliance on the streets. These real-world applications foreshadow a future where robots seamlessly coexist and collaborate with humans, potentially revolutionizing industries and daily life.

The Power of Artificial Intelligence

Artificial intelligence (AI) has become incredibly prominent, and it is easy to envision its integration into every device, not just as a marketing gimmick but as a pervasive force. This integration will likely introduce another revolutionary concept, as significant as the introduction of context itself. I refer to this concept as “situation.” It entails that our interactions will be interpreted not only within the context in which they occur but also in relation to the preceding contexts that exist over time.

The Potential of Molecular Computers

While the current trend is focused on quantum computers, I believe that interfaces could undergo radical changes with the advent of molecular computers. Although this idea stems purely from my imagination without substantial supporting studies, it holds the potential to transcend a common barrier shared by all the interfaces described thus far. Traditional communication between humans and machines occurs through our senses. However, the combination of molecular computers and advancements in biomedical research may pave the way for direct human-machine interaction with the brain. I am not referring to a sci-fi future where individuals control machines solely with their thoughts, but rather to a scenario similar to the experience of rehabilitation after tendon surgery. In such situations, we regain awareness of the affected limb, directing our attention to the effort required to perform healing movements. Under normal conditions, we navigate our limbs without conscious thought. I envision a similar scenario unfolding in the realm of human-machine interaction.

One Last Thought

Lastly, I propose an interactive summary that visualizes the journey I have imagined and described here. I am keen to read comments from others, especially from those who hold different perspectives on this matter.