
For the official version of record, see here:
Lyons, O. (2025). How Heavy This Camera: Kinetic Aesthetics and the Mobile Camera. Media Theory, 9(1), 201–224. https://doi.org/10.70064/mt.v9i1.1170
How Heavy This Camera:
Kinetic Aesthetics and the Mobile Camera
OWEN LYONS
Toronto Metropolitan University, CANADA
Abstract
This paper examines the parallel development of cinematic and military camera stabilization technologies to trace a “kinetic aesthetic” that persists across both cinematic imagery and the “operational images” of military-industrial products. Using a media-archaeological approach, it connects early cinematic experiments with the mobile camera to military tracking systems, inertial navigation, and stabilized airborne weapons platforms. It demonstrates how the spectator’s judgement of the smooth motion of the camera—which is interpolated as simple harmonic motion—became a criterion for judging realism in both physical and virtual cinematography. Though they predate the computer-generated image, technologies such as the Steadicam inform expectations for the realism of the massless digital camera. This paper also examines films that foreground aerial combat, such as Top Gun (1986), Stealth (2005), and Top Gun: Maverick (2022), in order to discuss the close relationship between military and cinematic applications of image-tracking systems and the stabilized camera.
Keywords
Camera movement, Camera stabilization, Computer-generated imagery, Kinetic aesthetics, Military targeting systems, Virtual viewpoints
I am kino-eye, I am a mechanical eye. I, a machine, show you the world as only I can see it. Now and forever, I free myself from human immobility, I am in constant motion, I draw near, then away from objects, I crawl under, I climb onto them. I move apace with the muzzle of a galloping horse, I plunge full speed into a crowd, I outstrip running soldiers, I fall on my back, I ascend with an airplane, I plunge and soar together with plunging and soaring bodies
—Vertov, 1984: 17.
Writing in 1923, Dziga Vertov already envisioned an aesthetic of the moving camera closely aligned with the machinery of modern warfare. He describes a camera trajectory that races along the ground before soaring to the height of aircraft, suggesting a degree of freedom that was impossible at the time. This well-known statement, composed just after the First World War, recalls the ballistic trajectories as well as the mobile viewpoints that had been revealed during the war by the cutting edge of military technologies. Shortly after this, F. W. Murnau worked with Karl Freund to develop the “unchained camera” (entfesselte Kamera) effect, first seen in his Der letzte Mann (1924), which, among other movements, suspended the camera by wire in mid-air to transition from the mouth of a flugelhorn in a courtyard to the ear of its protagonist, listening through a window high above. In notes collected by Lotte Eisner, Murnau expressed his desire for “a camera that can move freely in space,” a camera that “at any moment can go anywhere, at any speed.” Murnau claimed that this camera would enable an “architectural” form of filmmaking:
…the fluid architecture of bodies with blood in their veins moving through mobile space; the interplay of lines rising, falling, disappearing; the encounter of surfaces, stimulation and its opposite, calm; construction and collapse; the formation and destruction of a hitherto almost unsuspected life; all this adds up to a symphony made up of the harmony of bodies and the rhythm of space; the play of pure movement, vigorous and abundant. All this we shall be able to create when the camera has at last been de-materialized (Murnau in Eisner, 1973: 84).
Less a description of a cinema of the built environment than the name might suggest, Murnau’s architectural vision recalls the early human motion studies of Eadweard Muybridge, while the revelation of a “hitherto almost unsuspected life” vividly evokes Walter Benjamin’s concept of the “optical unconscious” (Benjamin, 1972: 7) that was revealed by the emergence of mechanical vision technologies. Muybridge’s work prefigures camera stabilization and subject tracking technologies that continue to be refined today. His motion studies are carefully composed around the central goal of keeping the human subject centered and stabilized in frame, using controlled conditions in a highly determinate environment. In a similar sense, Murnau’s unchained camera enables a “play of pure movement” but, significantly, it is a moving viewpoint intended to frame “bodies […] moving through mobile space,” recalling Vertov’s notion of an aeronautical camera that would capture “plunging and soaring bodies.”
Muybridge’s experiments, with their careful attention to centering the human subject, prefigure the development of camera and support technologies that assist with the stabilization and centering of objects within the frame. Throughout the 20th century, the histories of the development of such technologies by the film industry and the military-industrial complex have been closely intertwined. Here, these histories are examined in order to reveal how these two industries—both concerned with the production of “stabilized” imagery, but to very different ends—have produced aesthetics of the moving camera that remain in close dialogue.
In his 2001 work Eye/Machine, Harun Farocki (2004) used the term “operative images” (often referred to as “operational images”) to describe images emerging from military hardware that are not intended for human contemplation but are instead captured and reincorporated into the operation of the machines that produce them. These images can also be understood as emerging from a tradition of analytical vision that uses mechanical apparatuses to enhance human sight, implying an arrangement that is then taken to be superior to the human eye. The latest expression of this trend is the current move toward removing the human element from visual machines altogether, allowing algorithms not only to “see” but also to make increasingly important decisions based on their disembodied observations, especially in the realm of warfare. This also has important implications for our understanding of the relationship between the imagery generated by large language models and the “ground truth” of the lens-based image-making practice that they aggregate and stochastically re-assemble—a point to which I will return.
Vertov was a proponent of the camera “eye” as superior to its human counterpart. As Jonathan Dawson (2003) argues:
[Vertov] clearly saw it as some kind of innocent machine that could record without bias or superfluous aesthetic considerations (as would, say, its human operator) the world as it really was. The camera lens was a machine that could be perfected bit by bit, to seize the world in its entirety and organize visual chaos into a coherent, objective set of pictures.
Like Murnau’s, Vertov’s impulse and desire point to the creation of ever more mobile and agile camera technologies—an increased mobility that implies the eventual “dematerialization” of the camera itself that Murnau describes. For the human camera operator, the history of this impulse throughout the twentieth century has been one of gradual extinction, as embodied camera techniques have been replaced by disembodied automated platforms and then eradicated with the increasing adoption of the virtual viewpoint of computer-generated cinema—a “movie camera” without a “man.” Though the camera itself may disappear into the virtual viewpoint or the prompted cinematic image of “AI,” as we shall see, in the rules that govern its movement there remains a clear trace of physical reality—a holdover of kinetic and dynamic systems. In order to examine this inscription of the physical in the virtual, a media-archaeological approach that addresses the development of the moving camera and its stabilization across the film and military industries is productive. What emerges is a discernible “kinetic aesthetic” of the moving camera that survives its disappearance into the digital realm, an aesthetic that is also used to judge the realism of the digital cinematic image and that is indebted to the parallel development of military visualization technologies.
Stabilization and the moving camera
As Paul Virilio (1989) has convincingly demonstrated, the histories of the cinematic camera and military targeting technologies are closely intertwined. The history of the mobile military camera includes that of remote-control visual platforms such as drones and precision-guided bombs. While aerial surveillance (using balloons) can be traced back at least to the Napoleonic wars, military aerial photography was developed shortly before the First World War and has been used extensively ever since. Bishop and Phillips (2010: 26) write that, since then, “the story of military technology has been one of prosthetic extension, especially that of sight, with weapons becoming gifted with sensory perception and intelligence.” These technologies have now developed past the point of mere surveillance to the contemporary attack drones that have redefined warfare in Ukraine. Guided munitions were experimented with as early as the Second World War by the German military in the form of the ‘Fritz X’ radio-guided bomb. Just before this, in 1936, the U.S. Naval Research Laboratory and the Naval Aircraft Factory, under the codename “Drone,” began to explore the development of unmanned aircraft that could be guided by radio waves. As early as 1934, Radio Corporation of America engineer Vladimir Zworykin, an early pioneer of televisual technologies, proposed to design a “Flying Torpedo with an Electronic Eye,” already envisioning a missile that utilized camera guidance (see Chandler, 2017: 92). The development of these weapons was continued by the United States in the Korean War, where various methods of remote control were tested, including ones that operated in the visual and non-visual ranges of the electromagnetic spectrum, such as radio control and infrared detection. This eventually led to munitions designated “fire and forget,” meaning that after launch they required no further human guidance or indication from secondary support units (as in the case of laser-guided bombs, which rely on ground troops using a laser designator to “paint” a target). This development marks a major shift in the history of missile weapons, since never before had the process of target designation been entrusted to a machine rather than a human mind. Devices that have emerged since, such as cruise missiles, suggest a sophistication of predictive machine vision that relies on several key technologies (some of which exceed the visual realm) to keep them in the air and to distinguish their targets from background noise.
Attaching a camera to a missile that is moving at a high rate of speed and through turbulence, however, presents a significant challenge to the creation of a usable operational image. The development of military optical guidance systems relied heavily on the invention of stabilization systems that could counteract and compensate for unwanted movement and keep munitions on target. An early example appears in the German Vergeltungswaffe 2 (“V2”) ballistic missile that appeared near the end of the Second World War and relied on a pendulous integrating gyroscopic accelerometer (PIGA) system to aid in its flight. This system had been adapted from the first inertial sensors designed by Max Schuler in 1912, which were used in airplanes and naval vessels (Haeussermann, 1981). During the war, the American scientist and inventor Charles Stark Draper developed guidance and stabilization systems for military applications and would incorporate a version of a recovered V2 rocket guidance system into later guidance systems used as part of NASA’s Apollo missions as well as the intercontinental ballistic nuclear armaments of the Cold War (Haeussermann et al., 2001). Draper was a pioneer of inertial navigation systems and invented the first “lead-computing” gunsights—devices that allowed their operators to point the sight directly at a moving target without having to estimate how far to “lead” it based on range and velocity, since the targeting system would compensate. Draper’s Mark 14 gunsight, which was gyro-stabilized and used on ship-mounted anti-aircraft guns, appeared in August 1941 and could compensate for the roll and pitch of waves while simultaneously “leading” the targeted aircraft—a significant advance in not only targeting but also visual stabilization platforms (Wildenberg, 2013).
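The principle of the lead-computing sight can be made concrete with a short sketch. The following Python fragment is an illustrative toy, not Draper’s actual mechanism (which performed the equivalent computation gyro-mechanically): the function name, parameters, and figures are all assumptions, chosen only to show how iterating on a shell’s time of flight converges on the lead angle that the gunner no longer had to estimate.

```python
import math

# Hypothetical sketch of the lead-computing principle: solve for the
# intercept point of a shell and a target crossing the line of sight,
# rather than asking the gunner to guess the lead. Drag, gravity, and
# the gyro-mechanical implementation are all omitted.

def lead_angle(target_range_m, target_speed_ms, shell_speed_ms, iterations=10):
    """Lead angle (radians) for a target crossing perpendicular to the
    line of sight, found by iterating on the shell's time of flight."""
    time_of_flight = target_range_m / shell_speed_ms  # first guess
    lead_distance = 0.0
    for _ in range(iterations):
        # Where the target will be after the shell's flight time...
        lead_distance = target_speed_ms * time_of_flight
        # ...and how long the shell takes to reach that intercept point.
        intercept_range = math.hypot(target_range_m, lead_distance)
        time_of_flight = intercept_range / shell_speed_ms
    return math.atan2(lead_distance, target_range_m)

# An aircraft crossing at 120 m/s, 2 km out, engaged with an 800 m/s shell:
print(math.degrees(lead_angle(2000, 120, 800)))  # roughly 8.6 degrees of lead
```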
During and after the Second World War, cinematic camera mobility underwent its own parallel development. Lighter cameras—originally designed for war journalism—allowed for handheld operation. Smaller and lighter, however, meant less stable and shakier; what was initially considered a technical deficiency of the handheld shot developed into an aesthetic of camera motion that signifies immediacy, urgency, disorientation and a sense of being in the action—the opposite of the extended, choreographed, stable and fluid tracking shot. Historically, shots that required steady movement were limited by the constraints of dolly track (such as the requirement of even floors) and by the length of extension afforded by camera cranes, at least until the invention of the Steadicam by Garrett Brown in 1973. While the shaky handheld aesthetic was favoured by various post-war European New Wave filmmakers, as Jean-Pierre Geuens (1993: 10) puts it, “with Steadicam, Garrett Brown did more than just rein in the rebellious foreign cinema, he did it while remaining true to the Hollywood tradition,” since his new device re-inscribed the weighty, fluid movements previously possible only with heavy studio equipment back into the handheld image.
The problem of counteracting camera shake was akin to that of stabilizing guided munitions: either the camera’s speed could be increased, or its mass—more specifically, its inertial mass—could be increased so that it became more difficult to divert from its path. The concept behind Brown’s mechanical image stabilizer is simple: it uses a balanced counterweight on an arm to create a high inertial mass (the camera rigs can weigh in excess of 30 kg) that is too heavy to be shaken by the small motions of the operator’s hands, and thus allows the camera to be carried by the operator, independent of tracks. Movement of the camera is governed by a passive, spring-based inertial damping system, and quick jerks are smoothed into slower, more graceful curves. Since the success of the initial Steadicam, Brown has brought an array of mobile camera rigs to market that use passive kinetic damping and controlled inertial mass to smooth out the motion of the camera. The FlyCam is a small remote-control camera capable of tilting, panning and zooming as well as traversing a long steel cable. Originally designed by Brown and Pat Hally for the 1996 Summer Olympics in Atlanta, the FlyCam is primarily used at sporting events and live performances, where it often creates the sweeping crowd shots that have become a visual cliché today. Similarly, the GoCam is often used in the coverage of swimming and track and field events, where repetitive back-and-forth motion is well suited to the linear range of motion to which it is constrained. Brown’s SkyCam is primarily used for recording American football games and large events. This stabilized tracking camera system is fully rotatable and is mounted on four steel cables of variable tension that span the entire width of a stadium and afford it a range of motion approaching total three-dimensional coverage of its airspace. The locations of obstructions, such as the upright goal posts, can be programmed into the memory of the SkyCam system in order to automatically avoid collision—a primitive form of automated guidance.
Even though Brown’s later inventions would eventually include active and computer-controlled stabilization, all these mobile camera technologies are descended from the original passively damped Steadicam mechanism and, as such, are designed to create a similarly stabilized aesthetic in the images they are used to produce. The term “smooth” is often used to describe the movement of these mobile camera systems, a term that does not describe an arbitrary form of motion but rather specific types of trajectories. For example, redirection of the FlyCam as it moves is governed and limited by a braking mechanism that incorporates a flywheel. This heavy wheel is attached to the spooling mechanism, increasing the effective inertia of the otherwise light camera and making it behave as a much heavier object. Accordingly, the rate of change of its movements (its acceleration or deceleration) is attenuated by this inertial damping. This physical characteristic of the system results in less erratic changes in direction that are more stable and, arguably, more immersive for the viewer. As a result, the system moves in a manner that more closely resembles the smooth trajectories of the simple harmonic motion of oscillating springs and pendulums—the more closely the movements resemble sinusoidal and mathematically simple trajectories, the smoother they are deemed to be. When commentators describe these camera motions as “graceful” or “fluid,” what they are often identifying is the close approximation of simple harmonic motion: the physics of pendulums and springs, trajectories governed by sine waves and parabolas. Conversely, “shaky” images are those whose chaotic motion resists such mathematical modeling.
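The smoothing described here can be sketched in a few lines of code. This is a toy model under stated assumptions, not Brown’s mechanism: a heavy virtual mass on a critically damped spring “follows” a jerky input position, so that abrupt jumps in the input emerge as the slower, near-sinusoidal curves the text associates with simple harmonic motion. Masses, stiffnesses, and the frame rate are illustrative.

```python
import math

# Toy second-order damper (not the Steadicam's actual linkage): a large
# inertial mass on a critically damped spring tracks a shaky target
# position, converting jerks into gradual, harmonic-style curves.

def damped_follow(targets, mass=30.0, stiffness=20.0, dt=1 / 24):
    damping = 2.0 * math.sqrt(stiffness * mass)  # critical damping
    pos, vel = targets[0], 0.0
    smoothed = []
    for target in targets:
        force = stiffness * (target - pos) - damping * vel  # spring + drag
        vel += (force / mass) * dt
        pos += vel * dt
        smoothed.append(pos)
    return smoothed

# A "shaky" input jumping abruptly every frame:
shaky = [0.0, 0.5, -0.3, 0.8, 0.1, 1.0, 0.4, 1.2] * 6
print(damped_follow(shaky)[:8])  # the output drifts gently, with no jerks
```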
Brown’s inventions effectively changed the historical trajectory of cinematic movement. Designed as a mechanically stabilized, body-mounted camera support that could mimic the gliding motion of a dolly shot while retaining the fluid mobility of handheld operation, Brown’s stabilized systems initiated a new aesthetic of non-corporeal, floating vision. Viewers were granted access to new perspectives: the camera could now follow bodies through impossible architectural geometries, climb stairs, circle actors in motion, and drift ghost-like through space. As Patrick Keating and Philippe Bédard have suggested, the Steadicam stands at “the contradiction between anthropomorphism and omnipresence” (Keating, 2015) and “appears to act like a ‘ghost’; a liminal figure that exists in both physical and ethereal realms” (Bédard, 2017: 26). Daniel Morgan writes of the Steadicam that “the very smoothness of the camera’s movement—its apparent detachment from the terrain—means that it feels divorced from the operator’s body” and that, in reference to its uncanny use in The Shining (Kubrick, 1980), it becomes “a kind of character, one that is within the world of the film yet also not quite of it” (Morgan, 2021: 183–84).
The kinetic aesthetics of the virtual moving camera
Though tethered to a human operator, the Steadicam’s visual logic transcends the body, gesturing toward a mechanized observer not bound by fatigue, mass, or inertia. But the Steadicam also introduced new aesthetic criteria for the judgement of both physical and virtual cinematography—the latter of which eliminates the physical camera entirely. Discussing the emergence of computer-generated animation and its applications to the cinematic image, Brown identified the problem of the “massless camera.” He writes that “when a lens just zips down from the stratosphere through a keyhole and onto an eyelash, it suggests that the camera has no more substance than a neutron or a quark, and the result is correspondingly trivial” (Brown, 2000). Though the rectilinear movement possible with a virtual camera was hinted at for visual effect in Tron (Lisberger, 1982), most computer-generated camera movements, and the software systems that simulate them, have adapted a “kinetic aesthetic” of camera movement to the realm of the purely virtual. Computer graphics software incorporates simulated models of the type of stable movement described above. In effect, a “camera” with no mass and no physical existence is programmed to exhibit the properties of an actual moving object and is barred from the types of “movements” it could otherwise achieve, such as turning instantly or changing speed with no period of acceleration or deceleration. Sudden motion of the virtual viewpoint would destroy the realism of the shot, since it would reveal that there is no camera to speak of. Accordingly, the spectator judges the physicality of the movement of the virtual viewpoint based on a combination of real-world perception of movement and a memory of cinematic tropes. Thus, an awareness of physical techniques of cinematic motion is used to judge a completely artificial representation of motion in terms of its realism and entertainment value. Conversely, “camera shake” is frequently added back into the sinusoidal trajectories of the virtual camera to increase kinetic realism, or to re-embody the camera and create the illusion of its physicality. This is an example of what Shane Denson (2020) has called a “discorrelated image”: one that reincorporates a simulacrum of analogue cinema back into the digital image.
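A rough sketch of how such constraints might be imposed on a massless viewpoint is given below. The function, the acceleration limit, and the jitter term are hypothetical rather than drawn from any actual animation package; the point is only that clamping per-frame changes in velocity simulates inertia, while pseudo-random noise can be layered back on to “re-embody” the camera.

```python
import random

# Hypothetical virtual-camera mover: the "camera" is massless, but its
# per-frame change in velocity is clamped as if it had inertia, and an
# optional jitter term re-adds handheld-style "shake".

MAX_ACCEL = 0.05  # illustrative per-frame velocity change limit

def move_virtual_camera(waypoints, frames_between=24, shake=0.0):
    pos, vel = waypoints[0], 0.0
    path = []
    for target in waypoints[1:]:
        for _ in range(frames_between):
            desired_vel = (target - pos) / frames_between
            # No instant starts, stops, or turns: clamp the acceleration.
            accel = max(-MAX_ACCEL, min(MAX_ACCEL, desired_vel - vel))
            vel += accel
            pos += vel
            path.append(pos + random.uniform(-shake, shake))
    return path

eased = move_virtual_camera([0.0, 10.0, 4.0])                 # "inertial" glide
handheld = move_virtual_camera([0.0, 10.0, 4.0], shake=0.08)  # re-embodied
```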
While I have outlined a relatively linear and teleological progression from the physical Steadicam to the stabilized and smooth virtual viewpoint, as mentioned, a media-archaeological approach that recognizes the interconnection and feedback between these two apparently divergent tracks is potentially more accurate. If we look more closely at examples of the emergence of the stabilized image (both physical and algorithmic, as well as the pre-digital video image), we discover a hybrid aesthetic and discursive space in which techniques in the physical realm inform those in the virtual, and vice versa. It is illuminating in this regard to consider the first stabilized video camera images. The first commercially available video-based optical image stabilization system small enough to fit inside a camcorder was introduced by Panasonic in 1988 (Oshima et al., 2023). This system used a miniaturized version of the tuning-fork gyro—first patented in 1942 by the Irish engineer (and double agent) Frederick William Meredith for the British war effort in the Second World War—to compensate for small movements of the camera by moving its internal video sensor (Collinson, 2011: 260). Developments in both video and software-based image stabilization incorporate and mimic the passive and kinetically damped aesthetic of their physical precursors. Visual tracking and image stabilization algorithms simplify complex movements, reducing them to approximations built from sinusoids, Bézier curves, or truncated Fourier series—the Fourier transform itself rests on the idea that any function can be expressed as a sum of sine waves (Martinez-de Dios & Ollero, 2004). In other words, stabilization algorithms take complex motion and interpolate it into “best fit” models based on combinations of simple curvatures, such as parabolas or sine waves, that approximate actual movement to within an acceptable degree of error. Systems such as these create real-time models of real-world inputs and attempt to identify patterns in the simplified data in order to anticipate the trajectory of an object or the shake of an image sensor, and then predictively compensate for it.
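The “best fit” logic can be illustrated with a minimal Fourier example: decompose a shaky one-dimensional camera trajectory into sine waves, keep only the few low-frequency terms, and rebuild the path from them. Real stabilizers (such as the Fourier-Mellin system cited above) are considerably more involved; the signal and the cutoff here are illustrative assumptions.

```python
import numpy as np

# Toy Fourier smoothing: the shake lives in the high-frequency bins, the
# operator's intended pan in the low ones, so zeroing everything above a
# cutoff "interpolates" the motion into a sum of a few sine waves.

def lowpass_stabilize(trajectory, keep_terms=4):
    spectrum = np.fft.rfft(trajectory)
    spectrum[keep_terms:] = 0  # discard everything above the cutoff
    return np.fft.irfft(spectrum, n=len(trajectory))

t = np.linspace(0, 2 * np.pi, 240)               # 10 seconds at 24 fps
intended = 3.0 * np.sin(t)                       # the operator's slow pan
shaky = intended + 0.3 * np.random.randn(240)    # plus handheld jitter
stabilized = lowpass_stabilize(shaky)
print(np.abs(stabilized - intended).max())       # small residual error
```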
While they mimic their physically more massive precursors, stabilized but lighter cameras move on trajectories that are mathematically simpler to represent, and thus the images they produce are, in a sense, more akin to those generated by computer systems. What emerged historically, in effect, was a hybrid image aesthetic born of the simultaneous emergence of camera viewpoints that were previously physically impossible and viewpoints that were never physical in the first place. For example, in an early piece of press coverage from the time of the SkyCam’s release, Chris Zelkovich (2002), writing for The Toronto Star, noted the uncanny resemblance between SkyCam footage and the overhead virtual viewpoint of the sports videogame:
Typical of what SkyCam adds to a game was a play Saturday night that brought viewers so close to Philadelphia quarterback A. J. Feeley they could almost feel him getting hit. Yes, the SkyCam produces angles that look suspiciously like a video game—no doubt an attempt to lure younger viewers—but used in moderation it’s a winner.
Even as it introduced the aesthetic of the free and mobile but dampened camera, the SkyCam also responded to the emergence of virtual and camera-free representations of the sports spectacle. Zelkovich’s approval of the SkyCam system stems from its demonstrated ability to more closely frame and track the quarterback in motion. At the same time, he also approves of the new camera technology’s approximation of the impossible viewpoints of the virtual image. In his estimation, the recorded image’s resemblance to the deterministic logic of the videogame image heightens the aesthetic effect of the spectacle since it more accurately renders the human figures of the players within the controlled, rules-based logic of the game.
Anticipatory movement and zones of determinism
Predictability and controlled, deterministic environments are generally the best settings for the use of the highly mobile camera systems described above. This is why many of the applications for Garrett Brown’s mobile camera inventions are in the realm of sporting event coverage. The rules and clear boundaries of sports create “black and white” conceptual divisions as well as divisions between things (figurative as well as literal). However, tracking the movement of players in a sports environment is a question of both kinetics and the anticipation of motion. As such, suspense and aesthetic pleasure in sport broadcasting are heightened through the cinematographer’s management of the fine balance between tracking the predictable movements of players and their potential to elude or exceed the limits of the frame. In a soccer match, for example, as a player in possession of the ball nears the opposing goal, convention dictates that the camera stay framed on the player with the ball up until the point where it is anticipated that they will attempt a shot. Just before the moment of the kick, the camera operator will zoom out and whip pan, simultaneously widening the frame and “following” the ball. Rather than react to the shot, which would miss the action, the camera operator usually pre-empts and anticipates the shot taking place. The predictive nature of this camera motion becomes most apparent when a player deviates slightly from their assumed course of action and only fakes a shot, causing the camera to momentarily veer off course before reframing the action. Suspense, for the television viewer, is thus heightened by the operator of the apparatus that tracks the play. Tracking motion becomes, in and of itself, an object of aesthetic contemplation.
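This anticipatory behaviour, including the tell-tale overshoot on a faked shot, can be sketched with an alpha-beta filter, a classical predictor of the kind long used in tracking systems. The gains and the toy data below are illustrative, not drawn from any broadcast rig: the “camera” aims at the predicted position, so a sudden cut-back leaves it briefly veering past the player before it corrects.

```python
# Alpha-beta tracking filter (illustrative gains): estimate position and
# velocity from noisy observations and aim the frame where the target is
# predicted to be next.

def track(observations, alpha=0.5, beta=0.2, dt=1.0):
    est_pos, est_vel = observations[0], 0.0
    aims = []
    for measured in observations[1:]:
        predicted = est_pos + est_vel * dt           # where the frame aims
        residual = measured - predicted              # how wrong we were
        est_pos = predicted + alpha * residual       # correct position
        est_vel = est_vel + (beta / dt) * residual   # correct velocity
        aims.append(predicted)
    return aims

# A player runs right at a steady pace, then abruptly cuts back (the fake):
player = [0, 1, 2, 3, 4, 5, 4, 3, 2]
print(track(player))  # the aim sails past 5 before swinging back to reframe
```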
This situation has also been exploited to create suspense in the cinema. Consider, for example, Top Gun (Scott, 1986), a film lauded for the physicality and realism of its special effects sequences, which incorporated extensive use of U.S. military hardware and pilots. In one of the film’s later aerial dogfight scenes, we are given a point-of-view shot from the pilot of a U.S. Navy F-14 fighter plane in pursuit of a faceless and nameless target flying a late Soviet-era “MiG-28.” The pursued aircraft is shown closely trailed by a green crosshair that signifies the visual locking mechanism of the plane’s weapon system, and suspense is generated through the pilot’s efforts to “lock on” to the enemy aircraft. In other words, the tension and release of the narrative progression of the scene pivots on a pilot using a vision machine to anticipate the movement of the target well enough that he may then “fire and forget” a missile with a presumed guarantee of interception. Earlier scenes in Top Gun depicting the Navy pilots conducting training flights and mock engagements—in which no missiles are actually fired—prepare the audience for this use of the tracking “lock on” shot as a device of suspense. In these scenes, although there may be a possibility of evasion after the launch of an air-to-air missile, this is irrelevant for the plot—the pilots in the exercise are “killed” at the instant the machine vision apparatus achieves a “lock.” The physical missile is less important than the visual apparatus that controls it. While these scenes demonstrate what has now become a common trope of action cinema, they also clearly exemplify how the kinetic aesthetic of the mobile tracking camera has been incorporated into the cinema. The film’s use of the F-14 is also notable in that this was one of the last fighter attack planes in the U.S. arsenal to utilize a manual control system. The F-16 that superseded it, by comparison, was designed with an unstable airframe that made it more maneuverable but required cybernetic control (Tomayko, 2000: 38). This new “fly-by-wire” system incorporated pilot input but otherwise automatically stabilized the airplane using an analog computer system that relied on accelerometers and gyroscopes.
The centrality of the “lock on” shot in Top Gun also recalls the specific history of the development of military stabilization and target-acquisition systems discussed above. Top Gun relies on the creation of tension through its use of the kinetic aesthetic and anticipatory tracking and, in so doing, reveals the close connection between camera stabilization technologies and military weapons and visualization platforms. The film both looks back to the television-guided cameras proposed before World War II and forward to complex computer-guided and pilotless drones and weapons systems. Its high-flying cinematography is made physical and material by its emphasis on the humanity of its pilots, grounding its fantasy of mobility by reminding its audience of their vulnerability through the freak accident that leads to the death of the copilot and navigator, “Goose.” Created at a moment just before the adoption of fly-by-wire systems and drones would radically change the nature of war, Top Gun reads now as an anxious text that foregrounds the sweat and flesh of its human characters even as this era of military aviation was largely ending—the missions soon to be replaced by remote pilots in air-conditioned command centers operating attack drones half a world away.
It is illuminating to compare the anxieties of the original film to its recent blockbuster sequel, Top Gun: Maverick (Kosinski, 2022). Much of the marketing of this later film, and Tom Cruise’s statements about it, promoted the idea that “everything you see is real” and that “no CGI was used in the film” (The Movie Rabbit Hole, 2024). The film was successfully marketed as a return to a more muscular and physical version of airborne warfare in the age of the drone. While it is true that there are many shots of the actors flying in real cockpits, and the film did in fact employ U.S. Navy pilots to fly real planes, most of these scenes were heavily edited using visual effects (the film contains over 2,400 visual effects shots). Often, pilots flew real “reference” planes that were replaced in post-production by computer-generated F/A-18s. At other times in the film, footage of a real plane (or a reference that had been replaced) would be combined with up to three other computer-generated ones. Top Gun: Maverick’s success, however, lay in its appeal to the kinetic aesthetic of the original—an appeal that also pushed back against the perception of the loss of the body in post-cinematic production.
In weapons guidance systems and modern drones, digital informational models of terrain and surroundings have increasingly been adopted, privileging complex computer models over visual data. Cruise missile guidance systems, for example, do not rely solely on live optical data but incorporate complex terrain models for navigation. These pre-modelled maps are combined with camera information to guide missiles at speeds where real-time visual feedback would be too slow, since a model of the terrain allows a missile to alter course before visually detecting an obstacle. The aim is to afford greater speed and maneuverability through the imposition of a deterministic model onto a chaotic world. This is an attempt to bring warfare out of the real world and into the realm of a binary rule system more akin to a tennis match, in which boundaries and targets are determined by automated visualization systems that have more authority than the human referees making calls—the infamous on-court outbursts of players like John McEnroe have now been silenced by the cold mechanical eye of the “electronic line judge.” This overriding of human agency on the sports field has a chilling parallel in modern conflicts, where automated “AI” target-selection tools—such as the “Lavender” system that has been deployed by Israel in Gaza—are used to “generate” more targets, and more quickly, than their human counterparts are capable of (Iraqi, 2024). There is an epistemological smoothing at work here, in the ruthless sorting and selection within the database, that echoes the statistical method of finding the line of “best fit.”
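The advantage described here, altering course before an obstacle is ever “seen,” reduces to a question of lookahead, which the following toy comparison illustrates. It is an assumption-laden sketch, not a model of any real system (TERCOM-style guidance, for instance, matches radar altimeter profiles against stored contour maps rather than planning over a simple array): a stored heightmap lets the vehicle begin its climb several cells before a short-range sensor would even register the ridge.

```python
# Toy comparison of reactive sensing vs. a preloaded terrain model.
terrain = [0, 0, 0, 0, 5, 9, 9, 5, 0, 0]  # stored heightmap along the route
SENSOR_RANGE = 1       # reactive: "sees" only one cell ahead
MODEL_LOOKAHEAD = 3    # model-based: plans three cells ahead
CLEARANCE = 2

def required_altitude(x, lookahead):
    ahead = terrain[x : x + lookahead + 1]
    return max(ahead) + CLEARANCE

for x in range(len(terrain)):
    print(x,
          required_altitude(x, SENSOR_RANGE),     # climbs at the last moment
          required_altitude(x, MODEL_LOOKAHEAD))  # begins the climb earlier
```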
It should come as little surprise that camera and weapons systems that utilize similar stabilization technologies are ontologically related in their tendency to eliminate the human form and to favor the presentation of the world as geometric space—a form of the “architectural film” that Murnau suggested. Jean-Pierre Geuens (1993: 16) argues that “by disconnecting the camera from the operator, by making it float above the world,” the Steadicam “was at last able to fully activate Euclid’s non-human, ideal geometric space,” and further:
The repercussion was immediate: although on the surface the world was still the same, its visual apprehension could no longer reflect the production or experience of a human agent. Leaving behind its empirical grounding, the perceptual field now generates what appears as pure, omnipresent, and objective visuality. In other words, vision seems to emanate from a notional point hypothetically constructed outside of space and time—a situation that effectively detaches what is known from any knower.
The visual logic of this fantasy of control has perhaps been most vividly captured in the grainy, black-and-white “bomb’s-eye” videos released during the U.S. invasion of Iraq in 2003 (a visual trope that has largely fallen out of fashion in news coverage of recent imperialist conflicts). These hyper-stabilized images, often captured from kilometers away, show buildings and vehicles centered and in high contrast yet conspicuously free of visible human figures. At the same time, the high rate of speed of the cameras that capture them is encoded in images that slowly rotate around or fly over targets—producing eerie canted and inverted angles. These are images engineered for machine readability and optimized for automatic tracking and targeting. The clarity of these images—targets crisply framed and “locked on” within algorithmically smoothed motion—presents a visual field that seems devoid of ambiguity. Yet this clarity is only achieved by suppressing complexity and smoothing the great speeds and turbulent motion of the airborne weapons platforms from which they are captured.
The extreme lock-on stabilization effect that appears in the operational images from guided munitions can now easily be replicated for cinematic production using commercially available digital cameras and video editing software suites. As YouTube tutorials demonstrate (The Car Video Guy, 2025), stabilized footage captured with a gimbal, combined with high-saturation imagery, can be exploited to create the overly smooth, hyper-real look of product videography—imagery that appears to be computer-generated but is lens-based and captured from reality. Often these images incorporate sudden accelerations and decelerations of the camera along its path, diverging from the conventions of “smooth” motion and thereby further suggesting that they were created through computer-generated virtual perspectives: by breaking the rules of the kinetic aesthetic, they project an illusion of the smoothed space of CG back onto the lens-based image.
Conclusion: Kinetic realism and the computer-generated image
Familiarity with images of warfare—whether fictional, from videogames, or mediated through the news media and “bomb’s-eye” cameras—is another factor that contributes to the criteria by which the kinetic aesthetic described here is identified. These images train audiences, giving them an expertise that they can then apply to judgments of the apparent realism of computer-generated sequences in films that depict war and high-speed combat, even when the “feasible” trajectories and stresses being represented are in fact complete simulacra. While there are now countless CG-rich films that demonstrate the “crazy cameras” of post-cinematic filmmaking, or that meticulously re-model and simulate the kinetics of real cameras, in keeping with the examples already discussed I would like to close with another American action film that features air-to-air combat—this time with an added element of artificial intelligence automation that also points toward the current moment of prompt-based generative imagery.
Rob Cohen’s Stealth (2005) is a summer blockbuster action “theme ride” film set in a near-future world of advanced military hardware, featuring an AI-piloted fighter jet named EDI (“Extreme Deep Invader,” pronounced “Eddie”) as its villain. The film was produced in close collaboration with the United States military, using its aircraft carriers and other hardware, and the U.S. Navy was granted script approval over the final product (Cohen, 2007). Firmly entrenched in the rich tradition of films depicting sinister artificial intelligences that threaten to replace their human counterparts, the film is doubly interesting as a cultural object in that it predates the current moment of anxiety around artificial intelligence and is an early example of the CG- and green-screen-heavy cinematography that would soon come to dominate Hollywood action films. The plot revolves around an AI-controlled prototype aircraft that threatens the jobs of three elite U.S. Navy pilots who have been tasked with working with and training it. In the film, EDI malfunctions and threatens world peace until it eventually learns how to be a team player—first as a pure act of self-preservation when it requires the assistance of its human counterpart, and later when it destroys itself in an act of self-sacrifice to save the human leads of the film from certain death. The final act of the film recalls the buddy or wingman narrative of Top Gun, except here the pairing is between human and machine, as Lt. Ben Gannon (Josh Lucas) climbs into the cockpit of the “UCAV” (Unmanned Combat Aerial Vehicle) to fly together with it on a desperate mission to save fellow pilot, and love interest, Lt. Kara Wade (Jessica Biel) from behind enemy lines in North Korea.
Much of the structure of Stealth’s dogfight scenes resembles that of films that precede it, such as Top Gun, with the major difference that Stealth relies heavily on computer graphics and virtual camera work. The action sequences in Stealth nevertheless exhibit all the concern with kinetic realism that audiences have come to expect from such films. The skills of the elite pilots and the AI airplane are advanced, and they execute several fantastic maneuvers—but none so exaggerated that they cross the threshold of belief that is so important to the illusion of these sequences. For example, when the AI plane suddenly and aerobatically spins around, surprising and destroying a pursuing jet fighter, the rendering of its movement by the virtual camera is akin to the stabilized image of the soccer player who fakes the shot—almost but not quite escaping the tracking gaze of the camera. The movement remains within the parameters of its system but is entertaining or exciting precisely because it pushes against them and reveals the boundaries of that system. By mimicking “real” physics, even in entirely determined and simulated film sequences, the sensation of speed is not trivialized, since the audience retains an imaginary sensation of the danger of impending collision. Here we see again tracking and targeting as devices of narrative tension, but also as reminders of the legacy of the military technologies that influenced their creation.
The virtual cinematography of Stealth is a crude but direct expression of Vertov’s desire for a camera that “outstrip[s] running soldiers,” “ascend[s] with an airplane,” and is “free” from “human immobility.” The film also demonstrates a version of the dematerialized camera originally described by Murnau, here fully realized in post-cinema. While its cinematographic techniques demonstrate the elimination of the human behind the camera, the film’s images depict the elimination of the human body itself and, specifically, the body of the human pilot. In the film, elite airman Lt. Henry Purcell (Jamie Foxx) meets his end while pursuing EDI after the plane goes rogue. Significantly, his death scene uses a practical effect and gasoline explosion and is rendered in a slow-motion shot of a one-eighth scale miniature fighter jet colliding at full speed with a mountainside. Purcell’s death does not leave a corpse; it is achieved by a complete annihilation of the human form. Perhaps, as Walter Benjamin (2007: 242) would have it, Stealth is merely an inevitable expression of a humanity whose “self-alienation has reached such a degree that it can experience its own destruction as an aesthetic pleasure of the first order.” While this is certainly plausible, I would add that the film can be read more specifically as an expression of the alignment of military and cinematic guidance and tracking technologies that has been outlined here.
Stealth is a direct product of the United States military-entertainment complex, and several key elements of its production are notable. The film’s cinematography combined exterior computer-generated shots of the fictional “Talon” fighter jets—aircraft designed in coordination with engineers from the weapons manufacturer Northrop Grumman—with close-up shots of actors inside large model cockpits (Bielik, 2005). To give the actors a “physical experience of being a combat aviator […] that could not be acted,” Cohen had a “hundred tonne, two-million dollar machine” built that was known on set as “the gimbal” (Cohen, 2005). This full-size prop cockpit was mounted on a pneumatic system that allowed for three-axis movement (Bielik, 2005). The system was “flown by combat pilots by remote control so that the actors were actually getting the motion that the combat pilots knew would happen under certain conditions” (Cohen, 2005). The physicality of shooting scenes that used the gimbal was so intense that it reportedly led to nausea for Foxx and a concussion for Lucas in a crash-landing scene (Cohen, 2005).
While the problem of realism in the physical performance of the actors was solved through this elaborate kinetic arrangement, the film’s requirement for the portrayal of vast foreground and background landscapes, owing to the large distances covered by the high-speed fictional fighter jets, posed a technical challenge for the digital effects team. To solve this issue, Stealth became the first film to employ the prototype Terragen software (now an industry standard), which used datasets of existing real-world terrain to create “the basic geometry of the landscape” (Bielik, 2005) for the modelled terrain over which the CG jets traversed. Terragen was used in over eighty percent of the aerial shots in the film (Bielik, 2005), as it proved more convenient than using actual aerial footage for backdrops and was more adaptable to the rapid virtual camera movements featured throughout. In using Terragen, Stealth’s cinematography reminds us of the shift in military guidance systems from those of the mid-twentieth century, which relied on visual imagery, to more recent weapons, like the cruise missile, that instead use virtual terrain models to move over real terrain at high speeds. Cohen himself stated that the film “puts a spotlight” on the U.S. military’s statements that it would be building its last human-piloted fighters in the coming decades. For, as Bishop and Phillips put it (2010: 30): “Only computers and other machines will be able to read the non-image of future warfare.” Stealth thus stands as a document of a turning point in both cinematographic and military vision, stabilization and tracking systems, produced during a shift away from anthropocentric optical systems and towards computer vision and algorithmic control.
Postscript: Kinetic aesthetics and the stochastic image
As a final thought, I would like to turn to the ways in which the kinetic aesthetic outlined here may have implications for our evaluation of images created using the generative techniques of large language models. Through the central anxiety of its plot—AI replacement—Stealth anticipates by several decades not only the replacement of human pilots by computers but also today’s fears of the replacement of cinematic images by AI-generated ones, through the entirely secondary role that the human elements of the film (the actors) take to the action and the computer-generated effects. Anxieties abound today concerning the replacement of human filmworkers by automated systems and virtual production environments—so much so that major Hollywood film studios regularly go to great lengths to disavow the use of computer-generated visual effects through marketing campaigns insisting that today’s visual-effects-heavy films were created “in camera” or with “practical effects.” In order to, in effect, cover their tracks, CG-heavy productions frequently use the kinetic aesthetic to “ground” the camera and create the illusion of the physics-based reality of the film. At the same time, the recent “AI” images and video created by tools that aggregate databases of content extracted from creators on the open web—such as OpenAI’s Sora, Luma Labs, Runway and others—are the most recent iteration of image creation technologies that have called into question our fundamental relationship with the “ground truth” of the lens-based image. These are what I have called the “stochastic images” of large language models, which first emerged in 2015 in engineer Alexander Mordvintsev’s work on Google’s DeepDream. They are statistically generated images derived from underlying datasets—“photo-surrealistic moving imagery […] based on an internal semantic logic” of a new “language system for the cinematic image” that recalls the earlier efforts of Christian Metz or Sergei Eisenstein to formulate their own such language systems (Lyons, 2023: 444). These images have today, in many cases, become indistinguishable from images of reality, while we, as viewers, continue to engage in a relentless arms race of image sleuthing to identify tell-tale signs of “AI” imagery—such as extra fingers or inconsistent lighting reflections in eyes—that are patched and removed from subsequent versions. In the recent explosion of prompt-generated videos, one tell-tale sign remains that has proven to be a difficult problem for the large language model video generation paradigm—the recreation of classical mechanics, dynamics and kinetic relationships between objects in the frame and the movement of the frame itself. Researchers who have tested these stochastic images to see whether the underlying language-based model can predict or learn physical relationships governed by classical mechanics have reported poor results, finding that “models fail to abstract general physical rules and instead exhibit ‘case-based’ generalization behavior, i.e., mimicking the closest training example” rather than demonstrating any “understanding” of physical reality (Kang et al., 2024: 1).[1] OpenAI has claimed that “scaling” its “video generation models is a promising path towards building general purpose simulators of the physical world” and that “Sora can generate videos with dynamic camera motion,” but it also admits that the model “does not accurately model the physics of many basic interactions” (OpenAI, 2024).
Discerning these generated images from those captured with a physical camera rests on an assessment not only of their basic fidelity to and correlation with the physical world, but also of how they conform to the accepted dynamics of the moving camera that we have learned as spectators over the last century of cinematic production. Thus, at least for now, this latest iteration of simulated imagery is once again being judged according to its ability to replicate the kinetic aesthetic of the media that have preceded it.
References
Bédard, P. (2017) ‘The Protean Camera’, Synoptique, 4(2): 16–36.
Benjamin, W. (1972) ‘A Short History of Photography’, Screen 13(1): 5–26. Available at: https://doi.org/10.1093/screen/13.1.5.
Benjamin, W. (2007) Illuminations. New York: Schocken Books.
Bielik, A. (2005) ‘“Stealth”: Keeping Speed with Jet-Fast F/X’, Animation World Network. Available at: https://www.awn.com/vfxworld/stealth-keeping-speed-jet-fast-fx (Accessed: 25 June 2025).
Bird, K. (2017) ‘“Dancing, Flying Camera Jockeys”: Invisible Labor, Craft Discourse, and Embodied Steadicam and Panaglide Technique from 1972 to 1985’, The Velvet Light Trap, 80(80): 48–65. Available at: https://doi.org/10.7560/VLT8005.
Bishop, R. and J. Phillips (2010) Modernist Avant-Garde Aesthetics and Contemporary Military Technology: Technicities of Perception. Edinburgh: Edinburgh University Press.
Brown, G. (2000) ‘The Moving Camera’, garrettcam.com. Available at: https://www.garrettcam.com/the-moving-camera-part-1 (Accessed: 6 June 2025).
Chandler, K. (2017) ‘American Kamikaze: Television-Guided Assault Drones in World War II’ in L. Parks and C. Kaplan (eds.) Life in the Age of Drone Warfare. Durham: Duke University Press.
Cohen, R. (2005) ‘Stealth’. Interviewed by Bobbie Wygant. Available at: https://www.youtube.com/watch?v=ckybYQZxjrw.
Collinson, R.P.G. (2011) Introduction to Avionics Systems. Dordrecht: Springer Netherlands. Available at: https://doi.org/10.1007/978-94-007-0708-5.
Dawson, J. (2003) ‘Vertov, Dziga’, Senses of Cinema. Available at: https://www.sensesofcinema.com/2003/great-directors/vertov/ (Accessed: 5 June 2025).
Denson, S. (2020) Discorrelated Images. Durham: Duke University Press.
Eisner, L. (1973) The Haunted Screen: Expressionism in the German Cinema and the Influence of Max Reinhardt. London: Secker & Warburg.
Elsaesser, T. (2016) Film History as Media Archaeology: Tracking Digital Cinema. Amsterdam: Amsterdam University Press.
Farocki, H. (2004) ‘Phantom Images’, Public, 29 (New Localities).
Garrido, Q., N. Ballas, M. Assran, A. Bardes, L. Najman, M. Rabbat, E. Dupoux and Y. LeCun (2025) ‘Intuitive physics understanding emerges from self-supervised pretraining on natural videos’, FAIR at Meta, University Gustave Eiffel, EHESS, arXiv. Available at: https://doi.org/10.48550/arXiv.2502.11831.
Geuens, J.P. (1993) ‘Visuality and Power: The Work of the Steadicam’, Film Quarterly, 47(2): 8–17. Available at: https://doi.org/10.2307/1213198.
Haeussermann, W. (1981) ‘Developments in the field of automatic guidance and control of rockets’, Journal of Guidance, Control, and Dynamics, 4(3). Available at: https://arc.aiaa.org/doi/abs/10.2514/3.19735?journalCode=jgc (Accessed: 10 June 2025).
Haeussermann, W., F. Mueller and R. Hopkins (2001) ‘The pendulous integrating gyroscope accelerometer (PIGA) from the V-2 to trident D5, the strategic instrument of choice’, in AIAA Guidance, Navigation, and Control Conference and Exhibit, Montreal, Canada: American Institute of Aeronautics and Astronautics. Available at: https://doi.org/10.2514/6.2001-4288.
Iraqi, A. (2024) ‘“Lavender”: The AI machine directing Israel’s bombing spree in Gaza’, +972 Magazine, 3 April. Available at: https://www.972mag.com/lavender-ai-israeli-army-gaza/ (Accessed: 10 June 2025).
Kang, B., Y. Yue, R. Lu, Z. Lin, Y. Zhao, K. Wang, G. Huang and J. Feng (2024) ‘How Far is Video Generation from World Model: A Physical Law Perspective’. arXiv. Available at: https://doi.org/10.48550/arXiv.2411.02385.
Keating, P. (2015) ‘A Homeless Ghost: The Moving Camera and its Analogies’, [in]Transition, 2(4). Available at: https://doi.org/10.16995/intransition.11364.
Long, C. (2023) ‘Between Handheld Camera and Steadicam: The Body-Mounted Cinematography of Seconds (1966) and Its Legacies’, Film History 35(2): 26–51. Available at: https://doi.org/10.2979/fih.00002.
Lyons, O. (2023) ‘Towards a Theory of Machine Learning and the Cinematic Image’, Proceedings of the twenty-seventh International Symposium on Electronic Art: Possibles, pp. 439–446. Available at: https://isea-archives.org/docs/2022/proceedings/ISEA2022-BCN-Proceedings_.pdf.
Martinez-de Dios, J.R. and A. Ollero (2004) ‘A Real-Time Image Stabilization System Based on Fourier-Mellin Transform’, in A. Campilho and M. Kamel (eds.) Image Analysis and Recognition. Berlin, Heidelberg: Springer Berlin Heidelberg (Lecture Notes in Computer Science), pp. 376–383. Available at: https://doi.org/10.1007/978-3-540-30125-7_47.
Morgan, D. (2021) The Lure of the Image: Epistemic Fantasies of the Moving Camera. California: University of California Press.
Motamed, S., L. Culp, K. Swersky, P. Jaini and R. Geirhos (2025) ‘Do generative video models understand physical principles?’ [Preprint] INSAIT, Sofia University, Google DeepMind, arXiv. Available at: https://doi.org/10.48550/arXiv.2501.09038.
OpenAI (2024) openai.com. Available at: https://openai.com/index/video-generation-models-as-world-simulators/ (Accessed on 10 June 2025).
Oshima, M., T. Hayashi, S. Fujioka, T. Inaji, H. Mitani, J. Kajino, K. Ikeda and K. Komoda (1989) ‘VHS camcorder with electronic image stabilizer’, IEEE Transactions on Consumer Electronics, 35(4): 749–758. Available at: https://doi.org/10.1109/30.106892.
Oshima, M., T. Hayashi, S. Matsui, M. Fukui and I. Shirakawa (2023) ‘History of World’s First Commercialization of Image Stabilizers for Handheld Cameras’, in 2023 8th IEEE History of Electrotechnology Conference (HISTELCON), Florence, Italy: IEEE: 56–58. Available at: https://doi.org/10.1109/HISTELCON56357.2023.10365854.
Pierson, R. (2015) ‘Whole-Screen Metamorphosis and the Imagined Camera (Notes on Perspectival Movement in Animation)’, Animation: An Interdisciplinary Journal, 10(1): 6–21. Available at: https://doi.org/10.1177/1746847715570812.
Stealth (2005) Directed by Rob Cohen. Available at: Amazon Prime Video (Accessed: 28 June 2025).
The Car Video Guy (2025) ‘My SECRETS to Buttery SMOOTH Gimbal Shots’. Available at https://www.youtube.com/watch?v=Z2D8GNKE9MU&t=309s (Accessed: 10 June 2025).
The Movie Rabbit Hole (2024) ‘“NO CGI” is really just INVISIBLE CGI (1/5)’. Available at https://www.youtube.com/watch?v=7ttG90raCNo&t=498s (Accessed: 10 June 2025).
The Shining (1980) Directed by Stanley Kubrick. Available at: Netflix (Accessed: 28 June 2025).
Tomayko, J.E. (2000) Computers Take Flight: A History of NASA’s Pioneering Digital Fly-by-Wire Project. Washington, D.C.: NASA (The NASA History Series). Available at: https://ntrs.nasa.gov/citations/20050157919 (Accessed: 28 June 2025).
Top Gun (1986) Directed by Tony Scott. Available at: Amazon Prime Video (Accessed: 28 June 2025).
Top Gun: Maverick (2022) Directed by Joseph Kosinski. Available at: Amazon Prime Video (Accessed: 28 June 2025).
Tron (1982) Directed by Steven Lisberger. Available at: Disney+ (Accessed: 28 June 2025).
Vertov, D. (1984) Kino-Eye: The Writings of Dziga Vertov. Translated by K. O’Brien. Berkeley, California: University of California Press.
Virilio, P. (1989) War and Cinema. London: Verso.
Wildenberg, T. (2013) ‘The Shoebox that Transformed Antiaircraft Fire Control’, Naval History Magazine, 27(6). Available at: https://www.usni.org/magazines/naval-history-magazine/2013/november/shoebox-transformed-antiaircraft-fire-control (Accessed: 10 June 2025).
Zelkovich, C. (2002) ‘Nobody Does Football Better than ESPN’, The Toronto Star, 23 December 2002.
Notes
[1] The question of whether LLM-based video generation systems can internally “understand” physical world models and kinetic, fluid, or thermodynamic systems is one of active debate amongst researchers. See: Motamed et al., 2025; Garrido et al., 2025.
Owen Lyons is an Assistant Professor in the School of Image Arts at Toronto Metropolitan University. His recent monograph, Finance and the World Economy in Weimar Cinema, published in 2023 by Amsterdam University Press, addresses depictions of finance, speculation, and capital that appear in the films and visual culture of the Weimar Republic and their intersection with gender, modernity and nation. His research interests include Weimar cinema, media archaeology, automation and the cinematic image, and the visual culture of financial markets.
Email: owen.lyons@torontomu.ca
Conflicts of interest
None declared
Funding
None declared
Article history
Article submitted: 26/5/2025
Date of original decision: 26/5/2025
Revised article submitted: 15/6/2025
Article accepted: 25/6/2025

