Virtual Musical Instruments:
Accessing the Sound Synthesis Universe as a Performer.

Axel Mulder, School of Kinesiology, Simon Fraser University,
Burnaby, B.C., V5A 1S6 Canada

© Copyright 1994 Axel Mulder. All rights reserved.

An older version of this paper was published in the proceedings of the first Brazilian Symposium on Computers and Music.


With current state-of-the-art human movement tracking technology it is possible to represent most of the degrees of freedom of (part of) the human body in real time. This makes it possible to design a virtual musical instrument (VMI): a gestural interface, analogous to a physical musical instrument, that provides much greater freedom in the mapping of movement to sound. A musical performer may therefore control parameters of sound synthesis systems that are currently not controlled to their full potential, or not controlled at all, in real-time performance situations. In order to decrease the learning and adaptation needed and to avoid injuries, the design must address the musculo-skeletal, neuro-motor and symbolic levels that are involved in the programming and control of human movement. The use of virtual musical instruments will likely result in new ways of making music and new musical styles.


Figure 1 attempts to show the development of musical instruments. Acoustic instruments transduce the movements of a performer into sound. The performer has limited timbral control and a limited gesture set that can hardly be adapted to his or her needs. Electroacoustic instruments do not allow for more gestures or greater adaptivity, but they increase the range of sounds compared to the same acoustic instrument without the electronic gadgetry. Normally this increased range of sounds is an extension of the sounds produced by the original acoustic instrument.

Examining electronic musical instruments, it can be seen that gestural interfaces for sound synthesis systems have largely been copied from traditional physical musical instrument designs (e.g. the MIDI keyboard or piano, MIDI guitar, Yamaha wind controller, percussion controller, Zeta violin and MIDI accordion; see Pressing (1992)). Many performing musicians are dissatisfied with the expressive capabilities of these instruments when compared with traditional acoustic instruments. On the one hand, this dissatisfaction can be attributed to the limited resolution, accuracy and responsiveness of the gestural interface; on the other hand, the sound synthesis system that is driven by the gestures, usually via MIDI, is not adequate to satisfy the auditory needs of the performer. Another important point that is seldom addressed is that, while sound synthesis systems can be controlled through other or more parameters than were available on traditional physical musical instruments, such gestural interfaces do not allow for real-time control of all of these parameters during performance. Instead, these parameters have been explored in great detail in non-real-time situations, i.e. the studio, using simple functions as well as complex, even chaotic, functions. However, whatever the complexity of these functions, a human will most likely be able to produce more complex or abstract functions through movement in real time, as a result of complex inner processes.

With current state-of-the-art human movement tracking technology it is possible to represent most of the degrees of freedom of (part of) the human body in real time. This makes it possible to design a virtual musical instrument (VMI): a gestural interface, analogous to a physical musical instrument, that provides much greater freedom in the mapping of movement to sound. A musical performer may therefore control parameters of sound synthesis systems that are currently not controlled to their full potential, or not controlled at all, in real-time performance situations. Because of their real-time capabilities, these new control possibilities may result in new sounds or music not yet encountered in the studio. Although it would be possible to build a simulation of, for example, a guitar as a VMI, it may be more interesting and desirable to design a VMI that matches the capabilities of the human performer as closely as possible, so that human expressive capability is fully captured, less learning and adaptation is needed, and injuries are avoided. Such VMI designs will not necessarily replace existing musical instruments; they may also extend them or operate in combination with them.

In order to achieve these ergonomic goals, the musculo-skeletal, neuro-motor and semiotic issues involved in the design of a VMI must be addressed. The scope of this research is wide. It can be limited by defining a class of VMIs that are controlled through movements of the upper limbs only and that are perceived mainly through kinaesthetic and auditory feedback, as "audio-kinaesthetic objects", thereby leaving out tactile and force feedback. Some indirect tactile feedback (e.g. fingers touching each other) and visual feedback (e.g. seeing one's own fingers) remain available, however. The loss of tactile and force feedback is significant, because their bandwidth is much greater than that of kinaesthetic feedback and they result in greater movement accuracy, but they cannot easily be implemented in a VMI. Sensory replacement, e.g. applying vibration instead of force, may be useful.

It is likely that the resulting VMIs cannot be physically constructed, so new ways of making music and new musical styles may develop. Some of the numerous possibilities this opens up for musical performers, such as changing the presentation of musical ideas and re-integrating music with dance, will be discussed.

A VMI as defined in this paper must be distinguished from physically modeled musical instruments, which are sometimes also called virtual musical instruments. These models do not address the gestural interaction between a performer and the instrument, but the generation of sound by physical objects.

Electronic musical instruments: Overview of experiments with new controllers

Due to the development of (computer) technology, many technically oriented musicians have experimented with these technologies to change various aspects of musical performance. On the one hand, much effort has gone into designing new ways of generating sound. On the other hand, some effort has gone into designing new controllers: physical devices that implement a motion sensing technology and translate the motion generated by the performer into a (MIDI) signal, which in turn controls a sound synthesis device. Tables 1a to 1e list some of the people involved and their designs, classified mostly by human movement tracking method. As can be seen, this classification scheme does not work perfectly in the case of a glove used for conducting. In essence, this shows how difficult it is to draw clear boundaries between the levels of control of human movement and the levels operating in auditory perception.

At any rate, these new musical instrument designs were hardly, if at all, based on a model of human performance or on a model of human auditory perception, let alone on the relation between the two. Therefore, most of these designs are no more attractive for a performer to use than traditional acoustic instruments. In effect, most of these designs are mainly concerned with implementing the technology rather than with exploring the use of psychomotor parameters in the human system.

In addition to the above designs, countless setups have been created that involve (electro-)acoustic instruments equipped with electronics, either on the instrument or as signal processors, or with computers. Such setups are generally called performances with live electronics. As the input or controller is usually not the innovative component of such setups, but rather the way in which the sounds are processed or generated, no further attention will be paid to them in this paper. Setups in which the computer, via keyboard or mouse, is the main controlling device will not be considered further either.

Table 1a. An overview of recent experiments with new musical instrument designs
Author | Musical instrument
Rubine & McAvinney (1990) | Videoharp MIDI controller (optical sensing of fingertips)
Mathews & Schloss (1989) | Radiodrum MIDI controller (short-range EM sensing)
Palmtree Instruments Inc., La Jolla CA, USA | Airdrum MIDI controller (acceleration-sensitive)
Machover & Chung (1989), Neil Gershenfeld | Hyperinstruments: acoustic instruments extended with extra control abilities

Table 1b. An overview of recent experiments with devices for conducting
Author | Conducting device
Keane & Gross (1989) | MIDI baton (AM EM sensing)
Bertini & Carosi (1992) | Light baton (LED sensed by CCD camera)
Don Buchla | Lightning MIDI controller (infrared LED sensing)
Machover & Chung (1989) | Exos DHM glove for conducting MIDI devices

Table 1c. An overview of recent experiments with new musical instrument designs involving gloves
Author | Glove application
Pausch & Williams (1992) | Hand motions control a speech synthesizer
Fels & Hinton (1993) | GloveTalk: CyberGlove controls a speech synthesizer
CNMAT, Berkeley; Scott Gresham-Lancaster, Berkeley; Mark Trayle, San Francisco; James McCartney, University of Texas; Thomas Dougherty, Stanford University; William J. Sequeira, AT&T; Tom Meyer, Brown University; Rob Baaima, Amsterdam; Lance Norskog; The Hub; and probably many others | PowerGlove as a MIDI controller
Gustav's Party | Virtual reality rock band using, among other things, DataGloves as MIDI controllers

Table 1d. An overview of experiments with new musical instrument designs involving whole body movements
Author | Motion to sound design
Lev Theremin (see Vail, 1993) | Capacitively coupled motion detector controls an electronic oscillator
Chabot (1990) | Ultrasound ranging to detect whole body movements and to control MIDI devices
Bauer & Foss (1992) | GAMS: ultrasound ranging to detect whole body movements and to control MIDI devices
Camurri (1987) | Costel opto-electronic human movement tracking system controls a knowledge-based computer music system

Table 1e. An overview of experiments with new musical instrument designs involving bioelectric signals
Author | Bioelectric design
Knapp & Lusted (1990) | Biomuse: EMG signals control a DSP
Chris van Raalte / Ed Severinghaus, San Francisco CA, USA | BodySynth MIDI controller
Rosenboom (1990); Richard Teitelbaum, Germany; Pjotr van Moock, the Netherlands; and probably others | EEG/EMG interface to synthesizer setup

Psychomotor issues

Pressing (1990) addresses musical instrument design from an ergonomic point of view. His work is very useful and important: besides being based on work in the field of motor control, it also draws on common musical instrument design knowledge and performance experience. The field of motor control specifically aims to provide models of human movement based on empirical evidence.

Human movement control can be conceived of as taking place at musculo-skeletal, neuro-motor and symbolic levels, which interact with each other. So far, gestural interfaces have mainly addressed the musculo-skeletal level (e.g. number of degrees of freedom, movement range, bandwidth and resolution), while some work has been done at the symbolic level (e.g. American Sign Language recognition). Neuro-motor aspects that could be included in gestural interfaces, although most of them are as yet unresolved, include the control of speed and accuracy and the trade-off between them, the various phases of a movement, each with different control parameters and timing, and the various internal representations of a movement and their transformations. Central to the neuro-motor level is the concept of a motor program.
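The speed-accuracy trade-off mentioned above is often summarized in the motor control literature by Fitts' law. It is shown here only as an illustration of the kind of quantitative neuro-motor regularity a gestural interface could take into account; it is not part of the discussion in this paper:

```latex
MT = a + b \log_2\left(\frac{2D}{W}\right)
```

Here MT is the movement time, D the distance to the target, W the target width (the required accuracy), and a and b are constants fitted empirically for a given limb and task. Halving W adds b to the predicted movement time, which is one way of expressing that greater accuracy costs speed.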

Additionally, identifying the audio-motor channel and its structure, analogous to the visuo-motor channel of Jeannerod (1990), may provide a significant framework. For instance, what is the relation between the visuo-motor channel and the audio-motor channel? At a low level, perturbation experiments in singing, e.g. perturbed feedback of pitch (shifted pitch), timbre (formant remapping) or timing (delayed signals), may yield a definition of the lower-level aspects of the audio-motor channel. Up to now, gesture recognition has been implemented with engineering techniques that describe movement in physical terms at the musculo-skeletal level. Recognition in terms of neuro-motor and symbolic level models of human movement has not yet been implemented, although some work has been done using a connectionist paradigm (see below).
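To make the low-level perturbations concrete: a pitch shift is usually specified in cents (hundredths of an equal-tempered semitone) and a feedback delay in milliseconds, and both must be converted before a digital feedback path can apply them. The sketch below assumes such a digital path with a fixed sample rate; the function names and the one-semitone, 200 ms condition are hypothetical choices for illustration, not parameters from any study cited here.

```python
def shift_pitch(freq_hz: float, cents: float) -> float:
    """Return the frequency of a pitch shifted by the given number of cents.

    One semitone is 100 cents; an octave (a factor of 2) is 1200 cents.
    """
    return freq_hz * 2.0 ** (cents / 1200.0)


def delay_in_samples(delay_ms: float, sample_rate_hz: int) -> int:
    """Whole number of samples corresponding to a feedback delay."""
    return round(delay_ms * sample_rate_hz / 1000.0)


# Hypothetical perturbation condition: the singer hears their own voice
# shifted up one semitone and delayed by 200 ms, at a 44.1 kHz sample rate.
sung_pitch_hz = 220.0
print(round(shift_pitch(sung_pitch_hz, 100), 2))  # 233.08
print(delay_in_samples(200, 44100))               # 8820
```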

Palmer & van de Sande (1993) and Palmer (1989) are concerned with music theories and linguistic theories in the way they relate syntactic (structural) representations to phonetic (sounded) representations. The object of their studies is the ordering and manipulation of "sound objects" (phonemes, phones) as symbols, not the creation of the sound objects themselves (with their pitch, duration, volume and timbre functions). Shaffer (1989), also concerned with high-level aspects of motor programs, discusses the implementation of music performance by a robot. Sloboda (1985) discusses three basic aspects of music performance: sight reading (how are performance plans acquired?), performance after repeated exposure or rehearsal (what is the role of feedback in performance plans?) and expert or skilled performance. The work of Clynes & Nettheim (1982) addresses the emotion and meaning of human movement. Baily (1985) studied the movement patterns of African musicians; among other things, his work investigates whether the spatial properties of an instrument may influence the shape of the music played on it.

Music performance is a skill that takes a considerable part of a lifetime to acquire. A VMI may be able to shorten this learning period because it can adapt to the movement patterns of the performer. However, the performer will need to find a VMI design that addresses his or her musical and sonic needs. These new mappings of movement to sound will have to be developed over time and will most likely be very personal. It is unlikely that new mappings can be designed instantly, without a learning period, because they would not make sense. Although the capabilities of an acoustic musical instrument and a VMI are different, it would be interesting to know whether a performer adapts to a physical instrument more quickly.
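One simple way a VMI could adapt to a performer's movement patterns, offered as a hypothetical sketch rather than a method from this paper, is to rescale each sensor reading to the range of motion the performer has actually used so far. `AdaptiveRange` is an illustrative name; a real system would also need to decide whether the range may shrink again, for example as a performer tires.

```python
from typing import Optional


class AdaptiveRange:
    """Normalize raw sensor readings to 0..1 using the range observed so far.

    The expected range widens whenever the performer moves beyond it, so the
    same normalized output covers each performer's own reach.
    """

    def __init__(self) -> None:
        self.low: Optional[float] = None
        self.high: Optional[float] = None

    def update(self, raw: float) -> float:
        if self.low is None or self.high is None:
            self.low = self.high = raw
        self.low = min(self.low, raw)
        self.high = max(self.high, raw)
        span = self.high - self.low
        if span == 0:
            return 0.0  # no range of motion observed yet
        return (raw - self.low) / span


# Hypothetical elbow angles in degrees from an instrumented suit.
elbow = AdaptiveRange()
print([elbow.update(angle) for angle in [30.0, 90.0, 60.0, 120.0]])  # [0.0, 1.0, 0.5, 1.0]
```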

Whilst most of this research is done by behavioural scientists, approaches developed by choreographers may be helpful too. Their efforts include the definition of dance notation systems, e.g. Labanotation, which specifically addresses the concept of effort in movement. This can be conceived of as an approach to approximating psychomotor parameters. The concept of effort, as used in Labanotation, has been further explored in a musical instrument design context by Ryan (1991, 1992).

It is interesting to note that the computer music community has paid a great deal of attention to producing sound directly from abstract thought by implementing models of cognition, using e.g. artificial intelligence technology. These models usually address only the higher levels involved in the performance of music, in contrast to human music performers, who physically effectuate the performance, with effort. Also, such models only apply where the musical style is well defined and formalized, and do not apply in situations that involve a great deal of improvisation. Pressing (1984) outlines some of the cognitive aspects of improvisation.

VMI's and the future

From the above and figure 1, a VMI is characterised by at least two features. First, any gesture set can be used to control the sound synthesis process, and the mapping is entirely programmable, limited only by the sound synthesis model. In other words, a VMI cannot produce every kind of sound, i.e. cover the entire sound space, but it can be configured in any way the sound synthesis system permits. Second, the mapping may incorporate motor control or ergonomic principles and may be adaptive to the user. In other words, the shape and sound generation of a VMI, and the relation between them, are not necessarily defined by physical laws, but can be defined arbitrarily, most likely according to ergonomic principles. Because music and sound can be controlled at various levels of abstraction, a VMI can be conceived of in various ways, with varying amounts of complexity or intelligence. At a high level, a virtual orchestra, explored by Morita et al. (1991), comes to mind. At a low level, Gibet & Marteau (1990) explored virtual drumming by modeling a human arm and a vibrating surface (using the synergetics approach of Haken). Stephen Pope at CCRMA, Stanford CA, USA, explores virtual sound objects, and Bolas & Stone (1992) explored a virtual theremin and a virtual drum using virtual reality technology. Obviously the relation between dance and music can become very tight using the above ideas (Mulder, 1991). Ungvary et al. (1992) presented their system NUNTIUS, which provides direct data transfer and interpretation between dance and music. Their system implements high-level relations between dance and music using, among other things, Labanotation. It is as yet unknown what level of accuracy and detail is needed when tracking human motion, because it is unknown what exactly humans are controlling.
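The first feature, an arbitrary programmable mapping bounded only by the synthesis model, can be sketched as a table of user-defined functions whose results are clamped to the parameter ranges the synthesizer accepts. The synthesis parameters, their ranges and the gesture names below are all hypothetical; the point is only that the mapping functions are free while the model's limits are not.

```python
from typing import Callable, Dict, Tuple

# Parameter ranges of a hypothetical synthesis model. A mapping may compute
# anything, but what reaches the synthesizer must lie inside these ranges.
SYNTH_RANGES: Dict[str, Tuple[float, float]] = {
    "pitch_hz": (20.0, 2000.0),
    "brightness": (0.0, 1.0),
    "grain_density": (1.0, 100.0),
}

Gesture = Dict[str, float]
Mapping = Dict[str, Callable[[Gesture], float]]


def apply_mapping(gesture: Gesture, mapping: Mapping) -> Dict[str, float]:
    """Evaluate each user-defined function and clamp it to the model's range."""
    params = {}
    for name, fn in mapping.items():
        low, high = SYNTH_RANGES[name]
        params[name] = min(high, max(low, fn(gesture)))
    return params


# One possible, entirely arbitrary, mapping: wrist flexion sets pitch, the
# spread between two fingers sets brightness, elbow speed sets grain density.
my_mapping: Mapping = {
    "pitch_hz": lambda g: 100.0 + 10.0 * g["wrist_deg"],
    "brightness": lambda g: g["finger_spread"] / 45.0,
    "grain_density": lambda g: abs(g["elbow_speed"]) * 0.5,
}

print(apply_mapping({"wrist_deg": 30.0, "finger_spread": 60.0, "elbow_speed": 40.0}, my_mapping))
# {'pitch_hz': 400.0, 'brightness': 1.0, 'grain_density': 20.0}
```

Swapping in a different dictionary of functions reconfigures the instrument without touching the synthesis model; here the brightness value of 60/45 is clamped to the model's maximum of 1.0.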
At one end of the spectrum are performers who expect the instrument to react predictably to their gestures; at the other end are performers who succeed in performing musically with sloppy, unreliable controllers that behave randomly. Apparently each performer represents the musical process differently, resulting in different gestural control needs. This also relates to how many constraints a VMI should impose, both on gestures or movements and on sonic or musical possibilities. In order to obtain musical expression, it is likely that some constraints should be implemented such that the performer must apply effort, as an essential component of expression. For example, no effort is involved when the performer simply presses a button and recorded or sequenced sounds are produced automatically. Mental effort is involved when the mapping is complex and the performer needs conscious attention to produce the desired musical results. Physical effort is involved when the performer must make complex, difficult, but also forceful or tense movements.
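As a hypothetical illustration of building such a constraint into a mapping (not a design taken from this paper), sound energy can be made to depend on movement energy: loudness is fed by the squared speed of a limb and leaks away when the performer stops, so sustaining a sound requires sustained physical effort. The `gain` and `decay` values are arbitrary assumptions.

```python
from typing import List, Sequence


def effort_envelope(speeds: Sequence[float], gain: float = 0.2, decay: float = 0.8) -> List[float]:
    """Loudness levels (0..1) that must be 'paid for' with movement.

    At each step the previous level decays by the factor `decay`, and the
    squared limb speed (a crude proxy for physical effort) is added, scaled
    by `gain`. The result is capped at 1.0. If the performer stops moving,
    the sound dies away instead of sustaining on its own.
    """
    level = 0.0
    levels = []
    for v in speeds:
        level = min(1.0, decay * level + gain * v * v)
        levels.append(level)
    return levels


# Two steps of movement followed by two steps of stillness.
print([round(x, 4) for x in effort_envelope([1.0, 1.0, 0.0, 0.0])])  # [0.2, 0.36, 0.288, 0.2304]
```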

The author's work included experiments with an instrumented bodysuit that registered human joint angles. Two performances were implemented in 1993. In one performance, the joint angle values of the performer (Egmont Zwaan) were used to drive a Lexicon LXP 5 effects processor that processed the performer's voice. During the performance, the performer lost track of what he was actually controlling, i.e. he was more involved with his movements than with the aural result, and/or there were too many parameters to control (more learning was needed). Also, the mapping was too simple and too direct, i.e. some parameters were too sensitive to movement. There was no compensation for interaction between the various acoustic effects, which resulted in changing sensitivities of the parameters. Last but not least, the LXP 5 had some problems processing the amount or combinations of MIDI data it received.
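Two of the problems observed, oversensitive parameters and more MIDI data than the effects processor could handle, are commonly addressed by thinning the control stream before it is sent. The sketch below is one possible remedy, not what was used in these performances: a dead band ignores changes smaller than `threshold`, and a minimum interval between messages caps the message rate. The class name and default values are hypothetical.

```python
from typing import Optional


class ThinningFilter:
    """Suppress small or too-frequent changes before they are sent as MIDI.

    A change smaller than `threshold` (in 0..127 controller units) is ignored,
    which reduces sensitivity to small movements; a message arriving less than
    `min_interval` seconds after the last transmitted one is dropped, which
    caps the MIDI data rate.
    """

    def __init__(self, threshold: int = 3, min_interval: float = 0.02) -> None:
        self.threshold = threshold
        self.min_interval = min_interval
        self.last_value: Optional[int] = None
        self.last_time: Optional[float] = None

    def accept(self, value: int, t: float) -> bool:
        """Return True if `value` at time `t` (seconds) should be transmitted."""
        if self.last_value is not None and self.last_time is not None:
            if abs(value - self.last_value) < self.threshold:
                return False
            if t - self.last_time < self.min_interval:
                return False
        self.last_value = value
        self.last_time = t
        return True


# Hypothetical (controller value, time in seconds) pairs from one joint.
samples = [(64, 0.00), (65, 0.01), (70, 0.015), (70, 0.03), (75, 0.06)]
thin = ThinningFilter()
print([value for value, t in samples if thin.accept(value, t)])  # [64, 70, 75]
```

A practical version would probably also transmit the last suppressed value once the movement settles, so that the processor does not end up a few steps away from the performer's final position.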

In the other performance, only a few joint angle values of the author's own movements were used to drive the effects processor. The performance included Mari Kimura playing a Zeta MIDI violin and a dancer (Anita Cheng). The effects processor transformed the violin signals. The dancer interacted physically with the author and symbolically with the violin player, so that all performers were communicating. The most interesting result in the context of this paper was that the author appeared to be dancing rather than playing an instrument, while in fact he was playing one. This illustrates the possibilities for merging dance and music into one art form, as it used to be (and still is) for many African tribes.

The translation of the joint angles into the parameters of the LXP 5 was simple: the angles were mapped directly, without any further algorithm. Because of this simple mapping the VMI was not very intuitive. However, the aim of user adaptivity was achieved: it was possible to map any movement to any parameter of the LXP 5. Also, the real-time capabilities of the sound processing device were fully used. Furthermore, it became very clear that human movement tracking is a hard problem (Mulder, 1994).
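A direct mapping of this kind amounts to scaling each joint angle linearly into the 7-bit range of a MIDI Control Change message (status byte 0xB0 plus the channel number, followed by the controller number and the value). The paper does not state which joints, controller numbers or angle ranges were used, so the `PATCH` table below is hypothetical, and sending the bytes to an actual device would require a MIDI output library that is not shown.

```python
from typing import Dict, List, Tuple


def angle_to_cc_value(angle_deg: float, min_deg: float, max_deg: float) -> int:
    """Linearly scale a joint angle into the 7-bit MIDI range 0..127."""
    fraction = (angle_deg - min_deg) / (max_deg - min_deg)
    fraction = min(1.0, max(0.0, fraction))  # clamp angles outside the range
    return round(fraction * 127)


def control_change(channel: int, controller: int, value: int) -> bytes:
    """Build a 3-byte MIDI Control Change message (channel is 0-based, 0..15)."""
    if not (0 <= channel <= 15 and 0 <= controller <= 127 and 0 <= value <= 127):
        raise ValueError("MIDI channel, controller or value out of range")
    return bytes([0xB0 | channel, controller, value])


# Hypothetical patch: joint -> (controller number, min angle, max angle).
PATCH: Dict[str, Tuple[int, float, float]] = {
    "right_elbow": (20, 0.0, 150.0),
    "left_knee": (21, 0.0, 130.0),
}


def joints_to_midi(angles: Dict[str, float], channel: int = 0) -> List[bytes]:
    """Translate measured joint angles into Control Change messages, one per patched joint."""
    messages = []
    for joint, (cc, low, high) in PATCH.items():
        if joint in angles:
            value = angle_to_cc_value(angles[joint], low, high)
            messages.append(control_change(channel, cc, value))
    return messages


print([m.hex() for m in joints_to_midi({"right_elbow": 75.0, "left_knee": 130.0})])
# ['b01440', 'b0157f']
```

Because the patch is just a table, any joint can be assigned to any controller, which is the kind of user adaptivity described above; the absence of any shaping between angle and value is also what makes such a mapping feel unintuitive.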

Other ideas that may be worth exploring are the control of a singing or speech synthesizer by hand or arm gestures, or even by whole body movements. Lee & Wessel (1992) have built systems that use artificial neural networks to implement a control structure that adapts to nonlinear human behaviour. Similarly, Fels & Hinton (1993) have used neural networks to translate hand shape and position into articulated speech. The speech synthesizer may also be replaced by a granular synthesizer processing a text sample, a vocoder, or an effects processor (as above) with voice input. It would then be possible, for instance, to present a poem with very strange and strong expression, both gesturally and acoustically. The author is currently investigating the use of an instrumented glove to control a granular synthesis system developed by Barry Truax and Harmonic Functions in Vancouver, BC, Canada.
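Neither Lee & Wessel's nor Fels & Hinton's architecture is reproduced here. As a rough illustration of the general idea, a learned nonlinear mapping from hand shape to synthesis parameters, the sketch below trains a tiny two-layer network by plain full-batch gradient descent with NumPy. The three glove readings, the two synthesis parameters and the synthetic target function are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical training data: 200 hand shapes, each 3 normalized glove
# readings in 0..1, paired with the 2 synthesis parameters the performer
# chose for them. A smooth nonlinear function stands in for real recordings.
X = rng.uniform(0.0, 1.0, size=(200, 3))
Y = np.column_stack([np.sin(np.pi * X[:, 0]) * X[:, 1], X[:, 2] ** 2])

# One hidden layer of 16 tanh units and a linear output layer.
W1 = rng.normal(0.0, 0.5, size=(3, 16))
b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, size=(16, 2))
b2 = np.zeros(2)


def forward(x):
    """Return hidden activations and predicted synthesis parameters for hand shapes x."""
    hidden = np.tanh(x @ W1 + b1)
    return hidden, hidden @ W2 + b2


def mean_squared_error():
    return float(np.mean((forward(X)[1] - Y) ** 2))


initial_error = mean_squared_error()
learning_rate = 0.1
for step in range(3000):
    hidden, out = forward(X)
    # Gradient of the mean squared error with respect to the outputs.
    grad_out = 2.0 * (out - Y) / out.size
    grad_W2 = hidden.T @ grad_out
    grad_b2 = grad_out.sum(axis=0)
    # Backpropagate through tanh, using d tanh(z)/dz = 1 - tanh(z)^2.
    grad_hidden = (grad_out @ W2.T) * (1.0 - hidden ** 2)
    grad_W1 = X.T @ grad_hidden
    grad_b1 = grad_hidden.sum(axis=0)
    # Gradient descent step, updating the weight arrays in place.
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

final_error = mean_squared_error()
print(f"error before training: {initial_error:.4f}, after: {final_error:.4f}")

# Once trained, the network maps a new hand shape to synthesis parameters.
print(forward(np.array([[0.5, 1.0, 0.3]]))[1])
```

In a real instrument the training pairs would come from the performer demonstrating hand shapes and choosing the sounds that go with them, which is where the adaptation to individual, nonlinear behaviour would come from.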

Another obvious performance idea would be to use an instrumented suit to control percussive synthesizers. Because disco, house, African and many other dance forms involve mostly repetitive movements, these movements may lend themselves to interpretation or recognition in terms of the so-called dynamic pattern approach to human movement control. The performer, in turn, might become involved in an intense audio-kinaesthetic experience.
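The dynamic pattern approach itself is not implemented here. As a much simpler signal-processing stand-in, the period of a repetitive movement, and hence a tempo for a percussive synthesizer, can be estimated from the highest autocorrelation peak of a joint angle signal. The 100 Hz tracker rate, the synthetic knee-angle signal and the 0.25 to 2 second search range are assumptions for the example.

```python
import numpy as np


def estimate_period(signal, sample_rate_hz, min_period_s=0.25, max_period_s=2.0):
    """Estimate the period (in seconds) of a roughly repetitive movement signal.

    Picks the lag of the highest autocorrelation value within a plausible
    range of movement periods.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the offset, e.g. the resting joint angle
    # Full autocorrelation; index len(x) - 1 is lag 0, so keep lags >= 0.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    min_lag = int(min_period_s * sample_rate_hz)
    max_lag = min(int(max_period_s * sample_rate_hz), len(x) - 1)
    best_lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
    return best_lag / sample_rate_hz


# Synthetic knee angle of a dancer moving at 2 cycles per second, with a
# harmonic to make the shape less sinusoidal, sampled for 4 seconds.
sample_rate = 100
t = np.arange(400) / sample_rate
knee = 60.0 + 30.0 * np.sin(2 * np.pi * 2.0 * t) + 10.0 * np.sin(2 * np.pi * 4.0 * t)

period = estimate_period(knee, sample_rate)
print(period, 60.0 / period)  # period in seconds and tempo in beats per minute
```

The lower bound on the search range matters: without it, the trivial peak at lag 0 would always win.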

A future musical ensemble may consist of a drummer or percussive controller, various timbral (sound) controllers, various melodic controllers (musical structure controllers), a spatialization controller and an effects controller. All the movements of these performers would be choreographed to achieve a maximum performance effect.


I would like to thank Tom Calvert, Ron Marteniuk, Christine Mackenzie and Barry Truax for commenting on this paper and on my research in general, and for making this research possible.


References

Baily, J. (1985). Music structure and human movement. In: Howell, P., Cross, I., (editors), Musical structure and cognition, 237-258. London, UK: Academic Press.

Bauer, W. & Foss, B. (1992). GAMS: an integrated media controller system. Computer Music Journal, 16 (1), 19-24.

Bertini, G. & Carosi, P. (1992). The light baton: a system for conducting computer music performance. Proceedings International Computer Music Conference, San Jose, California, USA, 73-76. San Francisco CA, USA: International Computer Music Association.

Bolas, M. & Stone, P. (1992). Virtual mutant theremin. Proceedings International Computer Music Conference, San Jose, California, USA, 360-361. San Francisco CA, USA: International Computer Music Association.

Cadoz, C., Luciani, A. & Florens, J-L. (1984). Responsive input devices and sound synthesis by simulation of instrumental mechanisms: The CORDIS system. Computer Music Journal, (3), 60-73.

Camurri, A. et al. (1987). Interactions between music and movement: A system for music generation from 3D animations. Proceedings of the 4th international conference on event perception and action, Trieste.

Chabot, X. (1990). Gesture interfaces and a software toolkit for performance with electronics. Computer Music Journal, 14 (2), 15-27.

Clynes, M. & Nettheim, N. (1982). The living quality of music: neurobiologic basis of communicating feeling. In: Clynes, M., (editor), Music, mind and brain: the neuropsychology of music, 47-82. New York, USA: Plenum Press.

Coniglio, M. (1992). Introduction to the Interactor language. Proceedings International Computer Music Conference, San Jose, California, USA, 170-177. San Francisco CA, USA: International Computer Music Association.

Fels, S.S. & Hinton, G.E. (1993). Glove-talk: A neural network interface between a dataglove and a speech synthesizer. IEEE Transactions on neural networks, 4 (1), 2-8.

Gibet, S. & Marteau, P.-F. (1990). Gestural control of sound synthesis. Proceedings International Computer Music Conference, Glasgow, UK, 387-391. San Francisco CA, USA: International Computer Music Association.

Jeannerod, M. (1990). The neural and behavioural organization of goal directed movements. New York, USA: Oxford University Press.

Keane, D. & Gross, P. (1989). The MIDI baton. Proceedings International Computer Music Conference, Columbus, Ohio, USA. San Francisco CA, USA: International Computer Music Association.

Knapp, R.B. & Lusted, H. (1990). A bioelectric controller for computer music applications. Computer Music Journal, 14 (1), 42-47.

Krefeld, V. (1990). The Hand in the Web: An interview with Michel Waisvisz. Computer Music Journal, 14 (2), 28-33.

Lee, M. & Wessel, D. (1992). Connectionist models for real-time control of synthesis and compositional algorithms. Proceedings International Computer Music Conference, San Jose, California, USA, 277-280. San Francisco CA, USA: International Computer Music Association.

Machover, T. & Chung, J. (1989). Hyperinstruments: Musically intelligent and interactive performance and creativity systems. Proceedings International Computer Music Conference, Columbus, Ohio, USA. San Francisco CA, USA: International Computer Music Association.

Mathews, M. & Schloss, A. (1989). The radiodrum as a synthesis controller. Proceedings International Computer Music Conference, Columbus, Ohio, USA. San Francisco CA, USA: International Computer Music Association.

Moog, B. (1989). An industrial design student's MIDI controller. Keyboard, January, 108-109.

Morita, H., Hashimoto, S. & Ohteru, S. (1991). A computer music system that follows a human conductor. IEEE Computer, July, 44-53.

Mulder, A.G.E. (1991). Viewing dance as instrumental to music. Interface, 4 (2), 15-17. Columbus, Ohio, USA: ACCAD, Ohio State University.

Mulder, A.G.E. (1994). Build a better PowerGlove. PCVR, 16.

Palmer, C. & van de Sande, C. (1993). Units of knowledge in music performance. Journal of experimental psychology: learning, memory and cognition, 19 (2), 457-470.

Palmer, C. (1989). Mapping musical thought to musical performance. Journal of experimental psychology: human perception and performance, 15 (12), 331-346.

Pausch, R. & Williams, R.D. (1992). Giving CANDY to children: user tailored gesture input driving an articulator based speech synthesizer. Communications of the ACM, 35(5), 60-66.

Pressing, J. (1990). Cybernetic issues in interactive performance systems. Computer Music Journal, 14 (1), 12-25.

Pressing, J. (1992). Synthesizer performance and real-time techniques. Madison, Wisconsin, USA: A-R editions.

Pressing, J. (1984). Cognitive processes in improvisation. In: Crozier, W.R., Chapman, A.J., (editors), Cognitive processes in the perception of art, 345-363. Amsterdam, The Netherlands: Elsevier Science Publishers.

Rosenboom, D. (1990). The performing brain. Computer Music Journal, 14 (1), 49-66.

Rubine, D. & McAvinney, P. (1990). Programmable finger tracking instrument controllers. Computer Music Journal, 14 (1), 26-41.

Ryan, J. (1992). Effort and expression. Proceedings International Computer Music Conference, San Jose, California, USA, 414-416. San Francisco CA, USA: International Computer Music Association.

Ryan, J. (1991). Some remarks on musical instrument design at STEIM. Contemporary music review, 6 part 1, 3-17.

Shaffer, L.H. (1989). Cognition and affect in musical performance. Contemporary music review, 4, 381-389.

Sloboda, J.A. (1985). The musical mind: the cognitive psychology of music. Oxford, UK: Clarendon press.

Ungvary, T., Waters, S. & Rajka, P. (1992). NUNTIUS: A computer system for the interactive composition and analysis of music and dance. Leonardo, 25 (1), 55-68.

Vail, M. (1993). It's Dr. Moog's traveling show of electronic controllers. Keyboard, March, 44-49.