Communication is the discriminatory response of an organism to a stimulus (Cherry, 1957). If we are to reckon with communication beyond formal rhetoric or syntax, whether English or computer graphics, we must address ourselves to the versatility of the discriminating mechanism—the interface. In this case the interface is the point of contact and interaction between a machine and the “information environment,” most often the physical environment itself.
We have looked at graphic interfaces for one, and teletypes for another, but a dialogue demands a redundant and multichanneled concoction of sensory and motor devices far beyond these two mechanisms. We are talking about a total observation channel for an architecture machine.
For a machine to have an image of a designer, of a problem, or of a physical environment, three properties are inherently necessary: an event, a manifestation, a representation. The event can be visual, auditory, olfactory, tactile, extrasensory, or a motor command. The manifestation measures the event with the appropriate parameters: luminance, frequency, brain wavelength, angle of rotation, and so forth. The representation is the act of mapping the information into a receptacle that is compatible with the organism’s processing characteristics. These three properties—event, manifestation, representation—form the interface between any two organisms. The aspect of this interface with which we are primarily concerned is the manifestation, the one embodied chiefly in a piece of hardware.
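The three properties lend themselves to a small illustrative structure. The sketch below is hypothetical (the names and types are mine, not the text’s); it only shows how an event, its measured manifestation, and its representation could be chained for a single sensory channel.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class Event:
    """Something that happens in the information environment."""
    modality: str   # e.g. "visual", "auditory", "tactile"
    raw: Any        # the physical occurrence itself

@dataclass
class Manifestation:
    """The hardware step: the event measured with modality-appropriate parameters."""
    parameters: Dict[str, float]   # e.g. {"luminance": 0.7} or {"frequency_hz": 440.0}

def manifest(event: Event, sensor: Callable[[Any], Dict[str, float]]) -> Manifestation:
    """Measure the event with whatever sensor suits its modality."""
    return Manifestation(parameters=sensor(event.raw))

def represent(m: Manifestation) -> Dict[str, float]:
    """Map the measurement into a receptacle the receiving organism can process;
    here the 'receptacle' is simply a normalized dictionary."""
    total = sum(abs(v) for v in m.parameters.values()) or 1.0
    return {k: v / total for k, v in m.parameters.items()}

# Example: a visual event measured by a toy luminance sensor.
reading = manifest(Event("visual", raw=0.7), sensor=lambda raw: {"luminance": raw})
print(represent(reading))   # -> {'luminance': 1.0}
```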
In an architect-machine relationship, perhaps the most interesting sensory interfaces are auditory and visual. Machines that are capable of visual perception and speech recognition are two of the prime targets of researchers in artificial intelligence. Someday, machines that can see and hear will be commonplace. Setting aside the phantasmagoria of robot designers, consider speaking to a machine that sees you—a machine with eyes and ears, a machine that walks and talks. In our present culture the thought is either frightening, foolish, or, to some, quite realistic. To our children it will be an ordinary daily occurrence. To Mortimer Taube (1961) it is offensive. To Marvin Minsky (1966) it is obvious.
In the meantime, extrapolations into the future should recognize current problems of implementation. In the January 1967 issue of Datamation, Leslie Mezei portrayed a conversation with a machine that could listen and talk. A fragment went as follows:
Professor: What time is it?
Computer: It is 8:30 p.m., Thursday, December 5, 1985.
Professor: I think I would like to start with some music today, let’s have some chamber music.
Computer: You listen to Telemann most.
Professor: No, something earlier.
Computer: What about…
Six months later, in the August issue of Datamation, B. W. Boehm parodied the sketch, implanting in it aspects of present-day dilemmas:
Professor: What time is it?
Computer: I did not catch the last word. Or was it two words?
Professor: What is the time?
Computer: The dime is a copper coin worth one-tenth of a dollar. The word derives from the Latin decem, meaning…
Professor: No. No. What is the time? The time?
Computer: It is 8:30 p.m., Thursday, December 5, 1985. We have been having some trouble with your linguals recently. Sometimes I can’t tell your d’s from your t’s. Let’s practice them. Watch the display screen for the intonation pattern, and repeat after me: Teddy’s daddy toted two dead toads to Detroit.
Professor: Teddy’s daddy toted…
The hand of “Butterfingers,” the Stanford Hand-Eye Project. (J. A. Feldman et al., 1969)
The M.I.T. Robot Project’s hand, Project MAC.
The eye and the hand of the Stanford project. This particular eye is a scanning device, a vidicon.
Voice input to the Stanford robot. Pierre Vicens of that project giving simple voice instructions (with a French accent), such as “pick up the big block.”
Nilo Lindgren’s (1965a and b) comprehensive survey describes a host of intriguing research efforts in speech recognition, all of which fall into one of three categories: the auditory sensation, the acoustical disturbance freely propagating through air, and a sequence of articulatory events in a psychological structure. The reader should also refer to the recent works of Bobrow and Klatt (1968), Reddy and Vicens (1968), and Rabiner (1968).
Beyond giving a machine ears, giving a machine eyes is extremely critical to architecture machines. Just on the hunch that a blind machine will have shortcomings similar to those of a blind architect, the relevance of a seeing machine warrants research. Outside of the design professions, giving machines eyes is of imminent importance. For instance, space exploration will eventually require machines that can both see and process the seen information. This is because the remote monitoring of a space robot’s movements by earthlings requires too much transmission time (to Mars and back, for example); a machine could crash into the very obstacle it has been told to avoid simply because the message to stop arrives too late. More domestic applications involve visual discrimination of simple objects. Eventually, machines will package your purchased goods at the counter of your neighborhood supermarket.
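The transmission-time argument can be checked with rough arithmetic. The distances below are order-of-magnitude figures, not values from the text; even at the speed of light the round trip to Mars takes minutes.

```python
# Back-of-the-envelope light delay from Earth to Mars (rough distances).
SPEED_OF_LIGHT_KM_S = 299_792

distances_km = {
    "Mars at closest approach": 54_600_000,
    "Mars at its farthest": 401_000_000,
}

for label, d in distances_km.items():
    one_way_min = d / SPEED_OF_LIGHT_KM_S / 60
    print(f"{label}: one way {one_way_min:.1f} min, round trip {2 * one_way_min:.1f} min")

# Even in the best case the "stop" command arrives roughly six minutes
# after the robot reported the obstacle -- far too late to avoid the crash.
```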
The two diagrams represent an interface, in this case between man and machine. The top one is redrawn from Nilo Lindgren’s “Human Factors in Engineering” (1966b). The important feature is that the “human factors” thinking treats the entire man-machine assemblage as a single entity. This implies that the interface is so smooth and so adaptable that in effect it does not exist.
The illustration is redrawn from a reinterpretation of the above by Avery Johnson. Still considered as a single entity, the man-machine assemblage has a more active interface. In this interpretation, the interface has local computing power and can thus exhibit a behavior. This implies a continuous sensing and effecting mechanism, and it is the behavior of this device that is observed by both higher-order processors.
SEEK. This device is a homemade sensor/effector built by architecture students. The device has multiple attachments (magnets, photocells, markers, etc.) which it can position in three dimensions under computer control. It is anticipated that the mechanism will pile blocks, carry TV cameras, observe colors, and generally act as a peripheral device for student experiments in sensors and effectors that interact with the physical environment.
Oliver Selfridge (and Neisser, 1963) is credited with the founding works in pattern recognition. His mechanism, PANDEMONIUM, would observe many localized visual characteristics. Each local verdict as to what was seen would be voiced by “demons” (thus, pandemonium), and with enough pieces of local evidence the pattern could be recognized. The more recent work of Marvin Minsky and Seymour Papert (1969) has shown that local information alone is not enough; certain general observations are necessary in order to achieve complete visual discrimination.
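The demon-voting idea can be caricatured in a few lines. The feature demons and weights below are invented for illustration; they are not Selfridge’s actual demons, only the shape of the scheme: many local verdicts shouted at once, with the loudest combined shout taken as the recognized pattern.

```python
# Each "demon" looks for one local feature and shouts a weighted vote
# for every pattern that the feature is evidence for.
def pandemonium(image_features, demons):
    votes = {}
    for demon in demons:
        for pattern, weight in demon(image_features).items():
            votes[pattern] = votes.get(pattern, 0.0) + weight
    # The "decision demon" picks the loudest shout.
    return max(votes, key=votes.get) if votes else None

# Hypothetical local-feature demons for telling an "L" from a "T".
def horizontal_bar_demon(f):
    return {"L": 0.5, "T": 0.5} if "horizontal_bar" in f else {}

def bar_at_top_demon(f):
    return {"T": 1.0} if "bar_at_top" in f else {}

def bar_at_bottom_demon(f):
    return {"L": 1.0} if "bar_at_bottom" in f else {}

print(pandemonium({"horizontal_bar", "bar_at_bottom"},
                  [horizontal_bar_demon, bar_at_top_demon, bar_at_bottom_demon]))
# -> "L"
```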
At present, these works are being applied to architectural problems as an exercise preliminary to the construction of an architecture machine. Anthony Platt and Mark Drazen are applying the Minsky-Papert eye to the problem of looking at physical models (Negroponte, 1969d). The interim goal of this exercise is to observe, recognize, and determine the “intents” of several models built from plastic blocks. Combined with Platt’s previously described LEARN, this experiment is an attempt at machine learning through machine seeing. In contrast to describing criteria and asking the machine to generate physical form, this exercise focuses on generating criteria from physical form.
The M.I.T. Minsky/Papert eye. In this case the eye is an image dissector, a random-access device that does not scan back and forth but rather goes to discrete positions under computer control. This was the eye used for the Platt/Drazen vision experiment under the supervision of Seymour Papert.
Some vision problems—reflections and tone changes. Note that the top surfaces of the front lower cubes are a light gray, while the rear upper one has a black top surface. In other words, in such lighting the orientation of a surface cannot be assumed from its gray tone.
Other vision problems—depth of field and shadows.
More problems—disconnected bodies.
A typical model presented to the eye.
Printer output of the light intensities.
Contours of similar intensities.
A cathode-ray tube display of the discovered contours. The “noise” is due to both bad lighting and a poor choice of contours.
The seen lines. These would be the lines seen under ideal, noise-free conditions.
Minimal surfaces. A parallelogram indicates a complete surface that can be used as “strong evidence” to place others in space.
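The captions above trace a pipeline: raw light intensities, contours of similar intensity, candidate lines, and finally surfaces used as strong evidence. A toy version of the first two steps, assuming a made-up grid of gray levels and arbitrary thresholds, might look like this:

```python
# A tiny grid of "printer output" light intensities (0 = dark, 9 = bright).
intensities = [
    [1, 1, 2, 7, 8],
    [1, 2, 2, 7, 8],
    [2, 2, 3, 6, 7],
    [5, 5, 5, 6, 6],
]

def band(value, thresholds=(3, 6)):
    """Quantize an intensity into one of a few bands."""
    for i, t in enumerate(thresholds):
        if value < t:
            return i
    return len(thresholds)

banded = [[band(v) for v in row] for row in intensities]

# A contour falls wherever a cell and its right-hand neighbour land in different bands.
for row in banded:
    marks = "".join("|" if row[i] != row[i + 1] else "." for i in range(len(row) - 1))
    print("".join(map(str, row)), " contours:", marks)
```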
GROPE groping on the Urban Atlas map of New York’s residential population density.
The old GROPE.
The new GROPE. The slight glow beneath GROPE is from three little lights that illuminate the area for the fifteen photocells. It is interesting to note that, like most Architecture Machine projects, GROPE started as a toy costing $15. Even though it has evolved into a major experiment, its circuitry and hardware have cost less than $80.
A second example of interfacing with the real world is Steven Gregory’s GROPE (Negroponte et al., 1969b). GROPE is a small mobile unit that crawls over maps, in this case Passonneau and Wurman’s (1966) Urban Atlas maps. It employs a low-resolution seeing mechanism constructed with simple photocells that register only states of on or off, “I see light” or “I don’t see light.” In contrast to the Platt experiment, GROPE knows nothing about images; it deploys a controller that must be furnished with a context and a role (as opposed to a goal: to play chess rather than to win at chess). GROPE’s role is to seek out “interesting things.” To determine future moves, the little robot compares where he has been to where he is, compares the past to the present, and occasionally employs random numbers to avoid ruts. The onlooking human or architecture machine discerns what is “interesting” by observing GROPE’s behavior rather than by receiving testimony that this or that is “interesting.” At present, some aspects of GROPE are simulated and other aspects use the local computing power on GROPE’s plastic back. GROPE will be one of the first appendages to an architecture machine, because it is an interface that explores the real world. An architecture machine must watch devices such as GROPE and observe their behavior rather than listen to their comments.
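A minimal sketch of a GROPE-like controller follows. The scoring rule, the grid representation, and the function names are invented; the point is only the shape of the behavior described above: rate how “interesting” the photocell field is, prefer places unlike those already visited, and occasionally move at random to avoid ruts.

```python
import random

def interest(cells):
    """Crude 'interest' score for a field of on/off photocell readings:
    a uniform field (all light or all dark) is dull, a mixed field is interesting."""
    on = sum(cells)
    return min(on, len(cells) - on)

def next_move(here, visited, read_cells,
              moves=((0, 1), (0, -1), (1, 0), (-1, 0))):
    """Pick the next step by comparing candidate positions with the past."""
    if random.random() < 0.1:                 # an occasional random step avoids ruts
        return random.choice(moves)
    def score(m):
        there = (here[0] + m[0], here[1] + m[1])
        return interest(read_cells(there)) - 2 * visited.get(there, 0)
    return max(moves, key=score)

# Toy map: the band 3 <= y <= 6 is textured ("interesting"); the rest is blank paper.
def read_cells(pos):
    x, y = pos
    return [(x + y + i) % 2 if 3 <= y <= 6 else 0 for i in range(15)]

here, visited = (0, 0), {}
for _ in range(20):
    visited[here] = visited.get(here, 0) + 1
    dx, dy = next_move(here, visited, read_cells)
    here = (here[0] + dx, here[1] + dy)
print("first visits, in order:", list(visited))   # the path tends toward, and lingers in, the textured band
```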
Before the Architecture Machine Project had its own dedicated computing power, aspects of GROPE were simulated on the ARDS display. The four illustrations represent a sequence that traces GROPE’s path through an internal machine representation of Urban Atlas data for Boston. Note that by the fourth frame GROPE has “scrubbed out” two areas of the upper right. It turns out that this is Boston’s downtown waterfront, indeed an “interesting” area of the map.
A photographic overlay of GROPE’s path with a road map of Boston.
An overlay with “personal income” data.
An actual numerical display of the “personal income” data.
An overlay with “land use” and “residential density.”
But why not supply the machine with a coordinate description of the form on punch cards and proceed with the same experiment? Why must a machine actually see it? The answer is twofold. First, if the machine were supplied with a nonvisual input, it could not learn to solicit such information without depending on humans. Second, the computational task of simply seeing, the physiology of vision (as opposed to the psychology of perception), involves a set of heuristics that are apparently the very rules of thumb that were missing from LEARN, the rules that made LEARN a mannerist rather than a student.
It seems natural that architecture machines would be superb clients for sophisticated sensors. Architecture itself demands a sensory involvement. Cardboard models and line drawings describe some of the physical and some of the visual worlds, but who has ever smelt a model, heard a model, lived in a model? Most surely, computer-aided architecture is the best client for “full interfacing.” Designers need an involvement with the sensory aspects of our physical environments, and it is not difficult to imagine that their machine partners need a similar involvement.