Human Centered Science and Biomedical Engineering Graduate Major
Department of Mechanical Engineering
Graduate School of Engineering
Tokyo Institute of Technology

Yagi Laboratory


Bio-mimetic Robot Vision



It is a remarkable fact that the size of the dendritic arbor of retinal ganglion cells increases with eccentricity from the center of the retina. This morphological feature is suspected to affect visual information processing. If so, it would greatly influence the determination of a fixation point, because the higher visual system relies directly on visual information coming from the retina. To test this hypothesis, two different approaches were used in this study: psychophysics and computational biology.

In the psychophysical study, human fixation was analyzed while various kinds of visual stimuli were displayed. Features of the stimuli such as size, intensity, color, and shape were controlled while fixation was recorded. In the computational study, the retinal morphological feature mentioned above was modeled computationally. By processing the displayed visual stimuli with this model, the relation between the morphological feature of the retina and fixation ability was evaluated quantitatively.
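The idea that the pooling area of a retinal cell grows with eccentricity can be illustrated with a small sketch. This is not the model used in the study: the linear growth law, the function names `field_radius` and `pooled_response`, and the values of `base_radius` and `slope` are all assumptions chosen for illustration.

```python
import math


def field_radius(eccentricity, base_radius=0.5, slope=0.25):
    """Hypothetical pooling radius (pixels), growing linearly with eccentricity."""
    return base_radius + slope * eccentricity


def pooled_response(image, fixation, cell):
    """Mean intensity inside the pooling disc of one model cell.

    image    : list of rows, each a list of numbers
    fixation : (row, col) of the current fixation point (model fovea)
    cell     : (row, col) of the cell being evaluated
    """
    # Eccentricity is the distance of the cell from the fixation point.
    ecc = math.hypot(cell[0] - fixation[0], cell[1] - fixation[1])
    radius = field_radius(ecc)
    reach = int(math.ceil(radius))

    total, count = 0.0, 0
    for i in range(cell[0] - reach, cell[0] + reach + 1):
        for j in range(cell[1] - reach, cell[1] + reach + 1):
            inside = 0 <= i < len(image) and 0 <= j < len(image[0])
            if inside and math.hypot(i - cell[0], j - cell[1]) <= radius:
                total += image[i][j]
                count += 1
    # A cell that lies outside the image pools nothing.
    return total / count if count else 0.0


# A single bright pixel on a dark 9 x 9 background.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0

foveal = pooled_response(image, fixation=(4, 4), cell=(4, 4))
peripheral = pooled_response(image, fixation=(4, 0), cell=(4, 4))
```

In this toy setting the same bright pixel gives a sharp response when it is fixated and a diluted one when it sits in the periphery, because the peripheral cell averages over a larger disc.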


In addition to the above studies, the findings were applied to robot vision. A very simple method was proposed to detect an object in a 2-D camera image when many objects are present in the scene. The advantage of this method is its simplicity. In this method, the camera image is processed with high spatial resolution near the optical axis of the camera and lower resolution in the periphery. Such inhomogeneous image processing, which resembles a primate's visual information processing, appears to be very effective for successively determining the position of each object. In the experiment, the method was implemented in a proposed active vision system, which sent the position data of each target object to a robot hand controller to assist manipulation by the robot hand.

Experimental results showed that the position of each object was detected easily and precisely. The system achieved precise localization when simple scenes were displayed. The optical axis of the camera was directed close to each object by the first camera movement in the approximate localization phase; each object was then correctly localized within a few small additional movements in the adjustment localization phase.
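The two-phase behavior described above (one large movement followed by a few small corrections) can be reproduced in a toy simulation. The sketch below is not the authors' system. It assumes integer pixel coordinates and a hypothetical measurement function `observe` whose quantization step grows with eccentricity, so distant targets are seen coarsely and nearby targets precisely; the parameter `coarseness` is likewise an assumption.

```python
def observe(target, fixation, coarseness=0.5):
    """Simulated measurement of the target position from the current fixation.

    Each axis offset is quantized with a step that grows with the
    offset itself, mimicking low resolution in the periphery.
    """
    estimate = []
    for t, f in zip(target, fixation):
        offset = t - f
        step = max(1, int(coarseness * abs(offset)))
        estimate.append(f + round(offset / step) * step)
    return tuple(estimate)


def localize(target, start, max_moves=10):
    """Move the fixation to each new estimate until the estimate stops changing."""
    fixation = start
    path = [fixation]
    for _ in range(max_moves):
        estimate = observe(target, fixation)
        if estimate == fixation:
            break  # the target is already on the optical axis
        fixation = estimate
        path.append(fixation)
    return path


path = localize(target=(40, 25), start=(0, 0))
```

With these assumed parameters the first movement lands one pixel away from the target (approximate localization) and a single further movement removes the remaining error (adjustment localization).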


The figure on the right shows another example of computed fixation using the proposed model. In the image, the intersections of lines and the corners act as visual stimuli, so the computer eye was directed preferentially toward those areas. This tendency matches human fixation. In addition, eye movements in both cases seem to depend on the proximity of the visual stimulus to the fovea. This result also supports the view that processes at the primitive level primarily determine fixation. It suggests that visual information processing at the primitive level, such as retinal processing, fundamentally affects fixation. Moreover, the location of a target object is determined prior to eye movement. It is speculated that each visual stimulus is localized without being identified during fixation. Such a system helps animals calculate a fixation point faster, and a faster decision raises the chance of survival in nature.
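The dependence of eye movements on the proximity of a stimulus to the fovea can be illustrated with a simple scoring rule. The rule below is an assumption made for illustration, not the model from the study: each stimulus is scored by its intensity divided by a term that grows with eccentricity, and the highest score wins. The names `next_fixation` and `falloff` are hypothetical.

```python
import math


def next_fixation(stimuli, fovea, falloff=0.1):
    """Pick the stimulus with the highest proximity-weighted intensity.

    stimuli : list of ((row, col), intensity) pairs
    fovea   : (row, col) of the current fixation point
    falloff : assumed rate at which eccentricity reduces the score
    """
    def score(item):
        (row, col), intensity = item
        ecc = math.hypot(row - fovea[0], col - fovea[1])
        return intensity / (1.0 + falloff * ecc)

    return max(stimuli, key=score)[0]


stimuli = [((0, 10), 1.0), ((0, 50), 1.5)]
near_choice = next_fixation(stimuli, fovea=(0, 0))
far_choice = next_fixation(stimuli, fovea=(0, 45))
```

No stimulus is identified at any point; only position and intensity are used, which is in the spirit of the bottom-up account given in the text. Under this rule a weaker stimulus close to the fovea can be chosen over a stronger one far away.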


The proposed technique can be described as bottom-up processing. Based on a low-level visual feature such as intensity, an object was localized without the assistance of top-down processing. Moreover, the technique did not require any noise reduction or segmentation, which reduced processing time. Some criticisms can be leveled against such simple processing: because low-level visual features mainly determine the fixation point, the localizing ability of the system is still at a primitive level. The experimental results, however, showed that the proposed technique achieves adequate localization in a scene where several objects are placed. It is known that localization can be realized without object discrimination. Hence it is suggested that such primitive processing is a key element of localization and a key to understanding attention, which involves top-down processing.
