We propose two approaches in which voting is used for: i)
response fusion, and ii) action fusion. The first
approach makes use of the ``raw'' responses of the employed visual
cues in the image, which also represents the action space.
Here, the response of each cue is either a binary
(yes/no) answer or a value in the interval [0,1] (these values
are scaled to [0,255] to allow visual monitoring).
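As a minimal sketch of the response-fusion idea (the function and cue names are hypothetical, not from the paper): each cue produces a response map over the image, the per-pixel responses are accumulated as votes, and the pixel with the most support is taken as the tracked position.

```python
import numpy as np

def fuse_responses(response_maps):
    """Sum per-cue response maps and return the winning pixel (row, col)."""
    votes = np.sum(response_maps, axis=0)      # accumulate votes per pixel
    return np.unravel_index(np.argmax(votes), votes.shape)

# Three toy cues over a 4x4 image; two agree on pixel (1, 2).
h, w = 4, 4
cue_a = np.zeros((h, w)); cue_a[1, 2] = 1.0   # binary (yes/no) cue
cue_b = np.zeros((h, w)); cue_b[1, 2] = 0.8   # confidence in [0, 1]
cue_c = np.zeros((h, w)); cue_c[3, 0] = 0.6   # outvoted dissenting cue

print(fuse_responses([cue_a, cue_b, cue_c]))  # -> (1, 2)

# Scaling the fused map to [0, 255] for visual monitoring, as in the text:
display = (np.sum([cue_a, cue_b, cue_c], axis=0) / 3 * 255).astype(np.uint8)
```

Summing is only one possible voting rule; weighted or majority schemes fit the same structure.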
The second approach uses a different action space, represented by a
direction and a speed, see
Fig. Compared to the first approach, where the
position of the tracked region is estimated, this approach can be
viewed as estimating its velocity. Again, each cue votes for
actions from the action space, which is now
the velocity space.
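The action-fusion scheme can be sketched as voting over a discretized velocity space (direction times speed); the direction and speed labels and weights below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Discretized velocity space: each cell is a (direction, speed) action.
DIRECTIONS = ["up", "down", "left", "right", "down-left"]
SPEEDS = ["slow", "fast"]

def fuse_actions(cue_votes):
    """cue_votes: list of (direction, speed, weight) tuples.
    Returns the (direction, speed) cell with the most accumulated votes."""
    grid = np.zeros((len(DIRECTIONS), len(SPEEDS)))
    for direction, speed, weight in cue_votes:
        grid[DIRECTIONS.index(direction), SPEEDS.index(speed)] += weight
    d, s = np.unravel_index(np.argmax(grid), grid.shape)
    return DIRECTIONS[d], SPEEDS[s]

# Two cues agree on moving down-left slowly; one prefers fast right.
votes = [("down-left", "slow", 0.9),
         ("down-left", "slow", 0.7),
         ("right", "fast", 0.8)]
print(fuse_actions(votes))  # -> ('down-left', 'slow')
```

The fused action here is a velocity command rather than a position estimate, matching the distinction drawn between the two approaches.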
Figure caption: Action fusion approach: the desired direction is
(down and left) with a (slow) speed.