Visual tracking that is robust to variations in natural environments is one of the key research issues today. We argue that robust tracking may be achieved in an integrated framework that employs a consensus of several visual cues. This idea has been investigated before, where a Bayesian framework was used for integration , . In , the Incremental Focus of Attention architecture performs tracking in a multi-layered framework: only one modality/cue is used at any given moment, processing occurs in a single layer, and the layer is changed when the a-priori given constraints are met. In contrast to this serial or, according to , strong coupling approach, we propose a parallel or weak coupling framework in which all cues are used at each time step. The importance of each cue, that is, its effect on the overall result, is determined by its assigned weight. Voting is adopted here as the underlying integration strategy . Compared to the Bayesian approaches, voting requires no detailed models of the form which may be difficult or even impossible to determine. A very simple model, or none at all, is used to represent this relationship, which allows voting to operate ``model-free'' with respect to individual cues. In the simplest case, each estimator may be a classifier that votes either for or against a particular attribute, where the level of belief (Dempster-Shafer) or degree of uncertainty (Bayesian) is completely abstracted away to give a binary output.
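The weak coupling idea can be illustrated with a small sketch of weighted voting, in which every cue votes on every candidate at each time step and the weighted votes are summed. This is a minimal sketch under stated assumptions, not the method used in this work: candidates are assumed to be target positions on a small grid, each cue is assumed to be a function returning support in [0, 1], and the cue names (`color`, `motion`, `edges`), their weights, the helper names `weighted_vote` and `binary_vote`, and the 0.5 threshold are all hypothetical illustrations.

```python
from typing import Callable, Dict, Sequence, Tuple

Candidate = Tuple[int, int]             # hypothetical: a (row, col) target position
Cue = Callable[[Candidate], float]      # hypothetical: cue support for a candidate in [0, 1]


def weighted_vote(
    candidates: Sequence[Candidate],
    cues: Dict[str, Cue],
    weights: Dict[str, float],
) -> Tuple[Candidate, Dict[Candidate, float]]:
    """Weak coupling: all cues vote on all candidates; weights set each cue's influence."""
    totals: Dict[Candidate, float] = {
        c: sum(weights[name] * cue(c) for name, cue in cues.items())
        for c in candidates
    }
    # The consensus estimate is the candidate with the largest weighted vote
    # (ties go to the first candidate in the given order).
    best = max(candidates, key=lambda c: totals[c])
    return best, totals


def binary_vote(cue: Cue, threshold: float = 0.5) -> Cue:
    """Abstract a graded cue response into a binary vote for (1) or against (0)."""
    return lambda c: 1.0 if cue(c) >= threshold else 0.0


# Hypothetical example: three toy cues on a 3x3 grid of candidate positions.
candidates = [(r, c) for r in range(3) for c in range(3)]
cues: Dict[str, Cue] = {
    "color": lambda p: 1.0 if p == (1, 1) else 0.2,
    "motion": lambda p: 0.9 if p in {(1, 1), (1, 2)} else 0.0,
    "edges": lambda p: 0.8 if p == (0, 0) else 0.1,
}
weights = {"color": 0.5, "motion": 0.3, "edges": 0.2}

best, totals = weighted_vote(candidates, cues, weights)
binary_cues = {name: binary_vote(cue) for name, cue in cues.items()}
best_binary, totals_binary = weighted_vote(candidates, binary_cues, weights)
print(best, best_binary)
```

In this toy setup both the graded and the binary variants select (1, 1): the `color` and `motion` cues agree there, so their weighted votes outweigh the `edges` cue, which prefers (0, 0). No model of how the cues relate to one another is needed; only the per-cue weights are required.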