Hand gesture and posture recognition play an important role in Human-Computer Interaction (HCI) applications. They are key components of object or environment manipulation through vision-based interfaces. However, before these gestures and postures are interpreted as operational activities, a meaningful involvement with the target object should be detected. This meaningful involvement is called engagement. Upper-body posture gives significant information about user engagement.
In this research, our first contribution is a novel multi-modal model for engagement detection, called the Disengagement, Attention, Intention, Action (DAIA) framework. Disengagement occurs when the user is disengaged from the target object. Attention occurs when the user pays attention to the target but does not intend to take any action. In the Intention state, the user intends to perform an action but has not yet performed it.
The Action state is when the user is performing an action with the hand. Using DAIA, the spectrum of mental states involved in performing a manipulative action is quantized into a finite number of engagement states. The second contribution of this research is the design of multiple binary classifiers based on upper-body postures for state detection. 3D skeleton data is extracted from the depth image and used to derive body posture information. Combining the outputs of all binary classifiers in a fixed order yields the engagement feature vector. Moreover, this feature vector could be extended using other channels of biometric information such as voice or gaze. Although the engagement classifiers recognize state changes with acceptable accuracy, minor changes in body posture or false detections of joint locations lasting a few milliseconds may cause a transition to another state. To remove this unwanted noise and increase the accuracy of the system, a Finite State Machine (FSM) is designed based on the properties of human activities. The design of the Engagement FSM is our third major contribution. Finally, a rotation matrix is used to increase the number of samples for training the deep learning classifier for hand posture recognition.
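
The noise-suppression role of the FSM can be illustrated with a minimal sketch. It is not the Engagement FSM described in this work, whose transition rules are not given here. It assumes a simple persistence rule: a new state reported by the classifiers is accepted only after it has been observed for a number of consecutive frames. The names `State`, `EngagementFSM`, `smooth`, and the threshold `min_frames` are hypothetical and chosen only for illustration.

```python
from enum import IntEnum


class State(IntEnum):
    """The four DAIA engagement states."""
    DISENGAGEMENT = 0
    ATTENTION = 1
    INTENTION = 2
    ACTION = 3


class EngagementFSM:
    """Illustrative filter: accept a state change only if it persists.

    Assumption: a transition is committed once the same new state has been
    observed for `min_frames` consecutive frames (hypothetical rule).
    """

    def __init__(self, min_frames=3, initial=State.DISENGAGEMENT):
        self.min_frames = min_frames
        self.state = initial
        self._candidate = initial  # pending state awaiting confirmation
        self._count = 0            # consecutive frames seen for the candidate

    def update(self, observed):
        """Feed one per-frame classifier output; return the filtered state."""
        if observed == self.state:
            # Observation agrees with the current state: drop any pending change.
            self._candidate, self._count = self.state, 0
            return self.state
        if observed == self._candidate:
            self._count += 1
        else:
            # A different new state appeared: restart the persistence counter.
            self._candidate, self._count = observed, 1
        if self._count >= self.min_frames:
            self.state, self._count = observed, 0
        return self.state


def smooth(observations, min_frames=3):
    """Run a fresh FSM over a sequence of raw per-frame states."""
    fsm = EngagementFSM(min_frames=min_frames)
    return [fsm.update(obs) for obs in observations]
```

With `min_frames=3`, a single-frame flicker to another state is ignored, while a state that persists for three frames is committed. The cost of this design is a delay of `min_frames - 1` frames before a genuine transition is reported.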