Publication
An intelligent system for detection and identification of human behaviors from unconstrained video
| Abstract: | In this work, an intelligent system for human action recognition and trajectory destination prediction from unconstrained video is presented. For automatic human action recognition, the video is processed frame by frame, and blob analysis is performed to detect active (foreground) blobs. To retain only humans and remove noise, we set a minimum blob area of 2000 pixels (e.g., 45x45 or 60x34) after preliminary experiments. For background subtraction, we tested Gaussian Mixture Models (GMMs) to separate foreground pixels from the background. This detector operates on video from a stationary camera. It compares each color or grayscale frame with a background model to determine whether each pixel belongs to the background or the foreground, and then computes a foreground mask based on the GMM. Human action recognition in our system is based on Hidden Markov Models (HMMs) combined with the Bag of Words (BoW) method, with the human boundary as the main feature. Time-sequential images of human actions were transformed into feature vectors. We targeted two action classes, walking and sitting, and achieved high overall accuracy. The proposed system for trajectory destination area prediction uses passively collected video, works directly on raw video data, and extracts motion features (position, velocity, and acceleration) from automatically detected human skeletons (positions of the center of mass, head, hands, and legs). It comprises three main modules. The first is human blob detection, an enhanced version of the earlier human blob detection that yields improved silhouettes. The second is star skeleton detection, encompassing shadow removal and contour peak detection. The third is destination area prediction, based on preprocessing (dimensionality reduction and class-balancing sampling methods) and four classification methods: Multinomial Logistic Regression (MLR), Multilayer Perceptron (MLP) network, Random Forests (RF), and Support Vector Machine (SVM). For this second main task of the PhD, trajectory destination area prediction, human blob detection was extended with an additional component: shadow and highlight removal. We also replaced the GMM background segmentation with a simpler background subtraction method, which provided faster and better results. As a case study, we analyzed an outdoor scene of a university campus that includes five main destination areas and 348 pedestrian trajectories from 171 videos. A realistic growing-window evaluation was used to test the four classifiers under six data processing combinations. The best results were achieved by the RF model using all inputs and undersampling. This model obtained the best global Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) analysis, corresponding to high-quality class discrimination (median AUC of 87%). Moreover, the model provided very good ahead-of-time predictions for four of the five classes (A, B, C, and D) and reasonable ahead-of-time discrimination for class E. These results were obtained in the university campus case study, but we believe the proposed system can be useful in other application scenarios. |
|---|---|
| Main authors: | Afsar, Palwasha |
| Subject: | Engineering and Technology::Electrical Engineering, Electronic Engineering, Information Engineering |
| Year: | 2018 |
| Country: | Portugal |
| Document type: | doctoral thesis |
| Access type: | open access |
| Associated institution: | Universidade do Minho |
| Language: | English |
| Source: | RepositóriUM - Universidade do Minho |
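The blob selection step described in the abstract is background subtraction followed by discarding every blob smaller than 2000 pixels. A minimal pure-Python sketch of that step follows. It is not the thesis code. The grayscale list-of-lists frames, the difference threshold of 30, the use of 4-connectivity, and the names `foreground_mask` and `find_blobs` are all assumptions made for illustration.

```python
from collections import deque

MIN_BLOB_AREA = 2000  # minimum blob area in pixels, as reported in the abstract


def foreground_mask(frame, background, threshold=30):
    """Simple background subtraction on grayscale frames (lists of rows).

    A pixel is foreground when its absolute difference from the stored
    background exceeds `threshold` (an assumed value).
    """
    return [[abs(p - b) > threshold for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def find_blobs(mask, min_area=MIN_BLOB_AREA):
    """Return the 4-connected foreground components with at least `min_area` pixels."""
    rows, cols = len(mask), (len(mask[0]) if mask else 0)
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected component.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Small components are treated as noise and discarded.
            if len(component) >= min_area:
                blobs.append(component)
    return blobs
```

With this rule, a 45x45 blob (2025 pixels) or a 60x34 blob (2040 pixels) is kept, and a small noise patch is dropped. In practice a library routine such as OpenCV's `connectedComponentsWithStats` would do the labelling. The explicit flood fill just makes the area criterion visible.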
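The abstract also mentions a GMM background model. The general idea, in the Stauffer-Grimson style, is that each pixel keeps a small mixture of Gaussians. Components with high weight and low variance are treated as background, and a pixel value that matches none of them is foreground. The sketch below models a single grayscale pixel. The class name `PixelGMM`, the number of components, the learning rate, the 2.5-sigma matching rule, and the simplified update (the learning rate reused for the mean and variance) are all assumptions. For real video, a library implementation such as OpenCV's `createBackgroundSubtractorMOG2` would normally be used instead.

```python
import math


class PixelGMM:
    """Simplified Stauffer-Grimson-style Gaussian mixture for one grayscale pixel.

    All default parameter values are illustrative assumptions.
    """

    def __init__(self, k=3, alpha=0.01, init_var=225.0, min_var=4.0,
                 match_sigmas=2.5, bg_ratio=0.7):
        self.k = k                    # maximum number of Gaussian components
        self.alpha = alpha            # learning rate
        self.init_var = init_var      # variance of a newly created component
        self.min_var = min_var        # variance floor to avoid degenerate components
        self.match_sigmas = match_sigmas
        self.bg_ratio = bg_ratio      # share of total weight modelled as background
        self.weights, self.means, self.vars = [], [], []

    def update(self, x):
        """Add intensity x to the model and return True if x is foreground."""
        matched = next((i for i, (m, v) in enumerate(zip(self.means, self.vars))
                        if abs(x - m) <= self.match_sigmas * math.sqrt(v)), None)
        # Every weight decays; the matched (or new) component is reinforced.
        self.weights = [(1 - self.alpha) * w for w in self.weights]
        if matched is None:
            if len(self.means) < self.k:
                self.weights.append(0.0)
                self.means.append(0.0)
                self.vars.append(0.0)
                matched = len(self.means) - 1
            else:
                # Replace the least probable component with one centred on x.
                matched = min(range(self.k), key=lambda i: self.weights[i])
            self.weights[matched] = self.alpha
            self.means[matched] = float(x)
            self.vars[matched] = self.init_var
        else:
            self.weights[matched] += self.alpha
            # Simplified update: the learning rate alpha is reused for mean and variance.
            self.means[matched] += self.alpha * (x - self.means[matched])
            diff2 = (x - self.means[matched]) ** 2
            self.vars[matched] = max(self.min_var,
                                     self.vars[matched] + self.alpha * (diff2 - self.vars[matched]))
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
        return matched not in self._background()

    def _background(self):
        """Indices of the components that together explain bg_ratio of the weight."""
        order = sorted(range(len(self.means)),
                       key=lambda i: self.weights[i] / math.sqrt(self.vars[i]),
                       reverse=True)
        chosen, cumulative = set(), 0.0
        for i in order:
            chosen.add(i)
            cumulative += self.weights[i]
            if cumulative > self.bg_ratio:
                break
        return chosen
```

After a stable intensity has been observed for a while, a sudden different value (for example, a person entering the pixel) is flagged as foreground. Returning to the old value is classified as background again.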
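The abstract says that action recognition combines HMMs with a Bag of Words representation, but it does not give the model details. One common way to combine them is sketched below under that assumption. Each per-frame feature vector is quantized to its nearest codeword, and one discrete HMM is built per action class. A new sequence is then labelled with the class whose HMM gives it the highest likelihood, computed with the scaled forward algorithm.

The codebook, the two-state models, and all parameter values are hypothetical. In practice they would be learned, for example with k-means for the codebook and Baum-Welch for the HMMs.

```python
import math


def quantize(vector, codebook):
    """Map a feature vector to the index of its nearest codeword (squared Euclidean)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(vector, codebook[k])))


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-output HMM."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    scale = sum(alpha)
    log_p = math.log(scale)
    alpha = [a / scale for a in alpha]
    for o in obs[1:]:
        # Propagate through the transition matrix, then weight by the emission.
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
        scale = sum(alpha)
        log_p += math.log(scale)  # accumulating log scales avoids underflow
        alpha = [a / scale for a in alpha]
    return log_p


def classify(obs, models):
    """Return the action label whose HMM assigns obs the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))


# Hypothetical codebook of three 2-D codewords.
CODEBOOK = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]

# Hypothetical 2-state models as (start, transition, emission) triples.
MODELS = {
    "walking": ([0.5, 0.5],
                [[0.1, 0.9], [0.9, 0.1]],
                [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
    "sitting": ([0.9, 0.1],
                [[0.9, 0.1], [0.1, 0.9]],
                [[0.1, 0.8, 0.1], [0.3, 0.4, 0.3]]),
}
```

In these toy models, the "walking" HMM alternates between codewords 0 and 2, while the "sitting" HMM mostly emits codeword 1. A sequence that alternates 0, 2, 0, 2 is therefore labelled as walking.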
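The motion features named in the abstract are position, velocity, and acceleration. They can be derived from a tracked point by finite differences. The sketch below assumes backward differences over consecutive frames with a fixed frame interval `dt`. The thesis may use a different scheme or additional smoothing. It would be applied separately to each skeleton point (center of mass, head, hands, legs). The function name `motion_features` is hypothetical.

```python
def motion_features(positions, dt=1.0):
    """Finite-difference velocity and acceleration for a sequence of (x, y) points.

    Returns one (x, y, vx, vy, ax, ay) tuple per frame from the third onward,
    because the acceleration estimate needs the two previous positions.
    """
    feats = []
    for t in range(2, len(positions)):
        (x0, y0), (x1, y1), (x2, y2) = positions[t - 2], positions[t - 1], positions[t]
        # First backward difference approximates velocity.
        vx, vy = (x2 - x1) / dt, (y2 - y1) / dt
        # Second difference approximates acceleration.
        ax, ay = (x2 - 2 * x1 + x0) / dt ** 2, (y2 - 2 * y1 + y0) / dt ** 2
        feats.append((x2, y2, vx, vy, ax, ay))
    return feats
```

For a point moving as x = t², y = 3t, the second difference in x is constant (2 per frame²), and the y acceleration is zero.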
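Star skeleton detection with contour peak detection is commonly computed as follows. First, the centroid of the silhouette boundary is found. Next, the distance from the centroid to every boundary point is measured in traversal order. Finally, this distance signal is smoothed, and its local maxima are taken as the extremities (head, hands, feet). The sketch below follows that general recipe. It uses the mean of the boundary points as the centroid and a circular moving average for smoothing. Both are simplifying assumptions, as are the window half-width and the name `star_skeleton`. The thesis implementation may differ.

```python
import math


def star_skeleton(contour, smooth=2):
    """Return the centroid and the extremal points of a closed contour.

    contour: list of (x, y) boundary points in traversal order.
    smooth: half-width of the circular moving-average window (assumed value).
    """
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    # Distance from the centroid to each boundary point, in contour order.
    dist = [math.hypot(x - cx, y - cy) for x, y in contour]
    # Circular moving average suppresses small contour noise before peak picking.
    w = 2 * smooth + 1
    smoothed = [sum(dist[(i + k) % n] for k in range(-smooth, smooth + 1)) / w
                for i in range(n)]
    # A point is a peak if it is strictly greater than both circular neighbours.
    peaks = [contour[i] for i in range(n)
             if smoothed[i] > smoothed[i - 1] and smoothed[i] > smoothed[(i + 1) % n]]
    return (cx, cy), peaks
```

The star skeleton is then the set of segments joining the centroid to each returned peak. For a synthetic five-pointed contour, exactly five peaks are found.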
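The abstract names shadow and highlight removal but gives no formulation. One widely used approach, which may or may not be the thesis's, is a Horprasert-style color model. It compares each pixel with its background color through two quantities:

- a brightness distortion (the scale factor that best maps the background color onto the pixel);
- a chromaticity distortion (the residual color difference after that scaling).

A pixel with the background's chromaticity but lower brightness is labelled a shadow, and one with higher brightness is labelled a highlight. This sketch omits the per-channel variance normalization of the full model. The thresholds and the name `label_pixel` are hypothetical.

```python
import math


def label_pixel(pixel, background, cd_max=10.0, shadow_low=0.5, highlight_high=1.3):
    """Label an RGB pixel as 'background', 'shadow', 'highlight' or 'foreground'.

    Thresholds are illustrative assumptions, not values from the thesis.
    """
    norm2 = sum(b * b for b in background)
    # Brightness distortion: least-squares scale of the background colour.
    alpha = sum(p * b for p, b in zip(pixel, background)) / norm2 if norm2 else 0.0
    # Chromaticity distortion: colour difference left after brightness scaling.
    cd = math.sqrt(sum((p - alpha * b) ** 2 for p, b in zip(pixel, background)))
    if cd > cd_max:
        return "foreground"           # colour itself differs from the background
    if 0.95 <= alpha <= 1.05:
        return "background"
    if shadow_low <= alpha < 0.95:
        return "shadow"               # same chromaticity, darker
    if 1.05 < alpha <= highlight_high:
        return "highlight"            # same chromaticity, brighter
    return "foreground"               # far too dark or too bright to be shadow/highlight
```

Pixels labelled as shadow or highlight would then be removed from the foreground mask before blob and skeleton extraction.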
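The growing-window evaluation mentioned in the abstract can be read as follows. Trajectories are kept in temporal order, and each model is trained on everything observed so far and tested on the next block. The window then grows to include that block. This sketch generates such index splits. The initial training size and test block size are hypothetical parameters, since the abstract does not give them, and the name `growing_window_splits` is an assumption.

```python
def growing_window_splits(n_samples, initial_train, test_size):
    """Yield (train_indices, test_indices) for a growing-window evaluation.

    Samples are assumed to be in chronological order. Each iteration trains on
    all samples seen so far and tests on the next `test_size` samples; the
    training window then grows to include them.
    """
    end = initial_train
    while end < n_samples:
        test_end = min(end + test_size, n_samples)
        yield list(range(0, end)), list(range(end, test_end))
        end = test_end
```

Unlike random cross-validation, no future trajectory is ever used to train a model that is evaluated on an earlier one. This is what makes the protocol realistic for deployment.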
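The best model in the abstract uses undersampling to balance the destination classes. A minimal form is random undersampling, in which every class is reduced to the size of the rarest class. That exact variant is an assumption, as is the name `random_undersample`. Standard practice would apply it to the training part of each growing-window iteration only, never to the test block.

```python
import random
from collections import defaultdict


def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class has as many samples as the rarest one."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    # Sample n_min indices per class, then restore the original order.
    keep = sorted(i for idx in by_class.values() for i in rng.sample(idx, n_min))
    return [X[i] for i in keep], [y[i] for i in keep]
```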
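The results are reported as ROC AUC values. For one class, the AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting one half. For the five destination areas, a one-vs-rest AUC can be computed per class from predicted class probabilities. The abstract does not say how class AUCs are combined into the "global" value or over which runs the median is taken. Averaging over classes and taking the median over growing-window iterations is one plausible reading, not a confirmed one. The function names below are hypothetical.

```python
def binary_auc(scores, labels):
    """ROC AUC as P(random positive outscores random negative), ties counting 1/2.

    Uses an O(P*N) pairwise count, which is fine for small evaluation sets.
    """
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_aucs(prob_rows, y, classes):
    """Per-class AUC, treating each class in turn as positive against all others.

    prob_rows[i][k] is the predicted probability of classes[k] for sample i.
    """
    return {c: binary_auc([row[k] for row in prob_rows], [label == c for label in y])
            for k, c in enumerate(classes)}
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a median of 87% indicates strong but imperfect class discrimination.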