Publication
Deep Learning for activity recognition in real-time video streams
| Abstract: | In an increasingly connected world, smart cities are becoming ever more present in our society. In these smart cities, the number of use cases in which innovations benefit inhabitants and improve their quality of life is also growing. One of these areas is safety, in which Machine Learning (ML) models show potential for real-time video-stream analysis to determine whether violence is present. These ML approaches belong to the field of Computer Vision, which is responsible for interpreting digital images and videos and extracting knowledge and understandable information from them for use in diverse contexts. Some of the available alternatives for recognising actions in video streams are based on ML approaches such as Deep Learning (DL), which has grown in popularity in recent years as its massive potential became apparent in several applications that could benefit from having a machine recognise diverse human actions. In this project, the creation of an ML model that can determine whether violence exists in a video stream is proposed. This model leverages technology used in State of the Art methods, such as video classifiers, but also audio classifiers and Early/Late Fusion (EF/LF) schemes that allow different modalities to be merged, in the present work audio and video. Conclusions are also drawn about the accuracy rates of the different types of classifiers, to determine whether any other type of classifier should have more prominence in the State of the Art. This document begins with an introduction to the work being conducted, in which its context, motivation and objectives are explained. Afterwards, the methodology used to conduct the research in this Thesis more efficiently is clarified. Following that, the State of the Art concerning ML-based approaches to Action Recognition and Violence Detection is explored. With the State of the Art approaches reviewed, the following chapter details the training method employed to train the models seen as the best candidates for detecting violence. Subsequently, the selected models are scrutinised to better understand their architecture and why they are suited to detecting violence. Afterwards, the results achieved by these models are explored to better understand how well they performed. Lastly, the conclusions reached after conducting this research are stated, and possibilities for expanding this work further are presented. The obtained results prove the success and prevalence of video classifiers, and also show the efficacy of models that make use of some kind of fusion. |
|---|---|
| Main authors: | Reinolds, Francisco André Vieira |
| Subject: | Machine Learning; Deep Learning; Action recognition; Violence detection; Early fusion; Late fusion (Portuguese keywords: Aprendizagem Máquina; Aprendizagem Profunda; Reconhecimento de ações; Deteção de violência; Fusão antecipada; Fusão tardia) |
| Year: | 2022 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade do Minho |
| Language: | English |
| Source: | RepositóriUM - Universidade do Minho |