Publication

Adjoint differentiation for matrix spectrum cutoff operation

Bibliographic details
Abstract:	Adjoint differentiation is based on a sequence of arithmetic operations, so it requires a very large amount of memory to store all intermediate derivatives: $O(N^3)$ for matrix multiplication. With an analytical formula, memory usage can be reduced to that of the matrices alone, $O(N^2)$. Consider an operation $f(A)$, where $f$ is a scalar function and $A$ is a symmetric matrix. This operation is usually performed in the orthonormal basis (matrix $U$) formed by the eigenvectors of $A$. There is an analytical formula for the adjoint $\bar{A}$ that does not require $\bar{U}$. In some applications, for example multi-linear regression, large matrices commonly have a considerable number of eigenvalues close to 0, so it can be beneficial to use a low-rank (rank $k$) approximation of the original matrix. This can be done with several reduced singular value decomposition (SVD) algorithms. The complete matrix $U$ is then not known, so the analytical formula cannot be applied in a straightforward manner. We define $U_k$ as the first $k$ columns of $U$ (eigenvectors of $A$); this matrix can be found using a reduced SVD algorithm. Given a threshold $\epsilon$, we define the spectrum cutoff function as $f(A) = \sum_{\lambda_i > \epsilon} \lambda_i u_i u_i^T$. The threshold should be related to $k$ by $\lambda_{k+1} < \epsilon < \lambda_k$ for eigenvalues sorted in descending order. In this work, we propose an approximate formula for $\bar{A}$ that uses the results of a projection-based randomized algorithm for SVD and the reduced matrix $U_k$, which saves memory. We also created an interface that can be used with any given matrix and configured with several parameters. Our findings showed that we need to extend the matrix $U_k$, namely by using a QR decomposition to find additional column vectors that maintain the orthogonality of $U$. We also need to assume a default value for the unknown eigenvalues of $A$. Our final results showed an MRE of 1.85% for both small and large matrices when they have $k$ small eigenvalues and $N - k$ large ones.
We did not find an improvement in computational time, but the approach still saves a considerable amount of memory. We concluded that this is a good approach for computing adjoint derivatives of large matrices, in particular for the applications mentioned above.
Main authors:	Rodrigues, Alexandre da Rocha
Subject:	Adjoint differentiation; Eigenvalues and eigenvectors; Matrix decomposition; QR; SVD
Year:	2023
Country:	Portugal
Document type:	master's dissertation
Access type:	open access
Affiliated institution:	Universidade de Aveiro
Language:	English
Source:	RIA - Repositório Institucional da Universidade de Aveiro

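To make the spectrum cutoff function and its adjoint concrete, the following is a minimal NumPy sketch. It is not the approximate formula proposed in the dissertation: it uses a full eigendecomposition (`np.linalg.eigh`) rather than a randomized reduced SVD, and it applies the standard divided-difference form of the adjoint of a symmetric matrix function, which may differ in presentation from the analytical formula the abstract refers to. The function names `spectrum_cutoff` and `spectrum_cutoff_adjoint`, the test matrix, its eigenvalues, and the threshold `eps = 1.0` are all assumptions made for illustration.

```python
import numpy as np


def spectrum_cutoff(A, eps):
    """f(A) = sum over eigenvalues lam_i > eps of lam_i * u_i u_i^T (A symmetric)."""
    lam, U = np.linalg.eigh(A)
    keep = lam > eps
    Uk = U[:, keep]                      # eigenvectors whose eigenvalue exceeds eps
    return (Uk * lam[keep]) @ Uk.T       # scale each kept column by its eigenvalue


def spectrum_cutoff_adjoint(A, eps, F_bar, tol=1e-12):
    """Adjoint A_bar of the cutoff, given the adjoint F_bar of its output.

    Assumed formula (standard for functions of a symmetric matrix):
        A_bar = U [ (U^T F_bar U) * G ] U^T,
    with G_ij = (f(lam_i) - f(lam_j)) / (lam_i - lam_j) and G_ii = f'(lam_i),
    where f(lam) = lam if lam > eps else 0. No adjoint of U is needed.
    """
    lam, U = np.linalg.eigh(A)
    f = np.where(lam > eps, lam, 0.0)
    fprime = (lam > eps).astype(float)
    dlam = lam[:, None] - lam[None, :]
    df = f[:, None] - f[None, :]
    close = np.abs(dlam) < tol           # diagonal and (near-)repeated eigenvalues
    G = np.where(close, fprime[:, None], df / np.where(close, 1.0, dlam))
    return U @ ((U.T @ F_bar @ U) * G) @ U.T


# Illustrative test matrix: 3 eigenvalues above the threshold, 3 close to 0.
rng = np.random.default_rng(0)
N = 6
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
lam_true = np.array([5.0, 4.0, 3.0, 0.02, 0.01, 0.005])
A = (Q * lam_true) @ Q.T
A = 0.5 * (A + A.T)                      # remove round-off asymmetry
eps = 1.0

F = spectrum_cutoff(A, eps)
F_bar = rng.standard_normal((N, N))      # arbitrary upstream adjoint
A_bar = spectrum_cutoff_adjoint(A, eps, F_bar)

# Check against a central finite difference along a symmetric direction dA.
dA = rng.standard_normal((N, N))
dA = 0.5 * (dA + dA.T)
h = 1e-6
loss_plus = np.sum(F_bar * spectrum_cutoff(A + h * dA, eps))
loss_minus = np.sum(F_bar * spectrum_cutoff(A - h * dA, eps))
fd = (loss_plus - loss_minus) / (2 * h)
an = np.sum(A_bar * dA)
print("finite difference:", fd)
print("analytical adjoint:", an)
```

The sketch only needs the eigenvalues and eigenvectors of `A` plus `F_bar`, which reflects the $O(N^2)$ memory argument in the abstract. The finite-difference check is meaningful only while no eigenvalue crosses `eps` under the perturbation, since the cutoff is discontinuous there.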