Shifted Cholesky QR for Computing the QR Factorization of Ill-Conditioned Matrices
Authors: Fukaya, Takeshi; Kannan, Ramaseshan; Nakatsukasa, Yuji; Yamamoto, Yusaku; Yanagisawa, Yuka
Keywords: QR factorization; Cholesky QR factorization; oblique inner product; roundoff error analysis; communication-avoiding algorithms

Abstract: The Cholesky QR algorithm is an efficient communication-minimizing algorithm for computing the QR factorization of a tall-skinny matrix $X \in \mathbb{R}^{m \times n}$, where $m \gg n$. Unfortunately it is inherently unstable and often breaks down when the matrix is ill-conditioned. A recent work [Yamamoto et al., ETNA, 44, pp. 306--326 (2015)] establishes that the instability can be cured by repeating the algorithm twice (called CholeskyQR2). However, the applicability of CholeskyQR2 is still limited by the requirement that the Cholesky factorization of the Gram matrix $X^\top X$ runs to completion, which means that it does not always work for matrices $X$ with the 2-norm condition number $\kappa_2(X)$ roughly greater than $u^{-1/2}$, where $u$ is the unit roundoff. In this work we extend the applicability to $\kappa_2(X) = O(u^{-1})$ by introducing a shift to the computed Gram matrix so as to guarantee the Cholesky factorization $R^\top R = A^\top A + sI$ succeeds numerically. We show that the computed $AR^{-1}$ has reduced condition number that is roughly bounded by $u^{-1/2}$, for which CholeskyQR2 safely computes the QR factorization, yielding a computed $Q$ of orthogonality $\|Q^\top Q - I\|_2$ and residual $\|A - QR\|_F / \|A\|_F$ both of the order of $u$. Thus we obtain the required QR factorization by essentially running Cholesky QR thrice.
We extensively analyze the resulting algorithm, shiftedCholeskyQR3, to reveal its excellent numerical stability. The shiftedCholeskyQR3 algorithm is also highly parallelizable, and it remains applicable and effective when working with an oblique inner product. We illustrate our findings through experiments, in which we achieve significant speedup over alternative methods.

Publisher: Society for Industrial and Applied Mathematics (SIAM)
Type: Journal Article
URI: http://hdl.handle.net/2115/79165
ISSN: 1064-8275
Journal: SIAM Journal on Scientific Computing, 42(1), A477-A503
Date: 2020-02-20
Language: eng
DOI: 10.1137/18M1218212
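
The procedure described in the abstract (one shifted Cholesky QR pass followed by CholeskyQR2) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names are hypothetical, matrices are stored as plain Python lists of rows to keep the example dependency-free, and the shift $s = 11(mn + n(n+1))\,u\,\|A\|_F^2$ is an assumption of this sketch, since the abstract does not state the shift size. A practical code would use BLAS/LAPACK routines instead.

```python
import math
import sys


def gram(A):
    """Return the Gram matrix A^T A for A given as a list of rows."""
    n = len(A[0])
    G = [[0.0] * n for _ in range(n)]
    for row in A:
        for i in range(n):
            for j in range(i, n):
                G[i][j] += row[i] * row[j]
    for i in range(n):
        for j in range(i):
            G[i][j] = G[j][i]  # mirror the upper triangle
    return G


def cholesky_upper(G):
    """Return upper-triangular R with R^T R = G, or raise on breakdown."""
    n = len(G)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d = G[i][i] - sum(R[k][i] ** 2 for k in range(i))
        if d <= 0.0:
            raise ValueError("Cholesky breakdown: matrix not numerically positive definite")
        R[i][i] = math.sqrt(d)
        for j in range(i + 1, n):
            R[i][j] = (G[i][j] - sum(R[k][i] * R[k][j] for k in range(i))) / R[i][i]
    return R


def solve_right(A, R):
    """Return Q = A R^{-1} by solving q R = a for every row a of A."""
    n = len(R)
    Q = []
    for a in A:
        q = [0.0] * n
        for j in range(n):
            q[j] = (a[j] - sum(q[k] * R[k][j] for k in range(j))) / R[j][j]
        Q.append(q)
    return Q


def matmul(B, C):
    """Return the product of two square matrices of equal size."""
    n = len(C)
    return [[sum(B[i][k] * C[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cholesky_qr(A, shift=0.0):
    """One Cholesky QR pass: factor A^T A + shift*I = R^T R, then Q = A R^{-1}."""
    G = gram(A)
    for i in range(len(G)):
        G[i][i] += shift
    R = cholesky_upper(G)
    return solve_right(A, R), R


def shifted_cholesky_qr3(A):
    """Shifted Cholesky QR pass followed by two unshifted passes (CholeskyQR2)."""
    m, n = len(A), len(A[0])
    u = sys.float_info.epsilon / 2  # unit roundoff in double precision
    fro2 = sum(x * x for row in A for x in row)  # ||A||_F^2, an upper bound on ||A||_2^2
    s = 11 * (m * n + n * (n + 1)) * u * fro2  # assumed shift size (see lead-in)
    Q1, R1 = cholesky_qr(A, shift=s)  # preconditioning step
    Q2, R2 = cholesky_qr(Q1)          # CholeskyQR2, first pass
    Q3, R3 = cholesky_qr(Q2)          # CholeskyQR2, second pass
    return Q3, matmul(R3, matmul(R2, R1))  # A = Q3 (R3 R2 R1)
```

As a usage example, the 4-by-3 matrix with first row `[1, 1, 1]` and `1e-9` on the diagonal of the remaining rows has a condition number above $u^{-1/2}$: its Gram matrix rounds to the all-ones matrix, so `cholesky_qr(A)` raises `ValueError`, while `shifted_cholesky_qr3(A)` runs to completion.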