1 SUBROUTINE DGESVJ( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V,
2 $ LDV, WORK, LWORK, INFO )
3 *
4 * -- LAPACK routine (version 3.3.1) --
5 *
6 * -- Contributed by Zlatko Drmac of the University of Zagreb and --
7 * -- Kresimir Veselic of the Fernuniversitaet Hagen --
8 * -- April 2011 --
9 *
10 * -- LAPACK is a software package provided by Univ. of Tennessee, --
11 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
12 *
13 * This routine is also part of SIGMA (version 1.23, October 23. 2008.)
14 * SIGMA is a library of algorithms for highly accurate algorithms for
15 * computation of SVD, PSVD, QSVD, (H,K)-SVD, and for solution of the
16 * eigenvalue problems Hx = lambda M x, H M x = lambda x with H, M > 0.
17 *
18 IMPLICIT NONE
19 * ..
20 * .. Scalar Arguments ..
21 INTEGER INFO, LDA, LDV, LWORK, M, MV, N
22 CHARACTER*1 JOBA, JOBU, JOBV
23 * ..
24 * .. Array Arguments ..
25 DOUBLE PRECISION A( LDA, * ), SVA( N ), V( LDV, * ),
26 $ WORK( LWORK )
27 * ..
28 *
29 * Purpose
30 * =======
31 *
32 * DGESVJ computes the singular value decomposition (SVD) of a real
33 * M-by-N matrix A, where M >= N. The SVD of A is written as
34 * [++] [xx] [x0] [xx]
35 * A = U * SIGMA * V^t, [++] = [xx] * [ox] * [xx]
36 * [++] [xx]
37 * where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
38 * matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
39 * of SIGMA are the singular values of A. The columns of U and V are the
40 * left and the right singular vectors of A, respectively.
41 *
42 * Further Details
43 * ~~~~~~~~~~~~~~~
44 * The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane
45 * rotations. The rotations are implemented as fast scaled rotations of
46 * Anda and Park [1]. In the case of underflow of the Jacobi angle, a
47 * modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses
48 * column interchanges of de Rijk [2]. The relative accuracy of the computed
49 * singular values and the accuracy of the computed singular vectors (in
50 * angle metric) is as guaranteed by the theory of Demmel and Veselic [3].
51 * The condition number that determines the accuracy in the full rank case
52 * is essentially min_{D=diag} kappa(A*D), where kappa(.) is the
53 * spectral condition number. The best performance of this Jacobi SVD
54 * procedure is achieved if used in an accelerated version of Drmac and
55 * Veselic [5,6], and it is the kernel routine in the SIGMA library [7].
56 * Some tunning parameters (marked with [TP]) are available for the
57 * implementer.
58 * The computational range for the nonzero singular values is the machine
59 * number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even
60 * denormalized singular values can be computed with the corresponding
61 * gradual loss of accurate digits.
62 *
63 * Contributors
64 * ~~~~~~~~~~~~
65 * Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
66 *
67 * References
68 * ~~~~~~~~~~
69 * [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling.
70 * SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174.
71 * [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the
72 * singular value decomposition on a vector computer.
73 * SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371.
74 * [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR.
75 * [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular
76 * value computation in floating point arithmetic.
77 * SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222.
78 * [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
79 * SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
80 * LAPACK Working note 169.
81 * [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
82 * SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
83 * LAPACK Working note 170.
84 * [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
85 * QSVD, (H,K)-SVD computations.
86 * Department of Mathematics, University of Zagreb, 2008.
87 *
88 * Bugs, Examples and Comments
89 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~
90 * Please report all bugs and send interesting test examples and comments to
91 * drmac@math.hr. Thank you.
92 *
93 * Arguments
94 * =========
95 *
96 * JOBA (input) CHARACTER* 1
97 * Specifies the structure of A.
98 * = 'L': The input matrix A is lower triangular;
99 * = 'U': The input matrix A is upper triangular;
100 * = 'G': The input matrix A is general M-by-N matrix, M >= N.
101 *
102 * JOBU (input) CHARACTER*1
103 * Specifies whether to compute the left singular vectors
104 * (columns of U):
105 * = 'U': The left singular vectors corresponding to the nonzero
106 * singular values are computed and returned in the leading
107 * columns of A. See more details in the description of A.
108 * The default numerical orthogonality threshold is set to
109 * approximately TOL=CTOL*EPS, CTOL=DSQRT(M), EPS=DLAMCH('E').
110 * = 'C': Analogous to JOBU='U', except that user can control the
111 * level of numerical orthogonality of the computed left
112 * singular vectors. TOL can be set to TOL = CTOL*EPS, where
113 * CTOL is given on input in the array WORK.
114 * No CTOL smaller than ONE is allowed. CTOL greater
115 * than 1 / EPS is meaningless. The option 'C'
116 * can be used if M*EPS is satisfactory orthogonality
117 * of the computed left singular vectors, so CTOL=M could
118 * save few sweeps of Jacobi rotations.
119 * See the descriptions of A and WORK(1).
120 * = 'N': The matrix U is not computed. However, see the
121 * description of A.
122 *
123 * JOBV (input) CHARACTER*1
124 * Specifies whether to compute the right singular vectors, that
125 * is, the matrix V:
126 * = 'V' : the matrix V is computed and returned in the array V
127 * = 'A' : the Jacobi rotations are applied to the MV-by-N
128 * array V. In other words, the right singular vector
129 * matrix V is not computed explicitly, instead it is
130 * applied to an MV-by-N matrix initially stored in the
131 * first MV rows of V.
132 * = 'N' : the matrix V is not computed and the array V is not
133 * referenced
134 *
135 * M (input) INTEGER
136 * The number of rows of the input matrix A. 1/DLAMCH('E') > M >= 0.
137 *
138 * N (input) INTEGER
139 * The number of columns of the input matrix A.
140 * M >= N >= 0.
141 *
142 * A (input/output) DOUBLE PRECISION array, dimension (LDA,N)
143 * On entry, the M-by-N matrix A.
144 * On exit :
145 * If JOBU .EQ. 'U' .OR. JOBU .EQ. 'C' :
146 * If INFO .EQ. 0 :
147 * RANKA orthonormal columns of U are returned in the
148 * leading RANKA columns of the array A. Here RANKA <= N
149 * is the number of computed singular values of A that are
150 * above the underflow threshold DLAMCH('S'). The singular
151 * vectors corresponding to underflowed or zero singular
152 * values are not computed. The value of RANKA is returned
153 * in the array WORK as RANKA=NINT(WORK(2)). Also see the
154 * descriptions of SVA and WORK. The computed columns of U
155 * are mutually numerically orthogonal up to approximately
156 * TOL=DSQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU.EQ.'C'),
157 * see the description of JOBU.
158 * If INFO .GT. 0 :
159 * the procedure DGESVJ did not converge in the given number
160 * of iterations (sweeps). In that case, the computed
161 * columns of U may not be orthogonal up to TOL. The output
162 * U (stored in A), SIGMA (given by the computed singular
163 * values in SVA(1:N)) and V is still a decomposition of the
164 * input matrix A in the sense that the residual
165 * ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small.
166 *
167 * If JOBU .EQ. 'N' :
168 * If INFO .EQ. 0 :
169 * Note that the left singular vectors are 'for free' in the
170 * one-sided Jacobi SVD algorithm. However, if only the
171 * singular values are needed, the level of numerical
172 * orthogonality of U is not an issue and iterations are
173 * stopped when the columns of the iterated matrix are
174 * numerically orthogonal up to approximately M*EPS. Thus,
175 * on exit, A contains the columns of U scaled with the
176 * corresponding singular values.
177 * If INFO .GT. 0 :
178 * the procedure DGESVJ did not converge in the given number
179 * of iterations (sweeps).
180 *
181 * LDA (input) INTEGER
182 * The leading dimension of the array A. LDA >= max(1,M).
183 *
184 * SVA (workspace/output) DOUBLE PRECISION array, dimension (N)
185 * On exit :
186 * If INFO .EQ. 0 :
187 * depending on the value SCALE = WORK(1), we have:
188 * If SCALE .EQ. ONE :
189 * SVA(1:N) contains the computed singular values of A.
190 * During the computation SVA contains the Euclidean column
191 * norms of the iterated matrices in the array A.
192 * If SCALE .NE. ONE :
193 * The singular values of A are SCALE*SVA(1:N), and this
194 * factored representation is due to the fact that some of the
195 * singular values of A might underflow or overflow.
196 * If INFO .GT. 0 :
197 * the procedure DGESVJ did not converge in the given number of
198 * iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.
199 *
200 * MV (input) INTEGER
201 * If JOBV .EQ. 'A', then the product of Jacobi rotations in DGESVJ
202 * is applied to the first MV rows of V. See the description of JOBV.
203 *
204 * V (input/output) DOUBLE PRECISION array, dimension (LDV,N)
205 * If JOBV = 'V', then V contains on exit the N-by-N matrix of
206 * the right singular vectors;
207 * If JOBV = 'A', then V contains the product of the computed right
208 * singular vector matrix and the initial matrix in
209 * the array V.
210 * If JOBV = 'N', then V is not referenced.
211 *
212 * LDV (input) INTEGER
213 * The leading dimension of the array V, LDV .GE. 1.
214 * If JOBV .EQ. 'V', then LDV .GE. max(1,N).
215 * If JOBV .EQ. 'A', then LDV .GE. max(1,MV) .
216 *
217 * WORK (input/workspace/output) DOUBLE PRECISION array, dimension max(4,M+N).
218 * On entry :
219 * If JOBU .EQ. 'C' :
220 * WORK(1) = CTOL, where CTOL defines the threshold for convergence.
221 * The process stops if all columns of A are mutually
222 * orthogonal up to CTOL*EPS, EPS=DLAMCH('E').
223 * It is required that CTOL >= ONE, i.e. it is not
224 * allowed to force the routine to obtain orthogonality
225 * below EPS.
226 * On exit :
227 * WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N)
228 * are the computed singular values of A.
229 * (See description of SVA().)
230 * WORK(2) = NINT(WORK(2)) is the number of the computed nonzero
231 * singular values.
232 * WORK(3) = NINT(WORK(3)) is the number of the computed singular
233 * values that are larger than the underflow threshold.
234 * WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi
235 * rotations needed for numerical convergence.
236 * WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep.
237 * This is useful information in cases when DGESVJ did
238 * not converge, as it can be used to estimate whether
239 * the output is stil useful and for post festum analysis.
240 * WORK(6) = the largest absolute value over all sines of the
241 * Jacobi rotation angles in the last sweep. It can be
242 * useful for a post festum analysis.
243 *
244 * LWORK (input) INTEGER
245 * length of WORK, WORK >= MAX(6,M+N)
246 *
247 * INFO (output) INTEGER
248 * = 0 : successful exit.
249 * < 0 : if INFO = -i, then the i-th argument had an illegal value
250 * > 0 : DGESVJ did not converge in the maximal allowed number (30)
251 * of sweeps. The output may still be useful. See the
252 * description of WORK.
253 *
254 * =====================================================================
255 *
256 * .. Local Parameters ..
257 DOUBLE PRECISION ZERO, HALF, ONE, TWO
258 PARAMETER ( ZERO = 0.0D0, HALF = 0.5D0, ONE = 1.0D0,
259 $ TWO = 2.0D0 )
260 INTEGER NSWEEP
261 PARAMETER ( NSWEEP = 30 )
262 * ..
263 * .. Local Scalars ..
264 DOUBLE PRECISION AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG,
265 $ BIGTHETA, CS, CTOL, EPSLN, LARGE, MXAAPQ,
266 $ MXSINJ, ROOTBIG, ROOTEPS, ROOTSFMIN, ROOTTOL,
267 $ SKL, SFMIN, SMALL, SN, T, TEMP1, THETA,
268 $ THSIGN, TOL
269 INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1,
270 $ ISWROT, jbc, jgl, KBL, LKAHEAD, MVL, N2, N34,
271 $ N4, NBL, NOTROT, p, PSKIPPED, q, ROWSKIP,
272 $ SWBAND
273 LOGICAL APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK,
274 $ RSVEC, UCTOL, UPPER
275 * ..
276 * .. Local Arrays ..
277 DOUBLE PRECISION FASTR( 5 )
278 * ..
279 * .. Intrinsic Functions ..
280 INTRINSIC DABS, DMAX1, DMIN1, DBLE, MIN0, DSIGN, DSQRT
281 * ..
282 * .. External Functions ..
283 * ..
284 * from BLAS
285 DOUBLE PRECISION DDOT, DNRM2
286 EXTERNAL DDOT, DNRM2
287 INTEGER IDAMAX
288 EXTERNAL IDAMAX
289 * from LAPACK
290 DOUBLE PRECISION DLAMCH
291 EXTERNAL DLAMCH
292 LOGICAL LSAME
293 EXTERNAL LSAME
294 * ..
295 * .. External Subroutines ..
296 * ..
297 * from BLAS
298 EXTERNAL DAXPY, DCOPY, DROTM, DSCAL, DSWAP
299 * from LAPACK
300 EXTERNAL DLASCL, DLASET, DLASSQ, XERBLA
301 *
302 EXTERNAL DGSVJ0, DGSVJ1
303 * ..
304 * .. Executable Statements ..
305 *
306 * Test the input arguments
307 *
308 LSVEC = LSAME( JOBU, 'U' )
309 UCTOL = LSAME( JOBU, 'C' )
310 RSVEC = LSAME( JOBV, 'V' )
311 APPLV = LSAME( JOBV, 'A' )
312 UPPER = LSAME( JOBA, 'U' )
313 LOWER = LSAME( JOBA, 'L' )
314 *
315 IF( .NOT.( UPPER .OR. LOWER .OR. LSAME( JOBA, 'G' ) ) ) THEN
316 INFO = -1
317 ELSE IF( .NOT.( LSVEC .OR. UCTOL .OR. LSAME( JOBU, 'N' ) ) ) THEN
318 INFO = -2
319 ELSE IF( .NOT.( RSVEC .OR. APPLV .OR. LSAME( JOBV, 'N' ) ) ) THEN
320 INFO = -3
321 ELSE IF( M.LT.0 ) THEN
322 INFO = -4
323 ELSE IF( ( N.LT.0 ) .OR. ( N.GT.M ) ) THEN
324 INFO = -5
325 ELSE IF( LDA.LT.M ) THEN
326 INFO = -7
327 ELSE IF( MV.LT.0 ) THEN
328 INFO = -9
329 ELSE IF( ( RSVEC .AND. ( LDV.LT.N ) ) .OR.
330 $ ( APPLV .AND. ( LDV.LT.MV ) ) ) THEN
331 INFO = -11
332 ELSE IF( UCTOL .AND. ( WORK( 1 ).LE.ONE ) ) THEN
333 INFO = -12
334 ELSE IF( LWORK.LT.MAX0( M+N, 6 ) ) THEN
335 INFO = -13
336 ELSE
337 INFO = 0
338 END IF
339 *
340 * #:(
341 IF( INFO.NE.0 ) THEN
342 CALL XERBLA( 'DGESVJ', -INFO )
343 RETURN
344 END IF
345 *
346 * #:) Quick return for void matrix
347 *
348 IF( ( M.EQ.0 ) .OR. ( N.EQ.0 ) )RETURN
349 *
350 * Set numerical parameters
351 * The stopping criterion for Jacobi rotations is
352 *
353 * max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
354 *
355 * where EPS is the round-off and CTOL is defined as follows:
356 *
357 IF( UCTOL ) THEN
358 * ... user controlled
359 CTOL = WORK( 1 )
360 ELSE
361 * ... default
362 IF( LSVEC .OR. RSVEC .OR. APPLV ) THEN
363 CTOL = DSQRT( DBLE( M ) )
364 ELSE
365 CTOL = DBLE( M )
366 END IF
367 END IF
368 * ... and the machine dependent parameters are
369 *[!] (Make sure that DLAMCH() works properly on the target machine.)
370 *
371 EPSLN = DLAMCH( 'Epsilon' )
372 ROOTEPS = DSQRT( EPSLN )
373 SFMIN = DLAMCH( 'SafeMinimum' )
374 ROOTSFMIN = DSQRT( SFMIN )
375 SMALL = SFMIN / EPSLN
376 BIG = DLAMCH( 'Overflow' )
377 * BIG = ONE / SFMIN
378 ROOTBIG = ONE / ROOTSFMIN
379 LARGE = BIG / DSQRT( DBLE( M*N ) )
380 BIGTHETA = ONE / ROOTEPS
381 *
382 TOL = CTOL*EPSLN
383 ROOTTOL = DSQRT( TOL )
384 *
385 IF( DBLE( M )*EPSLN.GE.ONE ) THEN
386 INFO = -4
387 CALL XERBLA( 'DGESVJ', -INFO )
388 RETURN
389 END IF
390 *
391 * Initialize the right singular vector matrix.
392 *
393 IF( RSVEC ) THEN
394 MVL = N
395 CALL DLASET( 'A', MVL, N, ZERO, ONE, V, LDV )
396 ELSE IF( APPLV ) THEN
397 MVL = MV
398 END IF
399 RSVEC = RSVEC .OR. APPLV
400 *
401 * Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
402 *(!) If necessary, scale A to protect the largest singular value
403 * from overflow. It is possible that saving the largest singular
404 * value destroys the information about the small ones.
405 * This initial scaling is almost minimal in the sense that the
406 * goal is to make sure that no column norm overflows, and that
407 * DSQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
408 * in A are detected, the procedure returns with INFO=-6.
409 *
410 SKL= ONE / DSQRT( DBLE( M )*DBLE( N ) )
411 NOSCALE = .TRUE.
412 GOSCALE = .TRUE.
413 *
414 IF( LOWER ) THEN
415 * the input matrix is M-by-N lower triangular (trapezoidal)
416 DO 1874 p = 1, N
417 AAPP = ZERO
418 AAQQ = ONE
419 CALL DLASSQ( M-p+1, A( p, p ), 1, AAPP, AAQQ )
420 IF( AAPP.GT.BIG ) THEN
421 INFO = -6
422 CALL XERBLA( 'DGESVJ', -INFO )
423 RETURN
424 END IF
425 AAQQ = DSQRT( AAQQ )
426 IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN
427 SVA( p ) = AAPP*AAQQ
428 ELSE
429 NOSCALE = .FALSE.
430 SVA( p ) = AAPP*( AAQQ*SKL)
431 IF( GOSCALE ) THEN
432 GOSCALE = .FALSE.
433 DO 1873 q = 1, p - 1
434 SVA( q ) = SVA( q )*SKL
435 1873 CONTINUE
436 END IF
437 END IF
438 1874 CONTINUE
439 ELSE IF( UPPER ) THEN
440 * the input matrix is M-by-N upper triangular (trapezoidal)
441 DO 2874 p = 1, N
442 AAPP = ZERO
443 AAQQ = ONE
444 CALL DLASSQ( p, A( 1, p ), 1, AAPP, AAQQ )
445 IF( AAPP.GT.BIG ) THEN
446 INFO = -6
447 CALL XERBLA( 'DGESVJ', -INFO )
448 RETURN
449 END IF
450 AAQQ = DSQRT( AAQQ )
451 IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN
452 SVA( p ) = AAPP*AAQQ
453 ELSE
454 NOSCALE = .FALSE.
455 SVA( p ) = AAPP*( AAQQ*SKL)
456 IF( GOSCALE ) THEN
457 GOSCALE = .FALSE.
458 DO 2873 q = 1, p - 1
459 SVA( q ) = SVA( q )*SKL
460 2873 CONTINUE
461 END IF
462 END IF
463 2874 CONTINUE
464 ELSE
465 * the input matrix is M-by-N general dense
466 DO 3874 p = 1, N
467 AAPP = ZERO
468 AAQQ = ONE
469 CALL DLASSQ( M, A( 1, p ), 1, AAPP, AAQQ )
470 IF( AAPP.GT.BIG ) THEN
471 INFO = -6
472 CALL XERBLA( 'DGESVJ', -INFO )
473 RETURN
474 END IF
475 AAQQ = DSQRT( AAQQ )
476 IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN
477 SVA( p ) = AAPP*AAQQ
478 ELSE
479 NOSCALE = .FALSE.
480 SVA( p ) = AAPP*( AAQQ*SKL)
481 IF( GOSCALE ) THEN
482 GOSCALE = .FALSE.
483 DO 3873 q = 1, p - 1
484 SVA( q ) = SVA( q )*SKL
485 3873 CONTINUE
486 END IF
487 END IF
488 3874 CONTINUE
489 END IF
490 *
491 IF( NOSCALE )SKL= ONE
492 *
493 * Move the smaller part of the spectrum from the underflow threshold
494 *(!) Start by determining the position of the nonzero entries of the
495 * array SVA() relative to ( SFMIN, BIG ).
496 *
497 AAPP = ZERO
498 AAQQ = BIG
499 DO 4781 p = 1, N
500 IF( SVA( p ).NE.ZERO )AAQQ = DMIN1( AAQQ, SVA( p ) )
501 AAPP = DMAX1( AAPP, SVA( p ) )
502 4781 CONTINUE
503 *
504 * #:) Quick return for zero matrix
505 *
506 IF( AAPP.EQ.ZERO ) THEN
507 IF( LSVEC )CALL DLASET( 'G', M, N, ZERO, ONE, A, LDA )
508 WORK( 1 ) = ONE
509 WORK( 2 ) = ZERO
510 WORK( 3 ) = ZERO
511 WORK( 4 ) = ZERO
512 WORK( 5 ) = ZERO
513 WORK( 6 ) = ZERO
514 RETURN
515 END IF
516 *
517 * #:) Quick return for one-column matrix
518 *
519 IF( N.EQ.1 ) THEN
520 IF( LSVEC )CALL DLASCL( 'G', 0, 0, SVA( 1 ), SKL, M, 1,
521 $ A( 1, 1 ), LDA, IERR )
522 WORK( 1 ) = ONE / SKL
523 IF( SVA( 1 ).GE.SFMIN ) THEN
524 WORK( 2 ) = ONE
525 ELSE
526 WORK( 2 ) = ZERO
527 END IF
528 WORK( 3 ) = ZERO
529 WORK( 4 ) = ZERO
530 WORK( 5 ) = ZERO
531 WORK( 6 ) = ZERO
532 RETURN
533 END IF
534 *
535 * Protect small singular values from underflow, and try to
536 * avoid underflows/overflows in computing Jacobi rotations.
537 *
538 SN = DSQRT( SFMIN / EPSLN )
539 TEMP1 = DSQRT( BIG / DBLE( N ) )
540 IF( ( AAPP.LE.SN ) .OR. ( AAQQ.GE.TEMP1 ) .OR.
541 $ ( ( SN.LE.AAQQ ) .AND. ( AAPP.LE.TEMP1 ) ) ) THEN
542 TEMP1 = DMIN1( BIG, TEMP1 / AAPP )
543 * AAQQ = AAQQ*TEMP1
544 * AAPP = AAPP*TEMP1
545 ELSE IF( ( AAQQ.LE.SN ) .AND. ( AAPP.LE.TEMP1 ) ) THEN
546 TEMP1 = DMIN1( SN / AAQQ, BIG / ( AAPP*DSQRT( DBLE( N ) ) ) )
547 * AAQQ = AAQQ*TEMP1
548 * AAPP = AAPP*TEMP1
549 ELSE IF( ( AAQQ.GE.SN ) .AND. ( AAPP.GE.TEMP1 ) ) THEN
550 TEMP1 = DMAX1( SN / AAQQ, TEMP1 / AAPP )
551 * AAQQ = AAQQ*TEMP1
552 * AAPP = AAPP*TEMP1
553 ELSE IF( ( AAQQ.LE.SN ) .AND. ( AAPP.GE.TEMP1 ) ) THEN
554 TEMP1 = DMIN1( SN / AAQQ, BIG / ( DSQRT( DBLE( N ) )*AAPP ) )
555 * AAQQ = AAQQ*TEMP1
556 * AAPP = AAPP*TEMP1
557 ELSE
558 TEMP1 = ONE
559 END IF
560 *
561 * Scale, if necessary
562 *
563 IF( TEMP1.NE.ONE ) THEN
564 CALL DLASCL( 'G', 0, 0, ONE, TEMP1, N, 1, SVA, N, IERR )
565 END IF
566 SKL= TEMP1*SKL
567 IF( SKL.NE.ONE ) THEN
568 CALL DLASCL( JOBA, 0, 0, ONE, SKL, M, N, A, LDA, IERR )
569 SKL= ONE / SKL
570 END IF
571 *
572 * Row-cyclic Jacobi SVD algorithm with column pivoting
573 *
574 EMPTSW = ( N*( N-1 ) ) / 2
575 NOTROT = 0
576 FASTR( 1 ) = ZERO
577 *
578 * A is represented in factored form A = A * diag(WORK), where diag(WORK)
579 * is initialized to identity. WORK is updated during fast scaled
580 * rotations.
581 *
582 DO 1868 q = 1, N
583 WORK( q ) = ONE
584 1868 CONTINUE
585 *
586 *
587 SWBAND = 3
588 *[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
589 * if DGESVJ is used as a computational routine in the preconditioned
590 * Jacobi SVD algorithm DGESVJ. For sweeps i=1:SWBAND the procedure
591 * works on pivots inside a band-like region around the diagonal.
592 * The boundaries are determined dynamically, based on the number of
593 * pivots above a threshold.
594 *
595 KBL = MIN0( 8, N )
596 *[TP] KBL is a tuning parameter that defines the tile size in the
597 * tiling of the p-q loops of pivot pairs. In general, an optimal
598 * value of KBL depends on the matrix dimensions and on the
599 * parameters of the computer's memory.
600 *
601 NBL = N / KBL
602 IF( ( NBL*KBL ).NE.N )NBL = NBL + 1
603 *
604 BLSKIP = KBL**2
605 *[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL.
606 *
607 ROWSKIP = MIN0( 5, KBL )
608 *[TP] ROWSKIP is a tuning parameter.
609 *
610 LKAHEAD = 1
611 *[TP] LKAHEAD is a tuning parameter.
612 *
613 * Quasi block transformations, using the lower (upper) triangular
614 * structure of the input matrix. The quasi-block-cycling usually
615 * invokes cubic convergence. Big part of this cycle is done inside
616 * canonical subspaces of dimensions less than M.
617 *
618 IF( ( LOWER .OR. UPPER ) .AND. ( N.GT.MAX0( 64, 4*KBL ) ) ) THEN
619 *[TP] The number of partition levels and the actual partition are
620 * tuning parameters.
621 N4 = N / 4
622 N2 = N / 2
623 N34 = 3*N4
624 IF( APPLV ) THEN
625 q = 0
626 ELSE
627 q = 1
628 END IF
629 *
630 IF( LOWER ) THEN
631 *
632 * This works very well on lower triangular matrices, in particular
633 * in the framework of the preconditioned Jacobi SVD (xGEJSV).
634 * The idea is simple:
635 * [+ 0 0 0] Note that Jacobi transformations of [0 0]
636 * [+ + 0 0] [0 0]
637 * [+ + x 0] actually work on [x 0] [x 0]
638 * [+ + x x] [x x]. [x x]
639 *
640 CALL DGSVJ0( JOBV, M-N34, N-N34, A( N34+1, N34+1 ), LDA,
641 $ WORK( N34+1 ), SVA( N34+1 ), MVL,
642 $ V( N34*q+1, N34+1 ), LDV, EPSLN, SFMIN, TOL,
643 $ 2, WORK( N+1 ), LWORK-N, IERR )
644 *
645 CALL DGSVJ0( JOBV, M-N2, N34-N2, A( N2+1, N2+1 ), LDA,
646 $ WORK( N2+1 ), SVA( N2+1 ), MVL,
647 $ V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 2,
648 $ WORK( N+1 ), LWORK-N, IERR )
649 *
650 CALL DGSVJ1( JOBV, M-N2, N-N2, N4, A( N2+1, N2+1 ), LDA,
651 $ WORK( N2+1 ), SVA( N2+1 ), MVL,
652 $ V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 1,
653 $ WORK( N+1 ), LWORK-N, IERR )
654 *
655 CALL DGSVJ0( JOBV, M-N4, N2-N4, A( N4+1, N4+1 ), LDA,
656 $ WORK( N4+1 ), SVA( N4+1 ), MVL,
657 $ V( N4*q+1, N4+1 ), LDV, EPSLN, SFMIN, TOL, 1,
658 $ WORK( N+1 ), LWORK-N, IERR )
659 *
660 CALL DGSVJ0( JOBV, M, N4, A, LDA, WORK, SVA, MVL, V, LDV,
661 $ EPSLN, SFMIN, TOL, 1, WORK( N+1 ), LWORK-N,
662 $ IERR )
663 *
664 CALL DGSVJ1( JOBV, M, N2, N4, A, LDA, WORK, SVA, MVL, V,
665 $ LDV, EPSLN, SFMIN, TOL, 1, WORK( N+1 ),
666 $ LWORK-N, IERR )
667 *
668 *
669 ELSE IF( UPPER ) THEN
670 *
671 *
672 CALL DGSVJ0( JOBV, N4, N4, A, LDA, WORK, SVA, MVL, V, LDV,
673 $ EPSLN, SFMIN, TOL, 2, WORK( N+1 ), LWORK-N,
674 $ IERR )
675 *
676 CALL DGSVJ0( JOBV, N2, N4, A( 1, N4+1 ), LDA, WORK( N4+1 ),
677 $ SVA( N4+1 ), MVL, V( N4*q+1, N4+1 ), LDV,
678 $ EPSLN, SFMIN, TOL, 1, WORK( N+1 ), LWORK-N,
679 $ IERR )
680 *
681 CALL DGSVJ1( JOBV, N2, N2, N4, A, LDA, WORK, SVA, MVL, V,
682 $ LDV, EPSLN, SFMIN, TOL, 1, WORK( N+1 ),
683 $ LWORK-N, IERR )
684 *
685 CALL DGSVJ0( JOBV, N2+N4, N4, A( 1, N2+1 ), LDA,
686 $ WORK( N2+1 ), SVA( N2+1 ), MVL,
687 $ V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 1,
688 $ WORK( N+1 ), LWORK-N, IERR )
689
690 END IF
691 *
692 END IF
693 *
694 * .. Row-cyclic pivot strategy with de Rijk's pivoting ..
695 *
696 DO 1993 i = 1, NSWEEP
697 *
698 * .. go go go ...
699 *
700 MXAAPQ = ZERO
701 MXSINJ = ZERO
702 ISWROT = 0
703 *
704 NOTROT = 0
705 PSKIPPED = 0
706 *
707 * Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
708 * 1 <= p < q <= N. This is the first step toward a blocked implementation
709 * of the rotations. New implementation, based on block transformations,
710 * is under development.
711 *
712 DO 2000 ibr = 1, NBL
713 *
714 igl = ( ibr-1 )*KBL + 1
715 *
716 DO 1002 ir1 = 0, MIN0( LKAHEAD, NBL-ibr )
717 *
718 igl = igl + ir1*KBL
719 *
720 DO 2001 p = igl, MIN0( igl+KBL-1, N-1 )
721 *
722 * .. de Rijk's pivoting
723 *
724 q = IDAMAX( N-p+1, SVA( p ), 1 ) + p - 1
725 IF( p.NE.q ) THEN
726 CALL DSWAP( M, A( 1, p ), 1, A( 1, q ), 1 )
727 IF( RSVEC )CALL DSWAP( MVL, V( 1, p ), 1,
728 $ V( 1, q ), 1 )
729 TEMP1 = SVA( p )
730 SVA( p ) = SVA( q )
731 SVA( q ) = TEMP1
732 TEMP1 = WORK( p )
733 WORK( p ) = WORK( q )
734 WORK( q ) = TEMP1
735 END IF
736 *
737 IF( ir1.EQ.0 ) THEN
738 *
739 * Column norms are periodically updated by explicit
740 * norm computation.
741 * Caveat:
742 * Unfortunately, some BLAS implementations compute DNRM2(M,A(1,p),1)
743 * as DSQRT(DDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
744 * overflow for ||A(:,p)||_2 > DSQRT(overflow_threshold), and to
745 * underflow for ||A(:,p)||_2 < DSQRT(underflow_threshold).
746 * Hence, DNRM2 cannot be trusted, not even in the case when
747 * the true norm is far from the under(over)flow boundaries.
748 * If properly implemented DNRM2 is available, the IF-THEN-ELSE
749 * below should read "AAPP = DNRM2( M, A(1,p), 1 ) * WORK(p)".
750 *
751 IF( ( SVA( p ).LT.ROOTBIG ) .AND.
752 $ ( SVA( p ).GT.ROOTSFMIN ) ) THEN
753 SVA( p ) = DNRM2( M, A( 1, p ), 1 )*WORK( p )
754 ELSE
755 TEMP1 = ZERO
756 AAPP = ONE
757 CALL DLASSQ( M, A( 1, p ), 1, TEMP1, AAPP )
758 SVA( p ) = TEMP1*DSQRT( AAPP )*WORK( p )
759 END IF
760 AAPP = SVA( p )
761 ELSE
762 AAPP = SVA( p )
763 END IF
764 *
765 IF( AAPP.GT.ZERO ) THEN
766 *
767 PSKIPPED = 0
768 *
769 DO 2002 q = p + 1, MIN0( igl+KBL-1, N )
770 *
771 AAQQ = SVA( q )
772 *
773 IF( AAQQ.GT.ZERO ) THEN
774 *
775 AAPP0 = AAPP
776 IF( AAQQ.GE.ONE ) THEN
777 ROTOK = ( SMALL*AAPP ).LE.AAQQ
778 IF( AAPP.LT.( BIG / AAQQ ) ) THEN
779 AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1,
780 $ q ), 1 )*WORK( p )*WORK( q ) /
781 $ AAQQ ) / AAPP
782 ELSE
783 CALL DCOPY( M, A( 1, p ), 1,
784 $ WORK( N+1 ), 1 )
785 CALL DLASCL( 'G', 0, 0, AAPP,
786 $ WORK( p ), M, 1,
787 $ WORK( N+1 ), LDA, IERR )
788 AAPQ = DDOT( M, WORK( N+1 ), 1,
789 $ A( 1, q ), 1 )*WORK( q ) / AAQQ
790 END IF
791 ELSE
792 ROTOK = AAPP.LE.( AAQQ / SMALL )
793 IF( AAPP.GT.( SMALL / AAQQ ) ) THEN
794 AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1,
795 $ q ), 1 )*WORK( p )*WORK( q ) /
796 $ AAQQ ) / AAPP
797 ELSE
798 CALL DCOPY( M, A( 1, q ), 1,
799 $ WORK( N+1 ), 1 )
800 CALL DLASCL( 'G', 0, 0, AAQQ,
801 $ WORK( q ), M, 1,
802 $ WORK( N+1 ), LDA, IERR )
803 AAPQ = DDOT( M, WORK( N+1 ), 1,
804 $ A( 1, p ), 1 )*WORK( p ) / AAPP
805 END IF
806 END IF
807 *
808 MXAAPQ = DMAX1( MXAAPQ, DABS( AAPQ ) )
809 *
810 * TO rotate or NOT to rotate, THAT is the question ...
811 *
812 IF( DABS( AAPQ ).GT.TOL ) THEN
813 *
814 * .. rotate
815 *[RTD] ROTATED = ROTATED + ONE
816 *
817 IF( ir1.EQ.0 ) THEN
818 NOTROT = 0
819 PSKIPPED = 0
820 ISWROT = ISWROT + 1
821 END IF
822 *
823 IF( ROTOK ) THEN
824 *
825 AQOAP = AAQQ / AAPP
826 APOAQ = AAPP / AAQQ
827 THETA = -HALF*DABS(AQOAP-APOAQ)/AAPQ
828 *
829 IF( DABS( THETA ).GT.BIGTHETA ) THEN
830 *
831 T = HALF / THETA
832 FASTR( 3 ) = T*WORK( p ) / WORK( q )
833 FASTR( 4 ) = -T*WORK( q ) /
834 $ WORK( p )
835 CALL DROTM( M, A( 1, p ), 1,
836 $ A( 1, q ), 1, FASTR )
837 IF( RSVEC )CALL DROTM( MVL,
838 $ V( 1, p ), 1,
839 $ V( 1, q ), 1,
840 $ FASTR )
841 SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO,
842 $ ONE+T*APOAQ*AAPQ ) )
843 AAPP = AAPP*DSQRT( DMAX1( ZERO,
844 $ ONE-T*AQOAP*AAPQ ) )
845 MXSINJ = DMAX1( MXSINJ, DABS( T ) )
846 *
847 ELSE
848 *
849 * .. choose correct signum for THETA and rotate
850 *
851 THSIGN = -DSIGN( ONE, AAPQ )
852 T = ONE / ( THETA+THSIGN*
853 $ DSQRT( ONE+THETA*THETA ) )
854 CS = DSQRT( ONE / ( ONE+T*T ) )
855 SN = T*CS
856 *
857 MXSINJ = DMAX1( MXSINJ, DABS( SN ) )
858 SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO,
859 $ ONE+T*APOAQ*AAPQ ) )
860 AAPP = AAPP*DSQRT( DMAX1( ZERO,
861 $ ONE-T*AQOAP*AAPQ ) )
862 *
863 APOAQ = WORK( p ) / WORK( q )
864 AQOAP = WORK( q ) / WORK( p )
865 IF( WORK( p ).GE.ONE ) THEN
866 IF( WORK( q ).GE.ONE ) THEN
867 FASTR( 3 ) = T*APOAQ
868 FASTR( 4 ) = -T*AQOAP
869 WORK( p ) = WORK( p )*CS
870 WORK( q ) = WORK( q )*CS
871 CALL DROTM( M, A( 1, p ), 1,
872 $ A( 1, q ), 1,
873 $ FASTR )
874 IF( RSVEC )CALL DROTM( MVL,
875 $ V( 1, p ), 1, V( 1, q ),
876 $ 1, FASTR )
877 ELSE
878 CALL DAXPY( M, -T*AQOAP,
879 $ A( 1, q ), 1,
880 $ A( 1, p ), 1 )
881 CALL DAXPY( M, CS*SN*APOAQ,
882 $ A( 1, p ), 1,
883 $ A( 1, q ), 1 )
884 WORK( p ) = WORK( p )*CS
885 WORK( q ) = WORK( q ) / CS
886 IF( RSVEC ) THEN
887 CALL DAXPY( MVL, -T*AQOAP,
888 $ V( 1, q ), 1,
889 $ V( 1, p ), 1 )
890 CALL DAXPY( MVL,
891 $ CS*SN*APOAQ,
892 $ V( 1, p ), 1,
893 $ V( 1, q ), 1 )
894 END IF
895 END IF
896 ELSE
897 IF( WORK( q ).GE.ONE ) THEN
898 CALL DAXPY( M, T*APOAQ,
899 $ A( 1, p ), 1,
900 $ A( 1, q ), 1 )
901 CALL DAXPY( M, -CS*SN*AQOAP,
902 $ A( 1, q ), 1,
903 $ A( 1, p ), 1 )
904 WORK( p ) = WORK( p ) / CS
905 WORK( q ) = WORK( q )*CS
906 IF( RSVEC ) THEN
907 CALL DAXPY( MVL, T*APOAQ,
908 $ V( 1, p ), 1,
909 $ V( 1, q ), 1 )
910 CALL DAXPY( MVL,
911 $ -CS*SN*AQOAP,
912 $ V( 1, q ), 1,
913 $ V( 1, p ), 1 )
914 END IF
915 ELSE
916 IF( WORK( p ).GE.WORK( q ) )
917 $ THEN
918 CALL DAXPY( M, -T*AQOAP,
919 $ A( 1, q ), 1,
920 $ A( 1, p ), 1 )
921 CALL DAXPY( M, CS*SN*APOAQ,
922 $ A( 1, p ), 1,
923 $ A( 1, q ), 1 )
924 WORK( p ) = WORK( p )*CS
925 WORK( q ) = WORK( q ) / CS
926 IF( RSVEC ) THEN
927 CALL DAXPY( MVL,
928 $ -T*AQOAP,
929 $ V( 1, q ), 1,
930 $ V( 1, p ), 1 )
931 CALL DAXPY( MVL,
932 $ CS*SN*APOAQ,
933 $ V( 1, p ), 1,
934 $ V( 1, q ), 1 )
935 END IF
936 ELSE
937 CALL DAXPY( M, T*APOAQ,
938 $ A( 1, p ), 1,
939 $ A( 1, q ), 1 )
940 CALL DAXPY( M,
941 $ -CS*SN*AQOAP,
942 $ A( 1, q ), 1,
943 $ A( 1, p ), 1 )
944 WORK( p ) = WORK( p ) / CS
945 WORK( q ) = WORK( q )*CS
946 IF( RSVEC ) THEN
947 CALL DAXPY( MVL,
948 $ T*APOAQ, V( 1, p ),
949 $ 1, V( 1, q ), 1 )
950 CALL DAXPY( MVL,
951 $ -CS*SN*AQOAP,
952 $ V( 1, q ), 1,
953 $ V( 1, p ), 1 )
954 END IF
955 END IF
956 END IF
957 END IF
958 END IF
959 *
960 ELSE
961 * .. have to use modified Gram-Schmidt like transformation
962 CALL DCOPY( M, A( 1, p ), 1,
963 $ WORK( N+1 ), 1 )
964 CALL DLASCL( 'G', 0, 0, AAPP, ONE, M,
965 $ 1, WORK( N+1 ), LDA,
966 $ IERR )
967 CALL DLASCL( 'G', 0, 0, AAQQ, ONE, M,
968 $ 1, A( 1, q ), LDA, IERR )
969 TEMP1 = -AAPQ*WORK( p ) / WORK( q )
970 CALL DAXPY( M, TEMP1, WORK( N+1 ), 1,
971 $ A( 1, q ), 1 )
972 CALL DLASCL( 'G', 0, 0, ONE, AAQQ, M,
973 $ 1, A( 1, q ), LDA, IERR )
974 SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO,
975 $ ONE-AAPQ*AAPQ ) )
976 MXSINJ = DMAX1( MXSINJ, SFMIN )
977 END IF
978 * END IF ROTOK THEN ... ELSE
979 *
980 * In the case of cancellation in updating SVA(q), SVA(p)
981 * recompute SVA(q), SVA(p).
982 *
983 IF( ( SVA( q ) / AAQQ )**2.LE.ROOTEPS )
984 $ THEN
985 IF( ( AAQQ.LT.ROOTBIG ) .AND.
986 $ ( AAQQ.GT.ROOTSFMIN ) ) THEN
987 SVA( q ) = DNRM2( M, A( 1, q ), 1 )*
988 $ WORK( q )
989 ELSE
990 T = ZERO
991 AAQQ = ONE
992 CALL DLASSQ( M, A( 1, q ), 1, T,
993 $ AAQQ )
994 SVA( q ) = T*DSQRT( AAQQ )*WORK( q )
995 END IF
996 END IF
997 IF( ( AAPP / AAPP0 ).LE.ROOTEPS ) THEN
998 IF( ( AAPP.LT.ROOTBIG ) .AND.
999 $ ( AAPP.GT.ROOTSFMIN ) ) THEN
1000 AAPP = DNRM2( M, A( 1, p ), 1 )*
1001 $ WORK( p )
1002 ELSE
1003 T = ZERO
1004 AAPP = ONE
1005 CALL DLASSQ( M, A( 1, p ), 1, T,
1006 $ AAPP )
1007 AAPP = T*DSQRT( AAPP )*WORK( p )
1008 END IF
1009 SVA( p ) = AAPP
1010 END IF
1011 *
1012 ELSE
1013 * A(:,p) and A(:,q) already numerically orthogonal
1014 IF( ir1.EQ.0 )NOTROT = NOTROT + 1
1015 *[RTD] SKIPPED = SKIPPED + 1
1016 PSKIPPED = PSKIPPED + 1
1017 END IF
1018 ELSE
1019 * A(:,q) is zero column
1020 IF( ir1.EQ.0 )NOTROT = NOTROT + 1
1021 PSKIPPED = PSKIPPED + 1
1022 END IF
1023 *
1024 IF( ( i.LE.SWBAND ) .AND.
1025 $ ( PSKIPPED.GT.ROWSKIP ) ) THEN
1026 IF( ir1.EQ.0 )AAPP = -AAPP
1027 NOTROT = 0
1028 GO TO 2103
1029 END IF
1030 *
1031 2002 CONTINUE
1032 * END q-LOOP
1033 *
1034 2103 CONTINUE
1035 * bailed out of q-loop
1036 *
1037 SVA( p ) = AAPP
1038 *
1039 ELSE
1040 SVA( p ) = AAPP
1041 IF( ( ir1.EQ.0 ) .AND. ( AAPP.EQ.ZERO ) )
1042 $ NOTROT = NOTROT + MIN0( igl+KBL-1, N ) - p
1043 END IF
1044 *
1045 2001 CONTINUE
1046 * end of the p-loop
1047 * end of doing the block ( ibr, ibr )
1048 1002 CONTINUE
1049 * end of ir1-loop
1050 *
1051 * ... go to the off diagonal blocks
1052 *
1053 igl = ( ibr-1 )*KBL + 1
1054 *
1055 DO 2010 jbc = ibr + 1, NBL
1056 *
1057 jgl = ( jbc-1 )*KBL + 1
1058 *
1059 * doing the block at ( ibr, jbc )
1060 *
1061 IJBLSK = 0
1062 DO 2100 p = igl, MIN0( igl+KBL-1, N )
1063 *
1064 AAPP = SVA( p )
1065 IF( AAPP.GT.ZERO ) THEN
1066 *
1067 PSKIPPED = 0
1068 *
1069 DO 2200 q = jgl, MIN0( jgl+KBL-1, N )
1070 *
1071 AAQQ = SVA( q )
1072 IF( AAQQ.GT.ZERO ) THEN
1073 AAPP0 = AAPP
1074 *
1075 * .. M x 2 Jacobi SVD ..
1076 *
1077 * Safe Gram matrix computation
1078 *
1079 IF( AAQQ.GE.ONE ) THEN
1080 IF( AAPP.GE.AAQQ ) THEN
1081 ROTOK = ( SMALL*AAPP ).LE.AAQQ
1082 ELSE
1083 ROTOK = ( SMALL*AAQQ ).LE.AAPP
1084 END IF
1085 IF( AAPP.LT.( BIG / AAQQ ) ) THEN
1086 AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1,
1087 $ q ), 1 )*WORK( p )*WORK( q ) /
1088 $ AAQQ ) / AAPP
1089 ELSE
1090 CALL DCOPY( M, A( 1, p ), 1,
1091 $ WORK( N+1 ), 1 )
1092 CALL DLASCL( 'G', 0, 0, AAPP,
1093 $ WORK( p ), M, 1,
1094 $ WORK( N+1 ), LDA, IERR )
1095 AAPQ = DDOT( M, WORK( N+1 ), 1,
1096 $ A( 1, q ), 1 )*WORK( q ) / AAQQ
1097 END IF
1098 ELSE
1099 IF( AAPP.GE.AAQQ ) THEN
1100 ROTOK = AAPP.LE.( AAQQ / SMALL )
1101 ELSE
1102 ROTOK = AAQQ.LE.( AAPP / SMALL )
1103 END IF
1104 IF( AAPP.GT.( SMALL / AAQQ ) ) THEN
1105 AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1,
1106 $ q ), 1 )*WORK( p )*WORK( q ) /
1107 $ AAQQ ) / AAPP
1108 ELSE
1109 CALL DCOPY( M, A( 1, q ), 1,
1110 $ WORK( N+1 ), 1 )
1111 CALL DLASCL( 'G', 0, 0, AAQQ,
1112 $ WORK( q ), M, 1,
1113 $ WORK( N+1 ), LDA, IERR )
1114 AAPQ = DDOT( M, WORK( N+1 ), 1,
1115 $ A( 1, p ), 1 )*WORK( p ) / AAPP
1116 END IF
1117 END IF
1118 *
1119 MXAAPQ = DMAX1( MXAAPQ, DABS( AAPQ ) )
1120 *
1121 * TO rotate or NOT to rotate, THAT is the question ...
1122 *
1123 IF( DABS( AAPQ ).GT.TOL ) THEN
1124 NOTROT = 0
1125 *[RTD] ROTATED = ROTATED + 1
1126 PSKIPPED = 0
1127 ISWROT = ISWROT + 1
1128 *
1129 IF( ROTOK ) THEN
1130 *
1131 AQOAP = AAQQ / AAPP
1132 APOAQ = AAPP / AAQQ
1133 THETA = -HALF*DABS(AQOAP-APOAQ)/AAPQ
1134 IF( AAQQ.GT.AAPP0 )THETA = -THETA
1135 *
1136 IF( DABS( THETA ).GT.BIGTHETA ) THEN
1137 T = HALF / THETA
1138 FASTR( 3 ) = T*WORK( p ) / WORK( q )
1139 FASTR( 4 ) = -T*WORK( q ) /
1140 $ WORK( p )
1141 CALL DROTM( M, A( 1, p ), 1,
1142 $ A( 1, q ), 1, FASTR )
1143 IF( RSVEC )CALL DROTM( MVL,
1144 $ V( 1, p ), 1,
1145 $ V( 1, q ), 1,
1146 $ FASTR )
1147 SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO,
1148 $ ONE+T*APOAQ*AAPQ ) )
1149 AAPP = AAPP*DSQRT( DMAX1( ZERO,
1150 $ ONE-T*AQOAP*AAPQ ) )
1151 MXSINJ = DMAX1( MXSINJ, DABS( T ) )
1152 ELSE
1153 *
1154 * .. choose correct signum for THETA and rotate
1155 *
1156 THSIGN = -DSIGN( ONE, AAPQ )
1157 IF( AAQQ.GT.AAPP0 )THSIGN = -THSIGN
1158 T = ONE / ( THETA+THSIGN*
1159 $ DSQRT( ONE+THETA*THETA ) )
1160 CS = DSQRT( ONE / ( ONE+T*T ) )
1161 SN = T*CS
1162 MXSINJ = DMAX1( MXSINJ, DABS( SN ) )
1163 SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO,
1164 $ ONE+T*APOAQ*AAPQ ) )
1165 AAPP = AAPP*DSQRT( DMAX1( ZERO,
1166 $ ONE-T*AQOAP*AAPQ ) )
1167 *
1168 APOAQ = WORK( p ) / WORK( q )
1169 AQOAP = WORK( q ) / WORK( p )
1170 IF( WORK( p ).GE.ONE ) THEN
1171 *
1172 IF( WORK( q ).GE.ONE ) THEN
1173 FASTR( 3 ) = T*APOAQ
1174 FASTR( 4 ) = -T*AQOAP
1175 WORK( p ) = WORK( p )*CS
1176 WORK( q ) = WORK( q )*CS
1177 CALL DROTM( M, A( 1, p ), 1,
1178 $ A( 1, q ), 1,
1179 $ FASTR )
1180 IF( RSVEC )CALL DROTM( MVL,
1181 $ V( 1, p ), 1, V( 1, q ),
1182 $ 1, FASTR )
1183 ELSE
1184 CALL DAXPY( M, -T*AQOAP,
1185 $ A( 1, q ), 1,
1186 $ A( 1, p ), 1 )
1187 CALL DAXPY( M, CS*SN*APOAQ,
1188 $ A( 1, p ), 1,
1189 $ A( 1, q ), 1 )
1190 IF( RSVEC ) THEN
1191 CALL DAXPY( MVL, -T*AQOAP,
1192 $ V( 1, q ), 1,
1193 $ V( 1, p ), 1 )
1194 CALL DAXPY( MVL,
1195 $ CS*SN*APOAQ,
1196 $ V( 1, p ), 1,
1197 $ V( 1, q ), 1 )
1198 END IF
1199 WORK( p ) = WORK( p )*CS
1200 WORK( q ) = WORK( q ) / CS
1201 END IF
1202 ELSE
1203 IF( WORK( q ).GE.ONE ) THEN
1204 CALL DAXPY( M, T*APOAQ,
1205 $ A( 1, p ), 1,
1206 $ A( 1, q ), 1 )
1207 CALL DAXPY( M, -CS*SN*AQOAP,
1208 $ A( 1, q ), 1,
1209 $ A( 1, p ), 1 )
1210 IF( RSVEC ) THEN
1211 CALL DAXPY( MVL, T*APOAQ,
1212 $ V( 1, p ), 1,
1213 $ V( 1, q ), 1 )
1214 CALL DAXPY( MVL,
1215 $ -CS*SN*AQOAP,
1216 $ V( 1, q ), 1,
1217 $ V( 1, p ), 1 )
1218 END IF
1219 WORK( p ) = WORK( p ) / CS
1220 WORK( q ) = WORK( q )*CS
1221 ELSE
1222 IF( WORK( p ).GE.WORK( q ) )
1223 $ THEN
1224 CALL DAXPY( M, -T*AQOAP,
1225 $ A( 1, q ), 1,
1226 $ A( 1, p ), 1 )
1227 CALL DAXPY( M, CS*SN*APOAQ,
1228 $ A( 1, p ), 1,
1229 $ A( 1, q ), 1 )
1230 WORK( p ) = WORK( p )*CS
1231 WORK( q ) = WORK( q ) / CS
1232 IF( RSVEC ) THEN
1233 CALL DAXPY( MVL,
1234 $ -T*AQOAP,
1235 $ V( 1, q ), 1,
1236 $ V( 1, p ), 1 )
1237 CALL DAXPY( MVL,
1238 $ CS*SN*APOAQ,
1239 $ V( 1, p ), 1,
1240 $ V( 1, q ), 1 )
1241 END IF
1242 ELSE
1243 CALL DAXPY( M, T*APOAQ,
1244 $ A( 1, p ), 1,
1245 $ A( 1, q ), 1 )
1246 CALL DAXPY( M,
1247 $ -CS*SN*AQOAP,
1248 $ A( 1, q ), 1,
1249 $ A( 1, p ), 1 )
1250 WORK( p ) = WORK( p ) / CS
1251 WORK( q ) = WORK( q )*CS
1252 IF( RSVEC ) THEN
1253 CALL DAXPY( MVL,
1254 $ T*APOAQ, V( 1, p ),
1255 $ 1, V( 1, q ), 1 )
1256 CALL DAXPY( MVL,
1257 $ -CS*SN*AQOAP,
1258 $ V( 1, q ), 1,
1259 $ V( 1, p ), 1 )
1260 END IF
1261 END IF
1262 END IF
1263 END IF
1264 END IF
1265 *
1266 ELSE
1267 IF( AAPP.GT.AAQQ ) THEN
1268 CALL DCOPY( M, A( 1, p ), 1,
1269 $ WORK( N+1 ), 1 )
1270 CALL DLASCL( 'G', 0, 0, AAPP, ONE,
1271 $ M, 1, WORK( N+1 ), LDA,
1272 $ IERR )
1273 CALL DLASCL( 'G', 0, 0, AAQQ, ONE,
1274 $ M, 1, A( 1, q ), LDA,
1275 $ IERR )
1276 TEMP1 = -AAPQ*WORK( p ) / WORK( q )
1277 CALL DAXPY( M, TEMP1, WORK( N+1 ),
1278 $ 1, A( 1, q ), 1 )
1279 CALL DLASCL( 'G', 0, 0, ONE, AAQQ,
1280 $ M, 1, A( 1, q ), LDA,
1281 $ IERR )
1282 SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO,
1283 $ ONE-AAPQ*AAPQ ) )
1284 MXSINJ = DMAX1( MXSINJ, SFMIN )
1285 ELSE
1286 CALL DCOPY( M, A( 1, q ), 1,
1287 $ WORK( N+1 ), 1 )
1288 CALL DLASCL( 'G', 0, 0, AAQQ, ONE,
1289 $ M, 1, WORK( N+1 ), LDA,
1290 $ IERR )
1291 CALL DLASCL( 'G', 0, 0, AAPP, ONE,
1292 $ M, 1, A( 1, p ), LDA,
1293 $ IERR )
1294 TEMP1 = -AAPQ*WORK( q ) / WORK( p )
1295 CALL DAXPY( M, TEMP1, WORK( N+1 ),
1296 $ 1, A( 1, p ), 1 )
1297 CALL DLASCL( 'G', 0, 0, ONE, AAPP,
1298 $ M, 1, A( 1, p ), LDA,
1299 $ IERR )
1300 SVA( p ) = AAPP*DSQRT( DMAX1( ZERO,
1301 $ ONE-AAPQ*AAPQ ) )
1302 MXSINJ = DMAX1( MXSINJ, SFMIN )
1303 END IF
1304 END IF
1305 * END IF ROTOK THEN ... ELSE
1306 *
1307 * In the case of cancellation in updating SVA(q)
1308 * .. recompute SVA(q)
1309 IF( ( SVA( q ) / AAQQ )**2.LE.ROOTEPS )
1310 $ THEN
1311 IF( ( AAQQ.LT.ROOTBIG ) .AND.
1312 $ ( AAQQ.GT.ROOTSFMIN ) ) THEN
1313 SVA( q ) = DNRM2( M, A( 1, q ), 1 )*
1314 $ WORK( q )
1315 ELSE
1316 T = ZERO
1317 AAQQ = ONE
1318 CALL DLASSQ( M, A( 1, q ), 1, T,
1319 $ AAQQ )
1320 SVA( q ) = T*DSQRT( AAQQ )*WORK( q )
1321 END IF
1322 END IF
1323 IF( ( AAPP / AAPP0 )**2.LE.ROOTEPS ) THEN
1324 IF( ( AAPP.LT.ROOTBIG ) .AND.
1325 $ ( AAPP.GT.ROOTSFMIN ) ) THEN
1326 AAPP = DNRM2( M, A( 1, p ), 1 )*
1327 $ WORK( p )
1328 ELSE
1329 T = ZERO
1330 AAPP = ONE
1331 CALL DLASSQ( M, A( 1, p ), 1, T,
1332 $ AAPP )
1333 AAPP = T*DSQRT( AAPP )*WORK( p )
1334 END IF
1335 SVA( p ) = AAPP
1336 END IF
1337 * end of OK rotation
1338 ELSE
1339 NOTROT = NOTROT + 1
1340 *[RTD] SKIPPED = SKIPPED + 1
1341 PSKIPPED = PSKIPPED + 1
1342 IJBLSK = IJBLSK + 1
1343 END IF
1344 ELSE
1345 NOTROT = NOTROT + 1
1346 PSKIPPED = PSKIPPED + 1
1347 IJBLSK = IJBLSK + 1
1348 END IF
1349 *
1350 IF( ( i.LE.SWBAND ) .AND. ( IJBLSK.GE.BLSKIP ) )
1351 $ THEN
1352 SVA( p ) = AAPP
1353 NOTROT = 0
1354 GO TO 2011
1355 END IF
1356 IF( ( i.LE.SWBAND ) .AND.
1357 $ ( PSKIPPED.GT.ROWSKIP ) ) THEN
1358 AAPP = -AAPP
1359 NOTROT = 0
1360 GO TO 2203
1361 END IF
1362 *
1363 2200 CONTINUE
1364 * end of the q-loop
1365 2203 CONTINUE
1366 *
1367 SVA( p ) = AAPP
1368 *
1369 ELSE
1370 *
1371 IF( AAPP.EQ.ZERO )NOTROT = NOTROT +
1372 $ MIN0( jgl+KBL-1, N ) - jgl + 1
1373 IF( AAPP.LT.ZERO )NOTROT = 0
1374 *
1375 END IF
1376 *
1377 2100 CONTINUE
1378 * end of the p-loop
1379 2010 CONTINUE
1380 * end of the jbc-loop
1381 2011 CONTINUE
1382 *2011 bailed out of the jbc-loop
1383 DO 2012 p = igl, MIN0( igl+KBL-1, N )
1384 SVA( p ) = DABS( SVA( p ) )
1385 2012 CONTINUE
1386 ***
1387 2000 CONTINUE
1388 *2000 :: end of the ibr-loop
1389 *
1390 * .. update SVA(N)
1391 IF( ( SVA( N ).LT.ROOTBIG ) .AND. ( SVA( N ).GT.ROOTSFMIN ) )
1392 $ THEN
1393 SVA( N ) = DNRM2( M, A( 1, N ), 1 )*WORK( N )
1394 ELSE
1395 T = ZERO
1396 AAPP = ONE
1397 CALL DLASSQ( M, A( 1, N ), 1, T, AAPP )
1398 SVA( N ) = T*DSQRT( AAPP )*WORK( N )
1399 END IF
1400 *
1401 * Additional steering devices
1402 *
1403 IF( ( i.LT.SWBAND ) .AND. ( ( MXAAPQ.LE.ROOTTOL ) .OR.
1404 $ ( ISWROT.LE.N ) ) )SWBAND = i
1405 *
1406 IF( ( i.GT.SWBAND+1 ) .AND. ( MXAAPQ.LT.DSQRT( DBLE( N ) )*
1407 $ TOL ) .AND. ( DBLE( N )*MXAAPQ*MXSINJ.LT.TOL ) ) THEN
1408 GO TO 1994
1409 END IF
1410 *
1411 IF( NOTROT.GE.EMPTSW )GO TO 1994
1412 *
1413 1993 CONTINUE
1414 * end i=1:NSWEEP loop
1415 *
1416 * #:( Reaching this point means that the procedure has not converged.
1417 INFO = NSWEEP - 1
1418 GO TO 1995
1419 *
1420 1994 CONTINUE
1421 * #:) Reaching this point means numerical convergence after the i-th
1422 * sweep.
1423 *
1424 INFO = 0
1425 * #:) INFO = 0 confirms successful iterations.
1426 1995 CONTINUE
1427 *
1428 * Sort the singular values and find how many are above
1429 * the underflow threshold.
1430 *
1431 N2 = 0
1432 N4 = 0
1433 DO 5991 p = 1, N - 1
1434 q = IDAMAX( N-p+1, SVA( p ), 1 ) + p - 1
1435 IF( p.NE.q ) THEN
1436 TEMP1 = SVA( p )
1437 SVA( p ) = SVA( q )
1438 SVA( q ) = TEMP1
1439 TEMP1 = WORK( p )
1440 WORK( p ) = WORK( q )
1441 WORK( q ) = TEMP1
1442 CALL DSWAP( M, A( 1, p ), 1, A( 1, q ), 1 )
1443 IF( RSVEC )CALL DSWAP( MVL, V( 1, p ), 1, V( 1, q ), 1 )
1444 END IF
1445 IF( SVA( p ).NE.ZERO ) THEN
1446 N4 = N4 + 1
1447 IF( SVA( p )*SKL.GT.SFMIN )N2 = N2 + 1
1448 END IF
1449 5991 CONTINUE
1450 IF( SVA( N ).NE.ZERO ) THEN
1451 N4 = N4 + 1
1452 IF( SVA( N )*SKL.GT.SFMIN )N2 = N2 + 1
1453 END IF
1454 *
1455 * Normalize the left singular vectors.
1456 *
1457 IF( LSVEC .OR. UCTOL ) THEN
1458 DO 1998 p = 1, N2
1459 CALL DSCAL( M, WORK( p ) / SVA( p ), A( 1, p ), 1 )
1460 1998 CONTINUE
1461 END IF
1462 *
1463 * Scale the product of Jacobi rotations (assemble the fast rotations).
1464 *
1465 IF( RSVEC ) THEN
1466 IF( APPLV ) THEN
1467 DO 2398 p = 1, N
1468 CALL DSCAL( MVL, WORK( p ), V( 1, p ), 1 )
1469 2398 CONTINUE
1470 ELSE
1471 DO 2399 p = 1, N
1472 TEMP1 = ONE / DNRM2( MVL, V( 1, p ), 1 )
1473 CALL DSCAL( MVL, TEMP1, V( 1, p ), 1 )
1474 2399 CONTINUE
1475 END IF
1476 END IF
1477 *
1478 * Undo scaling, if necessary (and possible).
1479 IF( ( ( SKL.GT.ONE ) .AND. ( SVA( 1 ).LT.( BIG /
1480 $ SKL) ) ) .OR. ( ( SKL.LT.ONE ) .AND. ( SVA( N2 ).GT.
1481 $ ( SFMIN / SKL) ) ) ) THEN
1482 DO 2400 p = 1, N
1483 SVA( p ) = SKL*SVA( p )
1484 2400 CONTINUE
1485 SKL= ONE
1486 END IF
1487 *
1488 WORK( 1 ) = SKL
1489 * The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE
1490 * then some of the singular values may overflow or underflow and
1491 * the spectrum is given in this factored representation.
1492 *
1493 WORK( 2 ) = DBLE( N4 )
1494 * N4 is the number of computed nonzero singular values of A.
1495 *
1496 WORK( 3 ) = DBLE( N2 )
1497 * N2 is the number of singular values of A greater than SFMIN.
1498 * If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers
1499 * that may carry some information.
1500 *
1501 WORK( 4 ) = DBLE( i )
1502 * i is the index of the last sweep before declaring convergence.
1503 *
1504 WORK( 5 ) = MXAAPQ
1505 * MXAAPQ is the largest absolute value of scaled pivots in the
1506 * last sweep
1507 *
1508 WORK( 6 ) = MXSINJ
1509 * MXSINJ is the largest absolute value of the sines of Jacobi angles
1510 * in the last sweep
1511 *
1512 RETURN
1513 * ..
1514 * .. END OF DGESVJ
1515 * ..
1516 END
2 $ LDV, WORK, LWORK, INFO )
3 *
4 * -- LAPACK routine (version 3.3.1) --
5 *
6 * -- Contributed by Zlatko Drmac of the University of Zagreb and --
7 * -- Kresimir Veselic of the Fernuniversitaet Hagen --
8 * -- April 2011 --
9 *
10 * -- LAPACK is a software package provided by Univ. of Tennessee, --
11 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
12 *
13 * This routine is also part of SIGMA (version 1.23, October 23. 2008.)
14 * SIGMA is a library of algorithms for highly accurate algorithms for
15 * computation of SVD, PSVD, QSVD, (H,K)-SVD, and for solution of the
16 * eigenvalue problems Hx = lambda M x, H M x = lambda x with H, M > 0.
17 *
18 IMPLICIT NONE
19 * ..
20 * .. Scalar Arguments ..
21 INTEGER INFO, LDA, LDV, LWORK, M, MV, N
22 CHARACTER*1 JOBA, JOBU, JOBV
23 * ..
24 * .. Array Arguments ..
25 DOUBLE PRECISION A( LDA, * ), SVA( N ), V( LDV, * ),
26 $ WORK( LWORK )
27 * ..
28 *
29 * Purpose
30 * =======
31 *
32 * DGESVJ computes the singular value decomposition (SVD) of a real
33 * M-by-N matrix A, where M >= N. The SVD of A is written as
34 * [++] [xx] [x0] [xx]
35 * A = U * SIGMA * V^t, [++] = [xx] * [ox] * [xx]
36 * [++] [xx]
37 * where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
38 * matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
39 * of SIGMA are the singular values of A. The columns of U and V are the
40 * left and the right singular vectors of A, respectively.
41 *
42 * Further Details
43 * ~~~~~~~~~~~~~~~
44 * The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane
45 * rotations. The rotations are implemented as fast scaled rotations of
46 * Anda and Park [1]. In the case of underflow of the Jacobi angle, a
47 * modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses
48 * column interchanges of de Rijk [2]. The relative accuracy of the computed
49 * singular values and the accuracy of the computed singular vectors (in
50 * angle metric) is as guaranteed by the theory of Demmel and Veselic [3].
51 * The condition number that determines the accuracy in the full rank case
52 * is essentially min_{D=diag} kappa(A*D), where kappa(.) is the
53 * spectral condition number. The best performance of this Jacobi SVD
54 * procedure is achieved if used in an accelerated version of Drmac and
55 * Veselic [5,6], and it is the kernel routine in the SIGMA library [7].
56 * Some tunning parameters (marked with [TP]) are available for the
57 * implementer.
58 * The computational range for the nonzero singular values is the machine
59 * number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even
60 * denormalized singular values can be computed with the corresponding
61 * gradual loss of accurate digits.
62 *
63 * Contributors
64 * ~~~~~~~~~~~~
65 * Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
66 *
67 * References
68 * ~~~~~~~~~~
69 * [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling.
70 * SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174.
71 * [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the
72 * singular value decomposition on a vector computer.
73 * SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371.
74 * [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR.
75 * [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular
76 * value computation in floating point arithmetic.
77 * SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222.
78 * [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
79 * SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
80 * LAPACK Working note 169.
81 * [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
82 * SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
83 * LAPACK Working note 170.
84 * [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
85 * QSVD, (H,K)-SVD computations.
86 * Department of Mathematics, University of Zagreb, 2008.
87 *
88 * Bugs, Examples and Comments
89 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~
90 * Please report all bugs and send interesting test examples and comments to
91 * drmac@math.hr. Thank you.
92 *
93 * Arguments
94 * =========
95 *
96 * JOBA (input) CHARACTER* 1
97 * Specifies the structure of A.
98 * = 'L': The input matrix A is lower triangular;
99 * = 'U': The input matrix A is upper triangular;
100 * = 'G': The input matrix A is general M-by-N matrix, M >= N.
101 *
102 * JOBU (input) CHARACTER*1
103 * Specifies whether to compute the left singular vectors
104 * (columns of U):
105 * = 'U': The left singular vectors corresponding to the nonzero
106 * singular values are computed and returned in the leading
107 * columns of A. See more details in the description of A.
108 * The default numerical orthogonality threshold is set to
109 * approximately TOL=CTOL*EPS, CTOL=DSQRT(M), EPS=DLAMCH('E').
110 * = 'C': Analogous to JOBU='U', except that user can control the
111 * level of numerical orthogonality of the computed left
112 * singular vectors. TOL can be set to TOL = CTOL*EPS, where
113 * CTOL is given on input in the array WORK.
114 * No CTOL smaller than ONE is allowed. CTOL greater
115 * than 1 / EPS is meaningless. The option 'C'
116 * can be used if M*EPS is satisfactory orthogonality
117 * of the computed left singular vectors, so CTOL=M could
118 * save few sweeps of Jacobi rotations.
119 * See the descriptions of A and WORK(1).
120 * = 'N': The matrix U is not computed. However, see the
121 * description of A.
122 *
123 * JOBV (input) CHARACTER*1
124 * Specifies whether to compute the right singular vectors, that
125 * is, the matrix V:
126 * = 'V' : the matrix V is computed and returned in the array V
127 * = 'A' : the Jacobi rotations are applied to the MV-by-N
128 * array V. In other words, the right singular vector
129 * matrix V is not computed explicitly, instead it is
130 * applied to an MV-by-N matrix initially stored in the
131 * first MV rows of V.
132 * = 'N' : the matrix V is not computed and the array V is not
133 * referenced
134 *
135 * M (input) INTEGER
136 * The number of rows of the input matrix A. 1/DLAMCH('E') > M >= 0.
137 *
138 * N (input) INTEGER
139 * The number of columns of the input matrix A.
140 * M >= N >= 0.
141 *
142 * A (input/output) DOUBLE PRECISION array, dimension (LDA,N)
143 * On entry, the M-by-N matrix A.
144 * On exit :
145 * If JOBU .EQ. 'U' .OR. JOBU .EQ. 'C' :
146 * If INFO .EQ. 0 :
147 * RANKA orthonormal columns of U are returned in the
148 * leading RANKA columns of the array A. Here RANKA <= N
149 * is the number of computed singular values of A that are
150 * above the underflow threshold DLAMCH('S'). The singular
151 * vectors corresponding to underflowed or zero singular
152 * values are not computed. The value of RANKA is returned
153 * in the array WORK as RANKA=NINT(WORK(2)). Also see the
154 * descriptions of SVA and WORK. The computed columns of U
155 * are mutually numerically orthogonal up to approximately
156 * TOL=DSQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU.EQ.'C'),
157 * see the description of JOBU.
158 * If INFO .GT. 0 :
159 * the procedure DGESVJ did not converge in the given number
160 * of iterations (sweeps). In that case, the computed
161 * columns of U may not be orthogonal up to TOL. The output
162 * U (stored in A), SIGMA (given by the computed singular
163 * values in SVA(1:N)) and V is still a decomposition of the
164 * input matrix A in the sense that the residual
165 * ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small.
166 *
167 * If JOBU .EQ. 'N' :
168 * If INFO .EQ. 0 :
169 * Note that the left singular vectors are 'for free' in the
170 * one-sided Jacobi SVD algorithm. However, if only the
171 * singular values are needed, the level of numerical
172 * orthogonality of U is not an issue and iterations are
173 * stopped when the columns of the iterated matrix are
174 * numerically orthogonal up to approximately M*EPS. Thus,
175 * on exit, A contains the columns of U scaled with the
176 * corresponding singular values.
177 * If INFO .GT. 0 :
178 * the procedure DGESVJ did not converge in the given number
179 * of iterations (sweeps).
180 *
181 * LDA (input) INTEGER
182 * The leading dimension of the array A. LDA >= max(1,M).
183 *
184 * SVA (workspace/output) DOUBLE PRECISION array, dimension (N)
185 * On exit :
186 * If INFO .EQ. 0 :
187 * depending on the value SCALE = WORK(1), we have:
188 * If SCALE .EQ. ONE :
189 * SVA(1:N) contains the computed singular values of A.
190 * During the computation SVA contains the Euclidean column
191 * norms of the iterated matrices in the array A.
192 * If SCALE .NE. ONE :
193 * The singular values of A are SCALE*SVA(1:N), and this
194 * factored representation is due to the fact that some of the
195 * singular values of A might underflow or overflow.
196 * If INFO .GT. 0 :
197 * the procedure DGESVJ did not converge in the given number of
198 * iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.
199 *
200 * MV (input) INTEGER
201 * If JOBV .EQ. 'A', then the product of Jacobi rotations in DGESVJ
202 * is applied to the first MV rows of V. See the description of JOBV.
203 *
204 * V (input/output) DOUBLE PRECISION array, dimension (LDV,N)
205 * If JOBV = 'V', then V contains on exit the N-by-N matrix of
206 * the right singular vectors;
207 * If JOBV = 'A', then V contains the product of the computed right
208 * singular vector matrix and the initial matrix in
209 * the array V.
210 * If JOBV = 'N', then V is not referenced.
211 *
212 * LDV (input) INTEGER
213 * The leading dimension of the array V, LDV .GE. 1.
214 * If JOBV .EQ. 'V', then LDV .GE. max(1,N).
215 * If JOBV .EQ. 'A', then LDV .GE. max(1,MV) .
216 *
217 * WORK (input/workspace/output) DOUBLE PRECISION array, dimension max(4,M+N).
218 * On entry :
219 * If JOBU .EQ. 'C' :
220 * WORK(1) = CTOL, where CTOL defines the threshold for convergence.
221 * The process stops if all columns of A are mutually
222 * orthogonal up to CTOL*EPS, EPS=DLAMCH('E').
223 * It is required that CTOL >= ONE, i.e. it is not
224 * allowed to force the routine to obtain orthogonality
225 * below EPS.
226 * On exit :
227 * WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N)
228 * are the computed singular values of A.
229 * (See description of SVA().)
230 * WORK(2) = NINT(WORK(2)) is the number of the computed nonzero
231 * singular values.
232 * WORK(3) = NINT(WORK(3)) is the number of the computed singular
233 * values that are larger than the underflow threshold.
234 * WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi
235 * rotations needed for numerical convergence.
236 * WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep.
237 * This is useful information in cases when DGESVJ did
238 * not converge, as it can be used to estimate whether
239 * the output is stil useful and for post festum analysis.
240 * WORK(6) = the largest absolute value over all sines of the
241 * Jacobi rotation angles in the last sweep. It can be
242 * useful for a post festum analysis.
243 *
244 * LWORK (input) INTEGER
245 * length of WORK, WORK >= MAX(6,M+N)
246 *
247 * INFO (output) INTEGER
248 * = 0 : successful exit.
249 * < 0 : if INFO = -i, then the i-th argument had an illegal value
250 * > 0 : DGESVJ did not converge in the maximal allowed number (30)
251 * of sweeps. The output may still be useful. See the
252 * description of WORK.
253 *
254 * =====================================================================
255 *
256 * .. Local Parameters ..
257 DOUBLE PRECISION ZERO, HALF, ONE, TWO
258 PARAMETER ( ZERO = 0.0D0, HALF = 0.5D0, ONE = 1.0D0,
259 $ TWO = 2.0D0 )
260 INTEGER NSWEEP
261 PARAMETER ( NSWEEP = 30 )
262 * ..
263 * .. Local Scalars ..
264 DOUBLE PRECISION AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG,
265 $ BIGTHETA, CS, CTOL, EPSLN, LARGE, MXAAPQ,
266 $ MXSINJ, ROOTBIG, ROOTEPS, ROOTSFMIN, ROOTTOL,
267 $ SKL, SFMIN, SMALL, SN, T, TEMP1, THETA,
268 $ THSIGN, TOL
269 INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1,
270 $ ISWROT, jbc, jgl, KBL, LKAHEAD, MVL, N2, N34,
271 $ N4, NBL, NOTROT, p, PSKIPPED, q, ROWSKIP,
272 $ SWBAND
273 LOGICAL APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK,
274 $ RSVEC, UCTOL, UPPER
275 * ..
276 * .. Local Arrays ..
277 DOUBLE PRECISION FASTR( 5 )
278 * ..
279 * .. Intrinsic Functions ..
280 INTRINSIC DABS, DMAX1, DMIN1, DBLE, MIN0, DSIGN, DSQRT
281 * ..
282 * .. External Functions ..
283 * ..
284 * from BLAS
285 DOUBLE PRECISION DDOT, DNRM2
286 EXTERNAL DDOT, DNRM2
287 INTEGER IDAMAX
288 EXTERNAL IDAMAX
289 * from LAPACK
290 DOUBLE PRECISION DLAMCH
291 EXTERNAL DLAMCH
292 LOGICAL LSAME
293 EXTERNAL LSAME
294 * ..
295 * .. External Subroutines ..
296 * ..
297 * from BLAS
298 EXTERNAL DAXPY, DCOPY, DROTM, DSCAL, DSWAP
299 * from LAPACK
300 EXTERNAL DLASCL, DLASET, DLASSQ, XERBLA
301 *
302 EXTERNAL DGSVJ0, DGSVJ1
303 * ..
304 * .. Executable Statements ..
305 *
306 * Test the input arguments
307 *
308 LSVEC = LSAME( JOBU, 'U' )
309 UCTOL = LSAME( JOBU, 'C' )
310 RSVEC = LSAME( JOBV, 'V' )
311 APPLV = LSAME( JOBV, 'A' )
312 UPPER = LSAME( JOBA, 'U' )
313 LOWER = LSAME( JOBA, 'L' )
314 *
315 IF( .NOT.( UPPER .OR. LOWER .OR. LSAME( JOBA, 'G' ) ) ) THEN
316 INFO = -1
317 ELSE IF( .NOT.( LSVEC .OR. UCTOL .OR. LSAME( JOBU, 'N' ) ) ) THEN
318 INFO = -2
319 ELSE IF( .NOT.( RSVEC .OR. APPLV .OR. LSAME( JOBV, 'N' ) ) ) THEN
320 INFO = -3
321 ELSE IF( M.LT.0 ) THEN
322 INFO = -4
323 ELSE IF( ( N.LT.0 ) .OR. ( N.GT.M ) ) THEN
324 INFO = -5
325 ELSE IF( LDA.LT.M ) THEN
326 INFO = -7
327 ELSE IF( MV.LT.0 ) THEN
328 INFO = -9
329 ELSE IF( ( RSVEC .AND. ( LDV.LT.N ) ) .OR.
330 $ ( APPLV .AND. ( LDV.LT.MV ) ) ) THEN
331 INFO = -11
332 ELSE IF( UCTOL .AND. ( WORK( 1 ).LE.ONE ) ) THEN
333 INFO = -12
334 ELSE IF( LWORK.LT.MAX0( M+N, 6 ) ) THEN
335 INFO = -13
336 ELSE
337 INFO = 0
338 END IF
339 *
340 * #:(
341 IF( INFO.NE.0 ) THEN
342 CALL XERBLA( 'DGESVJ', -INFO )
343 RETURN
344 END IF
345 *
346 * #:) Quick return for void matrix
347 *
348 IF( ( M.EQ.0 ) .OR. ( N.EQ.0 ) )RETURN
349 *
350 * Set numerical parameters
351 * The stopping criterion for Jacobi rotations is
352 *
353 * max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
354 *
355 * where EPS is the round-off and CTOL is defined as follows:
356 *
357 IF( UCTOL ) THEN
358 * ... user controlled
359 CTOL = WORK( 1 )
360 ELSE
361 * ... default
362 IF( LSVEC .OR. RSVEC .OR. APPLV ) THEN
363 CTOL = DSQRT( DBLE( M ) )
364 ELSE
365 CTOL = DBLE( M )
366 END IF
367 END IF
368 * ... and the machine dependent parameters are
369 *[!] (Make sure that DLAMCH() works properly on the target machine.)
370 *
371 EPSLN = DLAMCH( 'Epsilon' )
372 ROOTEPS = DSQRT( EPSLN )
373 SFMIN = DLAMCH( 'SafeMinimum' )
374 ROOTSFMIN = DSQRT( SFMIN )
375 SMALL = SFMIN / EPSLN
376 BIG = DLAMCH( 'Overflow' )
377 * BIG = ONE / SFMIN
378 ROOTBIG = ONE / ROOTSFMIN
379 LARGE = BIG / DSQRT( DBLE( M*N ) )
380 BIGTHETA = ONE / ROOTEPS
381 *
382 TOL = CTOL*EPSLN
383 ROOTTOL = DSQRT( TOL )
384 *
385 IF( DBLE( M )*EPSLN.GE.ONE ) THEN
386 INFO = -4
387 CALL XERBLA( 'DGESVJ', -INFO )
388 RETURN
389 END IF
390 *
391 * Initialize the right singular vector matrix.
392 *
393 IF( RSVEC ) THEN
394 MVL = N
395 CALL DLASET( 'A', MVL, N, ZERO, ONE, V, LDV )
396 ELSE IF( APPLV ) THEN
397 MVL = MV
398 END IF
399 RSVEC = RSVEC .OR. APPLV
400 *
401 * Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
402 *(!) If necessary, scale A to protect the largest singular value
403 * from overflow. It is possible that saving the largest singular
404 * value destroys the information about the small ones.
405 * This initial scaling is almost minimal in the sense that the
406 * goal is to make sure that no column norm overflows, and that
407 * DSQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
408 * in A are detected, the procedure returns with INFO=-6.
409 *
410 SKL= ONE / DSQRT( DBLE( M )*DBLE( N ) )
411 NOSCALE = .TRUE.
412 GOSCALE = .TRUE.
413 *
414 IF( LOWER ) THEN
415 * the input matrix is M-by-N lower triangular (trapezoidal)
416 DO 1874 p = 1, N
417 AAPP = ZERO
418 AAQQ = ONE
419 CALL DLASSQ( M-p+1, A( p, p ), 1, AAPP, AAQQ )
420 IF( AAPP.GT.BIG ) THEN
421 INFO = -6
422 CALL XERBLA( 'DGESVJ', -INFO )
423 RETURN
424 END IF
425 AAQQ = DSQRT( AAQQ )
426 IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN
427 SVA( p ) = AAPP*AAQQ
428 ELSE
429 NOSCALE = .FALSE.
430 SVA( p ) = AAPP*( AAQQ*SKL)
431 IF( GOSCALE ) THEN
432 GOSCALE = .FALSE.
433 DO 1873 q = 1, p - 1
434 SVA( q ) = SVA( q )*SKL
435 1873 CONTINUE
436 END IF
437 END IF
438 1874 CONTINUE
439 ELSE IF( UPPER ) THEN
440 * the input matrix is M-by-N upper triangular (trapezoidal)
441 DO 2874 p = 1, N
442 AAPP = ZERO
443 AAQQ = ONE
444 CALL DLASSQ( p, A( 1, p ), 1, AAPP, AAQQ )
445 IF( AAPP.GT.BIG ) THEN
446 INFO = -6
447 CALL XERBLA( 'DGESVJ', -INFO )
448 RETURN
449 END IF
450 AAQQ = DSQRT( AAQQ )
451 IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN
452 SVA( p ) = AAPP*AAQQ
453 ELSE
454 NOSCALE = .FALSE.
455 SVA( p ) = AAPP*( AAQQ*SKL)
456 IF( GOSCALE ) THEN
457 GOSCALE = .FALSE.
458 DO 2873 q = 1, p - 1
459 SVA( q ) = SVA( q )*SKL
460 2873 CONTINUE
461 END IF
462 END IF
463 2874 CONTINUE
464 ELSE
465 * the input matrix is M-by-N general dense
466 DO 3874 p = 1, N
467 AAPP = ZERO
468 AAQQ = ONE
469 CALL DLASSQ( M, A( 1, p ), 1, AAPP, AAQQ )
470 IF( AAPP.GT.BIG ) THEN
471 INFO = -6
472 CALL XERBLA( 'DGESVJ', -INFO )
473 RETURN
474 END IF
475 AAQQ = DSQRT( AAQQ )
476 IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN
477 SVA( p ) = AAPP*AAQQ
478 ELSE
479 NOSCALE = .FALSE.
480 SVA( p ) = AAPP*( AAQQ*SKL)
481 IF( GOSCALE ) THEN
482 GOSCALE = .FALSE.
483 DO 3873 q = 1, p - 1
484 SVA( q ) = SVA( q )*SKL
485 3873 CONTINUE
486 END IF
487 END IF
488 3874 CONTINUE
489 END IF
490 *
491 IF( NOSCALE )SKL= ONE
492 *
493 * Move the smaller part of the spectrum from the underflow threshold
494 *(!) Start by determining the position of the nonzero entries of the
495 * array SVA() relative to ( SFMIN, BIG ).
496 *
497 AAPP = ZERO
498 AAQQ = BIG
499 DO 4781 p = 1, N
500 IF( SVA( p ).NE.ZERO )AAQQ = DMIN1( AAQQ, SVA( p ) )
501 AAPP = DMAX1( AAPP, SVA( p ) )
502 4781 CONTINUE
503 *
504 * #:) Quick return for zero matrix
505 *
506 IF( AAPP.EQ.ZERO ) THEN
507 IF( LSVEC )CALL DLASET( 'G', M, N, ZERO, ONE, A, LDA )
508 WORK( 1 ) = ONE
509 WORK( 2 ) = ZERO
510 WORK( 3 ) = ZERO
511 WORK( 4 ) = ZERO
512 WORK( 5 ) = ZERO
513 WORK( 6 ) = ZERO
514 RETURN
515 END IF
516 *
517 * #:) Quick return for one-column matrix
518 *
519 IF( N.EQ.1 ) THEN
520 IF( LSVEC )CALL DLASCL( 'G', 0, 0, SVA( 1 ), SKL, M, 1,
521 $ A( 1, 1 ), LDA, IERR )
522 WORK( 1 ) = ONE / SKL
523 IF( SVA( 1 ).GE.SFMIN ) THEN
524 WORK( 2 ) = ONE
525 ELSE
526 WORK( 2 ) = ZERO
527 END IF
528 WORK( 3 ) = ZERO
529 WORK( 4 ) = ZERO
530 WORK( 5 ) = ZERO
531 WORK( 6 ) = ZERO
532 RETURN
533 END IF
534 *
535 * Protect small singular values from underflow, and try to
536 * avoid underflows/overflows in computing Jacobi rotations.
537 *
538 SN = DSQRT( SFMIN / EPSLN )
539 TEMP1 = DSQRT( BIG / DBLE( N ) )
540 IF( ( AAPP.LE.SN ) .OR. ( AAQQ.GE.TEMP1 ) .OR.
541 $ ( ( SN.LE.AAQQ ) .AND. ( AAPP.LE.TEMP1 ) ) ) THEN
542 TEMP1 = DMIN1( BIG, TEMP1 / AAPP )
543 * AAQQ = AAQQ*TEMP1
544 * AAPP = AAPP*TEMP1
545 ELSE IF( ( AAQQ.LE.SN ) .AND. ( AAPP.LE.TEMP1 ) ) THEN
546 TEMP1 = DMIN1( SN / AAQQ, BIG / ( AAPP*DSQRT( DBLE( N ) ) ) )
547 * AAQQ = AAQQ*TEMP1
548 * AAPP = AAPP*TEMP1
549 ELSE IF( ( AAQQ.GE.SN ) .AND. ( AAPP.GE.TEMP1 ) ) THEN
550 TEMP1 = DMAX1( SN / AAQQ, TEMP1 / AAPP )
551 * AAQQ = AAQQ*TEMP1
552 * AAPP = AAPP*TEMP1
553 ELSE IF( ( AAQQ.LE.SN ) .AND. ( AAPP.GE.TEMP1 ) ) THEN
554 TEMP1 = DMIN1( SN / AAQQ, BIG / ( DSQRT( DBLE( N ) )*AAPP ) )
555 * AAQQ = AAQQ*TEMP1
556 * AAPP = AAPP*TEMP1
557 ELSE
558 TEMP1 = ONE
559 END IF
560 *
561 * Scale, if necessary
562 *
563 IF( TEMP1.NE.ONE ) THEN
564 CALL DLASCL( 'G', 0, 0, ONE, TEMP1, N, 1, SVA, N, IERR )
565 END IF
566 SKL= TEMP1*SKL
567 IF( SKL.NE.ONE ) THEN
568 CALL DLASCL( JOBA, 0, 0, ONE, SKL, M, N, A, LDA, IERR )
569 SKL= ONE / SKL
570 END IF
571 *
572 * Row-cyclic Jacobi SVD algorithm with column pivoting
573 *
574 EMPTSW = ( N*( N-1 ) ) / 2
575 NOTROT = 0
576 FASTR( 1 ) = ZERO
577 *
578 * A is represented in factored form A = A * diag(WORK), where diag(WORK)
579 * is initialized to identity. WORK is updated during fast scaled
580 * rotations.
581 *
582 DO 1868 q = 1, N
583 WORK( q ) = ONE
584 1868 CONTINUE
585 *
586 *
587 SWBAND = 3
588 *[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
589 * if DGESVJ is used as a computational routine in the preconditioned
590 * Jacobi SVD algorithm DGESVJ. For sweeps i=1:SWBAND the procedure
591 * works on pivots inside a band-like region around the diagonal.
592 * The boundaries are determined dynamically, based on the number of
593 * pivots above a threshold.
594 *
595 KBL = MIN0( 8, N )
596 *[TP] KBL is a tuning parameter that defines the tile size in the
597 * tiling of the p-q loops of pivot pairs. In general, an optimal
598 * value of KBL depends on the matrix dimensions and on the
599 * parameters of the computer's memory.
600 *
601 NBL = N / KBL
602 IF( ( NBL*KBL ).NE.N )NBL = NBL + 1
603 *
604 BLSKIP = KBL**2
605 *[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL.
606 *
607 ROWSKIP = MIN0( 5, KBL )
608 *[TP] ROWSKIP is a tuning parameter.
609 *
610 LKAHEAD = 1
611 *[TP] LKAHEAD is a tuning parameter.
612 *
613 * Quasi block transformations, using the lower (upper) triangular
614 * structure of the input matrix. The quasi-block-cycling usually
615 * invokes cubic convergence. Big part of this cycle is done inside
616 * canonical subspaces of dimensions less than M.
617 *
618 IF( ( LOWER .OR. UPPER ) .AND. ( N.GT.MAX0( 64, 4*KBL ) ) ) THEN
619 *[TP] The number of partition levels and the actual partition are
620 * tuning parameters.
621 N4 = N / 4
622 N2 = N / 2
623 N34 = 3*N4
624 IF( APPLV ) THEN
625 q = 0
626 ELSE
627 q = 1
628 END IF
629 *
630 IF( LOWER ) THEN
631 *
632 * This works very well on lower triangular matrices, in particular
633 * in the framework of the preconditioned Jacobi SVD (xGEJSV).
634 * The idea is simple:
635 * [+ 0 0 0] Note that Jacobi transformations of [0 0]
636 * [+ + 0 0] [0 0]
637 * [+ + x 0] actually work on [x 0] [x 0]
638 * [+ + x x] [x x]. [x x]
639 *
640 CALL DGSVJ0( JOBV, M-N34, N-N34, A( N34+1, N34+1 ), LDA,
641 $ WORK( N34+1 ), SVA( N34+1 ), MVL,
642 $ V( N34*q+1, N34+1 ), LDV, EPSLN, SFMIN, TOL,
643 $ 2, WORK( N+1 ), LWORK-N, IERR )
644 *
645 CALL DGSVJ0( JOBV, M-N2, N34-N2, A( N2+1, N2+1 ), LDA,
646 $ WORK( N2+1 ), SVA( N2+1 ), MVL,
647 $ V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 2,
648 $ WORK( N+1 ), LWORK-N, IERR )
649 *
650 CALL DGSVJ1( JOBV, M-N2, N-N2, N4, A( N2+1, N2+1 ), LDA,
651 $ WORK( N2+1 ), SVA( N2+1 ), MVL,
652 $ V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 1,
653 $ WORK( N+1 ), LWORK-N, IERR )
654 *
655 CALL DGSVJ0( JOBV, M-N4, N2-N4, A( N4+1, N4+1 ), LDA,
656 $ WORK( N4+1 ), SVA( N4+1 ), MVL,
657 $ V( N4*q+1, N4+1 ), LDV, EPSLN, SFMIN, TOL, 1,
658 $ WORK( N+1 ), LWORK-N, IERR )
659 *
660 CALL DGSVJ0( JOBV, M, N4, A, LDA, WORK, SVA, MVL, V, LDV,
661 $ EPSLN, SFMIN, TOL, 1, WORK( N+1 ), LWORK-N,
662 $ IERR )
663 *
664 CALL DGSVJ1( JOBV, M, N2, N4, A, LDA, WORK, SVA, MVL, V,
665 $ LDV, EPSLN, SFMIN, TOL, 1, WORK( N+1 ),
666 $ LWORK-N, IERR )
667 *
668 *
669 ELSE IF( UPPER ) THEN
670 *
671 *
672 CALL DGSVJ0( JOBV, N4, N4, A, LDA, WORK, SVA, MVL, V, LDV,
673 $ EPSLN, SFMIN, TOL, 2, WORK( N+1 ), LWORK-N,
674 $ IERR )
675 *
676 CALL DGSVJ0( JOBV, N2, N4, A( 1, N4+1 ), LDA, WORK( N4+1 ),
677 $ SVA( N4+1 ), MVL, V( N4*q+1, N4+1 ), LDV,
678 $ EPSLN, SFMIN, TOL, 1, WORK( N+1 ), LWORK-N,
679 $ IERR )
680 *
681 CALL DGSVJ1( JOBV, N2, N2, N4, A, LDA, WORK, SVA, MVL, V,
682 $ LDV, EPSLN, SFMIN, TOL, 1, WORK( N+1 ),
683 $ LWORK-N, IERR )
684 *
685 CALL DGSVJ0( JOBV, N2+N4, N4, A( 1, N2+1 ), LDA,
686 $ WORK( N2+1 ), SVA( N2+1 ), MVL,
687 $ V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 1,
688 $ WORK( N+1 ), LWORK-N, IERR )
689
690 END IF
691 *
692 END IF
693 *
694 * .. Row-cyclic pivot strategy with de Rijk's pivoting ..
695 *
696 DO 1993 i = 1, NSWEEP
697 *
698 * .. go go go ...
699 *
700 MXAAPQ = ZERO
701 MXSINJ = ZERO
702 ISWROT = 0
703 *
704 NOTROT = 0
705 PSKIPPED = 0
706 *
707 * Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
708 * 1 <= p < q <= N. This is the first step toward a blocked implementation
709 * of the rotations. New implementation, based on block transformations,
710 * is under development.
711 *
712 DO 2000 ibr = 1, NBL
713 *
714 igl = ( ibr-1 )*KBL + 1
715 *
716 DO 1002 ir1 = 0, MIN0( LKAHEAD, NBL-ibr )
717 *
718 igl = igl + ir1*KBL
719 *
720 DO 2001 p = igl, MIN0( igl+KBL-1, N-1 )
721 *
722 * .. de Rijk's pivoting
723 *
724 q = IDAMAX( N-p+1, SVA( p ), 1 ) + p - 1
725 IF( p.NE.q ) THEN
726 CALL DSWAP( M, A( 1, p ), 1, A( 1, q ), 1 )
727 IF( RSVEC )CALL DSWAP( MVL, V( 1, p ), 1,
728 $ V( 1, q ), 1 )
729 TEMP1 = SVA( p )
730 SVA( p ) = SVA( q )
731 SVA( q ) = TEMP1
732 TEMP1 = WORK( p )
733 WORK( p ) = WORK( q )
734 WORK( q ) = TEMP1
735 END IF
736 *
737 IF( ir1.EQ.0 ) THEN
738 *
739 * Column norms are periodically updated by explicit
740 * norm computation.
741 * Caveat:
742 * Unfortunately, some BLAS implementations compute DNRM2(M,A(1,p),1)
743 * as DSQRT(DDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
744 * overflow for ||A(:,p)||_2 > DSQRT(overflow_threshold), and to
745 * underflow for ||A(:,p)||_2 < DSQRT(underflow_threshold).
746 * Hence, DNRM2 cannot be trusted, not even in the case when
747 * the true norm is far from the under(over)flow boundaries.
748 * If properly implemented DNRM2 is available, the IF-THEN-ELSE
749 * below should read "AAPP = DNRM2( M, A(1,p), 1 ) * WORK(p)".
750 *
751 IF( ( SVA( p ).LT.ROOTBIG ) .AND.
752 $ ( SVA( p ).GT.ROOTSFMIN ) ) THEN
753 SVA( p ) = DNRM2( M, A( 1, p ), 1 )*WORK( p )
754 ELSE
755 TEMP1 = ZERO
756 AAPP = ONE
757 CALL DLASSQ( M, A( 1, p ), 1, TEMP1, AAPP )
758 SVA( p ) = TEMP1*DSQRT( AAPP )*WORK( p )
759 END IF
760 AAPP = SVA( p )
761 ELSE
762 AAPP = SVA( p )
763 END IF
764 *
765 IF( AAPP.GT.ZERO ) THEN
766 *
767 PSKIPPED = 0
768 *
769 DO 2002 q = p + 1, MIN0( igl+KBL-1, N )
770 *
771 AAQQ = SVA( q )
772 *
773 IF( AAQQ.GT.ZERO ) THEN
774 *
775 AAPP0 = AAPP
776 IF( AAQQ.GE.ONE ) THEN
777 ROTOK = ( SMALL*AAPP ).LE.AAQQ
778 IF( AAPP.LT.( BIG / AAQQ ) ) THEN
779 AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1,
780 $ q ), 1 )*WORK( p )*WORK( q ) /
781 $ AAQQ ) / AAPP
782 ELSE
783 CALL DCOPY( M, A( 1, p ), 1,
784 $ WORK( N+1 ), 1 )
785 CALL DLASCL( 'G', 0, 0, AAPP,
786 $ WORK( p ), M, 1,
787 $ WORK( N+1 ), LDA, IERR )
788 AAPQ = DDOT( M, WORK( N+1 ), 1,
789 $ A( 1, q ), 1 )*WORK( q ) / AAQQ
790 END IF
791 ELSE
792 ROTOK = AAPP.LE.( AAQQ / SMALL )
793 IF( AAPP.GT.( SMALL / AAQQ ) ) THEN
794 AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1,
795 $ q ), 1 )*WORK( p )*WORK( q ) /
796 $ AAQQ ) / AAPP
797 ELSE
798 CALL DCOPY( M, A( 1, q ), 1,
799 $ WORK( N+1 ), 1 )
800 CALL DLASCL( 'G', 0, 0, AAQQ,
801 $ WORK( q ), M, 1,
802 $ WORK( N+1 ), LDA, IERR )
803 AAPQ = DDOT( M, WORK( N+1 ), 1,
804 $ A( 1, p ), 1 )*WORK( p ) / AAPP
805 END IF
806 END IF
807 *
808 MXAAPQ = DMAX1( MXAAPQ, DABS( AAPQ ) )
809 *
810 * TO rotate or NOT to rotate, THAT is the question ...
811 *
812 IF( DABS( AAPQ ).GT.TOL ) THEN
813 *
814 * .. rotate
815 *[RTD] ROTATED = ROTATED + ONE
816 *
817 IF( ir1.EQ.0 ) THEN
818 NOTROT = 0
819 PSKIPPED = 0
820 ISWROT = ISWROT + 1
821 END IF
822 *
823 IF( ROTOK ) THEN
824 *
825 AQOAP = AAQQ / AAPP
826 APOAQ = AAPP / AAQQ
827 THETA = -HALF*DABS(AQOAP-APOAQ)/AAPQ
828 *
829 IF( DABS( THETA ).GT.BIGTHETA ) THEN
830 *
831 T = HALF / THETA
832 FASTR( 3 ) = T*WORK( p ) / WORK( q )
833 FASTR( 4 ) = -T*WORK( q ) /
834 $ WORK( p )
835 CALL DROTM( M, A( 1, p ), 1,
836 $ A( 1, q ), 1, FASTR )
837 IF( RSVEC )CALL DROTM( MVL,
838 $ V( 1, p ), 1,
839 $ V( 1, q ), 1,
840 $ FASTR )
841 SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO,
842 $ ONE+T*APOAQ*AAPQ ) )
843 AAPP = AAPP*DSQRT( DMAX1( ZERO,
844 $ ONE-T*AQOAP*AAPQ ) )
845 MXSINJ = DMAX1( MXSINJ, DABS( T ) )
846 *
847 ELSE
848 *
849 * .. choose correct signum for THETA and rotate
850 *
851 THSIGN = -DSIGN( ONE, AAPQ )
852 T = ONE / ( THETA+THSIGN*
853 $ DSQRT( ONE+THETA*THETA ) )
854 CS = DSQRT( ONE / ( ONE+T*T ) )
855 SN = T*CS
856 *
857 MXSINJ = DMAX1( MXSINJ, DABS( SN ) )
858 SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO,
859 $ ONE+T*APOAQ*AAPQ ) )
860 AAPP = AAPP*DSQRT( DMAX1( ZERO,
861 $ ONE-T*AQOAP*AAPQ ) )
862 *
863 APOAQ = WORK( p ) / WORK( q )
864 AQOAP = WORK( q ) / WORK( p )
865 IF( WORK( p ).GE.ONE ) THEN
866 IF( WORK( q ).GE.ONE ) THEN
867 FASTR( 3 ) = T*APOAQ
868 FASTR( 4 ) = -T*AQOAP
869 WORK( p ) = WORK( p )*CS
870 WORK( q ) = WORK( q )*CS
871 CALL DROTM( M, A( 1, p ), 1,
872 $ A( 1, q ), 1,
873 $ FASTR )
874 IF( RSVEC )CALL DROTM( MVL,
875 $ V( 1, p ), 1, V( 1, q ),
876 $ 1, FASTR )
877 ELSE
878 CALL DAXPY( M, -T*AQOAP,
879 $ A( 1, q ), 1,
880 $ A( 1, p ), 1 )
881 CALL DAXPY( M, CS*SN*APOAQ,
882 $ A( 1, p ), 1,
883 $ A( 1, q ), 1 )
884 WORK( p ) = WORK( p )*CS
885 WORK( q ) = WORK( q ) / CS
886 IF( RSVEC ) THEN
887 CALL DAXPY( MVL, -T*AQOAP,
888 $ V( 1, q ), 1,
889 $ V( 1, p ), 1 )
890 CALL DAXPY( MVL,
891 $ CS*SN*APOAQ,
892 $ V( 1, p ), 1,
893 $ V( 1, q ), 1 )
894 END IF
895 END IF
896 ELSE
897 IF( WORK( q ).GE.ONE ) THEN
898 CALL DAXPY( M, T*APOAQ,
899 $ A( 1, p ), 1,
900 $ A( 1, q ), 1 )
901 CALL DAXPY( M, -CS*SN*AQOAP,
902 $ A( 1, q ), 1,
903 $ A( 1, p ), 1 )
904 WORK( p ) = WORK( p ) / CS
905 WORK( q ) = WORK( q )*CS
906 IF( RSVEC ) THEN
907 CALL DAXPY( MVL, T*APOAQ,
908 $ V( 1, p ), 1,
909 $ V( 1, q ), 1 )
910 CALL DAXPY( MVL,
911 $ -CS*SN*AQOAP,
912 $ V( 1, q ), 1,
913 $ V( 1, p ), 1 )
914 END IF
915 ELSE
916 IF( WORK( p ).GE.WORK( q ) )
917 $ THEN
918 CALL DAXPY( M, -T*AQOAP,
919 $ A( 1, q ), 1,
920 $ A( 1, p ), 1 )
921 CALL DAXPY( M, CS*SN*APOAQ,
922 $ A( 1, p ), 1,
923 $ A( 1, q ), 1 )
924 WORK( p ) = WORK( p )*CS
925 WORK( q ) = WORK( q ) / CS
926 IF( RSVEC ) THEN
927 CALL DAXPY( MVL,
928 $ -T*AQOAP,
929 $ V( 1, q ), 1,
930 $ V( 1, p ), 1 )
931 CALL DAXPY( MVL,
932 $ CS*SN*APOAQ,
933 $ V( 1, p ), 1,
934 $ V( 1, q ), 1 )
935 END IF
936 ELSE
937 CALL DAXPY( M, T*APOAQ,
938 $ A( 1, p ), 1,
939 $ A( 1, q ), 1 )
940 CALL DAXPY( M,
941 $ -CS*SN*AQOAP,
942 $ A( 1, q ), 1,
943 $ A( 1, p ), 1 )
944 WORK( p ) = WORK( p ) / CS
945 WORK( q ) = WORK( q )*CS
946 IF( RSVEC ) THEN
947 CALL DAXPY( MVL,
948 $ T*APOAQ, V( 1, p ),
949 $ 1, V( 1, q ), 1 )
950 CALL DAXPY( MVL,
951 $ -CS*SN*AQOAP,
952 $ V( 1, q ), 1,
953 $ V( 1, p ), 1 )
954 END IF
955 END IF
956 END IF
957 END IF
958 END IF
959 *
960 ELSE
961 * .. have to use modified Gram-Schmidt like transformation
962 CALL DCOPY( M, A( 1, p ), 1,
963 $ WORK( N+1 ), 1 )
964 CALL DLASCL( 'G', 0, 0, AAPP, ONE, M,
965 $ 1, WORK( N+1 ), LDA,
966 $ IERR )
967 CALL DLASCL( 'G', 0, 0, AAQQ, ONE, M,
968 $ 1, A( 1, q ), LDA, IERR )
969 TEMP1 = -AAPQ*WORK( p ) / WORK( q )
970 CALL DAXPY( M, TEMP1, WORK( N+1 ), 1,
971 $ A( 1, q ), 1 )
972 CALL DLASCL( 'G', 0, 0, ONE, AAQQ, M,
973 $ 1, A( 1, q ), LDA, IERR )
974 SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO,
975 $ ONE-AAPQ*AAPQ ) )
976 MXSINJ = DMAX1( MXSINJ, SFMIN )
977 END IF
978 * END IF ROTOK THEN ... ELSE
979 *
980 * In the case of cancellation in updating SVA(q), SVA(p)
981 * recompute SVA(q), SVA(p).
982 *
983 IF( ( SVA( q ) / AAQQ )**2.LE.ROOTEPS )
984 $ THEN
985 IF( ( AAQQ.LT.ROOTBIG ) .AND.
986 $ ( AAQQ.GT.ROOTSFMIN ) ) THEN
987 SVA( q ) = DNRM2( M, A( 1, q ), 1 )*
988 $ WORK( q )
989 ELSE
990 T = ZERO
991 AAQQ = ONE
992 CALL DLASSQ( M, A( 1, q ), 1, T,
993 $ AAQQ )
994 SVA( q ) = T*DSQRT( AAQQ )*WORK( q )
995 END IF
996 END IF
997 IF( ( AAPP / AAPP0 ).LE.ROOTEPS ) THEN
998 IF( ( AAPP.LT.ROOTBIG ) .AND.
999 $ ( AAPP.GT.ROOTSFMIN ) ) THEN
1000 AAPP = DNRM2( M, A( 1, p ), 1 )*
1001 $ WORK( p )
1002 ELSE
1003 T = ZERO
1004 AAPP = ONE
1005 CALL DLASSQ( M, A( 1, p ), 1, T,
1006 $ AAPP )
1007 AAPP = T*DSQRT( AAPP )*WORK( p )
1008 END IF
1009 SVA( p ) = AAPP
1010 END IF
1011 *
1012 ELSE
1013 * A(:,p) and A(:,q) already numerically orthogonal
1014 IF( ir1.EQ.0 )NOTROT = NOTROT + 1
1015 *[RTD] SKIPPED = SKIPPED + 1
1016 PSKIPPED = PSKIPPED + 1
1017 END IF
1018 ELSE
1019 * A(:,q) is zero column
1020 IF( ir1.EQ.0 )NOTROT = NOTROT + 1
1021 PSKIPPED = PSKIPPED + 1
1022 END IF
1023 *
1024 IF( ( i.LE.SWBAND ) .AND.
1025 $ ( PSKIPPED.GT.ROWSKIP ) ) THEN
1026 IF( ir1.EQ.0 )AAPP = -AAPP
1027 NOTROT = 0
1028 GO TO 2103
1029 END IF
1030 *
1031 2002 CONTINUE
1032 * END q-LOOP
1033 *
1034 2103 CONTINUE
1035 * bailed out of q-loop
1036 *
1037 SVA( p ) = AAPP
1038 *
1039 ELSE
1040 SVA( p ) = AAPP
1041 IF( ( ir1.EQ.0 ) .AND. ( AAPP.EQ.ZERO ) )
1042 $ NOTROT = NOTROT + MIN0( igl+KBL-1, N ) - p
1043 END IF
1044 *
1045 2001 CONTINUE
1046 * end of the p-loop
1047 * end of doing the block ( ibr, ibr )
1048 1002 CONTINUE
1049 * end of ir1-loop
1050 *
1051 * ... go to the off diagonal blocks
1052 *
1053 igl = ( ibr-1 )*KBL + 1
1054 *
1055 DO 2010 jbc = ibr + 1, NBL
1056 *
1057 jgl = ( jbc-1 )*KBL + 1
1058 *
1059 * doing the block at ( ibr, jbc )
1060 *
1061 IJBLSK = 0
1062 DO 2100 p = igl, MIN0( igl+KBL-1, N )
1063 *
1064 AAPP = SVA( p )
1065 IF( AAPP.GT.ZERO ) THEN
1066 *
1067 PSKIPPED = 0
1068 *
1069 DO 2200 q = jgl, MIN0( jgl+KBL-1, N )
1070 *
1071 AAQQ = SVA( q )
1072 IF( AAQQ.GT.ZERO ) THEN
1073 AAPP0 = AAPP
1074 *
1075 * .. M x 2 Jacobi SVD ..
1076 *
1077 * Safe Gram matrix computation
1078 *
1079 IF( AAQQ.GE.ONE ) THEN
1080 IF( AAPP.GE.AAQQ ) THEN
1081 ROTOK = ( SMALL*AAPP ).LE.AAQQ
1082 ELSE
1083 ROTOK = ( SMALL*AAQQ ).LE.AAPP
1084 END IF
1085 IF( AAPP.LT.( BIG / AAQQ ) ) THEN
1086 AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1,
1087 $ q ), 1 )*WORK( p )*WORK( q ) /
1088 $ AAQQ ) / AAPP
1089 ELSE
1090 CALL DCOPY( M, A( 1, p ), 1,
1091 $ WORK( N+1 ), 1 )
1092 CALL DLASCL( 'G', 0, 0, AAPP,
1093 $ WORK( p ), M, 1,
1094 $ WORK( N+1 ), LDA, IERR )
1095 AAPQ = DDOT( M, WORK( N+1 ), 1,
1096 $ A( 1, q ), 1 )*WORK( q ) / AAQQ
1097 END IF
1098 ELSE
1099 IF( AAPP.GE.AAQQ ) THEN
1100 ROTOK = AAPP.LE.( AAQQ / SMALL )
1101 ELSE
1102 ROTOK = AAQQ.LE.( AAPP / SMALL )
1103 END IF
1104 IF( AAPP.GT.( SMALL / AAQQ ) ) THEN
1105 AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1,
1106 $ q ), 1 )*WORK( p )*WORK( q ) /
1107 $ AAQQ ) / AAPP
1108 ELSE
1109 CALL DCOPY( M, A( 1, q ), 1,
1110 $ WORK( N+1 ), 1 )
1111 CALL DLASCL( 'G', 0, 0, AAQQ,
1112 $ WORK( q ), M, 1,
1113 $ WORK( N+1 ), LDA, IERR )
1114 AAPQ = DDOT( M, WORK( N+1 ), 1,
1115 $ A( 1, p ), 1 )*WORK( p ) / AAPP
1116 END IF
1117 END IF
1118 *
1119 MXAAPQ = DMAX1( MXAAPQ, DABS( AAPQ ) )
1120 *
1121 * TO rotate or NOT to rotate, THAT is the question ...
1122 *
1123 IF( DABS( AAPQ ).GT.TOL ) THEN
1124 NOTROT = 0
1125 *[RTD] ROTATED = ROTATED + 1
1126 PSKIPPED = 0
1127 ISWROT = ISWROT + 1
1128 *
1129 IF( ROTOK ) THEN
1130 *
1131 AQOAP = AAQQ / AAPP
1132 APOAQ = AAPP / AAQQ
1133 THETA = -HALF*DABS(AQOAP-APOAQ)/AAPQ
1134 IF( AAQQ.GT.AAPP0 )THETA = -THETA
1135 *
1136 IF( DABS( THETA ).GT.BIGTHETA ) THEN
1137 T = HALF / THETA
1138 FASTR( 3 ) = T*WORK( p ) / WORK( q )
1139 FASTR( 4 ) = -T*WORK( q ) /
1140 $ WORK( p )
1141 CALL DROTM( M, A( 1, p ), 1,
1142 $ A( 1, q ), 1, FASTR )
1143 IF( RSVEC )CALL DROTM( MVL,
1144 $ V( 1, p ), 1,
1145 $ V( 1, q ), 1,
1146 $ FASTR )
1147 SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO,
1148 $ ONE+T*APOAQ*AAPQ ) )
1149 AAPP = AAPP*DSQRT( DMAX1( ZERO,
1150 $ ONE-T*AQOAP*AAPQ ) )
1151 MXSINJ = DMAX1( MXSINJ, DABS( T ) )
1152 ELSE
1153 *
1154 * .. choose correct signum for THETA and rotate
1155 *
1156 THSIGN = -DSIGN( ONE, AAPQ )
1157 IF( AAQQ.GT.AAPP0 )THSIGN = -THSIGN
1158 T = ONE / ( THETA+THSIGN*
1159 $ DSQRT( ONE+THETA*THETA ) )
1160 CS = DSQRT( ONE / ( ONE+T*T ) )
1161 SN = T*CS
1162 MXSINJ = DMAX1( MXSINJ, DABS( SN ) )
1163 SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO,
1164 $ ONE+T*APOAQ*AAPQ ) )
1165 AAPP = AAPP*DSQRT( DMAX1( ZERO,
1166 $ ONE-T*AQOAP*AAPQ ) )
1167 *
1168 APOAQ = WORK( p ) / WORK( q )
1169 AQOAP = WORK( q ) / WORK( p )
1170 IF( WORK( p ).GE.ONE ) THEN
1171 *
1172 IF( WORK( q ).GE.ONE ) THEN
1173 FASTR( 3 ) = T*APOAQ
1174 FASTR( 4 ) = -T*AQOAP
1175 WORK( p ) = WORK( p )*CS
1176 WORK( q ) = WORK( q )*CS
1177 CALL DROTM( M, A( 1, p ), 1,
1178 $ A( 1, q ), 1,
1179 $ FASTR )
1180 IF( RSVEC )CALL DROTM( MVL,
1181 $ V( 1, p ), 1, V( 1, q ),
1182 $ 1, FASTR )
1183 ELSE
1184 CALL DAXPY( M, -T*AQOAP,
1185 $ A( 1, q ), 1,
1186 $ A( 1, p ), 1 )
1187 CALL DAXPY( M, CS*SN*APOAQ,
1188 $ A( 1, p ), 1,
1189 $ A( 1, q ), 1 )
1190 IF( RSVEC ) THEN
1191 CALL DAXPY( MVL, -T*AQOAP,
1192 $ V( 1, q ), 1,
1193 $ V( 1, p ), 1 )
1194 CALL DAXPY( MVL,
1195 $ CS*SN*APOAQ,
1196 $ V( 1, p ), 1,
1197 $ V( 1, q ), 1 )
1198 END IF
1199 WORK( p ) = WORK( p )*CS
1200 WORK( q ) = WORK( q ) / CS
1201 END IF
1202 ELSE
1203 IF( WORK( q ).GE.ONE ) THEN
1204 CALL DAXPY( M, T*APOAQ,
1205 $ A( 1, p ), 1,
1206 $ A( 1, q ), 1 )
1207 CALL DAXPY( M, -CS*SN*AQOAP,
1208 $ A( 1, q ), 1,
1209 $ A( 1, p ), 1 )
1210 IF( RSVEC ) THEN
1211 CALL DAXPY( MVL, T*APOAQ,
1212 $ V( 1, p ), 1,
1213 $ V( 1, q ), 1 )
1214 CALL DAXPY( MVL,
1215 $ -CS*SN*AQOAP,
1216 $ V( 1, q ), 1,
1217 $ V( 1, p ), 1 )
1218 END IF
1219 WORK( p ) = WORK( p ) / CS
1220 WORK( q ) = WORK( q )*CS
1221 ELSE
1222 IF( WORK( p ).GE.WORK( q ) )
1223 $ THEN
1224 CALL DAXPY( M, -T*AQOAP,
1225 $ A( 1, q ), 1,
1226 $ A( 1, p ), 1 )
1227 CALL DAXPY( M, CS*SN*APOAQ,
1228 $ A( 1, p ), 1,
1229 $ A( 1, q ), 1 )
1230 WORK( p ) = WORK( p )*CS
1231 WORK( q ) = WORK( q ) / CS
1232 IF( RSVEC ) THEN
1233 CALL DAXPY( MVL,
1234 $ -T*AQOAP,
1235 $ V( 1, q ), 1,
1236 $ V( 1, p ), 1 )
1237 CALL DAXPY( MVL,
1238 $ CS*SN*APOAQ,
1239 $ V( 1, p ), 1,
1240 $ V( 1, q ), 1 )
1241 END IF
1242 ELSE
1243 CALL DAXPY( M, T*APOAQ,
1244 $ A( 1, p ), 1,
1245 $ A( 1, q ), 1 )
1246 CALL DAXPY( M,
1247 $ -CS*SN*AQOAP,
1248 $ A( 1, q ), 1,
1249 $ A( 1, p ), 1 )
1250 WORK( p ) = WORK( p ) / CS
1251 WORK( q ) = WORK( q )*CS
1252 IF( RSVEC ) THEN
1253 CALL DAXPY( MVL,
1254 $ T*APOAQ, V( 1, p ),
1255 $ 1, V( 1, q ), 1 )
1256 CALL DAXPY( MVL,
1257 $ -CS*SN*AQOAP,
1258 $ V( 1, q ), 1,
1259 $ V( 1, p ), 1 )
1260 END IF
1261 END IF
1262 END IF
1263 END IF
1264 END IF
1265 *
1266 ELSE
1267 IF( AAPP.GT.AAQQ ) THEN
1268 CALL DCOPY( M, A( 1, p ), 1,
1269 $ WORK( N+1 ), 1 )
1270 CALL DLASCL( 'G', 0, 0, AAPP, ONE,
1271 $ M, 1, WORK( N+1 ), LDA,
1272 $ IERR )
1273 CALL DLASCL( 'G', 0, 0, AAQQ, ONE,
1274 $ M, 1, A( 1, q ), LDA,
1275 $ IERR )
1276 TEMP1 = -AAPQ*WORK( p ) / WORK( q )
1277 CALL DAXPY( M, TEMP1, WORK( N+1 ),
1278 $ 1, A( 1, q ), 1 )
1279 CALL DLASCL( 'G', 0, 0, ONE, AAQQ,
1280 $ M, 1, A( 1, q ), LDA,
1281 $ IERR )
1282 SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO,
1283 $ ONE-AAPQ*AAPQ ) )
1284 MXSINJ = DMAX1( MXSINJ, SFMIN )
1285 ELSE
1286 CALL DCOPY( M, A( 1, q ), 1,
1287 $ WORK( N+1 ), 1 )
1288 CALL DLASCL( 'G', 0, 0, AAQQ, ONE,
1289 $ M, 1, WORK( N+1 ), LDA,
1290 $ IERR )
1291 CALL DLASCL( 'G', 0, 0, AAPP, ONE,
1292 $ M, 1, A( 1, p ), LDA,
1293 $ IERR )
1294 TEMP1 = -AAPQ*WORK( q ) / WORK( p )
1295 CALL DAXPY( M, TEMP1, WORK( N+1 ),
1296 $ 1, A( 1, p ), 1 )
1297 CALL DLASCL( 'G', 0, 0, ONE, AAPP,
1298 $ M, 1, A( 1, p ), LDA,
1299 $ IERR )
1300 SVA( p ) = AAPP*DSQRT( DMAX1( ZERO,
1301 $ ONE-AAPQ*AAPQ ) )
1302 MXSINJ = DMAX1( MXSINJ, SFMIN )
1303 END IF
1304 END IF
1305 * END IF ROTOK THEN ... ELSE
1306 *
1307 * In the case of cancellation in updating SVA(q)
1308 * .. recompute SVA(q)
1309 IF( ( SVA( q ) / AAQQ )**2.LE.ROOTEPS )
1310 $ THEN
1311 IF( ( AAQQ.LT.ROOTBIG ) .AND.
1312 $ ( AAQQ.GT.ROOTSFMIN ) ) THEN
1313 SVA( q ) = DNRM2( M, A( 1, q ), 1 )*
1314 $ WORK( q )
1315 ELSE
1316 T = ZERO
1317 AAQQ = ONE
1318 CALL DLASSQ( M, A( 1, q ), 1, T,
1319 $ AAQQ )
1320 SVA( q ) = T*DSQRT( AAQQ )*WORK( q )
1321 END IF
1322 END IF
1323 IF( ( AAPP / AAPP0 )**2.LE.ROOTEPS ) THEN
1324 IF( ( AAPP.LT.ROOTBIG ) .AND.
1325 $ ( AAPP.GT.ROOTSFMIN ) ) THEN
1326 AAPP = DNRM2( M, A( 1, p ), 1 )*
1327 $ WORK( p )
1328 ELSE
1329 T = ZERO
1330 AAPP = ONE
1331 CALL DLASSQ( M, A( 1, p ), 1, T,
1332 $ AAPP )
1333 AAPP = T*DSQRT( AAPP )*WORK( p )
1334 END IF
1335 SVA( p ) = AAPP
1336 END IF
1337 * end of OK rotation
1338 ELSE
1339 NOTROT = NOTROT + 1
1340 *[RTD] SKIPPED = SKIPPED + 1
1341 PSKIPPED = PSKIPPED + 1
1342 IJBLSK = IJBLSK + 1
1343 END IF
1344 ELSE
1345 NOTROT = NOTROT + 1
1346 PSKIPPED = PSKIPPED + 1
1347 IJBLSK = IJBLSK + 1
1348 END IF
1349 *
1350 IF( ( i.LE.SWBAND ) .AND. ( IJBLSK.GE.BLSKIP ) )
1351 $ THEN
1352 SVA( p ) = AAPP
1353 NOTROT = 0
1354 GO TO 2011
1355 END IF
1356 IF( ( i.LE.SWBAND ) .AND.
1357 $ ( PSKIPPED.GT.ROWSKIP ) ) THEN
1358 AAPP = -AAPP
1359 NOTROT = 0
1360 GO TO 2203
1361 END IF
1362 *
1363 2200 CONTINUE
1364 * end of the q-loop
1365 2203 CONTINUE
1366 *
1367 SVA( p ) = AAPP
1368 *
1369 ELSE
1370 *
1371 IF( AAPP.EQ.ZERO )NOTROT = NOTROT +
1372 $ MIN0( jgl+KBL-1, N ) - jgl + 1
1373 IF( AAPP.LT.ZERO )NOTROT = 0
1374 *
1375 END IF
1376 *
1377 2100 CONTINUE
1378 * end of the p-loop
1379 2010 CONTINUE
1380 * end of the jbc-loop
1381 2011 CONTINUE
1382 *2011 bailed out of the jbc-loop
1383 DO 2012 p = igl, MIN0( igl+KBL-1, N )
1384 SVA( p ) = DABS( SVA( p ) )
1385 2012 CONTINUE
1386 ***
1387 2000 CONTINUE
1388 *2000 :: end of the ibr-loop
1389 *
1390 * .. update SVA(N)
1391 IF( ( SVA( N ).LT.ROOTBIG ) .AND. ( SVA( N ).GT.ROOTSFMIN ) )
1392 $ THEN
1393 SVA( N ) = DNRM2( M, A( 1, N ), 1 )*WORK( N )
1394 ELSE
1395 T = ZERO
1396 AAPP = ONE
1397 CALL DLASSQ( M, A( 1, N ), 1, T, AAPP )
1398 SVA( N ) = T*DSQRT( AAPP )*WORK( N )
1399 END IF
1400 *
1401 * Additional steering devices
1402 *
1403 IF( ( i.LT.SWBAND ) .AND. ( ( MXAAPQ.LE.ROOTTOL ) .OR.
1404 $ ( ISWROT.LE.N ) ) )SWBAND = i
1405 *
1406 IF( ( i.GT.SWBAND+1 ) .AND. ( MXAAPQ.LT.DSQRT( DBLE( N ) )*
1407 $ TOL ) .AND. ( DBLE( N )*MXAAPQ*MXSINJ.LT.TOL ) ) THEN
1408 GO TO 1994
1409 END IF
1410 *
1411 IF( NOTROT.GE.EMPTSW )GO TO 1994
1412 *
1413 1993 CONTINUE
1414 * end i=1:NSWEEP loop
1415 *
1416 * #:( Reaching this point means that the procedure has not converged.
1417 INFO = NSWEEP - 1
1418 GO TO 1995
1419 *
1420 1994 CONTINUE
1421 * #:) Reaching this point means numerical convergence after the i-th
1422 * sweep.
1423 *
1424 INFO = 0
1425 * #:) INFO = 0 confirms successful iterations.
1426 1995 CONTINUE
1427 *
1428 * Sort the singular values and find how many are above
1429 * the underflow threshold.
1430 *
1431 N2 = 0
1432 N4 = 0
1433 DO 5991 p = 1, N - 1
1434 q = IDAMAX( N-p+1, SVA( p ), 1 ) + p - 1
1435 IF( p.NE.q ) THEN
1436 TEMP1 = SVA( p )
1437 SVA( p ) = SVA( q )
1438 SVA( q ) = TEMP1
1439 TEMP1 = WORK( p )
1440 WORK( p ) = WORK( q )
1441 WORK( q ) = TEMP1
1442 CALL DSWAP( M, A( 1, p ), 1, A( 1, q ), 1 )
1443 IF( RSVEC )CALL DSWAP( MVL, V( 1, p ), 1, V( 1, q ), 1 )
1444 END IF
1445 IF( SVA( p ).NE.ZERO ) THEN
1446 N4 = N4 + 1
1447 IF( SVA( p )*SKL.GT.SFMIN )N2 = N2 + 1
1448 END IF
1449 5991 CONTINUE
1450 IF( SVA( N ).NE.ZERO ) THEN
1451 N4 = N4 + 1
1452 IF( SVA( N )*SKL.GT.SFMIN )N2 = N2 + 1
1453 END IF
1454 *
1455 * Normalize the left singular vectors.
1456 *
1457 IF( LSVEC .OR. UCTOL ) THEN
1458 DO 1998 p = 1, N2
1459 CALL DSCAL( M, WORK( p ) / SVA( p ), A( 1, p ), 1 )
1460 1998 CONTINUE
1461 END IF
1462 *
1463 * Scale the product of Jacobi rotations (assemble the fast rotations).
1464 *
1465 IF( RSVEC ) THEN
1466 IF( APPLV ) THEN
1467 DO 2398 p = 1, N
1468 CALL DSCAL( MVL, WORK( p ), V( 1, p ), 1 )
1469 2398 CONTINUE
1470 ELSE
1471 DO 2399 p = 1, N
1472 TEMP1 = ONE / DNRM2( MVL, V( 1, p ), 1 )
1473 CALL DSCAL( MVL, TEMP1, V( 1, p ), 1 )
1474 2399 CONTINUE
1475 END IF
1476 END IF
1477 *
1478 * Undo scaling, if necessary (and possible).
1479 IF( ( ( SKL.GT.ONE ) .AND. ( SVA( 1 ).LT.( BIG /
1480 $ SKL) ) ) .OR. ( ( SKL.LT.ONE ) .AND. ( SVA( N2 ).GT.
1481 $ ( SFMIN / SKL) ) ) ) THEN
1482 DO 2400 p = 1, N
1483 SVA( p ) = SKL*SVA( p )
1484 2400 CONTINUE
1485 SKL= ONE
1486 END IF
1487 *
1488 WORK( 1 ) = SKL
1489 * The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE
1490 * then some of the singular values may overflow or underflow and
1491 * the spectrum is given in this factored representation.
1492 *
1493 WORK( 2 ) = DBLE( N4 )
1494 * N4 is the number of computed nonzero singular values of A.
1495 *
1496 WORK( 3 ) = DBLE( N2 )
1497 * N2 is the number of singular values of A greater than SFMIN.
1498 * If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers
1499 * that may carry some information.
1500 *
1501 WORK( 4 ) = DBLE( i )
1502 * i is the index of the last sweep before declaring convergence.
1503 *
1504 WORK( 5 ) = MXAAPQ
1505 * MXAAPQ is the largest absolute value of scaled pivots in the
1506 * last sweep
1507 *
1508 WORK( 6 ) = MXSINJ
1509 * MXSINJ is the largest absolute value of the sines of Jacobi angles
1510 * in the last sweep
1511 *
1512 RETURN
1513 * ..
1514 * .. END OF DGESVJ
1515 * ..
1516 END