matrixops.pd: changes from PDL-2.074 to PDL-2.075

[...]

matrix itself, counting rightwards and downwards from the upper left
corner. This means that if you print a PDL that contains a matrix,
the matrix appears correctly on the screen, but if you index a matrix
element, you use the indices in the reverse order that you would in a
math textbook. If you prefer your matrices indexed in (row, column)
order, you can try using the L<PDL::Matrix> object, which
includes an implicit exchange of the first two dimensions but should
be compatible with most of these matrix operations. TIMTOWTDI.)

Matrices, row vectors, and column vectors can be multiplied with the 'x'
operator (which is, of course, broadcastable):

    $m3 = $m1 x $m2;
    $col_vec2 = $m1 x $col_vec1;
    $row_vec2 = $row_vec1 x $m1;
    $scalar = $row_vec x $col_vec;

Because of the (column,row) addressing order, 1-D PDLs are treated as
_row_ vectors; if you want a _column_ vector you must add a dummy dimension:

    $rowvec = pdl(1,2);             # row vector
    $colvec = $rowvec->slice('*1'); # 1x2 column vector
    $matrix = pdl([[3,4],[6,2]]);   # 2x2 matrix
    $rowvec2 = $rowvec x $matrix;   # right-multiplication by matrix
    $colvec = $matrix x $colvec;    # left-multiplication by matrix
    $m2 = $matrix x $rowvec;        # Throws an error
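As a pure-Perl sketch of the dimension rule behind that error (no PDL
required; C<matmul> is an illustrative helper, not a PDL function, and
these plain arrays are stored row-major rather than in PDL's (col,row)
order): an inner product needs the left operand's column count to match
the right operand's row count.

```perl
use strict; use warnings;

# Multiply two matrices stored as array-of-rows; dies on a dimension
# mismatch, mirroring the error thrown by the last line above.
sub matmul {
    my ($A, $B) = @_;
    my $inner = @{ $A->[0] };                        # columns of A
    die "dimension mismatch\n" unless $inner == @$B; # must equal rows of B
    my @C;
    for my $i (0 .. $#$A) {
        for my $j (0 .. $#{ $B->[0] }) {
            my $sum = 0;
            $sum += $A->[$i][$_] * $B->[$_][$j] for 0 .. $inner - 1;
            $C[$i][$j] = $sum;
        }
    }
    return \@C;
}

my $rowvec = [[1, 2]];              # 1x2: one row, two columns
my $matrix = [[3, 4], [6, 2]];      # 2x2
my $rv2 = matmul($rowvec, $matrix); # ok: [[15, 8]]
# matmul($matrix, $rowvec) dies: 2 columns cannot meet 1 row
```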

Implicit broadcasting works correctly with most matrix operations, but
you must be extra careful that you understand the dimensionality. In
particular, matrix multiplication and other matrix ops need nx1 PDLs
as row vectors and 1xn PDLs as column vectors. In most cases you must
explicitly include the trailing 'x1' dimension in order to get the expected
results when you broadcast over multiple row vectors.

When broadcasting over matrices, it's very easy to get confused about
which dimension goes where. It is useful to include comments with
every expression, explaining what you think each dimension means:

    $x = xvals(360)*3.14159/180;     # (angle)
    $rot = cat(cat(cos($x),sin($x)), # rotmat: (col,row,angle)
               cat(-sin($x),cos($x)));
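A pure-Perl sketch of what that broadcast builds (illustrative only;
PDL's C<cat> does this in one vectorized step rather than a loop): one
2x2 rotation matrix per angle.

```perl
use strict; use warnings;
use constant PI => 4 * atan2(1, 1);

# One rotation matrix per degree, mirroring the (col,row,angle)
# layout of the cat() expression above.
my @rot;
for my $deg (0 .. 359) {
    my $t = $deg * PI / 180;
    push @rot, [ [ cos($t), sin($t) ], [ -sin($t), cos($t) ] ];
}
# $rot[90] rotates by 90 degrees: cos term ~ 0, sin term ~ 1
```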

=head1 ACKNOWLEDGEMENTS

MatrixOps includes algorithms and pre-existing code from several

[...]

=for sig

  Signature: (a(m,m))

=for usage

  $det = determinant($x);

=for ref

Determinant of a square matrix, using recursive descent (broadcastable).

This is the traditional, robust recursive determinant method taught in
most linear algebra courses. It scales like C<O(n!)> (and hence is
pitifully slow for large matrices) but is very robust because no
division is involved (hence no division-by-zero errors for singular
matrices). It's also broadcastable, so you can find the determinants of
a large collection of matrices all at once if you want.

Matrices up to 3x3 are handled by direct multiplication; larger matrices
are handled by recursive descent to the 3x3 case.

The LU-decomposition method L</det> is faster in isolation for
single matrices larger than about 4x4, and is much faster if you end up
reusing the LU decomposition of C<$a> (NOTE: check performance and
broadcasting benchmarks with new code).
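An illustrative pure-Perl cofactor expansion along the first row shows
the C<O(n!)> recursive-descent idea (this is a sketch, not PDL's actual
code, which bottoms out at the 3x3 case and broadcasts):

```perl
use strict; use warnings;

# Recursive cofactor expansion: det(A) = sum_j (-1)^j a[0][j] * det(minor).
# No division anywhere, so singular matrices simply give zero.
sub det {
    my ($m) = @_;
    my $n = @$m;
    return $m->[0][0] if $n == 1;
    my ($sum, $sign) = (0, 1);
    for my $j (0 .. $n - 1) {
        # Minor: drop row 0 and column $j
        my @minor = map {
            [ @{ $m->[$_] }[ grep { $_ != $j } 0 .. $n - 1 ] ]
        } 1 .. $n - 1;
        $sum += $sign * $m->[0][$j] * det(\@minor);
        $sign = -$sign;
    }
    return $sum;
}

print det([ [3, 4], [6, 2] ]), "\n";   # 3*2 - 4*6 = -18
```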

=cut

*PDL::determinant = \&determinant;
sub determinant {
  my($x) = shift;
  my($n);
  return undef unless(
    UNIVERSAL::isa($x,'PDL') &&
    $x->getndims >= 2 &&
    ($n = $x->dim(0)) == $x->dim(1)
  );
  return $x->clump(2) if($n==1);
  if($n==2) {
    my($y) = $x->clump(2);
    return $y->index(0)*$y->index(3) - $y->index(1)*$y->index(2);
  }
  if($n==3) {
    my($y) = $x->clump(2);
    my $y3 = $y->index(3);
    my $y4 = $y->index(4);
    my $y5 = $y->index(5);
    my $y6 = $y->index(6);
    my $y7 = $y->index(7);
    my $y8 = $y->index(8);
    return (
      $y->index(0) * ( $y4 * $y8 - $y5 * $y7 )
      + $y->index(1) * ( $y5 * $y6 - $y3 * $y8 )

[...]

  my ($sym) = 0.5*($x + $x->transpose);
  my ($err) = PDL::max(abs($sym));
  barf "Need symmetric component non-zero for eigens_sym"
    if $err == 0;
  $err = PDL::max(abs($x-$sym))/$err;
  warn "Using symmetrized version of the matrix in eigens_sym"
    if $err > 1e-5 && $PDL::debug;

  ## Get lower diagonal form
  ## Use whichND/indexND because whereND doesn\'t exist (yet?) and
  ## the combo is broadcastable (unlike where). Note that for historical
  ## reasons whichND needs a scalar() around it to give back a
  ## nice 2xn PDL index.
  my $lt = PDL::indexND($sym,
            scalar(PDL::whichND(PDL->xvals($n,$n) <=
                                PDL->yvals($n,$n)))
           )->copy;
  my $ev = PDL->zeroes($sym->dims);
  my $e = PDL->zeroes($sym->index(0)->dims);
  &PDL::_eigens_sym_int($lt, $ev, $e);

[...]

  $e; # just eigenvalues
}
'
, Doc => '

=for ref

Eigenvalues and -vectors of a symmetric square matrix. If passed
an asymmetric matrix, the routine will warn and symmetrize it, by taking
the average value. That is, it will solve for 0.5*($a+$a->transpose).

It\'s broadcastable, so if C<$a> is 3x3x100, it\'s treated as 100 separate 3x3
matrices, and both C<$ev> and C<$e> get extra dimensions accordingly.

If called in scalar context it hands back only the eigenvalues. Ultimately,
it should switch to a faster algorithm in this case (as discarding the
eigenvectors is wasteful).

The algorithm used is due to J. von Neumann, which was a rediscovery of
L<Jacobi\'s Method|http://en.wikipedia.org/wiki/Jacobi_eigenvalue_algorithm> .
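As an illustrative plain-Perl sketch of one Jacobi step (C<jacobi2x2>
is a hypothetical helper, not the library routine PDL calls): a single
rotation exactly diagonalizes a 2x2 symmetric matrix; larger matrices
repeat this on the largest off-diagonal element until convergence.

```perl
use strict; use warnings;

# Diagonalize the symmetric matrix [[a,b],[b,d]] with one plane
# rotation by theta = 0.5*atan2(2b, a-d); returns the eigenvalues
# (the diagonal of R' A R).
sub jacobi2x2 {
    my ($a, $b, $d) = @_;
    return ($a, $d) if $b == 0;
    my $theta = 0.5 * atan2(2 * $b, $a - $d);
    my ($c, $s) = (cos($theta), sin($theta));
    my $l1 = $c*$c*$a + 2*$c*$s*$b + $s*$s*$d;
    my $l2 = $s*$s*$a - 2*$c*$s*$b + $c*$c*$d;
    return ($l1, $l2);
}

my ($l1, $l2) = jacobi2x2(2, 1, 2); # eigenvalues of [[2,1],[1,2]]: 3 and 1
```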

The eigenvectors are returned in COLUMNS of the returned PDL. That

[...]

experimental! For asymmetric matrices, nearly all observed matrices
with real eigenvalues produce incorrect results, due to errors of the
sslib algorithm. If your asymmetric matrix returns all NaNs, do not
assume that the values are complex. Also, problems with memory access
are known in this library.

Not all square matrices are diagonalizable. If you feed in a
non-diagonalizable matrix, then one or more of the eigenvectors will
be set to NaN, along with the corresponding eigenvalues.

C<eigens> is broadcastable, so you can solve 100 eigenproblems by
feeding in a 3x3x100 array. Both C<$ev> and C<$e> get extra dimensions
accordingly.

If called in scalar context C<eigens> hands back only the eigenvalues. This
is somewhat wasteful, as it calculates the eigenvectors anyway.

The eigenvectors are returned in COLUMNS of the returned PDL (ie
the 0 dimension). That makes it slightly easier to access individual
eigenvectors, since the 0th dim of the output PDL runs across the
eigenvectors and the 1st dim runs across their components.

[...]

',
, Doc => q{

=for usage

 ($u, $s, $v) = svd($x);

=for ref

Singular value decomposition of a matrix.

C<svd> is broadcastable.

Given an m x n matrix C<$a> that has m rows and n columns (m >= n),
C<svd> computes matrices C<$u> and C<$v>, and a vector of the singular
values C<$s>. Like most implementations, C<svd> computes what is
commonly referred to as the "thin SVD" of C<$a>, such that C<$u> is m
x n, C<$v> is n x n, and there are <=n singular values. As long as m
>= n, the original matrix can be reconstructed as follows:

 ($u,$s,$v) = svd($x);
 $ess = zeroes($x->dim(0),$x->dim(0));

[...]

EXAMPLE

The computing literature has loads of examples of how to use SVD.
Here's a trivial example (used in L<PDL::Transform::map|PDL::Transform/map>)
of how to make a matrix less, er, singular, without changing the
orientation of the ellipsoid of transformation:

 { my($r1,$s,$r2) = svd $x;
   $s++;             # fatten all singular values
   $r2 *= $s;        # implicit broadcasting for cheap mult.
   $x .= $r2 x $r1;  # a gets r2 x ess x r1
 }

=cut

},);

######################################################################

pp_add_exported('','lu_decomp');

pp_addpm(<<'EOD');

[...]

 $lu = lu_decomp($x, $perm, $par);  # $perm and $par are outputs!

 lu_decomp($x->inplace,$perm,$par); # Everything in place.

=for description

C<lu_decomp> returns an LU decomposition of a square matrix,
using Crout's method with partial pivoting. It's ported
from I<Numerical Recipes>. The partial pivoting keeps it
numerically stable but means a little more overhead from
broadcasting.

C<lu_decomp> decomposes the input matrix into matrices L and
U such that LU = A, L is a subdiagonal matrix, and U is a
superdiagonal matrix. By convention, the diagonal of L is
all 1's.
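A pure-Perl sketch of the factorization idea (Doolittle-style ordering
with the unit diagonal on L, matching the convention above, but with
I<no> pivoting, so it is illustrative only and can fail where the real
C<lu_decomp> succeeds):

```perl
use strict; use warnings;

# Factor A = L*U, storing both triangles in one matrix the way
# lu_decomp packs its output: U on and above the diagonal, the
# below-diagonal entries of L beneath it (L's unit diagonal is implied).
sub lu_nopivot {
    my ($A) = @_;
    my $n  = @$A;
    my @lu = map { [ @$_ ] } @$A;    # deep copy
    for my $k (0 .. $n - 1) {
        die "zero pivot -- the real lu_decomp would pivot here\n"
            if $lu[$k][$k] == 0;
        for my $i ($k + 1 .. $n - 1) {
            $lu[$i][$k] /= $lu[$k][$k];               # L multiplier
            $lu[$i][$_] -= $lu[$i][$k] * $lu[$k][$_]  # update U part
                for $k + 1 .. $n - 1;
        }
    }
    return \@lu;
}

my $lu = lu_nopivot([ [4, 3], [6, 3] ]);
# L = [[1,0],[1.5,1]], U = [[4,3],[0,-1.5]], packed as [[4,3],[1.5,-1.5]]
```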

The single output matrix contains all the variable elements
of both the L and U matrices, stacked together. Because the
method uses pivoting (rearranging the lower part of the
matrix for better numerical stability), you have to permute

[...]

those are handled OK but give a zero determinant (and hence
can't be inverted).

C<lu_decomp> uses pivoting, which rearranges the values in the
matrix for more numerical stability. This makes it really
good for large and even near-singular matrices. There is
a non-pivoting version C<lu_decomp2> available which is
from 5 to 60 percent faster for typical problems at
the expense of failing to compute a result in some cases.

Now that C<lu_decomp> broadcasts, it is the recommended
LU decomposition routine. It no longer falls back to C<lu_decomp2>.

C<lu_decomp> is ported from I<Numerical Recipes> to PDL. It
should probably be implemented in C.

=cut

*PDL::lu_decomp = \&lu_decomp;

sub lu_decomp {

[...]

 # or with LAPACK
 use PDL::LinearAlgebra::Real;
 getrf($lu=$A->copy, $ipiv=null, $info=null);
 getrs($lu, 1, $x=$B->transpose->copy, $ipiv, $info=null); # again, need transpose
 $x=$x->inplace->transpose;

 # or with GSL
 use PDL::GSL::LINALG;
 LU_decomp(my $lu=$A->copy, my $p=null, my $signum=null);
 # $B and $x, first dim is because GSL treats as vector, higher dims broadcast
 # so we transpose in and back out
 LU_solve($lu, $p, $B->transpose, my $x=null);
 $x=$x->inplace->transpose;

 # proof of the pudding is in the eating:
 print $A x $x;

=for description

Given the LU decomposition of a square matrix (from L</lu_decomp>),
C<lu_backsub> does back substitution into the matrix to solve
C<a x = b> for given vector C<b>. It is separated from the
C<lu_decomp> method so that you can call the cheap C<lu_backsub>
multiple times and not have to do the expensive LU decomposition
more than once.

C<lu_backsub> acts on single vectors and broadcasts in the usual
way, which means that it treats C<$y> as the I<transpose>
of the input. If you want to process a matrix, you must
hand in the I<transpose> of the matrix, and then transpose
the output when you get it back. That is because pdls are
indexed by (col,row), and matrices are (row,column) by
convention, so a 1-D pdl corresponds to a row vector, not a
column vector.
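A pure-Perl sketch of the two substitution passes (illustrative only;
C<lu_solve> is a hypothetical helper that assumes an already-permuted
right-hand side and a packed LU with unit diagonal on L):

```perl
use strict; use warnings;

# Solve L*U*x = b given a packed LU matrix: forward-substitute
# through L (unit diagonal, so no division), then back-substitute
# through U, dividing by its diagonal.
sub lu_solve {
    my ($lu, $b) = @_;
    my $n = @$lu;
    my @x = @$b;
    # Forward pass: L y = b
    for my $i (0 .. $n - 1) {
        $x[$i] -= $lu->[$i][$_] * $x[$_] for 0 .. $i - 1;
    }
    # Backward pass: U x = y
    for my $i (reverse 0 .. $n - 1) {
        $x[$i] -= $lu->[$i][$_] * $x[$_] for $i + 1 .. $n - 1;
        $x[$i] /= $lu->[$i][$i];
    }
    return \@x;
}

# With the LU of [[4,3],[6,3]] packed as [[4,3],[1.5,-1.5]],
# solve A x = [7, 9]; the answer is x = [1, 1].
my $x = lu_solve([ [4, 3], [1.5, -1.5] ], [7, 9]);
```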

If C<$lu> is dense and you have more than a few points to
solve for, it is probably cheaper to find C<a^-1> with

[...]

    unless defined($lu);
  barf("Usage: \$x = lu_backsub(\$lu,\$perm,\$y); all must be PDLs\n")
    unless(UNIVERSAL::isa($lu,'PDL') &&
           UNIVERSAL::isa($perm,'PDL') &&
           UNIVERSAL::isa($y,'PDL'));

  my $n = $y->dim(0);
  my $n1 = $n; $n1--;

  # Make sure broadcasting dimensions are compatible.
  # There are two possible sources of broadcast dims:
  #
  #   (1) over multiple LU (i.e., $lu,$perm) instances
  #   (2) over multiple B (i.e., $y) column instances
  #
  # The full dimensions of the function call look like
  #
  #   lu_backsub( lu(m,m,X), perm(m,X), b(m,Y) )
  #
  # where X is the list of extra LU dims and Y is
  # the list of extra B dims. We have several possible

[...]

  my $m = $ludims->slice("(0)"); # this is the sig dimension
  unless ( ($ludims->slice(0) == $m) and ($ludims->slice(1) == $m) and
           ($permdims->slice(0) == $m) and ($bdims->slice(0) == $m)) {
    barf "lu_backsub: mismatched sig dimensions";
  }
  my $lunumthr = $ludims->dim(0)-2;
  my $permnumthr = $permdims->dim(0)-1;
  my $bnumthr = $bdims->dim(0)-1;
  unless ( ($lunumthr == $permnumthr) and ($ludims->slice("1:-1") == $permdims)->all ) {
    barf "lu_backsub: \$lu and \$perm broadcast dims not equal! \n";
  }

  # (2) If X == Y then default broadcasting is ok
  if ( ($bnumthr==$permnumthr) and ($bdims==$permdims)->all) {
    print STDERR "lu_backsub: have explicit broadcast dims, goto BROADCAST_OK\n"
      if $PDL::debug;
    goto BROADCAST_OK;
  }

  # (3) If X == (x,Y) then add x dummy to lu,perm
  # (4) If ndims(X) > ndims(Y) then must have #3
  # (5) If ndims(X) < ndims(Y) then foreach
  #     non-trivial leading dim in X (x0,x1,..)
  #     insert dummy (x0,x1) into lu and perm
  # This means that broadcasting occurs over all
  # leading non-trivial (not length 1) dims of
  # B unless all the broadcast dims are explicitly
  # matched to the LU dims.

 BROADCAST_OK:

  # Permute the vector and make a copy if necessary.
  my $out = $y->dummy(1,$y->dim(0))->index($perm->dummy(1));
  $out = $out->sever if !$y->is_inplace;
  print STDERR "lu_backsub: starting with \$out" . pdl($out->dims) . "\n" if $PDL::debug;

  # Make sure broadcasting over lu happens OK...
  if($out->ndims < $lu->ndims-1) {
    print STDERR "lu_backsub: adjusting dims for \$out" . pdl($out->dims) . "\n" if $PDL::debug;
    do {
      $out = $out->dummy(-1,$lu->dim($out->ndims+1));
    } while($out->ndims < $lu->ndims-1);
  }

  ## Do forward substitution into L
  my $row; my $r1;

[...]

    my $tmp; # work around for perl -d "feature"
    ($tmp = $out->index($row)) -= ($lu->slice("0:$r1,$row") *
                                   $out->slice("0:$r1")
                                  )->sumover;
  }

  ## Do backward substitution into U, and normalize by the diagonal
  my $ludiag = $lu->diagonal(0,1);
  {
    my $tmp; # work around for perl -d "feature"
    ($tmp = $out->index($n1)) /= $ludiag->index($n1)->dummy(0); # TODO: check broadcasting
  }
  for ($row=$n1; $row>0; $row--) {
    $r1 = $row-1;
    my $tmp; # work around for perl -d "feature"
    ($tmp = $out->index($r1)) -= ($lu->slice("$row:$n1,$r1") * # TODO: check broadcast dims
                                  $out->slice("$row:$n1")
                                 )->sumover;
    ($tmp = $out->index($r1)) /= $ludiag->index($r1)->dummy(0); # TODO: check broadcast dims
  }

  if ($y->is_inplace) {
    $y->setdims([$out->dims]) if !PDL::all($y->shape == $out->shape); # assgn needs same shape
    $y .= $out;
  }
  $out;
}

EOD

[...]

# unnecessary increase (admittedly small) in the amount of
# code
#
pp_def("squaretotri",
  Pars => 'a(n,n); b(m)',
  Code => '
    register int mna=0, nb=0, ns = $SIZE (n);
    if($SIZE (m) != (ns * (ns+1))/2) {
      $CROAK("Wrong sized args for squaretotri");
    }

    broadcastloop %{
      loop(m) %{
        $b() = $a(n0 => mna, n1 => nb);
        mna++; if(mna > nb) {mna = 0; nb++;}
      %}
    %}
  ',
  Doc => '

=for ref

Convert a symmetric square matrix to triangular vector storage.
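A pure-Perl sketch of the packing order implemented by the C broadcast
loop above (C<squaretotri_pp> is an illustrative helper; the counters
walk (0,0), (0,1), (1,1), (0,2), (1,2), (2,2), ... exactly as C<mna>
and C<nb> do):

```perl
use strict; use warnings;

# Pack one triangle of an n x n symmetric matrix into a flat vector of
# length n*(n+1)/2, visiting (col,row) pairs with col <= row in the
# same order as squaretotri's loop.
sub squaretotri_pp {
    my ($a) = @_;
    my @b;
    my ($col, $row) = (0, 0);
    for (1 .. @$a * (@$a + 1) / 2) {
        push @b, $a->[$row][$col];   # plain arrays here are [row][col]
        if (++$col > $row) { $col = 0; $row++; }
    }
    return \@b;
}

my $b = squaretotri_pp([ [1, 2, 3], [2, 4, 5], [3, 5, 6] ]);
# -> [1, 2, 4, 3, 5, 6]
```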

End of changes. 29 change blocks. 32 lines changed or deleted, 31 lines changed or added.