"Fossies" - the Fresh Open Source Software Archive  

Source code changes of the file "Basic/Pod/Indexing.pod" between
PDL-2.074.tar.gz and PDL-2.075.tar.gz

About: PDL (Perl Data Language) aims to turn perl into an efficient numerical language for scientific computing (similar to IDL and MatLab).

Indexing.pod (PDL-2.074) vs. Indexing.pod (PDL-2.075)
=head1 NAME

PDL::Indexing - Introduction to indexing and slicing ndarrays.

=head1 OVERVIEW

This man page should serve as a first tutorial on the indexing and
broadcasting features of I<PDL>.

Like all vectorized languages, PDL automates looping over
multi-dimensional data structures ("ndarrays") using a variant of
mathematical vector notation. The automatic looping is called
"broadcasting", similar to NumPy and Julia. PDL also automatically runs
broadcasting computation in parallel - see L<PDL::ParallelCPU>.

A lot of the flexibility and power of PDL relies on the indexing and
broadcasting features of the Perl extension. Indexing allows access to
the data of an ndarray in a very flexible way. Broadcasting provides
efficient vectorization of simple operations.

The values of an ndarray are stored compactly as typed values in a
single block of memory, not (as in a normal Perl list-of-lists) as
individual Perl scalars.

In the sections that follow many "methods" are called out -- these are
Perl operators that apply to ndarrays. From the L<perldl> (or L<pdl2>)
shell, you can find out more about each method by typing "?" followed
by the method name.
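
For example, to read about the C<sever> method used later in this
document (a hypothetical session; the exact output depends on your PDL
version):

  pdl> ? sever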

=head2 Dimension lists

skipping to change at line 73

child and parent ndarray:

=over 3

=item copy - forces an explicit copy of an ndarray

=item sever - breaks the dataflow connection between an ndarray and its parents (if any)

=back
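
A minimal sketch of the difference (using C<slice>, discussed at length
below; the child of a slice shares data with its parent until severed):

  $x = sequence(10);
  $y = $x->slice('0:4'); # child: still connected to $x
  $y->sever;             # break the dataflow connection
  $y .= 0;               # modifies $y only; $x is unchanged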

=head2 Broadcasting and Dimension Order

Most PDL operations act on the first few dimensions of their ndarray
arguments. For example, C<sumover> sums all elements along the first
dimension in the list (dimension 0). If you feed in a three-dimensional
ndarray, then the first dimension is considered the "active" dimension
and the later dimensions are "broadcast" dimensions because they are
simply looped over. There are several ways to transpose or re-order the
dimension list of an ndarray. Those techniques are very fast since they
don't touch the underlying data, only change the way that PDL accesses
the data. The main dimension ordering functions are:

=over 3

=item mv - moves a particular dimension somewhere else in the dimension list

=item xchg - exchanges two dimensions in the dimension list, leaving the rest alone

skipping to change at line 103

=item squeeze - eliminates any dimensions of size 1

=back
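
A quick illustration of these three functions (a sketch; C<shape>
returns the dimension list of an ndarray):

  $x = zeroes(2,3,4);
  print $x->xchg(0,1)->shape;           # [3 2 4]
  print $x->mv(2,0)->shape;             # [4 2 3]
  print zeroes(3,1,4)->squeeze->shape;  # [3 4]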

=head2 Physical and Dummy Dimensions

=over 5

=item *

document Perl level broadcasting

=item *

broadcastids

=item *

update and correct description of slice

=item *

new functions in slice.pd (affine, lag, splitdim)

=item *

reworking of paragraph on explicit broadcasting

=back

=head1 Indexing and broadcasting with PDL

A lot of the flexibility and power of PDL relies on the indexing and
looping features of the Perl extension. Indexing allows access to the
data of an ndarray in a very flexible way. Broadcasting provides
efficient implicit looping functionality (since the loops are
implemented as optimized C code).

ndarrays are Perl objects that represent multidimensional arrays and
operations on those. In contrast to simple Perl C<@x> style lists the
array data is compactly stored in a single block of memory thus taking
up a lot less memory and enabling use of fast C code to implement
operations (e.g. addition, etc) on ndarrays.

skipping to change at line 544

which returns the same slice but severs the connection of the slice to
its parent.

=head2 Other functions that manipulate dimensions

Having talked extensively about the L<slice|PDL::Slices/slice> function
it should be noted that this is not the only PDL indexing function.
There are additional indexing functions which are also useful
(especially in the context of broadcasting which we will talk about
later). Here is a list and some examples of how to use them.

=over 4

=item C<dummy>

inserts a dummy dimension of the size you specify (default 1) at the
chosen location. You can't wait to hear how that is achieved? Well, all
elements with index C<(X,x,Y)> (C<0E<lt>=xE<lt>size_of_dummy_dim>) just
map to the element with index C<(X,Y)> of the parent ndarray (where
C<X> and C<Y> refer to the group of indices before and after the
location where the dummy dimension was inserted.)
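
A small demonstration of that mapping (each parent element is simply
repeated along the dummy dimension):

  pdl> p sequence(3)->dummy(0,2)
  [
   [0 0]
   [1 1]
   [2 2]
  ]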

This example calculates the x coordinate of the centroid of an image
(later we will learn that we didn't actually need the dummy dimension
thanks to the magic of implicit broadcasting; but using dummy
dimensions the code would also work in a broadcast-less world; though
once you have worked with PDL broadcasting you wouldn't want to live
without them again).

  # centroid
  ($xd,$yd) = $im->dims;
  $xc = sum($im*xvals(zeroes($xd))->dummy(1,$yd))/sum($im);

Let's explain how that works in a little more detail. First, the
product:

  $xvs = xvals(zeroes($xd));

skipping to change at line 644

=item C<xchg> and C<mv>

L<xchg|PDL::Slices/xchg> exchanges or "transposes" the two specified
dimensions. A straightforward example:

  # transpose a matrix (without explicitly reshuffling data and
  # making a copy)
  $prod = $x x $x->xchg(0,1);

C<$prod> should now be pretty close to the identity matrix if C<$x> is
an orthogonal matrix. Often C<xchg> will be used in the context of
broadcasting but more about that later.

L<mv|PDL::Slices/mv> works in a similar fashion. It moves a dimension
(specified by its number in the parent) to a new position in the new
child ndarray:

  $y = $x->mv(4,0); # make the 5th dimension of $x the first in the
                    # new child $y

The difference between C<xchg> and C<mv> is that C<xchg> only changes
the position of two dimensions with each other, whereas C<mv>

skipping to change at line 680

  pdl> help vars
  PDL variables in package main::

  Name         Type   Dimension        Flow  State   Mem
  ----------------------------------------------------------------
  $seqs        Double D [8000,50]            -C      0.00Kb
  $stack       Double D [100,80,50]          P       3.05Mb

Unrealistic as it may seem, our confocal microscope software writes
data (sometimes) this way. But more often you use clump to achieve a
certain effect when using implicit or explicit broadcasting.
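
Presumably C<$seqs> above was derived from C<$stack> along these lines
(a guess based on the dimensions shown, since C<100*80 == 8000>):

  pdl> $seqs = $stack->clump(2)   # (100,80,50) -> (8000,50)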

=back

=head2 Calls to indexing functions can be chained

As you might have noticed in some of the examples above calls to the
indexing functions can be nicely chained since all of these functions
return a newly created child object. However, when doing extensive
index manipulations in a chain be sure to keep track of what you are
doing, e.g.
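(to give a made-up illustration in place of the original example, which
is elided from this excerpt)

  # (5,10,20) -> drop first plane -> (4,10,20) -> xchg -> (10,4,20)
  #           -> mv -> (20,10,4)
  $y = $x->slice('1:-1')->xchg(0,1)->mv(2,0);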

skipping to change at line 722

can produce unexpected results and the results are explicitly
B<undefined> by PDL because when PDL gets parallel computing features,
the current result may well change.

From the point of view of dataflow the introduction of
greater-size-than-one dummy dimensions is regarded as an irreversible
transformation (similar to the terminology in thermodynamics) which
precludes backward propagation of assignment to a parent (which you had
explicitly requested using the C<.=> assignment). A similar problem to
watch out for occurs in the context of broadcasting where sometimes
dummy dimensions are created implicitly during the broadcast loop (see
below).

=head2 Reasons for the parent/child (or "pointer") concept

[ this will have to wait a bit ]

  XXXXX being memory efficient
  XXXXX in the context of broadcasting
  XXXXX very flexible and powerful way of accessing portions of ndarray data
        (in much more general way than sec, etc allow)
  XXXXX efficient implementation
  XXXXX difference to section/at, etc.

=head2 How to make things physical again

[ XXXXX fill in later when everything has settled a bit more ]

  ** When needed (xsub routine interfacing C lib function)
  ** How achieved (->physical)
  ** How to test (isphysical (explain how it works currently))
  ** ->copy and ->sever

=head1 Broadcasting

In the previous paragraph on indexing we have already mentioned the
term occasionally but now it's really time to talk explicitly about
"broadcasting" with ndarrays: within the framework of PDL it could
probably be loosely defined as an implicit looping facility. It is
implicit because you don't specify anything like enclosing for-loops
but rather the loops are automatically (or 'magically') generated by
PDL based on the dimensions of the ndarrays involved. This should give
you a first idea why the index/dimension manipulating functions you
have met in the previous paragraphs are especially important and useful
in the context of broadcasting.

The other ingredient for broadcasting (apart from the ndarrays
involved) is a function that is broadcasting aware (generally, these
are L<PDL::PP> compiled functions) and that the ndarrays are
"broadcast" over. So much about the terminology and now let's try to
shed some light on what it all means.

=head2 Implicit broadcasting - a first example

There are two slightly different variants of broadcasting. We start
with what we call "implicit broadcasting". Let's pick a practical
example that involves looping of a function over many elements of an
ndarray. Suppose we have an RGB image that we want to convert to
grey-scale. The RGB image is represented by a 3-dim ndarray
C<im(3,x,y)> where the first dimension contains the three color
components of each pixel and C<x> and C<y> are width and height of the
image, respectively. Next we need to specify how to convert a
color-triple at a given pixel into a grey-value (to be a realistic
example it should represent the relative intensity with which our color
insensitive eye cells would detect that color to achieve what we would
call a natural conversion from color to grey-scale). An approximation
that works quite well is to compute the

skipping to change at line 800

  # assuming earlier: @dims = $im->dims;
  $grey = zeroes(@dims[1,2]); # make the ndarray for the resulting grey image
  $w = pdl [77,150,29] / 256; # the vector of weights
  for ($j=0;$j<$dims[2];$j++) {
     for ($i=0;$i<$dims[1];$i++) {
         # compute the pixel value
         $tmp = inner($w,$im->slice(":,($i),($j)"));
         set($grey,$i,$j,$tmp); # and set it in the grey-scale image
     }
  }

Now we write the same using broadcasting (noting that C<inner> is a
broadcasting aware function defined in the L<PDL::Primitive> package)

  $grey = inner($im,pdl([77,150,29]/256));

We have ended up with a one-liner that automatically creates the
ndarray C<$grey> with the right number and size of dimensions and
performs the loops automatically (these loops are implemented as fast C
code in the internals of PDL). Well, we still owe you an explanation
how this 'magic' is achieved.

=head2 How does the example work?

The first thing to note is that every function that is broadcasting
aware (these are without exception functions compiled from concise
descriptions by L<PDL::PP>, later just called PP-functions) expects a
defined (minimum) number of dimensions (we call them core dimensions)
from each of its ndarray arguments. The L<inner|PDL::Primitive/inner>
function expects two one-dimensional (input) parameters from which it
calculates a zero-dimensional (output) parameter. We write that
symbolically as C<inner((n),(n),[o]())> and call it C<inner>'s
I<signature>, where n represents the size of that dimension. n being
equal in the first and second parameter means that those dimensions
have to be of equal size in any call. As a different example take the
outer product which takes two 1D vectors to generate a 2D matrix,
symbolically written as C<outer((n),(m),[o](n,m))>. The C<[o]> in both
examples indicates that this (here third) argument is an output
argument. In the latter example the dimensions of first and second
argument don't have to agree but you see how they determine the size of
the two dimensions of the output ndarray.
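
Both signatures are easy to check interactively (a quick sketch):

  pdl> p inner(pdl(1,2,3), pdl(4,5,6))  # 1*4 + 2*5 + 3*6
  32
  pdl> p outer(pdl(1,2), pdl(1,2,3))->shape
  [2 3]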

Here is the point when broadcasting finally enters the game. If you
call PP-functions with ndarrays that have I<more> than the required
core dimensions the first dimensions of the ndarray arguments are used
as the core dimensions and the additional extra dimensions are
broadcast over. Let us demonstrate this first with our example above

  $grey = inner($im,$w); # w is the weight vector from above

In this case C<$w> is 1D and so supplied just the core dimension,
C<$im> is 3D, more specifically C<(3,x,y)>. The first dimension (of
size 3) is the required core dimension that matches (as required by
inner) the first (and only) dimension of C<$w>. The second dimension is
the first broadcast dimension (of size C<x>) and the third is here the
second broadcast dimension (of size C<y>). The output ndarray is
automatically created (as requested by setting C<$grey> to "null" prior
to invocation). The output dimensions are obtained by appending the
I<loop dimensions> (here C<(x,y)>) to the core output dimensions (here
0D) to yield the final dimensions of the auto-created ndarray (here
C<0D+2D=2D> to yield a 2D output of size C<(x,y)>).

So the above command calls the core functionality that computes the
inner product of two 1D vectors C<x*y> times with C<$w> and all 1D
slices of the form C<(':,(i),(j)')> of C<$im> and sets the respective
elements of the

skipping to change at line 872

   .
   $grey(x-2,y-1) = f($w,$im(:,(x-2),(y-1)))
   $grey(x-1,y-1) = f($w,$im(:,(x-1),(y-1)))

But this is done automatically by PDL without writing any explicit Perl
loops. We see that the command really creates an output ndarray with
the right dimensions and sets the elements indeed to the result of the
computation for each pixel of the input image.

When even more ndarrays and extra dimensions are involved things get a
bit more complicated. We will first give the general rules for how the
broadcast dimensions depend on the dimensions of input ndarrays,
enabling you to figure out the dimensionality of an auto-created output
ndarray (for any given set of input ndarrays and core dimensions of the
PP-function in question). The general rules will most likely appear a
bit confusing at first sight so we'll set out to illustrate the usage
with a set of further examples (which will hopefully also demonstrate
that there are indeed many practical situations where broadcasting
comes in extremely handy).

=head2 A call for coding discipline

Before we point out the other technical details of broadcasting, please
note this call for programming discipline when using broadcasting: In
order to preserve human readability, I<PLEASE> comment any nontrivial
expression in your code involving broadcasting. Most importantly, for
any subroutine, include information at the beginning about what you
expect the dimensions to represent (or ranges of dimensions).

As a warning, look at this undocumented function and try to guess what
might be going on:

  sub lookup {
    my ($im,$palette) = @_;
    my $res;
    index($palette->xchg(0,1),

skipping to change at line 912

  }

Would you agree that it might be difficult to figure out expected
dimensions, purpose of the routine, etc.? (If you want to find out what
this piece of code does, see below)

=head2 How to figure out the loop dimensions

There are a couple of rules that allow you to figure out number and
size of loop dimensions (and whether the sizes of your input ndarrays
comply with the broadcasting rules). Dimensions of any ndarray argument
are broken down into two groups in the following: Core dimensions (as
defined by the PP-function, see B<Appendix B> for a list of PDL
primitives) and extra dimensions, which comprise all remaining
dimensions of that ndarray. For example calling a function C<func> with
the signature C<func((n,m),[o](n))> with an ndarray C<$x(2,4,7,1,3)> as
C<f($x,($o = null))> results in the semantic splitting of x's
dimensions into: core dimensions C<(2,4)> and extra dimensions
C<(7,1,3)>.

=over 6

skipping to change at line 948

loop dimension is given by the maximal size found in any of the
ndarrays having this extra dimension.

=item R3

For all ndarrays that have a given extra dimension the size must be
equal to the size of the loop dimension (as determined by the previous
rule) or 1; otherwise you raise a runtime exception. If the size of the
extra dimension in an ndarray is one it is implicitly treated as a
dummy dimension of size equal to that loop dim size when performing the
broadcast loop.

=item R4

If an ndarray doesn't have a loop dimension, in the broadcast loop this
ndarray is treated as if having a dummy dimension of size equal to the
size of that loop dimension.

=item R5

If output auto-creation is used (by setting the relevant ndarray to
C<PDL-E<gt>null> before invocation) the number of dimensions of the
created ndarray is equal to the sum of the number of core output
dimensions + number of loop dimensions. The size of the core output
dimensions is derived from the relevant dimension of input ndarrays (as
specified in the

skipping to change at line 997

sizes C<(10,11,12)> (by R2); the two output core dimensions are
C<(5,2)> (from the signature of func) resulting in a 5-dimensional
output ndarray C<$c> of size C<(5,2,10,11,12)> (see R5) and (the
automatically created) C<$d> is derived from C<($x,$y,$z)> in a way
that can be expressed in pdl pseudo-code as

  $d(:,:,i,j,k) .= func($x(:,:,i,j),$y(:,:,:,i,0,k),$z(:,0,j,k))

with 0<=i<10, 0<=j<11, 0<=k<12

If we analyze the color to grey-scale conversion again with these rules
in mind we note another great advantage of implicit broadcasting. We
can call the conversion with an ndarray representing a pixel
(C<im(3)>), a line of rgb pixels (C<im(3,x)>), a proper color image
(C<im(3,x,y)>) or a whole stack of RGB images (C<im(3,x,y,z)>). As long
as C<$im> is of the form C<(3,...)> the automatically created output
ndarray will contain the right number of dimensions and contain the
intensity data as we expect it since the loops have been implicitly
performed thanks to I<implicit broadcasting>. You can easily convince
yourself that calling with a color pixel C<$grey> is 0D, with a line it
turns out 1D C<grey(x)>, with an image we get C<grey(x,y)> and finally
we get a converted image stack C<grey(x,y,z)>.
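
You can verify this dimension bookkeeping directly (a sketch, with
C<zeroes> standing in for real image data and C<$w> the weight vector
from above):

  pdl> $w = pdl([77,150,29])/256
  pdl> p inner(zeroes(3,5), $w)->shape    # a line: grey(x)
  [5]
  pdl> p inner(zeroes(3,5,4), $w)->shape  # an image: grey(x,y)
  [5 4]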

Let's fill these general rules with some more life by going through a
couple of further examples. The reader may try to figure out equivalent
formulations with explicit for-looping and compare the flexibility of
those routines using implicit broadcasting to the explicit formulation.
Furthermore, especially when using several broadcast dimensions it is a
useful exercise to check the relative speed by doing some benchmark
tests (which we still have to do).

First in the row is a slightly reworked centroid example, now coded
with broadcasting in mind.

  # broadcast mult to calculate centroid coords, works for stacks as well
  $xc = sumover(($im*xvals(($im->dims)[0]))->clump(2)) /
        sumover($im->clump(2));

Let's analyze what's going on step by step. First the product:

  $prod = $im*xvals(zeroes(($im->dims)[0]))

This will actually work for C<$im> being one, two, three, and higher
dimensional. If C<$im> is one-dimensional it's just an ordinary product
(in the sense that every element of C<$im> is multiplied with the
respective element of C<xvals(...)>), if C<$im> has more dimensions
further broadcasting is done by adding appropriate dummy dimensions to
C<xvals(...)> according to R4.

More importantly, the two L<sumover|PDL::Ufunc/sumover> operations show
a first example of how to make use of the dimension manipulating
commands. A quick look at sumover's signature will remind you that it
will only "gobble up" the first dimension of a given input ndarray. But
what if we want to really compute the sum over all elements of the
first two dimensions? Well, nothing keeps us from passing a virtual
ndarray into sumover which in this case is formed by clumping the first
two dimensions of the "parent ndarray" into one. From the point of view
of the parent ndarray the sum is now computed over the first two
dimensions,

skipping to change at line 1065

L<prodover|PDL::Ufunc/prodover>, L<minimum|PDL::Ufunc/minimum> and
L<maximum|PDL::Ufunc/maximum>.

Using again images as examples we might want to calculate the maximum
pixel value for each line of an image or image stack. We know how to do
that

  # maxima of lines (as function of line number and time)
  maximum($stack,($ret=null));

But what if you want to calculate maxima per column when implicit
broadcasting always applies the core functionality to the first
dimension and broadcasts over all others? How can we arrange it so that
instead the core functionality is applied to the second dimension and
broadcasting is done over the others? Can you guess it? Yes, we make a
virtual ndarray that has the second dimension of the "parent ndarray"
as its first dimension using the C<mv> command.

  # maxima of columns (as function of column number and time)
  maximum($stack->mv(1,0),($ret=null));

and calculating all the sums of sub-slices over the third dimension is
now almost too easy,
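presumably something along the lines of (a sketch; the original example
is elided from this excerpt):

  # sums of sub-slices over the third dimension
  sumover($stack->mv(2,0),($ret=null));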

skipping to change at line 1091

Finally, if you want to apply the operation to all elements (like max
over all elements or sum over all elements) regardless of the
dimensions of the ndarray in question C<clump> comes in handy. As an
example look at a definition of C<sum> (summarised from
F<Basic/Ufunc/ufunc.pd>; here written as a standalone routine taking
the ndarray as its argument):

  sub sum {
    my ($x) = @_;
    PDL::Ufunc::sumover($x->clump(-1),(my $tmp=null));
    return $tmp; # return a 0D ndarray
  }

We have already mentioned that all basic operations support
broadcasting and assignment is no exception. So here are a couple of
broadcasted assignments

  pdl> $im = zeroes(byte, 10,20)
  pdl> $line = exp(-rvals(10)**2/9)
  # broadcasted assignment
  pdl> $im .= $line # set every line of $im to $line
  pdl> $im2 .= 5 # set every element of $im2 to 5
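
To set every I<column> instead, you can combine a broadcasted
assignment with the dimension commands from above (a sketch; C<$col> is
a hypothetical 20-element vector matching C<$im>'s second dimension):

  pdl> $col = sequence(20)
  pdl> $im .= $col->dummy(0) # the size-1 dummy dim broadcasts over dim 0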

By now you probably see how it works and what it does, don't you?

To finish the examples in this paragraph here is a function to create
an RGB image from what is called a palette image. The palette image
consists of two parts: an image of indices into a color lookup table
and the color lookup table itself. [ describe how it works ] We are
going to use a PP-function we haven't encountered yet in the previous
examples. It is the aptly named L<index|PDL::Slices/index> function,
signature C<((n),(),[o]())> (see B<Appendix B>) with the core
functionality that C<index(pdl (0,2,4,5),2,($ret=null))> will return
the element with index 2 of the first input ndarray. In this case,
C<$ret> will contain the value 4.

So here is the example:

  # a broadcasted index lookup to generate an RGB, or RGBA or CMYK image
  # from a palette image (represented by a lookup table $palette and
  # a color-index image $im)
  # you can say just dummy(0) since the rules of broadcasting make it fit
  pdl> index($palette->xchg(0,1),
             $im->long->dummy(0,($palette->dims)[0]),
             ($res=null));

Let's go through it and explain the steps involved. Assuming we are
dealing with an RGB lookup-table $palette is of size C<(3,x)>. First we
exchange the dimensions of the palette so that looping is done over the
first dimension of C<$palette> (of size 3, representing the r, g, and b
components). Now looking at C<$im>, we add a dummy dimension of size
equal to the number of components (in the case we are

skipping to change at line 1142

e.g. assuming a certain pixel of C<$im> had the value 4 then the lookup
should produce the triple

  [palette(0,4),palette(1,4),palette(2,4)]

for the new red, green and blue components of the output image.
Hopefully by now you have some sort of idea what the above piece of
code is supposed to do (it is often actually quite complicated to
describe in detail how a piece of broadcasting code works; just go
ahead and experiment a bit to get a better feeling for it).

If you have read the broadcasting rules carefully, then you might have
noticed that we didn't have to explicitly state the size of the dummy
dimension that we created for C<$im>; when we create it with size 1
(the default) the rules of broadcasting make it automatically fit to
the desired size (by rule R3, in our example the size would be 3
assuming a palette of size C<(3,x)>). Since situations like this do
occur often in practice this is actually why rule R3 has been
introduced (the part that makes dimensions of size 1 fit to the
broadcast loop dim size). So we can just say

  pdl> index($palette->xchg(0,1),$im->long->dummy(0),($res=null));

Again, you can convince yourself that this routine will create the
right output if called with a pixel (C<$im> is 0D), a line (C<$im> is
1D), an image (C<$im> is 2D), ..., an RGB lookup table (palette is
C<(3,x)>) and RGBA lookup table (palette is C<(4,x)>, see e.g. OpenGL).
This flexibility is achieved by the rules of broadcasting which are
made to do the right thing in most situations.

To wrap it all up once again, the general idea is as follows. If you
want to achieve looping over certain dimensions and have the I<core
functionality> applied to another specified set of dimensions you use
the dimension manipulating commands to create a (or several) I<virtual>
ndarray(s) so that from the point of view of the I<parent> ndarray(s)
you get what you want (always having the signature of the function in
question and R1-R5 in mind!). Easy, isn't it?

skipping to change at line 1185

do with the general calling conventions of PP-functions and the
automatic creation of output arguments. Basically, there are two ways
of invoking PDL routines, namely

  $result = func($x,$y);

and

  func($x,$y,$result);

If you are only using implicit broadcasting then the output variable
can be automatically created by PDL. You flag that to the PP-function
by setting the output argument to a special kind of ndarray that is
returned from a call to the function C<PDL-E<gt>null> that returns an
essentially "empty" ndarray (for those interested in details there is a
flag in the C pdl structure for this). The dimensions of the created
ndarray are determined by the rules of implicit broadcasting: the first
dimensions are the core output dimensions to which the broadcasting
dimensions are appended (which are in turn determined by the dimensions
of the input ndarrays as described above). So you can say

  func($x,$y,($result=PDL->null));

or

  $result = func($x,$y)

which are B<exactly> equivalent.

Be warned that you can I<not> use output auto-creation when using
explicit broadcasting (for reasons explained in the following section
on B<explicit broadcasting>, the second variant of broadcasting).

In "tight" loops you probably want to avoid the implicit creation of a
temporary ndarray in each step of the loop that comes along with the
"functional" style but rather say

  # create output ndarray of appropriate size only at first invocation
  $result = null;
  for (0...$n) {
       func($x,$y,$result); # in all but the first invocation $result
       func2($y);           # is defined and has the right size to
                            # take the output provided $y's dims don't change
       twiddle($result,$x); # do something from $result to $x for iteration
  }

The take-home message of this section once more: be aware of the
limitation on output creation when using B<explicit broadcasting>.

=head2 Explicit broadcasting

Having so far only talked about the first flavour of broadcasting it is
now about time to introduce the second variant. Instead of shuffling
around dimensions all the time and relying on the rules of implicit
broadcasting to get it all right you sometimes might want to specify in
a more explicit way how to perform the broadcast loop. It is probably
not too surprising that this variant of the game is called I<explicit
broadcasting>. Now, before we create the wrong impression: it is not
either I<implicit> or I<explicit>; the two flavours do mix. But more
about that later.

The two most used functions with explicit broadcasting are
L<broadcast|PDL::Core/PDL::broadcast> and
L<unbroadcast|PDL::Slices/unbroadcast>. We start with an example that
illustrates typical usage of the former:

  [ # ** this is the worst possible example to start with ]
  # but can be used to show that $mat += $line is different from
  # $mat->broadcast(0) += $line
  # explicit broadcasting to add a vector to each column of a matrix
  pdl> $mat = zeroes(4,3)
  pdl> $line = pdl (3.1416,2,-2)
  pdl> ($tmp = $mat->broadcast(0)) += $line

In this example, C<$mat-E<gt>broadcast(0)> tells PDL that you want the
second dimension of this ndarray to be broadcast over first leading to
a broadcast loop that can be expressed as

  for (j=0; j<3; j++) {
     for (i=0; i<4; i++) {
         mat(i,j) += src(j);
     }
  }
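
After this loop every row C<j> of C<$mat> holds the value C<$line(j)>;
printing the result should show something like (a sketch of the
expected output, modulo exact formatting):

  pdl> p $mat
  [
   [3.1416 3.1416 3.1416 3.1416]
   [     2      2      2      2]
   [    -2     -2     -2     -2]
  ]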

C<broadcast> takes a list of numbers as arguments which explicitly
specify which dimensions to broadcast over first. With the introduction
of explicit broadcasting the dimensions of an ndarray are conceptually
split into three different groups, the latter two of which we have
already encountered: broadcast dimensions, core dimensions and extra
dimensions.

Conceptually, it is best to think of those dimensions of an ndarray
that have been specified in a call to C<broadcast> as being taken away
from the set of normal dimensions and put on a separate stack. So
assuming we have an ndarray C<x(4,7,2,8)> saying

  $y = $x->broadcast(2,1)

creates a new virtual ndarray of dimension C<y(4,8)> (which we call the
remaining dims) that also has 2 broadcast dimensions of size C<(2,7)>.
For the purposes of this document we write that symbolically as
C<y(4,8){2,7}>. An important difference to the previous examples where
only implicit broadcasting was used is the fact that the core
dimensions are matched against the I<remaining dimensions> which are
not necessarily the first dimensions of the ndarray. We will now
specify how the presence of broadcast dimensions changes the rules
R1-R5 for broadcast loops (which apply to the special case where none
of the ndarray arguments has any broadcast dimensions).
=over 4

=item T0

Core dimensions are matched against the first n I<remaining dimensions>
of the ndarray argument (note the difference to R1). Any
further I<remaining dimensions> are I<extra dimensions> and are used
to determine the I<implicit loop dimensions>.

=item T1a

The number of I<implicit loop dimensions> is equal to the maximal
number of extra dimensions taken over the set of ndarray arguments.

=item T1b

The number of I<explicit loop dimensions> is equal to the maximal
number of broadcast dimensions taken over the set of ndarray arguments.

=item T1c

The total number of I<loop dimensions> is equal to the sum of
I<explicit loop dimensions> and I<implicit loop dimensions>. In the
broadcast loop, I<explicit loop dimensions> are broadcast over first,
followed by I<implicit loop dimensions>.

=item T2

The size of each of the I<loop dimensions> is derived from the size of
the respective dimensions of the ndarray arguments. It is given by the
maximal size found in any of the ndarrays having this broadcast dimension
(for I<explicit loop dimensions>) or extra dimension (for
I<implicit loop dimensions>).

=item T3

This rule applies to any I<explicit loop dimension> as well as any
I<implicit loop dimension>. For all ndarrays that have a given
I<broadcast/extra dimension> the size must be equal to the size of the
respective I<explicit/implicit loop dimension> or 1; otherwise a
runtime exception is raised. If the size of a I<broadcast/extra dimension>
of an ndarray is one it is implicitly treated as a dummy
dimension of size equal to that of the I<explicit/implicit loop dimension>.

=item T4

If an ndarray doesn't have a I<broadcast/extra dimension> that corresponds to
an I<explicit/implicit loop dimension>, in the broadcast loop this
ndarray is treated as if having a dummy dimension of size equal to the
size of that loop dimension.

=item T4a

All ndarrays that do have I<broadcast dimensions> must have the same number of
broadcast dimensions.

=item T5

Output auto-creation cannot be used if any of the ndarray arguments has any
I<broadcast dimensions>. Otherwise R5 applies.

=back

The same restrictions apply with regard to implicit dummy dimensions
(created by application of T4) as already mentioned in the section
on implicit broadcasting: if any of the output ndarrays has an (explicitly
or implicitly created) greater-than-one dummy dimension a runtime
exception will be raised.

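A minimal sketch of that restriction at work, using implicit broadcasting
(the ndarrays here are made up for illustration): the output below lacks
the loop dimension of size 3, so by T4 it would acquire a greater-than-one
dummy dimension, and the assignment dies with a runtime error:

  pdl> $y = zeroes(4)
  pdl> $y += sequence(4,3)  # error: $y would need a dummy dimension of size 3
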
Let us demonstrate these rules at work in a generic case.
Suppose we have a (here unspecified) PP-function with
the signature:

  func((m,n),(m),(),[o](m))

and you call it with 3 ndarrays C<$x(5,3,10,11)>, C<$y(3,5,10,1,12)> and
C<$c(10)>, plus an output ndarray C<$d(3,11,5,10,12)> (which can here
I<not> be automatically created) as

  func($x->broadcast(1,3),$y->broadcast(0,3),$c,$d->broadcast(0,1))

From the signature of C<func> and the above call the ndarrays split into
the following groups of core, extra and broadcast dimensions (written in
the form C<pdl(core dims){broadcast dims}[extra dims]>):

  x(5,10){3,11}[]  y(5){3,1}[10,12]  c(){}[10]  d(5){3,11}[10,12]

With this to help us along (it is in general helpful to write the
arguments down like this when you start playing with broadcasting and
want to keep track of what is going on) we further deduce
that the number of explicit loop dimensions is 2 (by T1b from C<$x> and C<$y>)
with sizes C<(3,11)> (by T2); that there are 2 implicit loop dimensions
(by T1a from C<$y> and C<$d>) of size C<(10,12)> (by T2); and that the
elements of C<$d> are computed from the input ndarrays in a way that can
be expressed in pdl pseudo-code as

  for (l=0;l<12;l++)
    for (k=0;k<10;k++)
      for (j=0;j<11;j++)       effect of treating it as dummy dim (index j)
        for (i=0;i<3;i++)                        |
          d(i,j,:,k,l) = func(x(:,i,:,j),y(i,:,k,0,l),c(k))

Ugh, this example was really not easy in terms of bookkeeping. It
serves mostly as an example of how to figure out what's going on when you
encounter a complicated-looking expression. But now it is really time
to show that broadcasting is useful by giving some more of our so-called
"practical" examples.

[ The following examples will need some additional explanations in the
future. For the moment please try to live with the comments in the
code fragments. ]

Example 1:

  *** inverse of matrix represented by eigvecs and eigvals
  ** given a symmetrical matrix M = A^T x diag(lambda_i) x A
  ** => inverse M^-1 = A^T x diag(1/lambda_i) x A
  ** first $tmp = diag(1/lambda_i)*A
  ** then A^T * $tmp by broadcasted inner product
  # index handling so that matrices print correctly under pdl
  $inv .= $evecs*0;  # just copy to get appropriately sized output
  $tmp .= $evecs;    # initialise, no back-propagation
  ($tmp2 = $tmp->broadcast(0)) /= $evals;  # broadcasted division
  # and now a matrix multiplication in disguise
  PDL::Primitive::inner($evecs->xchg(0,1)->broadcast(-1,1),
                        $tmp->broadcast(0,-1),
                        $inv->broadcast(0,1));

  # alternative for matrix mult using implicit broadcasting,
  # first xchg only for transpose
  PDL::Primitive::inner($evecs->xchg(0,1)->dummy(1),
                        $tmp->xchg(0,1)->dummy(2),
                        ($inv=null));

Example 2:

  # outer product by broadcasted multiplication
  # stress that we need to do it with an explicit call to my_biop1
  # when using explicit broadcasting
  $res = zeroes(($x->dims)[0], ($y->dims)[0]);
  my_biop1($x->broadcast(0,-1), $y->broadcast(-1,0), $res->broadcast(0,1), "*");

  # similar thing by implicit broadcasting with auto-created ndarray
  $res = $x->dummy(1) * $y->dummy(0);

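As an aside: the call to the (hypothetical) C<my_biop1> is only there to
make the explicit-broadcasting point. For the outer product itself PDL
already ships a primitive with the signature C<outer ((n),(m),[o](n,m))>
(see Appendix B), so the same result can be obtained directly:

  # same outer product via the built-in primitive
  $res = outer($x, $y);
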
Example 3:

  # different use of broadcast and unbroadcast to shuffle a number of
  # dimensions in one go without lots of calls to ->xchg and ->mv

  # use broadcast/unbroadcast to shuffle dimensions around
  # just try it out and compare the child ndarray with its parent
  $trans = $x->broadcast(4,1,0,3,2)->unbroadcast;

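Assuming C<$x> is 5-dimensional (so that the call above moves all of its
dimensions onto the broadcast stack), the same shuffle should also be
obtainable in one go with the C<reorder> method; which form you prefer is
largely a matter of taste:

  # equivalent one-call dimension shuffle for a 5-dimensional $x
  $trans = $x->reorder(4,1,0,3,2);
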
Example 4:

  # calculate a couple of bounding boxes
  # $bb will hold BB as [xmin,xmax],[ymin,ymax],[zmin,zmax]
  # we use again broadcast and unbroadcast to shuffle dimensions around
  pdl> $bb = zeroes(double, 2,3 );
  pdl> minimum($vertices->broadcast(0)->clump->unbroadcast(1), $bb->slice('(0),:'));
  pdl> maximum($vertices->broadcast(0)->clump->unbroadcast(1), $bb->slice('(1),:'));

Example 5:

  # calculate a self-rationed (i.e. self normalized) sequence of images
  # uses explicit broadcasting and an implicitly broadcasted division
  $stack = read_image_stack();
  # calculate the average (per pixel average) of the first $n+1 images
  $aver = zeroes(($stack->dims)[0,1]);  # make the output ndarray
  sumover($stack->slice(":,:,0:$n")->broadcast(0,1), $aver);
  $aver /= ($n+1);
  $stack /= $aver;  # normalize the stack by doing a broadcasted division

  # implicit versus explicit
  # alternatively calculate $aver with implicit broadcasting and auto-creation
  sumover($stack->slice(":,:,0:$n")->mv(2,0), ($aver=null));
  $aver /= ($n+1);

=head2 Implicit versus explicit broadcasting

In this paragraph we are going to illustrate when explicit broadcasting
is preferable to implicit broadcasting and vice versa. But then again,
this is probably not the best way of putting the case since you already
know: the two flavours do mix. So, it's more about how to get the best
of both worlds and, anyway, in the best of Perl traditions: TIMTOWTDI!

[ Sorry, this still has to be filled in in a later release; either
refer to above examples or choose some new ones ]

Finally, this may be a good place to justify all the technical detail
we have been going on about for a couple of pages: why broadcasting?
Well, code that uses broadcasting should be (considerably) faster than
code that uses explicit for-loops (or similar Perl constructs) to achieve
the same functionality. Especially on supercomputers (with vector
computing facilities/parallel processing) PDL broadcasting is
implemented in a way that takes advantage of the additional facilities
of these machines. Furthermore, it is a conceptually simple
construct (though technical details might get involved at times) and
can I<greatly> reduce the syntactical complexity of PDL code (but keep
the admonition for documentation in mind). Once you
are comfortable with the I<broadcasting> way of thinking (and coding) it
shouldn't be too difficult to understand code that somebody else
has written (provided they gave you an idea of what the expected input
dimensions are, etc.). As a general tip to
increase the performance of your code: if you have to introduce a loop
into your code try to reformulate the problem so that you can use
broadcasting to perform the loop (as with anything there are exceptions
to this rule of thumb; but the authors of this document tend to
think that these are rare cases ;).

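As a small illustration of that tip (a sketch only; C<$img> is a
hypothetical 2-D image ndarray), compare an explicit Perl loop over rows
with the broadcast formulation using C<sumover>, whose signature appears
in Appendix B:

  # explicit Perl loop (slow): sum along dimension 0, one row at a time
  my @sums;
  push @sums, $img->slice(":,($_)")->sum for 0 .. $img->dim(1)-1;
  my $rowsum = pdl(@sums);

  # broadcast formulation (fast): sumover consumes dimension 0 and the
  # remaining dimension becomes an implicit loop dimension
  my $rowsum2 = $img->sumover;
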
=head1 PDL::PP

=head2 An easy way to define functions that are aware of indexing and broadcasting (and the universe and everything)

PDL::PP is part of the PDL distribution. It is used to generate
functions that are aware of indexing and broadcasting rules from very
concise descriptions. It can be useful for you if you want to write
your own functions or if you want to interface functions from an
external library so that they support indexing and broadcasting (and
maybe dataflow as well, see L<PDL::Dataflow>). For further details
check L<PDL::PP>.

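To give a flavour of how concise such a description can be, here is a
sketch of a hypothetical fragment of a C<.pd> file defining a
broadcasting-aware sum of squares (the name C<sumsq> is made up; consult
L<PDL::PP> for the authoritative syntax):

  # hypothetical .pd fragment processed by PDL::PP
  pp_def('sumsq',
      Pars => 'a(n); [o]b();',   # same signature layout as sumover
      Code => '
          $GENERIC() sum = 0;
          loop(n) %{ sum += $a() * $a(); %}
          $b() = sum;
      ',
  );

The generated C<sumsq> would then consume one dimension of its input and
broadcast over any further dimensions, just like the primitives listed in
Appendix B.
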
=head1 Appendix A

=head2 Affine transformations - a special class of simple and powerful transformations

[ This is also something to be added in future releases. Do we already
have the general make_affine routine in PDL? It is possible that we
will reference another appropriate man page from here ]

=head1 Appendix B

=head2 signatures of standard PDL::PP compiled functions

A selection of signatures of PDL primitives to show how many
dimensions PP compiled functions gobble up (and therefore you can
figure out what will be broadcasted over). Most of those functions are
the basic ones defined in C<primitive.pd>.

  # functions in primitive.pd
  #
  sumover     ((n),[o]())
  prodover    ((n),[o]())
  axisvalues  ((n))                 inplace
  inner       ((n),(n),[o]())
  outer       ((n),(m),[o](n,m))
  innerwt     ((n),(n),(n),[o]())

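For instance, reading the C<inner> signature above: one core dimension
C<n> is consumed from each input, and any dimensions beyond it are
broadcast over. So (a sketch; the names are made up) taking the inner
product of a single vector with a bundle of five vectors gives one result
per bundle member:

  pdl> $v = sequence(3)        # v(3)   : core dimension only
  pdl> $m = sequence(3,5)      # m(3,5) : core dimension plus one extra dim
  pdl> $dots = inner($v, $m)   # extra dim broadcast over: $dots has dims (5)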