ParallelCPU.pod (PDL-2.074) | : | ParallelCPU.pod (PDL-2.075) | ||
---|---|---|---|---|

skipping to change at line 40 | skipping to change at line 40 | |||

# processing operation. | # processing operation. | |||

$actualPthreads = get_autopthread_actual(); | $actualPthreads = get_autopthread_actual(); | |||

# Or compare these to see CPU usage (first one only 1 pthread, second one 10) | # Or compare these to see CPU usage (first one only 1 pthread, second one 10) | |||

# in the PDL shell: | # in the PDL shell: | |||

$x = ones(10,1000,10000); set_autopthread_targ(1); $y = sin($x)*cos($x); p get_autopthread_actual; | $x = ones(10,1000,10000); set_autopthread_targ(1); $y = sin($x)*cos($x); p get_autopthread_actual; | |||

$x = ones(10,1000,10000); set_autopthread_targ(10); $y = sin($x)*cos($x); p get_autopthread_actual; | $x = ones(10,1000,10000); set_autopthread_targ(10); $y = sin($x)*cos($x); p get_autopthread_actual; | |||

=head1 Terminology | =head1 Terminology | |||

The use of the term I<threading> can be confusing with PDL, because it can refer | To reduce the confusion that existed in PDL before 2.075, this document uses | |||

to I<PDL threading>, | ||||

as defined in the L<PDL::Threading> docs, or to I<processor multi-threading>. | ||||

To reduce confusion with the existing PDL threading terminology, this document uses | ||||

B<pthreading> to refer to I<processor multi-threading>, which is the use of multiple processor threads | B<pthreading> to refer to I<processor multi-threading>, which is the use of multiple processor threads | |||

to split up numerical processing into parallel operations. | to split up numerical processing into parallel operations. | |||

=head1 Functions that control PDL pthreads | =head1 Functions that control PDL pthreads | |||

This is a brief listing and description of the PDL pthreading functions, see the L<PDL::Core> docs | This is a brief listing and description of the PDL pthreading functions, see the L<PDL::Core> docs | |||

for detailed information. | for detailed information. | |||

=over 5 | =over 5 | |||

skipping to change at line 92 | skipping to change at line 89 | |||

I<set_autopthread_size> functions made with the environment variable's values. | I<set_autopthread_size> functions made with the environment variable's values. | |||

For example, if the environment var B<PDL_AUTOPTHREAD_TARG> is set to 3, and B<PDL_AUTOPTHREAD_SIZE> is | For example, if the environment var B<PDL_AUTOPTHREAD_TARG> is set to 3, and B<PDL_AUTOPTHREAD_SIZE> is | |||

set to 10, then any pdl script will run as if the following lines were at the top of the file: | set to 10, then any pdl script will run as if the following lines were at the top of the file: | |||

set_autopthread_targ(3); | set_autopthread_targ(3); | |||

set_autopthread_size(10); | set_autopthread_size(10); | |||

=head1 How It Works | =head1 How It Works | |||

The auto-pthreading process works by analyzing threaded array dimensions in PDL | The auto-pthreading process works by analyzing broadcast array dimensions in PDL | |||

operations | operations (those above the operation's "signature" dimensions) | |||

and splitting up processing based on the thread dimension sizes and desired number of | and splitting up processing according to those and the desired number of | |||

pthreads (i.e. the pthread target or pthread_targ). The offsets, | pthreads (i.e. the pthread target or pthread_targ). The offsets, | |||

increments, and dimension-sizes (in case the whole dimension does | increments, and dimension-sizes (in case the whole dimension does | |||

not divide neatly by the number of pthreads) that PDL uses to step | not divide neatly by the number of pthreads) that PDL uses to step | |||

thru the data in memory are modified for each pthread so each one sees a different set of data when | thru the data in memory are modified for each pthread so each one sees a different set of data when | |||

performing processing. | performing processing. | |||

B<Example> | B<Example> | |||

$x = sequence(20,4,3); # Small 3-D Array, size 20,4,3 | $x = sequence(20,4,3); # Small 3-D Array, size 20,4,3 | |||

# Setup auto-pthreading: | # Setup auto-pthreading: | |||

set_autopthread_targ(2); # Target of 2 pthreads | set_autopthread_targ(2); # Target of 2 pthreads | |||

set_autopthread_size(0); # Zero so that the small PDLs in this example will be pthreaded | set_autopthread_size(0); # Zero so that the small PDLs in this example will be pthreaded | |||

# This will be split up into 2 pthreads | # This will be split up into 2 pthreads | |||

$c = maximum($x); | $c = maximum($x); | |||

For the above example, the I<maximum> function has a signature of C<(a(n); [o]c())>, which means that the first | For the above example, the I<maximum> function has a signature of C<(a(n); [o]c())>, which means that the first | |||

dimension of $x (size 20) is a I<Core> dimension of the I<maximum> function. The other dimensions of $x (size 4,3) | dimension of $x (size 20) is a I<Core> dimension of the I<maximum> function. The other dimensions of $x (size 4,3) | |||

are I<threaded> dimensions (i.e. will be threaded-over in the I<maximum> function). | are I<broadcast> dimensions (i.e. will be broadcasted-over in the I<maximum> function). | |||

The auto-pthreading algorithm examines the threaded dims of size (4,3) and picks the 4 dimension, | The auto-pthreading algorithm examines the broadcasted dims of size (4,3) and picks the 4 dimension, | |||

since it is evenly divisible by the autopthread_targ of 2. The processing of the maximum function is then | since it is evenly divisible by the autopthread_targ of 2. The processing of the maximum function is then | |||

split into two pthreads on the size-4 dimension, with dim indexes 0,2 processed by one pthread | split into two pthreads on the size-4 dimension, with dim indexes 0,2 processed by one pthread | |||

and dim indexes 1,3 processed by the other pthread. | and dim indexes 1,3 processed by the other pthread. | |||
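The interleaved split described above can be pictured with a small pure-Perl sketch. It assumes that index i of the chosen dimension goes to pthread i modulo the number of pthreads, which reproduces the 0,2 / 1,3 assignment in the example. C<split_indexes> is a hypothetical helper written for illustration; it is not part of PDL, which does this work internally by adjusting offsets and increments.

```perl
use strict;
use warnings;

# Hypothetical helper (not a PDL function): assign each index of the
# pthreaded dimension to a pthread in round-robin order.
sub split_indexes {
    my ($dim_size, $n_pthreads) = @_;
    my @work = map { [] } 1 .. $n_pthreads;
    push @{ $work[ $_ % $n_pthreads ] }, $_ for 0 .. $dim_size - 1;
    return @work;
}

# The size-4 dimension from the example, split over 2 pthreads.
my @work = split_indexes(4, 2);
print "pthread $_: @{ $work[$_] }\n" for 0 .. $#work;
```

For the size-4 dimension and a target of 2, this lists indexes 0 and 2 for the first pthread and 1 and 3 for the second, matching the description above.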

=head1 Limitations | =head1 Limitations | |||

=head2 Must have POSIX Threads Enabled | =head2 Must have POSIX Threads Enabled | |||

Auto-pthreading only works if your PDL installation was compiled with POSIX threads enabled. This is normally | Auto-pthreading only works if your PDL installation was compiled with POSIX threads enabled. This is normally | |||

the case if you are running on Windows, Linux, MacOS X, or other unix variants. | the case if you are running on Windows, Linux, MacOS X, or other unix variants. | |||

skipping to change at line 138 | skipping to change at line 135 | |||

Not all the libraries that PDL interfaces to are thread-safe, i.e. they aren't written to operate | Not all the libraries that PDL interfaces to are thread-safe, i.e. they aren't written to operate | |||

in a multi-threaded environment without crashing or causing side-effects. Some examples in the PDL | in a multi-threaded environment without crashing or causing side-effects. Some examples in the PDL | |||

core are the I<fft> function and the I<pnmout> functions. | core are the I<fft> function and the I<pnmout> functions. | |||

To operate properly with these types of functions, the PPCode flag B<NoPthread> has been introduced to indicate | To operate properly with these types of functions, the PPCode flag B<NoPthread> has been introduced to indicate | |||

a function as I<not> being pthread-safe. See L<PDL::PP> docs for details. | a function as I<not> being pthread-safe. See L<PDL::PP> docs for details. | |||

=head2 Size of PDL Dimensions and pthread Target | =head2 Size of PDL Dimensions and pthread Target | |||

As of PDL 2.058, the threaded dimension sizes do not need to divide | As of PDL 2.058, the broadcasted dimension sizes do not need to divide | |||

exactly by the pthread target, although if one does, it will be | exactly by the pthread target, although if one does, it will be | |||

used. | used. | |||

If no dimension is as large as the pthread target, the number of | If no dimension is as large as the pthread target, the number of | |||

pthreads will be the size of the largest threaded dimension. | pthreads will be the size of the largest broadcasted dimension. | |||

In order to minimise idle CPUs on the last iteration at the end of | In order to minimise idle CPUs on the last iteration at the end of | |||

the threaded dimension, the algorithm that picks the dimension to | the broadcasted dimension, the algorithm that picks the dimension to | |||

pthread on aims for the largest remainder in dividing the pthread | pthread on aims for the largest remainder in dividing the pthread | |||

target into the sizes of the threaded dimensions. For example, if | target into the sizes of the broadcasted dimensions. For example, if | |||

a PDL has threaded dimension sizes of (9,6,2) and the I<autopthread_targ> | a PDL has broadcasted dimension sizes of (9,6,2) and the I<autopthread_targ> | |||

is 4, the algorithm will pick the 1-th (size 6), as that will leave | is 4, the algorithm will pick the 1-th (size 6), as that will leave | |||

a remainder of 2 (leaving 2 idle at the end) in preference to one | a remainder of 2 (leaving 2 idle at the end) in preference to one | |||

with size 9, which would leave 3 idle. | with size 9, which would leave 3 idle. | |||
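The selection rules in this section can be summarised in a pure-Perl sketch. It is only an illustration of the behaviour described above, not the code PDL uses; C<pick_pthread_dim> is a hypothetical helper name, and the sketch assumes a pthread target of at least 1.

```perl
use strict;
use warnings;

# Hypothetical helper (not a PDL function): returns (dim index, pthread count)
# following the rules described in the text above.
sub pick_pthread_dim {
    my ($target, @sizes) = @_;
    return unless @sizes && $target > 0;

    # Only dimensions at least as large as the target can feed every pthread.
    my @big = grep { $sizes[$_] >= $target } 0 .. $#sizes;

    if (!@big) {
        # No dimension is large enough: use the largest one, with as many
        # pthreads as it has elements.
        my ($largest) = sort { $sizes[$b] <=> $sizes[$a] } 0 .. $#sizes;
        return ($largest, $sizes[$largest]);
    }

    # A dimension that divides exactly by the target is used if there is one.
    for my $i (@big) {
        return ($i, $target) if $sizes[$i] % $target == 0;
    }

    # Otherwise prefer the largest remainder (fewest idle pthreads at the end).
    my ($best) = sort { $sizes[$b] % $target <=> $sizes[$a] % $target } @big;
    return ($best, $target);
}

# The example from the text: sizes (9,6,2) with a target of 4.
printf "dim %d, %d pthreads\n", pick_pthread_dim(4, 9, 6, 2);
```

With sizes (9,6,2) and a target of 4 the sketch selects dimension 1 (size 6), and with sizes (4,3) and a target of 2 it selects dimension 0, in line with the two examples in this document.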

=head2 Speed improvement might be less than you expect. | =head2 Speed improvement might be less than you expect. | |||

If you have an 8-core machine and call I<set_autopthread_targ> with 8 | If you have an 8-core machine and call I<set_autopthread_targ> with 8 | |||

to generate 8 parallel pthreads, you | to generate 8 parallel pthreads, you | |||

probably won't get an 8X improvement in speed, due to memory bandwidth issues. Even though you have 8 separate | probably won't get an 8X improvement in speed, due to memory bandwidth issues. Even though you have 8 separate | |||

CPUs crunching away on data, you will have (for most common machine architectures) common RAM that now becomes | CPUs crunching away on data, you will have (for most common machine architectures) common RAM that now becomes | |||

End of changes. 8 change blocks. | ||||

17 lines changed or deleted | | 11 lines changed or added |