"Fossies" - the Fresh Open Source Software Archive  

Source code changes of the file "Basic/Pod/Dataflow.pod" between
PDL-2.078.tar.gz and PDL-2.079.tar.gz

About: PDL (Perl Data Language) aims to turn Perl into an efficient numerical language for scientific computing (similar to IDL and MATLAB).

Dataflow.pod  (PDL-2.078):Dataflow.pod  (PDL-2.079)
=head1 NAME
PDL::Dataflow -- description of the dataflow philosophy
PDL::Dataflow -- description of the dataflow implementation and philosophy
=head1 SYNOPSIS
pdl> $x = zeroes(10);
pdl> $y = $x->slice("2:4:2");
pdl> $y ++;
pdl> print $x;
[0 0 1 0 1 0 0 0 0 0]
=head1 WARNING
Dataflow is very experimental. Many features of it are disabled
for 2.0, particularly families for one-directional
dataflow. If you wish to use one-directional dataflow for
something, please contact the author first and we'll work out
how to make it functional again.
Two-directional dataflow (which implements ->slice() etc.)
is fully functional, however. Just about any function which
returns some subset of the values in some ndarray will make a binding
so that
$x = some ndarray
$y = $x->slice("some parts");
$y->set(3,3,10);
also changes the corresponding element in $x. $y has become effectively
a window to some sub-elements of $x. You can also define your own routines
that do different types of subsets. If you don't want $y to be a window
to $x, you must do
$y = $x->slice("some parts")->copy;
The copying turns off all dataflow between the two ndarrays.
The difficulties with one-directional
dataflow are related to sequences like
$y = $x + 1;
$y ++;
=head1 DESCRIPTION
As of 2.079, this is now a description of the current implementation,
together with some design thoughts from its original author, Tuomas Lukka.
Two-directional dataflow (which implements C<< ->slice() >> etc.)
is fully functional, as shown in the SYNOPSIS. One-way is implemented,
but with restrictions.
=head1 TWO-WAY
Just about any function which returns some subset of the values in some
ndarray will make a binding. C<$y> has become effectively a window to
some sub-elements of C<$x>. You can also define your own routines that
do different types of subsets. If you don't want C<$y> to be a window
to C<$x>, you must do
$y = $x->slice("some parts")->sever;
The C<sever> destroys the C<slice> transform, thereby turning off all dataflow
between the two ndarrays.
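The "window" idea can be sketched in plain Perl. This is a toy model only, not how PDL implements C<slice> or C<sever>: the C<ToyWindow> class and its methods are hypothetical names invented for illustration, and it assumes a flat Perl array stands in for an ndarray.

```perl
use strict;
use warnings;

# Toy model only: a "window" onto selected elements of a parent array.
# ToyWindow and its methods are hypothetical names, not PDL API.
package ToyWindow;

sub new {
    my ($class, $parent, @idx) = @_;
    return bless { parent => $parent, idx => [@idx] }, $class;
}

# Read through to the parent while linked, else from the private copy.
sub get {
    my ($self, $i) = @_;
    return $self->{own} ? $self->{own}[$i]
                        : $self->{parent}[ $self->{idx}[$i] ];
}

# Write through to the parent while linked (the "two-way" part).
sub set {
    my ($self, $i, $val) = @_;
    if ($self->{own}) { $self->{own}[$i] = $val }
    else              { $self->{parent}[ $self->{idx}[$i] ] = $val }
    return $self;
}

# Copy the current values and drop the link to the parent.
sub sever {
    my ($self) = @_;
    $self->{own} = [ map { $self->{parent}[$_] } @{ $self->{idx} } ];
    delete @{$self}{qw(parent idx)};
    return $self;
}

package main;

my @x = (0) x 10;
my $y = ToyWindow->new(\@x, 2, 4);        # elements 2 and 4 of @x
$y->set($_, $y->get($_) + 1) for 0 .. 1;  # increment through the window
print "@x\n";                             # 0 0 1 0 1 0 0 0 0 0

$y->sever->set(0, 99);                    # severed: no longer reaches @x
print "@x\n";                             # 0 0 1 0 1 0 0 0 0 0
```

While the window is linked, writes land in the parent; after the toy C<sever>, the window keeps its own copy and the parent is untouched.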
=head2 Type conversions
This works, thanks to a two-way flowing transform that implements
type-conversions, particularly for supplied outputs of the "wrong"
type for the given transform:
pdl> $a_bad = pdl double, '[1 BAD 3]';
pdl> $b_float = zeroes float, 3;
pdl> $a_bad->assgn($b_float); # could be written as $b_float .= $a_bad
pdl> p $b_float->badflag;
1
pdl> p $b_float;
[1 BAD 3]
=head1 ONE-WAY
You need to explicitly turn on one-way dataflow on an ndarray to activate
it for non-flowing operations, so
pdl> $x = pdl 2,3,4;
pdl> $x->doflow;
pdl> $y = $x * 2;
pdl> print $y;
[4 6 8]
pdl> $x->set(0,5);
pdl> print $y;
[10 6 8]
It is not possible to turn on backwards dataflow (such as is used by
C<slice>-type operations), because there is no general way for PDL (or
maths, in fact) to know how to reverse most operations - consider
C<$z = $x * $y>, then adding one to C<$z>.
Consider the following code:
$u = sequence(3,3); $u->doflow;
$v = ones(3,3); $v->doflow;
$w = $u + $v; $w->doflow; # must turn on for each
$y = $w + 1; $y->doflow;
$x = $w->diagonal(0,1);
$x += 50;
$z = $w + 2;
What do $y and $z contain now?
pdl> p $y
[
[52 3 4]
[ 5 56 7]
[ 8 9 60]
]
pdl> p $z
[
[53 4 5]
[ 6 57 8]
[ 9 10 61]
]
What about when $u is changed and a recalculation is triggered? A problem
arises, in that PDL currently (as of 2.079) disallows (see F<pdlapi.c>),
for normal transforms, output ndarrays with flow, or output ndarrays
with any parent with dataflow. So C<$u++> throws an exception. But it
is currently possible to use C<set>, which is a sort of micro-transform
that calls (in the C API) C<PDL.set> to mutate the data, then
C<PDL.changed> to trigger flow updates:
pdl> $u->set(1,1,90)
pdl> p $y
[
[ 2 3 4]
[ 5 92 7]
[ 8 9 10]
]
You'll notice that while the setting of C<1,1> (the middle) of $u updated
$y, the changes to $y that resulted from adding 50 to the diagonal
(via $x, and two-way flow) got lost. This is one-way flow.
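Why the edit is lost can be illustrated with a toy model in plain Perl. This is only a sketch under the assumption that a one-way child is rebuilt from its parent whenever the parent changes; C<make_flow> is a hypothetical helper, not part of PDL.

```perl
use strict;
use warnings;

# Toy model only: a child value recomputed from its parent on every change.
# make_flow is a hypothetical helper, not PDL API.
sub make_flow {
    my ($parent, $op) = @_;
    my @child = map { $op->($_) } @$parent;
    my $update = sub {
        my ($i, $val) = @_;
        $parent->[$i] = $val;                 # mutate the parent...
        @child = map { $op->($_) } @$parent;  # ...then recompute the child
    };
    return (\@child, $update);
}

my @u = (1, 2, 3);
my ($y, $set_u) = make_flow(\@u, sub { $_[0] + 1 });
print "@$y\n";      # 2 3 4

$y->[1] += 50;      # edit the child directly
print "@$y\n";      # 2 53 4

$set_u->(0, 90);    # parent changes: child is rebuilt from scratch
print "@$y\n";      # 91 3 4 (the +50 is gone)
```

Because the recomputation starts from the parent's data alone, anything written directly into the child does not survive the next update.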
where there are several possible outcomes and the semantics get a little
murky.
=head1 DESCRIPTION
Dataflow is new to PDL2.0. The basic philosophy
behind dataflow is that
> $x = pdl 2,3,4;
> $y = $x * 2;
> print $y
[2 3 4]
> $x->set(0,5);
> print $y;
[10 3 4]
should work. It doesn't. It was considered that doing this
might be too confusing for novices and occasional users of the language.
Therefore, you need to explicitly turn on dataflow, so
> $x = pdl 2,3,4;
> $x->doflow();
> $y = $x * 2;
...
produces the unexpected result. The rest of this documents
explains various features and details of the dataflow implementation.
=head1 Lazy evaluation
When you calculate something like the above
> $x = pdl 2,3,4;
> $x->doflow();
> $y = $x * 2;
=head1 LAZY EVALUATION
In one-way flow context like the above, with:
pdl> $y = $x * 2;
nothing will have been calculated at this point. Even the memory for
the contents of $y has not been allocated. Only the command
> print $y
pdl> print $y
will actually cause $y to be calculated. This is important to bear
in mind when doing performance measurements and benchmarks as well
as when tracking errors.
There is an explanation for this behaviour: it may save cycles
but more importantly, imagine the following:
> $x = pdl 2,3,4;
> $y = pdl 5,6,7;
> $c = $x + $y;
...
> $x->resize(4);
> $y->resize(4);
> print $c;
pdl> $x = pdl 2,3,4; $x->doflow;
pdl> $y = pdl 5,6,7; $y->doflow;
pdl> $c = $x + $y;
pdl> $x->setdims([4]);
pdl> $y->setdims([4]);
pdl> print $c;
Now, if $c were evaluated between the two resizes, an error condition
of incompatible sizes would occur.
What happens in the current version is that resizing $x raises
a flag in $c: "PDL_PARENTDIMSCHANGED" and $y just raises the same flag
again. When $c is next evaluated, the flags are checked and it is found
that a recalculation is needed.
Of course, lazy evaluation can sometimes make debugging more painful
because errors may occur somewhere where you'd not expect them.
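The flag-then-recalculate idea can be sketched in plain Perl. This is a toy model only: the C<dirty> flag stands in for a flag like C<PDL_PARENTDIMSCHANGED>, and C<resize> and C<value_of_c> are hypothetical helpers that do not mirror PDL internals.

```perl
use strict;
use warnings;

# Toy model only: a lazily evaluated sum with a "parent changed" flag.
# The names here are hypothetical and do not mirror PDL internals.
my @x = (2, 3, 4);
my @y = (5, 6, 7);

my %c = (dirty => 1, value => undef, evals => 0);

# Changing a parent only raises the flag; nothing is recomputed yet.
sub resize {
    my ($arr, $n) = @_;
    push @$arr, 0 while @$arr < $n;   # grow with zeros
    $#$arr = $n - 1;                  # or shrink
    $c{dirty} = 1;
}

# Evaluation happens on demand and checks sizes only at that point.
sub value_of_c {
    if ($c{dirty}) {
        die "incompatible sizes\n" unless @x == @y;
        $c{value} = [ map { $x[$_] + $y[$_] } 0 .. $#x ];
        $c{dirty} = 0;
        $c{evals}++;
    }
    return $c{value};
}

resize(\@x, 4);                     # sizes differ here, but nothing evaluates
resize(\@y, 4);                     # raises the same flag again
print "@{ value_of_c() }\n";        # 7 9 11 0
print "evaluations: $c{evals}\n";   # evaluations: 1
```

Had the sum been evaluated eagerly after the first resize, the size check would have failed. Deferring it means the two resizes cost a single recalculation.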
A better stack trace for errors is in the works for PDL, probably
so that you can toggle a switch $PDL::traceevals and get a good trace
of where the error actually was.
=head1 Families
This is one of the more intricate concepts of one-directional dataflow.
Consider the following code ($x and $y are pdls that have dataflow enabled):
$w = $u + $v;
$y = $w + 1;
$x = $w->diagonal();
$x++;
$z = $w + 1;
What should $y and $z contain now? What about when $u is changed
and a recalculation is triggered.
In order to make dataflow work like you'd expect, a rather strange
concept must be introduced: families. Let us make a diagram:
       u   v
        \ /
         w
        /|
       / |
      y  x
=head1 FAMILIES
This is one of the more intricate concepts of dataflow.
In order to make dataflow work like you'd expect, a rather strange
concept must be introduced: families. Let us make a diagram of the one-way
flow example - it uses a hypergraph because the transforms (with C<+>)
are connectors between ndarrays (with C<*>):
       u*   *v
         \ /
          +(plus)
          |
   1*     *w
     \   /|\
      \ / | \
 (plus)+  |  +(diagonal)
       |  |  |
      y*  |  *x
          |
          | *1
          |/
          +(plus)
          |
         z*
This is what PDL actually has in memory after the first three lines.
When $x is changed, $w changes due to C<diagonal> being a two-way operation.
When $x is changed, we want $w to change but we don't want $y to change
because it already is on the graph. It may not be clear now why you don't
want it to change but if there were 40 lines of code between the 2nd
and 4th lines, you would. So we need to make a copy of $w and $x:
       u   v
        \ /
         w' . . . w
        /|        |\
       / |        | \
      y  x' . . . x  z
Notice that we primed the original w and x, because they do not correspond
to the objects in $w and $x any more. Also, notice the dotted lines
between the two objects: when $u is changed and this diagram is re-evaluated,
$w really does get the value of w' with the diagonal incremented.
To generalize on the above, whenever an ndarray is mutated i.e.
when its actual *value* is forcibly changed (not just the reference):
$x = $x + 1
would produce a completely different result ($w and $x would not be bound
any more whereas
$x .= $x + 1
would yield the same as $x++), a "family" consisting of all other ndarrays
joined to the mutated ndarray by a two-way transformation is created
and all those are copied.
All slices or transformations that simply select a subset of the original
pdl are two-way. Matrix inverse should be. No arithmetic
operators are.
=head1 Sources
What you were told in the previous section is not quite true:
the behaviour described is not *always* what you want. Sometimes you
would probably like to have a data "source":
$x = pdl 2,3,4; $y = pdl 5,6,7;
$c = $x + $y;
line($c);
Now, if you know that $x is going to change and that you want
its children to change with it, you can declare it into a data source
(XXX unimplemented in current version):
$x->datasource(1);
After this, $x++ or $x .= something will not create a new family
but will alter $x and cut its relation with its previous parents.
All its children will follow its current value.
So if $c in the previous section had been declared as a source,
$e and $f would remain equal.
If you want flow from $w, you opt in using C<< $w->doflow >> (as shown
in this scenario). If you didn't, then don't enable it. If you have it
but want to stop it, call C<< $ndarray->sever >>. That will destroy the
ndarray's C<trans_parent> (here, a node marked with C<+>), and as you
can visually tell, will stop changes flowing thereafter. If you want to
leave the flow operating, but get a copy of the ndarray at that point,
use C<< $ndarray->copy >> - it will have the same data at that moment,
but have no flow relationships.
=head1 EVENTS
There is the start of a mechanism to bind events onto changed data,
intended to allow this to work:
pdl> $x = pdl 2,3,4
pdl> $y = $x + 1;
pdl> $c = $y * 2;
pdl> $c->bind( sub { print "A now: $x, C now: $c\n" } )
pdl> PDL::dowhenidle();
=head1 Binding
A dataflow mechanism would not be very useful without the ability
to bind events onto changed data. Therefore, we provide such a mechanism:
> $x = pdl 2,3,4
> $y = $x + 1;
> $c = $y * 2;
> $c->bind( sub { print "A now: $x, C now: $c\n" } )
> PDL::dowhenidle();
A now: [2,3,4], C now: [6 8 10] A now: [2,3,4], C now: [6 8 10]
> $x->set(0,1); pdl> $x->set(0,1);
> $x->set(1,1); pdl> $x->set(1,1);
> PDL::dowhenidle(); pdl> PDL::dowhenidle();
A now: [1,1,4], C now: [4 4 10] A now: [1,1,4], C now: [4 4 10]
This hooks into PDL's C<magic> which resembles Perl's, but does not
currently operate.
Notice how the callbacks only get called during PDL::dowhenidle.
An easy way to interface this to Perl event loop mechanisms
(such as Tk) is being planned.
There are many kinds of uses for this feature: self-updating graphs,
for instance.
Blah blah blah XXX more explanation
=head1 Limitations
Dataflow as such is a fairly limited addition on top of Perl.
To get a more refined addition, the internals of Perl need to be
hacked a little. A true implementation would enable flow of everything,
including
=over 12
=item data
=item data size
=item datatype
=item operations
=back
At the moment we only have the first two (hey, 50% in a couple of months
is not bad ;) but even this is useful by itself. However, especially
the last one is desirable since it would add the possibility
of flowing closures from place to place and would make many things
more flexible.
To get the rest working, the internals of dataflow probably need to
be changed to be a more general framework.
There would be many kinds of uses for this feature: self-updating charts,
for instance. It is not yet fully clear whether it would be most useful
to queue up changes (useful for doing asynchronously, e.g. when idle),
or to activate things immediately.
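The difference between the two options can be sketched in plain Perl. This is a toy model only and does not describe PDL's C<bind> or C<magic> machinery: C<bind_watcher>, C<set_value> and C<do_when_idle> are hypothetical names, and a single scalar stands in for an ndarray.

```perl
use strict;
use warnings;

# Toy model only: callbacks bound to a value, run either immediately or
# queued until an explicit "idle" step. All names are hypothetical.
my @log;
my @pending;                       # callbacks waiting for the idle step
my %cell = (value => 0, watchers => []);

sub bind_watcher { push @{ $cell{watchers} }, $_[0] }

sub set_value {
    my ($val, %opt) = @_;
    $cell{value} = $val;
    for my $cb (@{ $cell{watchers} }) {
        if ($opt{immediate}) { $cb->($cell{value}) }
        elsif (!grep { $_ == $cb } @pending) { push @pending, $cb }
    }
}

# Run each queued callback once, with the latest value.
sub do_when_idle {
    $_->($cell{value}) for splice @pending;
}

bind_watcher(sub { push @log, "saw $_[0]" });

set_value(1);                      # queued
set_value(2);                      # still queued, not duplicated
do_when_idle();                    # one callback, latest value
set_value(3, immediate => 1);      # runs straight away

print "$_\n" for @log;             # prints "saw 2" then "saw 3"
```

Queueing coalesces several changes into one callback that sees only the latest value, while immediate activation reports every change as it happens.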
In the 2022 era of both GPUs and multiple cores, it is a pity that
Perl's dominant model remains single-threaded on CPU, but PDL can use
multi-cores for CPU processing (albeit controlled in a single-threaded
style) - see L<PDL::ParallelCPU>. It is planned that PDL will gain the
ability to use GPUs, and there might be a way to hook that up albeit
probably with an event loop to "subscribe" to GPU events.
=head1 TRANSFORMATIONS
PDL implements nearly everything (except for XS oddities like
C<set>) using transforms which connect ndarrays. This includes data
transformations like addition, "slicing" to access/operate on subsets,
and data-type conversions (which have two-way dataflow, see
L</Type conversions>).
This does not currently include a resizing transformation, and C<setdims>
mutates its input. This is intended to change.
Additionally, it would be nice to be able to flow data in time,
lucid-like (so you could easily define all kinds of signal processing
things).
=head1 AUTHOR
Copyright(C) 1997 Tuomas J. Lukka (lukka@fas.harvard.edu).
Redistribution in the same form is allowed provided that the copyright
notice stays intact but reprinting requires
a permission from the author.
Same terms as the rest of PDL.
 End of changes. 20 change blocks. 
203 lines changed or deleted 176 lines changed or added
