"Fossies" - the Fresh Open Source Software Archive

Member "install-tl-20200916/tlpkg/tlperl/lib/Encode/Encoding.pm" (10 Mar 2019, 9063 Bytes) of package /windows/misc/install-tl.zip:


As a special service "Fossies" has tried to format the requested source page into HTML format using (guessed) Perl source code syntax highlighting (style: standard) with prefixed line numbers and code folding option. Alternatively you can here view or download the uninterpreted source code file.

    1 package Encode::Encoding;
    2 
    3 # Base class for classes which implement encodings
    4 use strict;
    5 use warnings;
    6 our $VERSION = do { my @r = ( q$Revision: 2.8 $ =~ /\d+/g ); sprintf "%d." . "%02d" x $#r, @r };
    7 
    8 our @CARP_NOT = qw(Encode Encode::Encoder);
    9 
   10 use Carp ();
   11 use Encode ();
   12 use Encode::MIME::Name;
   13 
   14 use constant DEBUG => !!$ENV{PERL_ENCODE_DEBUG};
   15 
   16 sub Define {
   17     my $obj       = shift;
   18     my $canonical = shift;
   19     $obj = bless { Name => $canonical }, $obj unless ref $obj;
   20 
   21     # warn "$canonical => $obj\n";
   22     Encode::define_encoding( $obj, $canonical, @_ );
   23 }
   24 
   25 sub name { return shift->{'Name'} }
   26 
   27 sub mime_name {
   28     return Encode::MIME::Name::get_mime_name(shift->name);
   29 }
   30 
   31 sub renew {
   32     my $self = shift;
   33     my $clone = bless {%$self} => ref($self);
   34     $clone->{renewed}++;    # so the caller can see it
   35     DEBUG and warn $clone->{renewed};
   36     return $clone;
   37 }
   38 
   39 sub renewed { return $_[0]->{renewed} || 0 }
   40 
   41 *new_sequence = \&renew;
   42 
   43 sub needs_lines { 0 }
   44 
   45 sub perlio_ok {
   46     return eval { require PerlIO::encoding } ? 1 : 0;
   47 }
   48 
   49 # (Temporary|legacy) methods
   50 
   51 sub toUnicode   { shift->decode(@_) }
   52 sub fromUnicode { shift->encode(@_) }
   53 
   54 #
   55 # Needs to be overloaded or just croak
   56 #
   57 
   58 sub encode {
   59     my $obj = shift;
   60     my $class = ref($obj) ? ref($obj) : $obj;
   61     Carp::croak( $class . "->encode() not defined!" );
   62 }
   63 
   64 sub decode {
   65     my $obj = shift;
   66     my $class = ref($obj) ? ref($obj) : $obj;
   67     Carp::croak( $class . "->encode() not defined!" );
   68 }
   69 
   70 sub DESTROY { }
   71 
   72 1;
   73 __END__
   74 
   75 =head1 NAME
   76 
   77 Encode::Encoding - Encode Implementation Base Class
   78 
   79 =head1 SYNOPSIS
   80 
   81   package Encode::MyEncoding;
   82   use parent qw(Encode::Encoding);
   83 
   84   __PACKAGE__->Define(qw(myCanonical myAlias));
   85 
   86 =head1 DESCRIPTION
   87 
   88 As mentioned in L<Encode>, encodings are (in the current
   89 implementation at least) defined as objects. The mapping of encoding
   90 name to object is via the C<%Encode::Encoding> hash.  Though you can
   91 directly manipulate this hash, it is strongly encouraged to use this
   92 base class module and add encode() and decode() methods.
   93 
   94 =head2 Methods you should implement
   95 
   96 You are strongly encouraged to implement methods below, at least
   97 either encode() or decode().
   98 
   99 =over 4
  100 
  101 =item -E<gt>encode($string [,$check])
  102 
  103 MUST return the octet sequence representing I<$string>. 
  104 
  105 =over 2
  106 
  107 =item *
  108 
  109 If I<$check> is true, it SHOULD modify I<$string> in place to remove
  110 the converted part (i.e.  the whole string unless there is an error).
  111 If perlio_ok() is true, SHOULD becomes MUST.
  112 
  113 =item *
  114 
  115 If an error occurs, it SHOULD return the octet sequence for the
  116 fragment of string that has been converted and modify $string in-place
  117 to remove the converted part leaving it starting with the problem
  118 fragment.  If perlio_ok() is true, SHOULD becomes MUST.
  119 
  120 =item *
  121 
  122 If I<$check> is false then C<encode> MUST  make a "best effort" to
  123 convert the string - for example, by using a replacement character.
  124 
  125 =back
  126 
  127 =item -E<gt>decode($octets [,$check])
  128 
  129 MUST return the string that I<$octets> represents.
  130 
  131 =over 2
  132 
  133 =item *
  134 
  135 If I<$check> is true, it SHOULD modify I<$octets> in place to remove
  136 the converted part (i.e.  the whole sequence unless there is an
  137 error).  If perlio_ok() is true, SHOULD becomes MUST.
  138 
  139 =item *
  140 
  141 If an error occurs, it SHOULD return the fragment of string that has
  142 been converted and modify $octets in-place to remove the converted
  143 part leaving it starting with the problem fragment.  If perlio_ok() is
  144 true, SHOULD becomes MUST.
  145 
  146 =item *
  147 
  148 If I<$check> is false then C<decode> should make a "best effort" to
  149 convert the string - for example by using Unicode's "\x{FFFD}" as a
  150 replacement character.
  151 
  152 =back
  153 
  154 =back
  155 
  156 If you want your encoding to work with L<encoding> pragma, you should
  157 also implement the method below.
  158 
  159 =over 4
  160 
  161 =item -E<gt>cat_decode($destination, $octets, $offset, $terminator [,$check])
  162 
  163 MUST decode I<$octets> with I<$offset> and concatenate it to I<$destination>.
  164 Decoding will terminate when $terminator (a string) appears in output.
  165 I<$offset> will be modified to the last $octets position at end of decode.
  166 Returns true if $terminator appears output, else returns false.
  167 
  168 =back
  169 
  170 =head2 Other methods defined in Encode::Encodings
  171 
  172 You do not have to override methods shown below unless you have to.
  173 
  174 =over 4
  175 
  176 =item -E<gt>name
  177 
  178 Predefined As:
  179 
  180   sub name  { return shift->{'Name'} }
  181 
  182 MUST return the string representing the canonical name of the encoding.
  183 
  184 =item -E<gt>mime_name
  185 
  186 Predefined As:
  187 
  188   sub mime_name{
  189     return Encode::MIME::Name::get_mime_name(shift->name);
  190   }
  191 
  192 MUST return the string representing the IANA charset name of the encoding.
  193 
  194 =item -E<gt>renew
  195 
  196 Predefined As:
  197 
  198   sub renew {
  199     my $self = shift;
  200     my $clone = bless { %$self } => ref($self);
  201     $clone->{renewed}++;
  202     return $clone;
  203   }
  204 
  205 This method reconstructs the encoding object if necessary.  If you need
  206 to store the state during encoding, this is where you clone your object.
  207 
  208 PerlIO ALWAYS calls this method to make sure it has its own private
  209 encoding object.
  210 
  211 =item -E<gt>renewed
  212 
  213 Predefined As:
  214 
  215   sub renewed { $_[0]->{renewed} || 0 }
  216 
  217 Tells whether the object is renewed (and how many times).  Some
  218 modules emit C<Use of uninitialized value in null operation> warning
  219 unless the value is numeric so return 0 for false.
  220 
  221 =item -E<gt>perlio_ok()
  222 
  223 Predefined As:
  224 
  225   sub perlio_ok { 
  226     return eval { require PerlIO::encoding } ? 1 : 0;
  227   }
  228 
  229 If your encoding does not support PerlIO for some reasons, just;
  230 
  231  sub perlio_ok { 0 }
  232 
  233 =item -E<gt>needs_lines()
  234 
  235 Predefined As:
  236 
  237   sub needs_lines { 0 };
  238 
  239 If your encoding can work with PerlIO but needs line buffering, you
  240 MUST define this method so it returns true.  7bit ISO-2022 encodings
  241 are one example that needs this.  When this method is missing, false
  242 is assumed.
  243 
  244 =back
  245 
  246 =head2 Example: Encode::ROT13
  247 
  248   package Encode::ROT13;
  249   use strict;
  250   use parent qw(Encode::Encoding);
  251 
  252   __PACKAGE__->Define('rot13');
  253 
  254   sub encode($$;$){
  255       my ($obj, $str, $chk) = @_;
  256       $str =~ tr/A-Za-z/N-ZA-Mn-za-m/;
  257       $_[1] = '' if $chk; # this is what in-place edit means
  258       return $str;
  259   }
  260 
  261   # Jr pna or ynml yvxr guvf;
  262   *decode = \&encode;
  263 
  264   1;
  265 
  266 =head1 Why the heck Encode API is different?
  267 
  268 It should be noted that the I<$check> behaviour is different from the
  269 outer public API. The logic is that the "unchecked" case is useful
  270 when the encoding is part of a stream which may be reporting errors
  271 (e.g. STDERR).  In such cases, it is desirable to get everything
  272 through somehow without causing additional errors which obscure the
  273 original one. Also, the encoding is best placed to know what the
  274 correct replacement character is, so if that is the desired behaviour
  275 then letting low level code do it is the most efficient.
  276 
  277 By contrast, if I<$check> is true, the scheme above allows the
  278 encoding to do as much as it can and tell the layer above how much
  279 that was. What is lacking at present is a mechanism to report what
  280 went wrong. The most likely interface will be an additional method
  281 call to the object, or perhaps (to avoid forcing per-stream objects
  282 on otherwise stateless encodings) an additional parameter.
  283 
  284 It is also highly desirable that encoding classes inherit from
  285 C<Encode::Encoding> as a base class. This allows that class to define
  286 additional behaviour for all encoding objects.
  287 
  288   package Encode::MyEncoding;
  289   use parent qw(Encode::Encoding);
  290 
  291   __PACKAGE__->Define(qw(myCanonical myAlias));
  292 
  293 to create an object with C<< bless {Name => ...}, $class >>, and call
  294 define_encoding.  They inherit their C<name> method from
  295 C<Encode::Encoding>.
  296 
  297 =head2 Compiled Encodings
  298 
  299 For the sake of speed and efficiency, most of the encodings are now
  300 supported via a I<compiled form>: XS modules generated from UCM
  301 files.   Encode provides the enc2xs tool to achieve that.  Please see
  302 L<enc2xs> for more details.
  303 
  304 =head1 SEE ALSO
  305 
  306 L<perlmod>, L<enc2xs>
  307 
  308 =begin future
  309 
  310 =over 4
  311 
  312 =item Scheme 1
  313 
  314 The fixup routine gets passed the remaining fragment of string being
  315 processed.  It modifies it in place to remove bytes/characters it can
  316 understand and returns a string used to represent them.  For example:
  317 
  318  sub fixup {
  319    my $ch = substr($_[0],0,1,'');
  320    return sprintf("\x{%02X}",ord($ch);
  321  }
  322 
  323 This scheme is close to how the underlying C code for Encode works,
  324 but gives the fixup routine very little context.
  325 
  326 =item Scheme 2
  327 
  328 The fixup routine gets passed the original string, an index into
  329 it of the problem area, and the output string so far.  It appends
  330 what it wants to the output string and returns a new index into the
  331 original string.  For example:
  332 
  333  sub fixup {
  334    # my ($s,$i,$d) = @_;
  335    my $ch = substr($_[0],$_[1],1);
  336    $_[2] .= sprintf("\x{%02X}",ord($ch);
  337    return $_[1]+1;
  338  }
  339 
  340 This scheme gives maximal control to the fixup routine but is more
  341 complicated to code, and may require that the internals of Encode be tweaked to
  342 keep the original string intact.
  343 
  344 =item Other Schemes
  345 
  346 Hybrids of the above.
  347 
  348 Multiple return values rather than in-place modifications.
  349 
  350 Index into the string could be C<pos($str)> allowing C<s/\G...//>.
  351 
  352 =back
  353 
  354 =end future
  355 
  356 =cut