Member "perl-5.32.1/lib/unicore/README.perl" (18 Dec 2020, 8010 Bytes) of package /linux/misc/perl-5.32.1.tar.xz:

# The goal is for perl to compile and reasonably run any version of Unicode.
# Working reasonably well doesn't mean that the test suite will run without
# showing errors.  A few of the very Unicode-specific test files have been
# modified to account for different versions, but most have not.  For example,
# some tests use characters that aren't encoded in all Unicode versions; others
# have hard-coded General Category values for a code point that were correct
# at the time the test was written.  Perl itself will not compile under Unicode
# releases prior to 3.0 without a simple change to Unicode::Normalize;
# mktables contains instructions for this.

# The *.txt files were copied from

#   ftp://www.unicode.org/Public/UNIDATA

# (which always points to the latest version) with subdirectories 'extracted' and
# 'auxiliary'.  Older versions are located under Public with an appropriate name.
# They are also available via http at www.unicode.org/versions/
#

# The Unihan files were not included due to space considerations.  Also NOT
# included were any *.html files.  It is possible to add the Unihan files and
# have some properties from them automatically compiled.  By editing mktables
# (see instructions near its beginning) you can add other Unihan properties.

# The file named 'version' should exist and be a single line with the Unicode
# version, like:
#
# 5.2.0
#
# (without the initial '# ')
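
# A minimal sketch of creating and sanity-checking the 'version' file from the
# shell.  write_unicode_version and check_unicode_version are hypothetical
# helper names, and the check assumes the version always has three
# dot-separated numeric parts.

```shell
# Hypothetical helper: write the version string as the file's only line.
write_unicode_version() {
    printf '%s\n' "$1" > version
}

# Hypothetical helper: succeed only if 'version' exists, has exactly one line,
# and that line looks like N.N.N (the three-part form is an assumption).
check_unicode_version() {
    [ -f version ] || return 1
    [ "$(wc -l < version | tr -d ' ')" = 1 ] || return 1
    grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$' version
}
```

# For example, 'write_unicode_version 5.2.0 && check_unicode_version' leaves a
# one-line 'version' file containing 5.2.0 and returns success; a file that
# still starts with '# ' fails the check.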

# To be 8.3 filesystem friendly, the names of some of the input files have been
# changed from the values that are in the Unicode DB.  Not all of the test
# files are currently used, so some may be absent, in which case the
# corresponding mv commands will fail.  The .html test files are not touched.

mv PropertyValueAliases.txt PropValueAliases.txt
mv NamedSequencesProv.txt NamedSqProv.txt
mv NormalizationTest.txt NormTest.txt
mv DerivedAge.txt DAge.txt
mv DerivedCoreProperties.txt DCoreProperties.txt
mv DerivedNormalizationProps.txt DNormalizationProps.txt
mv IdentifierStatus.txt IdStatus.txt
mv IdentifierType.txt IdType.txt
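
# The plain mv commands above complain when a file is absent.  A minimal sketch
# of a quieter alternative, assuming a POSIX shell: the hypothetical helper
# rename_if_present renames only when the source file exists, and refuses to
# overwrite a target that is already there.

```shell
# Hypothetical helper: rename_if_present OLD NEW
# Returns 0 when OLD is absent (nothing to do) or was renamed to NEW,
# and 1 when NEW already exists (so nothing is clobbered).
rename_if_present() {
    [ -e "$1" ] || return 0
    if [ -e "$2" ]; then
        echo "rename_if_present: $2 already exists; leaving $1 alone" >&2
        return 1
    fi
    mv "$1" "$2"
}
```

# For example, 'rename_if_present DerivedAge.txt DAge.txt' does the same rename
# as above but stays silent in a release that lacks DerivedAge.txt.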

# Some early releases don't have the extracted directory; in them these files
# are at the top level and should be moved into it.
mkdir extracted 2>/dev/null
mv DerivedBidiClass.txt DerivedBinaryProperties.txt extracted 2>/dev/null
mv DerivedCombiningClass.txt DerivedDecompositionType.txt extracted 2>/dev/null
mv DerivedEastAsianWidth.txt DerivedGeneralCategory.txt extracted 2>/dev/null
mv DerivedJoiningGroup.txt DerivedJoiningType.txt extracted 2>/dev/null
mv DerivedLineBreak.txt DerivedNumericType.txt DerivedNumericValues.txt extracted 2>/dev/null

mv extracted/DerivedBidiClass.txt extracted/DBidiClass.txt
mv extracted/DerivedBinaryProperties.txt extracted/DBinaryProperties.txt
mv extracted/DerivedCombiningClass.txt extracted/DCombiningClass.txt
mv extracted/DerivedDecompositionType.txt extracted/DDecompositionType.txt
mv extracted/DerivedEastAsianWidth.txt extracted/DEastAsianWidth.txt
mv extracted/DerivedGeneralCategory.txt extracted/DGeneralCategory.txt
mv extracted/DerivedJoiningGroup.txt extracted/DJoinGroup.txt
mv extracted/DerivedJoiningType.txt extracted/DJoinType.txt
mv extracted/DerivedLineBreak.txt extracted/DLineBreak.txt
mv extracted/DerivedNumericType.txt extracted/DNumType.txt
mv extracted/DerivedNumericValues.txt extracted/DNumValues.txt
mv extracted/DerivedName.txt extracted/DName.txt
rmdir extracted 2>/dev/null     # Fails if non-empty; if it is empty, this
                                # was an early release that didn't have it.

mv auxiliary/GraphemeBreakTest.txt auxiliary/GCBTest.txt
mv auxiliary/LineBreakTest.txt auxiliary/LBTest.txt
mv auxiliary/SentenceBreakTest.txt auxiliary/SBTest.txt
mv auxiliary/WordBreakTest.txt auxiliary/WBTest.txt

# If you have the Unihan database (5.2 and above), you should also do the
# following:

mv Unihan_DictionaryIndices.txt UnihanIndicesDictionary.txt
mv Unihan_DictionaryLikeData.txt UnihanDataDictionaryLike.txt
mv Unihan_IRGSources.txt UnihanIRGSources.txt
mv Unihan_NumericValues.txt UnihanNumericValues.txt
mv Unihan_OtherMappings.txt UnihanOtherMappings.txt
mv Unihan_RadicalStrokeCounts.txt UnihanRadicalStrokeCounts.txt
mv Unihan_Readings.txt UnihanReadings.txt
mv Unihan_Variants.txt UnihanVariants.txt
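
# Most of the Unihan renames above just drop the underscore after "Unihan";
# only the two Dictionary* files also reorder words.  A sketch of the same
# mapping as shell functions, assuming the list above is complete.
# unihan_short_name and rename_unihan_files are hypothetical names.

```shell
# Hypothetical helper: print the renamed form of a Unihan_*.txt file name.
# The two Dictionary* files are special-cased; any other Unihan_X.txt becomes
# UnihanX.txt; names that aren't Unihan files are printed unchanged.
unihan_short_name() {
    case $1 in
        Unihan_DictionaryIndices.txt)  echo UnihanIndicesDictionary.txt ;;
        Unihan_DictionaryLikeData.txt) echo UnihanDataDictionaryLike.txt ;;
        Unihan_*.txt)                  echo "Unihan${1#Unihan_}" ;;
        *)                             echo "$1" ;;
    esac
}

# Hypothetical driver: rename every Unihan_*.txt present in this directory.
rename_unihan_files() {
    for f in Unihan_*.txt; do
        [ -e "$f" ] || continue   # the glob matched nothing
        mv "$f" "$(unihan_short_name "$f")"
    done
}
```

# Spelling the list out as explicit mv commands, as above, has the advantage
# of being obvious at a glance; the function form only helps if more Unihan
# files appear.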

# If you download everything, note that the commands above do not rename files
# that mktables doesn't use, so those may not work correctly as-is on 8.3
# filesystems.

# mktables is used to generate the tables used by the rest of Perl.  It will
# warn you about any *.txt and *.html files in the directory substructure that
# it doesn't know about.  You should remove any files so identified, or edit
# mktables to add them to its lists to process.  You can run
#
#    mktables -globlist
#
# to have it try to process these tables generically.
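
# mktables performs this check itself.  As a rough, independent sketch, the
# hypothetical function list_unlisted_files prints every *.txt or *.html file
# under the current directory that is not named in a list of expected paths.
# The list file (one path per line, written as find reports it, e.g.
# ./extracted/DBidiClass.txt) is an assumption for illustration; mktables does
# not produce one.

```shell
# Hypothetical helper: list_unlisted_files EXPECTED_LIST_FILE
# Prints, in C-locale sorted order, the .txt/.html files that do not appear
# verbatim (whole-line, fixed-string match) in EXPECTED_LIST_FILE.
list_unlisted_files() {
    find . -type f \( -name '*.txt' -o -name '*.html' \) |
        grep -vxF -f "$1" |
        LC_ALL=C sort
}
```

# Keep the list file outside the tree being scanned, or it will report itself.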

# COMPILING ON OLDER UNICODE VERSIONS
#
# To compile perl for use with an older Unicode release, delete everything in
# the lib/unicore directory except mktables and Makefile.  Then download the
# Unicode-supplied files for the desired version to that directory (a URL for
# these is given earlier in this file).  Then create the 'version' file with a
# single line, like '6.1.0'.  Run 'make test' from the top-level directory.
# Some porting tests will fail because generated files need to be regenerated;
# regenerate what they tell you is needed, and run 'make test' again.  If you
# compile an old enough version, you will also have to download a few files
# from later Unicode versions, following the instructions that will be given
# if warranted.  It should compile in any release without warnings, except for
# some casing conflicts in Unicode 2.1.8; in very early releases, some
# extraneous files of the form qr/diff.*\.txt/ will also show up.  If you add
# Unihan.txt, one line is in error in
#
# Other glitches are noted in mktables under 'UNICODE VERSIONS NOTES'.
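
# A minimal sketch of the clean-up and 'version' steps, assuming a POSIX shell
# run from inside lib/unicore; prepare_unicore_for_version is a hypothetical
# name, and downloading the Unicode files is still left to you.

```shell
# Hypothetical helper: prepare_unicore_for_version VERSION
# Deletes everything in the current directory except mktables and Makefile,
# then writes VERSION as the single line of the 'version' file.
prepare_unicore_for_version() {
    [ -n "$1" ] || { echo "usage: prepare_unicore_for_version X.Y.Z" >&2; return 2; }
    # Refuse to run in a directory that doesn't look like lib/unicore.
    [ -f mktables ] || { echo "no mktables here; not in lib/unicore?" >&2; return 1; }
    # The three globs cover ordinary names and dot-files, but not . or ..
    for entry in * .[!.]* ..?*; do
        [ -e "$entry" ] || continue   # an unmatched glob stays literal
        case $entry in
            mktables|Makefile) ;;      # keep these two
            *) rm -rf -- "$entry" ;;
        esac
    done
    printf '%s\n' "$1" > version
}
```

# For example, 'prepare_unicore_for_version 6.1.0' leaves only mktables,
# Makefile and a one-line 'version' file, ready for the downloaded *.txt files.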

# FOR PUMPKINS
#
# The files are inter-related.  If you take the latest UnicodeData.txt, for
# example, but leave the older versions of other files, there can be subtle
# problems.  So get everything available from Unicode, and delete those which
# aren't needed.
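#
# One way to spot a mix of versions, sketched under the assumption that a file
# follows the usual Unicode convention of starting with a line such as
# "# DerivedAge-13.0.0.txt".  UnicodeData.txt and some other files have no
# such header and are simply skipped.  report_version_mismatches is a
# hypothetical name.

```shell
# Hypothetical helper: report_version_mismatches VERSION FILE...
# For each FILE whose first line has the form "# Name-X.Y.Z.txt", prints the
# file name if X.Y.Z differs from VERSION.  Returns 1 if any mismatch is seen.
report_version_mismatches() {
    want=$1; shift
    status=0
    for file in "$@"; do
        first=$(head -n 1 "$file")
        # Pull X.Y.Z out of a header like "# Blocks-13.0.0.txt"; empty if none.
        got=$(printf '%s\n' "$first" |
              sed -n 's/^# .*-\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)\.txt.*$/\1/p')
        [ -n "$got" ] || continue
        if [ "$got" != "$want" ]; then
            echo "$file: $got (expected $want)"
            status=1
        fi
    done
    return $status
}
```

# For example: report_version_mismatches "$(cat version)" *.txt extracted/*.txt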
#
# When moving to a new version of Unicode, you need to update 'version' by
# hand:
#
#   p4 edit version
#   ...
#
# You should look in the Unicode release notes (which are probably towards the
# bottom of http://www.unicode.org/reports/tr44/) to see if any properties have
# newly become Obsolete, Deprecated, or Stabilized.  The full names of these
# should be added to the respective lists near the beginning of mktables,
# using an 'if' so that they are added only for this Unicode version and
# later; that way mktables can continue to be used for earlier Unicode
# versions.
#
# When putting out a new Perl release, consider whether any of the Deprecated
# properties should be moved to Suppressed.
#
# perlrecharclass.pod has a list of all the characters that are white space,
# which needs to be updated if there are changes.  A quick way to check for
# changes is to see whether the number of such characters listed in
# perluniprops.pod (generated by running mktables) for the property
# \p{White_Space} is still 25.  If it is not, further investigation is needed
# to classify the new characters as horizontal or vertical.
#
# The code in regexec.c for the \X match construct is intimately tied to the
# regular expression in UAX #29 (http://www.unicode.org/reports/tr29/).  You
# should check whether it has changed, and if so, modify regexec.c
# accordingly.  The current one is:
# ( CRLF
# | Prepend* ( RI-sequence | Hangul-Syllable | !Control )
#   ( Grapheme_Extend | SpacingMark )*
# | . )
#
# mktables has many checks to warn you if there are unexpected or novel things
# that it doesn't know how to handle.
#
# Module::CoreList should be changed to include the new release.
#
# Also, regenerate l1_char_class_tab.h by running
#
# perl regen/mk_L_charclass.pl
#
# and regenerate charclass_invlists.h by running
#
# perl regen/mk_invlists.pl
#
# Finally:
#
#   p4 submit
#
# --
# jhi@iki.fi; updated by nick@ccl4.org, public@khwilliamson.com