"Fossies" - the Fresh Open Source Software Archive

Member "emacs-25.3/doc/lispref/strings.texi" (14 Apr 2017, 49679 Bytes) of package /linux/misc/emacs-25.3.tar.xz:



1 Strings and Characters

A string in Emacs Lisp is an array that contains an ordered sequence of characters. Strings are used as names of symbols, buffers, and files; to send messages to users; to hold text being copied between buffers; and for many other purposes. Because strings are so important, Emacs Lisp has many functions expressly for manipulating them. Emacs Lisp programs use strings more often than individual characters.

See section Strings of Events, for special considerations for strings of keyboard character events.



1.1 String and Character Basics

A character is a Lisp object which represents a single character of text. In Emacs Lisp, characters are simply integers; whether an integer is a character or not is determined only by how it is used. See section Character Codes, for details about character representation in Emacs.

A string is a fixed sequence of characters. It is a type of sequence called an array, meaning that its length is fixed and cannot be altered once it is created (see section Sequences Arrays Vectors). Unlike in C, Emacs Lisp strings are not terminated by a distinguished character code.

Since strings are arrays, and therefore sequences as well, you can operate on them with the general array and sequence functions documented in section Sequences Arrays Vectors. For example, you can access or change individual characters in a string using the functions aref and aset (see section Array Functions). However, note that length should not be used for computing the width of a string on display; use string-width (see section Size of Displayed Text) instead.
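
Here is a minimal sketch of array-style access on a string. The variable name s is only illustrative, and the string is copied first so that the literal itself is not modified.

```elisp
;; Work on a fresh copy rather than on the string literal.
(let ((s (copy-sequence "hello")))
  (aset s 0 ?j)                    ; store the character j at index 0
  (list (aref s 0) s (length s)))  ; character code, contents, length
;; ⇒ (106 "jello" 5)

;; length counts characters, not display columns.
(length "日本語")
;; ⇒ 3
;; string-width reports columns instead; for wide characters such as
;; these it is normally larger than the character count.
(string-width "日本語")
```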

There are two text representations for non-ASCII characters in Emacs strings (and in buffers): unibyte and multibyte. For most Lisp programming, you don’t need to be concerned with these two representations. See section Text Representations, for details.

Sometimes key sequences are represented as unibyte strings. When a unibyte string is a key sequence, string elements in the range 128 to 255 represent meta characters (which are large integers) rather than character codes in the range 128 to 255. Strings cannot hold characters that have the hyper, super or alt modifiers; they can hold ASCII control characters, but no other control characters. They do not distinguish case in ASCII control characters. If you want to store such characters in a sequence, such as a key sequence, you must use a vector instead of a string. See section Character Type, for more information about keyboard input characters.

Strings are useful for holding regular expressions. You can also match regular expressions against strings with string-match (see section Regexp Search). The functions match-string (see section Simple Match Data) and replace-match (see section Replacing Match) are useful for decomposing and modifying strings after matching regular expressions against them.
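
As an illustration, the following sketch matches a regular expression against a string and then takes the match apart. The sample strings are made up; note that the string must be passed again to match-string and replace-match when the match was made against a string rather than a buffer.

```elisp
;; Extract the two numeric groups of the first "N.M" found in the string.
(let ((s "emacs-25.3.tar.xz"))
  (when (string-match "\\([0-9]+\\)\\.\\([0-9]+\\)" s)
    (list (match-string 1 s) (match-string 2 s))))
;; ⇒ ("25" "3")

;; Build a new string with the matched text replaced literally.
(let ((s "foo bar"))
  (when (string-match "bar" s)
    (replace-match "baz" t t s)))
;; ⇒ "foo baz"
```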

Like a buffer, a string can contain text properties for the characters in it, as well as the characters themselves. See section Text Properties. All the Lisp primitives that copy text from strings to buffers or other strings also copy the properties of the characters being copied.

See section Text, for information about functions that display strings or copy them into buffers. See section Character Type, and section String Type, for information about the syntax of characters and strings. See section Non-ASCII Characters, for functions to convert between text representations and to encode and decode character codes.



1.2 Predicates for Strings

For more information about general sequence and array predicates, see section Sequences Arrays Vectors, and section Arrays.

Function: stringp object

This function returns t if object is a string, nil otherwise.

Function: string-or-null-p object

This function returns t if object is a string or nil. It returns nil otherwise.

Function: char-or-string-p object

This function returns t if object is a string or a character (i.e., an integer), nil otherwise.
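
The three predicates can be compared side by side. This is only a sketch with arbitrary sample arguments:

```elisp
(stringp "abc")            ; ⇒ t
(stringp ?a)               ; ⇒ nil  (a character is an integer, not a string)
(stringp nil)              ; ⇒ nil
(string-or-null-p nil)     ; ⇒ t
(string-or-null-p 'abc)    ; ⇒ nil  (a symbol is neither a string nor nil)
(char-or-string-p ?a)      ; ⇒ t
(char-or-string-p "abc")   ; ⇒ t
(char-or-string-p 'abc)    ; ⇒ nil
```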



1.3 Creating Strings

The following functions create strings, either from scratch, or by putting strings together, or by taking them apart.

Function: make-string count character

This function returns a string made up of count repetitions of character. If count is negative, an error is signaled.

(make-string 5 ?x)
     ⇒ "xxxxx"
(make-string 0 ?x)
     ⇒ ""

Other functions to compare with this one include make-vector (see section Vectors) and make-list (see section Building Lists).

Function: string &rest characters

This function returns a string made up of the characters passed as characters.

(string ?a ?b ?c)
     ⇒ "abc"

Function: substring string &optional start end

This function returns a new string which consists of those characters from string in the range from (and including) the character at the index start up to (but excluding) the character at the index end. The first character is at index zero. With one argument, this function just copies string.

(substring "abcdefg" 0 3)
     ⇒ "abc"

In the above example, the index for ‘a’ is 0, the index for ‘b’ is 1, and the index for ‘c’ is 2. The index 3—which is the fourth character in the string—marks the character position up to which the substring is copied. Thus, ‘abc’ is copied from the string "abcdefg".

A negative number counts from the end of the string, so that -1 signifies the index of the last character of the string. For example:

(substring "abcdefg" -3 -1)
     ⇒ "ef"

In this example, the index for ‘e’ is -3, the index for ‘f’ is -2, and the index for ‘g’ is -1. Therefore, ‘e’ and ‘f’ are included, and ‘g’ is excluded.

When nil is used for end, it stands for the length of the string. Thus,

(substring "abcdefg" -3 nil)
     ⇒ "efg"

Omitting the argument end is equivalent to specifying nil. It follows that (substring string 0) returns a copy of all of string.

(substring "abcdefg" 0)
     ⇒ "abcdefg"

But we recommend copy-sequence for this purpose (see section Sequence Functions).

If the characters copied from string have text properties, the properties are copied into the new string also. See section Text Properties.

substring also accepts a vector for the first argument. For example:

(substring [a b (c) "d"] 1 3)
     ⇒ [b (c)]

A wrong-type-argument error is signaled if start is not an integer or if end is neither an integer nor nil. An args-out-of-range error is signaled if start indicates a character following end, or if either integer is out of range for string.

Contrast this function with buffer-substring (see section Buffer Contents), which returns a string containing a portion of the text in the current buffer. The beginning of a string is at index 0, but the beginning of a buffer is at index 1.

Function: substring-no-properties string &optional start end

This works like substring but discards all text properties from the value. Also, start may be omitted or nil, which is equivalent to 0. Thus, (substring-no-properties string) returns a copy of string, with all text properties removed.
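
The difference between the two functions shows up only when the source string carries text properties. A small sketch, using an arbitrary face property for illustration:

```elisp
(let ((s (propertize "abcdef" 'face 'bold)))
  (list (text-properties-at 0 (substring s 2))                ; properties kept
        (text-properties-at 0 (substring-no-properties s 2))  ; properties dropped
        (substring-no-properties s 2)))
;; ⇒ ((face bold) nil "cdef")
```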

Function: concat &rest sequences

This function returns a new string consisting of the characters in the arguments passed to it (along with their text properties, if any). The arguments may be strings, lists of numbers, or vectors of numbers; they are not themselves changed. If concat receives no arguments, it returns an empty string.

(concat "abc" "-def")
     ⇒ "abc-def"
(concat "abc" (list 120 121) [122])
     ⇒ "abcxyz"
;; nil is an empty sequence.
(concat "abc" nil "-def")
     ⇒ "abc-def"
(concat "The " "quick brown " "fox.")
     ⇒ "The quick brown fox."
(concat)
     ⇒ ""

This function always constructs a new string that is not eq to any existing string, except when the result is the empty string (to save space, Emacs makes only one empty multibyte string).

For information about other concatenation functions, see the description of mapconcat in section Mapping Functions, vconcat in section Vector Functions, and append in section Building Lists. For concatenating individual command-line arguments into a string to be used as a shell command, see combine-and-quote-strings in section Shell Arguments.

Function: split-string string &optional separators omit-nulls trim

This function splits string into substrings based on the regular expression separators (see section Regular Expressions). Each match for separators defines a splitting point; the substrings between splitting points are made into a list, which is returned.

If omit-nulls is nil (or omitted), the result contains null strings whenever there are two consecutive matches for separators, or a match is adjacent to the beginning or end of string. If omit-nulls is t, these null strings are omitted from the result.

If separators is nil (or omitted), the default is the value of split-string-default-separators.

As a special case, when separators is nil (or omitted), null strings are always omitted from the result. Thus:

(split-string "  two words ")
     ⇒ ("two" "words")

The result is not ("" "two" "words" ""), which would rarely be useful. If you need such a result, use an explicit value for separators:

(split-string "  two words "
              split-string-default-separators)
     ⇒ ("" "two" "words" "")

More examples:

(split-string "Soup is good food" "o")
     ⇒ ("S" "up is g" "" "d f" "" "d")
(split-string "Soup is good food" "o" t)
     ⇒ ("S" "up is g" "d f" "d")
(split-string "Soup is good food" "o+")
     ⇒ ("S" "up is g" "d f" "d")

Empty matches do count, except that split-string will not look for a final empty match when it already reached the end of the string using a non-empty match or when string is empty:

(split-string "aooob" "o*")
     ⇒ ("" "a" "" "b" "")
(split-string "ooaboo" "o*")
     ⇒ ("" "" "a" "b" "")
(split-string "" "")
     ⇒ ("")

However, when separators can match the empty string, omit-nulls is usually t, so that the subtleties in the three previous examples are rarely relevant:

(split-string "Soup is good food" "o*" t)
     ⇒ ("S" "u" "p" " " "i" "s" " " "g" "d" " " "f" "d")
(split-string "Nice doggy!" "" t)
     ⇒ ("N" "i" "c" "e" " " "d" "o" "g" "g" "y" "!")
(split-string "" "" t)
     ⇒ nil

Somewhat odd, but predictable, behavior can occur for certain “non-greedy” values of separators that can prefer empty matches over non-empty matches. Again, such values rarely occur in practice:

(split-string "ooo" "o*" t)
     ⇒ nil
(split-string "ooo" "\\|o+" t)
     ⇒ ("o" "o" "o")

If the optional argument trim is non-nil, it should be a regular expression to match text to trim from the beginning and end of each substring. If trimming makes the substring empty, it is treated as null.
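
For instance, a comma-separated list with stray whitespace can be cleaned up in one call. This is a sketch; the input string and the trim expression "[ \t]+" are chosen only for illustration.

```elisp
;; Split on commas, trim blanks from each piece, and drop empty pieces
;; (including the one between the two adjacent commas).
(split-string " one , two ,, three " "," t "[ \t]+")
;; ⇒ ("one" "two" "three")
```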

If you need to split a string into a list of individual command-line arguments suitable for call-process or start-process, see split-string-and-unquote in section Shell Arguments.

Variable: split-string-default-separators

The default value of separators for split-string. Its usual value is "[ \f\t\n\r\v]+".



1.4 Modifying Strings

The most basic way to alter the contents of an existing string is with aset (see section Array Functions). (aset string idx char) stores char into string at index idx. Each character occupies one or more bytes, and if char needs a different number of bytes from the character already present at that index, aset signals an error.

A more powerful function is store-substring:

Function: store-substring string idx obj

This function alters part of the contents of the string string, by storing obj starting at index idx. The argument obj may be either a character or a (smaller) string.

Since it is impossible to change the length of an existing string, it is an error if obj doesn’t fit within string’s actual length, or if any new character requires a different number of bytes from the character currently present at that point in string.
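
A short sketch of store-substring on an ASCII string follows. The string is copied first so that the literal is left alone, and the symbol does-not-fit is just a marker invented for this example.

```elisp
(let ((s (copy-sequence "abcdef")))
  (store-substring s 2 "XY")   ; overwrite two characters starting at index 2
  (store-substring s 5 ?!)     ; a single character is accepted as well
  s)
;; ⇒ "abXYe!"

;; Storing past the end of the string signals an error.
(condition-case nil
    (store-substring (copy-sequence "abc") 2 "XYZ")
  (error 'does-not-fit))
;; ⇒ does-not-fit
```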

To clear out a string that contained a password, use clear-string:

Function: clear-string string

This makes string a unibyte string and clears its contents to zeros. It may also change string’s length.
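
As a sketch, assuming an ASCII-only password held in a hypothetical variable named password:

```elisp
(let ((password (copy-sequence "secret")))
  (clear-string password)
  ;; Every character has been overwritten with the null character.
  (string= password (make-string 6 0)))
;; ⇒ t
```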



1.5 Comparison of Characters and Strings

Function: char-equal character1 character2

This function returns t if the arguments represent the same character, nil otherwise. This function ignores differences in case if case-fold-search is non-nil.

(char-equal ?x ?x)
     ⇒ t
(let ((case-fold-search nil))
  (char-equal ?x ?X))
     ⇒ nil

Function: string= string1 string2

This function returns t if the characters of the two strings match exactly. Symbols are also allowed as arguments, in which case the symbol names are used. Case is always significant, regardless of case-fold-search.

This function is equivalent to equal for comparing two strings (see section Equality Predicates). In particular, the text properties of the two strings are ignored; use equal-including-properties if you need to distinguish between strings that differ only in their text properties. However, unlike equal, if either argument is not a string or symbol, string= signals an error.

(string= "abc" "abc")
     ⇒ t
(string= "abc" "ABC")
     ⇒ nil
(string= "ab" "ABC")
     ⇒ nil
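
The points about symbols, text properties, and argument types can be sketched as follows; the symbol signaled is only a marker used by this example.

```elisp
(string= 'abc "abc")                              ; ⇒ t  (the symbol name is used)
(string= (propertize "abc" 'face 'bold) "abc")    ; ⇒ t  (properties are ignored)
(equal-including-properties
 (propertize "abc" 'face 'bold) "abc")            ; ⇒ nil

;; equal quietly returns nil for a non-string, whereas string= signals.
(equal "1" 1)                                     ; ⇒ nil
(condition-case nil
    (string= "1" 1)
  (wrong-type-argument 'signaled))                ; ⇒ signaled
```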

For technical reasons, a unibyte and a multibyte string are equal if and only if they contain the same sequence of character codes and all these codes are either in the range 0 through 127 (ASCII) or 160 through 255 (eight-bit-graphic). However, when a unibyte string is converted to a multibyte string, all characters with codes in the range 160 through 255 are converted to characters with higher codes, whereas ASCII characters remain unchanged. Thus, a unibyte string and its conversion to multibyte are only equal if the string is all ASCII. Character codes 160 through 255 are not entirely proper in multibyte text, even though they can occur. As a consequence, the situation where a unibyte and a multibyte string are equal without both being all ASCII is a technical oddity that very few Emacs Lisp programmers ever get confronted with. See section Text Representations.

Function: string-equal string1 string2

string-equal is another name for string=.

Function: string-collate-equalp string1 string2 &optional locale ignore-case

This function returns t if string1 and string2 are equal with respect to collation rules. A collation rule is not only determined by the lexicographic order of the characters contained in string1 and string2, but also further rules about relations between these characters. Usually, it is defined by the locale environment Emacs is running with.

For example, characters with different code points but the same meaning might be considered as equal, like different grave accent Unicode characters:

(string-collate-equalp (string ?\uFF40) (string ?\u1FEF))
     ⇒ t

The optional argument locale, a string, overrides the setting of your current locale identifier for collation. The value is system dependent; a locale "en_US.UTF-8" is applicable on POSIX systems, while it would be, e.g., "enu_USA.1252" on MS-Windows systems.

If ignore-case is non-nil, characters are converted to lower-case before comparing them.

To emulate Unicode-compliant collation on MS-Windows systems, bind w32-collate-ignore-punctuation to a non-nil value, since the codeset part of the locale cannot be "UTF-8" on MS-Windows.

If your system does not support a locale environment, this function behaves like string-equal.

Do not use this function to compare file names for equality, as filesystems generally don’t honor linguistic equivalence of strings that collation implements.

Function: string< string1 string2

This function compares two strings a character at a time. It scans both the strings at the same time to find the first pair of corresponding characters that do not match. If the lesser character of these two is the character from string1, then string1 is less, and this function returns t. If the lesser character is the one from string2, then string1 is greater, and this function returns nil. If the two strings match entirely, the value is nil.

Pairs of characters are compared according to their character codes. Keep in mind that lower case letters have higher numeric values in the ASCII character set than their upper case counterparts; digits and many punctuation characters have a lower numeric value than upper case letters. An ASCII character is less than any non-ASCII character; a unibyte non-ASCII character is always less than any multibyte non-ASCII character (see section Text Representations).

(string< "abc" "abd")
     ⇒ t
(string< "abd" "abc")
     ⇒ nil
(string< "123" "abc")
     ⇒ t

When the strings have different lengths, and they match up to the length of string1, then the result is t. If they match up to the length of string2, the result is nil. A string of no characters is less than any other string.

(string< "" "abc")
     ⇒ t
(string< "ab" "abc")
     ⇒ t
(string< "abc" "")
     ⇒ nil
(string< "abc" "ab")
     ⇒ nil
(string< "" "")
     ⇒ nil

Symbols are also allowed as arguments, in which case their print names are compared.

Function: string-lessp string1 string2

string-lessp is another name for string<.

Function: string-greaterp string1 string2

This function returns the result of comparing string1 and string2 in the opposite order, i.e., it is equivalent to calling (string-lessp string2 string1).
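
For example (a sketch with arbitrary sample strings; the sort call copies its list argument with list so that no constant is modified):

```elisp
(string-greaterp "abd" "abc")   ; ⇒ t
(string-greaterp "abc" "abc")   ; ⇒ nil
(string-greaterp "abc" "ab")    ; ⇒ t

;; Sorting in descending order.
(sort (list "pear" "apple" "fig") #'string-greaterp)
;; ⇒ ("pear" "fig" "apple")
```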

Function: string-collate-lessp string1 string2 &optional locale ignore-case

This function returns t if string1 is less than string2 in collation order. A collation order is not only determined by the lexicographic order of the characters contained in string1 and string2, but also further rules about relations between these characters. Usually, it is defined by the locale environment Emacs is running with.

For example, punctuation and whitespace characters might be ignored for sorting (see section Sequence Functions):

(sort '("11" "12" "1 1" "1 2" "1.1" "1.2") 'string-collate-lessp)
     ⇒ ("11" "1 1" "1.1" "12" "1 2" "1.2")

This behavior is system-dependent; e.g., punctuation and whitespace are never ignored on Cygwin, regardless of locale.

The optional argument locale, a string, overrides the setting of your current locale identifier for collation. The value is system dependent; a locale "en_US.UTF-8" is applicable on POSIX systems, while it would be, e.g., "enu_USA.1252" on MS-Windows systems. The locale value of "POSIX" or "C" lets string-collate-lessp behave like string-lessp:

(sort '("11" "12" "1 1" "1 2" "1.1" "1.2")
      (lambda (s1 s2) (string-collate-lessp s1 s2 "POSIX")))
     ⇒ ("1 1" "1 2" "1.1" "1.2" "11" "12")

If ignore-case is non-nil, characters are converted to lower-case before comparing them.

To emulate Unicode-compliant collation on MS-Windows systems, bind w32-collate-ignore-punctuation to a non-nil value, since the codeset part of the locale cannot be "UTF-8" on MS-Windows.

If your system does not support a locale environment, this function behaves like string-lessp.

Function: string-prefix-p string1 string2 &optional ignore-case

This function returns non-nil if string1 is a prefix of string2; i.e., if string2 starts with string1. If the optional argument ignore-case is non-nil, the comparison ignores case differences.

Function: string-suffix-p suffix string &optional ignore-case

This function returns non-nil if suffix is a suffix of string; i.e., if string ends with suffix. If the optional argument ignore-case is non-nil, the comparison ignores case differences.
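
The following sketch exercises both functions, with and without ignore-case; the file name is just sample data.

```elisp
(string-prefix-p "str" "strings.texi")       ; ⇒ t
(string-prefix-p "STR" "strings.texi")       ; ⇒ nil
(string-prefix-p "STR" "strings.texi" t)     ; ⇒ t
(string-suffix-p ".texi" "strings.texi")     ; ⇒ t
(string-suffix-p ".TEXI" "strings.texi" t)   ; ⇒ t
(string-prefix-p "strings.texi" "str")       ; ⇒ nil  (the prefix is the longer string)
```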

Function: compare-strings string1 start1 end1 string2 start2 end2 &optional ignore-case

This function compares a specified part of string1 with a specified part of string2. The specified part of string1 runs from index start1 (inclusive) up to index end1 (exclusive); nil for start1 means the start of the string, while nil for end1 means the length of the string. Likewise, the specified part of string2 runs from index start2 up to index end2.

The strings are compared by the numeric values of their characters. For instance, string1 is considered less than string2 if its first differing character has a smaller numeric value. If ignore-case is non-nil, characters are converted to upper-case before comparing them. Unibyte strings are converted to multibyte for comparison (see section Text Representations), so that a unibyte string and its conversion to multibyte are always regarded as equal.

If the specified portions of the two strings match, the value is t. Otherwise, the value is an integer which indicates how many leading characters agree, and which string is less. Its absolute value is one plus the number of characters that agree at the beginning of the two strings. The sign is negative if string1 (or its specified portion) is less.
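
A few sample calls may make the return value easier to read. This is only a sketch with arbitrary strings:

```elisp
;; Only the first three characters of each string are compared.
(compare-strings "abcdef" 0 3 "abcxyz" 0 3)       ; ⇒ t

;; Two leading characters agree, so the absolute value is 3;
;; the sign tells which string is less.
(compare-strings "abcd" nil nil "abxy" nil nil)   ; ⇒ -3
(compare-strings "abxy" nil nil "abcd" nil nil)   ; ⇒ 3

;; A string that is a proper prefix of the other one is less.
(compare-strings "ab" nil nil "abc" nil nil)      ; ⇒ -3

;; With ignore-case, differences in case do not count.
(compare-strings "ABC" nil nil "abc" nil nil t)   ; ⇒ t
```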

Function: assoc-string key alist &optional case-fold

This function works like assoc, except that key must be a string or symbol, and comparison is done using compare-strings. Symbols are converted to strings before testing. If case-fold is non-nil, key and the elements of alist are converted to upper-case before comparison. Unlike assoc, this function can also match elements of the alist that are strings or symbols rather than conses. In particular, alist can be a list of strings or symbols rather than an actual alist. See section Association Lists.
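
The following sketch shows the different kinds of list elements that assoc-string accepts; the lists are made-up sample data.

```elisp
(assoc-string "foo" '(("Foo" . 1) ("foo" . 2)))      ; ⇒ ("foo" . 2)
(assoc-string "FOO" '(("Foo" . 1) ("foo" . 2)) t)    ; ⇒ ("Foo" . 1)
(assoc-string 'bar '(("bar" . 3)))                   ; ⇒ ("bar" . 3)
;; Elements that are plain strings or symbols are matched as well,
;; and the matching element itself is returned.
(assoc-string "bar" '("foo" bar))                    ; ⇒ bar
```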

See also the function compare-buffer-substrings in section Comparing Text, for a way to compare text in buffers. The function string-match, which matches a regular expression against a string, can be used for a kind of string comparison; see section Regexp Search.



1.6 Conversion of Characters and Strings

This section describes functions for converting between characters, strings and integers. format (see section Formatting Strings) and prin1-to-string (see section Output Functions) can also convert Lisp objects into strings. read-from-string (see section Input Functions) can convert a string representation of a Lisp object into an object. The functions string-to-multibyte and string-to-unibyte convert the text representation of a string (see section Converting Representations).

See section Documentation, for functions that produce textual descriptions of text characters and general input events (single-key-description and text-char-description). These are used primarily for making help messages.

Function: number-to-string number

This function returns a string consisting of the printed base-ten representation of number. The returned value starts with a minus sign if the argument is negative.

(number-to-string 256)
     ⇒ "256"
(number-to-string -23)
     ⇒ "-23"
(number-to-string -23.5)
     ⇒ "-23.5"

int-to-string is a semi-obsolete alias for this function.

See also the function format in Formatting Strings.

Function: string-to-number string &optional base

This function returns the numeric value of the characters in string. If base is non-nil, it must be an integer between 2 and 16 (inclusive), and integers are converted in that base. If base is nil, then base ten is used. Floating-point conversion only works in base ten; we have not implemented other radices for floating-point numbers, because that would be much more work and does not seem useful. If string looks like an integer but its value is too large to fit into a Lisp integer, string-to-number returns a floating-point result.

The parsing skips spaces and tabs at the beginning of string, then reads as much of string as it can interpret as a number in the given base. (On some systems it ignores other whitespace at the beginning, not just spaces and tabs.) If string cannot be interpreted as a number, this function returns 0.

(string-to-number "256")
     ⇒ 256
(string-to-number "25 is a perfect square.")
     ⇒ 25
(string-to-number "X256")
     ⇒ 0
(string-to-number "-4.5")
     ⇒ -4.5
(string-to-number "1e5")
     ⇒ 100000.0
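
The base argument and the handling of leading whitespace can be sketched in the same way; the inputs are arbitrary.

```elisp
(string-to-number "ff" 16)       ; ⇒ 255
(string-to-number "101" 2)       ; ⇒ 5
(string-to-number "-1010" 2)     ; ⇒ -10
(string-to-number "  42abc")     ; ⇒ 42  (leading spaces skipped, parsing stops at "a")
(string-to-number "abc")         ; ⇒ 0   (no number at the beginning)
```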

string-to-int is an obsolete alias for this function.

Function: char-to-string character

This function returns a new string containing one character, character. This function is semi-obsolete because the function string is more general. See section Creating Strings.

Function: string-to-char string

This function returns the first character in string. This is mostly identical to (aref string 0), except that it returns 0 if the string is empty. (The value is also 0 when the first character of string is the null character, ASCII code 0.) This function may be eliminated in the future if it does not seem useful enough to retain.
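
A brief sketch of the two single-character conversions, including the empty-string case where string-to-char and aref differ (the symbol signaled is only a marker for this example):

```elisp
(char-to-string ?a)        ; ⇒ "a"
(string ?a)                ; ⇒ "a"  (the more general function)
(string-to-char "abc")     ; ⇒ 97
(string-to-char "")        ; ⇒ 0
(condition-case nil
    (aref "" 0)
  (args-out-of-range 'signaled))   ; ⇒ signaled
```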

Here are some other functions that can convert to or from a string:

concat

This function converts a vector or a list into a string. See section Creating Strings.

vconcat

This function converts a string into a vector. See section Vector Functions.

append

This function converts a string into a list. See section Building Lists.

byte-to-string

This function converts a byte of character data into a unibyte string. See section Converting Representations.
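
Each of the four functions listed above can be tried on a small value. This is just a sketch:

```elisp
(concat '(97 98 99))    ; ⇒ "abc"        (list of characters to string)
(concat [?x ?y ?z])     ; ⇒ "xyz"        (vector of characters to string)
(vconcat "abc")         ; ⇒ [97 98 99]   (string to vector)
(append "abc" nil)      ; ⇒ (97 98 99)   (string to list)
(byte-to-string 65)     ; ⇒ "A"          (byte to unibyte string)
```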



1.7 Formatting Strings

Formatting means constructing a string by substituting computed values at various places in a constant string. This constant string controls how the other values are printed, as well as where they appear; it is called a format string.

Formatting is often useful for computing messages to be displayed. In fact, the functions message and error provide the same formatting feature described here; they differ from format-message only in how they use the result of formatting.

Function: format string &rest objects

This function returns a new string that is made by copying string and then replacing any format specification in the copy with encodings of the corresponding objects. The arguments objects are the computed values to be formatted.

The characters in string, other than the format specifications, are copied directly into the output, including their text properties, if any.

Function: format-message string &rest objects

This function acts like format, except it also converts any curved single quotes in string as per the value of text-quoting-style, and treats grave accent (`) and apostrophe (') as if they were curved single quotes.

A format that quotes with grave accents and apostrophes `like this' typically generates curved quotes ‘like this’. In contrast, a format that quotes with only apostrophes 'like this' typically generates two closing curved quotes ’like this’, an unusual style in English. See section Keys in Documentation, for how the text-quoting-style variable affects generated quotes.

A format specification is a sequence of characters beginning with a ‘%’. Thus, if there is a ‘%d’ in string, the format function replaces it with the printed representation of one of the values to be formatted (one of the arguments objects). For example:

(format "The value of fill-column is %d." fill-column)
     ⇒ "The value of fill-column is 72."

Since format interprets ‘%’ characters as format specifications, you should never pass an arbitrary string as the first argument. This is particularly true when the string is generated by some Lisp code. Unless the string is known to never include any ‘%’ characters, pass "%s", described below, as the first argument, and the string as the second, like this:

  (format "%s" arbitrary-string)

If string contains more than one format specification, the format specifications correspond to successive values from objects. Thus, the first format specification in string uses the first such value, the second format specification uses the second such value, and so on. Any extra format specifications (those for which there are no corresponding values) cause an error. Any extra values to be formatted are ignored.

Certain format specifications require values of particular types. If you supply a value that doesn’t fit the requirements, an error is signaled.
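
The pairing of specifications with values, and the error cases, can be sketched as follows. The symbols returned by the condition-case handlers are markers invented for this example.

```elisp
;; Specifications consume the values in order.
(format "%s is %d years old" "Ann" 30)
;; ⇒ "Ann is 30 years old"

;; An extra value is ignored.
(format "%d and %d" 1 2 3)
;; ⇒ "1 and 2"

;; An extra specification signals an error.
(condition-case nil
    (format "%d and %d" 1)
  (error 'not-enough-arguments))
;; ⇒ not-enough-arguments

;; So does a value of the wrong type.
(condition-case nil
    (format "%d" "not a number")
  (error 'wrong-type))
;; ⇒ wrong-type
```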

Here is a table of valid format specifications:

%s

Replace the specification with the printed representation of the object, made without quoting (that is, using princ, not prin1; see section Output Functions). Thus, strings are represented by their contents alone, with no ‘"’ characters, and symbols appear without ‘\’ characters.

If the object is a string, its text properties are copied into the output. The text properties of the ‘%s’ itself are also copied, but those of the object take priority.

%S

Replace the specification with the printed representation of the object, made with quoting (that is, using prin1; see section Output Functions). Thus, strings are enclosed in ‘"’ characters, and ‘\’ characters appear where necessary before special characters.

%o

Replace the specification with the base-eight representation of an unsigned integer.

%d

Replace the specification with the base-ten representation of a signed integer.

%x
%X

Replace the specification with the base-sixteen representation of an unsigned integer. ‘%x’ uses lower case and ‘%X’ uses upper case.

%c

Replace the specification with the character which is the value given.

%e

Replace the specification with the exponential notation for a floating-point number.

%f

Replace the specification with the decimal-point notation for a floating-point number.

%g

Replace the specification with notation for a floating-point number, using either exponential notation or decimal-point notation. The exponential notation is used if the exponent would be less than -4 or greater than or equal to the precision (default: 6). By default, trailing zeros are removed from the fractional portion of the result and a decimal-point character appears only if it is followed by a digit.

%%

Replace the specification with a single ‘%’. This format specification is unusual in that it does not use a value. For example, (format "%% %d" 30) returns "% 30".

Any other format character results in an ‘Invalid format operation’ error.
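
Here is a sketch that runs through the specifications in the table. The floating-point results are those of a typical GNU/Linux build; the exact layout of the exponent may depend on the C library of the platform.

```elisp
(format "%s and %S" "text" "text")    ; ⇒ "text and \"text\""
(format "%o %d %x %X" 8 -8 255 255)   ; ⇒ "10 -8 ff FF"
(format "%c" ?A)                      ; ⇒ "A"
(format "%e" 1234.5)                  ; ⇒ "1.234500e+03"
(format "%f" 1234.5)                  ; ⇒ "1234.500000"
(format "%g" 100000.0)                ; ⇒ "100000"  (exponent 5 is below the precision 6)
(format "%g" 1000000.0)               ; ⇒ "1e+06"   (exponent 6 reaches the precision)
(format "%d%%" 50)                    ; ⇒ "50%"
```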

Here are several examples, which assume the typical text-quoting-style settings:

(format "The octal value of %d is %o,
         and the hex value is %x." 18 18 18)
     ⇒ "The octal value of 18 is 22,
         and the hex value is 12."

(format-message
 "The name of this buffer is ‘%s’." (buffer-name))
     ⇒ "The name of this buffer is ‘strings.texi’."

(format-message
 "The buffer object prints as `%s'." (current-buffer))
     ⇒ "The buffer object prints as ‘strings.texi’."

A specification can have a width, which is a decimal number between the ‘%’ and the specification character. If the printed representation of the object contains fewer characters than this width, format extends it with padding. The width specifier is ignored for the ‘%%’ specification. Any padding introduced by the width specifier normally consists of spaces inserted on the left:

(format "%5d is padded on the left with spaces" 123)
     ⇒ "  123 is padded on the left with spaces"

If the width is too small, format does not truncate the object’s printed representation. Thus, you can use a width to specify a minimum spacing between columns with no risk of losing information. In the following two examples, ‘%7s’ specifies a minimum width of 7. In the first case, the string inserted in place of ‘%7s’ has only 3 letters, and needs 4 blank spaces as padding. In the second case, the string "specification" is 13 letters wide but is not truncated.

(format "The word '%7s' has %d letters in it."
        "foo" (length "foo"))
     ⇒ "The word '    foo' has 3 letters in it."
(format "The word '%7s' has %d letters in it."
        "specification" (length "specification"))
     ⇒ "The word 'specification' has 13 letters in it."

Immediately after the ‘%’ and before the optional width specifier, you can also put certain flag characters.

The flag ‘+’ inserts a plus sign before a positive number, so that it always has a sign. A space character as flag inserts a space before a positive number. (Otherwise, positive numbers start with the first digit.) These flags are useful for ensuring that positive numbers and negative numbers use the same number of columns. They are ignored except for ‘%d’, ‘%e’, ‘%f’, and ‘%g’; if both flags are used, ‘+’ takes precedence.
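As a brief illustration of the two sign flags, the following sketch shows what format is expected to return for a few arbitrarily chosen values:

```elisp
;; ‘+’ forces a sign on positive numbers; negative numbers are unchanged.
(format "%+d" 5)       ; ⇒ "+5"
(format "%+d" -5)      ; ⇒ "-5"
;; A space flag reserves one column for the sign instead.
(format "% d" 5)       ; ⇒ " 5"
;; When both flags are given, ‘+’ takes precedence.
(format "%+ d" 5)      ; ⇒ "+5"
(format "%+.1f" 2.5)   ; ⇒ "+2.5"
```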

The flag ‘#’ specifies an alternate form which depends on the format in use. For ‘%o’, it ensures that the result begins with a ‘0’. For ‘%x’ and ‘%X’, it prefixes the result with ‘0x’ or ‘0X’. For ‘%e’ and ‘%f’, the ‘#’ flag means include a decimal point even if the precision is zero. For ‘%g’, it always includes a decimal point, and also forces any trailing zeros after the decimal point to be left in place where they would otherwise be removed.
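The effect of ‘#’ is easiest to see side by side with the plain form. The following is a small sketch with arbitrary values; the results shown are what the rules above imply:

```elisp
;; Octal and hexadecimal: ‘#’ adds the conventional prefix.
(format "%o" 8)        ; ⇒ "10"
(format "%#o" 8)       ; ⇒ "010"
(format "%#x" 255)     ; ⇒ "0xff"
(format "%#X" 255)     ; ⇒ "0XFF"
;; ‘%f’ with zero precision: ‘#’ keeps the decimal point.
(format "%.0f" 3.0)    ; ⇒ "3"
(format "%#.0f" 3.0)   ; ⇒ "3."
;; ‘%g’: ‘#’ keeps the trailing zeros that are normally removed.
(format "%g" 1.0)      ; ⇒ "1"
(format "%#g" 1.0)     ; ⇒ "1.00000"
```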

The flag ‘0’ ensures that the padding consists of ‘0’ characters instead of spaces. This flag is ignored for non-numerical specification characters like ‘%s’, ‘%S’ and ‘%c’. These specification characters accept the ‘0’ flag, but still pad with spaces.

The flag ‘-’ causes the padding inserted by the width specifier, if any, to be inserted on the right rather than the left. If both ‘-’ and ‘0’ are present, the ‘0’ flag is ignored.

(format "%06d is padded on the left with zeros" 123)
     ⇒ "000123 is padded on the left with zeros"

(format "'%-6d' is padded on the right" 123)
     ⇒ "'123   ' is padded on the right"

(format "The word '%-7s' actually has %d letters in it."
        "foo" (length "foo"))
     ⇒ "The word 'foo    ' actually has 3 letters in it."

All the specification characters allow an optional precision before the character (after the width, if present). The precision is a decimal-point ‘.’ followed by a digit-string. For the floating-point specifications (‘%e’ and ‘%f’), the precision specifies how many digits following the decimal point to show; if zero, the decimal-point itself is also omitted. For ‘%g’, the precision specifies how many significant digits to show (significant digits are the first digit before the decimal point and all the digits after it). If the precision of ‘%g’ is zero, it is treated as 1; if it is unspecified, it defaults to 6. For ‘%s’ and ‘%S’, the precision truncates the string to the given width, so ‘%.3s’ shows only the first three characters of the object’s representation. For other specification characters, the effect of precision is what the local library functions of the printf family produce.
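The following sketch shows how precision interacts with a few of the specifications described above. The input values are arbitrary, and only cases with well-defined results are shown:

```elisp
;; ‘%f’ and ‘%e’: precision counts digits after the decimal point.
(format "%.2f" 3.14159)        ; ⇒ "3.14"
(format "%.3e" 12345.678)      ; ⇒ "1.235e+04"
;; ‘%g’: precision counts significant digits.
(format "%.3g" 1234.5678)      ; ⇒ "1.23e+03"
(format "%.3g" 0.00012345)     ; ⇒ "0.000123"
;; A precision of zero is treated as 1 for ‘%g’.
(format "%.0g" 123456.789)     ; ⇒ "1e+05"
;; ‘%s’: precision truncates the string.
(format "%.3s" "abcdef")       ; ⇒ "abc"
;; Width and precision can be combined (width 8, precision 3).
(format "%8.3f" 3.14159)       ; ⇒ "   3.142"
```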


1.8 Case Conversion in Lisp

The character case functions change the case of single characters or of the contents of strings. The functions normally convert only alphabetic characters (the letters ‘A’ through ‘Z’ and ‘a’ through ‘z’, as well as non-ASCII letters); other characters are not altered. You can specify a different case conversion mapping by specifying a case table (see section The Case Table).

These functions do not modify the strings that are passed to them as arguments.
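A minimal sketch of this non-destructive behavior follows. The variable names original, lowered and raised are illustrative only:

```elisp
;; The case functions return new strings; the argument is left intact.
(let* ((original "Hello World")
       (lowered (downcase original))
       (raised (upcase original)))
  (list original lowered raised
        ;; The result is a different object from the argument.
        (eq original lowered)))
;; ⇒ ("Hello World" "hello world" "HELLO WORLD" nil)
```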

The examples below use the characters ‘X’ and ‘x’ which have ASCII codes 88 and 120 respectively.

Function: downcase string-or-char

This function converts string-or-char, which should be either a character or a string, to lower case.

When string-or-char is a string, this function returns a new string in which each letter in the argument that is upper case is converted to lower case. When string-or-char is a character, this function returns the corresponding lower case character (an integer); if the original character is lower case, or is not a letter, the return value is equal to the original character.

(downcase "The cat in the hat")
     ⇒ "the cat in the hat"

(downcase ?X)
     ⇒ 120

Function: upcase string-or-char

This function converts string-or-char, which should be either a character or a string, to upper case.

When string-or-char is a string, this function returns a new string in which each letter in the argument that is lower case is converted to upper case. When string-or-char is a character, this function returns the corresponding upper case character (an integer); if the original character is upper case, or is not a letter, the return value is equal to the original character.

(upcase "The cat in the hat")
     ⇒ "THE CAT IN THE HAT"

(upcase ?x)
     ⇒ 88

Function: capitalize string-or-char

This function capitalizes strings or characters. If string-or-char is a string, the function returns a new string whose contents are a copy of string-or-char in which each word has been capitalized. This means that the first character of each word is converted to upper case, and the rest are converted to lower case.

The definition of a word is any sequence of consecutive characters that are assigned to the word constituent syntax class in the current syntax table (see Syntax Class Table).

When string-or-char is a character, this function does the same thing as upcase.

(capitalize "The cat in the hat")
     ⇒ "The Cat In The Hat"
(capitalize "THE 77TH-HATTED CAT")
     ⇒ "The 77th-Hatted Cat"
(capitalize ?x)
     ⇒ 88

Function: upcase-initials string-or-char

If string-or-char is a string, this function capitalizes the initials of the words in string-or-char, without altering any letters other than the initials. It returns a new string whose contents are a copy of string-or-char, in which each word has had its initial letter converted to upper case.

The definition of a word is any sequence of consecutive characters that are assigned to the word constituent syntax class in the current syntax table (see Syntax Class Table).

When the argument to upcase-initials is a character, upcase-initials has the same result as upcase.

(upcase-initials "The CAT in the hAt")
     ⇒ "The CAT In The HAt"

See section Comparison of Characters and Strings, for functions that compare strings; some of them ignore case differences, or can optionally ignore case differences.


1.9 The Case Table

You can customize case conversion by installing a special case table. A case table specifies the mapping between upper case and lower case letters. It affects both the case conversion functions for Lisp objects (see the previous section) and those that apply to text in the buffer (see Case Changes). Each buffer has a case table; there is also a standard case table which is used to initialize the case table of new buffers.

A case table is a char-table (see Char-Tables) whose subtype is case-table. This char-table maps each character into the corresponding lower case character. It has three extra slots, which hold related tables:

upcase

The upcase table maps each character into the corresponding upper case character.

canonicalize

The canonicalize table maps all of a set of case-related characters into a particular member of that set.

equivalences

The equivalences table maps each one of a set of case-related characters into the next character in that set.

In simple cases, all you need to specify is the mapping to lower-case; the three related tables will be calculated automatically from that one.

For some languages, upper and lower case letters are not in one-to-one correspondence. There may be two different lower case letters with the same upper case equivalent. In these cases, you need to specify the maps for both lower case and upper case.

The extra table canonicalize maps each character to a canonical equivalent; any two characters that are related by case-conversion have the same canonical equivalent character. For example, since ‘a’ and ‘A’ are related by case-conversion, they should have the same canonical equivalent character (which should be either ‘a’ for both of them, or ‘A’ for both of them).

The extra table equivalences is a map that cyclically permutes each equivalence class (of characters with the same canonical equivalent). (For ordinary ASCII, this would map ‘a’ into ‘A’ and ‘A’ into ‘a’, and likewise for each set of equivalent characters.)

When constructing a case table, you can provide nil for canonicalize; then Emacs fills in this slot from the lower case and upper case mappings. You can also provide nil for equivalences; then Emacs fills in this slot from canonicalize. In a case table that is actually in use, those components are non-nil. Do not try to specify equivalences without also specifying canonicalize.
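The structure described above can be inspected directly. The sketch below assumes a default session in which the ASCII entries of the standard case table have not been altered. It uses char-table-subtype and char-table-extra-slot, which are general char-table functions rather than part of the case-table interface:

```elisp
;; A case table is a char-table with subtype ‘case-table’.
(char-table-subtype (standard-case-table))
;; ⇒ case-table

;; The table itself maps each character to its lower case form.
(aref (standard-case-table) ?A)     ; ⇒ 97, i.e., ?a
(aref (standard-case-table) ?a)     ; ⇒ 97

;; Extra slot 0 holds the upcase table.
(aref (char-table-extra-slot (standard-case-table) 0) ?a)
;; ⇒ 65, i.e., ?A
```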

Here are the functions for working with case tables:

Function: case-table-p object

This predicate returns non-nil if object is a valid case table.

Function: set-standard-case-table table

This function makes table the standard case table, so that it will be used in any buffers created subsequently.

Function: standard-case-table

This returns the standard case table.

Function: current-case-table

This function returns the current buffer’s case table.

Function: set-case-table table

This sets the current buffer’s case table to table.

Macro: with-case-table table body…

The with-case-table macro saves the current case table, makes table the current case table, evaluates the body forms, and finally restores the case table. The return value is the value of the last form in body. The case table is restored even in case of an abnormal exit via throw or error (see Nonlocal Exits).

Some language environments modify the case conversions of ASCII characters; for example, in the Turkish language environment, the ASCII capital I is downcased into a Turkish dotless i (‘ı’). This can interfere with code that requires ordinary ASCII case conversion, such as implementations of ASCII-based network protocols. In that case, use the with-case-table macro with the variable ascii-case-table, which stores the unmodified case table for the ASCII character set.

Variable: ascii-case-table

The case table for the ASCII character set. This should not be modified by any language environment settings.
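For instance, code that normalizes protocol keywords might wrap its case conversions as in the following sketch. The header name is only an example input:

```elisp
;; Force plain ASCII case rules regardless of the language environment.
(with-case-table ascii-case-table
  (downcase "CONTENT-TYPE"))
;; ⇒ "content-type"

;; Under ascii-case-table, ‘I’ always downcases to ASCII ‘i’.
(with-case-table ascii-case-table
  (downcase "ID"))
;; ⇒ "id"
```

Because with-case-table restores the previous case table on exit, the surrounding code continues to use the buffer’s own table.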

The following three functions are convenient subroutines for packages that define non-ASCII character sets. They modify the specified case table case-table; they also modify the standard syntax table (see Syntax Tables). Normally you would use these functions to change the standard case table.

Function: set-case-syntax-pair uc lc case-table

This function specifies a pair of corresponding letters, one upper case and one lower case.

Function: set-case-syntax-delims l r case-table

This function makes characters l and r a matching pair of case-invariant delimiters.

Function: set-case-syntax char syntax case-table

This function makes char case-invariant, with syntax syntax.
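The following sketch shows set-case-syntax-pair applied to a private copy of the standard case table, so that the standard table itself is left alone. The pairing of ‘A’ with ‘b’ is deliberately artificial and purely illustrative. copy-case-table is assumed to be available; it is defined in Emacs’s case-table library, not in this section:

```elisp
;; Work on a copy so the standard case table stays untouched.
;; Note that set-case-syntax-pair still marks both characters as word
;; constituents in the standard syntax table (they already are here).
(let ((tbl (copy-case-table (standard-case-table))))
  ;; Hypothetical pairing: treat ?b as the lower case form of ?A.
  (set-case-syntax-pair ?A ?b tbl)
  (list (aref tbl ?A)                     ; the copy now maps A to b
        (aref (standard-case-table) ?A)   ; the standard table still maps A to a
        (with-case-table tbl (downcase ?A))))
;; ⇒ (98 97 98)
```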

Command: describe-buffer-case-table

This command displays a description of the contents of the current buffer’s case table.



About This Document

This document was generated on September 23, 2017 using texi2html.
