Escape sequence

From Wikipedia, the free encyclopedia

In computing, an escape sequence is a sequence of characters that has a special meaning under an established convention. The convention specifies an escape character that acts as a prefix, along with the syntax of the rest of the sequence.[1][2] A convention can define any particular character code as a sequence prefix. Some conventions use a normal, printable character such as backslash (\) or ampersand (&). Others use a non-printable (control) character such as ASCII escape.

Escape sequences date back at least to the 1874 Baudot code.[3][4][5]

Examples

Data transmission

A common use of an escape sequence is to remove control characters from a data stream so that they do not trigger their control function by mistake. Each control character is replaced with an escape character followed by one or more other characters. Once the data has passed the context in which the control character would have caused an action, the sequence is replaced by the original character.[6] To transmit the escape character itself, two copies are sent.[7]
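
The scheme described above can be sketched in Python. This is a minimal illustration, not a specific protocol: the choice of ASCII ESC as the escape character, the rule of adding 0x40 to a control byte, and the function names escape and unescape are all assumptions made for the example.

```python
ESC = 0x1B  # assumed escape character (ASCII escape)


def escape(data: bytes) -> bytes:
    """Replace control bytes so that none appears unescaped in the stream."""
    out = bytearray()
    for b in data:
        if b == ESC:
            out += bytes([ESC, ESC])       # the escape character is sent twice
        elif b < 0x20:
            out += bytes([ESC, b + 0x40])  # assumed mapping: control byte + 0x40
        else:
            out.append(b)
    return bytes(out)


def unescape(data: bytes) -> bytes:
    """Restore the original bytes on the receiving side."""
    out = bytearray()
    stream = iter(data)
    for b in stream:
        if b != ESC:
            out.append(b)
            continue
        nxt = next(stream, None)
        if nxt is None:
            raise ValueError("truncated escape sequence")
        out.append(ESC if nxt == ESC else nxt - 0x40)
    return bytes(out)
```

With this mapping, the only byte below 0x20 that can appear in the escaped stream is the escape character itself, and unescape(escape(x)) returns x for any byte string x.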

Text literal

An escape sequence is often used in character and string literals, to encode characters which are not printable or clash with the syntax of characters or strings. For example, control characters might not be allowed in a source file or may have undesirable side-effects if typed into a command.

In C and many derivative programming languages, a backslash (\) in a string literal marks the beginning of an escape sequence.[8][9] Common escape sequences include: carriage return \r, newline \n, tab \t. To account for the fact that using a printable character for escape causes that character to lose its normal meaning, a sequence of two backslash characters (\\) encodes a single backslash. An escape sequence can also specify a character by its code value. For example, the backslash can be encoded as either \x5c or \134 which specify the character code value as hexadecimal and octal, respectively.
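
Python accepts the same backslash escapes in its string literals, so the equivalences above can be checked directly. The variable names below are illustrative only.

```python
# Three spellings of one backslash character: doubled, hexadecimal, octal.
backslash = "\\"
same_in_hex = "\x5c"
same_in_octal = "\134"

# Each literal holds exactly one character with code value 0x5C (decimal 92).
assert len(backslash) == 1
assert backslash == same_in_hex == same_in_octal == chr(0x5C)

# Tab, carriage return and newline map to their ASCII control codes.
control_codes = [ord(c) for c in "\t\r\n"]
assert control_codes == [9, 13, 10]
```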

A backslash immediately followed by a newline does not mark an escape sequence. The C preprocessor deletes both characters in an early translation phase, before string literals are recognized, which joins the line with the subsequent line.[10]

Quoting escape

When a delimiter or other special character is needed within a string literal, there are two common strategies:

  • Doubled delimiter – For example, 'He didn''t do it.'[7]
  • Secondary escape sequence – For example, the Windows command prompt command echo Cut^&Paste outputs "Cut&Paste" by escaping the ampersand operator with a caret (^).[6]
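
The doubled-delimiter strategy can be sketched in a few lines of Python. The helper names quote_doubled and unquote_doubled are hypothetical, and the unquoting helper assumes its input was produced by the quoting helper; it does not reject a stray single delimiter inside the literal.

```python
def quote_doubled(text: str, delim: str = "'") -> str:
    """Wrap text in delimiters, doubling any delimiter inside it."""
    return delim + text.replace(delim, delim * 2) + delim


def unquote_doubled(literal: str, delim: str = "'") -> str:
    """Reverse quote_doubled: strip the outer delimiters, undouble the rest."""
    if len(literal) < 2 or not (literal.startswith(delim) and literal.endswith(delim)):
        raise ValueError("not a delimited literal")
    return literal[1:-1].replace(delim * 2, delim)


quoted = quote_doubled("He didn't do it.")  # 'He didn''t do it.'
```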

In C and many related languages, the escape character is the backslash (\). The single quotation mark character can be coded as '\'' since ''' is not valid. Because a string literal is delimited by double quotes ("), its content cannot contain a double quote unless it is escaped ("\"") or written as a sequence that specifies the code of the double-quote character (\x22).

In Perl or Python 2, the following is invalid syntax:

print "Nancy said "Hello World!" to the crowd." 

This can be fixed by inserting backslashes to escape the inner quotes:

print "Nancy said \"Hello World!\" to the crowd." 

Alternatively, the following uses "\x" to indicate the subsequent two characters are hexadecimal digits; "22" being the hexadecimal ASCII value for double-quote.

print "Nancy said \x22Hello World!\x22 to the crowd." 

C, C++, Java, and Ruby allow the same two backslash escape styles. PostScript and rich text format (RTF) also use backslash escapes. The quoted-printable encoding uses the equals sign as an escape character. URLs and URIs use percent-encoding to quote characters with a special meaning, as well as non-ASCII characters.
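
As a brief illustration of the last two conventions, the Python standard library provides urllib.parse.quote for percent-encoding and the quopri module for quoted-printable. The sample strings are made up for the example.

```python
import quopri
from urllib.parse import quote, unquote

# Percent-encoding: "&" and the space are reserved, "é" is non-ASCII (UTF-8 C3 A9).
encoded_url_part = quote("Cut&Paste é")   # 'Cut%26Paste%20%C3%A9'
decoded_url_part = unquote(encoded_url_part)

# Quoted-printable: "=" is the escape character, so a literal "=" becomes "=3D".
encoded_qp = quopri.encodestring(b"a=b")
decoded_qp = quopri.decodestring(encoded_qp)
```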

ANSI escape sequences

The VT52 terminal used simple digraph commands like escape-A. Without the escape character prefix, A simply meant the letter A, but as part of the escape sequence escape-A, it had a different meaning. The VT52 also supported parameters, so its command set was not a straightforward control language encoded as substitution.

The later VT100 terminal implemented the more sophisticated ANSI escape sequences standard (now ECMA-48) for functions such as controlling cursor movement, character set, and display enhancements. The HP 2640 series had perhaps the most elaborate escape sequences for block and character modes, programming keys and their soft labels, graphics vectors, and even saving data to tape or disk files.

In DOS and 16-bit Windows, a device driver, ANSI.SYS,[11] can be used to enable ANSI escape sequence support: in DOS, via $e in the PROMPT command, and in 16-bit Windows, via a command window. In Unix and Unix-like systems, the ANSI escape sequences are generally supported by the terminal emulator. The rise of GUI applications has reduced the use of escape sequences, yet the ability to provide full-screen, text-based applications is still available.
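
A few widely supported ANSI sequences can be built as plain strings: each one starts with ESC followed by [ and ends with a final letter that selects the function. The following is a minimal Python sketch; the helper names are hypothetical, and how the output is rendered depends on the terminal in use.

```python
ESC = "\x1b"     # ASCII escape, decimal 27
CSI = ESC + "["  # two-character prefix that introduces a control sequence

CLEAR_SCREEN = CSI + "2J"  # erase the entire display


def move_cursor(row: int, col: int) -> str:
    """Cursor position: rows and columns are 1-based."""
    return f"{CSI}{row};{col}H"


def sgr(*codes: int) -> str:
    """Select graphic rendition, e.g. 1 = bold, 31 = red foreground, 0 = reset."""
    return f"{CSI}{';'.join(str(c) for c in codes)}m"


def banner(text: str) -> str:
    """Clear the screen, go to the top-left corner, and show text in bold red."""
    return CLEAR_SCREEN + move_cursor(1, 1) + sgr(1, 31) + text + sgr(0) + "\n"
```

Writing banner("Warning") to a terminal that understands these sequences, for example with print(banner("Warning"), end=""), should clear the screen and show the word in bold red.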

Control sequence

A control sequence is a sequence of characters that changes the state of a computer peripheral instead of conveying the normal information that the characters represent. In an ANSI escape sequence, the prefix, called the control sequence introducer (CSI), can be either ASCII ESC (decimal 27) followed by [ or the single character CSI (decimal 155). Notable systems that did not use an escape character for control sequences include:

  • The Hayes command set defines a control sequence, +++, that is modal: it switches the modem from online mode to command mode. To ensure that the sequence is interpreted as a control sequence instead of as embedded content, the sender stops communication for one second before and after sending +++. When the modem detects this condition, it switches from normal mode (sending characters to the phone) to a command mode in which the data is interpreted as commands. Sending the O command switches back to the normal mode.[12][13][14][15]
  • Data General terminals used control sequences that did not begin with the escape character,[16][17][18] although they were often still called escape sequences. Today, the very common practice of "escaping" special characters in programming languages and command-line parameters often uses the backslash character to begin the sequence.
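
The guard-time rule described for the Hayes +++ sequence can be sketched as a check over timestamped input. This is a simplified illustration in Python, not a modem implementation: the event format, the function name is_hayes_escape, and the end_time parameter are assumptions, and real modems apply further timing rules that are not modeled here.

```python
GUARD_SECONDS = 1.0  # pause required before and after "+++", per the text above


def is_hayes_escape(events, end_time):
    """Return True if "+++" arrives with a quiet period on both sides.

    events: list of (timestamp_seconds, byte_value) pairs in arrival order.
    end_time: time at which the observation window closes.
    """
    for i in range(len(events) - 2):
        window = events[i:i + 3]
        if bytes(b for _, b in window) != b"+++":
            continue
        # Time of the neighbouring characters, if any, on either side.
        before = events[i - 1][0] if i > 0 else float("-inf")
        after = events[i + 3][0] if i + 3 < len(events) else end_time
        quiet_before = window[0][0] - before >= GUARD_SECONDS
        quiet_after = after - window[2][0] >= GUARD_SECONDS
        if quiet_before and quiet_after:
            return True
    return False
```

A +++ that is embedded in ordinary traffic has neighbouring characters inside the guard time, so it is passed through as data rather than treated as a mode switch.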

Escape sequences in communications are commonly used when a computer and a peripheral have only a single channel through which to send information back and forth (so escape sequences are an example of in-band signaling).[19][20] They were common when most dumb terminals used ASCII with 7 data bits for communication, and were sometimes used to switch to a different character set for "foreign" or graphics characters that would otherwise have been restricted by the 128 codes available in 7 data bits. Even relatively "dumb" devices responded to some escape sequences: the original mechanical Teletype printers (on which "glass Teletypes" or VDUs were based) responded to characters 27 and 31 to alternate between letters and figures modes.

Esc key

Many computer keyboards have an Esc key (where Esc is short for escape) even though it is generally not used for entering an escape sequence. The vi text editor uses the key to exit from input mode.[21] Some applications use the key to cancel an operation or navigate up a level of a nested context.[22]

References

  1. ^ "Escape Sequence (General Concept)".
  2. ^ "Characters". The Java Tutorials.
  3. ^ "What is ASCII? The Economist explains". The Economist. 2013-06-09.
  4. ^ "Baudot and CCITT code". The Baudot code, invented in 1870 and patented in 1874 by J. Baudot is […]
  5. ^ "Guide to the use of Character Sets in Europe". elements C0 and C1 of control characters […] a 5-bit code patented by Jean-Maurice-Emile Baudot (1845-1903) in 1874
  6. ^ a b "The Windows NT Command Shell". 20 February 2014.
  7. ^ a b "Apostrophe Editing ('aaa') (FORTRAN 77 Language Reference)". Within the field, two consecutive apostrophes […]
  8. ^ "Escape Sequences". 3 August 2021. Character combinations consisting of a backslash \ followed by a letter or by a combination of digits are called escape sequences.
  9. ^ "Escape sequences". IBM.
  10. ^ "ISO/IEC 9899:201x Committee Draft N1570" (PDF). 5.1.1.2 Translation phases, 2.: Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. [...]
  11. ^ "17. Understanding ANSI.SYS - Special Edition Using MS-DOS® 6.22, Third Edition [Book]". www.oreilly.com.
  12. ^ "Basic Hayes AT Command Set". 2011-02-05. +++ - "Escape Sequence" - This command initiates an escape sequence to return the modem to the on-line command mode
  13. ^ "Modem Programming Basics". When a modem is in command mode, the modem can accept commands from you
  14. ^ "Chapter 5 – AT Commands" (PDF).
  15. ^ "AT Command Set and Register Summary for Analog Modem Modules". Cisco.
  16. ^ "Data General terminals: discussion of". FTP server (FTP).[dead ftp link]
  17. ^ "What's a Terminal?". www.kermitproject.org.
  18. ^ "Data General DG210 DG211 Terminal Emulation Software".
  19. ^ "Escape sequence".
  20. ^ "Terminals & Printers Handbook Glossary". vt100.net.
  21. ^ "Twelve Useful "vi" Commands". vi commands […] Pressing the Esc (Escape) key is how you […]
  22. ^ "Five Unexpected Uses for the Esc Key". PCworld. 2009-10-29.