Terminal keymap — arrow keys, modifier combos, kitty CSI u protocol

What does \x1b[A mean when you press the Up arrow, why does Ctrl+Up emit \x1b[1;5A, and why is the lone ESC key the hardest keypress to detect reliably? This page covers the input-side ANSI vocabulary — the bytes a terminal sends to your application for each keypress — across DECCKM cursor-key mode, the xterm 1;mod modifier convention, the F1-F4 SS3 vs F5+ CSI tilde split, the three competing Home/End encodings, the modern kitty CSI u protocol that retires every legacy ambiguity, and the 20-50ms timeout pattern every TUI uses to disambiguate a lone ESC from the start of a sequence.

Output vs input

Output vs input — same vocabulary, opposite directions

ansicode's per-sequence pages document the OUTPUT side: bytes an application writes to render colour, move the cursor, or set a window title. This page covers the INPUT side: bytes a terminal emits back to the application for each keypress. The two directions share the same CSI / SS3 / OSC vocabulary (`\x1b[` for CSI, `\x1bO` for SS3), but the roles of parser and producer are swapped. The terminal owns the keyboard translation table, and the application is on the receiving end. The terminfo entry for `$TERM` describes that table to applications but does not configure it; inspect it with `infocmp` (capabilities include `kcuu1` for Up, `kf1` for F1, `khome` for Home). Four things make this interesting: arrow keys come in two forms depending on DECCKM mode, modifier combos use a CSI parameter convention that varies across emulators, the lone ESC key looks identical to the start of any sequence, and the kitty keyboard protocol is the modern attempt to retire all the ambiguity.

`cat -v` shows ESC as `^[`, so the literal bytes are visible character-for-character.

# Inspect what your terminal actually emits for a key.
# Run this, press the key once, press Enter to flush the line, then Ctrl+D to exit.
cat -v
# Example output for Up arrow under tmux:
# ^[[A
# Example output for Ctrl+Up under xterm:
# ^[[1;5A

Arrow keys (DECCKM)

Arrow keys — CSI \e[A vs SS3 \eOA toggled by DECCKM

Cursor keys emit one of two byte sequences depending on DECCKM (DEC Cursor Keys Mode; `CSI ? 1 h` sets it, `CSI ? 1 l` resets it). In NORMAL mode (the default), Up/Down/Right/Left = `\x1b[A` / `\x1b[B` / `\x1b[C` / `\x1b[D` (CSI). In APPLICATION mode, the same keys = `\x1bOA` / `\x1bOB` / `\x1bOC` / `\x1bOD` (SS3: escape + capital O). vim and emacs typically set DECCKM application mode on entry and reset it on exit; bash readline accepts both forms through its `\e[A` and `\eOA` bindings, which can also be set explicitly in inputrc. **Note**: DECCKM is independent of DECKPAM/DECKPNM (keypad application/numeric mode; see /sequence/deckpam-deckpnm). Arrow keys are controlled by DECCKM, NOT DECKPAM, and apps that conflate the two misread arrows when the keypad mode flips.

Always bind both forms when authoring a readline-style input layer.

# Toggle DECCKM and observe Up arrow.
printf '\e[?1h'   # application cursor keys → Up = \eOA
printf '\e[?1l'   # normal cursor keys      → Up = \e[A

# Bash inputrc example — bind both forms:
# "\e[A": previous-history
# "\eOA": previous-history
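
The same "accept both forms" rule applies outside readline. As a minimal sketch, assuming the application has already collected the full sequence into a string, a lookup like the following maps either encoding to one logical key name. The function name `arrow_name` is hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# arrow_name (hypothetical helper): map either encoding of an
# unmodified arrow key to one logical name.
arrow_name() {
  local esc
  esc=$(printf '\033')               # the ESC byte (0x1B)
  case $1 in
    "${esc}[A"|"${esc}OA") echo Up ;;
    "${esc}[B"|"${esc}OB") echo Down ;;
    "${esc}[C"|"${esc}OC") echo Right ;;
    "${esc}[D"|"${esc}OD") echo Left ;;
    *) return 1 ;;                   # not an unmodified arrow key
  esac
}

arrow_name "$(printf '\033[A')"      # normal-mode Up
arrow_name "$(printf '\033OA')"      # application-mode Up
```

Because both branches share one `case` arm, the rest of the input layer never needs to know which DECCKM state the terminal was in.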

Modifier keys

Modifier convention — \e[1;<mod><letter>, mod = 1 + bitmask

When a modifier (Shift, Alt, Ctrl, Meta) is held with a cursor or function key, xterm-family terminals emit the same final byte but with a modifier parameter wedged into the CSI: `\x1b[1;<mod>A` for modified Up, `\x1b[1;<mod>P` for modified F1, etc. The `mod` value is `1 + (Shift?1:0) + (Alt?2:0) + (Ctrl?4:0) + (Meta?8:0)`, so 2=Shift, 3=Alt, 4=Shift+Alt, 5=Ctrl, 6=Shift+Ctrl, 7=Alt+Ctrl, 8=Shift+Alt+Ctrl. Examples: `\x1b[1;2A` (Shift+Up), `\x1b[1;5A` (Ctrl+Up), `\x1b[1;6A` (Shift+Ctrl+Up), `\x1b[15;5~` (Ctrl+F5). Older Linux console and some VT100 emulators ignore the modifier param entirely. The XTMODKEYS / XTQMODKEYS pair (see /sequence/xtmodkeys + /sequence/xtqmodkeys) lets an app set or read the modifyCursorKeys / modifyFunctionKeys runtime mode that controls whether the modifier param is emitted at all.

Memorise the 1-2-4-8 bitmask once; it applies to every modified arrow + F-key.

# Modifier parameter = 1 + 1·Shift + 2·Alt + 4·Ctrl + 8·Meta
# Shift+Up        \e[1;2A
# Alt+Up          \e[1;3A
# Shift+Alt+Up    \e[1;4A
# Ctrl+Up         \e[1;5A
# Shift+Ctrl+Up   \e[1;6A
# Alt+Ctrl+Up     \e[1;7A
# Shift+Alt+Ctrl+Up \e[1;8A
# Same shape for F-keys: Ctrl+F5 = \e[15;5~
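
Going the other way, from a received parameter back to modifier names, is a matter of subtracting 1 and testing bits. The sketch below assumes the xterm-style bit assignment described above; `decode_mod` is a hypothetical name.

```shell
#!/usr/bin/env bash
# decode_mod (hypothetical helper): turn an xterm-style modifier
# parameter into a '+'-joined list of modifier names.
decode_mod() {
  local param=$1 bits names=""
  bits=$(( param - 1 ))              # strip the +1 offset to get the raw bitmask
  [ $(( bits & 1 )) -ne 0 ] && names="${names}Shift+"
  [ $(( bits & 2 )) -ne 0 ] && names="${names}Alt+"
  [ $(( bits & 4 )) -ne 0 ] && names="${names}Ctrl+"
  [ $(( bits & 8 )) -ne 0 ] && names="${names}Meta+"
  names=${names%+}                   # drop the trailing '+'
  printf '%s\n' "${names:-none}"
}

decode_mod 5    # prints: Ctrl
decode_mod 6    # prints: Shift+Ctrl
decode_mod 8    # prints: Shift+Alt+Ctrl
```

A parameter of 1 (or a missing parameter, which an application should treat as 1) decodes to `none`.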

Function keys F1–F12

Function keys — F1–F4 are SS3, F5+ are CSI tilde

Function-key encoding is the most fragmented part of input. The xterm-family contract: F1 = `\x1bOP`, F2 = `\x1bOQ`, F3 = `\x1bOR`, F4 = `\x1bOS` (all SS3: escape + capital O + letter, like application-mode arrows). F5 and up switch to CSI tilde: F5 = `\x1b[15~`, F6 = `\x1b[17~`, F7 = `\x1b[18~`, F8 = `\x1b[19~`, F9 = `\x1b[20~`, F10 = `\x1b[21~`, F11 = `\x1b[23~`, F12 = `\x1b[24~`. The gaps at 16 and 22 are historical: xterm inherits the DEC VT220 numbering, which skips those values (on the VT220, F6 through F10 are 17 to 21 and F11 resumes at 23; Help and Do are 28 and 29). Linux console disagrees: F1-F5 = `\x1b[[A` through `\x1b[[E` (a non-standard double-bracket form), and F6-F12 then join the CSI-tilde sequence. Consult the terminfo entry (`infocmp`) or use the kitty keyboard protocol instead of hard-coding.

F1–F4 break the CSI-tilde pattern; modified F1–F4 use `\e[1;<mod>P` through `\e[1;<mod>S`, while F5+ puts the modifier before the tilde.

# Modified F-keys reuse the modifier convention.
# Shift+F5         \e[15;2~
# Ctrl+F5          \e[15;5~
# Alt+F12          \e[24;3~
# Shift+Ctrl+F11   \e[23;6~

# Linux console F1-F5 are the outlier: \e[[A through \e[[E
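
The xterm-family table above, including the skipped 16 and 22, can be captured in one small lookup. This is a sketch only: `fkey_seq` is a hypothetical helper, it prints the sequence in visible `\e` notation rather than emitting the raw bytes, and it does not cover the Linux console forms.

```shell
#!/usr/bin/env bash
# fkey_seq (hypothetical helper): print the xterm-style sequence for
# F-key N (1-12) with an optional modifier parameter (default 1 = none).
fkey_seq() {
  local n=$1 mod=${2:-1} letter="" num=""
  case $n in
    1) letter=P ;; 2) letter=Q ;; 3) letter=R ;; 4) letter=S ;;
    5) num=15 ;;  6) num=17 ;;  7) num=18 ;;  8) num=19 ;;
    9) num=20 ;; 10) num=21 ;; 11) num=23 ;; 12) num=24 ;;
    *) return 1 ;;                   # outside F1-F12
  esac
  if [ -n "$letter" ]; then
    # F1-F4: SS3 when unmodified, CSI 1;<mod><letter> when modified.
    if [ "$mod" -eq 1 ]; then printf '\\eO%s\n' "$letter"
    else printf '\\e[1;%d%s\n' "$mod" "$letter"; fi
  else
    # F5-F12: CSI <num>~, with ;<mod> inserted before the tilde.
    if [ "$mod" -eq 1 ]; then printf '\\e[%d~\n' "$num"
    else printf '\\e[%d;%d~\n' "$num" "$mod"; fi
  fi
}

fkey_seq 1        # prints: \eOP
fkey_seq 5 5      # prints: \e[15;5~
fkey_seq 11 6     # prints: \e[23;6~
```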

Home / End / PgUp / PgDn / Backspace

Special keys — three competing Home/End encodings, Backspace = 0x7F or 0x08

Home and End have three competing encodings. **xterm** emits `\x1b[H` (Home) and `\x1b[F` (End), byte-for-byte the same as the output-side sequences CUP (cursor position with no parameters) and CPL (cursor previous line). **vt220**-style terminals emit `\x1b[1~` (Home) and `\x1b[4~` (End). **rxvt** emits `\x1b[7~` (Home) and `\x1b[8~` (End). PgUp = `\x1b[5~` and PgDn = `\x1b[6~` are universal. Insert = `\x1b[2~`, Delete = `\x1b[3~`. **Backspace** is even worse: xterm and Linux console emit DEL (`0x7F`; see terminfo `kbs`), but Windows console and many telnet hosts emit BS (`0x08`). Tools that read `^?` (0x7F) literally and ignore 0x08 break on Windows; the canonical fix is to bind both in your input layer. Modified forms follow the same convention as F-keys: unmodified Home is `\x1b[H`, Ctrl+Home is `\x1b[1;5H`, and Shift+End is `\x1b[1;2F`.

Three Home + three End + two Backspace bindings keep readline portable across terminals.

# Bind every Home/End encoding readline might see.
# In ~/.inputrc:
"\e[H":  beginning-of-line
"\e[1~": beginning-of-line
"\e[7~": beginning-of-line
"\e[F":  end-of-line
"\e[4~": end-of-line
"\e[8~": end-of-line
# Backspace defensively:
"\C-?":  backward-delete-char  # DEL  (0x7F)
"\C-h":  backward-delete-char  # BS   (0x08)

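For an input layer that is not readline, the same defensive binding can be written as a normalising lookup. The sketch below assumes the full key sequence is already in a string; `key_name` is a hypothetical helper and covers only the encodings listed in this section.

```shell
#!/usr/bin/env bash
# key_name (hypothetical helper): collapse the competing Home/End and
# Backspace encodings into one logical name each.
key_name() {
  local esc del bs
  esc=$(printf '\033')               # ESC (0x1B)
  del=$(printf '\177')               # DEL (0x7F)
  bs=$(printf '\010')                # BS  (0x08)
  case $1 in
    "${esc}[H"|"${esc}[1~"|"${esc}[7~") echo Home ;;
    "${esc}[F"|"${esc}[4~"|"${esc}[8~") echo End ;;
    "${esc}[5~") echo PgUp ;;
    "${esc}[6~") echo PgDn ;;
    "$del"|"$bs") echo Backspace ;;
    *) echo Other ;;
  esac
}

key_name "$(printf '\033[7~')"       # rxvt Home  -> Home
key_name "$(printf '\010')"          # BS         -> Backspace
```

The patterns are quoted, so `[H` and `[F` are matched literally rather than as bracket expressions.
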
Kitty keyboard (CSI u)

Kitty keyboard protocol — CSI u, disambiguating every keypress

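The kitty keyboard protocol (sometimes called 'CSI u', after its final byte) gives keys a uniform shape: `\x1b[<codepoint>;<mod>u`, where `codepoint` is the Unicode codepoint of the unmodified key (e.g. 27 for ESC, 13 for Enter, 32 for Space, 97 for 'a') and `mod` is 1 + bitmask, as in the xterm family. Shift = 1, Alt = 2 and Ctrl = 4 match xterm; kitty assigns 8 to Super and adds higher bits for Hyper, Meta, Caps Lock and Num Lock. Arrows, Home/End, PgUp/PgDn and F1-F12 keep their CSI letter or tilde forms with the same modifier parameter; keys without such a form use `u`. Apps opt in with `\x1b[>{flags}u` (push) and restore with `\x1b[<u` (pop). Flag bits: 1 = disambiguate escape codes (the usual starting point: distinguishes ESC from sequence start, Ctrl+I from Tab, Ctrl+M from Enter), 2 = report event types (press / repeat / release), 4 = report alternate keys (shifted-form + base-form of layout-dependent keys), 8 = report all keys as escape codes (no plain ASCII), 16 = report associated text. With no flags pushed, the terminal stays in legacy mode. Adoption at the time of writing: kitty (origin), foot, WezTerm, ghostty, Alacritty 0.13+, Konsole 24.02+, neovide; not yet xterm, gnome-terminal, Windows Terminal. Detect support by sending the query `\x1b[?u` followed by a Primary Device Attributes request (`\x1b[c`): a terminal that implements the protocol replies `\x1b[?<flags>u` before the DA response, while one that does not sends only the DA response, in which case fall back to the legacy encodings.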

Always pair push/pop in a SIGINT/SIGTERM handler — a crash mid-app leaves the user's shell stuck in CSI-u mode.

# Opt into kitty keyboard, run app, restore on exit.
printf '\e[>1u'       # push: disambiguate flag
your-app
printf '\e[<u'        # pop: restore previous flag set

# Sample shapes the app then sees (flag 1 only):
# 'a'           a          (plain text; \e[97u only when flag 8 is also set)
# Ctrl+a        \e[97;5u
# Shift+Enter   \e[13;2u
# Lone ESC      \e[27u     (NOT mistakable for a sequence start)
# F1            \e[P       (F1-F12 keep CSI letter/tilde forms; \e[<n>u starts at F13 = 57376)
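
On the receiving side, the `\e[<codepoint>;<mod>u` shape is simple to take apart. The following is a minimal sketch, assuming one complete sequence per call and ignoring the optional third (associated text) field; `parse_csi_u` is a hypothetical name, and any `:`-separated sub-fields (alternate keys, event types, which appear only with flags 2 and 4) are stripped rather than interpreted.

```shell
#!/usr/bin/env bash
# parse_csi_u (hypothetical helper): split a CSI u sequence into its
# codepoint and modifier parameter. Returns 1 for anything else.
parse_csi_u() {
  local seq=$1 esc body code mod
  esc=$(printf '\033')
  case $seq in
    "${esc}["*u) ;;                  # ESC [ ... u
    *) return 1 ;;
  esac
  body=${seq#"${esc}["}              # strip the leading ESC [
  body=${body%u}                     # strip the final byte
  code=${body%%;*}                   # first field: key code
  code=${code%%:*}                   # drop alternate-key sub-fields, if any
  case $body in
    *";"*) mod=${body#*;}; mod=${mod%%;*}; mod=${mod%%:*} ;;
    *)     mod=1 ;;                  # no modifier field means "none"
  esac
  printf 'codepoint=%s mod=%s\n' "$code" "${mod:-1}"
}

parse_csi_u "$(printf '\033[97;5u')"   # prints: codepoint=97 mod=5
parse_csi_u "$(printf '\033[27u')"     # prints: codepoint=27 mod=1
```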

Reading keys reliably

Reading keys reliably — lone ESC vs sequence start, the 20–50ms timeout

The single hardest input bug: a lone ESC press (for example, leaving insert mode in vim) sends the exact same byte that starts every CSI / SS3 sequence. The terminal does NOT send a separator. The classical fix is a timer: if another byte arrives within 20–50ms of the ESC, treat the pair as the start of a sequence; otherwise treat the ESC as standalone. Most TUI libraries (ncurses, blessed, crossterm, termion, tcell, ratatui, bubbletea) handle this for you. ncurses exposes the wait as the `ESCDELAY` environment variable; its default of 1000ms is too long for a modern feel, and users routinely set ESCDELAY=25. Raw-mode primitives by language: bash `read -rsn1` (one byte, no echo) followed by `read -t 0.05` (fractional timeouts need bash 4+); Python `termios.tcsetattr` + `select.select([fd], [], [], 0.05)`; Rust `crossterm::event::poll(Duration::from_millis(50))`; Go `tcell.Screen.PollEvent`. The kitty keyboard protocol's flag 1 (disambiguate) eliminates this entire problem by sending `\e[27u` for lone ESC, with no ambiguity and no timeout, which is the practical reason modern TUIs are adopting it.

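50ms is long enough to catch the rest of a sequence on a local terminal and short enough to feel snappy to the user; over a slow SSH link a sequence can still arrive split, so keep the delay configurable.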

# Bash readline-style escape disambiguation in pure bash.
IFS= read -rsn1 c
if [ "$c" = $'\e' ]; then
  # Could be start of a sequence — wait up to 50ms for more bytes.
  IFS= read -rsn1 -t 0.05 c2 && rest="$c2" || rest=""
  if [ -z "$rest" ]; then
    echo "Lone ESC pressed"
  else
    echo "Sequence start: \\e$rest..."
  fi
fi
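
The timeout idea extends naturally to collecting a whole sequence rather than just its second byte. The sketch below is one possible shape, not a hardened reader: `read_key` is a hypothetical function, it needs bash 4+ for the fractional `-t`, it prints the result in `cat -v` notation, and it stops at the first byte in the CSI final range (0x40 to 0x7E), so the Linux console's double-bracket `\e[[A` form would be cut short.

```shell
#!/usr/bin/env bash
# read_key (hypothetical helper): read one keypress from stdin and
# print its bytes in cat -v notation (ESC shown as ^[).
read_key() {
  local byte seq code
  IFS= read -rsn1 byte || return 1            # first byte, blocking
  seq=$byte
  if [[ $byte == $'\e' ]]; then
    # A second byte within 50 ms means a sequence; silence means lone ESC.
    if IFS= read -rsn1 -t 0.05 byte; then
      seq+=$byte
      if [[ $byte == '[' || $byte == 'O' ]]; then
        # CSI or SS3: collect bytes up to the final byte (0x40 to 0x7E).
        while IFS= read -rsn1 -t 0.05 byte; do
          seq+=$byte
          printf -v code '%d' "'$byte"        # numeric value of the byte
          (( code >= 64 && code <= 126 )) && break
        done
      fi
    fi
  fi
  printf '%s\n' "$seq" | cat -v
}

# Feeding bytes through a pipe stands in for a keypress here.
printf '\033[1;5A' | read_key                 # prints: ^[[1;5A
```

In an interactive program the terminal must already be in a non-canonical mode (or each `read` must use `-n1` as above) so that bytes are delivered without waiting for Enter.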

See also