Strip ANSI escape codes from a string

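Remove every ANSI / VT escape sequence from text. This is useful for sanitizing logs, producing plain-text email, writing to terminals without color support, or emitting CLI output that pipes cleanly into other tools. The regex below matches the most common forms of CSI, OSC, and standalone ESC sequences.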

Regex (Perl-compatible)

PCRE

\x1b(?:[\]PX^_][^\x07\x1b]*(?:\x07|\x1b\\)|\[[0-?]*[ -/]*[@-~]|[@-Z\\-_])

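Matches `ESC` (0x1b) followed by one of: a string sequence, i.e. OSC / DCS / SOS / PM / APC (`ESC` plus one of `]`, `P`, `X`, `^`, `_`, then any bytes up to the terminator `BEL` or ST `ESC \`); a CSI sequence (`ESC [`, parameter bytes 0x30–0x3f, intermediate bytes 0x20–0x2f, final byte 0x40–0x7e); or a two-byte Fe sequence (`ESC X` with X in 0x40–0x5f other than `[`, e.g. RI `ESC M`). Because `]`, `P`, `X`, `^` and `_` also fall in the two-byte range, engines with Perl-style (leftmost-first) alternation, which includes Python, JavaScript, Go and Rust, must try the string-sequence branch first; otherwise only `ESC ]` is removed and the payload is left behind. Sequences whose second byte lies outside 0x40–0x5f, such as RIS `ESC c` or DECSC `ESC 7`, are not matched.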
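To see which branch handles which family, the short Python check below runs the pattern (string-sequence branch first) against one sample of each form. The sample strings and the `samples` list are made up for illustration; they are not taken from any particular terminal's output.

```python
import re

# Pattern from above with the string-sequence branch (OSC, DCS, SOS, PM, APC) first.
ANSI = re.compile(r'\x1b(?:[\]PX^_][^\x07\x1b]*(?:\x07|\x1b\\)|\[[0-?]*[ -/]*[@-~]|[@-Z\\-_])')

# Hypothetical inputs, one per escape family: (label, raw text, text after stripping).
samples = [
    ("CSI, SGR color",  "\x1b[31mred\x1b[0m",                                   "red"),
    ("CSI, cursor up",  "\x1b[2Aup",                                            "up"),
    ("OSC title, BEL",  "\x1b]0;my title\x07text",                              "text"),
    ("OSC 8 link, ST",  "\x1b]8;;https://example.com\x1b\\link\x1b]8;;\x1b\\",  "link"),
    ("DCS, ST",         "\x1bPq#0;2;0;0;0\x1b\\after",                          "after"),
    ("two-byte, ESC M", "\x1bMline",                                            "line"),
]

for label, raw, expected in samples:
    cleaned = ANSI.sub('', raw)
    print(f"{label:16} -> {cleaned!r} (matches expected: {cleaned == expected})")
```

Swapping the branch order back (two-byte class first) makes the OSC and DCS rows fail, which is a quick way to confirm the ordering matters.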

By language

Python (stdlib re)

import re

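# ESC + (OSC/DCS/SOS/PM/APC string | CSI | two-byte Fe sequence). The string
# branch comes first because ] P X ^ _ also fall in the two-byte range.
ANSI = re.compile(r'\x1b(?:[\]PX^_][^\x07\x1b]*(?:\x07|\x1b\\)|\[[0-?]*[ -/]*[@-~]|[@-Z\\-_])')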

def strip_ansi(s: str) -> str:
    return ANSI.sub('', s)
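Log files do not always decode cleanly as UTF-8. A minimal sketch, assuming you would rather strip before decoding: compile the same pattern as a bytes regex (`rb'...'`) and run it on raw bytes. Because 0x1b never occurs inside a multi-byte UTF-8 sequence, stripping at the byte level does not corrupt valid UTF-8 text. The names `ANSI_B` and `strip_ansi_bytes` are illustrative.

```python
import re

# Hypothetical bytes counterpart of the pattern above (note the rb'' prefix).
ANSI_B = re.compile(rb'\x1b(?:[\]PX^_][^\x07\x1b]*(?:\x07|\x1b\\)|\[[0-?]*[ -/]*[@-~]|[@-Z\\-_])')

def strip_ansi_bytes(data: bytes) -> bytes:
    """Remove ANSI escape sequences from raw bytes without decoding them first."""
    return ANSI_B.sub(b'', data)

raw = b'\x1b[1;32mOK\x1b[0m \xff payload'     # \xff is not valid UTF-8
clean = strip_ansi_bytes(raw)
print(clean)                                    # b'OK \xff payload'
print(clean.decode('utf-8', errors='replace'))  # decode only after stripping
```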

JavaScript / Node (no dependencies)

const ANSI = /\x1b(?:[\]PX^_][^\x07\x1b]*(?:\x07|\x1b\\)|\[[0-?]*[ -/]*[@-~]|[@-Z\\-_])/g;

export function stripAnsi(s) {
  return s.replace(ANSI, '');
}

// npm alternative:
//   npm i strip-ansi
//   import stripAnsi from 'strip-ansi';
//   stripAnsi('\u001b[31mhi\u001b[0m');  // 'hi'

Go (regexp)

package ansi

import "regexp"

// Raw string literal: RE2 itself interprets \x1b, \x07 and the backslash escapes.
var ansiRe = regexp.MustCompile(`\x1b(?:[\]PX^_][^\x07\x1b]*(?:\x07|\x1b\\)|\[[0-?]*[ -/]*[@-~]|[@-Z\\-_])`)

func Strip(s string) string {
    return ansiRe.ReplaceAllString(s, "")
}

Rust (regex crate)

// Cargo.toml: regex = "1"
use regex::Regex;
use std::sync::OnceLock;

fn ansi() -> &'static Regex {
    static R: OnceLock<Regex> = OnceLock::new();
    R.get_or_init(|| Regex::new(
        r"\x1b(?:[\]PX^_][^\x07\x1b]*(?:\x07|\x1b\\)|\[[0-?]*[ -/]*[@-~]|[@-Z\\-_])"
    ).unwrap())
}

pub fn strip_ansi(s: &str) -> String {
    ansi().replace_all(s, "").into_owned()
}

Bash / shell (sed)

# GNU sed with -E / -r:
sed -E 's/\x1b([]PX^_][^\x07\x1b]*(\x07|\x1b\\)|\[[0-?]*[ -\/]*[@-~]|[@-Z\\\\-_])//g' input.log > clean.log

# perl one-liner (also works on BSD/macOS, whose sed may not understand \x1b):
perl -pe 's/\e(?:[\]PX^_][^\a\e]*(?:\a|\e\\)|\[[0-?]*[ -\/]*[@-~]|[@-Z\\-_])//g' input.log

C (single-pass byte-level state machine, no regex)

#include <stdio.h>

// Strip every ESC sequence from stdin to stdout.
int main(void) {
    int c;
    while ((c = getchar()) != EOF) {
        if (c != 0x1b) { putchar(c); continue; }
        int next = getchar();
        if (next == EOF) break;
        if (next == '[') {                   // CSI
            int b;
            while ((b = getchar()) != EOF && (b < 0x40 || b > 0x7e)) {}
            continue;
        }
        if (next == ']' || next == 'P' || next == 'X' ||
            next == '^' || next == '_') {    // OSC, DCS, SOS, PM, APC
            int b;
            while ((b = getchar()) != EOF) {
                if (b == 0x07) break;        // BEL terminator
                if (b == 0x1b) { getchar(); break; } /* ST: drop ESC and the byte after it (normally the backslash) */
            }
            continue;
        }
        // ESC X (two-byte): both bytes are already consumed, so nothing is written
    }
    return 0;
}
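For comparison with the regex versions, here is the same state machine written over an in-memory string. This Python sketch mirrors the C program branch for branch: CSI runs to a final byte in 0x40–0x7e, the five string sequences run to BEL or to an ESC (whose following character is also dropped), and any other `ESC X` pair is discarded. The function name `strip_ansi_fsm` is illustrative, and unlike the C version it holds the whole input in memory.

```python
def strip_ansi_fsm(s: str) -> str:
    """Remove ESC sequences in one left-to-right pass, without regex."""
    out = []
    i, n = 0, len(s)
    while i < n:
        c = s[i]
        if c != '\x1b':
            out.append(c)
            i += 1
            continue
        if i + 1 >= n:               # lone ESC at end of input: drop it
            break
        nxt = s[i + 1]
        i += 2                       # ESC and the byte after it are consumed
        if nxt == '[':               # CSI: skip to a final byte in 0x40-0x7e
            while i < n and not ('\x40' <= s[i] <= '\x7e'):
                i += 1
            i += 1                   # consume the final byte itself
        elif nxt in ']PX^_':         # OSC, DCS, SOS, PM, APC
            while i < n:
                if s[i] == '\x07':   # BEL terminator
                    i += 1
                    break
                if s[i] == '\x1b':   # ST: skip ESC and the character after it
                    i += 2
                    break
                i += 1
        # any other ESC X: both characters were skipped above
    return ''.join(out)
```

As a filter, `sys.stdout.write(strip_ansi_fsm(sys.stdin.read()))` gives roughly the C program's stdin-to-stdout behavior for inputs that fit in memory.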

What this recipe does not strip

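Stray control characters that are not part of an escape sequence (BEL `\x07`, BS `\x08`, raw `\r` / `\n`, and so on) are left in place by the recipes above, since they often belong to the text itself. To remove C0 control characters as well, run a second pass of `s/[\x00-\x08\x0b\x0c\x0e-\x1f]//g` after the escape-sequence pass. It deletes every C0 control except TAB (`\x09`), LF (`\x0a`) and CR (`\x0d`), including any leftover lone ESC. Running it first would delete the ESC bytes and leave fragments such as `[31m` behind.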
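The two-pass cleanup and its ordering can be sketched in Python as follows. The helper name `strip_ansi_and_c0` and the sample string are hypothetical; the C0 class is the one given above, which keeps TAB, LF and CR.

```python
import re

ANSI = re.compile(r'\x1b(?:[\]PX^_][^\x07\x1b]*(?:\x07|\x1b\\)|\[[0-?]*[ -/]*[@-~]|[@-Z\\-_])')
# C0 controls except TAB (0x09), LF (0x0a) and CR (0x0d); ESC (0x1b) is included.
C0 = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')

def strip_ansi_and_c0(s: str) -> str:
    # Escape sequences first, then the remaining control characters.
    return C0.sub('', ANSI.sub('', s))

s = '\x1b[31merror\x1b[0m\x07\tdone\r\n'
print(repr(strip_ansi_and_c0(s)))         # 'error\tdone\r\n'
print(repr(ANSI.sub('', C0.sub('', s))))  # wrong order: '[31merror[0m\tdone\r\n'
```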

Related sequences

Authoritative pages for the escape families that the regex above matches and removes.
