pytermgui.parser
A custom markup language to make styling Widget
-s easier, and, well, more stylish.
Markup Syntax
Basics
- Everything inside [square_brackets] is considered a tag
- Everything outside is considered a PLAIN text
Tag types
Style tags use the english name of the style (e.g.
[bold]
)Color tags can be of three types:
- 8BIT:
[141]
- 24BIT/RGB:
[22;3;243]
- 24BIT/HEX:
[#2203FA]
- 24BIT colors are parsed as the same token type.
- 8BIT:
Color tags set background when using the @ prefix:
[@141]
Macros are denoted by
!
prefix, and optional(argument:list)
suffix
Macros
- Macros in markup convert from
[!name(arg1:arg2)]
toname(arg1, arg2, text)
- The next PLAIN token is always passed to the macro as its last argument
- The argument list is optional if a macro doesn't take any additional arguments
- A macro can be defined by using
MarkupLanguage.define(name, callable)
Aliases
- Tag aliases can be defined using
MarkupLanguage.alias(src, dst)
- These are expanded in parse-time, and are recognized as regular style tokens
- Whenever an alias is defined, any cached markup containing it is removed
Caching
- This module provides (opt-out) caching for parsed markup
- After parsing a previously unknown string, it is stored in
MarkupLanguage._cache
- Next time the parser sees this markup string, it will restore the cached value
- Alias definitions delete affected cache entries
Instancing
pytermgui
provides themarkup
name, which acts as the module-level language instance- You can create your own instance using the
MarkupLanguage
name - Each instance has its own tags, user tags & macros
- You might want a system-level and user-level instance when users can freely input markup
Usage
MarkupLanguage.parse()
: Parse markup text into ANSI stringMarkupLanguage.get_markup()
: Get markup string from ANSI textMarkupLanguage.tokenize_ansi()
,MarkupLanguage.tokenize_markup()
: Tokenize textMarkupLanguage.define()
: Define an instance-local macroMarkupLanguage.alias()
: Define an instance-local alias
View Source
""" A custom markup language to make styling `Widget`-s easier, and, well, more stylish. Markup Syntax ============= Basics ------ - Everything inside [square_brackets] is considered a tag - Everything outside is considered a PLAIN text Tag types --------- - Style tags use the english name of the style (e.g. `[bold]`) - Color tags can be of three types: + 8BIT: `[141]` + 24BIT/RGB: `[22;3;243]` + 24BIT/HEX: `[#2203FA]` + 24BIT colors are parsed as the same token type. - Color tags set background when using the @ prefix: `[@141]` - Macros are denoted by `!` prefix, and optional `(argument:list)` suffix Macros ------ - Macros in markup convert from `[!name(arg1:arg2)]` to `name(arg1, arg2, text)` - The next PLAIN token is always passed to the macro as its last argument - The argument list is optional if a macro doesn't take any additional arguments - A macro can be defined by using `MarkupLanguage.define(name, callable)` Aliases ------- - Tag aliases can be defined using `MarkupLanguage.alias(src, dst)` - These are expanded in parse-time, and are recognized as regular style tokens - Whenever an alias is defined, any cached markup containing it is removed Caching ------- - This module provides (opt-out) caching for parsed markup - After parsing a previously unknown string, it is stored in `MarkupLanguage._cache` - Next time the parser sees this markup string, it will restore the cached value - Alias definitions delete affected cache entries Instancing ---------- - `pytermgui` provides the `markup` name, which acts as the module-level language instance - You can create your own instance using the `MarkupLanguage` name - Each instance has its own tags, user tags & macros - You might want a system-level and user-level instance when users can freely input markup Usage ----- - `MarkupLanguage.parse()`: Parse markup text into ANSI string - `MarkupLanguage.get_markup()`: Get markup string from ANSI text - `MarkupLanguage.tokenize_ansi()`, `MarkupLanguage.tokenize_markup()`: Tokenize text - `MarkupLanguage.define()`: Define an instance-local macro - `MarkupLanguage.alias()`: Define an instance-local alias """ # pylint: disable=too-many-lines from __future__ import annotations import re import sys from random import shuffle from dataclasses import dataclass from argparse import ArgumentParser from enum import Enum, auto as _auto from typing import Iterator, Callable, Tuple, List, Any from .ansi_interface import foreground from .exceptions import MarkupSyntaxError, AnsiSyntaxError __all__ = ["MacroCallable", "MacroCall", "MarkupLanguage", "StyledText", "markup"] MacroCallable = Callable[..., str] MacroCall = Tuple[MacroCallable, List[str]] RE_ANSI = re.compile(r"(?:\x1b\[(.*?)m)|(?:\x1b\](.*?)\x1b\\)|(?:\x1b_G(.*?)\x1b\\)") RE_MACRO = re.compile(r"(![a-z0-9_]+)(?:\(([\w\/\.?=:]+)\))?") RE_MARKUP = re.compile(r"((\\*)\[([a-z0-9!#@_\/\(,\)].*?)\])") RE_256 = re.compile(r"^([\d]{1,3})$") RE_HEX = re.compile(r"#([0-9a-fA-F]{6})") RE_RGB = re.compile(r"(\d{1,3};\d{1,3};\d{1,3})") STYLE_MAP = { "bold": "1", "dim": "2", "italic": "3", "underline": "4", "blink": "5", "blink2": "6", "inverse": "7", "invisible": "8", "strikethrough": "9", "overline": "53", } UNSETTER_MAP: dict[str, str | None] = { "/": "0", "/bold": "22", "/dim": "22", "/italic": "23", "/underline": "24", "/blink": "25", "/blink2": "26", "/inverse": "27", "/invisible": "28", "/strikethrough": "29", "/fg": "39", "/bg": "49", "/overline": "54", } def _macro_align(width: str, alignment: str, content: str) -> str: """Align text using fstring magic""" aligner = "<" if alignment == "left" else (">" if alignment == "right" else "^") return f"{content:{aligner}{width}}" def _macro_expand(lang: MarkupLanguage, tag: str) -> str: """Expand tag alias""" if not tag in lang.user_tags: return tag return lang.get_markup("\x1b[" + lang.user_tags[tag] + "m ")[:-1] def _macro_strip_fg(item: str) -> str: """Strip foreground color from item""" return markup.parse("[/fg]" + item) def _macro_strip_bg(item: str) -> str: """Strip foreground color from item""" return markup.parse("[/bg]" + item) def _macro_shuffle(item: str) -> str: """Shuffle a string using shuffle.shuffle on its list cast""" shuffled = list(item) shuffle(shuffled) return "".join(shuffled) def _macro_link(*args) -> str: """Creates a clickable hyperlink. Note: Since this is a pretty new feature, its support is limited. """ *uri_parts, label = args uri = ":".join(uri_parts) return f"\x1b]8;;{uri}\x1b\\{label}\x1b]8;;\x1b\\" class TokenType(Enum): """An Enum to store various token types""" PLAIN = _auto() STYLE = _auto() MACRO = _auto() ESCAPED = _auto() FG_8BIT = _auto() BG_8BIT = _auto() FG_24BIT = _auto() BG_24BIT = _auto() UNSETTER = _auto() @dataclass class Token: """A class holding information on a singular Markup/ANSI unit""" ttype: TokenType data: str | MacroCall | None name: str = "<unnamed-token>" def __post_init__(self) -> None: """Set name to data if not provided""" if self.name == "<unnamed-token>": assert isinstance(self.data, str) self.name = self.data def __eq__(self, other: object) -> bool: """Check equality with `other`""" if other is None: return False if not isinstance(other, Token): raise NotImplementedError( "Cannot check for equality between Token and non-Token of type" + f" {type(other)}." ) return other.data == self.data and other.ttype is self.ttype @property def sequence(self) -> str | None: """Get ANSI sequence representing token""" if self.data is None: return None if self.ttype in [TokenType.PLAIN, TokenType.MACRO, TokenType.ESCAPED]: return None assert isinstance(self.data, str) if self.ttype in [TokenType.STYLE, TokenType.UNSETTER]: return "\033[" + self.data + "m" # Handle colors if self.ttype.name.startswith("BG"): template = "\x1b[48;{c_id};" + self.data + "m" else: template = "\x1b[38;{c_id};" + self.data + "m" if self.ttype in [TokenType.FG_8BIT, TokenType.BG_8BIT]: return template.format(c_id="5") return template.format(c_id="2") class StyledText(str): """A styled text object. The purpose of this class is to implement some things regular `str` breaks at when encountering ANSI sequences. Instances of this class are usually spat out by `MarkupLanguage.parse`, but may be manually constructed if the need arises. Everything works even if there is no ANSI tomfoolery going on. """ value: str """The underlying, ANSI-inclusive string value.""" plain: str """The string value with no ANSI sequences.""" tokens: list[Token] """The list of tokens that make up this string.""" def __new__(cls, value: str = ""): """Creates a StyledText, gets markup tags. Args: markup_language: The markup language instance this object uses. """ obj = super().__new__(cls, value) obj.value = value obj.tokens = list(markup.tokenize_ansi(value)) obj.plain = "" for token in obj.tokens: if token.ttype is not TokenType.PLAIN: continue assert isinstance(token.data, str) obj.plain += token.data return obj def plain_index(self, index: int | None) -> int | None: """Finds given index inside plain text.""" if index is None: return None styled_chars = 0 plain_chars = 0 negative_index = False tokens = self.tokens.copy() if index < 0: tokens.reverse() index = abs(index) negative_index = True for token in tokens: if token.data is None: continue if token.ttype is not TokenType.PLAIN: assert token.sequence is not None styled_chars += len(token.sequence) continue for _ in range(len(token.data)): if plain_chars == index: if negative_index: return -1 * (plain_chars + styled_chars) return plain_chars + styled_chars plain_chars += 1 return None def __len__(self) -> int: """Gets "real" length of object.""" return len(self.plain) def __getitem__(self, subscript: int | slice) -> str: """Gets an item, adjusted for non-plain text. Args: subscript: The integer or slice to find. Returns: The elements described by the subscript. Raises: IndexError: The given index is out of range. """ if isinstance(subscript, int): plain_index = self.plain_index(subscript) if plain_index is None: raise IndexError("StyledText index out of range") return self.value[plain_index] return self.value[ slice( self.plain_index(subscript.start), self.plain_index(subscript.stop), subscript.step, ) ] class MarkupLanguage: """A class representing an instance of a Markup Language. It holds data on default & custom tags and macros. It offers tokenizer methods for both `markup` and `ANSI` text, which can then be used to convert between the two formats. You can define macros using `MarkupLanguage.define`, and alias a set of tags using `MarkupLanguage.alias`. Parsing `markup` into `ANSI` text is done using the `parse()` method, where `optimizer_level` sets the amount of optimization that should be done on the result string. Getting `markup` from `ANSI` is done using the `get_markup()` method. Note that this method is "lossy": it does not preserve information about macros, and turns aliases into their underlying values. You can also use a `MarkupLanguage` instance as a context manager, which returns a callable with the signature of print that will parse every argument given to it, and pass through all **kwargs. ```python3 import pytermgui as ptg ptg.markup.alias("my-tag", "@152 72 bold") with ptg.markup as mprint: mprint("This is [my-tag]my-tag[/]!") ``` <p style="text-align: center"> <img src=https://raw.githubusercontent.com/bczsalba/pytermgui/master/assets/docs/parser.png style="width: 80%"> </p> """ def __init__(self, default_macros: bool = True) -> None: """Initialize object""" self.tags: dict[str, str] = STYLE_MAP.copy() self._cache: dict[str, StyledText] = {} self.macros: dict[str, MacroCallable] = {} self.user_tags: dict[str, str] = {} self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy() self.should_cache: bool = True if default_macros: self.define("!link", _macro_link) self.define("!align", _macro_align) self.define("!markup", self.get_markup) self.define("!shuffle", _macro_shuffle) self.define("!strip_bg", _macro_strip_bg) self.define("!strip_fg", _macro_strip_fg) self.define("!upper", lambda item: str(item.upper())) self.define("!lower", lambda item: str(item.lower())) self.define("!title", lambda item: str(item.title())) self.define("!capitalize", lambda item: str(item.capitalize())) self.define("!expand", lambda tag: _macro_expand(self, tag)) self.alias("pprint-int", "176") self.alias("pprint-str", "149 italic") self.alias("pprint-type", "222") self.alias("pprint-none", "210") @staticmethod def _get_color_token(tag: str) -> Token | None: """Try to get color token from a tag""" def _hex_to_rgb(color: str) -> str: """Get rgb color from hex""" return ";".join(str(int(color[i : i + 2], 16)) for i in (0, 2, 4)) background = tag.startswith("@") lookup_tag = tag if background: lookup_tag = tag[1:] if lookup_tag in foreground.names: return Token( name=tag, ttype=(TokenType.BG_8BIT if background else TokenType.FG_8BIT), data=str(foreground.names[lookup_tag]), ) data_256 = RE_256.match(lookup_tag) if data_256 is not None: return Token( name=tag, ttype=(TokenType.BG_8BIT if background else TokenType.FG_8BIT), data=lookup_tag, ) data_hex = RE_HEX.match(lookup_tag) if data_hex is not None: return Token( name=tag, ttype=(TokenType.BG_24BIT if background else TokenType.FG_24BIT), data=_hex_to_rgb(lookup_tag[1:]), ) data_rgb = RE_RGB.match(lookup_tag) if data_rgb is not None: return Token( name=tag, ttype=(TokenType.BG_24BIT if background else TokenType.FG_24BIT), data=lookup_tag, ) return None def __enter__(self) -> Callable[..., None]: """Return a print method that parses markup""" def printer(*args, **kwargs) -> None: """Parse all arguments and pass them through to print, along with kwargs""" parsed = [] for arg in args: parsed.append(self.parse(str(arg))) print(*parsed, **kwargs) return printer def __exit__(self, _, exception: Exception, __) -> None: """Raise any exception that happened in context""" if exception is not None: raise exception def tokenize_markup(self, markup_text: str) -> Iterator[Token]: """Tokenize markup text, return an Iterator to save memory""" end = 0 start = 0 cursor = 0 for match in RE_MARKUP.finditer(markup_text): full, escapes, tag_text = match.groups() start, end = match.span() # Add plain text between last and current match if start > cursor: yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start]) if not escapes == "" and len(escapes) % 2 == 1: cursor = end yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) :]) continue for tag in tag_text.split(): if tag in self.unsetters: yield Token( name=tag, ttype=TokenType.UNSETTER, data=self.unsetters[tag] ) elif tag in self.user_tags: yield Token( name=tag, ttype=TokenType.STYLE, data=self.user_tags[tag] ) elif tag in self.tags: yield Token(name=tag, ttype=TokenType.STYLE, data=self.tags[tag]) # Try to find a color token else: color_token = self._get_color_token(tag) if color_token is not None: yield color_token continue macro_match = RE_MACRO.match(tag) if macro_match is not None: name, args = macro_match.groups() macro_args = () if args is None else args.split(":") if not name in self.macros: raise MarkupSyntaxError( tag=tag, cause="is not a defined macro", context=markup_text, ) yield Token( name=tag, ttype=TokenType.MACRO, data=(self.macros[name], macro_args), ) continue raise MarkupSyntaxError( tag=tag, cause="not defined", context=markup_text ) cursor = end # Add remaining text as plain if len(markup_text) > cursor: yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:]) def tokenize_ansi( # pylint: disable=too-many-branches self, ansi: str ) -> Iterator[Token]: """Tokenize ansi text, return an Iterator to save memory.""" end = 0 start = 0 cursor = 0 # StyledText messes with indexing, so we need to cast it # back to str. if isinstance(ansi, StyledText): ansi = str(ansi) for match in RE_ANSI.finditer(ansi): code = match.groups()[0] start, end = match.span() if code is None: continue parts = code.split(";") if start > cursor: plain = ansi[cursor:start] yield Token(name=plain, ttype=TokenType.PLAIN, data=plain) # Styles & unsetters if len(parts) == 1: token_code: str | None = "" for name, token_code in self.unsetters.items(): if token_code == parts[0]: ttype = TokenType.UNSETTER break else: for name, token_code in self.tags.items(): if token_code == parts[0]: ttype = TokenType.STYLE break else: raise AnsiSyntaxError( tag=parts[0], cause="not recognized", context=ansi ) yield Token(name=name, ttype=ttype, data=token_code) # Colors elif len(parts) >= 3: name = ";".join(parts[2:]) types = [TokenType.FG_8BIT, TokenType.FG_24BIT] if parts[0] == "48": name = "@" + name types = [TokenType.BG_8BIT, TokenType.BG_24BIT] ttype = types[0] if parts[1] == "5" else types[1] yield Token(ttype=ttype, data=name) cursor = end if cursor < len(ansi): plain = ansi[cursor:] yield Token(ttype=TokenType.PLAIN, data=plain) def define(self, name: str, method: MacroCallable) -> None: """Define a Macro tag that executes `method` The `!` prefix is added to the name if not there already.""" if not name.startswith("!"): name = "!" + name self.macros[name] = method self.unsetters["/" + name] = None def alias(self, name: str, value: str) -> None: """Alias a markup tag to stand for some value, generate unsetter for it""" def _get_unsetter(token: Token) -> str | None: """Get unsetter for a token""" if token.ttype is TokenType.PLAIN: return None if token.ttype is TokenType.UNSETTER: return self.unsetters[token.name] if token.ttype.name.startswith("FG"): return self.unsetters["/fg"] if token.ttype.name.startswith("BG"): return self.unsetters["/bg"] name = "/" + token.name if not name in self.unsetters: raise KeyError(f"Could not find unsetter for token {token}.") return self.unsetters[name] if name.startswith("!"): raise ValueError('Only macro tags can always start with "!".') setter = "" unsetter = "" # Try to link to existing tag if value in self.user_tags: self.unsetters["/" + name] = value self.user_tags[name] = value return for token in self.tokenize_markup("[" + value + "]"): if token.ttype is TokenType.PLAIN: continue assert token.sequence is not None setter += token.sequence t_unsetter = _get_unsetter(token) assert t_unsetter is not None unsetter += "\x1b[" + t_unsetter + "m" self.unsetters["/" + name] = unsetter.lstrip("\x1b[").rstrip("m") self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m") marked: list[str] = [] for item in self._cache: if name in item: marked.append(item) for item in marked: del self._cache[item] def parse(self, markup_text: str) -> StyledText: """Parse markup""" # TODO: Add more optimizations: # - keep track of currently-active tokens # - clean up widget dump applied_macros: list[tuple[str, MacroCall]] = [] previous_token: Token | None = None previous_sequence = "" sequence = "" out = "" def _apply_macros(text: str) -> str: """Apply current macros to text""" for _, (method, args) in applied_macros: text = method(*args, text) return text # TODO: Macros are only ran once with caching enabled if self.should_cache and markup_text in self._cache: return self._cache[markup_text] for token in self.tokenize_markup(markup_text): if sequence != "" and previous_token == token: continue if token.ttype == TokenType.UNSETTER and token.data == "0": out += "\033[0m" sequence = "" continue previous_token = token if token.ttype is TokenType.MACRO: assert isinstance(token.data, tuple) applied_macros.append((token.name, token.data)) continue if token.data is None and token.ttype is TokenType.UNSETTER: for call_str, data in applied_macros: macro_match = RE_MACRO.match(call_str) assert macro_match is not None macro_name = macro_match.groups()[0] if "/" + macro_name == token.name: applied_macros.remove((call_str, data)) continue if token.sequence is None: if previous_sequence == sequence: out += _apply_macros(token.name) continue previous_sequence = sequence out += sequence + _apply_macros(token.name) sequence = "" else: sequence += token.sequence if sequence + previous_sequence != "": out += "\x1b[0m" out = StyledText(out) self._cache[markup_text] = out return out def get_markup(self, ansi: str) -> str: """Get markup from ANSI text""" current_tags: list[str] = [] out = "" for token in self.tokenize_ansi(ansi): if token.ttype is TokenType.PLAIN: if len(current_tags) != 0: out += "[" + " ".join(current_tags) + "]" assert isinstance(token.data, str) out += token.data current_tags = [] continue current_tags.append(token.name) return out def prettify_ansi(self, text: str) -> str: """Returns a prettified (syntax-highlighted) ANSI str. This is useful to quickly "inspect" a given ANSI string. However, for most real uses `MarkupLanguage.prettify_markup` would be preferable, given an argument of `MarkupLanguage.get_markup(text)`, as it is much more verbose. Args: text: The ANSI-text to prettify. Returns: The prettified ANSI text. This text's styles remain valid, so copy-pasting the argument into a command (like printf) that can show styled text will work the same way. """ out = "" sequences = "" for token in self.tokenize_ansi(text): if token.ttype is TokenType.PLAIN: assert isinstance(token.data, str) out += token.data continue assert token.sequence is not None out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b") sequences += token.sequence out += sequences return out def prettify_markup(self, text: str) -> str: """Returns a prettified (syntax-highlighted) markup str. Args: text: The markup-text to prettify. Returns: Prettified markup. This markup, excluding its styles, remains valid markup. """ styles: dict[TokenType, str] = { TokenType.MACRO: "210", TokenType.ESCAPED: "210 bold", TokenType.UNSETTER: "strikethrough", } out = "" in_sequence = False current_styles: list[Token] = [] for token in self.tokenize_markup(text): if token.ttype is TokenType.PLAIN: in_sequence = False if len(out) > 0: out += "]" sequence = "" for style in current_styles: if style.sequence is None: continue sequence += style.sequence out += sequence + token.name + "\033[m" continue out += " " if in_sequence else "[" in_sequence = True if token.ttype is TokenType.UNSETTER: name = token.name[1:] current_styles.append(token) unsetter_style = styles[TokenType.UNSETTER] special_style = ( name + " " if name in self.tags or name in self.user_tags else "" ) out += self.parse(f"[{special_style}{unsetter_style}]{name}") continue if token.sequence is not None: current_styles.append(token) style_markup = styles.get(token.ttype) or token.name out += self.parse(f"[{style_markup}]{token.name}") if in_sequence: out += "]" return out def prettify(self, text: str, force_markup: bool = False) -> str: """Prettifies any string. If the string contains ANSI sequences and `force_markup` is False, the `prettify_ansi` method is used. Otherwise, `prettify_markup` does the job. Since the `prettify_markup` method fails cleanly (e.g. doesn't modify a string with no markup) this is a safe call to any string. Args: text: The string to prettify. force_markup: If set, when given an ANSI string, the `MarkupLanguage.get_markup` method is used to translate it into markup, which is then prettified using `prettify_markup`. """ if len(RE_ANSI.findall(text)) > 0: if not force_markup: return self.prettify_ansi(text) text = self.get_markup(text) return self.prettify_markup(text) def pprint( # pylint: disable=too-many-arguments self, item: Any, indent: int = 2, condensed: bool = False, force_markup: bool = False, return_only: bool = False, ) -> str | None: """Pretty-prints any object. Args: item: The object to pretty-print. indent: The number of spaces that should be used for indenting. Only applies when `condensed` is True. condensed: If not set each item of a container will occupy different lines. force_markup: When given an item of `str` type, containing ANSI sequences, its markup representation will be generated and displayed using `MarkupLanguage.get_markup`. See `MarkupLanguage.prettify` for more info. return_only: If set, nothing will be printed and the prettified string is returned instead. Returns: The prettified string if `return_only` is set, otherwise `None`, as the value has already been printed. """ type_styles = { int: "[pprint-int]{item}[/]", str: "[pprint-str]'{item}'[/]", None: "[pprint-none]{item}[/]", type: "[pprint-type]{item}[/]", } indent_str = indent * " " def _apply_style(value: Any) -> str: """Applies type-based style to the given value. This value can technically be of any type, and builtins have special styles defined for them. Returns: A styled-str representation of value. """ if isinstance(value, (dict, list, tuple, set)): return ( self.pprint( value, indent=indent, condensed=condensed, return_only=True ) or "" ) if isinstance(value, type): return type_styles[type].format(item=value.__name__) if isinstance(value, str): value = value.replace("[", r"\[") if type(value) in type_styles: return type_styles[type(value)].format(item=str(value)) if value is None: return type_styles[None].format(item=str(value)) return str(value) def _format_container_item(value: str) -> str: """Formats a container item.""" out = f"{value}," if condensed: out += " " if not condensed: out += "\n" return out def _format_container( container: dict | list | tuple | set, chars: tuple[str, str] ) -> str: """Formats a container-type instance. Args: container: The container to format. chars: The characters that signify the start & end of the container. Returns: A pretty representation of the given container. """ out = chars[0] if not condensed: out += "\n" if isinstance(container, dict): for key, value in item.items(): for line in _format_container_item( f"{_apply_style(key)}: {_apply_style(value)}" ).splitlines(): if condensed: out += line continue out += indent_str + line + "\n" else: for value in item: for line in _format_container_item( f"{_apply_style(value)}" ).splitlines(): if condensed: out += line continue out += indent_str + line + "\n" out = out.rstrip(", ") out += chars[1] return out buff = "" if isinstance(item, (dict, set, tuple, list)): chars = str(item)[0], str(item)[-1] buff = _format_container(item, chars) if return_only: return buff with self as mprint: mprint(buff) return None if isinstance(item, str): item = self.prettify(item, force_markup=force_markup) if item is not None: if return_only: return item print(item) return None return None def setup_displayhook( self, indent: int = 2, condensed: bool = False, force_markup: bool = False, ) -> None: """Sets up `sys.displayhook` to use `MarkupLanguage.pprint`. This can be used to pretty-print all REPL output. IPython is also supported. Usage is pretty simple: ```python3 >>> from pytermgui import markup >>> markup.setup_displayhook() >>> # Any function output will now be prettified ``` Args: indent: The amount of indentation used in printing container-types. Only applied when `condensed` is False. condensed: If set, all items in a container-type will be displayed in one line, similar to the default `repl`. force_markup: When given an ANSI-sequence containing str, its markup representation will be generated using `MarkupLanguage.get_markup`, and syntax highlighted using `MarkupLanguage.prettify_markup`. """ try: # Try to get IPython instance. This function is provided by the # IPython runtime, so if running outside of that context a NameError # is raised. ipython = get_ipython() # type: ignore except NameError: sys.displayhook = lambda value: self.pprint( value, force_markup=force_markup, condensed=condensed, indent=indent ) return ipython.display_formatter.formatters["text/plain"] = self.pprint def main() -> None: """Main method""" parser = ArgumentParser() markup_group = parser.add_argument_group("Markup->ANSI") markup_group.add_argument( "-p", "--parse", metavar=("TXT"), help="parse a markup text" ) markup_group.add_argument( "-e", "--escape", help="escape parsed markup", action="store_true" ) # markup_group.add_argument( # "-o", # "--optimize", # help="set optimization level for markup parsing", # action="count", # default=0, # ) markup_group.add_argument("--alias", action="append", help="alias src=dst") ansi_group = parser.add_argument_group("ANSI->Markup") ansi_group.add_argument( "-m", "--markup", metavar=("TXT"), help="get markup from ANSI text" ) ansi_group.add_argument( "-s", "--show-inverse", action="store_true", help="show result of parsing result markup", ) args = parser.parse_args() lang = MarkupLanguage() if args.markup: markup_text = lang.get_markup(args.markup) print(markup_text, end="") if args.show_inverse: print("->", lang.parse(markup_text)) else: print() if args.parse: if args.alias: for alias in args.alias: src, dest = alias.split("=") lang.alias(src, dest) parsed = lang.parse(args.parse) if args.escape: print(ascii(parsed)) else: print(parsed) return markup = MarkupLanguage() if __name__ == "__main__": main()
View Source
class MarkupLanguage: """A class representing an instance of a Markup Language. It holds data on default & custom tags and macros. It offers tokenizer methods for both `markup` and `ANSI` text, which can then be used to convert between the two formats. You can define macros using `MarkupLanguage.define`, and alias a set of tags using `MarkupLanguage.alias`. Parsing `markup` into `ANSI` text is done using the `parse()` method, where `optimizer_level` sets the amount of optimization that should be done on the result string. Getting `markup` from `ANSI` is done using the `get_markup()` method. Note that this method is "lossy": it does not preserve information about macros, and turns aliases into their underlying values. You can also use a `MarkupLanguage` instance as a context manager, which returns a callable with the signature of print that will parse every argument given to it, and pass through all **kwargs. ```python3 import pytermgui as ptg ptg.markup.alias("my-tag", "@152 72 bold") with ptg.markup as mprint: mprint("This is [my-tag]my-tag[/]!") ``` <p style="text-align: center"> <img src=https://raw.githubusercontent.com/bczsalba/pytermgui/master/assets/docs/parser.png style="width: 80%"> </p> """ def __init__(self, default_macros: bool = True) -> None: """Initialize object""" self.tags: dict[str, str] = STYLE_MAP.copy() self._cache: dict[str, StyledText] = {} self.macros: dict[str, MacroCallable] = {} self.user_tags: dict[str, str] = {} self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy() self.should_cache: bool = True if default_macros: self.define("!link", _macro_link) self.define("!align", _macro_align) self.define("!markup", self.get_markup) self.define("!shuffle", _macro_shuffle) self.define("!strip_bg", _macro_strip_bg) self.define("!strip_fg", _macro_strip_fg) self.define("!upper", lambda item: str(item.upper())) self.define("!lower", lambda item: str(item.lower())) self.define("!title", lambda item: str(item.title())) self.define("!capitalize", lambda item: str(item.capitalize())) self.define("!expand", lambda tag: _macro_expand(self, tag)) self.alias("pprint-int", "176") self.alias("pprint-str", "149 italic") self.alias("pprint-type", "222") self.alias("pprint-none", "210") @staticmethod def _get_color_token(tag: str) -> Token | None: """Try to get color token from a tag""" def _hex_to_rgb(color: str) -> str: """Get rgb color from hex""" return ";".join(str(int(color[i : i + 2], 16)) for i in (0, 2, 4)) background = tag.startswith("@") lookup_tag = tag if background: lookup_tag = tag[1:] if lookup_tag in foreground.names: return Token( name=tag, ttype=(TokenType.BG_8BIT if background else TokenType.FG_8BIT), data=str(foreground.names[lookup_tag]), ) data_256 = RE_256.match(lookup_tag) if data_256 is not None: return Token( name=tag, ttype=(TokenType.BG_8BIT if background else TokenType.FG_8BIT), data=lookup_tag, ) data_hex = RE_HEX.match(lookup_tag) if data_hex is not None: return Token( name=tag, ttype=(TokenType.BG_24BIT if background else TokenType.FG_24BIT), data=_hex_to_rgb(lookup_tag[1:]), ) data_rgb = RE_RGB.match(lookup_tag) if data_rgb is not None: return Token( name=tag, ttype=(TokenType.BG_24BIT if background else TokenType.FG_24BIT), data=lookup_tag, ) return None def __enter__(self) -> Callable[..., None]: """Return a print method that parses markup""" def printer(*args, **kwargs) -> None: """Parse all arguments and pass them through to print, along with kwargs""" parsed = [] for arg in args: parsed.append(self.parse(str(arg))) print(*parsed, **kwargs) return printer def __exit__(self, _, exception: Exception, __) -> None: """Raise any exception that happened in context""" if exception is not None: raise exception def tokenize_markup(self, markup_text: str) -> Iterator[Token]: """Tokenize markup text, return an Iterator to save memory""" end = 0 start = 0 cursor = 0 for match in RE_MARKUP.finditer(markup_text): full, escapes, tag_text = match.groups() start, end = match.span() # Add plain text between last and current match if start > cursor: yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start]) if not escapes == "" and len(escapes) % 2 == 1: cursor = end yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) :]) continue for tag in tag_text.split(): if tag in self.unsetters: yield Token( name=tag, ttype=TokenType.UNSETTER, data=self.unsetters[tag] ) elif tag in self.user_tags: yield Token( name=tag, ttype=TokenType.STYLE, data=self.user_tags[tag] ) elif tag in self.tags: yield Token(name=tag, ttype=TokenType.STYLE, data=self.tags[tag]) # Try to find a color token else: color_token = self._get_color_token(tag) if color_token is not None: yield color_token continue macro_match = RE_MACRO.match(tag) if macro_match is not None: name, args = macro_match.groups() macro_args = () if args is None else args.split(":") if not name in self.macros: raise MarkupSyntaxError( tag=tag, cause="is not a defined macro", context=markup_text, ) yield Token( name=tag, ttype=TokenType.MACRO, data=(self.macros[name], macro_args), ) continue raise MarkupSyntaxError( tag=tag, cause="not defined", context=markup_text ) cursor = end # Add remaining text as plain if len(markup_text) > cursor: yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:]) def tokenize_ansi( # pylint: disable=too-many-branches self, ansi: str ) -> Iterator[Token]: """Tokenize ansi text, return an Iterator to save memory.""" end = 0 start = 0 cursor = 0 # StyledText messes with indexing, so we need to cast it # back to str. if isinstance(ansi, StyledText): ansi = str(ansi) for match in RE_ANSI.finditer(ansi): code = match.groups()[0] start, end = match.span() if code is None: continue parts = code.split(";") if start > cursor: plain = ansi[cursor:start] yield Token(name=plain, ttype=TokenType.PLAIN, data=plain) # Styles & unsetters if len(parts) == 1: token_code: str | None = "" for name, token_code in self.unsetters.items(): if token_code == parts[0]: ttype = TokenType.UNSETTER break else: for name, token_code in self.tags.items(): if token_code == parts[0]: ttype = TokenType.STYLE break else: raise AnsiSyntaxError( tag=parts[0], cause="not recognized", context=ansi ) yield Token(name=name, ttype=ttype, data=token_code) # Colors elif len(parts) >= 3: name = ";".join(parts[2:]) types = [TokenType.FG_8BIT, TokenType.FG_24BIT] if parts[0] == "48": name = "@" + name types = [TokenType.BG_8BIT, TokenType.BG_24BIT] ttype = types[0] if parts[1] == "5" else types[1] yield Token(ttype=ttype, data=name) cursor = end if cursor < len(ansi): plain = ansi[cursor:] yield Token(ttype=TokenType.PLAIN, data=plain) def define(self, name: str, method: MacroCallable) -> None: """Define a Macro tag that executes `method` The `!` prefix is added to the name if not there already.""" if not name.startswith("!"): name = "!" + name self.macros[name] = method self.unsetters["/" + name] = None def alias(self, name: str, value: str) -> None: """Alias a markup tag to stand for some value, generate unsetter for it""" def _get_unsetter(token: Token) -> str | None: """Get unsetter for a token""" if token.ttype is TokenType.PLAIN: return None if token.ttype is TokenType.UNSETTER: return self.unsetters[token.name] if token.ttype.name.startswith("FG"): return self.unsetters["/fg"] if token.ttype.name.startswith("BG"): return self.unsetters["/bg"] name = "/" + token.name if not name in self.unsetters: raise KeyError(f"Could not find unsetter for token {token}.") return self.unsetters[name] if name.startswith("!"): raise ValueError('Only macro tags can always start with "!".') setter = "" unsetter = "" # Try to link to existing tag if value in self.user_tags: self.unsetters["/" + name] = value self.user_tags[name] = value return for token in self.tokenize_markup("[" + value + "]"): if token.ttype is TokenType.PLAIN: continue assert token.sequence is not None setter += token.sequence t_unsetter = _get_unsetter(token) assert t_unsetter is not None unsetter += "\x1b[" + t_unsetter + "m" self.unsetters["/" + name] = unsetter.lstrip("\x1b[").rstrip("m") self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m") marked: list[str] = [] for item in self._cache: if name in item: marked.append(item) for item in marked: del self._cache[item] def parse(self, markup_text: str) -> StyledText: """Parse markup""" # TODO: Add more optimizations: # - keep track of currently-active tokens # - clean up widget dump applied_macros: list[tuple[str, MacroCall]] = [] previous_token: Token | None = None previous_sequence = "" sequence = "" out = "" def _apply_macros(text: str) -> str: """Apply current macros to text""" for _, (method, args) in applied_macros: text = method(*args, text) return text # TODO: Macros are only ran once with caching enabled if self.should_cache and markup_text in self._cache: return self._cache[markup_text] for token in self.tokenize_markup(markup_text): if sequence != "" and previous_token == token: continue if token.ttype == TokenType.UNSETTER and token.data == "0": out += "\033[0m" sequence = "" continue previous_token = token if token.ttype is TokenType.MACRO: assert isinstance(token.data, tuple) applied_macros.append((token.name, token.data)) continue if token.data is None and token.ttype is TokenType.UNSETTER: for call_str, data in applied_macros: macro_match = RE_MACRO.match(call_str) assert macro_match is not None macro_name = macro_match.groups()[0] if "/" + macro_name == token.name: applied_macros.remove((call_str, data)) continue if token.sequence is None: if previous_sequence == sequence: out += _apply_macros(token.name) continue previous_sequence = sequence out += sequence + _apply_macros(token.name) sequence = "" else: sequence += token.sequence if sequence + previous_sequence != "": out += "\x1b[0m" out = StyledText(out) self._cache[markup_text] = out return out def get_markup(self, ansi: str) -> str: """Get markup from ANSI text""" current_tags: list[str] = [] out = "" for token in self.tokenize_ansi(ansi): if token.ttype is TokenType.PLAIN: if len(current_tags) != 0: out += "[" + " ".join(current_tags) + "]" assert isinstance(token.data, str) out += token.data current_tags = [] continue current_tags.append(token.name) return out def prettify_ansi(self, text: str) -> str: """Returns a prettified (syntax-highlighted) ANSI str. This is useful to quickly "inspect" a given ANSI string. However, for most real uses `MarkupLanguage.prettify_markup` would be preferable, given an argument of `MarkupLanguage.get_markup(text)`, as it is much more verbose. Args: text: The ANSI-text to prettify. Returns: The prettified ANSI text. This text's styles remain valid, so copy-pasting the argument into a command (like printf) that can show styled text will work the same way. """ out = "" sequences = "" for token in self.tokenize_ansi(text): if token.ttype is TokenType.PLAIN: assert isinstance(token.data, str) out += token.data continue assert token.sequence is not None out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b") sequences += token.sequence out += sequences return out def prettify_markup(self, text: str) -> str: """Returns a prettified (syntax-highlighted) markup str. Args: text: The markup-text to prettify. Returns: Prettified markup. This markup, excluding its styles, remains valid markup. """ styles: dict[TokenType, str] = { TokenType.MACRO: "210", TokenType.ESCAPED: "210 bold", TokenType.UNSETTER: "strikethrough", } out = "" in_sequence = False current_styles: list[Token] = [] for token in self.tokenize_markup(text): if token.ttype is TokenType.PLAIN: in_sequence = False if len(out) > 0: out += "]" sequence = "" for style in current_styles: if style.sequence is None: continue sequence += style.sequence out += sequence + token.name + "\033[m" continue out += " " if in_sequence else "[" in_sequence = True if token.ttype is TokenType.UNSETTER: name = token.name[1:] current_styles.append(token) unsetter_style = styles[TokenType.UNSETTER] special_style = ( name + " " if name in self.tags or name in self.user_tags else "" ) out += self.parse(f"[{special_style}{unsetter_style}]{name}") continue if token.sequence is not None: current_styles.append(token) style_markup = styles.get(token.ttype) or token.name out += self.parse(f"[{style_markup}]{token.name}") if in_sequence: out += "]" return out def prettify(self, text: str, force_markup: bool = False) -> str: """Prettifies any string. If the string contains ANSI sequences and `force_markup` is False, the `prettify_ansi` method is used. Otherwise, `prettify_markup` does the job. Since the `prettify_markup` method fails cleanly (e.g. doesn't modify a string with no markup) this is a safe call to any string. Args: text: The string to prettify. force_markup: If set, when given an ANSI string, the `MarkupLanguage.get_markup` method is used to translate it into markup, which is then prettified using `prettify_markup`. """ if len(RE_ANSI.findall(text)) > 0: if not force_markup: return self.prettify_ansi(text) text = self.get_markup(text) return self.prettify_markup(text) def pprint( # pylint: disable=too-many-arguments self, item: Any, indent: int = 2, condensed: bool = False, force_markup: bool = False, return_only: bool = False, ) -> str | None: """Pretty-prints any object. Args: item: The object to pretty-print. indent: The number of spaces that should be used for indenting. Only applies when `condensed` is True. condensed: If not set each item of a container will occupy different lines. force_markup: When given an item of `str` type, containing ANSI sequences, its markup representation will be generated and displayed using `MarkupLanguage.get_markup`. See `MarkupLanguage.prettify` for more info. return_only: If set, nothing will be printed and the prettified string is returned instead. Returns: The prettified string if `return_only` is set, otherwise `None`, as the value has already been printed. """ type_styles = { int: "[pprint-int]{item}[/]", str: "[pprint-str]'{item}'[/]", None: "[pprint-none]{item}[/]", type: "[pprint-type]{item}[/]", } indent_str = indent * " " def _apply_style(value: Any) -> str: """Applies type-based style to the given value. This value can technically be of any type, and builtins have special styles defined for them. Returns: A styled-str representation of value. """ if isinstance(value, (dict, list, tuple, set)): return ( self.pprint( value, indent=indent, condensed=condensed, return_only=True ) or "" ) if isinstance(value, type): return type_styles[type].format(item=value.__name__) if isinstance(value, str): value = value.replace("[", r"\[") if type(value) in type_styles: return type_styles[type(value)].format(item=str(value)) if value is None: return type_styles[None].format(item=str(value)) return str(value) def _format_container_item(value: str) -> str: """Formats a container item.""" out = f"{value}," if condensed: out += " " if not condensed: out += "\n" return out def _format_container( container: dict | list | tuple | set, chars: tuple[str, str] ) -> str: """Formats a container-type instance. Args: container: The container to format. chars: The characters that signify the start & end of the container. Returns: A pretty representation of the given container. """ out = chars[0] if not condensed: out += "\n" if isinstance(container, dict): for key, value in item.items(): for line in _format_container_item( f"{_apply_style(key)}: {_apply_style(value)}" ).splitlines(): if condensed: out += line continue out += indent_str + line + "\n" else: for value in item: for line in _format_container_item( f"{_apply_style(value)}" ).splitlines(): if condensed: out += line continue out += indent_str + line + "\n" out = out.rstrip(", ") out += chars[1] return out buff = "" if isinstance(item, (dict, set, tuple, list)): chars = str(item)[0], str(item)[-1] buff = _format_container(item, chars) if return_only: return buff with self as mprint: mprint(buff) return None if isinstance(item, str): item = self.prettify(item, force_markup=force_markup) if item is not None: if return_only: return item print(item) return None return None def setup_displayhook( self, indent: int = 2, condensed: bool = False, force_markup: bool = False, ) -> None: """Sets up `sys.displayhook` to use `MarkupLanguage.pprint`. This can be used to pretty-print all REPL output. IPython is also supported. Usage is pretty simple: ```python3 >>> from pytermgui import markup >>> markup.setup_displayhook() >>> # Any function output will now be prettified ``` Args: indent: The amount of indentation used in printing container-types. Only applied when `condensed` is False. condensed: If set, all items in a container-type will be displayed in one line, similar to the default `repl`. force_markup: When given an ANSI-sequence containing str, its markup representation will be generated using `MarkupLanguage.get_markup`, and syntax highlighted using `MarkupLanguage.prettify_markup`. """ try: # Try to get IPython instance. This function is provided by the # IPython runtime, so if running outside of that context a NameError # is raised. ipython = get_ipython() # type: ignore except NameError: sys.displayhook = lambda value: self.pprint( value, force_markup=force_markup, condensed=condensed, indent=indent ) return ipython.display_formatter.formatters["text/plain"] = self.pprint
A class representing an instance of a Markup Language.
It holds data on default & custom tags and macros.
It offers tokenizer methods for both markup
and ANSI
text,
which can then be used to convert between the two formats.
You can define macros using MarkupLanguage.define
, and alias
a set of tags using MarkupLanguage.alias
.
Parsing markup
into ANSI
text is done using the parse()
method,
where optimizer_level
sets the amount of optimization that should be
done on the result string.
Getting markup
from ANSI
is done using the get_markup()
method. Note
that this method is "lossy": it does not preserve information about macros,
and turns aliases into their underlying values.
You can also use a MarkupLanguage
instance as a context manager, which
returns a callable with the signature of print that will parse every argument
given to it, and pass through all **kwargs.
import pytermgui as ptg
ptg.markup.alias("my-tag", "@152 72 bold")
with ptg.markup as mprint:
mprint("This is [my-tag]my-tag[/]!")
View Source
def __init__(self, default_macros: bool = True) -> None: """Initialize object""" self.tags: dict[str, str] = STYLE_MAP.copy() self._cache: dict[str, StyledText] = {} self.macros: dict[str, MacroCallable] = {} self.user_tags: dict[str, str] = {} self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy() self.should_cache: bool = True if default_macros: self.define("!link", _macro_link) self.define("!align", _macro_align) self.define("!markup", self.get_markup) self.define("!shuffle", _macro_shuffle) self.define("!strip_bg", _macro_strip_bg) self.define("!strip_fg", _macro_strip_fg) self.define("!upper", lambda item: str(item.upper())) self.define("!lower", lambda item: str(item.lower())) self.define("!title", lambda item: str(item.title())) self.define("!capitalize", lambda item: str(item.capitalize())) self.define("!expand", lambda tag: _macro_expand(self, tag)) self.alias("pprint-int", "176") self.alias("pprint-str", "149 italic") self.alias("pprint-type", "222") self.alias("pprint-none", "210")
Initialize object
View Source
def tokenize_markup(self, markup_text: str) -> Iterator[Token]: """Tokenize markup text, return an Iterator to save memory""" end = 0 start = 0 cursor = 0 for match in RE_MARKUP.finditer(markup_text): full, escapes, tag_text = match.groups() start, end = match.span() # Add plain text between last and current match if start > cursor: yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start]) if not escapes == "" and len(escapes) % 2 == 1: cursor = end yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) :]) continue for tag in tag_text.split(): if tag in self.unsetters: yield Token( name=tag, ttype=TokenType.UNSETTER, data=self.unsetters[tag] ) elif tag in self.user_tags: yield Token( name=tag, ttype=TokenType.STYLE, data=self.user_tags[tag] ) elif tag in self.tags: yield Token(name=tag, ttype=TokenType.STYLE, data=self.tags[tag]) # Try to find a color token else: color_token = self._get_color_token(tag) if color_token is not None: yield color_token continue macro_match = RE_MACRO.match(tag) if macro_match is not None: name, args = macro_match.groups() macro_args = () if args is None else args.split(":") if not name in self.macros: raise MarkupSyntaxError( tag=tag, cause="is not a defined macro", context=markup_text, ) yield Token( name=tag, ttype=TokenType.MACRO, data=(self.macros[name], macro_args), ) continue raise MarkupSyntaxError( tag=tag, cause="not defined", context=markup_text ) cursor = end # Add remaining text as plain if len(markup_text) > cursor: yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:])
Tokenize markup text, return an Iterator to save memory
View Source
def tokenize_ansi( # pylint: disable=too-many-branches self, ansi: str ) -> Iterator[Token]: """Tokenize ansi text, return an Iterator to save memory.""" end = 0 start = 0 cursor = 0 # StyledText messes with indexing, so we need to cast it # back to str. if isinstance(ansi, StyledText): ansi = str(ansi) for match in RE_ANSI.finditer(ansi): code = match.groups()[0] start, end = match.span() if code is None: continue parts = code.split(";") if start > cursor: plain = ansi[cursor:start] yield Token(name=plain, ttype=TokenType.PLAIN, data=plain) # Styles & unsetters if len(parts) == 1: token_code: str | None = "" for name, token_code in self.unsetters.items(): if token_code == parts[0]: ttype = TokenType.UNSETTER break else: for name, token_code in self.tags.items(): if token_code == parts[0]: ttype = TokenType.STYLE break else: raise AnsiSyntaxError( tag=parts[0], cause="not recognized", context=ansi ) yield Token(name=name, ttype=ttype, data=token_code) # Colors elif len(parts) >= 3: name = ";".join(parts[2:]) types = [TokenType.FG_8BIT, TokenType.FG_24BIT] if parts[0] == "48": name = "@" + name types = [TokenType.BG_8BIT, TokenType.BG_24BIT] ttype = types[0] if parts[1] == "5" else types[1] yield Token(ttype=ttype, data=name) cursor = end if cursor < len(ansi): plain = ansi[cursor:] yield Token(ttype=TokenType.PLAIN, data=plain)
Tokenize ansi text, return an Iterator to save memory.
View Source
def define(self, name: str, method: MacroCallable) -> None: """Define a Macro tag that executes `method` The `!` prefix is added to the name if not there already.""" if not name.startswith("!"): name = "!" + name self.macros[name] = method self.unsetters["/" + name] = None
Define a Macro tag that executes method
The !
prefix is added to the name if not there already.
View Source
def alias(self, name: str, value: str) -> None: """Alias a markup tag to stand for some value, generate unsetter for it""" def _get_unsetter(token: Token) -> str | None: """Get unsetter for a token""" if token.ttype is TokenType.PLAIN: return None if token.ttype is TokenType.UNSETTER: return self.unsetters[token.name] if token.ttype.name.startswith("FG"): return self.unsetters["/fg"] if token.ttype.name.startswith("BG"): return self.unsetters["/bg"] name = "/" + token.name if not name in self.unsetters: raise KeyError(f"Could not find unsetter for token {token}.") return self.unsetters[name] if name.startswith("!"): raise ValueError('Only macro tags can always start with "!".') setter = "" unsetter = "" # Try to link to existing tag if value in self.user_tags: self.unsetters["/" + name] = value self.user_tags[name] = value return for token in self.tokenize_markup("[" + value + "]"): if token.ttype is TokenType.PLAIN: continue assert token.sequence is not None setter += token.sequence t_unsetter = _get_unsetter(token) assert t_unsetter is not None unsetter += "\x1b[" + t_unsetter + "m" self.unsetters["/" + name] = unsetter.lstrip("\x1b[").rstrip("m") self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m") marked: list[str] = [] for item in self._cache: if name in item: marked.append(item) for item in marked: del self._cache[item]
Alias a markup tag to stand for some value, generate unsetter for it
View Source
def parse(self, markup_text: str) -> StyledText: """Parse markup""" # TODO: Add more optimizations: # - keep track of currently-active tokens # - clean up widget dump applied_macros: list[tuple[str, MacroCall]] = [] previous_token: Token | None = None previous_sequence = "" sequence = "" out = "" def _apply_macros(text: str) -> str: """Apply current macros to text""" for _, (method, args) in applied_macros: text = method(*args, text) return text # TODO: Macros are only ran once with caching enabled if self.should_cache and markup_text in self._cache: return self._cache[markup_text] for token in self.tokenize_markup(markup_text): if sequence != "" and previous_token == token: continue if token.ttype == TokenType.UNSETTER and token.data == "0": out += "\033[0m" sequence = "" continue previous_token = token if token.ttype is TokenType.MACRO: assert isinstance(token.data, tuple) applied_macros.append((token.name, token.data)) continue if token.data is None and token.ttype is TokenType.UNSETTER: for call_str, data in applied_macros: macro_match = RE_MACRO.match(call_str) assert macro_match is not None macro_name = macro_match.groups()[0] if "/" + macro_name == token.name: applied_macros.remove((call_str, data)) continue if token.sequence is None: if previous_sequence == sequence: out += _apply_macros(token.name) continue previous_sequence = sequence out += sequence + _apply_macros(token.name) sequence = "" else: sequence += token.sequence if sequence + previous_sequence != "": out += "\x1b[0m" out = StyledText(out) self._cache[markup_text] = out return out
Parse markup
View Source
def get_markup(self, ansi: str) -> str: """Get markup from ANSI text""" current_tags: list[str] = [] out = "" for token in self.tokenize_ansi(ansi): if token.ttype is TokenType.PLAIN: if len(current_tags) != 0: out += "[" + " ".join(current_tags) + "]" assert isinstance(token.data, str) out += token.data current_tags = [] continue current_tags.append(token.name) return out
Get markup from ANSI text
View Source
def prettify_ansi(self, text: str) -> str: """Returns a prettified (syntax-highlighted) ANSI str. This is useful to quickly "inspect" a given ANSI string. However, for most real uses `MarkupLanguage.prettify_markup` would be preferable, given an argument of `MarkupLanguage.get_markup(text)`, as it is much more verbose. Args: text: The ANSI-text to prettify. Returns: The prettified ANSI text. This text's styles remain valid, so copy-pasting the argument into a command (like printf) that can show styled text will work the same way. """ out = "" sequences = "" for token in self.tokenize_ansi(text): if token.ttype is TokenType.PLAIN: assert isinstance(token.data, str) out += token.data continue assert token.sequence is not None out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b") sequences += token.sequence out += sequences return out
Returns a prettified (syntax-highlighted) ANSI str.
This is useful to quickly "inspect" a given ANSI string. However,
for most real uses MarkupLanguage.prettify_markup
would be
preferable, given an argument of MarkupLanguage.get_markup(text)
,
as it is much more verbose.
Args
- text: The ANSI-text to prettify.
Returns
The prettified ANSI text. This text's styles remain valid, so copy-pasting the argument into a command (like printf) that can show styled text will work the same way.
View Source
def prettify_markup(self, text: str) -> str: """Returns a prettified (syntax-highlighted) markup str. Args: text: The markup-text to prettify. Returns: Prettified markup. This markup, excluding its styles, remains valid markup. """ styles: dict[TokenType, str] = { TokenType.MACRO: "210", TokenType.ESCAPED: "210 bold", TokenType.UNSETTER: "strikethrough", } out = "" in_sequence = False current_styles: list[Token] = [] for token in self.tokenize_markup(text): if token.ttype is TokenType.PLAIN: in_sequence = False if len(out) > 0: out += "]" sequence = "" for style in current_styles: if style.sequence is None: continue sequence += style.sequence out += sequence + token.name + "\033[m" continue out += " " if in_sequence else "[" in_sequence = True if token.ttype is TokenType.UNSETTER: name = token.name[1:] current_styles.append(token) unsetter_style = styles[TokenType.UNSETTER] special_style = ( name + " " if name in self.tags or name in self.user_tags else "" ) out += self.parse(f"[{special_style}{unsetter_style}]{name}") continue if token.sequence is not None: current_styles.append(token) style_markup = styles.get(token.ttype) or token.name out += self.parse(f"[{style_markup}]{token.name}") if in_sequence: out += "]" return out
Returns a prettified (syntax-highlighted) markup str.
Args
- text: The markup-text to prettify.
Returns
Prettified markup. This markup, excluding its styles, remains valid markup.
View Source
def prettify(self, text: str, force_markup: bool = False) -> str: """Prettifies any string. If the string contains ANSI sequences and `force_markup` is False, the `prettify_ansi` method is used. Otherwise, `prettify_markup` does the job. Since the `prettify_markup` method fails cleanly (e.g. doesn't modify a string with no markup) this is a safe call to any string. Args: text: The string to prettify. force_markup: If set, when given an ANSI string, the `MarkupLanguage.get_markup` method is used to translate it into markup, which is then prettified using `prettify_markup`. """ if len(RE_ANSI.findall(text)) > 0: if not force_markup: return self.prettify_ansi(text) text = self.get_markup(text) return self.prettify_markup(text)
Prettifies any string.
If the string contains ANSI sequences and force_markup
is False,
the prettify_ansi
method is used. Otherwise, prettify_markup
does
the job.
Since the prettify_markup
method fails cleanly (e.g. doesn't modify
a string with no markup) this is a safe call to any string.
Args
- text: The string to prettify.
- force_markup: If set, when given an ANSI string, the
MarkupLanguage.get_markup
method is used to translate it into markup, which is then prettified usingprettify_markup
.
View Source
def pprint( # pylint: disable=too-many-arguments self, item: Any, indent: int = 2, condensed: bool = False, force_markup: bool = False, return_only: bool = False, ) -> str | None: """Pretty-prints any object. Args: item: The object to pretty-print. indent: The number of spaces that should be used for indenting. Only applies when `condensed` is True. condensed: If not set each item of a container will occupy different lines. force_markup: When given an item of `str` type, containing ANSI sequences, its markup representation will be generated and displayed using `MarkupLanguage.get_markup`. See `MarkupLanguage.prettify` for more info. return_only: If set, nothing will be printed and the prettified string is returned instead. Returns: The prettified string if `return_only` is set, otherwise `None`, as the value has already been printed. """ type_styles = { int: "[pprint-int]{item}[/]", str: "[pprint-str]'{item}'[/]", None: "[pprint-none]{item}[/]", type: "[pprint-type]{item}[/]", } indent_str = indent * " " def _apply_style(value: Any) -> str: """Applies type-based style to the given value. This value can technically be of any type, and builtins have special styles defined for them. Returns: A styled-str representation of value. """ if isinstance(value, (dict, list, tuple, set)): return ( self.pprint( value, indent=indent, condensed=condensed, return_only=True ) or "" ) if isinstance(value, type): return type_styles[type].format(item=value.__name__) if isinstance(value, str): value = value.replace("[", r"\[") if type(value) in type_styles: return type_styles[type(value)].format(item=str(value)) if value is None: return type_styles[None].format(item=str(value)) return str(value) def _format_container_item(value: str) -> str: """Formats a container item.""" out = f"{value}," if condensed: out += " " if not condensed: out += "\n" return out def _format_container( container: dict | list | tuple | set, chars: tuple[str, str] ) -> str: """Formats a container-type instance. Args: container: The container to format. chars: The characters that signify the start & end of the container. Returns: A pretty representation of the given container. """ out = chars[0] if not condensed: out += "\n" if isinstance(container, dict): for key, value in item.items(): for line in _format_container_item( f"{_apply_style(key)}: {_apply_style(value)}" ).splitlines(): if condensed: out += line continue out += indent_str + line + "\n" else: for value in item: for line in _format_container_item( f"{_apply_style(value)}" ).splitlines(): if condensed: out += line continue out += indent_str + line + "\n" out = out.rstrip(", ") out += chars[1] return out buff = "" if isinstance(item, (dict, set, tuple, list)): chars = str(item)[0], str(item)[-1] buff = _format_container(item, chars) if return_only: return buff with self as mprint: mprint(buff) return None if isinstance(item, str): item = self.prettify(item, force_markup=force_markup) if item is not None: if return_only: return item print(item) return None return None
Pretty-prints any object.
Args
- item: The object to pretty-print.
- indent: The number of spaces that should be used for indenting.
Only applies when
condensed
is True. - condensed: If not set each item of a container will occupy different lines.
- force_markup: When given an item of
str
type, containing ANSI sequences, its markup representation will be generated and displayed usingMarkupLanguage.get_markup
. SeeMarkupLanguage.prettify
for more info. - return_only: If set, nothing will be printed and the prettified string is returned instead.
Returns
The prettified string if
return_only
is set, otherwiseNone
, as the value has already been printed.
View Source
def setup_displayhook( self, indent: int = 2, condensed: bool = False, force_markup: bool = False, ) -> None: """Sets up `sys.displayhook` to use `MarkupLanguage.pprint`. This can be used to pretty-print all REPL output. IPython is also supported. Usage is pretty simple: ```python3 >>> from pytermgui import markup >>> markup.setup_displayhook() >>> # Any function output will now be prettified ``` Args: indent: The amount of indentation used in printing container-types. Only applied when `condensed` is False. condensed: If set, all items in a container-type will be displayed in one line, similar to the default `repl`. force_markup: When given an ANSI-sequence containing str, its markup representation will be generated using `MarkupLanguage.get_markup`, and syntax highlighted using `MarkupLanguage.prettify_markup`. """ try: # Try to get IPython instance. This function is provided by the # IPython runtime, so if running outside of that context a NameError # is raised. ipython = get_ipython() # type: ignore except NameError: sys.displayhook = lambda value: self.pprint( value, force_markup=force_markup, condensed=condensed, indent=indent ) return ipython.display_formatter.formatters["text/plain"] = self.pprint
Sets up sys.displayhook
to use MarkupLanguage.pprint
.
This can be used to pretty-print all REPL output. IPython is also supported.
Usage is pretty simple:
>>> from pytermgui import markup
>>> markup.setup_displayhook()
>>> # Any function output will now be prettified
Args
- indent: The amount of indentation used in printing container-types.
Only applied when
condensed
is False. - condensed: If set, all items in a container-type will be displayed in
one line, similar to the default
repl
. - force_markup: When given an ANSI-sequence containing str, its markup
representation will be generated using
MarkupLanguage.get_markup
, and syntax highlighted usingMarkupLanguage.prettify_markup
.
View Source
class StyledText(str): """A styled text object. The purpose of this class is to implement some things regular `str` breaks at when encountering ANSI sequences. Instances of this class are usually spat out by `MarkupLanguage.parse`, but may be manually constructed if the need arises. Everything works even if there is no ANSI tomfoolery going on. """ value: str """The underlying, ANSI-inclusive string value.""" plain: str """The string value with no ANSI sequences.""" tokens: list[Token] """The list of tokens that make up this string.""" def __new__(cls, value: str = ""): """Creates a StyledText, gets markup tags. Args: markup_language: The markup language instance this object uses. """ obj = super().__new__(cls, value) obj.value = value obj.tokens = list(markup.tokenize_ansi(value)) obj.plain = "" for token in obj.tokens: if token.ttype is not TokenType.PLAIN: continue assert isinstance(token.data, str) obj.plain += token.data return obj def plain_index(self, index: int | None) -> int | None: """Finds given index inside plain text.""" if index is None: return None styled_chars = 0 plain_chars = 0 negative_index = False tokens = self.tokens.copy() if index < 0: tokens.reverse() index = abs(index) negative_index = True for token in tokens: if token.data is None: continue if token.ttype is not TokenType.PLAIN: assert token.sequence is not None styled_chars += len(token.sequence) continue for _ in range(len(token.data)): if plain_chars == index: if negative_index: return -1 * (plain_chars + styled_chars) return plain_chars + styled_chars plain_chars += 1 return None def __len__(self) -> int: """Gets "real" length of object.""" return len(self.plain) def __getitem__(self, subscript: int | slice) -> str: """Gets an item, adjusted for non-plain text. Args: subscript: The integer or slice to find. Returns: The elements described by the subscript. Raises: IndexError: The given index is out of range. """ if isinstance(subscript, int): plain_index = self.plain_index(subscript) if plain_index is None: raise IndexError("StyledText index out of range") return self.value[plain_index] return self.value[ slice( self.plain_index(subscript.start), self.plain_index(subscript.stop), subscript.step, ) ]
A styled text object.
The purpose of this class is to implement some things regular str
breaks at when encountering ANSI sequences.
Instances of this class are usually spat out by MarkupLanguage.parse
,
but may be manually constructed if the need arises. Everything works even
if there is no ANSI tomfoolery going on.
View Source
def __new__(cls, value: str = ""): """Creates a StyledText, gets markup tags. Args: markup_language: The markup language instance this object uses. """ obj = super().__new__(cls, value) obj.value = value obj.tokens = list(markup.tokenize_ansi(value)) obj.plain = "" for token in obj.tokens: if token.ttype is not TokenType.PLAIN: continue assert isinstance(token.data, str) obj.plain += token.data return obj
Creates a StyledText, gets markup tags.
Args
- markup_language: The markup language instance this object uses.
The underlying, ANSI-inclusive string value.
The string value with no ANSI sequences.
The list of tokens that make up this string.
View Source
def plain_index(self, index: int | None) -> int | None: """Finds given index inside plain text.""" if index is None: return None styled_chars = 0 plain_chars = 0 negative_index = False tokens = self.tokens.copy() if index < 0: tokens.reverse() index = abs(index) negative_index = True for token in tokens: if token.data is None: continue if token.ttype is not TokenType.PLAIN: assert token.sequence is not None styled_chars += len(token.sequence) continue for _ in range(len(token.data)): if plain_chars == index: if negative_index: return -1 * (plain_chars + styled_chars) return plain_chars + styled_chars plain_chars += 1 return None
Finds given index inside plain text.
Inherited Members
- builtins.str
- encode
- replace
- split
- rsplit
- join
- capitalize
- casefold
- title
- center
- count
- expandtabs
- find
- partition
- index
- ljust
- lower
- lstrip
- rfind
- rindex
- rjust
- rstrip
- rpartition
- splitlines
- strip
- swapcase
- translate
- upper
- startswith
- endswith
- removeprefix
- removesuffix
- isascii
- islower
- isupper
- istitle
- isspace
- isdecimal
- isdigit
- isnumeric
- isalpha
- isalnum
- isidentifier
- isprintable
- zfill
- format
- format_map
- maketrans