"""
This module provides `TIM`, PyTermGUI's Terminal Inline Markup language. It is a simple,
performant and easy to read way to style, colorize & modify text.

Basic rundown
-------------

TIM is included with the purpose of making styling easier to read and manage.

Its syntax is based on square brackets, within which tags are strictly separated by one
space character. Tags can stand for colors (xterm-256, RGB or HEX, both background &
foreground), styles, unsetters and macros.

The 16 simple colors of the terminal exist as named tags that refer to their numerical
value.

Here is a simple example of the syntax, using the `pytermgui.pretty` submodule to
syntax-highlight it inside the REPL:

```python3
>>> from pytermgui import pretty
>>> '[141 @61 bold] Hello [!upper inverse] There '
```

<p align=center>
<img src="https://github.com/bczsalba/pytermgui/blob/master/assets/docs/parser/\
simple_example.png?raw=true" width=70%>
</p>


General syntax
--------------

Background colors are always denoted by a leading `@` character in front of the color
tag. Styles are just the name of the style, and macros have an exclamation mark in front
of them. Additionally, unsetters use a leading slash (`/`) for their syntax. Color
tokens have special unsetters: they use `/fg` to cancel foreground colors, and `/bg` to
do so with backgrounds.

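Under the hood, each style tag resolves to an SGR escape code (`bold` maps to code `1`,
its unsetter `/bold` to `22`; see `STYLE_MAP` and `UNSETTER_MAP` below), so a tag pair
simply wraps the text in the matching sequences. A rough, hand-written equivalent of
`[bold]hi[/bold]`:

```python
# SGR codes taken from STYLE_MAP / UNSETTER_MAP below.
bold, unbold = "\x1b[1m", "\x1b[22m"

styled = f"{bold}hi{unbold}"
styled  # '\x1b[1mhi\x1b[22m'
```
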
### Macros:

Macros are callables that take at least one positional argument: the plain text
enclosed by the tag group within which the given macro resides, passed as the final
argument. Additionally, macros can be given any number of positional arguments from
within markup, using the syntax:

```
[!macro(arg1:arg2:arg3)]Text that the macro applies to.[/!macro]plain text, no macro
```

This syntax gets parsed as follows:

```python3
macro("arg1", "arg2", "arg3", "Text that the macro applies to.")
```

`macro` here is whatever callable the name `!macro` was defined to stand for.

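The builtin `!align` macro, defined further down in this module, shows the shape of a
macro callable in practice; it is reproduced here (docstring trimmed) so the call can
be demonstrated directly:

```python
# Mirrors macro_align from this module: markup arguments first, content last.
def macro_align(width: str, alignment: str, content: str) -> str:
    aligner = "<" if alignment == "left" else (">" if alignment == "right" else "^")
    return f"{content:{aligner}{width}}"

# [!align(10:center)]hi would result in the call below:
macro_align("10", "center", "hi")  # '    hi    '
```
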
### Colors:

Colors can be of three general types: xterm-256, RGB and HEX.

`xterm-256` stands for one of the 256 xterm colors. You can use `ptg -c` to see all of
the available colors. Its syntax is just the 0-based index of the color, like `[141]`.

`RGB` colors are pretty self explanatory. Their syntax follows the format
`RED;GREEN;BLUE`, with each channel between 0 and 255, such as `[111;222;133]`.

`HEX` colors are basically just RGB with extra steps. Their syntax is `#RRGGBB`, such as
`[#FA72BF]`. This code then gets converted to a tuple of RGB colors under the hood, so
from then on RGB and HEX colors are treated the same, and emit the same tokens.

As mentioned above, all colors can be made to act on the background instead by
prepending the color tag with `@`, such as `@141`, `@111;222;133` or `@#FA72BF`. To
clear these effects, use `/fg` for foreground and `/bg` for background colors.

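The HEX-to-RGB conversion amounts to parsing each two-digit channel as a hexadecimal
number; a minimal sketch of the idea (not the helper `pytermgui.colors` actually uses):

```python
def hex_to_rgb(code: str) -> tuple[int, ...]:
    """Splits '#RRGGBB' into an (R, G, B) tuple of ints."""
    digits = code.lstrip("#")
    return tuple(int(digits[i : i + 2], 16) for i in range(0, 6, 2))

hex_to_rgb("#FA72BF")  # (250, 114, 191)
```
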
`MarkupLanguage` and instancing
-------------------------------

All markup behaviour is done by an instance of the `MarkupLanguage` class. This is done
partially for organization reasons, but also to allow a sort of sandboxing of custom
definitions and settings.

PyTermGUI provides the `tim` name as the global markup language instance. For historical
reasons, the same instance is available as `markup`. This should be used pretty much all
of the time, and custom instances should only ever come about when some
security-sensitive macro definitions are needed, as `markup` is used by every widget,
including user-input ones such as `InputField`.

For the rest of this page, `MarkupLanguage` will refer to whichever instance you are
using.

TL;DR: Use `tim` always, unless a security concern blocks you from doing so.

Caching
-------

By default, all markup parse results are cached and returned when the same input is
given. To disable this behaviour, set the `should_cache` field of your markup instance
(usually `tim`) to `False`.

Customization
-------------

There are a couple of ways to customize how markup is parsed. Custom tags can be created
by calling `MarkupLanguage.alias`. For defining custom macros, you can use
`MarkupLanguage.define`. For more information, see each method's documentation.
"""
# pylint: disable=too-many-lines

from __future__ import annotations

from random import shuffle
from contextlib import suppress
from dataclasses import dataclass
from argparse import ArgumentParser
from enum import Enum, auto as _auto
from typing import Iterator, Callable, Tuple, List

from .terminal import get_terminal
from .colors import str_to_color, Color, StandardColor
from .regex import RE_ANSI, RE_MARKUP, RE_MACRO, RE_LINK
from .exceptions import MarkupSyntaxError, ColorSyntaxError, AnsiSyntaxError


__all__ = [
    "StyledText",
    "MacroCallable",
    "MacroCall",
    "MarkupLanguage",
    "markup",
    "tim",
]

MacroCallable = Callable[..., str]
MacroCall = Tuple[MacroCallable, List[str]]

STYLE_MAP = {
    "bold": "1",
    "dim": "2",
    "italic": "3",
    "underline": "4",
    "blink": "5",
    "blink2": "6",
    "inverse": "7",
    "invisible": "8",
    "strikethrough": "9",
    "overline": "53",
}

UNSETTER_MAP: dict[str, str | None] = {
    "/": "0",
    "/bold": "22",
    "/dim": "22",
    "/italic": "23",
    "/underline": "24",
    "/blink": "25",
    "/blink2": "26",
    "/inverse": "27",
    "/invisible": "28",
    "/strikethrough": "29",
    "/fg": "39",
    "/bg": "49",
    "/overline": "54",
}


def macro_align(width: str, alignment: str, content: str) -> str:
    """Aligns given text using fstrings.

    Args:
        width: The width to align to.
        alignment: One of "left", "center", "right".
        content: The content to align; implicit argument.
    """

    aligner = "<" if alignment == "left" else (">" if alignment == "right" else "^")
    return f"{content:{aligner}{width}}"


def macro_expand(lang: MarkupLanguage, tag: str) -> str:
    """Expands a tag alias."""

    if tag not in lang.user_tags:
        return tag

    return lang.get_markup(f"\x1b[{lang.user_tags[tag]}m ")[:-1]


def macro_strip_fg(item: str) -> str:
    """Strips foreground color from item"""

    return markup.parse(f"[/fg]{item}")


def macro_strip_bg(item: str) -> str:
    """Strips background color from item"""

    return markup.parse(f"[/bg]{item}")


def macro_shuffle(item: str) -> str:
    """Shuffles a string using random.shuffle on its list cast."""

    shuffled = list(item)
    shuffle(shuffled)

    return "".join(shuffled)


def macro_link(*args) -> str:
    """Creates a clickable hyperlink.

    Note:
        Since this is a pretty new feature for terminals, its support is limited.
    """

    *uri_parts, label = args
    uri = ":".join(uri_parts)

    return f"\x1b]8;;{uri}\x1b\\{label}\x1b]8;;\x1b\\"


def _apply_colors(colors: list[str] | list[int], item: str) -> str:
    """Applies the given list of colors to the item, spread out evenly."""

    blocksize = max(round(len(item) / len(colors)), 1)

    out = ""
    current_block = 0
    for i, char in enumerate(item):
        if i % blocksize == 0 and current_block < len(colors):
            out += f"[{colors[current_block]}]"
            current_block += 1

        out += char

    return markup.parse(out)


def macro_rainbow(item: str) -> str:
    """Creates rainbow-colored text."""

    colors = ["red", "208", "yellow", "green", "brightblue", "blue", "93"]

    return _apply_colors(colors, item)


def macro_gradient(base_str: str, item: str) -> str:
    """Creates an xterm-256 gradient from a base color.

    This exploits the way the colors are arranged in the xterm color table; every
    36th color is the next item of a single gradient.

    The start of the given gradient is calculated by decreasing the given base by 36 on
    every iteration, as long as the point is a valid gradient start.

    After that, the 6 colors of this gradient are calculated and applied.
    """

    if not base_str.isdigit():
        raise ValueError(f"Gradient base has to be an integer, got {base_str!r}.")

    base = int(base_str)
    if base < 16 or base > 231:
        raise ValueError("Gradient base must be between 16 and 231.")

    while base > 52:
        base -= 36

    colors = []
    for i in range(6):
        colors.append(base + 36 * i)

    return _apply_colors(colors, item)


class TokenType(Enum):
    """An Enum to store various token types."""

    LINK = _auto()
    """A terminal hyperlink."""

    PLAIN = _auto()
    """Plain text, nothing interesting."""

    COLOR = _auto()
    """A color token. Has a `pytermgui.colors.Color` instance as its data."""

    STYLE = _auto()
    """A builtin terminal style, such as `bold` or `italic`."""

    MACRO = _auto()
    """A PTG markup macro. The macro itself is stored inside `self.data`."""

    ESCAPED = _auto()
    """An escaped token."""

    UNSETTER = _auto()
    """A token that unsets some other attribute."""

    POSITION = _auto()
    """A token representing a positioning string. `self.data` follows the format `x,y`."""


@dataclass
class Token:
    """A class holding information on a singular markup or ANSI style unit."""

    ttype: TokenType
    """The type of this token."""

    data: str | MacroCall | Color | None
    """The data contained within this token. This changes based on the `ttype` attr."""

    name: str = "<unnamed-token>"
    """An optional display name of the token. Defaults to `data` when not given."""

    def __post_init__(self) -> None:
        """Sets `name` to `data` if not provided."""

        if self.name == "<unnamed-token>":
            if isinstance(self.data, str):
                self.name = self.data

            elif isinstance(self.data, Color):
                self.name = self.data.name

            else:
                raise TypeError(
                    f"Cannot derive a name from token data {self.data!r}."
                )

        # Create LINK from a plain token
        if self.ttype is TokenType.PLAIN:
            assert isinstance(self.data, str)

            link_match = RE_LINK.match(self.data)

            if link_match is not None:
                self.data, self.name = link_match.groups()
                self.ttype = TokenType.LINK

        if self.ttype is TokenType.ESCAPED:
            assert isinstance(self.data, str)

            self.name = self.data[1:]

    def __eq__(self, other: object) -> bool:
        """Checks equality with `other`."""

        if other is None:
            return False

        if not isinstance(other, type(self)):
            return False

        return other.data == self.data and other.ttype is self.ttype

    @property
    def sequence(self) -> str | None:
        """Returns the ANSI sequence this token represents."""

        if self.data is None:
            return None

        if self.ttype in [TokenType.PLAIN, TokenType.MACRO, TokenType.ESCAPED]:
            return None

        if self.ttype is TokenType.LINK:
            return macro_link(self.data, self.name)

        if self.ttype is TokenType.POSITION:
            assert isinstance(self.data, str)
            position = self.data.split(",")
            return f"\x1b[{position[1]};{position[0]}H"

        # Colors and styles
        data = self.data

        if self.ttype in [TokenType.STYLE, TokenType.UNSETTER]:
            return f"\033[{data}m"

        assert isinstance(data, Color)
        return data.sequence


class StyledText(str):
    """A styled text object.

    The purpose of this class is to implement some things regular `str`
    breaks at when encountering ANSI sequences.

    Instances of this class are usually spat out by `MarkupLanguage.parse`,
    but may be manually constructed if the need arises. Everything works even
    if there is no ANSI tomfoolery going on.
    """

    value: str
    """The underlying, ANSI-inclusive string value."""

    _plain: str | None = None
    _tokens: list[Token] | None = None

    def __new__(cls, value: str = ""):
        """Creates a StyledText, gets markup tags."""

        obj = super().__new__(cls, value)
        obj.value = value

        return obj

    def _generate_tokens(self) -> None:
        """Generates self._tokens & self._plain."""

        self._tokens = list(tim.tokenize_ansi(self.value))

        self._plain = ""
        for token in self._tokens:
            if token.ttype is not TokenType.PLAIN:
                continue

            assert isinstance(token.data, str)
            self._plain += token.data

    @property
    def tokens(self) -> list[Token]:
        """Returns all markup tokens of this object.

        Generated on-demand, at the first call to this or the self.plain
        property.
        """

        if self._tokens is not None:
            return self._tokens

        self._generate_tokens()
        assert self._tokens is not None
        return self._tokens

    @property
    def plain(self) -> str:
        """Returns the value of this object, with no ANSI sequences.

        Generated on-demand, at the first call to this or the self.tokens
        property.
        """

        if self._plain is not None:
            return self._plain

        self._generate_tokens()
        assert self._plain is not None
        return self._plain

    def plain_index(self, index: int | None) -> int | None:
        """Finds given index inside plain text."""

        if index is None:
            return None

        styled_chars = 0
        plain_chars = 0
        negative_index = False

        tokens = self.tokens.copy()
        if index < 0:
            tokens.reverse()
            index = abs(index)
            negative_index = True

        for token in tokens:
            if token.data is None:
                continue

            if token.ttype is not TokenType.PLAIN:
                assert token.sequence is not None
                styled_chars += len(token.sequence)
                continue

            assert isinstance(token.data, str)
            for _ in range(len(token.data)):
                if plain_chars == index:
                    if negative_index:
                        return -1 * (plain_chars + styled_chars)

                    return styled_chars + plain_chars

                plain_chars += 1

        return None

    def __len__(self) -> int:
        """Gets "real" length of object."""

        return len(self.plain)

    def __getitem__(self, subscript: int | slice) -> str:
        """Gets an item, adjusted for non-plain text.

        Args:
            subscript: The integer or slice to find.

        Returns:
            The elements described by the subscript.

        Raises:
            IndexError: The given index is out of range.
        """

        if isinstance(subscript, int):
            plain_index = self.plain_index(subscript)
            if plain_index is None:
                raise IndexError("StyledText index out of range")

            return self.value[plain_index]

        return self.value[
            slice(
                self.plain_index(subscript.start),
                self.plain_index(subscript.stop),
                subscript.step,
            )
        ]


class MarkupLanguage:
    """A class representing an instance of a Markup Language.

    This class is used for all markup/ANSI parsing, tokenizing and usage.

    ```python3
    from pytermgui import tim

    tim.alias("my-tag", "@152 72 bold")
    tim.print("This is [my-tag]my-tag[/]!")
    ```

    <p style="text-align: center">
        <img src="https://raw.githubusercontent.com/bczsalba/pytermgui/master/assets/\
docs/parser/markup_language.png"
        style="width: 80%">
    </p>
    """

    raise_unknown_markup: bool = False
    """Raise `pytermgui.exceptions.MarkupSyntaxError` when encountering unknown markup tags."""

    def __init__(self, default_macros: bool = True) -> None:
        """Initializes a MarkupLanguage.

        Args:
            default_macros: If False, the builtin macros are not defined.
        """

        self.tags: dict[str, str] = STYLE_MAP.copy()
        self._cache: dict[str, StyledText] = {}
        self.macros: dict[str, MacroCallable] = {}
        self.user_tags: dict[str, str] = {}
        self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy()

        self.should_cache: bool = True

        if default_macros:
            self.define("!link", macro_link)
            self.define("!align", macro_align)
            self.define("!markup", self.get_markup)
            self.define("!shuffle", macro_shuffle)
            self.define("!strip_bg", macro_strip_bg)
            self.define("!strip_fg", macro_strip_fg)
            self.define("!rainbow", macro_rainbow)
            self.define("!gradient", macro_gradient)
            self.define("!upper", lambda item: str(item.upper()))
            self.define("!lower", lambda item: str(item.lower()))
            self.define("!title", lambda item: str(item.title()))
            self.define("!capitalize", lambda item: str(item.capitalize()))
            self.define("!expand", lambda tag: macro_expand(self, tag))
            self.define("!debug", lambda *args: ",".join(ascii(arg) for arg in args))

        self.alias("code", "dim @black")
        self.alias("code.str", "142")
        self.alias("code.multiline_str", "code.str")
        self.alias("code.none", "167")
        self.alias("code.global", "214")
        self.alias("code.number", "175")
        self.alias("code.keyword", "203")
        self.alias("code.identifier", "109")
        self.alias("code.name", "code.global")
        self.alias("code.comment", "240 italic")
        self.alias("code.builtin", "code.global")
        self.alias("code.file", "code.identifier")
        self.alias("code.symbol", "code.identifier")

    def _get_color_token(self, tag: str) -> Token | None:
        """Tries to get a color token from the given tag.

        Args:
            tag: The tag to parse.

        Returns:
            A color token if the given tag could be parsed into one, else None.
        """

        try:
            color = str_to_color(tag, use_cache=self.should_cache)

        except ColorSyntaxError:
            return None

        return Token(name=color.value, ttype=TokenType.COLOR, data=color)

    def _get_style_token(self, tag: str) -> Token | None:
        """Tries to get a style (including unsetter) token from tags, user tags and unsetters.

        Args:
            tag: The tag to parse.

        Returns:
            A `Token` if one could be created, None otherwise.
        """

        if tag in self.unsetters:
            return Token(name=tag, ttype=TokenType.UNSETTER, data=self.unsetters[tag])

        if tag in self.user_tags:
            return Token(name=tag, ttype=TokenType.STYLE, data=self.user_tags[tag])

        if tag in self.tags:
            return Token(name=tag, ttype=TokenType.STYLE, data=self.tags[tag])

        return None

    def print(self, *args, **kwargs) -> None:
        """Parses all arguments and passes them through to print, along with kwargs."""

        parsed = []
        for arg in args:
            parsed.append(self.parse(str(arg)))

        get_terminal().print(*parsed, **kwargs)

    def tokenize_markup(self, markup_text: str) -> Iterator[Token]:
        """Converts the given markup string into an iterator of `Token`.

        Args:
            markup_text: The text to look at.

        Returns:
            An iterator of tokens. The reason this is an iterator is to possibly save
            on memory.
        """

        end = 0
        start = 0
        cursor = 0
        for match in RE_MARKUP.finditer(markup_text):
            full, escapes, tag_text = match.groups()
            start, end = match.span()

            # Add plain text between the last and current match
            if start > cursor:
                yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start])

            if escapes != "" and len(escapes) % 2 == 1:
                cursor = end
                yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) - 1 :])
                continue

            for tag in tag_text.split():
                token = self._get_style_token(tag)
                if token is not None:
                    yield token
                    continue

                # Try to find a color token
                token = self._get_color_token(tag)
                if token is not None:
                    yield token
                    continue

                macro_match = RE_MACRO.match(tag)
                if macro_match is not None:
                    name, args = macro_match.groups()
                    macro_args = () if args is None else args.split(":")

                    if name not in self.macros:
                        raise MarkupSyntaxError(
                            tag=tag,
                            cause="is not a defined macro",
                            context=markup_text,
                        )

                    yield Token(
                        name=tag,
                        ttype=TokenType.MACRO,
                        data=(self.macros[name], macro_args),
                    )
                    continue

                if self.raise_unknown_markup:
                    raise MarkupSyntaxError(
                        tag=tag, cause="not defined", context=markup_text
                    )

            cursor = end

        # Add the remaining text as plain
        if len(markup_text) > cursor:
            yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:])

    def tokenize_ansi(self, ansi: str) -> Iterator[Token]:
        """Converts the given ANSI string into an iterator of `Token`.

        Args:
            ansi: The text to look at.

        Returns:
            An iterator of tokens. The reason this is an iterator is to possibly save
            on memory.
        """

        def _is_in_tags(code: str, tags: dict[str, str]) -> str | None:
            """Determines whether a code is in the given dict of tags."""

            for name, current in tags.items():
                if current == code:
                    return name

            return None

        def _generate_color(
            parts: list[str], code: str
        ) -> tuple[str, TokenType, Color]:
            """Generates a color token."""

            data: Color
            if len(parts) == 1:
                data = StandardColor.from_ansi(code)

            else:
                data = str_to_color(code)

            return data.name, TokenType.COLOR, data

        end = 0
        start = 0
        cursor = 0

        # StyledText messes with indexing, so we need to cast it
        # back to str.
        if isinstance(ansi, StyledText):
            ansi = str(ansi)

        for match in RE_ANSI.finditer(ansi):
            code = match.groups()[0]
            start, end = match.span()

            if code is None:
                continue

            parts = code.split(";")

            if start > cursor:
                plain = ansi[cursor:start]

                yield Token(name=plain, ttype=TokenType.PLAIN, data=plain)

            name: str | None = code
            ttype = None
            data: str | Color = parts[0]

            # Styles & Unsetters
            if len(parts) == 1:
                # Covariance is not an issue here, even though mypy seems to think so.
                name = _is_in_tags(parts[0], self.unsetters)  # type: ignore
                if name is not None:
                    ttype = TokenType.UNSETTER

                else:
                    name = _is_in_tags(parts[0], self.tags)
                    if name is not None:
                        ttype = TokenType.STYLE

            # Colors
            if ttype is None:
                with suppress(ColorSyntaxError):
                    name, ttype, data = _generate_color(parts, code)

            if name is None or ttype is None or data is None:
                if len(parts) != 2:
                    raise AnsiSyntaxError(
                        tag=parts[0], cause="not recognized", context=ansi
                    )

                name = "position"
                ttype = TokenType.POSITION
                data = ",".join(reversed(parts))

            yield Token(name=name, ttype=ttype, data=data)
            cursor = end

        if cursor < len(ansi):
            plain = ansi[cursor:]

            yield Token(ttype=TokenType.PLAIN, data=plain)

 812    def define(self, name: str, method: MacroCallable) -> None:
 813        """Defines a Macro tag that executes the given method.
 814
 815        Args:
 816            name: The name the given method will be reachable by within markup.
 817                The given value gets "!" prepended if it isn't present already.
 818            method: The method this macro will execute.
 819        """
 820
 821        if not name.startswith("!"):
 822            name = f"!{name}"
 823
 824        self.macros[name] = method
 825        self.unsetters[f"/{name}"] = None
 826
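As a stdlib-only sketch (independent of this class), the registry that `define` maintains can be modeled with plain dicts; the `shout` macro below is a made-up example, not part of pytermgui:

```python3
# Sketch of the registry define() maintains: "!" gets prepended to the
# name if missing, and an unsetter slot is reserved under "/<name>".
macros = {}
unsetters = {}

def define(name, method):
    if not name.startswith("!"):
        name = f"!{name}"
    macros[name] = method
    unsetters[f"/{name}"] = None  # macro unsetters store no data

def shout(*args):
    # In this module, macros receive the enclosed plain text as
    # their last positional argument.
    *_, text = args
    return text.upper()

define("shout", shout)
```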
 827    def alias(self, name: str, value: str) -> None:
 828        """Aliases the given name to a value, and generates an unsetter for it.
 829
 830        Note that it is not possible to alias macros.
 831
 832        Args:
 833            name: The name of the new tag.
 834            value: The value the new tag will stand for.
 835        """
 836
 837        def _get_unsetter(token: Token) -> str | None:
 838            """Get unsetter for a token"""
 839
 840            if token.ttype is TokenType.PLAIN:
 841                return None
 842
 843            if token.ttype is TokenType.UNSETTER:
 844                return self.unsetters[token.name]
 845
 846            if token.ttype is TokenType.COLOR:
 847                assert isinstance(token.data, Color)
 848
 849                if token.data.background:
 850                    return self.unsetters["/bg"]
 851
 852                return self.unsetters["/fg"]
 853
 854            name = f"/{token.name}"
 855            if name not in self.unsetters:
 856                raise KeyError(f"Could not find unsetter for token {token}.")
 857
 858            return self.unsetters[name]
 859
 860        if name.startswith("!"):
 861            raise ValueError('Only macro tags may start with "!".')
 862
 863        setter = ""
 864        unsetter = ""
 865
 866        # Try to link to existing tag
 867        if value in self.user_tags:
 868            self.unsetters[f"/{name}"] = self.unsetters[f"/{value}"]
 869            self.user_tags[name] = self.user_tags[value]
 870            return
 871
 872        for token in self.tokenize_markup(f"[{value}]"):
 873            if token.ttype is TokenType.PLAIN:
 874                continue
 875
 876            assert token.sequence is not None
 877            setter += token.sequence
 878
 879            t_unsetter = _get_unsetter(token)
 880            unsetter += f"\x1b[{t_unsetter}m"
 881
 882        self.unsetters[f"/{name}"] = unsetter.lstrip("\x1b[").rstrip("m")
 883        self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m")
 884
 885        marked: list[str] = []
 886        for item in self._cache:
 887            if name in item:
 888                marked.append(item)
 889
 890        for item in marked:
 891            del self._cache[item]
 892
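The "link to existing tag" fast path above can be sketched with plain dicts; the tag names and sequence data below are purely illustrative:

```python3
# Sketch of alias()'s fast path: aliasing to an already-defined user
# tag just copies its stored setter and unsetter data, with no new
# sequence generation. The data strings here are illustrative.
user_tags = {"title": "1;38;5;141"}
unsetters = {"/title": "22;39"}

def link_alias(name, value):
    if value in user_tags:
        unsetters[f"/{name}"] = unsetters[f"/{value}"]
        user_tags[name] = user_tags[value]
        return True
    return False  # a fresh setter/unsetter would be generated instead
```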
 893    # TODO: This currently has one branch more than pylint allows, and it is
 894    #       not obvious how to cut it down; revisit in the future.
 895    def parse(  # pylint: disable=too-many-branches
 896        self, markup_text: str
 897    ) -> StyledText:
 898        """Parses the given markup.
 899
 900        Args:
 901            markup_text: The markup to parse.
 902
 903        Returns:
 904            A `StyledText` instance of the result of parsing the input. This
 905            custom `str` class is used to allow accessing the plain value of
 906            the output, as well as to cleanly index within it. It behaves
 907            like the builtin `str`, only adding extra features on top.
 908        """
 909
 910        applied_macros: list[tuple[str, MacroCall]] = []
 911        previous_token: Token | None = None
 912        previous_sequence = ""
 913        sequence = ""
 914        out = ""
 915
 916        def _apply_macros(text: str) -> str:
 917            """Apply current macros to text"""
 918
 919            for _, (method, args) in applied_macros:
 920                text = method(*args, text)
 921
 922            return text
 923
 924        def _is_same_colorgroup(previous: Token, new: Token) -> bool:
 925            if not isinstance(new.data, Color) or not isinstance(previous.data, Color):
 926                return False
 927
 928            return (
 929                type(previous) is type(new)
 930                and previous.data.background == new.data.background
 931            )
 932
 933        if (
 934            self.should_cache
 935            and markup_text in self._cache
 936            and len(RE_MACRO.findall(markup_text)) == 0
 937        ):
 938            return self._cache[markup_text]
 939
 940        token: Token
 941        for token in self.tokenize_markup(markup_text):
 942            if sequence != "" and previous_token == token:
 943                continue
 944
 945            # Optimize out previously added color tokens, as only the most
 946            # recent would be visible anyways.
 947            if (
 948                token.sequence is not None
 949                and previous_token is not None
 950                and _is_same_colorgroup(previous_token, token)
 951            ):
 952                sequence = token.sequence
 953                continue
 954
 955            if token.ttype == TokenType.UNSETTER and token.data == "0":
 956                out += "\033[0m"
 957                sequence = ""
 958                applied_macros = []
 959                continue
 960
 961            previous_token = token
 962
 963            # Macro unsetters are stored with None as their data
 964            if token.data is None and token.ttype is TokenType.UNSETTER:
 965                for item, data in applied_macros.copy():
 966                    macro_match = RE_MACRO.match(item)
 967                    assert macro_match is not None
 968
 969                    macro_name = macro_match.groups()[0]
 970
 971                    if f"/{macro_name}" == token.name:
 972                        applied_macros.remove((item, data))
 973
 974                continue
 975
 976            if token.ttype is TokenType.MACRO:
 977                assert isinstance(token.data, tuple)
 978
 979                applied_macros.append((token.name, token.data))
 980                continue
 981
 982            if token.sequence is None:
 983                applied = sequence
 984
 985                if not out.endswith("\x1b[0m"):
 986                    for item in previous_sequence.split("\x1b"):
 987                        if item == "" or item[1:-1] in self.unsetters.values():
 988                            continue
 989
 990                        item = f"\x1b{item}"
 991                        applied = applied.replace(item, "")
 992
 993                out += applied + _apply_macros(token.name)
 994                previous_sequence = sequence
 995                sequence = ""
 996                continue
 997
 998            sequence += token.sequence
 999
1000        if sequence + previous_sequence != "":
1001            out += "\x1b[0m"
1002
1003        out = StyledText(out)
1004        self._cache[markup_text] = out
1005        return out
1006
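The cache guard at the top of `parse` can be sketched on its own; the pattern below is a simplified stand-in for the module's `RE_MACRO`, good enough to detect macro tags:

```python3
import re

# Simplified stand-in for RE_MACRO, only used to detect macro tags.
MACRO = re.compile(r"!\w+")

def should_use_cache(markup_text, cache, should_cache=True):
    # parse() only reuses a cached result when caching is enabled, the
    # text was parsed before, and it contains no macros -- macros such
    # as !shuffle may yield different output on every call.
    return (
        should_cache
        and markup_text in cache
        and MACRO.search(markup_text) is None
    )
```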
1007    def get_markup(self, ansi: str) -> str:
1008        """Generates markup from ANSI text.
1009
1010        Args:
1011            ansi: The text to get markup from.
1012
1013        Returns:
1014            A markup string that can be parsed to get (visually) the same
1015            result. Note that this conversion is lossy in a way: there are some
1016            details (like macros) that cannot be preserved in an ANSI->Markup->ANSI
1017            conversion.
1018        """
1019
1020        current_tags: list[str] = []
1021        out = ""
1022        for token in self.tokenize_ansi(ansi):
1023            if token.ttype is TokenType.PLAIN:
1024                if len(current_tags) != 0:
1025                    out += "[" + " ".join(current_tags) + "]"
1026
1027                assert isinstance(token.data, str)
1028                out += token.data
1029                current_tags = []
1030                continue
1031
1032            if token.ttype is TokenType.ESCAPED:
1033                assert isinstance(token.data, str)
1034
1035                current_tags.append(token.data)
1036                continue
1037
1038            current_tags.append(token.name)
1039
1040        return out
1041
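The accumulation that `get_markup` performs can be mimicked over a simple token list: style tags gather into a group, then flush as one `[...]` block just before each plain segment. The `("tag", ...)`/`("plain", ...)` pair format here is illustrative, not the module's `Token` type:

```python3
# Stdlib mimic of get_markup's tag accumulation, using (kind, value)
# pairs in place of real Token objects.
def to_markup(tokens):
    out, current = "", []
    for kind, value in tokens:
        if kind == "plain":
            if current:
                out += "[" + " ".join(current) + "]"
            out += value
            current = []
        else:
            current.append(value)
    return out
```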
1042    def prettify_ansi(self, text: str) -> str:
1043        """Returns a prettified (syntax-highlighted) ANSI str.
1044
 1045        This is useful to quickly "inspect" a given ANSI string. However,
 1046        for most real uses `MarkupLanguage.prettify_markup`, called with
 1047        `MarkupLanguage.get_markup(text)` as its argument, is preferable,
 1048        as its output is much more informative.
1049
1050        Args:
1051            text: The ANSI-text to prettify.
1052
1053        Returns:
1054            The prettified ANSI text. This text's styles remain valid,
1055            so copy-pasting the argument into a command (like printf)
1056            that can show styled text will work the same way.
1057        """
1058
1059        out = ""
1060        sequences = ""
1061        for token in self.tokenize_ansi(text):
1062            if token.ttype is TokenType.PLAIN:
1063                assert isinstance(token.data, str)
1064                out += token.data
1065                continue
1066
1067            assert token.sequence is not None
1068            out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b")
1069            sequences += token.sequence
1070            out += sequences
1071
1072        return out
1073
1074    def prettify_markup(self, text: str) -> str:
1075        """Returns a prettified (syntax-highlighted) markup str.
1076
1077        Args:
1078            text: The markup-text to prettify.
1079
1080        Returns:
1081            Prettified markup. This markup, excluding its styles,
1082            remains valid markup.
1083        """
1084
1085        def _apply_macros(text: str) -> str:
1086            """Apply current macros to text"""
1087
1088            for _, (method, args) in applied_macros:
1089                text = method(*args, text)
1090
1091            return text
1092
1093        def _pop_macro(name: str) -> None:
1094            """Pops a macro from applied_macros."""
1095
1096            for i, (macro_name, _) in enumerate(applied_macros):
1097                if macro_name == name:
1098                    applied_macros.pop(i)
1099                    break
1100
1101        def _finish(out: str, in_sequence: bool) -> str:
1102            """Adds ending cap to the given string."""
1103
1104            if in_sequence:
1105                if not out.endswith("\x1b[0m"):
1106                    out += "\x1b[0m"
1107
1108                return out + "]"
1109
1110            return out + "[/]"
1111
1112        styles: dict[TokenType, str] = {
1113            TokenType.MACRO: "210",
1114            TokenType.ESCAPED: "210 bold",
1115            TokenType.UNSETTER: "strikethrough",
1116        }
1117
1118        applied_macros: list[tuple[str, MacroCall]] = []
1119
1120        out = ""
1121        in_sequence = False
1122        current_styles: list[Token] = []
1123
1124        for token in self.tokenize_markup(text):
1125            if token.ttype in [TokenType.PLAIN, TokenType.ESCAPED]:
1126                if in_sequence:
1127                    out += "]"
1128
1129                in_sequence = False
1130
1131                sequence = ""
1132                for style in current_styles:
1133                    if style.sequence is None:
1134                        continue
1135
1136                    sequence += style.sequence
1137
1138                out += f"{sequence}{_apply_macros(token.name)}\033[0m"
1139                continue
1140
1141            out += " " if in_sequence else "["
1142            in_sequence = True
1143
1144            if token.ttype is TokenType.UNSETTER:
1145                if token.name == "/":
1146                    applied_macros = []
1147
1148                name = token.name[1:]
1149
1150                if name in self.macros:
1151                    _pop_macro(name)
1152
1153                current_styles.append(token)
1154
1155                out += self.parse(
1156                    ("" if (name in self.tags) or (name in self.user_tags) else "")
1157                    + f"[{styles[TokenType.UNSETTER]}]/{name}"
1158                )
1159                continue
1160
1161            if token.ttype is TokenType.MACRO:
1162                assert isinstance(token.data, tuple)
1163
1164                name = token.name
1165                if "(" in name:
1166                    name = name[: token.name.index("(")]
1167
1168                applied_macros.append((name, token.data))
1169
1170                try:
1171                    out += token.data[0](*token.data[1], token.name)
1172                    continue
1173
1174                except TypeError:  # Not enough arguments
1175                    pass
1176
1177            if token.sequence is not None:
1178                current_styles.append(token)
1179
1180            style_markup = styles.get(token.ttype) or token.name
1181            out += self.parse(f"[{style_markup}]{token.name}")
1182
1183        return _finish(out, in_sequence)
1184
1185    def get_styled_plains(self, text: str) -> Iterator[StyledText]:
1186        """Gets all plain tokens within text, with their respective styles applied.
1187
1188        Args:
 1189            text: The ANSI-sequence-containing string to find plains in.
1190
1191        Returns:
1192            An iterator of `StyledText` objects, each yielded when a new plain token is found,
1193            containing the styles that are relevant and active on the given plain.
1194        """
1195
1196        def _apply_styles(styles: list[Token], text: str) -> str:
1197            """Applies given styles to text."""
1198
1199            for token in styles:
1200                if token.ttype is TokenType.MACRO:
1201                    assert isinstance(token.data, tuple)
1202                    text = token.data[0](*token.data[1], text)
1203                    continue
1204
1205                if token.sequence is None:
1206                    continue
1207
1208                text = token.sequence + text
1209
1210            return text
1211
1212        def _pop_unsetter(token: Token, styles: list[Token]) -> list[Token]:
1213            """Removes an unsetter from the list, returns the new list."""
1214
1215            if token.name == "/":
1216                return list(filter(lambda tkn: tkn.ttype is TokenType.POSITION, styles))
1217
1218            target_name = token.name[1:]
1219            for style in styles:
1220                # bold & dim unsetters represent the same character, so we have
1221                # to treat them the same way.
1222                style_name = style.name
1223
1224                if style.name == "dim":
1225                    style_name = "bold"
1226
1227                if style_name == target_name:
1228                    styles.remove(style)
1229
1230                elif (
1231                    style_name.startswith(target_name)
1232                    and style.ttype is TokenType.MACRO
1233                ):
1234                    styles.remove(style)
1235
1236                elif style.ttype is TokenType.COLOR:
1237                    assert isinstance(style.data, Color)
1238                    if target_name == "fg" and not style.data.background:
1239                        styles.remove(style)
1240
1241                    elif target_name == "bg" and style.data.background:
1242                        styles.remove(style)
1243
1244            return styles
1245
1246        def _pop_position(styles: list[Token]) -> list[Token]:
1247            for token in styles.copy():
1248                if token.ttype is TokenType.POSITION:
1249                    styles.remove(token)
1250
1251            return styles
1252
1253        styles: list[Token] = []
1254        for token in self.tokenize_ansi(text):
1255            if token.ttype is TokenType.COLOR:
1256                for i, style in enumerate(reversed(styles)):
1257                    if style.ttype is TokenType.COLOR:
1258                        assert isinstance(style.data, Color)
1259                        assert isinstance(token.data, Color)
1260
1261                        if style.data.background != token.data.background:
1262                            continue
1263
1264                        styles[len(styles) - i - 1] = token
1265                        break
1266                else:
1267                    styles.append(token)
1268
1269                continue
1270
1271            if token.ttype is TokenType.LINK:
1272                styles.append(token)
1273                yield StyledText(_apply_styles(styles, token.name))
1274
1275            if token.ttype is TokenType.PLAIN:
1276                assert isinstance(token.data, str)
1277                yield StyledText(_apply_styles(styles, token.data))
1278                styles = _pop_position(styles)
1279                continue
1280
1281            if token.ttype is TokenType.UNSETTER:
1282                styles = _pop_unsetter(token, styles)
1283                continue
1284
1285            styles.append(token)
1286
1287
1288def main() -> None:
1289    """Main method"""
1290
1291    parser = ArgumentParser()
1292
1293    markup_group = parser.add_argument_group("Markup->ANSI")
1294    markup_group.add_argument(
1295        "-p", "--parse", metavar=("TXT"), help="parse a markup text"
1296    )
1297    markup_group.add_argument(
1298        "-e", "--escape", help="escape parsed markup", action="store_true"
1299    )
1300    # markup_group.add_argument(
1301    # "-o",
1302    # "--optimize",
1303    # help="set optimization level for markup parsing",
1304    # action="count",
1305    # default=0,
1306    # )
1307
1308    markup_group.add_argument("--alias", action="append", help="alias src=dst")
1309
1310    ansi_group = parser.add_argument_group("ANSI->Markup")
1311    ansi_group.add_argument(
1312        "-m", "--markup", metavar=("TXT"), help="get markup from ANSI text"
1313    )
1314    ansi_group.add_argument(
1315        "-s",
1316        "--show-inverse",
1317        action="store_true",
1318        help="show result of parsing result markup",
1319    )
1320
1321    args = parser.parse_args()
1322
1323    lang = MarkupLanguage()
1324
1325    if args.markup:
1326        markup_text = lang.get_markup(args.markup)
1327        print(markup_text, end="")
1328
1329        if args.show_inverse:
1330            print("->", lang.parse(markup_text))
1331        else:
1332            print()
1333
1334    if args.parse:
1335        if args.alias:
1336            for alias in args.alias:
1337                src, dest = alias.split("=")
1338                lang.alias(src, dest)
1339
1340        parsed = lang.parse(args.parse)
1341
1342        if args.escape:
1343            print(ascii(parsed))
1344        else:
1345            print(parsed)
1346
1347        return
1348
1349
1350tim = markup = MarkupLanguage()
"""The default TIM instance, exported under both names."""
1352
1353if __name__ == "__main__":
1354    main()
class StyledText(builtins.str):
390class StyledText(str):
391    """A styled text object.
392
393    The purpose of this class is to implement some things regular `str`
394    breaks at when encountering ANSI sequences.
395
396    Instances of this class are usually spat out by `MarkupLanguage.parse`,
397    but may be manually constructed if the need arises. Everything works even
398    if there is no ANSI tomfoolery going on.
399    """
400
401    value: str
402    """The underlying, ANSI-inclusive string value."""
403
404    _plain: str | None = None
405    _tokens: list[Token] | None = None
406
407    def __new__(cls, value: str = ""):
408        """Creates a StyledText, gets markup tags."""
409
410        obj = super().__new__(cls, value)
411        obj.value = value
412
413        return obj
414
415    def _generate_tokens(self) -> None:
416        """Generates self._tokens & self._plain."""
417
418        self._tokens = list(tim.tokenize_ansi(self.value))
419
420        self._plain = ""
421        for token in self._tokens:
422            if token.ttype is not TokenType.PLAIN:
423                continue
424
425            assert isinstance(token.data, str)
426            self._plain += token.data
427
428    @property
429    def tokens(self) -> list[Token]:
430        """Returns all markup tokens of this object.
431
432        Generated on-demand, at the first call to this or the self.plain
433        property.
434        """
435
436        if self._tokens is not None:
437            return self._tokens
438
439        self._generate_tokens()
440        assert self._tokens is not None
441        return self._tokens
442
443    @property
444    def plain(self) -> str:
445        """Returns the value of this object, with no ANSI sequences.
446
447        Generated on-demand, at the first call to this or the self.tokens
448        property.
449        """
450
451        if self._plain is not None:
452            return self._plain
453
454        self._generate_tokens()
455        assert self._plain is not None
456        return self._plain
457
458    def plain_index(self, index: int | None) -> int | None:
459        """Finds given index inside plain text."""
460
461        if index is None:
462            return None
463
464        styled_chars = 0
465        plain_chars = 0
466        negative_index = False
467
468        tokens = self.tokens.copy()
469        if index < 0:
470            tokens.reverse()
471            index = abs(index)
472            negative_index = True
473
474        for token in tokens:
475            if token.data is None:
476                continue
477
478            if token.ttype is not TokenType.PLAIN:
479                assert token.sequence is not None
480                styled_chars += len(token.sequence)
481                continue
482
483            assert isinstance(token.data, str)
484            for _ in range(len(token.data)):
485                if plain_chars == index:
486                    if negative_index:
487                        return -1 * (plain_chars + styled_chars)
488
489                    return styled_chars + plain_chars
490
491                plain_chars += 1
492
493        return None
494
495    def __len__(self) -> int:
496        """Gets "real" length of object."""
497
498        return len(self.plain)
499
500    def __getitem__(self, subscript: int | slice) -> str:
501        """Gets an item, adjusted for non-plain text.
502
503        Args:
504            subscript: The integer or slice to find.
505
506        Returns:
507            The elements described by the subscript.
508
509        Raises:
510            IndexError: The given index is out of range.
511        """
512
513        if isinstance(subscript, int):
514            plain_index = self.plain_index(subscript)
515            if plain_index is None:
516                raise IndexError("StyledText index out of range")
517
518            return self.value[plain_index]
519
520        return self.value[
521            slice(
522                self.plain_index(subscript.start),
523                self.plain_index(subscript.stop),
524                subscript.step,
525            )
526        ]
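The ANSI-skipping behavior that `plain_index` and `__getitem__` implement can be illustrated with a stdlib-only sketch that strips SGR sequences before indexing; this is a simplification of what the class does by tokenizing:

```python3
import re

# Matches SGR-style ANSI sequences such as "\x1b[1m" or "\x1b[38;5;141m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")

def plain(text):
    # Roughly what StyledText.plain computes via tokenize_ansi: the
    # text minus its ANSI sequences.
    return ANSI_SGR.sub("", text)

styled = "\x1b[1mhello\x1b[0m"
```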

MacroCallable = typing.Callable[..., str]
MacroCall = typing.Tuple[typing.Callable[..., str], typing.List[str]]
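Any callable of this shape works as a `MacroCallable`; the `pad` macro below is hypothetical. Note that in this module's `parse` implementation, the positional markup arguments arrive first and the enclosed plain text last:

```python3
# A hypothetical MacroCallable: markup arguments first, enclosed plain
# text last, returning a str.
def pad(*args):
    *extra, text = args
    width = int(extra[0]) if extra else 10
    return text.center(width)

# "[!pad(8)]hi[/!pad]" would effectively call pad("8", "hi").
```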
class MarkupLanguage:
 529class MarkupLanguage:
 530    """A class representing an instance of a Markup Language.
 531
 532    This class is used for all markup/ANSI parsing, tokenizing and usage.
 533
 534    ```python3
 535    from pytermgui import tim
 536
 537    tim.alias("my-tag", "@152 72 bold")
 538    tim.print("This is [my-tag]my-tag[/]!")
 539    ```
 540
 541    <p style="text-align: center">
 542        <img src="https://raw.githubusercontent.com/bczsalba/pytermgui/master/assets/\
 543docs/parser/markup_language.png"
 544        style="width: 80%">
 545    </p>
 546    """
 547
 548    raise_unknown_markup: bool = False
 549    """Raise `pytermgui.exceptions.MarkupSyntaxError` when encountering unknown markup tags."""
 550
 551    def __init__(self, default_macros: bool = True) -> None:
 552        """Initializes a MarkupLanguage.
 553
 554        Args:
 555            default_macros: If False, the builtin macros are not defined.
 556        """
 557
 558        self.tags: dict[str, str] = STYLE_MAP.copy()
 559        self._cache: dict[str, StyledText] = {}
 560        self.macros: dict[str, MacroCallable] = {}
 561        self.user_tags: dict[str, str] = {}
 562        self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy()
 563
 564        self.should_cache: bool = True
 565
 566        if default_macros:
 567            self.define("!link", macro_link)
 568            self.define("!align", macro_align)
 569            self.define("!markup", self.get_markup)
 570            self.define("!shuffle", macro_shuffle)
 571            self.define("!strip_bg", macro_strip_bg)
 572            self.define("!strip_fg", macro_strip_fg)
 573            self.define("!rainbow", macro_rainbow)
 574            self.define("!gradient", macro_gradient)
 575            self.define("!upper", lambda item: str(item.upper()))
 576            self.define("!lower", lambda item: str(item.lower()))
 577            self.define("!title", lambda item: str(item.title()))
 578            self.define("!capitalize", lambda item: str(item.capitalize()))
 579            self.define("!expand", lambda tag: macro_expand(self, tag))
 580            self.define("!debug", lambda *args: ",".join(ascii(arg) for arg in args))
 581
 582        self.alias("code", "dim @black")
 583        self.alias("code.str", "142")
 584        self.alias("code.multiline_str", "code.str")
 585        self.alias("code.none", "167")
 586        self.alias("code.global", "214")
 587        self.alias("code.number", "175")
 588        self.alias("code.keyword", "203")
 589        self.alias("code.identifier", "109")
 590        self.alias("code.name", "code.global")
 591        self.alias("code.comment", "240 italic")
 592        self.alias("code.builtin", "code.global")
 593        self.alias("code.file", "code.identifier")
 594        self.alias("code.symbol", "code.identifier")
 595
 596    def _get_color_token(self, tag: str) -> Token | None:
 597        """Tries to get a color token from the given tag.
 598
 599        Args:
 600            tag: The tag to parse.
 601
 602        Returns:
 603            A color token if the given tag could be parsed into one, else None.
 604        """
 605
 606        try:
 607            color = str_to_color(tag, use_cache=self.should_cache)
 608
 609        except ColorSyntaxError:
 610            return None
 611
 612        return Token(name=color.value, ttype=TokenType.COLOR, data=color)
 613
 614    def _get_style_token(self, tag: str) -> Token | None:
 615        """Tries to get a style (including unsetter) token from tags, user tags and unsetters.
 616
 617        Args:
 618            tag: The tag to parse.
 619
 620        Returns:
 621            A `Token` if one could be created, None otherwise.
 622        """
 623
 624        if tag in self.unsetters:
 625            return Token(name=tag, ttype=TokenType.UNSETTER, data=self.unsetters[tag])
 626
 627        if tag in self.user_tags:
 628            return Token(name=tag, ttype=TokenType.STYLE, data=self.user_tags[tag])
 629
 630        if tag in self.tags:
 631            return Token(name=tag, ttype=TokenType.STYLE, data=self.tags[tag])
 632
 633        return None
 634
 635    def print(self, *args, **kwargs) -> None:
 636        """Parse all arguments and pass them through to print, along with kwargs."""
 637
 638        parsed = []
 639        for arg in args:
 640            parsed.append(self.parse(str(arg)))
 641
 642        get_terminal().print(*parsed, **kwargs)
 643
 644    def tokenize_markup(self, markup_text: str) -> Iterator[Token]:
 645        """Converts the given markup string into an iterator of `Token`.
 646
 647        Args:
 648            markup_text: The text to look at.
 649
 650        Returns:
 651            An iterator of tokens. The reason this is an iterator is to possibly save
 652            on memory.
 653        """
 654
 655        end = 0
 656        start = 0
 657        cursor = 0
 658        for match in RE_MARKUP.finditer(markup_text):
 659            full, escapes, tag_text = match.groups()
 660            start, end = match.span()
 661
 662            # Add plain text between last and current match
 663            if start > cursor:
 664                yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start])
 665
 666            if escapes != "" and len(escapes) % 2 == 1:
 667                cursor = end
 668                yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) - 1 :])
 669                continue
 670
 671            for tag in tag_text.split():
 672                token = self._get_style_token(tag)
 673                if token is not None:
 674                    yield token
 675                    continue
 676
 677                # Try to find a color token
 678                token = self._get_color_token(tag)
 679                if token is not None:
 680                    yield token
 681                    continue
 682
 683                macro_match = RE_MACRO.match(tag)
 684                if macro_match is not None:
 685                    name, args = macro_match.groups()
 686                    macro_args = () if args is None else args.split(":")
 687
 688                    if name not in self.macros:
 689                        raise MarkupSyntaxError(
 690                            tag=tag,
 691                            cause="is not a defined macro",
 692                            context=markup_text,
 693                        )
 694
 695                    yield Token(
 696                        name=tag,
 697                        ttype=TokenType.MACRO,
 698                        data=(self.macros[name], macro_args),
 699                    )
 700                    continue
 701
 702                if self.raise_unknown_markup:
 703                    raise MarkupSyntaxError(
 704                        tag=tag, cause="not defined", context=markup_text
 705                    )
 706
 707            cursor = end
 708
 709        # Add remaining text as plain
 710        if len(markup_text) > cursor:
 711            yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:])
 712
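The odd/even backslash check above (`len(escapes) % 2 == 1`) is what decides whether a bracket group is escaped plain text or a tag group. Here is a stdlib-only sketch of that rule, using a simplified stand-in for `RE_MARKUP` (the real pattern and token handling differ, and this version drops the escaping backslash entirely instead of mirroring the tokenizer's exact slice):

```python
import re

# Simplified stand-in for RE_MARKUP: optional backslashes, then a bracket group.
RE_TAG = re.compile(r"(\\*)(\[([^\[\]]+)\])")

def classify(markup: str) -> list[tuple[str, str]]:
    """Labels each bracket group as a tag group or escaped plain text."""

    result = []
    for escapes, group, tag_text in RE_TAG.findall(markup):
        # An odd number of backslashes escapes the group; an even number
        # means the backslashes only escape each other.
        if len(escapes) % 2 == 1:
            result.append(("escaped", group))
        else:
            result.append(("tags", tag_text))

    return result
```

So `\[red]` survives as literal text, while `\\[red]` parses `red` as a tag, because the two backslashes escape each other.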
 713    def tokenize_ansi(self, ansi: str) -> Iterator[Token]:
 714        """Converts the given ANSI string into an iterator of `Token`.
 715
 716        Args:
 717            ansi: The text to look at.
 718
 719        Returns:
 720            An iterator of tokens. The reason this is an iterator is to possibly save
 721            on memory.
 722        """
 723
 724        def _is_in_tags(code: str, tags: dict[str, str]) -> str | None:
 725            """Determines whether a code is in the given dict of tags."""
 726
 727            for name, current in tags.items():
 728                if current == code:
 729                    return name
 730
 731            return None
 732
 733        def _generate_color(
 734            parts: list[str], code: str
 735        ) -> tuple[str, TokenType, Color]:
 736            """Generates a color token."""
 737
 738            data: Color
 739            if len(parts) == 1:
 740                data = StandardColor.from_ansi(code)
 741                name = data.name
 742                ttype = TokenType.COLOR
 743
 744            else:
 745                data = str_to_color(code)
 746                name = data.name
 747                ttype = TokenType.COLOR
 748
 749            return name, ttype, data
 750
 751        end = 0
 752        start = 0
 753        cursor = 0
 754
 755        # StyledText messes with indexing, so we need to cast it
 756        # back to str.
 757        if isinstance(ansi, StyledText):
 758            ansi = str(ansi)
 759
 760        for match in RE_ANSI.finditer(ansi):
 761            code = match.groups()[0]
 762            start, end = match.span()
 763
 764            if code is None:
 765                continue
 766
 767            parts = code.split(";")
 768
 769            if start > cursor:
 770                plain = ansi[cursor:start]
 771
 772                yield Token(name=plain, ttype=TokenType.PLAIN, data=plain)
 773
 774            name: str | None = code
 775            ttype = None
 776            data: str | Color = parts[0]
 777
 778            # Styles & Unsetters
 779            if len(parts) == 1:
 780                # Covariance is not an issue here, even though mypy seems to think so.
 781                name = _is_in_tags(parts[0], self.unsetters)  # type: ignore
 782                if name is not None:
 783                    ttype = TokenType.UNSETTER
 784
 785                else:
 786                    name = _is_in_tags(parts[0], self.tags)
 787                    if name is not None:
 788                        ttype = TokenType.STYLE
 789
 790            # Colors
 791            if ttype is None:
 792                with suppress(ColorSyntaxError):
 793                    name, ttype, data = _generate_color(parts, code)
 794
 795            if name is None or ttype is None or data is None:
 796                if len(parts) != 2:
 797                    raise AnsiSyntaxError(
 798                        tag=parts[0], cause="not recognized", context=ansi
 799                    )
 800
 801                name = "position"
 802                ttype = TokenType.POSITION
 803                data = ",".join(reversed(parts))
 804
 805            yield Token(name=name, ttype=ttype, data=data)
 806            cursor = end
 807
 808        if cursor < len(ansi):
 809            plain = ansi[cursor:]
 810
 811            yield Token(ttype=TokenType.PLAIN, data=plain)
 812
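`tokenize_ansi` classifies each code by length and content: single-part codes are checked against the unsetter and style tables, color parsing is attempted next, and any leftover two-part code is read as a cursor position, with ANSI's `row;col` order flipped into `x,y`. A standalone sketch of that decision order, using small illustrative tables (the real `STYLE_MAP` and `UNSETTER_MAP` are larger):

```python
# Illustrative subsets of the real style & unsetter tables.
STYLES = {"1": "bold", "3": "italic", "4": "underline"}
UNSETTERS = {"0": "/", "22": "/bold", "23": "/italic"}

def classify_code(code: str) -> tuple[str, str]:
    """Loosely mirrors tokenize_ansi's decision order for one ANSI code."""

    parts = code.split(";")

    # Styles & unsetters are always single-part codes.
    if len(parts) == 1:
        if code in UNSETTERS:
            return ("unsetter", UNSETTERS[code])
        if code in STYLES:
            return ("style", STYLES[code])

    # Extended-color introducers (38 = foreground, 48 = background).
    if parts[0] in ("38", "48"):
        return ("color", code)

    # Anything else with exactly two parts is read as a cursor position;
    # ANSI uses row;column, which gets flipped into "x,y" order.
    if len(parts) == 2:
        return ("position", ",".join(reversed(parts)))

    raise ValueError(f"{code!r} not recognized")
```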
 813    def define(self, name: str, method: MacroCallable) -> None:
 814        """Defines a Macro tag that executes the given method.
 815
 816        Args:
 817            name: The name the given method will be reachable by within markup.
 818                The given value gets "!" prepended if it isn't present already.
 819            method: The method this macro will execute.
 820        """
 821
 822        if not name.startswith("!"):
 823            name = f"!{name}"
 824
 825        self.macros[name] = method
 826        self.unsetters[f"/{name}"] = None
 827
 828    def alias(self, name: str, value: str) -> None:
 829        """Aliases the given name to a value, and generates an unsetter for it.
 830
 831        Note that it is not possible to alias macros.
 832
 833        Args:
 834            name: The name of the new tag.
 835            value: The value the new tag will stand for.
 836        """
 837
 838        def _get_unsetter(token: Token) -> str | None:
 839            """Get unsetter for a token"""
 840
 841            if token.ttype is TokenType.PLAIN:
 842                return None
 843
 844            if token.ttype is TokenType.UNSETTER:
 845                return self.unsetters[token.name]
 846
 847            if token.ttype is TokenType.COLOR:
 848                assert isinstance(token.data, Color)
 849
 850                if token.data.background:
 851                    return self.unsetters["/bg"]
 852
 853                return self.unsetters["/fg"]
 854
 855            name = f"/{token.name}"
 856            if name not in self.unsetters:
 857                raise KeyError(f"Could not find unsetter for token {token}.")
 858
 859            return self.unsetters[name]
 860
 861        if name.startswith("!"):
 862            raise ValueError('Only macro tags can start with "!".')
 863
 864        setter = ""
 865        unsetter = ""
 866
 867        # Try to link to existing tag
 868        if value in self.user_tags:
 869            self.unsetters[f"/{name}"] = self.unsetters[f"/{value}"]
 870            self.user_tags[name] = self.user_tags[value]
 871            return
 872
 873        for token in self.tokenize_markup(f"[{value}]"):
 874            if token.ttype is TokenType.PLAIN:
 875                continue
 876
 877            assert token.sequence is not None
 878            setter += token.sequence
 879
 880            t_unsetter = _get_unsetter(token)
 881            unsetter += f"\x1b[{t_unsetter}m"
 882
 883        self.unsetters[f"/{name}"] = unsetter.lstrip("\x1b[").rstrip("m")
 884        self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m")
 885
 886        marked: list[str] = []
 887        for item in self._cache:
 888            if name in item:
 889                marked.append(item)
 890
 891        for item in marked:
 892            del self._cache[item]
 893
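The loop that closes `alias` invalidates the parse cache: any cached result whose markup mentions the (re)aliased name is dropped, so the next `parse` call picks up the new definition. A minimal standalone version of that step:

```python
def invalidate(cache: dict[str, str], name: str) -> None:
    """Drops cached parse results that mention the (re)aliased tag name."""

    # Collect keys first, then delete, to avoid mutating while iterating.
    marked = [key for key in cache if name in key]

    for key in marked:
        del cache[key]
```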
 894    # TODO: I cannot cut down the one-too-many branch that this has at the moment.
 895    #       We could look into it in the future, however.
 896    def parse(  # pylint: disable=too-many-branches
 897        self, markup_text: str
 898    ) -> StyledText:
 899        """Parses the given markup.
 900
 901        Args:
 902            markup_text: The markup to parse.
 903
 904        Returns:
 905            A `StyledText` instance of the result of parsing the input. This
 906            custom `str` class allows accessing the plain value of the output,
 907            as well as indexing within it cleanly. It behaves like the builtin
 908            `str`, with some extra functionality on top.
 909        """
 910
 911        applied_macros: list[tuple[str, MacroCall]] = []
 912        previous_token: Token | None = None
 913        previous_sequence = ""
 914        sequence = ""
 915        out = ""
 916
 917        def _apply_macros(text: str) -> str:
 918            """Apply current macros to text"""
 919
 920            for _, (method, args) in applied_macros:
 921                text = method(*args, text)
 922
 923            return text
 924
 925        def _is_same_colorgroup(previous: Token, new: Token) -> bool:
 926            if not isinstance(new.data, Color) or not isinstance(previous.data, Color):
 927                return False
 928
 929            return (
 930                type(previous) is type(new)
 931                and previous.data.background == new.data.background
 932            )
 933
 934        if (
 935            self.should_cache
 936            and markup_text in self._cache
 937            and len(RE_MACRO.findall(markup_text)) == 0
 938        ):
 939            return self._cache[markup_text]
 940
 941        token: Token
 942        for token in self.tokenize_markup(markup_text):
 943            if sequence != "" and previous_token == token:
 944                continue
 945
 946            # Optimize out previously added color tokens, as only the most
 947            # recent would be visible anyways.
 948            if (
 949                token.sequence is not None
 950                and previous_token is not None
 951                and _is_same_colorgroup(previous_token, token)
 952            ):
 953                sequence = token.sequence
 954                continue
 955
 956            if token.ttype == TokenType.UNSETTER and token.data == "0":
 957                out += "\033[0m"
 958                sequence = ""
 959                applied_macros = []
 960                continue
 961
 962            previous_token = token
 963
 964            # Macro unsetters are stored with None as their data
 965            if token.data is None and token.ttype is TokenType.UNSETTER:
 966                for item, data in applied_macros.copy():
 967                    macro_match = RE_MACRO.match(item)
 968                    assert macro_match is not None
 969
 970                    macro_name = macro_match.groups()[0]
 971
 972                    if f"/{macro_name}" == token.name:
 973                        applied_macros.remove((item, data))
 974
 975                continue
 976
 977            if token.ttype is TokenType.MACRO:
 978                assert isinstance(token.data, tuple)
 979
 980                applied_macros.append((token.name, token.data))
 981                continue
 982
 983            if token.sequence is None:
 984                applied = sequence
 985
 986                if not out.endswith("\x1b[0m"):
 987                    for item in previous_sequence.split("\x1b"):
 988                        if item == "" or item[1:-1] in self.unsetters.values():
 989                            continue
 990
 991                        item = f"\x1b{item}"
 992                        applied = applied.replace(item, "")
 993
 994                out += applied + _apply_macros(token.name)
 995                previous_sequence = sequence
 996                sequence = ""
 997                continue
 998
 999            sequence += token.sequence
1000
1001        if sequence + previous_sequence != "":
1002            out += "\x1b[0m"
1003
1004        out = StyledText(out)
1005        self._cache[markup_text] = out
1006        return out
1007
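The `_apply_macros` helper inside `parse` calls each active macro with its markup arguments first and the enclosed plain text last, in the order the macros were opened. A standalone sketch with two illustrative macros (the `!pad` macro and its width argument are hypothetical, chosen only to show argument passing):

```python
from typing import Callable

MacroCall = tuple[Callable[..., str], tuple[str, ...]]

# Mirrors parse()'s bookkeeping: (name, (method, args)) pairs.
applied_macros: list[tuple[str, MacroCall]] = [
    ("!upper", (str.upper, ())),
    ("!pad", (lambda width, text: text.ljust(int(width)), ("8",))),
]

def apply_macros(text: str) -> str:
    # Markup arguments come first, the enclosed plain text last.
    for _, (method, args) in applied_macros:
        text = method(*args, text)

    return text
```

With this setup, `[!upper !pad(8)]hi` would become `"HI"` padded out to eight columns.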
1008    def get_markup(self, ansi: str) -> str:
1009        """Generates markup from ANSI text.
1010
1011        Args:
1012            ansi: The text to get markup from.
1013
1014        Returns:
1015            A markup string that can be parsed to get (visually) the same
1016            result. Note that this conversion is lossy in a way: there are some
1017            details (like macros) that cannot be preserved in an ANSI->Markup->ANSI
1018            conversion.
1019        """
1020
1021        current_tags: list[str] = []
1022        out = ""
1023        for token in self.tokenize_ansi(ansi):
1024            if token.ttype is TokenType.PLAIN:
1025                if len(current_tags) != 0:
1026                    out += "[" + " ".join(current_tags) + "]"
1027
1028                assert isinstance(token.data, str)
1029                out += token.data
1030                current_tags = []
1031                continue
1032
1033            if token.ttype is TokenType.ESCAPED:
1034                assert isinstance(token.data, str)
1035
1036                current_tags.append(token.data)
1037                continue
1038
1039            current_tags.append(token.name)
1040
1041        return out
1042
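`get_markup` works by accumulation: non-plain tokens are buffered in `current_tags`, and the whole buffer is flushed as a single `[tag1 tag2]` group the moment plain text arrives. A simplified sketch of that strategy over pre-classified `(kind, value)` pairs (a stand-in for the real `Token` objects):

```python
def to_markup(tokens: list[tuple[str, str]]) -> str:
    """Buffers tag names and flushes them as one group before plain text."""

    out = ""
    current_tags: list[str] = []

    for kind, value in tokens:
        if kind == "plain":
            if current_tags:
                out += "[" + " ".join(current_tags) + "]"
                current_tags = []

            out += value
        else:
            current_tags.append(value)

    return out
```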
1043    def prettify_ansi(self, text: str) -> str:
1044        """Returns a prettified (syntax-highlighted) ANSI str.
1045
1046        This is useful to quickly "inspect" a given ANSI string. However,
1047        for most real uses `MarkupLanguage.prettify_markup`, given an argument
1048        of `MarkupLanguage.get_markup(text)`, would be preferable, as its
1049        output is much more descriptive.
1050
1051        Args:
1052            text: The ANSI-text to prettify.
1053
1054        Returns:
1055            The prettified ANSI text. This text's styles remain valid,
1056            so copy-pasting the argument into a command (like printf)
1057            that can show styled text will work the same way.
1058        """
1059
1060        out = ""
1061        sequences = ""
1062        for token in self.tokenize_ansi(text):
1063            if token.ttype is TokenType.PLAIN:
1064                assert isinstance(token.data, str)
1065                out += token.data
1066                continue
1067
1068            assert token.sequence is not None
1069            out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b")
1070            sequences += token.sequence
1071            out += sequences
1072
1073        return out
1074
1075    def prettify_markup(self, text: str) -> str:
1076        """Returns a prettified (syntax-highlighted) markup str.
1077
1078        Args:
1079            text: The markup-text to prettify.
1080
1081        Returns:
1082            Prettified markup. This markup, excluding its styles,
1083            remains valid markup.
1084        """
1085
1086        def _apply_macros(text: str) -> str:
1087            """Apply current macros to text"""
1088
1089            for _, (method, args) in applied_macros:
1090                text = method(*args, text)
1091
1092            return text
1093
1094        def _pop_macro(name: str) -> None:
1095            """Pops a macro from applied_macros."""
1096
1097            for i, (macro_name, _) in enumerate(applied_macros):
1098                if macro_name == name:
1099                    applied_macros.pop(i)
1100                    break
1101
1102        def _finish(out: str, in_sequence: bool) -> str:
1103            """Adds ending cap to the given string."""
1104
1105            if in_sequence:
1106                if not out.endswith("\x1b[0m"):
1107                    out += "\x1b[0m"
1108
1109                return out + "]"
1110
1111            return out + "[/]"
1112
1113        styles: dict[TokenType, str] = {
1114            TokenType.MACRO: "210",
1115            TokenType.ESCAPED: "210 bold",
1116            TokenType.UNSETTER: "strikethrough",
1117        }
1118
1119        applied_macros: list[tuple[str, MacroCall]] = []
1120
1121        out = ""
1122        in_sequence = False
1123        current_styles: list[Token] = []
1124
1125        for token in self.tokenize_markup(text):
1126            if token.ttype in [TokenType.PLAIN, TokenType.ESCAPED]:
1127                if in_sequence:
1128                    out += "]"
1129
1130                in_sequence = False
1131
1132                sequence = ""
1133                for style in current_styles:
1134                    if style.sequence is None:
1135                        continue
1136
1137                    sequence += style.sequence
1138
1139                out += f"{sequence}{_apply_macros(token.name)}\033[0m"
1140                continue
1141
1142            out += " " if in_sequence else "["
1143            in_sequence = True
1144
1145            if token.ttype is TokenType.UNSETTER:
1146                if token.name == "/":
1147                    applied_macros = []
1148
1149                name = token.name[1:]
1150
1151                if name in self.macros:
1152                    _pop_macro(name)
1153
1154                current_styles.append(token)
1155
1156                out += self.parse(
1157                    ("" if (name in self.tags) or (name in self.user_tags) else "")
1158                    + f"[{styles[TokenType.UNSETTER]}]/{name}"
1159                )
1160                continue
1161
1162            if token.ttype is TokenType.MACRO:
1163                assert isinstance(token.data, tuple)
1164
1165                name = token.name
1166                if "(" in name:
1167                    name = name[: token.name.index("(")]
1168
1169                applied_macros.append((name, token.data))
1170
1171                try:
1172                    out += token.data[0](*token.data[1], token.name)
1173                    continue
1174
1175                except TypeError:  # Not enough arguments
1176                    pass
1177
1178            if token.sequence is not None:
1179                current_styles.append(token)
1180
1181            style_markup = styles.get(token.ttype) or token.name
1182            out += self.parse(f"[{style_markup}]{token.name}")
1183
1184        return _finish(out, in_sequence)
1185
1186    def get_styled_plains(self, text: str) -> Iterator[StyledText]:
1187        """Gets all plain tokens within text, with their respective styles applied.
1188
1189        Args:
1190            text: The ANSI-sequence containing string to find plains from.
1191
1192        Returns:
1193            An iterator of `StyledText` objects, each yielded when a new plain token is found,
1194            containing the styles that are relevant and active on the given plain.
1195        """
1196
1197        def _apply_styles(styles: list[Token], text: str) -> str:
1198            """Applies given styles to text."""
1199
1200            for token in styles:
1201                if token.ttype is TokenType.MACRO:
1202                    assert isinstance(token.data, tuple)
1203                    text = token.data[0](*token.data[1], text)
1204                    continue
1205
1206                if token.sequence is None:
1207                    continue
1208
1209                text = token.sequence + text
1210
1211            return text
1212
1213        def _pop_unsetter(token: Token, styles: list[Token]) -> list[Token]:
1214            """Removes an unsetter from the list, returns the new list."""
1215
1216            if token.name == "/":
1217                return list(filter(lambda tkn: tkn.ttype is TokenType.POSITION, styles))
1218
1219            target_name = token.name[1:]
1220            for style in styles:
1221                # bold & dim unsetters represent the same character, so we have
1222                # to treat them the same way.
1223                style_name = style.name
1224
1225                if style.name == "dim":
1226                    style_name = "bold"
1227
1228                if style_name == target_name:
1229                    styles.remove(style)
1230
1231                elif (
1232                    style_name.startswith(target_name)
1233                    and style.ttype is TokenType.MACRO
1234                ):
1235                    styles.remove(style)
1236
1237                elif style.ttype is TokenType.COLOR:
1238                    assert isinstance(style.data, Color)
1239                    if target_name == "fg" and not style.data.background:
1240                        styles.remove(style)
1241
1242                    elif target_name == "bg" and style.data.background:
1243                        styles.remove(style)
1244
1245            return styles
1246
1247        def _pop_position(styles: list[Token]) -> list[Token]:
1248            for token in styles.copy():
1249                if token.ttype is TokenType.POSITION:
1250                    styles.remove(token)
1251
1252            return styles
1253
1254        styles: list[Token] = []
1255        for token in self.tokenize_ansi(text):
1256            if token.ttype is TokenType.COLOR:
1257                for i, style in enumerate(reversed(styles)):
1258                    if style.ttype is TokenType.COLOR:
1259                        assert isinstance(style.data, Color)
1260                        assert isinstance(token.data, Color)
1261
1262                        if style.data.background != token.data.background:
1263                            continue
1264
1265                        styles[len(styles) - i - 1] = token
1266                        break
1267                else:
1268                    styles.append(token)
1269
1270                continue
1271
1272            if token.ttype is TokenType.LINK:
1273                styles.append(token)
1274                yield StyledText(_apply_styles(styles, token.name))
1275
1276            if token.ttype is TokenType.PLAIN:
1277                assert isinstance(token.data, str)
1278                yield StyledText(_apply_styles(styles, token.data))
1279                styles = _pop_position(styles)
1280                continue
1281
1282            if token.ttype is TokenType.UNSETTER:
1283                styles = _pop_unsetter(token, styles)
1284                continue
1285
1286            styles.append(token)

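The bold/dim special case in `_pop_unsetter` exists because both styles are cleared by the same SGR code (22), so `/bold` must remove either; the bare `/` unsetter clears everything except position tokens. A simplified, name-only sketch of that removal logic (the real version also handles colors and macro tags):

```python
def pop_unsetter(unsetter: str, styles: list[str]) -> list[str]:
    """Removes the style(s) targeted by an unsetter tag."""

    # "/" on its own clears every style.
    if unsetter == "/":
        return []

    target = unsetter[1:]

    kept = []
    for style in styles:
        # bold & dim share the same unsetter (SGR 22), so "/bold"
        # removes either of them.
        name = "bold" if style == "dim" else style

        if name != target:
            kept.append(style)

    return kept
```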
A class representing an instance of a Markup Language.

This class is used for all markup/ANSI parsing, tokenizing and usage.

from pytermgui import tim

tim.alias("my-tag", "@152 72 bold")
tim.print("This is [my-tag]my-tag[/]!")

MarkupLanguage(default_macros: bool = True)
551    def __init__(self, default_macros: bool = True) -> None:
552        """Initializes a MarkupLanguage.
553
554        Args:
 555            default_macros: If False, the builtin macros are not defined.
556        """
557
558        self.tags: dict[str, str] = STYLE_MAP.copy()
559        self._cache: dict[str, StyledText] = {}
560        self.macros: dict[str, MacroCallable] = {}
561        self.user_tags: dict[str, str] = {}
562        self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy()
563
564        self.should_cache: bool = True
565
566        if default_macros:
567            self.define("!link", macro_link)
568            self.define("!align", macro_align)
569            self.define("!markup", self.get_markup)
570            self.define("!shuffle", macro_shuffle)
571            self.define("!strip_bg", macro_strip_bg)
572            self.define("!strip_fg", macro_strip_fg)
573            self.define("!rainbow", macro_rainbow)
574            self.define("!gradient", macro_gradient)
575            self.define("!upper", lambda item: str(item.upper()))
576            self.define("!lower", lambda item: str(item.lower()))
577            self.define("!title", lambda item: str(item.title()))
578            self.define("!capitalize", lambda item: str(item.capitalize()))
579            self.define("!expand", lambda tag: macro_expand(self, tag))
580            self.define("!debug", lambda *args: ",".join(ascii(arg) for arg in args))
581
582        self.alias("code", "dim @black")
583        self.alias("code.str", "142")
584        self.alias("code.multiline_str", "code.str")
585        self.alias("code.none", "167")
586        self.alias("code.global", "214")
587        self.alias("code.number", "175")
588        self.alias("code.keyword", "203")
589        self.alias("code.identifier", "109")
590        self.alias("code.name", "code.global")
591        self.alias("code.comment", "240 italic")
592        self.alias("code.builtin", "code.global")
593        self.alias("code.file", "code.identifier")
594        self.alias("code.symbol", "code.identifier")

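The `code.*` palette above relies on alias linking: aliasing to an already-defined user tag copies its resolved value, so chains like `code.name` → `code.global` → `214` collapse at definition time rather than at parse time. A dict-only sketch of that linking behavior:

```python
# Stand-in for the instance attribute self.user_tags.
user_tags: dict[str, str] = {}

def alias(name: str, value: str) -> None:
    # If the value is itself a known user tag, link to its resolved value;
    # otherwise store the value as-is.
    user_tags[name] = user_tags.get(value, value)

alias("code.global", "214")
alias("code.name", "code.global")
alias("code.builtin", "code.global")
```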
raise_unknown_markup: bool = False

Raise pytermgui.exceptions.MarkupSyntaxError when encountering unknown markup tags.

def print(self, *args, **kwargs) -> None:

Parse all arguments and pass them through to print, along with kwargs.

def tokenize_markup(self, markup_text: str) -> Iterator[pytermgui.parser.Token]:

Converts the given markup string into an iterator of Token.

Args
  • markup_text: The text to look at.
Returns

An iterator of tokens. The reason this is an iterator is to possibly save on memory.

def tokenize_ansi(self, ansi: str) -> Iterator[pytermgui.parser.Token]:

Converts the given ANSI string into an iterator of Token.

Args
  • ansi: The text to look at.
Returns

An iterator of tokens. The reason this is an iterator is to possibly save on memory.

def define(self, name: str, method: Callable[..., str]) -> None:

Defines a Macro tag that executes the given method.

Args
  • name: The name the given method will be reachable by within markup. The given value gets "!" prepended if it isn't present already.
  • method: The method this macro will execute.
def alias(self, name: str, value: str) -> None:
828    def alias(self, name: str, value: str) -> None:
829        """Aliases the given name to a value, and generates an unsetter for it.
830
831        Note that it is not possible to alias macros.
832
833        Args:
834            name: The name of the new tag.
835            value: The value the new tag will stand for.
836        """
837
838        def _get_unsetter(token: Token) -> str | None:
839            """Get unsetter for a token"""
840
841            if token.ttype is TokenType.PLAIN:
842                return None
843
844            if token.ttype is TokenType.UNSETTER:
845                return self.unsetters[token.name]
846
847            if token.ttype is TokenType.COLOR:
848                assert isinstance(token.data, Color)
849
850                if token.data.background:
851                    return self.unsetters["/bg"]
852
853                return self.unsetters["/fg"]
854
855            name = f"/{token.name}"
856            if name not in self.unsetters:
857                raise KeyError(f"Could not find unsetter for token {token}.")
858
859            return self.unsetters[name]
860
861        if name.startswith("!"):
862            raise ValueError('Only macro tags can start with "!".')
863
864        setter = ""
865        unsetter = ""
866
867        # Try to link to existing tag
868        if value in self.user_tags:
869            self.unsetters[f"/{name}"] = self.unsetters[f"/{value}"]
870            self.user_tags[name] = self.user_tags[value]
871            return
872
873        for token in self.tokenize_markup(f"[{value}]"):
874            if token.ttype is TokenType.PLAIN:
875                continue
876
877            assert token.sequence is not None
878            setter += token.sequence
879
880            t_unsetter = _get_unsetter(token)
881            unsetter += f"\x1b[{t_unsetter}m"
882
883        self.unsetters[f"/{name}"] = unsetter.lstrip("\x1b[").rstrip("m")
884        self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m")
885
886        marked: list[str] = []
887        for item in self._cache:
888            if name in item:
889                marked.append(item)
890
891        for item in marked:
892            del self._cache[item]

Aliases the given name to a value, and generates an unsetter for it.

Note that it is not possible to alias macros.

Args
  • name: The name of the new tag.
  • value: The value the new tag will stand for.
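The end of the `alias` listing invalidates cached parse results that mention the aliased name. The entries are collected into a separate list first, because a dict must not be mutated while it is being iterated; a minimal standalone sketch of that pattern:

```python
# Sketch of the cache invalidation at the end of alias(): collect the
# matching keys first, then delete them, since deleting during
# iteration would raise a RuntimeError. The cache contents are made up.
cache = {"[title]Hello": "...", "[141]Hi": "..."}
name = "title"

marked = [item for item in cache if name in item]
for item in marked:
    del cache[item]
```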
def parse(self, markup_text: str) -> pytermgui.parser.StyledText:
 896    def parse(  # pylint: disable=too-many-branches
 897        self, markup_text: str
 898    ) -> StyledText:
 899        """Parses the given markup.
 900
 901        Args:
 902            markup_text: The markup to parse.
 903
 904        Returns:
 905            A `StyledText` instance of the result of parsing the input. This
 906            custom `str` class is used to allow accessing the plain value of
 907            the output, as well as to cleanly index within it. It behaves
 908            like the builtin `str`, with extra functionality added on top.
 909        """
 910
 911        applied_macros: list[tuple[str, MacroCall]] = []
 912        previous_token: Token | None = None
 913        previous_sequence = ""
 914        sequence = ""
 915        out = ""
 916
 917        def _apply_macros(text: str) -> str:
 918            """Apply current macros to text"""
 919
 920            for _, (method, args) in applied_macros:
 921                text = method(*args, text)
 922
 923            return text
 924
 925        def _is_same_colorgroup(previous: Token, new: Token) -> bool:
 926            if not isinstance(new.data, Color) or not isinstance(previous.data, Color):
 927                return False
 928
 929            return (
 930                type(previous) is type(new)
 931                and previous.data.background == new.data.background
 932            )
 933
 934        if (
 935            self.should_cache
 936            and markup_text in self._cache
 937            and len(RE_MACRO.findall(markup_text)) == 0
 938        ):
 939            return self._cache[markup_text]
 940
 941        token: Token
 942        for token in self.tokenize_markup(markup_text):
 943            if sequence != "" and previous_token == token:
 944                continue
 945
 946            # Optimize out previously added color tokens, as only the most
 947            # recent would be visible anyways.
 948            if (
 949                token.sequence is not None
 950                and previous_token is not None
 951                and _is_same_colorgroup(previous_token, token)
 952            ):
 953                sequence = token.sequence
 954                continue
 955
 956            if token.ttype == TokenType.UNSETTER and token.data == "0":
 957                out += "\033[0m"
 958                sequence = ""
 959                applied_macros = []
 960                continue
 961
 962            previous_token = token
 963
 964            # Macro unsetters are stored with None as their data
 965            if token.data is None and token.ttype is TokenType.UNSETTER:
 966                for item, data in applied_macros.copy():
 967                    macro_match = RE_MACRO.match(item)
 968                    assert macro_match is not None
 969
 970                    macro_name = macro_match.groups()[0]
 971
 972                    if f"/{macro_name}" == token.name:
 973                        applied_macros.remove((item, data))
 974
 975                continue
 976
 977            if token.ttype is TokenType.MACRO:
 978                assert isinstance(token.data, tuple)
 979
 980                applied_macros.append((token.name, token.data))
 981                continue
 982
 983            if token.sequence is None:
 984                applied = sequence
 985
 986                if not out.endswith("\x1b[0m"):
 987                    for item in previous_sequence.split("\x1b"):
 988                        if item == "" or item[1:-1] in self.unsetters.values():
 989                            continue
 990
 991                        item = f"\x1b{item}"
 992                        applied = applied.replace(item, "")
 993
 994                out += applied + _apply_macros(token.name)
 995                previous_sequence = sequence
 996                sequence = ""
 997                continue
 998
 999            sequence += token.sequence
1000
1001        if sequence + previous_sequence != "":
1002            out += "\x1b[0m"
1003
1004        out = StyledText(out)
1005        self._cache[markup_text] = out
1006        return out

Parses the given markup.

Args
  • markup_text: The markup to parse.
Returns

A StyledText instance of the result of parsing the input. This custom str class allows accessing the plain value of the output, as well as cleanly indexing within it. It behaves like the builtin str, with extra functionality added on top.
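A standalone re-creation of the `_apply_macros` helper inside `parse`: macros run in the order they were applied, each receiving its markup arguments first and the running text last. The two macros here are made-up examples, not part of the library:

```python
# (name, (method, args)) pairs, as stored in applied_macros.
applied_macros = [
    ("!upper", (str.upper, ())),
    ("!indent", (lambda width, text: " " * int(width) + text, ("2",))),
]

def _apply_macros(text):
    # Each macro is called as method(*args, text), matching parse().
    for _, (method, args) in applied_macros:
        text = method(*args, text)
    return text

assert _apply_macros("hi") == "  HI"
```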

def get_markup(self, ansi: str) -> str:
1008    def get_markup(self, ansi: str) -> str:
1009        """Generates markup from ANSI text.
1010
1011        Args:
1012            ansi: The text to get markup from.
1013
1014        Returns:
1015            A markup string that can be parsed to get (visually) the same
1016            result. Note that this conversion is lossy: some details
1017            (like macros) cannot be preserved in an ANSI->Markup->ANSI
1018            round trip.
1019        """
1020
1021        current_tags: list[str] = []
1022        out = ""
1023        for token in self.tokenize_ansi(ansi):
1024            if token.ttype is TokenType.PLAIN:
1025                if len(current_tags) != 0:
1026                    out += "[" + " ".join(current_tags) + "]"
1027
1028                assert isinstance(token.data, str)
1029                out += token.data
1030                current_tags = []
1031                continue
1032
1033            if token.ttype is TokenType.ESCAPED:
1034                assert isinstance(token.data, str)
1035
1036                current_tags.append(token.data)
1037                continue
1038
1039            current_tags.append(token.name)
1040
1041        return out

Generates markup from ANSI text.

Args
  • ansi: The text to get markup from.
Returns

A markup string that can be parsed to get (visually) the same result. Note that this conversion is lossy: some details (like macros) cannot be preserved in an ANSI->Markup->ANSI round trip.
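The core of `get_markup` is a tag-grouping loop: style tokens accumulate until a plain token flushes them as a single `[tag1 tag2]` group in front of the text. A simplified standalone sketch, using (kind, value) tuples in place of real Token objects:

```python
# Sketch of the grouping loop in get_markup(); real tokens would come
# from tokenize_ansi, these tuples are stand-ins for illustration.
current_tags, out = [], ""
tokens = [("tag", "141"), ("tag", "bold"), ("plain", "Hello")]

for kind, value in tokens:
    if kind == "plain":
        if current_tags:
            out += "[" + " ".join(current_tags) + "]"
        out += value
        current_tags = []
    else:
        current_tags.append(value)

assert out == "[141 bold]Hello"
```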

def prettify_ansi(self, text: str) -> str:
1043    def prettify_ansi(self, text: str) -> str:
1044        """Returns a prettified (syntax-highlighted) ANSI str.
1045
1046        This is useful to quickly "inspect" a given ANSI string. However,
1047        for most real uses `MarkupLanguage.prettify_markup`, called with
1048        `MarkupLanguage.get_markup(text)` as its argument, is preferable,
1049        as its output is much more descriptive.
1050
1051        Args:
1052            text: The ANSI-text to prettify.
1053
1054        Returns:
1055            The prettified ANSI text. This text's styles remain valid,
1056            so copy-pasting the argument into a command (like printf)
1057            that can show styled text will work the same way.
1058        """
1059
1060        out = ""
1061        sequences = ""
1062        for token in self.tokenize_ansi(text):
1063            if token.ttype is TokenType.PLAIN:
1064                assert isinstance(token.data, str)
1065                out += token.data
1066                continue
1067
1068            assert token.sequence is not None
1069            out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b")
1070            sequences += token.sequence
1071            out += sequences
1072
1073        return out

Returns a prettified (syntax-highlighted) ANSI str.

This is useful to quickly "inspect" a given ANSI string. However, for most real uses MarkupLanguage.prettify_markup, called with MarkupLanguage.get_markup(text) as its argument, is preferable, as its output is much more descriptive.

Args
  • text: The ANSI-text to prettify.
Returns

The prettified ANSI text. This text's styles remain valid, so copy-pasting the argument into a command (like printf) that can show styled text will work the same way.
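In the listing above, each sequence is shown both applied and in a readable spelled-out form; the readable form comes from replacing the ESC byte with its escaped representation. A minimal look at just that step:

```python
# The escaping step used by prettify_ansi to make a sequence readable:
# the raw ESC character becomes the four printable characters "\x1b".
seq = "\x1b[1m"
readable = seq.replace("\x1b", "\\x1b")
assert readable == "\\x1b[1m"
```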

def prettify_markup(self, text: str) -> str:
1075    def prettify_markup(self, text: str) -> str:
1076        """Returns a prettified (syntax-highlighted) markup str.
1077
1078        Args:
1079            text: The markup-text to prettify.
1080
1081        Returns:
1082            Prettified markup. This markup, excluding its styles,
1083            remains valid markup.
1084        """
1085
1086        def _apply_macros(text: str) -> str:
1087            """Apply current macros to text"""
1088
1089            for _, (method, args) in applied_macros:
1090                text = method(*args, text)
1091
1092            return text
1093
1094        def _pop_macro(name: str) -> None:
1095            """Pops a macro from applied_macros."""
1096
1097            for i, (macro_name, _) in enumerate(applied_macros):
1098                if macro_name == name:
1099                    applied_macros.pop(i)
1100                    break
1101
1102        def _finish(out: str, in_sequence: bool) -> str:
1103            """Adds ending cap to the given string."""
1104
1105            if in_sequence:
1106                if not out.endswith("\x1b[0m"):
1107                    out += "\x1b[0m"
1108
1109                return out + "]"
1110
1111            return out + "[/]"
1112
1113        styles: dict[TokenType, str] = {
1114            TokenType.MACRO: "210",
1115            TokenType.ESCAPED: "210 bold",
1116            TokenType.UNSETTER: "strikethrough",
1117        }
1118
1119        applied_macros: list[tuple[str, MacroCall]] = []
1120
1121        out = ""
1122        in_sequence = False
1123        current_styles: list[Token] = []
1124
1125        for token in self.tokenize_markup(text):
1126            if token.ttype in [TokenType.PLAIN, TokenType.ESCAPED]:
1127                if in_sequence:
1128                    out += "]"
1129
1130                in_sequence = False
1131
1132                sequence = ""
1133                for style in current_styles:
1134                    if style.sequence is None:
1135                        continue
1136
1137                    sequence += style.sequence
1138
1139                out += f"{sequence}{_apply_macros(token.name)}\033[0m"
1140                continue
1141
1142            out += " " if in_sequence else "["
1143            in_sequence = True
1144
1145            if token.ttype is TokenType.UNSETTER:
1146                if token.name == "/":
1147                    applied_macros = []
1148
1149                name = token.name[1:]
1150
1151                if name in self.macros:
1152                    _pop_macro(name)
1153
1154                current_styles.append(token)
1155
1156                out += self.parse(
1157                    ("" if (name in self.tags) or (name in self.user_tags) else "")
1158                    + f"[{styles[TokenType.UNSETTER]}]/{name}"
1159                )
1160                continue
1161
1162            if token.ttype is TokenType.MACRO:
1163                assert isinstance(token.data, tuple)
1164
1165                name = token.name
1166                if "(" in name:
1167                    name = name[: token.name.index("(")]
1168
1169                applied_macros.append((name, token.data))
1170
1171                try:
1172                    out += token.data[0](*token.data[1], token.name)
1173                    continue
1174
1175                except TypeError:  # Not enough arguments
1176                    pass
1177
1178            if token.sequence is not None:
1179                current_styles.append(token)
1180
1181            style_markup = styles.get(token.ttype) or token.name
1182            out += self.parse(f"[{style_markup}]{token.name}")
1183
1184        return _finish(out, in_sequence)

Returns a prettified (syntax-highlighted) markup str.

Args
  • text: The markup-text to prettify.
Returns

Prettified markup. This markup, excluding its styles, remains valid markup.
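A standalone re-creation of the `_finish` helper from the listing above: an unclosed tag group gets a reset sequence and its closing `]`, otherwise the output is capped with a global unsetter `[/]`:

```python
# Re-creation of prettify_markup's _finish helper, for illustration.
def _finish(out, in_sequence):
    if in_sequence:
        # Close the open tag group, resetting styles first if needed.
        if not out.endswith("\x1b[0m"):
            out += "\x1b[0m"
        return out + "]"
    # No open group: cap the markup with a global unsetter.
    return out + "[/]"

assert _finish("text", False) == "text[/]"
assert _finish("[bold", True) == "[bold\x1b[0m]"
```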

def get_styled_plains(self, text: str) -> Iterator[pytermgui.parser.StyledText]:
1186    def get_styled_plains(self, text: str) -> Iterator[StyledText]:
1187        """Gets all plain tokens within text, with their respective styles applied.
1188
1189        Args:
1190            text: The string, containing ANSI sequences, to extract plain tokens from.
1191
1192        Returns:
1193            An iterator of `StyledText` objects, each yielded when a new plain token is found,
1194            containing the styles that are relevant and active on the given plain.
1195        """
1196
1197        def _apply_styles(styles: list[Token], text: str) -> str:
1198            """Applies given styles to text."""
1199
1200            for token in styles:
1201                if token.ttype is TokenType.MACRO:
1202                    assert isinstance(token.data, tuple)
1203                    text = token.data[0](*token.data[1], text)
1204                    continue
1205
1206                if token.sequence is None:
1207                    continue
1208
1209                text = token.sequence + text
1210
1211            return text
1212
1213        def _pop_unsetter(token: Token, styles: list[Token]) -> list[Token]:
1214            """Removes an unsetter from the list, returns the new list."""
1215
1216            if token.name == "/":
1217                return list(filter(lambda tkn: tkn.ttype is TokenType.POSITION, styles))
1218
1219            target_name = token.name[1:]
1220            for style in styles:
1221                # bold & dim unsetters represent the same character, so we have
1222                # to treat them the same way.
1223                style_name = style.name
1224
1225                if style.name == "dim":
1226                    style_name = "bold"
1227
1228                if style_name == target_name:
1229                    styles.remove(style)
1230
1231                elif (
1232                    style_name.startswith(target_name)
1233                    and style.ttype is TokenType.MACRO
1234                ):
1235                    styles.remove(style)
1236
1237                elif style.ttype is TokenType.COLOR:
1238                    assert isinstance(style.data, Color)
1239                    if target_name == "fg" and not style.data.background:
1240                        styles.remove(style)
1241
1242                    elif target_name == "bg" and style.data.background:
1243                        styles.remove(style)
1244
1245            return styles
1246
1247        def _pop_position(styles: list[Token]) -> list[Token]:
1248            for token in styles.copy():
1249                if token.ttype is TokenType.POSITION:
1250                    styles.remove(token)
1251
1252            return styles
1253
1254        styles: list[Token] = []
1255        for token in self.tokenize_ansi(text):
1256            if token.ttype is TokenType.COLOR:
1257                for i, style in enumerate(reversed(styles)):
1258                    if style.ttype is TokenType.COLOR:
1259                        assert isinstance(style.data, Color)
1260                        assert isinstance(token.data, Color)
1261
1262                        if style.data.background != token.data.background:
1263                            continue
1264
1265                        styles[len(styles) - i - 1] = token
1266                        break
1267                else:
1268                    styles.append(token)
1269
1270                continue
1271
1272            if token.ttype is TokenType.LINK:
1273                styles.append(token)
1274                yield StyledText(_apply_styles(styles, token.name))
1275
1276            if token.ttype is TokenType.PLAIN:
1277                assert isinstance(token.data, str)
1278                yield StyledText(_apply_styles(styles, token.data))
1279                styles = _pop_position(styles)
1280                continue
1281
1282            if token.ttype is TokenType.UNSETTER:
1283                styles = _pop_unsetter(token, styles)
1284                continue
1285
1286            styles.append(token)

Gets all plain tokens within text, with their respective styles applied.

Args
  • text: The string, containing ANSI sequences, to extract plain tokens from.
Returns

An iterator of StyledText objects, each yielded when a new plain token is found, containing the styles that are relevant and active on the given plain.
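As the comment in `_pop_unsetter` notes, the bold and dim unsetters share the same reset code, so a "dim" style is treated as "bold" when matching an unsetter's target. A minimal sketch of that normalization rule (the function name is made up for illustration):

```python
# Sketch of the bold/dim matching rule from _pop_unsetter: a "dim"
# style name is normalized to "bold" before comparison, since both
# styles are cancelled by the same ANSI reset code.
def unsetter_matches(style_name, target_name):
    if style_name == "dim":
        style_name = "bold"
    return style_name == target_name

assert unsetter_matches("dim", "bold")
assert not unsetter_matches("italic", "bold")
```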