pytermgui.parser

This module provides TIM, PyTermGUI's Terminal Inline Markup language. It is a simple, performant and easy to read way to style, colorize & modify text.

Basic rundown

TIM is included with the purpose of making styling easier to read and manage.

Its syntax is based on square brackets, within which tags are strictly separated by one space character. Tags can stand for colors (xterm-256, RGB or HEX, both background & foreground), styles, unsetters and macros.

The 16 simple colors of the terminal exist as named tags that refer to their numerical value.

Here is a simple example of the syntax, using the pytermgui.pretty submodule to syntax-highlight it inside the REPL:

>>> from pytermgui import pretty
>>> '[141 @61 bold] Hello [!upper inverse] There '

General syntax

Background colors are always denoted by a leading @ character in front of the color tag. Styles are just the name of the style and macros have an exclamation mark in front of them. Additionally, unsetters use a leading slash (/) for their syntax. Color tokens have special unsetters: they use /fg to cancel foreground colors, and /bg to do so with backgrounds.
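
The prefix rules above can be sketched with a tiny stdlib-only classifier. This is an illustrative sketch, not pytermgui's actual tokenizer; the function name is made up for demonstration:

```python
def classify_tag(tag: str) -> str:
    """Classify a single TIM tag by its leading character(s).

    Illustrative only: mirrors the prefix rules described above.
    """
    if tag.startswith("/"):
        return "unsetter"
    if tag.startswith("@"):
        return "background color"
    if tag.startswith("!"):
        return "macro"
    # Numeric index, HEX code or RGB triplet: all color syntaxes
    if tag.isdigit() or tag.startswith("#") or ";" in tag:
        return "foreground color"
    return "style (or named color)"

print([classify_tag(tag) for tag in "141 @61 bold /fg !upper".split(" ")])
```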

Macros:

Macros are callables that take at least one positional argument: the plain text enclosed by the tag group within which the given macro resides. Additionally, macros can be given any number of positional arguments from within markup, using the syntax:

[!macro(arg1:arg2:arg3)]Text that the macro applies to.[/!macro]plain text, no macro

This syntax gets parsed as follows:

macro("Text that the macro applies to.", "arg1", "arg2", "arg3")

Here, macro is whatever callable the name macro was defined as beforehand.
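
The mapping from macro syntax to a call can be sketched with the standard library alone. The regex and helper below are assumptions for illustration, not pytermgui internals:

```python
import re

# Assumed pattern: "!name" optionally followed by "(arg1:arg2:...)".
MACRO_RE = re.compile(r"!(\w+)(?:\(([^)]*)\))?")

def parse_macro_tag(tag: str, content: str, macros: dict) -> str:
    """Apply a macro tag to its enclosed content (illustrative sketch)."""
    match = MACRO_RE.fullmatch(tag)
    name, args = match.group(1), match.group(2)
    macro_args = [] if args is None else args.split(":")
    # Equivalent to: macro("Text that the macro applies to.", "arg1", ...)
    return macros["!" + name](content, *macro_args)

macros = {
    "!align": lambda content, width, side: (
        content.rjust(int(width)) if side == "right" else content.ljust(int(width))
    )
}
print(repr(parse_macro_tag("!align(10:right)", "hi", macros)))
```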

Colors:

Colors can be of three general types: xterm-256, RGB and HEX.

xterm-256 stands for one of the 256 xterm colors. You can use ptg -c to see all of the available colors. Its syntax is just the 0-based index of the color, like [141].

RGB colors are pretty self-explanatory. Their syntax follows the format RED;GREEN;BLUE, with each component between 0 and 255, such as [111;222;133].

HEX colors are basically just RGB with extra steps. Their syntax is #RRGGBB, such as [#FA72BF]. This code then gets converted to a tuple of RGB values under the hood, so from then on RGB and HEX colors are treated the same, and emit the same tokens.

As mentioned above, all colors can be made to act on the background instead by prepending the color tag with @, such as @141, @111;222;133 or @#FA72BF. To clear these effects, use /fg for foreground and /bg for background colors.
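
The HEX-to-RGB reduction can be sketched without pytermgui. The sequence builder below uses the standard SGR truecolor codes (38;2 for foreground, 48;2 for background); the function names are illustrative assumptions:

```python
def hex_to_rgb(code: str) -> tuple:
    """Convert a '#RRGGBB' code to an (R, G, B) tuple."""
    code = code.lstrip("#")
    return tuple(int(code[i : i + 2], 16) for i in range(0, 6, 2))

def rgb_sequence(rgb: tuple, background: bool = False) -> str:
    """Build the SGR truecolor escape sequence for an RGB triplet."""
    layer = 48 if background else 38
    return f"\x1b[{layer};2;{rgb[0]};{rgb[1]};{rgb[2]}m"

print(hex_to_rgb("#FA72BF"))  # (250, 114, 191)
print(repr(rgb_sequence(hex_to_rgb("#FA72BF"), background=True)))
```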

MarkupLanguage and instancing

All markup behaviour is handled by an instance of the MarkupLanguage class. This is done partially for organization reasons, but also to allow a sort of sandboxing of custom definitions and settings.

PyTermGUI provides the tim name as the global markup language instance. For historical reasons, the same instance is available as markup. This should be used pretty much all of the time, and custom instances should only ever come about when some security-sensitive macro definitions are needed, as markup is used by every widget, including user-input ones such as InputField.

For the rest of this page, MarkupLanguage will refer to whichever instance you are using.

TL;DR: Always use tim, unless a security concern blocks you from doing so.

Caching

By default, all markup parse results are cached and returned when the same input is given. To disable this behaviour, set the should_cache field of your markup instance (usually tim) to False.
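
The caching behaviour can be sketched as simple per-input memoization gated by a flag. The class and its stand-in parser below are illustrative assumptions, not pytermgui's implementation:

```python
import re

class TinyMarkup:
    """Minimal sketch of parse-result caching gated by should_cache."""

    def __init__(self) -> None:
        self.should_cache = True
        self._cache = {}
        self.parse_calls = 0  # counts actual (non-cached) parses

    def _do_parse(self, text: str) -> str:
        self.parse_calls += 1
        # Stand-in for real parsing: just strip [tags].
        return re.sub(r"\[[^\]]*\]", "", text)

    def parse(self, text: str) -> str:
        if self.should_cache and text in self._cache:
            return self._cache[text]
        result = self._do_parse(text)
        if self.should_cache:
            self._cache[text] = result
        return result

lang = TinyMarkup()
lang.parse("[bold]hi")
lang.parse("[bold]hi")
print(lang.parse_calls)  # the second call was served from the cache
```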

Customization

There are a couple of ways to customize how markup is parsed. Custom tags can be created by calling MarkupLanguage.alias. For defining custom macros, you can use MarkupLanguage.define. For more information, see each method's documentation.
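
The shape of alias and define can be sketched in a few lines of stdlib Python. This mirrors only the described API surface (a tag registry and a "!"-prefixed macro registry), not the real MarkupLanguage class:

```python
class TinyLanguage:
    """Illustrative sketch of the alias/define registries."""

    def __init__(self) -> None:
        self.user_tags = {}
        self.macros = {}

    def alias(self, name: str, value: str) -> None:
        # Register a custom tag name standing for some markup value.
        self.user_tags[name] = value

    def define(self, name: str, method) -> None:
        # Macro names always get a "!" prefix if one is missing.
        if not name.startswith("!"):
            name = f"!{name}"
        self.macros[name] = method

lang = TinyLanguage()
lang.alias("my-tag", "@152 72 bold")
lang.define("upper", str.upper)
print(lang.user_tags["my-tag"], lang.macros["!upper"]("hello"))
```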

View Source
   0"""
   1This module provides `TIM`, PyTermGUI's Terminal Inline Markup language. It is a simple,
   2performant and easy to read way to style, colorize & modify text.
   3
   4Basic rundown
   5-------------
   6
   7TIM is included with the purpose of making styling easier to read and manage.
   8
   9Its syntax is based on square brackets, within which tags are strictly separated by one
  10space character. Tags can stand for colors (xterm-256, RGB or HEX, both background &
  11foreground), styles, unsetters and macros.
  12
  13The 16 simple colors of the terminal exist as named tags that refer to their numerical
  14value.
  15
  16Here is a simple example of the syntax, using the `pytermgui.pretty` submodule to
  17syntax-highlight it inside the REPL:
  18
  19```python3
  20>>> from pytermgui import pretty
  21>>> '[141 @61 bold] Hello [!upper inverse] There '
  22```
  23
  24<p align=center>
  25<img src="https://github.com/bczsalba/pytermgui/blob/master/assets/docs/parser/\
  26simple_example.png?raw=true" width=70%>
  27</p>
  28
  29
  30General syntax
  31--------------
  32
  33Background colors are always denoted by a leading `@` character in front of the color
  34tag. Styles are just the name of the style and macros have an exclamation mark in front
  35of them. Additionally, unsetters use a leading slash (`/`) for their syntax. Color
  36tokens have special unsetters: they use `/fg` to cancel foreground colors, and `/bg` to
  37do so with backgrounds.
  38
  39### Macros:
  40
  41Macros are callables that take at least one positional argument: the plain text
  42enclosed by the tag group within which the given macro resides. Additionally,
  43macros can be given any number of positional arguments from within markup, using the
  44syntax:
  45
  46```
  47[!macro(arg1:arg2:arg3)]Text that the macro applies to.[/!macro]plain text, no macro
  48```
  49
  50This syntax gets parsed as follows:
  51
  52```python3
  53macro("Text that the macro applies to.", "arg1", "arg2", "arg3")
  54```
  55
  56Here, `macro` is whatever callable the name `macro` was defined as beforehand.
  57
  58### Colors:
  59
  60Colors can be of three general types: xterm-256, RGB and HEX.
  61
  62`xterm-256` stands for one of the 256 xterm colors. You can use `ptg -c` to see all
  63of the available colors. Its syntax is just the 0-based index of the color, like `[141]`.
  64
  65`RGB` colors are pretty self-explanatory. Their syntax follows the format
  66`RED;GREEN;BLUE`, with each component between 0 and 255, such as `[111;222;133]`.
  67
  68`HEX` colors are basically just RGB with extra steps. Their syntax is `#RRGGBB`, such as
  69`[#FA72BF]`. This code then gets converted to a tuple of RGB values under the hood, so
  70from then on RGB and HEX colors are treated the same, and emit the same tokens.
  71
  72As mentioned above, all colors can be made to act on the background instead by
  73prepending the color tag with `@`, such as `@141`, `@111;222;133` or `@#FA72BF`. To
  74clear these effects, use `/fg` for foreground and `/bg` for background colors.
  75
  76`MarkupLanguage` and instancing
  77-------------------------------
  78
  79All markup behaviour is handled by an instance of the `MarkupLanguage` class. This is done
  80partially for organization reasons, but also to allow a sort of sandboxing of custom
  81definitions and settings.
  82
  83PyTermGUI provides the `tim` name as the global markup language instance. For historical
  84reasons, the same instance is available as `markup`. This should be used pretty much all
  85of the time, and custom instances should only ever come about when some
  86security-sensitive macro definitions are needed, as `markup` is used by every widget,
  87including user-input ones such as `InputField`.
  88
  89For the rest of this page, `MarkupLanguage` will refer to whichever instance you are
  90using.
  91
  92TL;DR: Always use `tim`, unless a security concern blocks you from doing so.
  93
  94Caching
  95-------
  96
  97By default, all markup parse results are cached and returned when the same input is
  98given. To disable this behaviour, set the `should_cache` field of your markup
  99instance (usually `tim`) to False.
 100
 101Customization
 102-------------
 103
 104There are a couple of ways to customize how markup is parsed. Custom tags can be created
 105by calling `MarkupLanguage.alias`. For defining custom macros, you can use
 106`MarkupLanguage.define`. For more information, see each method's documentation.
 107"""
 108# pylint: disable=too-many-lines
 109
 110from __future__ import annotations
 111
 112from random import shuffle
 113from contextlib import suppress
 114from dataclasses import dataclass
 115from argparse import ArgumentParser
 116from enum import Enum, auto as _auto
 117from typing import Iterator, Callable, Tuple, List
 118
 119from .terminal import get_terminal
 120from .colors import str_to_color, Color
 121from .regex import RE_ANSI, RE_MARKUP, RE_MACRO, RE_LINK
 122from .exceptions import MarkupSyntaxError, ColorSyntaxError, AnsiSyntaxError
 123
 124
 125__all__ = [
 126    "StyledText",
 127    "MacroCallable",
 128    "MacroCall",
 129    "MarkupLanguage",
 130    "markup",
 131    "tim",
 132]
 133
 134MacroCallable = Callable[..., str]
 135MacroCall = Tuple[MacroCallable, List[str]]
 136
 137STYLE_MAP = {
 138    "bold": "1",
 139    "dim": "2",
 140    "italic": "3",
 141    "underline": "4",
 142    "blink": "5",
 143    "blink2": "6",
 144    "inverse": "7",
 145    "invisible": "8",
 146    "strikethrough": "9",
 147    "overline": "53",
 148}
 149
 150UNSETTER_MAP: dict[str, str | None] = {
 151    "/": "0",
 152    "/bold": "22",
 153    "/dim": "22",
 154    "/italic": "23",
 155    "/underline": "24",
 156    "/blink": "25",
 157    "/blink2": "26",
 158    "/inverse": "27",
 159    "/invisible": "28",
 160    "/strikethrough": "29",
 161    "/fg": "39",
 162    "/bg": "49",
 163    "/overline": "54",
 164}
 165
 166
 167def macro_align(width: str, alignment: str, content: str) -> str:
 168    """Aligns given text using fstrings.
 169
 170    Args:
 171        width: The width to align to.
 172        alignment: One of "left", "center", "right".
 173        content: The content to align; implicit argument.
 174    """
 175
 176    aligner = "<" if alignment == "left" else (">" if alignment == "right" else "^")
 177    return f"{content:{aligner}{width}}"
 178
 179
 180def macro_expand(lang: MarkupLanguage, tag: str) -> str:
 181    """Expands a tag alias."""
 182
  183    if tag not in lang.user_tags:
 184        return tag
 185
 186    return lang.get_markup(f"\x1b[{lang.user_tags[tag]}m ")[:-1]
 187
 188
 189def macro_strip_fg(item: str) -> str:
 190    """Strips foreground color from item"""
 191
 192    return markup.parse(f"[/fg]{item}")
 193
 194
 195def macro_strip_bg(item: str) -> str:
  196    """Strips background color from item"""
 197
 198    return markup.parse(f"[/bg]{item}")
 199
 200
 201def macro_shuffle(item: str) -> str:
  202    """Shuffles a string using random.shuffle on its list cast."""
 203
 204    shuffled = list(item)
 205    shuffle(shuffled)
 206
 207    return "".join(shuffled)
 208
 209
 210def macro_link(*args) -> str:
 211    """Creates a clickable hyperlink.
 212
 213    Note:
 214        Since this is a pretty new feature for terminals, its support is limited.
 215    """
 216
 217    *uri_parts, label = args
 218    uri = ":".join(uri_parts)
 219
 220    return f"\x1b]8;;{uri}\x1b\\{label}\x1b]8;;\x1b\\"
 221
 222
 223def _apply_colors(colors: list[str] | list[int], item: str) -> str:
 224    """Applies the given list of colors to the item, spread out evenly."""
 225
 226    blocksize = max(round(len(item) / len(colors)), 1)
 227
 228    out = ""
 229    current_block = 0
 230    for i, char in enumerate(item):
 231        if i % blocksize == 0 and current_block < len(colors):
 232            out += f"[{colors[current_block]}]"
 233            current_block += 1
 234
 235        out += char
 236
 237    return markup.parse(out)
 238
 239
 240def macro_rainbow(item: str) -> str:
 241    """Creates rainbow-colored text."""
 242
 243    colors = ["red", "208", "yellow", "green", "brightblue", "blue", "93"]
 244
 245    return _apply_colors(colors, item)
 246
 247
 248def macro_gradient(base_str: str, item: str) -> str:
 249    """Creates an xterm-256 gradient from a base color.
 250
 251    This exploits the way the colors are arranged in the xterm color table; every
 252    36th color is the next item of a single gradient.
 253
 254    The start of this given gradient is calculated by decreasing the given base by 36 on
 255    every iteration as long as the point is a valid gradient start.
 256
 257    After that, the 6 colors of this gradient are calculated and applied.
 258    """
 259
 260    if not base_str.isdigit():
 261        raise ValueError(f"Gradient base has to be a digit, got {base_str}.")
 262
 263    base = int(base_str)
 264    if base < 16 or base > 231:
  265        raise ValueError("Gradient base must be between 16 and 231")
 266
 267    while base > 52:
 268        base -= 36
 269
 270    colors = []
 271    for i in range(6):
 272        colors.append(base + 36 * i)
 273
 274    return _apply_colors(colors, item)
 275
 276
 277class TokenType(Enum):
 278    """An Enum to store various token types."""
 279
 280    LINK = _auto()
 281    """A terminal hyperlink."""
 282
 283    PLAIN = _auto()
 284    """Plain text, nothing interesting."""
 285
 286    COLOR = _auto()
 287    """A color token. Has a `pytermgui.colors.Color` instance as its data."""
 288
 289    STYLE = _auto()
 290    """A builtin terminal style, such as `bold` or `italic`."""
 291
 292    MACRO = _auto()
 293    """A PTG markup macro. The macro itself is stored inside `self.data`."""
 294
 295    ESCAPED = _auto()
 296    """An escaped token."""
 297
 298    UNSETTER = _auto()
 299    """A token that unsets some other attribute."""
 300
 301    POSITION = _auto()
 302    """A token representing a positioning string. `self.data` follows the format `x,y`."""
 303
 304
 305@dataclass
 306class Token:
 307    """A class holding information on a singular markup or ANSI style unit.
 308
 309    Attributes:
 310    """
 311
 312    ttype: TokenType
 313    """The type of this token."""
 314
 315    data: str | MacroCall | Color | None
 316    """The data contained within this token. This changes based on the `ttype` attr."""
 317
 318    name: str = "<unnamed-token>"
 319    """An optional display name of the token. Defaults to `data` when not given."""
 320
 321    def __post_init__(self) -> None:
 322        """Sets `name` to `data` if not provided."""
 323
 324        if self.name == "<unnamed-token>":
 325            if isinstance(self.data, str):
 326                self.name = self.data
 327
 328            elif isinstance(self.data, Color):
 329                self.name = self.data.name
 330
 331            else:
 332                raise TypeError
 333
 334        # Create LINK from a plain token
 335        if self.ttype is TokenType.PLAIN:
 336            assert isinstance(self.data, str)
 337
 338            link_match = RE_LINK.match(self.data)
 339
 340            if link_match is not None:
 341                self.data, self.name = link_match.groups()
 342                self.ttype = TokenType.LINK
 343
 344        if self.ttype is TokenType.ESCAPED:
 345            assert isinstance(self.data, str)
 346
 347            self.name = self.data[1:]
 348
 349    def __eq__(self, other: object) -> bool:
 350        """Checks equality with `other`."""
 351
 352        if other is None:
 353            return False
 354
 355        if not isinstance(other, type(self)):
 356            return False
 357
 358        return other.data == self.data and other.ttype is self.ttype
 359
 360    @property
 361    def sequence(self) -> str | None:
 362        """Returns the ANSI sequence this token represents."""
 363
 364        if self.data is None:
 365            return None
 366
 367        if self.ttype in [TokenType.PLAIN, TokenType.MACRO, TokenType.ESCAPED]:
 368            return None
 369
 370        if self.ttype is TokenType.LINK:
 371            return macro_link(self.data, self.name)
 372
 373        if self.ttype is TokenType.POSITION:
 374            assert isinstance(self.data, str)
 375            position = self.data.split(",")
 376            return f"\x1b[{position[1]};{position[0]}H"
 377
 378        # Colors and styles
 379        data = self.data
 380
 381        if self.ttype in [TokenType.STYLE, TokenType.UNSETTER]:
 382            return f"\033[{data}m"
 383
 384        assert isinstance(data, Color)
 385        return data.sequence
 386
 387
 388class StyledText(str):
 389    """A styled text object.
 390
 391    The purpose of this class is to implement some things regular `str`
 392    breaks at when encountering ANSI sequences.
 393
 394    Instances of this class are usually spat out by `MarkupLanguage.parse`,
 395    but may be manually constructed if the need arises. Everything works even
 396    if there is no ANSI tomfoolery going on.
 397    """
 398
 399    value: str
 400    """The underlying, ANSI-inclusive string value."""
 401
 402    _plain: str | None = None
 403    _tokens: list[Token] | None = None
 404
 405    def __new__(cls, value: str = ""):
 406        """Creates a StyledText, gets markup tags."""
 407
 408        obj = super().__new__(cls, value)
 409        obj.value = value
 410
 411        return obj
 412
 413    def _generate_tokens(self) -> None:
 414        """Generates self._tokens & self._plain."""
 415
 416        self._tokens = list(tim.tokenize_ansi(self.value))
 417
 418        self._plain = ""
 419        for token in self._tokens:
 420            if token.ttype is not TokenType.PLAIN:
 421                continue
 422
 423            assert isinstance(token.data, str)
 424            self._plain += token.data
 425
 426    @property
 427    def tokens(self) -> list[Token]:
 428        """Returns all markup tokens of this object.
 429
 430        Generated on-demand, at the first call to this or the self.plain
 431        property.
 432        """
 433
 434        if self._tokens is not None:
 435            return self._tokens
 436
 437        self._generate_tokens()
 438        assert self._tokens is not None
 439        return self._tokens
 440
 441    @property
 442    def plain(self) -> str:
 443        """Returns the value of this object, with no ANSI sequences.
 444
 445        Generated on-demand, at the first call to this or the self.tokens
 446        property.
 447        """
 448
 449        if self._plain is not None:
 450            return self._plain
 451
 452        self._generate_tokens()
 453        assert self._plain is not None
 454        return self._plain
 455
 456    def plain_index(self, index: int | None) -> int | None:
 457        """Finds given index inside plain text."""
 458
 459        if index is None:
 460            return None
 461
 462        styled_chars = 0
 463        plain_chars = 0
 464        negative_index = False
 465
 466        tokens = self.tokens.copy()
 467        if index < 0:
 468            tokens.reverse()
 469            index = abs(index)
 470            negative_index = True
 471
 472        for token in tokens:
 473            if token.data is None:
 474                continue
 475
 476            if token.ttype is not TokenType.PLAIN:
 477                assert token.sequence is not None
 478                styled_chars += len(token.sequence)
 479                continue
 480
 481            assert isinstance(token.data, str)
 482            for _ in range(len(token.data)):
 483                if plain_chars == index:
 484                    if negative_index:
 485                        return -1 * (plain_chars + styled_chars)
 486
 487                    return styled_chars + plain_chars
 488
 489                plain_chars += 1
 490
 491        return None
 492
 493    def __len__(self) -> int:
 494        """Gets "real" length of object."""
 495
 496        return len(self.plain)
 497
 498    def __getitem__(self, subscript: int | slice) -> str:
 499        """Gets an item, adjusted for non-plain text.
 500
 501        Args:
 502            subscript: The integer or slice to find.
 503
 504        Returns:
 505            The elements described by the subscript.
 506
 507        Raises:
 508            IndexError: The given index is out of range.
 509        """
 510
 511        if isinstance(subscript, int):
 512            plain_index = self.plain_index(subscript)
 513            if plain_index is None:
 514                raise IndexError("StyledText index out of range")
 515
 516            return self.value[plain_index]
 517
 518        return self.value[
 519            slice(
 520                self.plain_index(subscript.start),
 521                self.plain_index(subscript.stop),
 522                subscript.step,
 523            )
 524        ]
 525
 526
 527class MarkupLanguage:
 528    """A class representing an instance of a Markup Language.
 529
 530    This class is used for all markup/ANSI parsing, tokenizing and usage.
 531
 532    ```python3
 533    from pytermgui import tim
 534
 535    tim.alias("my-tag", "@152 72 bold")
 536    tim.print("This is [my-tag]my-tag[/]!")
 537    ```
 538
 539    <p style="text-align: center">
 540        <img src="https://raw.githubusercontent.com/bczsalba/pytermgui/master/assets/\
 541docs/parser/markup_language.png"
 542        style="width: 80%">
 543    </p>
 544    """
 545
 546    raise_unknown_markup: bool = False
 547    """Raise `pytermgui.exceptions.MarkupSyntaxError` when encountering unknown markup tags."""
 548
 549    def __init__(self, default_macros: bool = True) -> None:
 550        """Initializes a MarkupLanguage.
 551
 552        Args:
 553            default_macros: If not set, the builtin macros are not defined.
 554        """
 555
 556        self.tags: dict[str, str] = STYLE_MAP.copy()
 557        self._cache: dict[str, StyledText] = {}
 558        self.macros: dict[str, MacroCallable] = {}
 559        self.user_tags: dict[str, str] = {}
 560        self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy()
 561
 562        self.should_cache: bool = True
 563
 564        if default_macros:
 565            self.define("!link", macro_link)
 566            self.define("!align", macro_align)
 567            self.define("!markup", self.get_markup)
 568            self.define("!shuffle", macro_shuffle)
 569            self.define("!strip_bg", macro_strip_bg)
 570            self.define("!strip_fg", macro_strip_fg)
 571            self.define("!rainbow", macro_rainbow)
 572            self.define("!gradient", macro_gradient)
 573            self.define("!upper", lambda item: str(item.upper()))
 574            self.define("!lower", lambda item: str(item.lower()))
 575            self.define("!title", lambda item: str(item.title()))
 576            self.define("!capitalize", lambda item: str(item.capitalize()))
 577            self.define("!expand", lambda tag: macro_expand(self, tag))
 578            self.define("!debug", lambda *args: ",".join(ascii(arg) for arg in args))
 579
 580        self.alias("code", "dim @black")
 581        self.alias("code.str", "142")
 582        self.alias("code.none", "167")
 583        self.alias("code.global", "214")
 584        self.alias("code.number", "175")
 585        self.alias("code.keyword", "203")
 586        self.alias("code.identifier", "109")
 587        self.alias("code.name", "code.global")
 588        self.alias("code.comment", "240 italic")
 589        self.alias("code.builtin", "code.global")
 590        self.alias("code.file", "code.identifier")
 591        self.alias("code.symbol", "code.identifier")
 592
 593    def _get_color_token(self, tag: str) -> Token | None:
 594        """Tries to get a color token from the given tag.
 595
 596        Args:
 597            tag: The tag to parse.
 598
 599        Returns:
 600            A color token if the given tag could be parsed into one, else None.
 601        """
 602
 603        try:
 604            color = str_to_color(tag, use_cache=self.should_cache)
 605
 606        except ColorSyntaxError:
 607            return None
 608
 609        return Token(name=color.value, ttype=TokenType.COLOR, data=color)
 610
 611    def _get_style_token(self, tag: str) -> Token | None:
 612        """Tries to get a style (including unsetter) token from tags, user tags and unsetters.
 613
 614        Args:
 615            tag: The tag to parse.
 616
 617        Returns:
 618            A `Token` if one could be created, None otherwise.
 619        """
 620
 621        if tag in self.unsetters:
 622            return Token(name=tag, ttype=TokenType.UNSETTER, data=self.unsetters[tag])
 623
 624        if tag in self.user_tags:
 625            return Token(name=tag, ttype=TokenType.STYLE, data=self.user_tags[tag])
 626
 627        if tag in self.tags:
 628            return Token(name=tag, ttype=TokenType.STYLE, data=self.tags[tag])
 629
 630        return None
 631
 632    def print(self, *args, **kwargs) -> None:
 633        """Parse all arguments and pass them through to print, along with kwargs."""
 634
 635        parsed = []
 636        for arg in args:
 637            parsed.append(self.parse(str(arg)))
 638
 639        get_terminal().print(*parsed, **kwargs)
 640
 641    def tokenize_markup(self, markup_text: str) -> Iterator[Token]:
 642        """Converts the given markup string into an iterator of `Token`.
 643
 644        Args:
 645            markup_text: The text to look at.
 646
 647        Returns:
 648            An iterator of tokens. The reason this is an iterator is to possibly save
 649            on memory.
 650        """
 651
 652        end = 0
 653        start = 0
 654        cursor = 0
 655        for match in RE_MARKUP.finditer(markup_text):
 656            full, escapes, tag_text = match.groups()
 657            start, end = match.span()
 658
 659            # Add plain text between last and current match
 660            if start > cursor:
 661                yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start])
 662
  663            if escapes != "" and len(escapes) % 2 == 1:
 664                cursor = end
 665                yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) - 1 :])
 666                continue
 667
 668            for tag in tag_text.split():
 669                token = self._get_style_token(tag)
 670                if token is not None:
 671                    yield token
 672                    continue
 673
 674                # Try to find a color token
 675                token = self._get_color_token(tag)
 676                if token is not None:
 677                    yield token
 678                    continue
 679
 680                macro_match = RE_MACRO.match(tag)
 681                if macro_match is not None:
 682                    name, args = macro_match.groups()
 683                    macro_args = () if args is None else args.split(":")
 684
  685                    if name not in self.macros:
 686                        raise MarkupSyntaxError(
 687                            tag=tag,
 688                            cause="is not a defined macro",
 689                            context=markup_text,
 690                        )
 691
 692                    yield Token(
 693                        name=tag,
 694                        ttype=TokenType.MACRO,
 695                        data=(self.macros[name], macro_args),
 696                    )
 697                    continue
 698
 699                if self.raise_unknown_markup:
 700                    raise MarkupSyntaxError(
 701                        tag=tag, cause="not defined", context=markup_text
 702                    )
 703
 704            cursor = end
 705
 706        # Add remaining text as plain
 707        if len(markup_text) > cursor:
 708            yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:])
 709
 710    def tokenize_ansi(self, ansi: str) -> Iterator[Token]:
 711        """Converts the given ANSI string into an iterator of `Token`.
 712
 713        Args:
 714            ansi: The text to look at.
 715
 716        Returns:
 717            An iterator of tokens. The reason this is an iterator is to possibly save
 718            on memory.
 719        """
 720
 721        def _is_in_tags(code: str, tags: dict[str, str]) -> str | None:
 722            """Determines whether a code is in the given dict of tags."""
 723
 724            for name, current in tags.items():
 725                if current == code:
 726                    return name
 727
 728            return None
 729
 730        end = 0
 731        start = 0
 732        cursor = 0
 733
 734        # StyledText messes with indexing, so we need to cast it
 735        # back to str.
 736        if isinstance(ansi, StyledText):
 737            ansi = str(ansi)
 738
 739        for match in RE_ANSI.finditer(ansi):
 740            code = match.groups()[0]
 741            start, end = match.span()
 742
 743            if code is None:
 744                continue
 745
 746            parts = code.split(";")
 747
 748            if start > cursor:
 749                plain = ansi[cursor:start]
 750
 751                yield Token(name=plain, ttype=TokenType.PLAIN, data=plain)
 752
 753            name: str | None = code
 754            ttype = None
 755            data: str | Color = parts[0]
 756
 757            # Styles & Unsetters
 758            if len(parts) == 1:
 759                # Covariancy is not an issue here, even though mypy seems to think so.
 760                name = _is_in_tags(parts[0], self.unsetters)  # type: ignore
 761                if name is not None:
 762                    ttype = TokenType.UNSETTER
 763
 764                else:
 765                    name = _is_in_tags(parts[0], self.tags)
 766                    if name is not None:
 767                        ttype = TokenType.STYLE
 768
 769            # Colors
 770            if ttype is None:
 771                with suppress(ColorSyntaxError):
 772                    data = str_to_color(code)
 773                    name = data.name
 774                    ttype = TokenType.COLOR
 775
 776            if name is None or ttype is None or data is None:
 777                if len(parts) != 2:
 778                    raise AnsiSyntaxError(
 779                        tag=parts[0], cause="not recognized", context=ansi
 780                    )
 781
 782                name = "position"
 783                ttype = TokenType.POSITION
 784                data = ",".join(reversed(parts))
 785
 786            yield Token(name=name, ttype=ttype, data=data)
 787            cursor = end
 788
 789        if cursor < len(ansi):
 790            plain = ansi[cursor:]
 791
 792            yield Token(ttype=TokenType.PLAIN, data=plain)
 793
 794    def define(self, name: str, method: MacroCallable) -> None:
 795        """Defines a Macro tag that executes the given method.
 796
 797        Args:
 798            name: The name the given method will be reachable by within markup.
 799                The given value gets "!" prepended if it isn't present already.
 800            method: The method this macro will execute.
 801        """
 802
 803        if not name.startswith("!"):
 804            name = f"!{name}"
 805
 806        self.macros[name] = method
 807        self.unsetters[f"/{name}"] = None
 808
 809    def alias(self, name: str, value: str) -> None:
 810        """Aliases the given name to a value, and generates an unsetter for it.
 811
 812        Note that it is not possible to alias macros.
 813
 814        Args:
 815            name: The name of the new tag.
 816            value: The value the new tag will stand for.
 817        """
 818
 819        def _get_unsetter(token: Token) -> str | None:
 820            """Get unsetter for a token"""
 821
 822            if token.ttype is TokenType.PLAIN:
 823                return None
 824
 825            if token.ttype is TokenType.UNSETTER:
 826                return self.unsetters[token.name]
 827
 828            if token.ttype is TokenType.COLOR:
 829                assert isinstance(token.data, Color)
 830
 831                if token.data.background:
 832                    return self.unsetters["/bg"]
 833
 834                return self.unsetters["/fg"]
 835
 836            name = f"/{token.name}"
 837            if name not in self.unsetters:
 838                raise KeyError(f"Could not find unsetter for token {token}.")
 839
 840            return self.unsetters[name]
 841
 842        if name.startswith("!"):
 843            raise ValueError('Only macro tags can start with "!".')
 844
 845        setter = ""
 846        unsetter = ""
 847
 848        # Try to link to existing tag
 849        if value in self.user_tags:
 850            self.unsetters[f"/{name}"] = self.unsetters[f"/{value}"]
 851            self.user_tags[name] = self.user_tags[value]
 852            return
 853
 854        for token in self.tokenize_markup(f"[{value}]"):
 855            if token.ttype is TokenType.PLAIN:
 856                continue
 857
 858            assert token.sequence is not None
 859            setter += token.sequence
 860
 861            t_unsetter = _get_unsetter(token)
 862            unsetter += f"\x1b[{t_unsetter}m"
 863
 864        self.unsetters[f"/{name}"] = unsetter.lstrip("\x1b[").rstrip("m")
 865        self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m")
 866
 867        marked: list[str] = []
 868        for item in self._cache:
 869            if name in item:
 870                marked.append(item)
 871
 872        for item in marked:
 873            del self._cache[item]
 874
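When the aliased value is itself a known tag, `alias` links the new name to the existing setter and unsetter rather than re-tokenizing the value. A minimal sketch of that linking, using illustrative tag data:

```python
# Illustrative tag tables; the codes are example SGR fragments.
user_tags: dict[str, str] = {"warning": "38;5;208"}
unsetters: dict[str, str] = {"/warning": "39"}

def alias(name: str, value: str) -> None:
    if name.startswith("!"):
        raise ValueError('Only macro tags can start with "!".')

    # Link to an existing tag when possible: the alias copies both
    # the setter sequence and the unsetter of its target.
    if value in user_tags:
        unsetters[f"/{name}"] = unsetters[f"/{value}"]
        user_tags[name] = user_tags[value]

alias("alert", "warning")
```

Note that because the alias copies the resolved codes, later changes to the target tag do not propagate to the alias.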
 875    # TODO: I cannot cut down the one-too-many branch that this has at the moment.
 876    #       We could look into it in the future, however.
 877    def parse(  # pylint: disable=too-many-branches
 878        self, markup_text: str
 879    ) -> StyledText:
 880        """Parses the given markup.
 881
 882        Args:
 883            markup_text: The markup to parse.
 884
 885        Returns:
 886            A `StyledText` instance containing the result of parsing the input.
 887            This custom `str` subclass allows accessing the plain value of the
 888            output, as well as cleanly indexing within it. It behaves like the
 889            builtin `str`, with some extras layered on top.
 890        """
 891
 892        applied_macros: list[tuple[str, MacroCall]] = []
 893        previous_token: Token | None = None
 894        previous_sequence = ""
 895        sequence = ""
 896        out = ""
 897
 898        def _apply_macros(text: str) -> str:
 899            """Apply current macros to text"""
 900
 901            for _, (method, args) in applied_macros:
 902                text = method(*args, text)
 903
 904            return text
 905
 906        def _is_same_colorgroup(previous: Token, new: Token) -> bool:
 907            if not isinstance(new.data, Color) or not isinstance(previous.data, Color):
 908                return False
 909
 910            return previous.data.background == new.data.background and type(
 911                previous.data
 912            ) is type(new.data)
 913
 914        if (
 915            self.should_cache
 916            and markup_text in self._cache
 917            and len(RE_MACRO.findall(markup_text)) == 0
 918        ):
 919            return self._cache[markup_text]
 920
 921        token: Token
 922        for token in self.tokenize_markup(markup_text):
 923            if sequence != "" and previous_token == token:
 924                continue
 925
 926            # Optimize out previously added color tokens, as only the most
 927            # recent would be visible anyways.
 928            if (
 929                token.sequence is not None
 930                and previous_token is not None
 931                and _is_same_colorgroup(previous_token, token)
 932            ):
 933                sequence = token.sequence
 934                continue
 935
 936            if token.ttype is TokenType.UNSETTER and token.data == "0":
 937                out += "\033[0m"
 938                sequence = ""
 939                applied_macros = []
 940                continue
 941
 942            previous_token = token
 943
 944            # Macro unsetters are stored with None as their data
 945            if token.data is None and token.ttype is TokenType.UNSETTER:
 946                for item, data in applied_macros.copy():
 947                    macro_match = RE_MACRO.match(item)
 948                    assert macro_match is not None
 949
 950                    macro_name = macro_match.groups()[0]
 951
 952                    if f"/{macro_name}" == token.name:
 953                        applied_macros.remove((item, data))
 954
 955                continue
 956
 957            if token.ttype is TokenType.MACRO:
 958                assert isinstance(token.data, tuple)
 959
 960                applied_macros.append((token.name, token.data))
 961                continue
 962
 963            if token.sequence is None:
 964                applied = sequence
 965                for item in previous_sequence.split("\x1b"):
 966                    if item == "" or item[1:-1] in self.unsetters.values():
 967                        continue
 968
 969                    item = f"\x1b{item}"
 970                    applied = applied.replace(item, "")
 971
 972                out += applied + _apply_macros(token.name)
 973                previous_sequence = sequence
 974                sequence = ""
 975                continue
 976
 977            sequence += token.sequence
 978
 979        if sequence + previous_sequence != "":
 980            out += "\x1b[0m"
 981
 982        out = StyledText(out)
 983        self._cache[markup_text] = out
 984        return out
 985
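The `_apply_macros` helper inside `parse` folds the plain text through each active macro, passing the markup-supplied arguments first and the enclosed text last. A standalone sketch with two illustrative macros:

```python
# Each entry pairs a macro name with (callable, markup-supplied args),
# mirroring the (name, MacroCall) tuples used above.
applied_macros = [
    ("!upper", (lambda text: text.upper(), [])),
    ("!wrap", (lambda left, right, text: f"{left}{text}{right}", ["<", ">"])),
]

def apply_macros(text: str) -> str:
    # Arguments come first, the plain text last: method(*args, text)
    for _name, (method, args) in applied_macros:
        text = method(*args, text)

    return text
```

Here `apply_macros("hello")` first upper-cases the text, then wraps it, yielding `"<HELLO>"`.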
 986    def get_markup(self, ansi: str) -> str:
 987        """Generates markup from ANSI text.
 988
 989        Args:
 990            ansi: The text to get markup from.
 991
 992        Returns:
 993            A markup string that can be parsed to get (visually) the same
 994            result. Note that this conversion is lossy in a way: there are some
 995            details (like macros) that cannot be preserved in an ANSI->Markup->ANSI
 996            conversion.
 997        """
 998
 999        current_tags: list[str] = []
1000        out = ""
1001        for token in self.tokenize_ansi(ansi):
1002            if token.ttype is TokenType.PLAIN:
1003                if len(current_tags) != 0:
1004                    out += "[" + " ".join(current_tags) + "]"
1005
1006                assert isinstance(token.data, str)
1007                out += token.data
1008                current_tags = []
1009                continue
1010
1011            if token.ttype is TokenType.ESCAPED:
1012                assert isinstance(token.data, str)
1013
1014                current_tags.append(token.data)
1015                continue
1016
1017            current_tags.append(token.name)
1018
1019        return out
1020
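`get_markup` buffers tag names until plain text arrives, then flushes them as a single `[...]` group. The same strategy can be shown on simplified `(kind, value)` token pairs, which stand in for the real `Token` objects:

```python
def tags_to_markup(tokens: list[tuple[str, str]]) -> str:
    """Groups buffered tags into one bracket group per plain-text run."""
    out = ""
    current: list[str] = []

    for kind, value in tokens:
        if kind == "plain":
            # Flush any accumulated tags as one group before the text
            if current:
                out += "[" + " ".join(current) + "]"

            out += value
            current = []
        else:
            current.append(value)

    return out
```

Tags with no following plain text are simply dropped, which is one reason the conversion is lossy.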
1021    def prettify_ansi(self, text: str) -> str:
1022        """Returns a prettified (syntax-highlighted) ANSI str.
1023
1024        This is useful to quickly "inspect" a given ANSI string. However,
1025        for most real uses `MarkupLanguage.prettify_markup`, called with
1026        `MarkupLanguage.get_markup(text)` as its argument, is preferable,
1027        as its output is much more descriptive.
1028
1029        Args:
1030            text: The ANSI-text to prettify.
1031
1032        Returns:
1033            The prettified ANSI text. This text's styles remain valid,
1034            so copy-pasting the argument into a command (like printf)
1035            that can show styled text will work the same way.
1036        """
1037
1038        out = ""
1039        sequences = ""
1040        for token in self.tokenize_ansi(text):
1041            if token.ttype is TokenType.PLAIN:
1042                assert isinstance(token.data, str)
1043                out += token.data
1044                continue
1045
1046            assert token.sequence is not None
1047            out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b")
1048            sequences += token.sequence
1049            out += sequences
1050
1051        return out
1052
1053    def prettify_markup(self, text: str) -> str:
1054        """Returns a prettified (syntax-highlighted) markup str.
1055
1056        Args:
1057            text: The markup-text to prettify.
1058
1059        Returns:
1060            Prettified markup. This markup, excluding its styles,
1061            remains valid markup.
1062        """
1063
1064        def _apply_macros(text: str) -> str:
1065            """Apply current macros to text"""
1066
1067            for _, (method, args) in applied_macros:
1068                text = method(*args, text)
1069
1070            return text
1071
1072        def _pop_macro(name: str) -> None:
1073            """Pops a macro from applied_macros."""
1074
1075            for i, (macro_name, _) in enumerate(applied_macros):
1076                if macro_name == name:
1077                    applied_macros.pop(i)
1078                    break
1079
1080        def _finish(out: str, in_sequence: bool) -> str:
1081            """Adds ending cap to the given string."""
1082
1083            if in_sequence:
1084                if not out.endswith("\x1b[0m"):
1085                    out += "\x1b[0m"
1086
1087                return out + "]"
1088
1089            return out + "[/]"
1090
1091        styles: dict[TokenType, str] = {
1092            TokenType.MACRO: "210",
1093            TokenType.ESCAPED: "210 bold",
1094            TokenType.UNSETTER: "strikethrough",
1095        }
1096
1097        applied_macros: list[tuple[str, MacroCall]] = []
1098
1099        out = ""
1100        in_sequence = False
1101        current_styles: list[Token] = []
1102
1103        for token in self.tokenize_markup(text):
1104            if token.ttype in [TokenType.PLAIN, TokenType.ESCAPED]:
1105                if in_sequence:
1106                    out += "]"
1107
1108                in_sequence = False
1109
1110                sequence = ""
1111                for style in current_styles:
1112                    if style.sequence is None:
1113                        continue
1114
1115                    sequence += style.sequence
1116
1117                out += f"{sequence}{_apply_macros(token.name)}\033[0m"
1118                continue
1119
1120            out += " " if in_sequence else "["
1121            in_sequence = True
1122
1123            if token.ttype is TokenType.UNSETTER:
1124                if token.name == "/":
1125                    applied_macros = []
1126
1127                name = token.name[1:]
1128
1129                if name in self.macros:
1130                    _pop_macro(name)
1131
1132                current_styles.append(token)
1133
1134                out += self.parse(
1135                    ("" if (name in self.tags) or (name in self.user_tags) else "")
1136                    + f"[{styles[TokenType.UNSETTER]}]/{name}"
1137                )
1138                continue
1139
1140            if token.ttype is TokenType.MACRO:
1141                assert isinstance(token.data, tuple)
1142
1143                name = token.name
1144                if "(" in name:
1145                    name = name[: token.name.index("(")]
1146
1147                applied_macros.append((name, token.data))
1148
1149                try:
1150                    out += token.data[0](*token.data[1], token.name)
1151                    continue
1152
1153                except TypeError:  # Not enough arguments
1154                    pass
1155
1156            if token.sequence is not None:
1157                current_styles.append(token)
1158
1159            style_markup = styles.get(token.ttype) or token.name
1160            out += self.parse(f"[{style_markup}]{token.name}")
1161
1162        return _finish(out, in_sequence)
1163
1164    def get_styled_plains(self, text: str) -> Iterator[StyledText]:
1165        """Gets all plain tokens within text, with their respective styles applied.
1166
1167        Args:
1168            text: The ANSI-sequence containing string to find plains from.
1169
1170        Returns:
1171            An iterator of `StyledText` objects, each yielded when a new plain token is found,
1172            containing the styles that are relevant and active on the given plain.
1173        """
1174
1175        def _apply_styles(styles: list[Token], text: str) -> str:
1176            """Applies given styles to text."""
1177
1178            for token in styles:
1179                if token.ttype is TokenType.MACRO:
1180                    assert isinstance(token.data, tuple)
1181                    text = token.data[0](*token.data[1], text)
1182                    continue
1183
1184                if token.sequence is None:
1185                    continue
1186
1187                text = token.sequence + text
1188
1189            return text
1190
1191        def _pop_unsetter(token: Token, styles: list[Token]) -> list[Token]:
1192            """Removes an unsetter from the list, returns the new list."""
1193
1194            if token.name == "/":
1195                return []
1196
1197            target_name = token.name[1:]
1198            for style in styles:
1199                # bold & dim unsetters represent the same character, so we have
1200                # to treat them the same way.
1201                style_name = style.name
1202
1203                if style.name == "dim":
1204                    style_name = "bold"
1205
1206                if style_name == target_name:
1207                    styles.remove(style)
1208
1209                elif (
1210                    style_name.startswith(target_name)
1211                    and style.ttype is TokenType.MACRO
1212                ):
1213                    styles.remove(style)
1214
1215                elif style.ttype is TokenType.COLOR:
1216                    assert isinstance(style.data, Color)
1217                    if target_name == "fg" and not style.data.background:
1218                        styles.remove(style)
1219
1220                    elif target_name == "bg" and style.data.background:
1221                        styles.remove(style)
1222
1223            return styles
1224
1225        styles: list[Token] = []
1226        for token in self.tokenize_ansi(text):
1227            if token.ttype is TokenType.COLOR:
1228                for i, style in enumerate(reversed(styles)):
1229                    if style.ttype is TokenType.COLOR:
1230                        assert isinstance(style.data, Color)
1231                        assert isinstance(token.data, Color)
1232
1233                        if style.data.background != token.data.background:
1234                            continue
1235
1236                        styles[len(styles) - i - 1] = token
1237                        break
1238                else:
1239                    styles.append(token)
1240
1241                continue
1242
1243            if token.ttype is TokenType.LINK:
1244                styles.append(token)
1245                yield StyledText(_apply_styles(styles, token.name))
1246
1247            if token.ttype is TokenType.PLAIN:
1248                assert isinstance(token.data, str)
1249                yield StyledText(_apply_styles(styles, token.data))
1250                continue
1251
1252            if token.ttype is TokenType.UNSETTER:
1253                styles = _pop_unsetter(token, styles)
1254                continue
1255
1256            styles.append(token)
1257
1258
1259def main() -> None:
1260    """Main method"""
1261
1262    parser = ArgumentParser()
1263
1264    markup_group = parser.add_argument_group("Markup->ANSI")
1265    markup_group.add_argument(
1266        "-p", "--parse", metavar=("TXT"), help="parse a markup text"
1267    )
1268    markup_group.add_argument(
1269        "-e", "--escape", help="escape parsed markup", action="store_true"
1270    )
1271    # markup_group.add_argument(
1272    # "-o",
1273    # "--optimize",
1274    # help="set optimization level for markup parsing",
1275    # action="count",
1276    # default=0,
1277    # )
1278
1279    markup_group.add_argument("--alias", action="append", help="alias src=dst")
1280
1281    ansi_group = parser.add_argument_group("ANSI->Markup")
1282    ansi_group.add_argument(
1283        "-m", "--markup", metavar=("TXT"), help="get markup from ANSI text"
1284    )
1285    ansi_group.add_argument(
1286        "-s",
1287        "--show-inverse",
1288        action="store_true",
1289        help="show result of parsing result markup",
1290    )
1291
1292    args = parser.parse_args()
1293
1294    lang = MarkupLanguage()
1295
1296    if args.markup:
1297        markup_text = lang.get_markup(args.markup)
1298        print(markup_text, end="")
1299
1300        if args.show_inverse:
1301            print("->", lang.parse(markup_text))
1302        else:
1303            print()
1304
1305    if args.parse:
1306        if args.alias:
1307            for alias in args.alias:
1308                src, dest = alias.split("=")
1309                lang.alias(src, dest)
1310
1311        parsed = lang.parse(args.parse)
1312
1313        if args.escape:
1314            print(ascii(parsed))
1315        else:
1316            print(parsed)
1317
1318        return
1319
1320
1321tim = markup = MarkupLanguage()
1322"""The default TIM instances."""
1323
1324if __name__ == "__main__":
1325    main()
#   class StyledText(builtins.str):
389class StyledText(str):
390    """A styled text object.
391
392    The purpose of this class is to implement some things regular `str`
393    breaks at when encountering ANSI sequences.
394
395    Instances of this class are usually spat out by `MarkupLanguage.parse`,
396    but may be manually constructed if the need arises. Everything works even
397    if there is no ANSI tomfoolery going on.
398    """
399
400    value: str
401    """The underlying, ANSI-inclusive string value."""
402
403    _plain: str | None = None
404    _tokens: list[Token] | None = None
405
406    def __new__(cls, value: str = ""):
407        """Creates a StyledText, gets markup tags."""
408
409        obj = super().__new__(cls, value)
410        obj.value = value
411
412        return obj
413
414    def _generate_tokens(self) -> None:
415        """Generates self._tokens & self._plain."""
416
417        self._tokens = list(tim.tokenize_ansi(self.value))
418
419        self._plain = ""
420        for token in self._tokens:
421            if token.ttype is not TokenType.PLAIN:
422                continue
423
424            assert isinstance(token.data, str)
425            self._plain += token.data
426
427    @property
428    def tokens(self) -> list[Token]:
429        """Returns all markup tokens of this object.
430
431        Generated on-demand, at the first call to this or the self.plain
432        property.
433        """
434
435        if self._tokens is not None:
436            return self._tokens
437
438        self._generate_tokens()
439        assert self._tokens is not None
440        return self._tokens
441
442    @property
443    def plain(self) -> str:
444        """Returns the value of this object, with no ANSI sequences.
445
446        Generated on-demand, at the first call to this or the self.tokens
447        property.
448        """
449
450        if self._plain is not None:
451            return self._plain
452
453        self._generate_tokens()
454        assert self._plain is not None
455        return self._plain
456
457    def plain_index(self, index: int | None) -> int | None:
458        """Finds given index inside plain text."""
459
460        if index is None:
461            return None
462
463        styled_chars = 0
464        plain_chars = 0
465        negative_index = False
466
467        tokens = self.tokens.copy()
468        if index < 0:
469            tokens.reverse()
470            index = abs(index)
471            negative_index = True
472
473        for token in tokens:
474            if token.data is None:
475                continue
476
477            if token.ttype is not TokenType.PLAIN:
478                assert token.sequence is not None
479                styled_chars += len(token.sequence)
480                continue
481
482            assert isinstance(token.data, str)
483            for _ in range(len(token.data)):
484                if plain_chars == index:
485                    if negative_index:
486                        return -1 * (plain_chars + styled_chars)
487
488                    return styled_chars + plain_chars
489
490                plain_chars += 1
491
492        return None
493
494    def __len__(self) -> int:
495        """Gets "real" length of object."""
496
497        return len(self.plain)
498
499    def __getitem__(self, subscript: int | slice) -> str:
500        """Gets an item, adjusted for non-plain text.
501
502        Args:
503            subscript: The integer or slice to find.
504
505        Returns:
506            The elements described by the subscript.
507
508        Raises:
509            IndexError: The given index is out of range.
510        """
511
512        if isinstance(subscript, int):
513            plain_index = self.plain_index(subscript)
514            if plain_index is None:
515                raise IndexError("StyledText index out of range")
516
517            return self.value[plain_index]
518
519        return self.value[
520            slice(
521                self.plain_index(subscript.start),
522                self.plain_index(subscript.stop),
523                subscript.step,
524            )
525        ]

A styled text object.

The purpose of this class is to implement some things regular str breaks at when encountering ANSI sequences.

Instances of this class are usually spat out by MarkupLanguage.parse, but may be manually constructed if the need arises. Everything works even if there is no ANSI tomfoolery going on.
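The problem being solved is easy to demonstrate with a plain `str`: ANSI sequences count toward `len` and indexing. A minimal sketch of the "plain" view, using a hand-rolled regex rather than the module's own `RE_ANSI`:

```python
import re

# Matches CSI-style SGR sequences such as "\x1b[1m" or "\x1b[38;5;141m".
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")

def plain_length(value: str) -> int:
    """Length of the string with ANSI sequences stripped."""
    return len(ANSI_RE.sub("", value))

styled = "\x1b[1mhello\x1b[0m"
```

With this input, `len(styled)` is 13 while `plain_length(styled)` is 5 — the gap `StyledText.__len__` and indexing paper over.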

#   StyledText(value: str = '')

Creates a StyledText, gets markup tags.

#   value: str

The underlying, ANSI-inclusive string value.

#   tokens: list[pytermgui.parser.Token]

Returns all markup tokens of this object.

Generated on-demand, at the first call to this or the self.plain property.

#   plain: str

Returns the value of this object, with no ANSI sequences.

Generated on-demand, at the first call to this or the self.tokens property.

#   def plain_index(self, index: int | None) -> int | None:

Finds given index inside plain text.

#   MacroCallable = typing.Callable[..., str]
#   MacroCall = typing.Tuple[typing.Callable[..., str], typing.List[str]]
#   class MarkupLanguage:
 528class MarkupLanguage:
 529    """A class representing an instance of a Markup Language.
 530
 531    This class is used for all markup/ANSI parsing, tokenizing and usage.
 532
 533    ```python3
 534    from pytermgui import tim
 535
 536    tim.alias("my-tag", "@152 72 bold")
 537    tim.print("This is [my-tag]my-tag[/]!")
 538    ```
 539
 540    <p style="text-align: center">
 541        <img src="https://raw.githubusercontent.com/bczsalba/pytermgui/master/assets/\
 542docs/parser/markup_language.png"
 543        style="width: 80%">
 544    </p>
 545    """
 546
 547    raise_unknown_markup: bool = False
 548    """Raise `pytermgui.exceptions.MarkupSyntaxError` when encountering unknown markup tags."""
 549
 550    def __init__(self, default_macros: bool = True) -> None:
 551        """Initializes a MarkupLanguage.
 552
 553        Args:
 554            default_macros: If False, the builtin macros are not defined.
 555        """
 556
 557        self.tags: dict[str, str] = STYLE_MAP.copy()
 558        self._cache: dict[str, StyledText] = {}
 559        self.macros: dict[str, MacroCallable] = {}
 560        self.user_tags: dict[str, str] = {}
 561        self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy()
 562
 563        self.should_cache: bool = True
 564
 565        if default_macros:
 566            self.define("!link", macro_link)
 567            self.define("!align", macro_align)
 568            self.define("!markup", self.get_markup)
 569            self.define("!shuffle", macro_shuffle)
 570            self.define("!strip_bg", macro_strip_bg)
 571            self.define("!strip_fg", macro_strip_fg)
 572            self.define("!rainbow", macro_rainbow)
 573            self.define("!gradient", macro_gradient)
 574            self.define("!upper", lambda item: str(item.upper()))
 575            self.define("!lower", lambda item: str(item.lower()))
 576            self.define("!title", lambda item: str(item.title()))
 577            self.define("!capitalize", lambda item: str(item.capitalize()))
 578            self.define("!expand", lambda tag: macro_expand(self, tag))
 579            self.define("!debug", lambda *args: ",".join(ascii(arg) for arg in args))
 580
 581        self.alias("code", "dim @black")
 582        self.alias("code.str", "142")
 583        self.alias("code.none", "167")
 584        self.alias("code.global", "214")
 585        self.alias("code.number", "175")
 586        self.alias("code.keyword", "203")
 587        self.alias("code.identifier", "109")
 588        self.alias("code.name", "code.global")
 589        self.alias("code.comment", "240 italic")
 590        self.alias("code.builtin", "code.global")
 591        self.alias("code.file", "code.identifier")
 592        self.alias("code.symbol", "code.identifier")
 593
 594    def _get_color_token(self, tag: str) -> Token | None:
 595        """Tries to get a color token from the given tag.
 596
 597        Args:
 598            tag: The tag to parse.
 599
 600        Returns:
 601            A color token if the given tag could be parsed into one, else None.
 602        """
 603
 604        try:
 605            color = str_to_color(tag, use_cache=self.should_cache)
 606
 607        except ColorSyntaxError:
 608            return None
 609
 610        return Token(name=color.value, ttype=TokenType.COLOR, data=color)
 611
 612    def _get_style_token(self, tag: str) -> Token | None:
 613        """Tries to get a style (including unsetter) token from tags, user tags and unsetters.
 614
 615        Args:
 616            tag: The tag to parse.
 617
 618        Returns:
 619            A `Token` if one could be created, None otherwise.
 620        """
 621
 622        if tag in self.unsetters:
 623            return Token(name=tag, ttype=TokenType.UNSETTER, data=self.unsetters[tag])
 624
 625        if tag in self.user_tags:
 626            return Token(name=tag, ttype=TokenType.STYLE, data=self.user_tags[tag])
 627
 628        if tag in self.tags:
 629            return Token(name=tag, ttype=TokenType.STYLE, data=self.tags[tag])
 630
 631        return None
 632
 633    def print(self, *args, **kwargs) -> None:
 634        """Parse all arguments and pass them through to print, along with kwargs."""
 635
 636        parsed = []
 637        for arg in args:
 638            parsed.append(self.parse(str(arg)))
 639
 640        get_terminal().print(*parsed, **kwargs)
 641
 642    def tokenize_markup(self, markup_text: str) -> Iterator[Token]:
 643        """Converts the given markup string into an iterator of `Token`.
 644
 645        Args:
 646            markup_text: The text to look at.
 647
 648        Returns:
 649            An iterator of tokens. The reason this is an iterator is to possibly save
 650            on memory.
 651        """
 652
 653        end = 0
 654        start = 0
 655        cursor = 0
 656        for match in RE_MARKUP.finditer(markup_text):
 657            full, escapes, tag_text = match.groups()
 658            start, end = match.span()
 659
 660            # Add plain text between last and current match
 661            if start > cursor:
 662                yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start])
 663
 664            if escapes != "" and len(escapes) % 2 == 1:
 665                cursor = end
 666                yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) - 1 :])
 667                continue
 668
 669            for tag in tag_text.split():
 670                token = self._get_style_token(tag)
 671                if token is not None:
 672                    yield token
 673                    continue
 674
 675                # Try to find a color token
 676                token = self._get_color_token(tag)
 677                if token is not None:
 678                    yield token
 679                    continue
 680
 681                macro_match = RE_MACRO.match(tag)
 682                if macro_match is not None:
 683                    name, args = macro_match.groups()
 684                    macro_args = () if args is None else args.split(":")
 685
 686                    if name not in self.macros:
 687                        raise MarkupSyntaxError(
 688                            tag=tag,
 689                            cause="is not a defined macro",
 690                            context=markup_text,
 691                        )
 692
 693                    yield Token(
 694                        name=tag,
 695                        ttype=TokenType.MACRO,
 696                        data=(self.macros[name], macro_args),
 697                    )
 698                    continue
 699
 700                if self.raise_unknown_markup:
 701                    raise MarkupSyntaxError(
 702                        tag=tag, cause="not defined", context=markup_text
 703                    )
 704
 705            cursor = end
 706
 707        # Add remaining text as plain
 708        if len(markup_text) > cursor:
 709            yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:])
 710
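The tokenizer alternates between plain text and bracketed tag groups, splitting tags on single spaces. Leaving escapes and tag classification aside, the core loop looks like this (the pattern is a simplification of the module's `RE_MARKUP`):

```python
import re

# Simplified markup pattern: a bracket group with no nested brackets.
MARKUP_RE = re.compile(r"\[([^\]]+)\]")

def tokenize(markup: str) -> list[tuple[str, str]]:
    """Yields ("plain", text) and ("tag", name) pairs, in document order."""
    tokens: list[tuple[str, str]] = []
    cursor = 0

    for match in MARKUP_RE.finditer(markup):
        start, end = match.span()

        # Plain text between the previous group and this one
        if start > cursor:
            tokens.append(("plain", markup[cursor:start]))

        # Tags within a group are strictly space-separated
        for tag in match.group(1).split():
            tokens.append(("tag", tag))

        cursor = end

    # Trailing plain text after the last group
    if cursor < len(markup):
        tokens.append(("plain", markup[cursor:]))

    return tokens
```

In the real method each tag is then resolved, in order, as a style, a color, or a macro before falling through to the unknown-markup error path.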
 711    def tokenize_ansi(self, ansi: str) -> Iterator[Token]:
 712        """Converts the given ANSI string into an iterator of `Token`.
 713
 714        Args:
 715            ansi: The text to look at.
 716
 717        Returns:
 718            An iterator of tokens. The reason this is an iterator is to possibly save
 719            on memory.
 720        """
 721
 722        def _is_in_tags(code: str, tags: dict[str, str]) -> str | None:
 723            """Determines whether a code is in the given dict of tags."""
 724
 725            for name, current in tags.items():
 726                if current == code:
 727                    return name
 728
 729            return None
 730
 731        end = 0
 732        start = 0
 733        cursor = 0
 734
 735        # StyledText messes with indexing, so we need to cast it
 736        # back to str.
 737        if isinstance(ansi, StyledText):
 738            ansi = str(ansi)
 739
 740        for match in RE_ANSI.finditer(ansi):
 741            code = match.groups()[0]
 742            start, end = match.span()
 743
 744            if code is None:
 745                continue
 746
 747            parts = code.split(";")
 748
 749            if start > cursor:
 750                plain = ansi[cursor:start]
 751
 752                yield Token(name=plain, ttype=TokenType.PLAIN, data=plain)
 753
 754            name: str | None = code
 755            ttype = None
 756            data: str | Color = parts[0]
 757
 758            # Styles & Unsetters
 759            if len(parts) == 1:
 760                # Covariance is not an issue here, even though mypy seems to think so.
 761                name = _is_in_tags(parts[0], self.unsetters)  # type: ignore
 762                if name is not None:
 763                    ttype = TokenType.UNSETTER
 764
 765                else:
 766                    name = _is_in_tags(parts[0], self.tags)
 767                    if name is not None:
 768                        ttype = TokenType.STYLE
 769
 770            # Colors
 771            if ttype is None:
 772                with suppress(ColorSyntaxError):
 773                    data = str_to_color(code)
 774                    name = data.name
 775                    ttype = TokenType.COLOR
 776
 777            if name is None or ttype is None or data is None:
 778                if len(parts) != 2:
 779                    raise AnsiSyntaxError(
 780                        tag=parts[0], cause="not recognized", context=ansi
 781                    )
 782
 783                name = "position"
 784                ttype = TokenType.POSITION
 785                data = ",".join(reversed(parts))
 786
 787            yield Token(name=name, ttype=ttype, data=data)
 788            cursor = end
 789
 790        if cursor < len(ansi):
 791            plain = ansi[cursor:]
 792
 793            yield Token(ttype=TokenType.PLAIN, data=plain)
 794
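When `tokenize_ansi` meets a two-part code that is neither a style nor a color, it assumes a cursor-position sequence and stores the payload with its coordinates swapped, since ANSI uses `row;col` order. The conversion, sketched standalone (`position_data` is a hypothetical name):

```python
def position_data(code: str) -> str:
    """Turns an ANSI cursor-position payload ("row;col") into the
    "col,row" form stored on POSITION tokens."""
    parts = code.split(";")
    if len(parts) != 2:
        raise ValueError(f"{code!r} is not a two-part position payload")
    return ",".join(reversed(parts))
```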
 795    def define(self, name: str, method: MacroCallable) -> None:
 796        """Defines a Macro tag that executes the given method.
 797
 798        Args:
 799            name: The name the given method will be reachable by within markup.
 800                The given value gets "!" prepended if it isn't present already.
 801            method: The method this macro will execute.
 802        """
 803
 804        if not name.startswith("!"):
 805            name = f"!{name}"
 806
 807        self.macros[name] = method
 808        self.unsetters[f"/{name}"] = None
 809
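`define` normalizes the macro name before registering it, so `define("upper", ...)` and `define("!upper", ...)` register the same tag; it also records a `None` unsetter so `/!name` is recognized later. The normalization rule alone (hypothetical helper name):

```python
def normalize_macro_name(name: str) -> str:
    """Prepends "!" to a macro name unless it already starts with one."""
    if not name.startswith("!"):
        name = f"!{name}"
    return name
```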
 810    def alias(self, name: str, value: str) -> None:
 811        """Aliases the given name to a value, and generates an unsetter for it.
 812
 813        Note that it is not possible to alias macros.
 814
 815        Args:
 816            name: The name of the new tag.
 817            value: The value the new tag will stand for.
 818        """
 819
 820        def _get_unsetter(token: Token) -> str | None:
 821            """Get unsetter for a token"""
 822
 823            if token.ttype is TokenType.PLAIN:
 824                return None
 825
 826            if token.ttype is TokenType.UNSETTER:
 827                return self.unsetters[token.name]
 828
 829            if token.ttype is TokenType.COLOR:
 830                assert isinstance(token.data, Color)
 831
 832                if token.data.background:
 833                    return self.unsetters["/bg"]
 834
 835                return self.unsetters["/fg"]
 836
 837            name = f"/{token.name}"
 838            if name not in self.unsetters:
 839                raise KeyError(f"Could not find unsetter for token {token}.")
 840
 841            return self.unsetters[name]
 842
 843        if name.startswith("!"):
 844            raise ValueError('Only macro tags can start with "!".')
 845
 846        setter = ""
 847        unsetter = ""
 848
 849        # Try to link to existing tag
 850        if value in self.user_tags:
 851            self.unsetters[f"/{name}"] = self.unsetters[f"/{value}"]
 852            self.user_tags[name] = self.user_tags[value]
 853            return
 854
 855        for token in self.tokenize_markup(f"[{value}]"):
 856            if token.ttype is TokenType.PLAIN:
 857                continue
 858
 859            assert token.sequence is not None
 860            setter += token.sequence
 861
 862            t_unsetter = _get_unsetter(token)
 863            unsetter += f"\x1b[{t_unsetter}m"
 864
 865        self.unsetters[f"/{name}"] = unsetter.lstrip("\x1b[").rstrip("m")
 866        self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m")
 867
 868        marked: list[str] = []
 869        for item in self._cache:
 870            if name in item:
 871                marked.append(item)
 872
 873        for item in marked:
 874            del self._cache[item]
 875
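When the aliased value is itself a user-defined tag, `alias` skips tokenizing and simply links the new name to the existing setter and unsetter, which is how chained aliases like `code.name` -> `code.global` work. That shortcut, sketched with plain dicts (the tag values here are made up for illustration):

```python
# Hypothetical stand-ins for self.user_tags and self.unsetters:
user_tags = {"error": "38;5;210"}
unsetters = {"/error": "39"}

def link_alias(name: str, value: str) -> bool:
    """Links name to an existing user tag; returns False when the value
    is not a known tag and would need to be tokenized instead."""
    if value not in user_tags:
        return False
    unsetters[f"/{name}"] = unsetters[f"/{value}"]
    user_tags[name] = user_tags[value]
    return True
```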
 876    # TODO: I cannot cut down the one-too-many branch that this has at the moment.
 877    #       We could look into it in the future, however.
 878    def parse(  # pylint: disable=too-many-branches
 879        self, markup_text: str
 880    ) -> StyledText:
 881        """Parses the given markup.
 882
 883        Args:
 884            markup_text: The markup to parse.
 885
 886        Returns:
 887            A `StyledText` instance containing the result of parsing the
 888            input. This custom `str` subclass allows accessing the plain
 889            value of the output, as well as cleanly indexing within it. It
 890            behaves like the builtin `str`, only adding extra functionality.
 891        """
 892
 893        applied_macros: list[tuple[str, MacroCall]] = []
 894        previous_token: Token | None = None
 895        previous_sequence = ""
 896        sequence = ""
 897        out = ""
 898
 899        def _apply_macros(text: str) -> str:
 900            """Apply current macros to text"""
 901
 902            for _, (method, args) in applied_macros:
 903                text = method(*args, text)
 904
 905            return text
 906
 907        def _is_same_colorgroup(previous: Token, new: Token) -> bool:
 908            if not isinstance(new.data, Color) or not isinstance(previous.data, Color):
 909                return False
 910
 911            return previous.data.background == new.data.background and type(
 912                previous
 913            ) is type(new)
 914
 915        if (
 916            self.should_cache
 917            and markup_text in self._cache
 918            and len(RE_MACRO.findall(markup_text)) == 0
 919        ):
 920            return self._cache[markup_text]
 921
 922        token: Token
 923        for token in self.tokenize_markup(markup_text):
 924            if sequence != "" and previous_token == token:
 925                continue
 926
 927            # Optimize out previously added color tokens, as only the most
 928            # recent would be visible anyways.
 929            if (
 930                token.sequence is not None
 931                and previous_token is not None
 932                and _is_same_colorgroup(previous_token, token)
 933            ):
 934                sequence = token.sequence
 935                continue
 936
 937            if token.ttype is TokenType.UNSETTER and token.data == "0":
 938                out += "\033[0m"
 939                sequence = ""
 940                applied_macros = []
 941                continue
 942
 943            previous_token = token
 944
 945            # Macro unsetters are stored with None as their data
 946            if token.data is None and token.ttype is TokenType.UNSETTER:
 947                for item, data in applied_macros.copy():
 948                    macro_match = RE_MACRO.match(item)
 949                    assert macro_match is not None
 950
 951                    macro_name = macro_match.groups()[0]
 952
 953                    if f"/{macro_name}" == token.name:
 954                        applied_macros.remove((item, data))
 955
 956                continue
 957
 958            if token.ttype is TokenType.MACRO:
 959                assert isinstance(token.data, tuple)
 960
 961                applied_macros.append((token.name, token.data))
 962                continue
 963
 964            if token.sequence is None:
 965                applied = sequence
 966                for item in previous_sequence.split("\x1b"):
 967                    if item == "" or item[1:-1] in self.unsetters.values():
 968                        continue
 969
 970                    item = f"\x1b{item}"
 971                    applied = applied.replace(item, "")
 972
 973                out += applied + _apply_macros(token.name)
 974                previous_sequence = sequence
 975                sequence = ""
 976                continue
 977
 978            sequence += token.sequence
 979
 980        if sequence + previous_sequence != "":
 981            out += "\x1b[0m"
 982
 983        out = StyledText(out)
 984        self._cache[markup_text] = out
 985        return out
 986
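Within `parse`, the `_apply_macros` closure runs every currently active macro in the order it appeared in the markup, passing the markup-supplied arguments first and the text last. A minimal reproduction of that loop, with two made-up macros standing in for real ones:

```python
# (name, (callable, markup_args)) pairs, mirroring applied_macros:
applied_macros = [
    ("!upper", (lambda text: text.upper(), ())),
    ("!pad", (lambda width, text: text.center(int(width)), ("11",))),
]

def apply_macros(text: str) -> str:
    """Applies each active macro in order: method(*args, text)."""
    for _, (method, args) in applied_macros:
        text = method(*args, text)
    return text
```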
 987    def get_markup(self, ansi: str) -> str:
 988        """Generates markup from ANSI text.
 989
 990        Args:
 991            ansi: The text to get markup from.
 992
 993        Returns:
 994            A markup string that can be parsed to get (visually) the same
 995            result. Note that this conversion is lossy in a way: there are some
 996            details (like macros) that cannot be preserved in an ANSI->Markup->ANSI
 997            conversion.
 998        """
 999
1000        current_tags: list[str] = []
1001        out = ""
1002        for token in self.tokenize_ansi(ansi):
1003            if token.ttype is TokenType.PLAIN:
1004                if len(current_tags) != 0:
1005                    out += "[" + " ".join(current_tags) + "]"
1006
1007                assert isinstance(token.data, str)
1008                out += token.data
1009                current_tags = []
1010                continue
1011
1012            if token.ttype is TokenType.ESCAPED:
1013                assert isinstance(token.data, str)
1014
1015                current_tags.append(token.data)
1016                continue
1017
1018            current_tags.append(token.name)
1019
1020        return out
1021
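`get_markup` buffers tag names until it reaches a plain token, then flushes them as one space-separated bracket group before the text. The grouping logic, reduced to `(text, is_plain)` pairs in place of real tokens:

```python
def tokens_to_markup(tokens: list[tuple[str, bool]]) -> str:
    """Rebuilds markup from (text, is_plain) pairs: consecutive tag
    names merge into a single [a b c] group before the next plain."""
    current_tags: list[str] = []
    out = ""
    for text, is_plain in tokens:
        if is_plain:
            if current_tags:
                out += "[" + " ".join(current_tags) + "]"
            out += text
            current_tags = []
        else:
            current_tags.append(text)
    return out
```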
1022    def prettify_ansi(self, text: str) -> str:
1023        """Returns a prettified (syntax-highlighted) ANSI str.
1024
1025        This is useful to quickly "inspect" a given ANSI string. However,
1026        for most real uses `MarkupLanguage.prettify_markup`, called with
1027        `MarkupLanguage.get_markup(text)` as its argument, is preferable,
1028        as its output is much more descriptive.
1029
1030        Args:
1031            text: The ANSI-text to prettify.
1032
1033        Returns:
1034            The prettified ANSI text. This text's styles remain valid,
1035            so copy-pasting the argument into a command (like printf)
1036            that can show styled text will work the same way.
1037        """
1038
1039        out = ""
1040        sequences = ""
1041        for token in self.tokenize_ansi(text):
1042            if token.ttype is TokenType.PLAIN:
1043                assert isinstance(token.data, str)
1044                out += token.data
1045                continue
1046
1047            assert token.sequence is not None
1048            out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b")
1049            sequences += token.sequence
1050            out += sequences
1051
1052        return out
1053
1054    def prettify_markup(self, text: str) -> str:
1055        """Returns a prettified (syntax-highlighted) markup str.
1056
1057        Args:
1058            text: The markup-text to prettify.
1059
1060        Returns:
1061            Prettified markup. This markup, excluding its styles,
1062            remains valid markup.
1063        """
1064
1065        def _apply_macros(text: str) -> str:
1066            """Apply current macros to text"""
1067
1068            for _, (method, args) in applied_macros:
1069                text = method(*args, text)
1070
1071            return text
1072
1073        def _pop_macro(name: str) -> None:
1074            """Pops a macro from applied_macros."""
1075
1076            for i, (macro_name, _) in enumerate(applied_macros):
1077                if macro_name == name:
1078                    applied_macros.pop(i)
1079                    break
1080
1081        def _finish(out: str, in_sequence: bool) -> str:
1082            """Adds ending cap to the given string."""
1083
1084            if in_sequence:
1085                if not out.endswith("\x1b[0m"):
1086                    out += "\x1b[0m"
1087
1088                return out + "]"
1089
1090            return out + "[/]"
1091
1092        styles: dict[TokenType, str] = {
1093            TokenType.MACRO: "210",
1094            TokenType.ESCAPED: "210 bold",
1095            TokenType.UNSETTER: "strikethrough",
1096        }
1097
1098        applied_macros: list[tuple[str, MacroCall]] = []
1099
1100        out = ""
1101        in_sequence = False
1102        current_styles: list[Token] = []
1103
1104        for token in self.tokenize_markup(text):
1105            if token.ttype in [TokenType.PLAIN, TokenType.ESCAPED]:
1106                if in_sequence:
1107                    out += "]"
1108
1109                in_sequence = False
1110
1111                sequence = ""
1112                for style in current_styles:
1113                    if style.sequence is None:
1114                        continue
1115
1116                    sequence += style.sequence
1117
1118                out += f"{sequence}{_apply_macros(token.name)}\033[0m"
1119                continue
1120
1121            out += " " if in_sequence else "["
1122            in_sequence = True
1123
1124            if token.ttype is TokenType.UNSETTER:
1125                if token.name == "/":
1126                    applied_macros = []
1127
1128                name = token.name[1:]
1129
1130                if name in self.macros:
1131                    _pop_macro(name)
1132
1133                current_styles.append(token)
1134
1135                out += self.parse(
1136                    ("" if (name in self.tags) or (name in self.user_tags) else "")
1137                    + f"[{styles[TokenType.UNSETTER]}]/{name}"
1138                )
1139                continue
1140
1141            if token.ttype is TokenType.MACRO:
1142                assert isinstance(token.data, tuple)
1143
1144                name = token.name
1145                if "(" in name:
1146                    name = name[: token.name.index("(")]
1147
1148                applied_macros.append((name, token.data))
1149
1150                try:
1151                    out += token.data[0](*token.data[1], token.name)
1152                    continue
1153
1154                except TypeError:  # Not enough arguments
1155                    pass
1156
1157            if token.sequence is not None:
1158                current_styles.append(token)
1159
1160            style_markup = styles.get(token.ttype) or token.name
1161            out += self.parse(f"[{style_markup}]{token.name}")
1162
1163        return _finish(out, in_sequence)
1164
1165    def get_styled_plains(self, text: str) -> Iterator[StyledText]:
1166        """Gets all plain tokens within text, with their respective styles applied.
1167
1168        Args:
1169            text: The ANSI-sequence containing string to find plains from.
1170
1171        Returns:
1172            An iterator of `StyledText` objects, each yielded when a new plain token is found,
1173            containing the styles that are relevant and active on the given plain.
1174        """
1175
1176        def _apply_styles(styles: list[Token], text: str) -> str:
1177            """Applies given styles to text."""
1178
1179            for token in styles:
1180                if token.ttype is TokenType.MACRO:
1181                    assert isinstance(token.data, tuple)
1182                    text = token.data[0](*token.data[1], text)
1183                    continue
1184
1185                if token.sequence is None:
1186                    continue
1187
1188                text = token.sequence + text
1189
1190            return text
1191
1192        def _pop_unsetter(token: Token, styles: list[Token]) -> list[Token]:
1193            """Removes an unsetter from the list, returns the new list."""
1194
1195            if token.name == "/":
1196                return []
1197
1198            target_name = token.name[1:]
1199            for style in styles:
1200                # bold & dim unsetters represent the same character, so we have
1201                # to treat them the same way.
1202                style_name = style.name
1203
1204                if style.name == "dim":
1205                    style_name = "bold"
1206
1207                if style_name == target_name:
1208                    styles.remove(style)
1209
1210                elif (
1211                    style_name.startswith(target_name)
1212                    and style.ttype is TokenType.MACRO
1213                ):
1214                    styles.remove(style)
1215
1216                elif style.ttype is TokenType.COLOR:
1217                    assert isinstance(style.data, Color)
1218                    if target_name == "fg" and not style.data.background:
1219                        styles.remove(style)
1220
1221                    elif target_name == "bg" and style.data.background:
1222                        styles.remove(style)
1223
1224            return styles
1225
1226        styles: list[Token] = []
1227        for token in self.tokenize_ansi(text):
1228            if token.ttype is TokenType.COLOR:
1229                for i, style in enumerate(reversed(styles)):
1230                    if style.ttype is TokenType.COLOR:
1231                        assert isinstance(style.data, Color)
1232                        assert isinstance(token.data, Color)
1233
1234                        if style.data.background != token.data.background:
1235                            continue
1236
1237                        styles[len(styles) - i - 1] = token
1238                        break
1239                else:
1240                    styles.append(token)
1241
1242                continue
1243
1244            if token.ttype is TokenType.LINK:
1245                styles.append(token)
1246                yield StyledText(_apply_styles(styles, token.name))
1247
1248            if token.ttype is TokenType.PLAIN:
1249                assert isinstance(token.data, str)
1250                yield StyledText(_apply_styles(styles, token.data))
1251                continue
1252
1253            if token.ttype is TokenType.UNSETTER:
1254                styles = _pop_unsetter(token, styles)
1255                continue
1256
1257            styles.append(token)
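The `_pop_unsetter` helper above gives the bare `/` unsetter special meaning (clear every active style) and treats `dim` like `bold`, since both are reset by the same code. Its core behavior over plain style names (real tokens carry color data as well):

```python
def pop_unsetter(unsetter: str, styles: list[str]) -> list[str]:
    """Removes the style an unsetter targets; bare "/" clears all.
    "dim" is matched as "bold" because they share a reset code."""
    if unsetter == "/":
        return []
    target = unsetter[1:]
    return [
        style
        for style in styles
        if ("bold" if style == "dim" else style) != target
    ]
```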

A class representing an instance of a Markup Language.

This class is used for all markup/ANSI parsing, tokenizing and usage.

from pytermgui import tim

tim.alias("my-tag", "@152 72 bold")
tim.print("This is [my-tag]my-tag[/]!")

#   MarkupLanguage(default_macros: bool = True)
550    def __init__(self, default_macros: bool = True) -> None:
551        """Initializes a MarkupLanguage.
552
553        Args:
 554            default_macros: If False, the builtin macros are not defined.
555        """
556
557        self.tags: dict[str, str] = STYLE_MAP.copy()
558        self._cache: dict[str, StyledText] = {}
559        self.macros: dict[str, MacroCallable] = {}
560        self.user_tags: dict[str, str] = {}
561        self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy()
562
563        self.should_cache: bool = True
564
565        if default_macros:
566            self.define("!link", macro_link)
567            self.define("!align", macro_align)
568            self.define("!markup", self.get_markup)
569            self.define("!shuffle", macro_shuffle)
570            self.define("!strip_bg", macro_strip_bg)
571            self.define("!strip_fg", macro_strip_fg)
572            self.define("!rainbow", macro_rainbow)
573            self.define("!gradient", macro_gradient)
574            self.define("!upper", lambda item: str(item.upper()))
575            self.define("!lower", lambda item: str(item.lower()))
576            self.define("!title", lambda item: str(item.title()))
577            self.define("!capitalize", lambda item: str(item.capitalize()))
578            self.define("!expand", lambda tag: macro_expand(self, tag))
579            self.define("!debug", lambda *args: ",".join(ascii(arg) for arg in args))
580
581        self.alias("code", "dim @black")
582        self.alias("code.str", "142")
583        self.alias("code.none", "167")
584        self.alias("code.global", "214")
585        self.alias("code.number", "175")
586        self.alias("code.keyword", "203")
587        self.alias("code.identifier", "109")
588        self.alias("code.name", "code.global")
589        self.alias("code.comment", "240 italic")
590        self.alias("code.builtin", "code.global")
591        self.alias("code.file", "code.identifier")
592        self.alias("code.symbol", "code.identifier")

Initializes a MarkupLanguage.

Args
  • default_macros: If False, the builtin macros are not defined.
#   raise_unknown_markup: bool = False

Raise pytermgui.exceptions.MarkupSyntaxError when encountering unknown markup tags.

#   def print(self, *args, **kwargs) -> None:
633    def print(self, *args, **kwargs) -> None:
634        """Parse all arguments and pass them through to print, along with kwargs."""
635
636        parsed = []
637        for arg in args:
638            parsed.append(self.parse(str(arg)))
639
640        get_terminal().print(*parsed, **kwargs)

Parse all arguments and pass them through to print, along with kwargs.

#   def tokenize_markup(self, markup_text: str) -> Iterator[pytermgui.parser.Token]:
642    def tokenize_markup(self, markup_text: str) -> Iterator[Token]:
643        """Converts the given markup string into an iterator of `Token`.
644
645        Args:
646            markup_text: The text to look at.
647
648        Returns:
649            An iterator of tokens. The reason this is an iterator is to possibly save
650            on memory.
651        """
652
653        end = 0
654        start = 0
655        cursor = 0
656        for match in RE_MARKUP.finditer(markup_text):
657            full, escapes, tag_text = match.groups()
658            start, end = match.span()
659
660            # Add plain text between last and current match
661            if start > cursor:
662                yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start])
663
 664            if escapes != "" and len(escapes) % 2 == 1:
665                cursor = end
666                yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) - 1 :])
667                continue
668
669            for tag in tag_text.split():
670                token = self._get_style_token(tag)
671                if token is not None:
672                    yield token
673                    continue
674
675                # Try to find a color token
676                token = self._get_color_token(tag)
677                if token is not None:
678                    yield token
679                    continue
680
681                macro_match = RE_MACRO.match(tag)
682                if macro_match is not None:
683                    name, args = macro_match.groups()
684                    macro_args = () if args is None else args.split(":")
685
 686                    if name not in self.macros:
687                        raise MarkupSyntaxError(
688                            tag=tag,
689                            cause="is not a defined macro",
690                            context=markup_text,
691                        )
692
693                    yield Token(
694                        name=tag,
695                        ttype=TokenType.MACRO,
696                        data=(self.macros[name], macro_args),
697                    )
698                    continue
699
700                if self.raise_unknown_markup:
701                    raise MarkupSyntaxError(
702                        tag=tag, cause="not defined", context=markup_text
703                    )
704
705            cursor = end
706
707        # Add remaining text as plain
708        if len(markup_text) > cursor:
709            yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:])

Converts the given markup string into an iterator of Token.

Args
  • markup_text: The text to look at.
Returns

An iterator of tokens. The reason this is an iterator is to possibly save on memory.

#   def tokenize_ansi(self, ansi: str) -> Iterator[pytermgui.parser.Token]:

Converts the given ANSI string into an iterator of Token.

Args
  • ansi: The text to look at.
Returns

An iterator of tokens. The reason this is an iterator is to possibly save on memory.

#   def define(self, name: str, method: Callable[..., str]) -> None:

Defines a Macro tag that executes the given method.

Args
  • name: The name the given method will be reachable by within markup. The given value gets "!" prepended if it isn't present already.
  • method: The method this macro will execute.
#   def alias(self, name: str, value: str) -> None:

Aliases the given name to a value, and generates an unsetter for it.

Note that it is not possible to alias macros.

Args
  • name: The name of the new tag.
  • value: The value the new tag will stand for.
#   def parse(self, markup_text: str) -> pytermgui.parser.StyledText:
View Source
878    def parse(  # pylint: disable=too-many-branches
879        self, markup_text: str
880    ) -> StyledText:
881        """Parses the given markup.
882
883        Args:
884            markup_text: The markup to parse.
885
886        Returns:
887            A `StyledText` instance of the result of parsing the input. This
888            custom `str` class is used to allow accessing the plain value of
889            the output, as well as to cleanly index within it. It behaves
890            like the builtin `str`, only adding extra functionality on top.
891        """
892
893        applied_macros: list[tuple[str, MacroCall]] = []
894        previous_token: Token | None = None
895        previous_sequence = ""
896        sequence = ""
897        out = ""
898
899        def _apply_macros(text: str) -> str:
900            """Apply current macros to text"""
901
902            for _, (method, args) in applied_macros:
903                text = method(*args, text)
904
905            return text
906
907        def _is_same_colorgroup(previous: Token, new: Token) -> bool:
908            if not isinstance(new.data, Color) or not isinstance(previous.data, Color):
909                return False
910
911            return previous.data.background == new.data.background and type(
912                previous
913            ) is type(new)
914
915        if (
916            self.should_cache
917            and markup_text in self._cache
918            and len(RE_MACRO.findall(markup_text)) == 0
919        ):
920            return self._cache[markup_text]
921
922        token: Token
923        for token in self.tokenize_markup(markup_text):
924            if sequence != "" and previous_token == token:
925                continue
926
927            # Optimize out previously added color tokens, as only the most
928            # recent would be visible anyways.
929            if (
930                token.sequence is not None
931                and previous_token is not None
932                and _is_same_colorgroup(previous_token, token)
933            ):
934                sequence = token.sequence
935                continue
936
937            if token.ttype == TokenType.UNSETTER and token.data == "0":
938                out += "\033[0m"
939                sequence = ""
940                applied_macros = []
941                continue
942
943            previous_token = token
944
945            # Macro unsetters are stored with None as their data
946            if token.data is None and token.ttype is TokenType.UNSETTER:
947                for item, data in applied_macros.copy():
948                    macro_match = RE_MACRO.match(item)
949                    assert macro_match is not None
950
951                    macro_name = macro_match.groups()[0]
952
953                    if f"/{macro_name}" == token.name:
954                        applied_macros.remove((item, data))
955
956                continue
957
958            if token.ttype is TokenType.MACRO:
959                assert isinstance(token.data, tuple)
960
961                applied_macros.append((token.name, token.data))
962                continue
963
964            if token.sequence is None:
965                applied = sequence
966                for item in previous_sequence.split("\x1b"):
967                    if item == "" or item[1:-1] in self.unsetters.values():
968                        continue
969
970                    item = f"\x1b{item}"
971                    applied = applied.replace(item, "")
972
973                out += applied + _apply_macros(token.name)
974                previous_sequence = sequence
975                sequence = ""
976                continue
977
978            sequence += token.sequence
979
980        if sequence + previous_sequence != "":
981            out += "\x1b[0m"
982
983        out = StyledText(out)
984        self._cache[markup_text] = out
985        return out

Parses the given markup.

Args
  • markup_text: The markup to parse.
Returns

A StyledText instance of the result of parsing the input. This custom str class is used to allow accessing the plain value of the output, as well as to cleanly index within it. It behaves like the builtin str, only adding extra functionality on top.

#   def get_markup(self, ansi: str) -> str:
View Source
 987    def get_markup(self, ansi: str) -> str:
 988        """Generates markup from ANSI text.
 989
 990        Args:
 991            ansi: The text to get markup from.
 992
 993        Returns:
 994            A markup string that can be parsed to get (visually) the same
 995            result. Note that this conversion is lossy: some details
 996            (like macros) cannot be preserved in an ANSI->Markup->ANSI
 997            conversion.
 998        """
 999
1000        current_tags: list[str] = []
1001        out = ""
1002        for token in self.tokenize_ansi(ansi):
1003            if token.ttype is TokenType.PLAIN:
1004                if len(current_tags) != 0:
1005                    out += "[" + " ".join(current_tags) + "]"
1006
1007                assert isinstance(token.data, str)
1008                out += token.data
1009                current_tags = []
1010                continue
1011
1012            if token.ttype is TokenType.ESCAPED:
1013                assert isinstance(token.data, str)
1014
1015                current_tags.append(token.data)
1016                continue
1017
1018            current_tags.append(token.name)
1019
1020        return out

Generates markup from ANSI text.

Args
  • ansi: The text to get markup from.
Returns

A markup string that can be parsed to get (visually) the same result. Note that this conversion is lossy: some details (like macros) cannot be preserved in an ANSI->Markup->ANSI conversion.
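The conversion can be illustrated with a heavily simplified standalone version: collect a tag for each SGR code until plain text arrives, then flush the collected tags as one bracketed group. The `CODE_TO_TAG` mapping is a tiny hypothetical subset — the real tokenizer also understands colors, links and escapes:

```python
import re

# Hypothetical minimal SGR-code-to-tag mapping; illustrative only.
CODE_TO_TAG = {"1": "bold", "3": "italic", "4": "underline", "0": "/"}

def ansi_to_markup(ansi: str) -> str:
    """Collect tags until plain text arrives, then emit one [tag ...] group."""
    out = ""
    tags: list[str] = []
    pos = 0
    for mtch in re.finditer(r"\x1b\[([\d;]+)m", ansi):
        plain = ansi[pos:mtch.start()]
        if plain:
            if tags:
                out += "[" + " ".join(tags) + "]"
                tags = []
            out += plain
        tags.extend(CODE_TO_TAG.get(code, code) for code in mtch.group(1).split(";"))
        pos = mtch.end()
    tail = ansi[pos:]
    if tail:
        if tags:
            out += "[" + " ".join(tags) + "]"
        out += tail
    return out
```

For example, `ansi_to_markup("\x1b[1mhi\x1b[0m done")` yields `"[bold]hi[/] done"`: grouping adjacent codes into one bracket pair is what makes the output readable, and it is also why macro information is unrecoverable — by the time text is ANSI, a macro's effect is indistinguishable from plain text.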

#   def prettify_ansi(self, text: str) -> str:
View Source
1022    def prettify_ansi(self, text: str) -> str:
1023        """Returns a prettified (syntax-highlighted) ANSI str.
1024
1025        This is useful to quickly "inspect" a given ANSI string. However,
1026        for most real uses `MarkupLanguage.prettify_markup` would be
1027        preferable, given an argument of `MarkupLanguage.get_markup(text)`,
1028        as its output is much more detailed.
1029
1030        Args:
1031            text: The ANSI-text to prettify.
1032
1033        Returns:
1034            The prettified ANSI text. This text's styles remain valid,
1035            so copy-pasting the argument into a command (like printf)
1036            that can show styled text will work the same way.
1037        """
1038
1039        out = ""
1040        sequences = ""
1041        for token in self.tokenize_ansi(text):
1042            if token.ttype is TokenType.PLAIN:
1043                assert isinstance(token.data, str)
1044                out += token.data
1045                continue
1046
1047            assert token.sequence is not None
1048            out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b")
1049            sequences += token.sequence
1050            out += sequences
1051
1052        return out

Returns a prettified (syntax-highlighted) ANSI str.

This is useful to quickly "inspect" a given ANSI string. However, for most real uses MarkupLanguage.prettify_markup would be preferable, given an argument of MarkupLanguage.get_markup(text), as its output is much more detailed.

Args
  • text: The ANSI-text to prettify.
Returns

The prettified ANSI text. This text's styles remain valid, so copy-pasting the argument into a command (like printf) that can show styled text will work the same way.
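The core trick in the listing is emitting each sequence twice: once raw, so the terminal still applies the style, and once with the escape byte replaced, so the reader can see it. A minimal sketch of that one step:

```python
def reveal(sequence: str) -> str:
    """Show a sequence both styled (raw copy) and readable (escaped copy)."""
    return sequence + sequence.replace("\x1b", "\\x1b")
```

This is why copy-pasting the output into something like `printf` keeps working: the raw copy of every sequence is still present alongside its escaped, human-readable twin.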

#   def prettify_markup(self, text: str) -> str:
View Source
1054    def prettify_markup(self, text: str) -> str:
1055        """Returns a prettified (syntax-highlighted) markup str.
1056
1057        Args:
1058            text: The markup-text to prettify.
1059
1060        Returns:
1061            Prettified markup. This markup, excluding its styles,
1062            remains valid markup.
1063        """
1064
1065        def _apply_macros(text: str) -> str:
1066            """Apply current macros to text"""
1067
1068            for _, (method, args) in applied_macros:
1069                text = method(*args, text)
1070
1071            return text
1072
1073        def _pop_macro(name: str) -> None:
1074            """Pops a macro from applied_macros."""
1075
1076            for i, (macro_name, _) in enumerate(applied_macros):
1077                if macro_name == name:
1078                    applied_macros.pop(i)
1079                    break
1080
1081        def _finish(out: str, in_sequence: bool) -> str:
1082            """Adds ending cap to the given string."""
1083
1084            if in_sequence:
1085                if not out.endswith("\x1b[0m"):
1086                    out += "\x1b[0m"
1087
1088                return out + "]"
1089
1090            return out + "[/]"
1091
1092        styles: dict[TokenType, str] = {
1093            TokenType.MACRO: "210",
1094            TokenType.ESCAPED: "210 bold",
1095            TokenType.UNSETTER: "strikethrough",
1096        }
1097
1098        applied_macros: list[tuple[str, MacroCall]] = []
1099
1100        out = ""
1101        in_sequence = False
1102        current_styles: list[Token] = []
1103
1104        for token in self.tokenize_markup(text):
1105            if token.ttype in [TokenType.PLAIN, TokenType.ESCAPED]:
1106                if in_sequence:
1107                    out += "]"
1108
1109                in_sequence = False
1110
1111                sequence = ""
1112                for style in current_styles:
1113                    if style.sequence is None:
1114                        continue
1115
1116                    sequence += style.sequence
1117
1118                out += f"{sequence}{_apply_macros(token.name)}\033[0m"
1119                continue
1120
1121            out += " " if in_sequence else "["
1122            in_sequence = True
1123
1124            if token.ttype is TokenType.UNSETTER:
1125                if token.name == "/":
1126                    applied_macros = []
1127
1128                name = token.name[1:]
1129
1130                if name in self.macros:
1131                    _pop_macro(name)
1132
1133                current_styles.append(token)
1134
1135                out += self.parse(
1136                    ("" if (name in self.tags) or (name in self.user_tags) else "")
1137                    + f"[{styles[TokenType.UNSETTER]}]/{name}"
1138                )
1139                continue
1140
1141            if token.ttype is TokenType.MACRO:
1142                assert isinstance(token.data, tuple)
1143
1144                name = token.name
1145                if "(" in name:
1146                    name = name[: token.name.index("(")]
1147
1148                applied_macros.append((name, token.data))
1149
1150                try:
1151                    out += token.data[0](*token.data[1], token.name)
1152                    continue
1153
1154                except TypeError:  # Not enough arguments
1155                    pass
1156
1157            if token.sequence is not None:
1158                current_styles.append(token)
1159
1160            style_markup = styles.get(token.ttype) or token.name
1161            out += self.parse(f"[{style_markup}]{token.name}")
1162
1163        return _finish(out, in_sequence)

Returns a prettified (syntax-highlighted) markup str.

Args
  • text: The markup-text to prettify.
Returns

Prettified markup. This markup, excluding its styles, remains valid markup.

#   def get_styled_plains(self, text: str) -> Iterator[pytermgui.parser.StyledText]:
View Source
1165    def get_styled_plains(self, text: str) -> Iterator[StyledText]:
1166        """Gets all plain tokens within text, with their respective styles applied.
1167
1168        Args:
1169            text: The ANSI-sequence-containing string to find plains from.
1170
1171        Returns:
1172            An iterator of `StyledText` objects, each yielded when a new plain token is found,
1173            containing the styles that are relevant and active on the given plain.
1174        """
1175
1176        def _apply_styles(styles: list[Token], text: str) -> str:
1177            """Applies given styles to text."""
1178
1179            for token in styles:
1180                if token.ttype is TokenType.MACRO:
1181                    assert isinstance(token.data, tuple)
1182                    text = token.data[0](*token.data[1], text)
1183                    continue
1184
1185                if token.sequence is None:
1186                    continue
1187
1188                text = token.sequence + text
1189
1190            return text
1191
1192        def _pop_unsetter(token: Token, styles: list[Token]) -> list[Token]:
1193            """Removes an unsetter from the list, returns the new list."""
1194
1195            if token.name == "/":
1196                return []
1197
1198            target_name = token.name[1:]
1199            for style in styles:
1200                # bold & dim unsetters represent the same character, so we have
1201                # to treat them the same way.
1202                style_name = style.name
1203
1204                if style.name == "dim":
1205                    style_name = "bold"
1206
1207                if style_name == target_name:
1208                    styles.remove(style)
1209
1210                elif (
1211                    style_name.startswith(target_name)
1212                    and style.ttype is TokenType.MACRO
1213                ):
1214                    styles.remove(style)
1215
1216                elif style.ttype is TokenType.COLOR:
1217                    assert isinstance(style.data, Color)
1218                    if target_name == "fg" and not style.data.background:
1219                        styles.remove(style)
1220
1221                    elif target_name == "bg" and style.data.background:
1222                        styles.remove(style)
1223
1224            return styles
1225
1226        styles: list[Token] = []
1227        for token in self.tokenize_ansi(text):
1228            if token.ttype is TokenType.COLOR:
1229                for i, style in enumerate(reversed(styles)):
1230                    if style.ttype is TokenType.COLOR:
1231                        assert isinstance(style.data, Color)
1232                        assert isinstance(token.data, Color)
1233
1234                        if style.data.background != token.data.background:
1235                            continue
1236
1237                        styles[len(styles) - i - 1] = token
1238                        break
1239                else:
1240                    styles.append(token)
1241
1242                continue
1243
1244            if token.ttype is TokenType.LINK:
1245                styles.append(token)
1246                yield StyledText(_apply_styles(styles, token.name))
1247
1248            if token.ttype is TokenType.PLAIN:
1249                assert isinstance(token.data, str)
1250                yield StyledText(_apply_styles(styles, token.data))
1251                continue
1252
1253            if token.ttype is TokenType.UNSETTER:
1254                styles = _pop_unsetter(token, styles)
1255                continue
1256
1257            styles.append(token)

Gets all plain tokens within text, with their respective styles applied.

Args
  • text: The ANSI-sequence-containing string to find plains from.
Returns

An iterator of StyledText objects, each yielded when a new plain token is found, containing the styles that are relevant and active on the given plain.
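One detail worth pulling out of the listing: when a new color token arrives, the most recent color of the same group (foreground or background) is replaced in place rather than appended, since only the newest one would be visible anyway. A standalone sketch of that bookkeeping — the `ColorTok` type is hypothetical, standing in for the library's color tokens:

```python
from dataclasses import dataclass

@dataclass
class ColorTok:
    value: str
    background: bool

def push_color(styles: list, token: ColorTok) -> None:
    """Replace the most recent color of the same group, else append."""
    for i in range(len(styles) - 1, -1, -1):
        item = styles[i]
        if isinstance(item, ColorTok) and item.background == token.background:
            styles[i] = token
            return
    styles.append(token)

styles: list = []
push_color(styles, ColorTok("red", background=False))
push_color(styles, ColorTok("blue", background=True))
push_color(styles, ColorTok("green", background=False))  # replaces "red"
```

After the three pushes, `styles` holds one foreground color (`green`) and one background color (`blue`): the foreground slot was overwritten in place, preserving the relative order of the surviving tokens.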