pytermgui.parser
================
This module provides `TIM`, PyTermGUI's Terminal Inline Markup language. It is a simple,
performant and easy-to-read way to style, colorize & modify text.
Basic rundown
-------------
TIM is included with the purpose of making styling easier to read and manage.
Its syntax is based on square brackets, within which tags are strictly separated by one
space character. Tags can stand for colors (xterm-256, RGB or HEX, both background &
foreground), styles, unsetters and macros.
The 16 simple colors of the terminal exist as named tags that refer to their numerical value.
Here is a simple example of the syntax, using the `pytermgui.pretty` submodule to
syntax-highlight it inside the REPL:

```python3
>>> from pytermgui import pretty
>>> '[141 @61 bold] Hello [!upper inverse] There '
```
General syntax
--------------
Background colors are always denoted by a leading `@` character in front of the color
tag. Styles are just the name of the style, and macros have an exclamation mark in front
of them. Additionally, unsetters use a leading slash (`/`) for their syntax. Color
tokens have special unsetters: they use `/fg` to cancel foreground colors, and `/bg` to
do so with backgrounds.
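Style tags map directly onto ANSI SGR escape sequences. The sketch below illustrates that mapping with a small excerpt of the module's own style tables (the `style_sequence` helper is illustrative, not part of the library):

```python
# Excerpt of the tables used by the parser: tag name -> SGR code.
STYLE_MAP = {"bold": "1", "italic": "3", "underline": "4", "inverse": "7"}

# Unsetters use the corresponding SGR "off" codes; "/" alone resets everything.
UNSETTER_MAP = {"/": "0", "/bold": "22", "/italic": "23", "/underline": "24"}


def style_sequence(tag: str) -> str:
    """Builds the escape sequence a single style or unsetter tag stands for."""

    code = UNSETTER_MAP[tag] if tag.startswith("/") else STYLE_MAP[tag]
    return f"\x1b[{code}m"


# "[bold]Hello[/bold]" roughly compiles to:
styled = style_sequence("bold") + "Hello" + style_sequence("/bold")
```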
### Macros:
Macros are callables that take at least one positional argument: the plain text
enclosed by the tag group within which the given macro resides, passed as the final
argument. Additionally, macros can be given any number of positional arguments from
within markup, using the syntax:

```
[!macro(arg1:arg2:arg3)]Text that the macro applies to.[/!macro]plain text, no macro
```

This syntax gets parsed as follows:

```python3
macro("arg1", "arg2", "arg3", "Text that the macro applies to.")
```

`macro` here is whatever the name `macro` was defined as prior.
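The application order can be sketched with a hypothetical macro. Both `macro_wrap` and the free-standing `apply_macros` helper below are illustrative, not library names; the helper mirrors how the parser calls each macro with its markup arguments first and the enclosed text last:

```python
def macro_wrap(left: str, right: str, content: str) -> str:
    """Hypothetical macro: wraps content between two delimiters."""

    return f"{left}{content}{right}"


def apply_macros(text: str, applied_macros) -> str:
    """Applies each (method, args) pair to the text, arguments before text."""

    for method, args in applied_macros:
        text = method(*args, text)

    return text


# [!wrap(<:>)]Hello[/!wrap] would be applied roughly like this:
result = apply_macros("Hello", [(macro_wrap, ["<", ">"])])
```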
### Colors:
Colors can be of three general types: xterm-256, RGB and HEX.

`xterm-256` stands for one of the 256 xterm colors. You can use `ptg -c` to see all
of the available colors. Its syntax is simply the 0-based index of the color, like
`[141]`.

`RGB` colors are pretty self-explanatory. Their syntax follows the format
`RED;GREEN;BLUE`, with each component between 0 and 255, such as `[111;222;133]`.

`HEX` colors are RGB colors written in hexadecimal. Their syntax is `#RRGGBB`, such as
`[#FA72BF]`. This code then gets converted to a tuple of RGB values under the hood, so
from then on RGB and HEX colors are treated the same, and emit the same tokens.

As mentioned above, all colors can be made to act on the background instead by
prepending the color tag with `@`, such as `@141`, `@111;222;133` or `@#FA72BF`. To
clear these effects, use `/fg` for foreground and `/bg` for background colors.
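The HEX-to-RGB step and the foreground/background distinction can be sketched as follows. The helper names are illustrative; true-color terminals address foreground with SGR 38 and background with SGR 48, each followed by the `2;r;g;b` subparameters:

```python
def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Converts a #RRGGBB code into an (r, g, b) tuple."""

    code = code.lstrip("#")
    return tuple(int(code[i : i + 2], 16) for i in range(0, 6, 2))


def color_sequence(rgb: tuple[int, int, int], background: bool = False) -> str:
    """Builds a true-color SGR sequence; 38 targets foreground, 48 background."""

    red, green, blue = rgb
    return f"\x1b[{48 if background else 38};2;{red};{green};{blue}m"


# [#FA72BF] and [@#FA72BF] resolve to the same RGB triple; only the
# 38/48 prefix differs.
rgb = hex_to_rgb("#FA72BF")
```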
`MarkupLanguage` and instancing
-------------------------------
All markup behaviour is done by an instance of the `MarkupLanguage` class. This is done
partially for organization reasons, but also to allow a sort of sandboxing of custom
definitions and settings.

PyTermGUI provides the `tim` name as the global markup language instance. For historical
reasons, the same instance is available as `markup`. This should be used pretty much all
of the time, and custom instances should only ever come about when some
security-sensitive macro definitions are needed, as `markup` is used by every widget,
including user-input ones such as `InputField`.

For the rest of this page, `MarkupLanguage` will refer to whichever instance you are
using.
TL;DR: Use `tim` always, unless a security concern blocks you from doing so.
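The sandboxing idea can be pictured with a stripped-down instance model, assuming a `default_macros` switch like the one `MarkupLanguage.__init__` accepts. The `MiniLanguage` class is a toy stand-in, not part of the library:

```python
class MiniLanguage:
    """A toy stand-in for MarkupLanguage, keeping a per-instance macro table."""

    def __init__(self, default_macros: bool = True) -> None:
        self.macros = {}

        if default_macros:
            # The real class registers !upper, !lower, !link and friends here.
            self.macros["!upper"] = str.upper

    def define(self, name: str, method) -> None:
        """Registers a macro; "!" is prepended if it isn't present already."""

        if not name.startswith("!"):
            name = f"!{name}"

        self.macros[name] = method


# A sandboxed instance for untrusted input starts with no macros at all:
sandbox = MiniLanguage(default_macros=False)
```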
Caching
-------
By default, all markup parse results are cached and returned when the same input is
given. To disable this behaviour, set the `should_cache` field of your markup instance
(usually `tim`) to False.
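A minimal sketch of this cache-on-parse behaviour. The `CountingParser` class is illustrative only, and simplifies one detail: the real cache is also bypassed for inputs that contain macros, since those may produce different output on every call:

```python
class CountingParser:
    """A toy parser that counts how many times real parsing work happens."""

    def __init__(self) -> None:
        self.should_cache = True
        self.parse_count = 0
        self._cache = {}

    def parse(self, markup_text: str) -> str:
        if self.should_cache and markup_text in self._cache:
            return self._cache[markup_text]

        self.parse_count += 1
        result = markup_text.upper()  # stand-in for real parsing

        self._cache[markup_text] = result
        return result


parser = CountingParser()
parser.parse("[bold]hi")
parser.parse("[bold]hi")  # served from the cache; no extra work done
```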
Customization
-------------
There are a couple of ways to customize how markup is parsed. Custom tags can be created
by calling `MarkupLanguage.alias`. For defining custom macros, you can use
`MarkupLanguage.define`. For more information, see each method's documentation.
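The two customization hooks can be sketched with a toy tag registry. `MiniRegistry` is illustrative; the real `alias` additionally generates an unsetter for the new tag and invalidates cached parse results that contain it:

```python
class MiniRegistry:
    """A toy model of the alias/define customization hooks."""

    def __init__(self) -> None:
        self.user_tags = {}
        self.macros = {}

    def alias(self, name: str, value: str) -> None:
        """Makes `name` stand for the tags in `value`. Macros cannot be aliased."""

        if name.startswith("!"):
            raise ValueError('Only macro tags can start with "!".')

        self.user_tags[name] = value

    def define(self, name: str, method) -> None:
        """Registers a callable as a macro, reachable in markup as `!name`."""

        if not name.startswith("!"):
            name = f"!{name}"

        self.macros[name] = method


registry = MiniRegistry()
registry.alias("error", "210 bold")
registry.define("reverse", lambda item: item[::-1])
```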
1""" 2This module provides `TIM`, PyTermGUI's Terminal Inline Markup language. It is a simple, 3performant and easy to read way to style, colorize & modify text. 4 5Basic rundown 6------------- 7 8TIM is included with the purpose of making styling easier to read and manage. 9 10Its syntax is based on square brackets, within which tags are strictly separated by one 11space character. Tags can stand for colors (xterm-256, RGB or HEX, both background & 12foreground), styles, unsetters and macros. 13 14The 16 simple colors of the terminal exist as named tags that refer to their numerical 15value. 16 17Here is a simple example of the syntax, using the `pytermgui.pretty` submodule to 18syntax-highlight it inside the REPL: 19 20```python3 21>>> from pytermgui import pretty 22>>> '[141 @61 bold] Hello [!upper inverse] There ' 23``` 24 25<p align=center> 26<img src="https://github.com/bczsalba/pytermgui/blob/master/assets/docs/parser/\ 27simple_example.png?raw=true" width=70%> 28</p> 29 30 31General syntax 32-------------- 33 34Background colors are always denoted by a leading `@` character in front of the color 35tag. Styles are just the name of the style and macros have an exclamation mark in front 36of them. Additionally, unsetters use a leading slash (`/`) for their syntax. Color 37tokens have special unsetters: they use `/fg` to cancel foreground colors, and `/bg` to 38do so with backgrounds. 39 40### Macros: 41 42Macros are any type of callable that take at least *args; this is the value of the plain 43text enclosed by the tag group within which the given macro resides. 
Additionally, 44macros can be given any number of positional arguments from within markup, using the 45syntax: 46 47``` 48[!macro(arg1:arg2:arg3)]Text that the macro applies to.[/!macro]plain text, no macro 49``` 50 51This syntax gets parsed as follows: 52 53```python3 54macro("Text that the macro applies to.", "arg1", "arg2", "arg3") 55``` 56 57`macro` here is whatever the name `macro` was defined as prior. 58 59### Colors: 60 61Colors can be of three general types: xterm-256, RGB and HEX. 62 63`xterm-256` stands for one of the 256 xterm colors. You can use `ptg -c` to see the all 64of the available colors. Its syntax is just the 0-base index of the color, like `[141]` 65 66`RGB` colors are pretty self explanatory. Their syntax is follows the format 67`RED;GREEN;BLUE`, such as `[111;222;333]`. 68 69`HEX` colors are basically just RGB with extra steps. Their syntax is `#RRGGBB`, such as 70`[#FA72BF]`. This code then gets converted to a tuple of RGB colors under the hood, so 71from then on RGB and HEX colors are treated the same, and emit the same tokens. 72 73As mentioned above, all colors can be made to act on the background instead by 74prepending the color tag with `@`, such as `@141`, `@111;222;333` or `@#FA72BF`. To 75clear these effects, use `/fg` for foreground and `/bg` for background colors. 76 77`MarkupLanguage` and instancing 78------------------------------- 79 80All markup behaviour is done by an instance of the `MarkupLanguage` class. This is done 81partially for organization reasons, but also to allow a sort of sandboxing of custom 82definitions and settings. 83 84PyTermGUI provides the `tim` name as the global markup language instance. For historical 85reasons, the same instance is available as `markup`. This should be used pretty much all 86of the time, and custom instances should only ever come about when some 87security-sensitive macro definitions are needed, as `markup` is used by every widget, 88including user-input ones such as `InputField`. 
89 90For the rest of this page, `MarkupLanguage` will refer to whichever instance you are 91using. 92 93TL;DR : Use `tim` always, unless a security concern blocks you from doing so. 94 95Caching 96------- 97 98By default, all markup parse results are cached and returned when the same input is 99given. To disable this behaviour, set your markup instance (usually `markup`)'s 100`should_cache` field to False. 101 102Customization 103------------- 104 105There are a couple of ways to customize how markup is parsed. Custom tags can be created 106by calling `MarkupLanguage.alias`. For defining custom macros, you can use 107`MarkupLanguage.define`. For more information, see each method's documentation. 108""" 109# pylint: disable=too-many-lines 110 111from __future__ import annotations 112 113from random import shuffle 114from contextlib import suppress 115from dataclasses import dataclass 116from argparse import ArgumentParser 117from enum import Enum, auto as _auto 118from typing import Iterator, Callable, Tuple, List 119 120from .terminal import get_terminal 121from .colors import str_to_color, Color, StandardColor 122from .regex import RE_ANSI, RE_MARKUP, RE_MACRO, RE_LINK 123from .exceptions import MarkupSyntaxError, ColorSyntaxError, AnsiSyntaxError 124 125 126__all__ = [ 127 "StyledText", 128 "MacroCallable", 129 "MacroCall", 130 "MarkupLanguage", 131 "markup", 132 "tim", 133] 134 135MacroCallable = Callable[..., str] 136MacroCall = Tuple[MacroCallable, List[str]] 137 138STYLE_MAP = { 139 "bold": "1", 140 "dim": "2", 141 "italic": "3", 142 "underline": "4", 143 "blink": "5", 144 "blink2": "6", 145 "inverse": "7", 146 "invisible": "8", 147 "strikethrough": "9", 148 "overline": "53", 149} 150 151UNSETTER_MAP: dict[str, str | None] = { 152 "/": "0", 153 "/bold": "22", 154 "/dim": "22", 155 "/italic": "23", 156 "/underline": "24", 157 "/blink": "25", 158 "/blink2": "26", 159 "/inverse": "27", 160 "/invisible": "28", 161 "/strikethrough": "29", 162 "/fg": "39", 163 
"/bg": "49", 164 "/overline": "54", 165} 166 167 168def macro_align(width: str, alignment: str, content: str) -> str: 169 """Aligns given text using fstrings. 170 171 Args: 172 width: The width to align to. 173 alignment: One of "left", "center", "right". 174 content: The content to align; implicit argument. 175 """ 176 177 aligner = "<" if alignment == "left" else (">" if alignment == "right" else "^") 178 return f"{content:{aligner}{width}}" 179 180 181def macro_expand(lang: MarkupLanguage, tag: str) -> str: 182 """Expands a tag alias.""" 183 184 if not tag in lang.user_tags: 185 return tag 186 187 return lang.get_markup(f"\x1b[{lang.user_tags[tag]}m ")[:-1] 188 189 190def macro_strip_fg(item: str) -> str: 191 """Strips foreground color from item""" 192 193 return markup.parse(f"[/fg]{item}") 194 195 196def macro_strip_bg(item: str) -> str: 197 """Strips foreground color from item""" 198 199 return markup.parse(f"[/bg]{item}") 200 201 202def macro_shuffle(item: str) -> str: 203 """Shuffles a string using shuffle.shuffle on its list cast.""" 204 205 shuffled = list(item) 206 shuffle(shuffled) 207 208 return "".join(shuffled) 209 210 211def macro_link(*args) -> str: 212 """Creates a clickable hyperlink. 213 214 Note: 215 Since this is a pretty new feature for terminals, its support is limited. 
216 """ 217 218 *uri_parts, label = args 219 uri = ":".join(uri_parts) 220 221 return f"\x1b]8;;{uri}\x1b\\{label}\x1b]8;;\x1b\\" 222 223 224def _apply_colors(colors: list[str] | list[int], item: str) -> str: 225 """Applies the given list of colors to the item, spread out evenly.""" 226 227 blocksize = max(round(len(item) / len(colors)), 1) 228 229 out = "" 230 current_block = 0 231 for i, char in enumerate(item): 232 if i % blocksize == 0 and current_block < len(colors): 233 out += f"[{colors[current_block]}]" 234 current_block += 1 235 236 out += char 237 238 return markup.parse(out) 239 240 241def macro_rainbow(item: str) -> str: 242 """Creates rainbow-colored text.""" 243 244 colors = ["red", "208", "yellow", "green", "brightblue", "blue", "93"] 245 246 return _apply_colors(colors, item) 247 248 249def macro_gradient(base_str: str, item: str) -> str: 250 """Creates an xterm-256 gradient from a base color. 251 252 This exploits the way the colors are arranged in the xterm color table; every 253 36th color is the next item of a single gradient. 254 255 The start of this given gradient is calculated by decreasing the given base by 36 on 256 every iteration as long as the point is a valid gradient start. 257 258 After that, the 6 colors of this gradient are calculated and applied. 259 """ 260 261 if not base_str.isdigit(): 262 raise ValueError(f"Gradient base has to be a digit, got {base_str}.") 263 264 base = int(base_str) 265 if base < 16 or base > 231: 266 raise ValueError("Gradient base must be between 16 and 232") 267 268 while base > 52: 269 base -= 36 270 271 colors = [] 272 for i in range(6): 273 colors.append(base + 36 * i) 274 275 return _apply_colors(colors, item) 276 277 278class TokenType(Enum): 279 """An Enum to store various token types.""" 280 281 LINK = _auto() 282 """A terminal hyperlink.""" 283 284 PLAIN = _auto() 285 """Plain text, nothing interesting.""" 286 287 COLOR = _auto() 288 """A color token. 
Has a `pytermgui.colors.Color` instance as its data.""" 289 290 STYLE = _auto() 291 """A builtin terminal style, such as `bold` or `italic`.""" 292 293 MACRO = _auto() 294 """A PTG markup macro. The macro itself is stored inside `self.data`.""" 295 296 ESCAPED = _auto() 297 """An escaped token.""" 298 299 UNSETTER = _auto() 300 """A token that unsets some other attribute.""" 301 302 POSITION = _auto() 303 """A token representing a positioning string. `self.data` follows the format `x,y`.""" 304 305 306@dataclass 307class Token: 308 """A class holding information on a singular markup or ANSI style unit. 309 310 Attributes: 311 """ 312 313 ttype: TokenType 314 """The type of this token.""" 315 316 data: str | MacroCall | Color | None 317 """The data contained within this token. This changes based on the `ttype` attr.""" 318 319 name: str = "<unnamed-token>" 320 """An optional display name of the token. Defaults to `data` when not given.""" 321 322 def __post_init__(self) -> None: 323 """Sets `name` to `data` if not provided.""" 324 325 if self.name == "<unnamed-token>": 326 if isinstance(self.data, str): 327 self.name = self.data 328 329 elif isinstance(self.data, Color): 330 self.name = self.data.name 331 332 else: 333 raise TypeError 334 335 # Create LINK from a plain token 336 if self.ttype is TokenType.PLAIN: 337 assert isinstance(self.data, str) 338 339 link_match = RE_LINK.match(self.data) 340 341 if link_match is not None: 342 self.data, self.name = link_match.groups() 343 self.ttype = TokenType.LINK 344 345 if self.ttype is TokenType.ESCAPED: 346 assert isinstance(self.data, str) 347 348 self.name = self.data[1:] 349 350 def __eq__(self, other: object) -> bool: 351 """Checks equality with `other`.""" 352 353 if other is None: 354 return False 355 356 if not isinstance(other, type(self)): 357 return False 358 359 return other.data == self.data and other.ttype is self.ttype 360 361 @property 362 def sequence(self) -> str | None: 363 """Returns the ANSI sequence 
this token represents.""" 364 365 if self.data is None: 366 return None 367 368 if self.ttype in [TokenType.PLAIN, TokenType.MACRO, TokenType.ESCAPED]: 369 return None 370 371 if self.ttype is TokenType.LINK: 372 return macro_link(self.data, self.name) 373 374 if self.ttype is TokenType.POSITION: 375 assert isinstance(self.data, str) 376 position = self.data.split(",") 377 return f"\x1b[{position[1]};{position[0]}H" 378 379 # Colors and styles 380 data = self.data 381 382 if self.ttype in [TokenType.STYLE, TokenType.UNSETTER]: 383 return f"\033[{data}m" 384 385 assert isinstance(data, Color) 386 return data.sequence 387 388 389class StyledText(str): 390 """A styled text object. 391 392 The purpose of this class is to implement some things regular `str` 393 breaks at when encountering ANSI sequences. 394 395 Instances of this class are usually spat out by `MarkupLanguage.parse`, 396 but may be manually constructed if the need arises. Everything works even 397 if there is no ANSI tomfoolery going on. 398 """ 399 400 value: str 401 """The underlying, ANSI-inclusive string value.""" 402 403 _plain: str | None = None 404 _tokens: list[Token] | None = None 405 406 def __new__(cls, value: str = ""): 407 """Creates a StyledText, gets markup tags.""" 408 409 obj = super().__new__(cls, value) 410 obj.value = value 411 412 return obj 413 414 def _generate_tokens(self) -> None: 415 """Generates self._tokens & self._plain.""" 416 417 self._tokens = list(tim.tokenize_ansi(self.value)) 418 419 self._plain = "" 420 for token in self._tokens: 421 if token.ttype is not TokenType.PLAIN: 422 continue 423 424 assert isinstance(token.data, str) 425 self._plain += token.data 426 427 @property 428 def tokens(self) -> list[Token]: 429 """Returns all markup tokens of this object. 430 431 Generated on-demand, at the first call to this or the self.plain 432 property. 
433 """ 434 435 if self._tokens is not None: 436 return self._tokens 437 438 self._generate_tokens() 439 assert self._tokens is not None 440 return self._tokens 441 442 @property 443 def plain(self) -> str: 444 """Returns the value of this object, with no ANSI sequences. 445 446 Generated on-demand, at the first call to this or the self.tokens 447 property. 448 """ 449 450 if self._plain is not None: 451 return self._plain 452 453 self._generate_tokens() 454 assert self._plain is not None 455 return self._plain 456 457 def plain_index(self, index: int | None) -> int | None: 458 """Finds given index inside plain text.""" 459 460 if index is None: 461 return None 462 463 styled_chars = 0 464 plain_chars = 0 465 negative_index = False 466 467 tokens = self.tokens.copy() 468 if index < 0: 469 tokens.reverse() 470 index = abs(index) 471 negative_index = True 472 473 for token in tokens: 474 if token.data is None: 475 continue 476 477 if token.ttype is not TokenType.PLAIN: 478 assert token.sequence is not None 479 styled_chars += len(token.sequence) 480 continue 481 482 assert isinstance(token.data, str) 483 for _ in range(len(token.data)): 484 if plain_chars == index: 485 if negative_index: 486 return -1 * (plain_chars + styled_chars) 487 488 return styled_chars + plain_chars 489 490 plain_chars += 1 491 492 return None 493 494 def __len__(self) -> int: 495 """Gets "real" length of object.""" 496 497 return len(self.plain) 498 499 def __getitem__(self, subscript: int | slice) -> str: 500 """Gets an item, adjusted for non-plain text. 501 502 Args: 503 subscript: The integer or slice to find. 504 505 Returns: 506 The elements described by the subscript. 507 508 Raises: 509 IndexError: The given index is out of range. 
510 """ 511 512 if isinstance(subscript, int): 513 plain_index = self.plain_index(subscript) 514 if plain_index is None: 515 raise IndexError("StyledText index out of range") 516 517 return self.value[plain_index] 518 519 return self.value[ 520 slice( 521 self.plain_index(subscript.start), 522 self.plain_index(subscript.stop), 523 subscript.step, 524 ) 525 ] 526 527 528class MarkupLanguage: 529 """A class representing an instance of a Markup Language. 530 531 This class is used for all markup/ANSI parsing, tokenizing and usage. 532 533 ```python3 534 from pytermgui import tim 535 536 tim.alias("my-tag", "@152 72 bold") 537 tim.print("This is [my-tag]my-tag[/]!") 538 ``` 539 540 <p style="text-align: center"> 541 <img src="https://raw.githubusercontent.com/bczsalba/pytermgui/master/assets/\ 542docs/parser/markup_language.png" 543 style="width: 80%"> 544 </p> 545 """ 546 547 raise_unknown_markup: bool = False 548 """Raise `pytermgui.exceptions.MarkupSyntaxError` when encountering unknown markup tags.""" 549 550 def __init__(self, default_macros: bool = True) -> None: 551 """Initializes a MarkupLanguage. 552 553 Args: 554 default_macros: If not set, the builtin macros are not defined. 
555 """ 556 557 self.tags: dict[str, str] = STYLE_MAP.copy() 558 self._cache: dict[str, StyledText] = {} 559 self.macros: dict[str, MacroCallable] = {} 560 self.user_tags: dict[str, str] = {} 561 self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy() 562 563 self.should_cache: bool = True 564 565 if default_macros: 566 self.define("!link", macro_link) 567 self.define("!align", macro_align) 568 self.define("!markup", self.get_markup) 569 self.define("!shuffle", macro_shuffle) 570 self.define("!strip_bg", macro_strip_bg) 571 self.define("!strip_fg", macro_strip_fg) 572 self.define("!rainbow", macro_rainbow) 573 self.define("!gradient", macro_gradient) 574 self.define("!upper", lambda item: str(item.upper())) 575 self.define("!lower", lambda item: str(item.lower())) 576 self.define("!title", lambda item: str(item.title())) 577 self.define("!capitalize", lambda item: str(item.capitalize())) 578 self.define("!expand", lambda tag: macro_expand(self, tag)) 579 self.define("!debug", lambda *args: ",".join(ascii(arg) for arg in args)) 580 581 self.alias("code", "dim @black") 582 self.alias("code.str", "142") 583 self.alias("code.multiline_str", "code.str") 584 self.alias("code.none", "167") 585 self.alias("code.global", "214") 586 self.alias("code.number", "175") 587 self.alias("code.keyword", "203") 588 self.alias("code.identifier", "109") 589 self.alias("code.name", "code.global") 590 self.alias("code.comment", "240 italic") 591 self.alias("code.builtin", "code.global") 592 self.alias("code.file", "code.identifier") 593 self.alias("code.symbol", "code.identifier") 594 595 def _get_color_token(self, tag: str) -> Token | None: 596 """Tries to get a color token from the given tag. 597 598 Args: 599 tag: The tag to parse. 600 601 Returns: 602 A color token if the given tag could be parsed into one, else None. 
603 """ 604 605 try: 606 color = str_to_color(tag, use_cache=self.should_cache) 607 608 except ColorSyntaxError: 609 return None 610 611 return Token(name=color.value, ttype=TokenType.COLOR, data=color) 612 613 def _get_style_token(self, tag: str) -> Token | None: 614 """Tries to get a style (including unsetter) token from tags, user tags and unsetters. 615 616 Args: 617 tag: The tag to parse. 618 619 Returns: 620 A `Token` if one could be created, None otherwise. 621 """ 622 623 if tag in self.unsetters: 624 return Token(name=tag, ttype=TokenType.UNSETTER, data=self.unsetters[tag]) 625 626 if tag in self.user_tags: 627 return Token(name=tag, ttype=TokenType.STYLE, data=self.user_tags[tag]) 628 629 if tag in self.tags: 630 return Token(name=tag, ttype=TokenType.STYLE, data=self.tags[tag]) 631 632 return None 633 634 def print(self, *args, **kwargs) -> None: 635 """Parse all arguments and pass them through to print, along with kwargs.""" 636 637 parsed = [] 638 for arg in args: 639 parsed.append(self.parse(str(arg))) 640 641 get_terminal().print(*parsed, **kwargs) 642 643 def tokenize_markup(self, markup_text: str) -> Iterator[Token]: 644 """Converts the given markup string into an iterator of `Token`. 645 646 Args: 647 markup_text: The text to look at. 648 649 Returns: 650 An iterator of tokens. The reason this is an iterator is to possibly save 651 on memory. 
652 """ 653 654 end = 0 655 start = 0 656 cursor = 0 657 for match in RE_MARKUP.finditer(markup_text): 658 full, escapes, tag_text = match.groups() 659 start, end = match.span() 660 661 # Add plain text between last and current match 662 if start > cursor: 663 yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start]) 664 665 if not escapes == "" and len(escapes) % 2 == 1: 666 cursor = end 667 yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) - 1 :]) 668 continue 669 670 for tag in tag_text.split(): 671 token = self._get_style_token(tag) 672 if token is not None: 673 yield token 674 continue 675 676 # Try to find a color token 677 token = self._get_color_token(tag) 678 if token is not None: 679 yield token 680 continue 681 682 macro_match = RE_MACRO.match(tag) 683 if macro_match is not None: 684 name, args = macro_match.groups() 685 macro_args = () if args is None else args.split(":") 686 687 if not name in self.macros: 688 raise MarkupSyntaxError( 689 tag=tag, 690 cause="is not a defined macro", 691 context=markup_text, 692 ) 693 694 yield Token( 695 name=tag, 696 ttype=TokenType.MACRO, 697 data=(self.macros[name], macro_args), 698 ) 699 continue 700 701 if self.raise_unknown_markup: 702 raise MarkupSyntaxError( 703 tag=tag, cause="not defined", context=markup_text 704 ) 705 706 cursor = end 707 708 # Add remaining text as plain 709 if len(markup_text) > cursor: 710 yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:]) 711 712 def tokenize_ansi(self, ansi: str) -> Iterator[Token]: 713 """Converts the given ANSI string into an iterator of `Token`. 714 715 Args: 716 ansi: The text to look at. 717 718 Returns: 719 An iterator of tokens. The reason this is an iterator is to possibly save 720 on memory. 
721 """ 722 723 def _is_in_tags(code: str, tags: dict[str, str]) -> str | None: 724 """Determines whether a code is in the given dict of tags.""" 725 726 for name, current in tags.items(): 727 if current == code: 728 return name 729 730 return None 731 732 def _generate_color( 733 parts: list[str], code: str 734 ) -> tuple[str, TokenType, Color]: 735 """Generates a color token.""" 736 737 data: Color 738 if len(parts) == 1: 739 data = StandardColor.from_ansi(code) 740 name = data.name 741 ttype = TokenType.COLOR 742 743 else: 744 data = str_to_color(code) 745 name = data.name 746 ttype = TokenType.COLOR 747 748 return name, ttype, data 749 750 end = 0 751 start = 0 752 cursor = 0 753 754 # StyledText messes with indexing, so we need to cast it 755 # back to str. 756 if isinstance(ansi, StyledText): 757 ansi = str(ansi) 758 759 for match in RE_ANSI.finditer(ansi): 760 code = match.groups()[0] 761 start, end = match.span() 762 763 if code is None: 764 continue 765 766 parts = code.split(";") 767 768 if start > cursor: 769 plain = ansi[cursor:start] 770 771 yield Token(name=plain, ttype=TokenType.PLAIN, data=plain) 772 773 name: str | None = code 774 ttype = None 775 data: str | Color = parts[0] 776 777 # Styles & Unsetters 778 if len(parts) == 1: 779 # Covariancy is not an issue here, even though mypy seems to think so. 
780 name = _is_in_tags(parts[0], self.unsetters) # type: ignore 781 if name is not None: 782 ttype = TokenType.UNSETTER 783 784 else: 785 name = _is_in_tags(parts[0], self.tags) 786 if name is not None: 787 ttype = TokenType.STYLE 788 789 # Colors 790 if ttype is None: 791 with suppress(ColorSyntaxError): 792 name, ttype, data = _generate_color(parts, code) 793 794 if name is None or ttype is None or data is None: 795 if len(parts) != 2: 796 raise AnsiSyntaxError( 797 tag=parts[0], cause="not recognized", context=ansi 798 ) 799 800 name = "position" 801 ttype = TokenType.POSITION 802 data = ",".join(reversed(parts)) 803 804 yield Token(name=name, ttype=ttype, data=data) 805 cursor = end 806 807 if cursor < len(ansi): 808 plain = ansi[cursor:] 809 810 yield Token(ttype=TokenType.PLAIN, data=plain) 811 812 def define(self, name: str, method: MacroCallable) -> None: 813 """Defines a Macro tag that executes the given method. 814 815 Args: 816 name: The name the given method will be reachable by within markup. 817 The given value gets "!" prepended if it isn't present already. 818 method: The method this macro will execute. 819 """ 820 821 if not name.startswith("!"): 822 name = f"!{name}" 823 824 self.macros[name] = method 825 self.unsetters[f"/{name}"] = None 826 827 def alias(self, name: str, value: str) -> None: 828 """Aliases the given name to a value, and generates an unsetter for it. 829 830 Note that it is not possible to alias macros. 831 832 Args: 833 name: The name of the new tag. 834 value: The value the new tag will stand for. 
835 """ 836 837 def _get_unsetter(token: Token) -> str | None: 838 """Get unsetter for a token""" 839 840 if token.ttype is TokenType.PLAIN: 841 return None 842 843 if token.ttype is TokenType.UNSETTER: 844 return self.unsetters[token.name] 845 846 if token.ttype is TokenType.COLOR: 847 assert isinstance(token.data, Color) 848 849 if token.data.background: 850 return self.unsetters["/bg"] 851 852 return self.unsetters["/fg"] 853 854 name = f"/{token.name}" 855 if not name in self.unsetters: 856 raise KeyError(f"Could not find unsetter for token {token}.") 857 858 return self.unsetters[name] 859 860 if name.startswith("!"): 861 raise ValueError('Only macro tags can always start with "!".') 862 863 setter = "" 864 unsetter = "" 865 866 # Try to link to existing tag 867 if value in self.user_tags: 868 self.unsetters[f"/{name}"] = self.unsetters[f"/{value}"] 869 self.user_tags[name] = self.user_tags[value] 870 return 871 872 for token in self.tokenize_markup(f"[{value}]"): 873 if token.ttype is TokenType.PLAIN: 874 continue 875 876 assert token.sequence is not None 877 setter += token.sequence 878 879 t_unsetter = _get_unsetter(token) 880 unsetter += f"\x1b[{t_unsetter}m" 881 882 self.unsetters[f"/{name}"] = unsetter.lstrip("\x1b[").rstrip("m") 883 self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m") 884 885 marked: list[str] = [] 886 for item in self._cache: 887 if name in item: 888 marked.append(item) 889 890 for item in marked: 891 del self._cache[item] 892 893 # TODO: I cannot cut down the one-too-many branch that this has at the moment. 894 # We could look into it in the future, however. 895 def parse( # pylint: disable=too-many-branches 896 self, markup_text: str 897 ) -> StyledText: 898 """Parses the given markup. 899 900 Args: 901 markup_text: The markup to parse. 902 903 Returns: 904 A `StyledText` instance of the result of parsing the input. 
```python
        This custom `str` class is used to allow accessing the plain value of
        the output, as well as to cleanly index within it. It is analogous
        to builtin `str`, only adds extra things on top.
        """

        applied_macros: list[tuple[str, MacroCall]] = []
        previous_token: Token | None = None
        previous_sequence = ""
        sequence = ""
        out = ""

        def _apply_macros(text: str) -> str:
            """Apply current macros to text"""

            for _, (method, args) in applied_macros:
                text = method(*args, text)

            return text

        def _is_same_colorgroup(previous: Token, new: Token) -> bool:
            if not isinstance(new.data, Color) or not isinstance(previous.data, Color):
                return False

            return (
                type(previous) is type(new)
                and previous.data.background == new.data.background
            )

        if (
            self.should_cache
            and markup_text in self._cache
            and len(RE_MACRO.findall(markup_text)) == 0
        ):
            return self._cache[markup_text]

        token: Token
        for token in self.tokenize_markup(markup_text):
            if sequence != "" and previous_token == token:
                continue

            # Optimize out previously added color tokens, as only the most
            # recent would be visible anyways.
            if (
                token.sequence is not None
                and previous_token is not None
                and _is_same_colorgroup(previous_token, token)
            ):
                sequence = token.sequence
                continue

            if token.ttype == TokenType.UNSETTER and token.data == "0":
                out += "\033[0m"
                sequence = ""
                applied_macros = []
                continue

            previous_token = token

            # Macro unsetters are stored with None as their data
            if token.data is None and token.ttype is TokenType.UNSETTER:
                for item, data in applied_macros.copy():
                    macro_match = RE_MACRO.match(item)
                    assert macro_match is not None

                    macro_name = macro_match.groups()[0]

                    if f"/{macro_name}" == token.name:
                        applied_macros.remove((item, data))

                continue

            if token.ttype is TokenType.MACRO:
                assert isinstance(token.data, tuple)

                applied_macros.append((token.name, token.data))
                continue

            if token.sequence is None:
                applied = sequence

                if not out.endswith("\x1b[0m"):
                    for item in previous_sequence.split("\x1b"):
                        if item == "" or item[1:-1] in self.unsetters.values():
                            continue

                        item = f"\x1b{item}"
                        applied = applied.replace(item, "")

                out += applied + _apply_macros(token.name)
                previous_sequence = sequence
                sequence = ""
                continue

            sequence += token.sequence

        if sequence + previous_sequence != "":
            out += "\x1b[0m"

        out = StyledText(out)
        self._cache[markup_text] = out
        return out

    def get_markup(self, ansi: str) -> str:
        """Generates markup from ANSI text.

        Args:
            ansi: The text to get markup from.

        Returns:
            A markup string that can be parsed to get (visually) the same
            result. Note that this conversion is lossy in a way: there are some
            details (like macros) that cannot be preserved in an ANSI->Markup->ANSI
            conversion.
        """

        current_tags: list[str] = []
        out = ""
        for token in self.tokenize_ansi(ansi):
            if token.ttype is TokenType.PLAIN:
                if len(current_tags) != 0:
                    out += "[" + " ".join(current_tags) + "]"

                assert isinstance(token.data, str)
                out += token.data
                current_tags = []
                continue

            if token.ttype is TokenType.ESCAPED:
                assert isinstance(token.data, str)

                current_tags.append(token.data)
                continue

            current_tags.append(token.name)

        return out

    def prettify_ansi(self, text: str) -> str:
        """Returns a prettified (syntax-highlighted) ANSI str.

        This is useful to quickly "inspect" a given ANSI string. However,
        for most real uses `MarkupLanguage.prettify_markup` would be
        preferable, given an argument of `MarkupLanguage.get_markup(text)`,
        as it is much more verbose.

        Args:
            text: The ANSI-text to prettify.

        Returns:
            The prettified ANSI text. This text's styles remain valid,
            so copy-pasting the argument into a command (like printf)
            that can show styled text will work the same way.
        """

        out = ""
        sequences = ""
        for token in self.tokenize_ansi(text):
            if token.ttype is TokenType.PLAIN:
                assert isinstance(token.data, str)
                out += token.data
                continue

            assert token.sequence is not None
            out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b")
            sequences += token.sequence
            out += sequences

        return out

    def prettify_markup(self, text: str) -> str:
        """Returns a prettified (syntax-highlighted) markup str.

        Args:
            text: The markup-text to prettify.

        Returns:
            Prettified markup. This markup, excluding its styles,
            remains valid markup.
        """

        def _apply_macros(text: str) -> str:
            """Apply current macros to text"""

            for _, (method, args) in applied_macros:
                text = method(*args, text)

            return text

        def _pop_macro(name: str) -> None:
            """Pops a macro from applied_macros."""

            for i, (macro_name, _) in enumerate(applied_macros):
                if macro_name == name:
                    applied_macros.pop(i)
                    break

        def _finish(out: str, in_sequence: bool) -> str:
            """Adds ending cap to the given string."""

            if in_sequence:
                if not out.endswith("\x1b[0m"):
                    out += "\x1b[0m"

                return out + "]"

            return out + "[/]"

        styles: dict[TokenType, str] = {
            TokenType.MACRO: "210",
            TokenType.ESCAPED: "210 bold",
            TokenType.UNSETTER: "strikethrough",
        }

        applied_macros: list[tuple[str, MacroCall]] = []

        out = ""
        in_sequence = False
        current_styles: list[Token] = []

        for token in self.tokenize_markup(text):
            if token.ttype in [TokenType.PLAIN, TokenType.ESCAPED]:
                if in_sequence:
                    out += "]"

                in_sequence = False

                sequence = ""
                for style in current_styles:
                    if style.sequence is None:
                        continue

                    sequence += style.sequence

                out += f"{sequence}{_apply_macros(token.name)}\033[0m"
                continue

            out += " " if in_sequence else "["
            in_sequence = True

            if token.ttype is TokenType.UNSETTER:
                if token.name == "/":
                    applied_macros = []

                name = token.name[1:]

                if name in self.macros:
                    _pop_macro(name)

                current_styles.append(token)

                out += self.parse(
                    ("" if (name in self.tags) or (name in self.user_tags) else "")
                    + f"[{styles[TokenType.UNSETTER]}]/{name}"
                )
                continue

            if token.ttype is TokenType.MACRO:
                assert isinstance(token.data, tuple)

                name = token.name
                if "(" in name:
                    name = name[: token.name.index("(")]

                applied_macros.append((name, token.data))

                try:
                    out += token.data[0](*token.data[1], token.name)
                    continue

                except TypeError:  # Not enough arguments
                    pass

            if token.sequence is not None:
                current_styles.append(token)

            style_markup = styles.get(token.ttype) or token.name
            out += self.parse(f"[{style_markup}]{token.name}")

        return _finish(out, in_sequence)

    def get_styled_plains(self, text: str) -> Iterator[StyledText]:
        """Gets all plain tokens within text, with their respective styles applied.

        Args:
            text: The ANSI-sequence containing string to find plains from.

        Returns:
            An iterator of `StyledText` objects, each yielded when a new plain
            token is found, containing the styles that are relevant and active
            on the given plain.
        """

        def _apply_styles(styles: list[Token], text: str) -> str:
            """Applies given styles to text."""

            for token in styles:
                if token.ttype is TokenType.MACRO:
                    assert isinstance(token.data, tuple)
                    text = token.data[0](*token.data[1], text)
                    continue

                if token.sequence is None:
                    continue

                text = token.sequence + text

            return text

        def _pop_unsetter(token: Token, styles: list[Token]) -> list[Token]:
            """Removes an unsetter from the list, returns the new list."""

            if token.name == "/":
                return list(filter(lambda tkn: tkn.ttype is TokenType.POSITION, styles))

            target_name = token.name[1:]
            for style in styles:
                # bold & dim unsetters represent the same character, so we have
                # to treat them the same way.
                style_name = style.name

                if style.name == "dim":
                    style_name = "bold"

                if style_name == target_name:
                    styles.remove(style)

                elif (
                    style_name.startswith(target_name)
                    and style.ttype is TokenType.MACRO
                ):
                    styles.remove(style)

                elif style.ttype is TokenType.COLOR:
                    assert isinstance(style.data, Color)
                    if target_name == "fg" and not style.data.background:
                        styles.remove(style)

                    elif target_name == "bg" and style.data.background:
                        styles.remove(style)

            return styles

        def _pop_position(styles: list[Token]) -> list[Token]:
            for token in styles.copy():
                if token.ttype is TokenType.POSITION:
                    styles.remove(token)

            return styles

        styles: list[Token] = []
        for token in self.tokenize_ansi(text):
            if token.ttype is TokenType.COLOR:
                for i, style in enumerate(reversed(styles)):
                    if style.ttype is TokenType.COLOR:
                        assert isinstance(style.data, Color)
                        assert isinstance(token.data, Color)

                        if style.data.background != token.data.background:
                            continue

                        styles[len(styles) - i - 1] = token
                        break
                else:
                    styles.append(token)

                continue

            if token.ttype is TokenType.LINK:
                styles.append(token)
                yield StyledText(_apply_styles(styles, token.name))

            if token.ttype is TokenType.PLAIN:
                assert isinstance(token.data, str)
                yield StyledText(_apply_styles(styles, token.data))
                styles = _pop_position(styles)
                continue

            if token.ttype is TokenType.UNSETTER:
                styles = _pop_unsetter(token, styles)
                continue

            styles.append(token)


def main() -> None:
    """Main method"""

    parser = ArgumentParser()

    markup_group = parser.add_argument_group("Markup->ANSI")
    markup_group.add_argument(
        "-p", "--parse", metavar=("TXT"), help="parse a markup text"
    )
    markup_group.add_argument(
        "-e", "--escape", help="escape parsed markup", action="store_true"
    )
    # markup_group.add_argument(
    #     "-o",
    #     "--optimize",
    #     help="set optimization level for markup parsing",
    #     action="count",
    #     default=0,
    # )

    markup_group.add_argument("--alias", action="append", help="alias src=dst")

    ansi_group = parser.add_argument_group("ANSI->Markup")
    ansi_group.add_argument(
        "-m", "--markup", metavar=("TXT"), help="get markup from ANSI text"
    )
    ansi_group.add_argument(
        "-s",
        "--show-inverse",
        action="store_true",
        help="show result of parsing result markup",
    )

    args = parser.parse_args()

    lang = MarkupLanguage()

    if args.markup:
        markup_text = lang.get_markup(args.markup)
        print(markup_text, end="")

        if args.show_inverse:
            print("->", lang.parse(markup_text))
        else:
            print()

    if args.parse:
        if args.alias:
            for alias in args.alias:
                src, dest = alias.split("=")
                lang.alias(src, dest)

        parsed = lang.parse(args.parse)

        if args.escape:
            print(ascii(parsed))
        else:
            print(parsed)

    return


tim = markup = MarkupLanguage()
"""The default TIM instances."""

if __name__ == "__main__":
    main()
```
```python
class StyledText(str):
    """A styled text object.

    The purpose of this class is to implement some things regular `str`
    breaks at when encountering ANSI sequences.

    Instances of this class are usually spat out by `MarkupLanguage.parse`,
    but may be manually constructed if the need arises. Everything works even
    if there is no ANSI tomfoolery going on.
    """

    value: str
    """The underlying, ANSI-inclusive string value."""

    _plain: str | None = None
    _tokens: list[Token] | None = None

    def __new__(cls, value: str = ""):
        """Creates a StyledText, gets markup tags."""

        obj = super().__new__(cls, value)
        obj.value = value

        return obj

    def _generate_tokens(self) -> None:
        """Generates self._tokens & self._plain."""

        self._tokens = list(tim.tokenize_ansi(self.value))

        self._plain = ""
        for token in self._tokens:
            if token.ttype is not TokenType.PLAIN:
                continue

            assert isinstance(token.data, str)
            self._plain += token.data

    @property
    def tokens(self) -> list[Token]:
        """Returns all markup tokens of this object.

        Generated on-demand, at the first call to this or the self.plain
        property.
        """

        if self._tokens is not None:
            return self._tokens

        self._generate_tokens()
        assert self._tokens is not None
        return self._tokens

    @property
    def plain(self) -> str:
        """Returns the value of this object, with no ANSI sequences.

        Generated on-demand, at the first call to this or the self.tokens
        property.
        """

        if self._plain is not None:
            return self._plain

        self._generate_tokens()
        assert self._plain is not None
        return self._plain

    def plain_index(self, index: int | None) -> int | None:
        """Finds given index inside plain text."""

        if index is None:
            return None

        styled_chars = 0
        plain_chars = 0
        negative_index = False

        tokens = self.tokens.copy()
        if index < 0:
            tokens.reverse()
            index = abs(index)
            negative_index = True

        for token in tokens:
            if token.data is None:
                continue

            if token.ttype is not TokenType.PLAIN:
                assert token.sequence is not None
                styled_chars += len(token.sequence)
                continue

            assert isinstance(token.data, str)
            for _ in range(len(token.data)):
                if plain_chars == index:
                    if negative_index:
                        return -1 * (plain_chars + styled_chars)

                    return styled_chars + plain_chars

                plain_chars += 1

        return None

    def __len__(self) -> int:
        """Gets "real" length of object."""

        return len(self.plain)

    def __getitem__(self, subscript: int | slice) -> str:
        """Gets an item, adjusted for non-plain text.

        Args:
            subscript: The integer or slice to find.

        Returns:
            The elements described by the subscript.

        Raises:
            IndexError: The given index is out of range.
        """

        if isinstance(subscript, int):
            plain_index = self.plain_index(subscript)
            if plain_index is None:
                raise IndexError("StyledText index out of range")

            return self.value[plain_index]

        return self.value[
            slice(
                self.plain_index(subscript.start),
                self.plain_index(subscript.stop),
                subscript.step,
            )
        ]
```
A styled text object.

The purpose of this class is to implement some things regular `str` breaks at when encountering ANSI sequences.

Instances of this class are usually spat out by `MarkupLanguage.parse`, but may be manually constructed if the need arises. Everything works even if there is no ANSI tomfoolery going on.
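The core idea can be sketched without the library: length and indexing should be computed on the ANSI-stripped ("plain") value, not on the raw string that still carries escape sequences. The `strip_ansi` helper and its regex below are stand-ins invented for this example, not part of pytermgui's API:

```python
import re

# Matches CSI escape sequences such as "\x1b[1m" or "\x1b[38;5;141m".
ANSI_PATTERN = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def strip_ansi(value: str) -> str:
    """Removes ANSI sequences, leaving only the visible text."""

    return ANSI_PATTERN.sub("", value)


styled = "\x1b[1m\x1b[38;5;141mHello\x1b[0m"

# Regular `str` counts the invisible escape bytes too:
assert len(styled) > 5

# A StyledText-like "plain" view only counts visible characters:
assert strip_ansi(styled) == "Hello"
assert len(strip_ansi(styled)) == 5
```

`StyledText` builds this plain view from tokens rather than a regex, but the observable behaviour of `__len__` is the same.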
```python
    def __new__(cls, value: str = ""):
        """Creates a StyledText, gets markup tags."""

        obj = super().__new__(cls, value)
        obj.value = value

        return obj
```
Creates a StyledText, gets markup tags.
Returns all markup tokens of this object.
Generated on-demand, at the first call to this or the self.plain property.
Returns the value of this object, with no ANSI sequences.
Generated on-demand, at the first call to this or the self.tokens property.
```python
    def plain_index(self, index: int | None) -> int | None:
        """Finds given index inside plain text."""

        if index is None:
            return None

        styled_chars = 0
        plain_chars = 0
        negative_index = False

        tokens = self.tokens.copy()
        if index < 0:
            tokens.reverse()
            index = abs(index)
            negative_index = True

        for token in tokens:
            if token.data is None:
                continue

            if token.ttype is not TokenType.PLAIN:
                assert token.sequence is not None
                styled_chars += len(token.sequence)
                continue

            assert isinstance(token.data, str)
            for _ in range(len(token.data)):
                if plain_chars == index:
                    if negative_index:
                        return -1 * (plain_chars + styled_chars)

                    return styled_chars + plain_chars

                plain_chars += 1

        return None
```
Finds given index inside plain text.
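The idea behind this method, translating an index in the visible text into an index in the raw, sequence-laden string, can be sketched as a simple character walk. This is an illustrative stand-in (the library walks tokens instead, and also handles negative indices), with names invented for the example:

```python
from __future__ import annotations

import re

ANSI_PATTERN = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def plain_index(value: str, index: int) -> int | None:
    """Maps `index` within the ANSI-stripped text to an index in `value`."""

    plain_seen = 0
    cursor = 0

    while cursor < len(value):
        match = ANSI_PATTERN.match(value, cursor)
        if match is not None:
            # Skip over invisible escape sequences without counting them.
            cursor = match.end()
            continue

        if plain_seen == index:
            return cursor

        plain_seen += 1
        cursor += 1

    return None  # Index out of range


styled = "\x1b[1mHi\x1b[0m!"
# The plain text is "Hi!"; plain index 2 points at "!" in the raw string.
assert styled[plain_index(styled, 2)] == "!"
```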
Inherited Members
- builtins.str: all standard string methods (`encode`, `replace`, `split`, `rsplit`, `join`, `capitalize`, `casefold`, `title`, `center`, `count`, `expandtabs`, `find`, `partition`, `index`, `ljust`, `lower`, `lstrip`, `rfind`, `rindex`, `rjust`, `rstrip`, `rpartition`, `splitlines`, `strip`, `swapcase`, `translate`, `upper`, `startswith`, `endswith`, `removeprefix`, `removesuffix`, `isascii`, `islower`, `isupper`, `istitle`, `isspace`, `isdecimal`, `isdigit`, `isnumeric`, `isalpha`, `isalnum`, `isidentifier`, `isprintable`, `zfill`, `format`, `format_map`, `maketrans`)
````python
class MarkupLanguage:
    """A class representing an instance of a Markup Language.

    This class is used for all markup/ANSI parsing, tokenizing and usage.

    ```python3
    from pytermgui import tim

    tim.alias("my-tag", "@152 72 bold")
    tim.print("This is [my-tag]my-tag[/]!")
    ```

    <p style="text-align: center">
        <img src="https://raw.githubusercontent.com/bczsalba/pytermgui/master/assets/\
docs/parser/markup_language.png"
        style="width: 80%">
    </p>
    """

    raise_unknown_markup: bool = False
    """Raise `pytermgui.exceptions.MarkupSyntaxError` when encountering unknown markup tags."""

    def __init__(self, default_macros: bool = True) -> None:
        """Initializes a MarkupLanguage.

        Args:
            default_macros: If not set, the builtin macros are not defined.
        """

        self.tags: dict[str, str] = STYLE_MAP.copy()
        self._cache: dict[str, StyledText] = {}
        self.macros: dict[str, MacroCallable] = {}
        self.user_tags: dict[str, str] = {}
        self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy()

        self.should_cache: bool = True

        if default_macros:
            self.define("!link", macro_link)
            self.define("!align", macro_align)
            self.define("!markup", self.get_markup)
            self.define("!shuffle", macro_shuffle)
            self.define("!strip_bg", macro_strip_bg)
            self.define("!strip_fg", macro_strip_fg)
            self.define("!rainbow", macro_rainbow)
            self.define("!gradient", macro_gradient)
            self.define("!upper", lambda item: str(item.upper()))
            self.define("!lower", lambda item: str(item.lower()))
            self.define("!title", lambda item: str(item.title()))
            self.define("!capitalize", lambda item: str(item.capitalize()))
            self.define("!expand", lambda tag: macro_expand(self, tag))
            self.define("!debug", lambda *args: ",".join(ascii(arg) for arg in args))

        self.alias("code", "dim @black")
        self.alias("code.str", "142")
        self.alias("code.multiline_str", "code.str")
        self.alias("code.none", "167")
        self.alias("code.global", "214")
        self.alias("code.number", "175")
        self.alias("code.keyword", "203")
        self.alias("code.identifier", "109")
        self.alias("code.name", "code.global")
        self.alias("code.comment", "240 italic")
        self.alias("code.builtin", "code.global")
        self.alias("code.file", "code.identifier")
        self.alias("code.symbol", "code.identifier")

    def _get_color_token(self, tag: str) -> Token | None:
        """Tries to get a color token from the given tag.

        Args:
            tag: The tag to parse.

        Returns:
            A color token if the given tag could be parsed into one, else None.
        """

        try:
            color = str_to_color(tag, use_cache=self.should_cache)

        except ColorSyntaxError:
            return None

        return Token(name=color.value, ttype=TokenType.COLOR, data=color)

    def _get_style_token(self, tag: str) -> Token | None:
        """Tries to get a style (including unsetter) token from tags, user tags and unsetters.

        Args:
            tag: The tag to parse.

        Returns:
            A `Token` if one could be created, None otherwise.
        """

        if tag in self.unsetters:
            return Token(name=tag, ttype=TokenType.UNSETTER, data=self.unsetters[tag])

        if tag in self.user_tags:
            return Token(name=tag, ttype=TokenType.STYLE, data=self.user_tags[tag])

        if tag in self.tags:
            return Token(name=tag, ttype=TokenType.STYLE, data=self.tags[tag])

        return None

    def print(self, *args, **kwargs) -> None:
        """Parse all arguments and pass them through to print, along with kwargs."""

        parsed = []
        for arg in args:
            parsed.append(self.parse(str(arg)))

        get_terminal().print(*parsed, **kwargs)

    def tokenize_markup(self, markup_text: str) -> Iterator[Token]:
        """Converts the given markup string into an iterator of `Token`.

        Args:
            markup_text: The text to look at.

        Returns:
            An iterator of tokens. The reason this is an iterator is to
            possibly save on memory.
        """

        end = 0
        start = 0
        cursor = 0
        for match in RE_MARKUP.finditer(markup_text):
            full, escapes, tag_text = match.groups()
            start, end = match.span()

            # Add plain text between last and current match
            if start > cursor:
                yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start])

            if not escapes == "" and len(escapes) % 2 == 1:
                cursor = end
                yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) - 1 :])
                continue

            for tag in tag_text.split():
                token = self._get_style_token(tag)
                if token is not None:
                    yield token
                    continue

                # Try to find a color token
                token = self._get_color_token(tag)
                if token is not None:
                    yield token
                    continue

                macro_match = RE_MACRO.match(tag)
                if macro_match is not None:
                    name, args = macro_match.groups()
                    macro_args = () if args is None else args.split(":")

                    if not name in self.macros:
                        raise MarkupSyntaxError(
                            tag=tag,
                            cause="is not a defined macro",
                            context=markup_text,
                        )

                    yield Token(
                        name=tag,
                        ttype=TokenType.MACRO,
                        data=(self.macros[name], macro_args),
                    )
                    continue

                if self.raise_unknown_markup:
                    raise MarkupSyntaxError(
                        tag=tag, cause="not defined", context=markup_text
                    )

            cursor = end

        # Add remaining text as plain
        if len(markup_text) > cursor:
            yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:])

    def tokenize_ansi(self, ansi: str) -> Iterator[Token]:
        """Converts the given ANSI string into an iterator of `Token`.

        Args:
            ansi: The text to look at.

        Returns:
            An iterator of tokens. The reason this is an iterator is to
            possibly save on memory.
        """

        def _is_in_tags(code: str, tags: dict[str, str]) -> str | None:
            """Determines whether a code is in the given dict of tags."""

            for name, current in tags.items():
                if current == code:
                    return name

            return None

        def _generate_color(
            parts: list[str], code: str
        ) -> tuple[str, TokenType, Color]:
            """Generates a color token."""

            data: Color
            if len(parts) == 1:
                data = StandardColor.from_ansi(code)
                name = data.name
                ttype = TokenType.COLOR

            else:
                data = str_to_color(code)
                name = data.name
                ttype = TokenType.COLOR

            return name, ttype, data

        end = 0
        start = 0
        cursor = 0

        # StyledText messes with indexing, so we need to cast it
        # back to str.
        if isinstance(ansi, StyledText):
            ansi = str(ansi)

        for match in RE_ANSI.finditer(ansi):
            code = match.groups()[0]
            start, end = match.span()

            if code is None:
                continue

            parts = code.split(";")

            if start > cursor:
                plain = ansi[cursor:start]

                yield Token(name=plain, ttype=TokenType.PLAIN, data=plain)

            name: str | None = code
            ttype = None
            data: str | Color = parts[0]

            # Styles & Unsetters
            if len(parts) == 1:
                # Covariancy is not an issue here, even though mypy seems to think so.
                name = _is_in_tags(parts[0], self.unsetters)  # type: ignore
                if name is not None:
                    ttype = TokenType.UNSETTER

                else:
                    name = _is_in_tags(parts[0], self.tags)
                    if name is not None:
                        ttype = TokenType.STYLE

            # Colors
            if ttype is None:
                with suppress(ColorSyntaxError):
                    name, ttype, data = _generate_color(parts, code)

            if name is None or ttype is None or data is None:
                if len(parts) != 2:
                    raise AnsiSyntaxError(
                        tag=parts[0], cause="not recognized", context=ansi
                    )

                name = "position"
                ttype = TokenType.POSITION
                data = ",".join(reversed(parts))

            yield Token(name=name, ttype=ttype, data=data)
            cursor = end

        if cursor < len(ansi):
            plain = ansi[cursor:]

            yield Token(ttype=TokenType.PLAIN, data=plain)

    def define(self, name: str, method: MacroCallable) -> None:
        """Defines a Macro tag that executes the given method.

        Args:
            name: The name the given method will be reachable by within markup.
                The given value gets "!" prepended if it isn't present already.
            method: The method this macro will execute.
        """

        if not name.startswith("!"):
            name = f"!{name}"

        self.macros[name] = method
        self.unsetters[f"/{name}"] = None

    def alias(self, name: str, value: str) -> None:
        """Aliases the given name to a value, and generates an unsetter for it.

        Note that it is not possible to alias macros.

        Args:
            name: The name of the new tag.
            value: The value the new tag will stand for.
        """

        def _get_unsetter(token: Token) -> str | None:
            """Get unsetter for a token"""

            if token.ttype is TokenType.PLAIN:
                return None

            if token.ttype is TokenType.UNSETTER:
                return self.unsetters[token.name]

            if token.ttype is TokenType.COLOR:
                assert isinstance(token.data, Color)

                if token.data.background:
                    return self.unsetters["/bg"]

                return self.unsetters["/fg"]

            name = f"/{token.name}"
            if not name in self.unsetters:
                raise KeyError(f"Could not find unsetter for token {token}.")

            return self.unsetters[name]

        if name.startswith("!"):
            raise ValueError('Only macro tags can always start with "!".')

        setter = ""
        unsetter = ""

        # Try to link to existing tag
        if value in self.user_tags:
            self.unsetters[f"/{name}"] = self.unsetters[f"/{value}"]
            self.user_tags[name] = self.user_tags[value]
            return

        for token in self.tokenize_markup(f"[{value}]"):
            if token.ttype is TokenType.PLAIN:
                continue

            assert token.sequence is not None
            setter += token.sequence

            t_unsetter = _get_unsetter(token)
            unsetter += f"\x1b[{t_unsetter}m"

        self.unsetters[f"/{name}"] = unsetter.lstrip("\x1b[").rstrip("m")
        self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m")

        marked: list[str] = []
        for item in self._cache:
            if name in item:
                marked.append(item)

        for item in marked:
            del self._cache[item]

    # TODO: I cannot cut down the one-too-many branch that this has at the moment.
    #   We could look into it in the future, however.
    def parse(  # pylint: disable=too-many-branches
        self, markup_text: str
    ) -> StyledText:
        """Parses the given markup.

        Args:
            markup_text: The markup to parse.

        Returns:
            A `StyledText` instance of the result of parsing the input.
```` 
A class representing an instance of a Markup Language.
This class is used for all markup/ANSI parsing, tokenizing and usage.
from pytermgui import tim
tim.alias("my-tag", "@152 72 bold")
tim.print("This is [my-tag]my-tag[/]!")
def __init__(self, default_macros: bool = True) -> None:
    """Initializes a MarkupLanguage.

    Args:
        default_macros: If not set, the builtin macros are not defined.
    """

    self.tags: dict[str, str] = STYLE_MAP.copy()
    self._cache: dict[str, StyledText] = {}
    self.macros: dict[str, MacroCallable] = {}
    self.user_tags: dict[str, str] = {}
    self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy()

    self.should_cache: bool = True

    if default_macros:
        self.define("!link", macro_link)
        self.define("!align", macro_align)
        self.define("!markup", self.get_markup)
        self.define("!shuffle", macro_shuffle)
        self.define("!strip_bg", macro_strip_bg)
        self.define("!strip_fg", macro_strip_fg)
        self.define("!rainbow", macro_rainbow)
        self.define("!gradient", macro_gradient)
        self.define("!upper", lambda item: str(item.upper()))
        self.define("!lower", lambda item: str(item.lower()))
        self.define("!title", lambda item: str(item.title()))
        self.define("!capitalize", lambda item: str(item.capitalize()))
        self.define("!expand", lambda tag: macro_expand(self, tag))
        self.define("!debug", lambda *args: ",".join(ascii(arg) for arg in args))

    self.alias("code", "dim @black")
    self.alias("code.str", "142")
    self.alias("code.multiline_str", "code.str")
    self.alias("code.none", "167")
    self.alias("code.global", "214")
    self.alias("code.number", "175")
    self.alias("code.keyword", "203")
    self.alias("code.identifier", "109")
    self.alias("code.name", "code.global")
    self.alias("code.comment", "240 italic")
    self.alias("code.builtin", "code.global")
    self.alias("code.file", "code.identifier")
    self.alias("code.symbol", "code.identifier")
Initializes a MarkupLanguage.
Args
- default_macros: If not set, the builtin macros are not defined.
raise_unknown_markup: Raise pytermgui.exceptions.MarkupSyntaxError when encountering unknown markup tags.
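As a rough illustration of the macro machinery wired up above: applied macros are replayed over the enclosed plain text, with each callable receiving its markup-supplied arguments first and the text last. The following is a minimal standalone sketch; the lambdas are illustrative stand-ins, not pytermgui's real macro_align or macro registry.

```python
# Hypothetical stand-ins for registered macros; pytermgui's real macros
# (macro_align and friends) have richer signatures.
applied_macros = [
    ("!upper", (lambda text: text.upper(), ())),
    ("!align", (lambda width, text: text.center(int(width)), ("11",))),
]

def apply_macros(text: str) -> str:
    """Replay each applied macro over the text: args first, text last."""
    for _name, (method, args) in applied_macros:
        text = method(*args, text)
    return text

print(apply_macros("hello"))  # "   HELLO   "
```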
def print(self, *args, **kwargs) -> None:
    """Parse all arguments and pass them through to print, along with kwargs."""

    parsed = []
    for arg in args:
        parsed.append(self.parse(str(arg)))

    get_terminal().print(*parsed, **kwargs)
Parse all arguments and pass them through to print, along with kwargs.
def tokenize_markup(self, markup_text: str) -> Iterator[Token]:
    """Converts the given markup string into an iterator of `Token`.

    Args:
        markup_text: The text to look at.

    Returns:
        An iterator of tokens. The reason this is an iterator is to possibly save
        on memory.
    """

    end = 0
    start = 0
    cursor = 0
    for match in RE_MARKUP.finditer(markup_text):
        full, escapes, tag_text = match.groups()
        start, end = match.span()

        # Add plain text between last and current match
        if start > cursor:
            yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start])

        if not escapes == "" and len(escapes) % 2 == 1:
            cursor = end
            yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) - 1 :])
            continue

        for tag in tag_text.split():
            token = self._get_style_token(tag)
            if token is not None:
                yield token
                continue

            # Try to find a color token
            token = self._get_color_token(tag)
            if token is not None:
                yield token
                continue

            macro_match = RE_MACRO.match(tag)
            if macro_match is not None:
                name, args = macro_match.groups()
                macro_args = () if args is None else args.split(":")

                if not name in self.macros:
                    raise MarkupSyntaxError(
                        tag=tag,
                        cause="is not a defined macro",
                        context=markup_text,
                    )

                yield Token(
                    name=tag,
                    ttype=TokenType.MACRO,
                    data=(self.macros[name], macro_args),
                )
                continue

            if self.raise_unknown_markup:
                raise MarkupSyntaxError(
                    tag=tag, cause="not defined", context=markup_text
                )

        cursor = end

    # Add remaining text as plain
    if len(markup_text) > cursor:
        yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:])
Converts the given markup string into an iterator of Token.
Args
- markup_text: The text to look at.
Returns
An iterator of tokens. The reason this is an iterator is to possibly save on memory.
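The tokenizing loop above can be sketched with a simplified pattern. This is an assumption-laden stand-in: the real RE_MARKUP also tracks escape characters like "\[", and real tokens are Token instances rather than tuples.

```python
import re

# Simplified stand-in for RE_MARKUP; it does not handle "\[" escapes.
MARKUP = re.compile(r"\[([^\]]+)\]")

def tokenize(markup_text):
    cursor = 0
    for match in MARKUP.finditer(markup_text):
        start, end = match.span()
        if start > cursor:  # plain text between matches
            yield ("PLAIN", markup_text[cursor:start])
        for tag in match.group(1).split():  # tags are space-separated
            yield ("TAG", tag)
        cursor = end
    if cursor < len(markup_text):  # trailing plain text
        yield ("PLAIN", markup_text[cursor:])

print(list(tokenize("[141 bold]Hello[/] there")))
```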
def tokenize_ansi(self, ansi: str) -> Iterator[Token]:
    """Converts the given ANSI string into an iterator of `Token`.

    Args:
        ansi: The text to look at.

    Returns:
        An iterator of tokens. The reason this is an iterator is to possibly save
        on memory.
    """

    def _is_in_tags(code: str, tags: dict[str, str]) -> str | None:
        """Determines whether a code is in the given dict of tags."""

        for name, current in tags.items():
            if current == code:
                return name

        return None

    def _generate_color(
        parts: list[str], code: str
    ) -> tuple[str, TokenType, Color]:
        """Generates a color token."""

        data: Color
        if len(parts) == 1:
            data = StandardColor.from_ansi(code)
            name = data.name
            ttype = TokenType.COLOR

        else:
            data = str_to_color(code)
            name = data.name
            ttype = TokenType.COLOR

        return name, ttype, data

    end = 0
    start = 0
    cursor = 0

    # StyledText messes with indexing, so we need to cast it
    # back to str.
    if isinstance(ansi, StyledText):
        ansi = str(ansi)

    for match in RE_ANSI.finditer(ansi):
        code = match.groups()[0]
        start, end = match.span()

        if code is None:
            continue

        parts = code.split(";")

        if start > cursor:
            plain = ansi[cursor:start]

            yield Token(name=plain, ttype=TokenType.PLAIN, data=plain)

        name: str | None = code
        ttype = None
        data: str | Color = parts[0]

        # Styles & Unsetters
        if len(parts) == 1:
            # Covariancy is not an issue here, even though mypy seems to think so.
            name = _is_in_tags(parts[0], self.unsetters)  # type: ignore
            if name is not None:
                ttype = TokenType.UNSETTER

            else:
                name = _is_in_tags(parts[0], self.tags)
                if name is not None:
                    ttype = TokenType.STYLE

        # Colors
        if ttype is None:
            with suppress(ColorSyntaxError):
                name, ttype, data = _generate_color(parts, code)

        if name is None or ttype is None or data is None:
            if len(parts) != 2:
                raise AnsiSyntaxError(
                    tag=parts[0], cause="not recognized", context=ansi
                )

            name = "position"
            ttype = TokenType.POSITION
            data = ",".join(reversed(parts))

        yield Token(name=name, ttype=ttype, data=data)
        cursor = end

    if cursor < len(ansi):
        plain = ansi[cursor:]

        yield Token(ttype=TokenType.PLAIN, data=plain)
Converts the given ANSI string into an iterator of Token.
Args
- ansi: The text to look at.
Returns
An iterator of tokens. The reason this is an iterator is to possibly save on memory.
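The ANSI direction works the same way, anchored on escape sequences instead of brackets. A simplified sketch follows; the real RE_ANSI also recognizes positional and hyperlink sequences, and the code-splitting on ";" is what lets the real method tell single-part styles and unsetters from multi-part colors.

```python
import re

# Only matches SGR ("m"-terminated) sequences; RE_ANSI covers more.
ANSI = re.compile(r"\x1b\[([\d;]+)m")

def tokenize_ansi(ansi):
    cursor = 0
    for match in ANSI.finditer(ansi):
        start, end = match.span()
        if start > cursor:  # plain text between sequences
            yield ("PLAIN", ansi[cursor:start])
        # Split the code on ";": one part -> style/unsetter, more -> color.
        yield ("CODE", match.group(1).split(";"))
        cursor = end
    if cursor < len(ansi):  # trailing plain text
        yield ("PLAIN", ansi[cursor:])

print(list(tokenize_ansi("\x1b[1m\x1b[38;5;141mhi\x1b[0m")))
```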
def define(self, name: str, method: MacroCallable) -> None:
    """Defines a Macro tag that executes the given method.

    Args:
        name: The name the given method will be reachable by within markup.
            The given value gets "!" prepended if it isn't present already.
        method: The method this macro will execute.
    """

    if not name.startswith("!"):
        name = f"!{name}"

    self.macros[name] = method
    self.unsetters[f"/{name}"] = None
Defines a Macro tag that executes the given method.
Args
- name: The name the given method will be reachable by within markup. The given value gets "!" prepended if it isn't present already.
- method: The method this macro will execute.
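In short, define normalizes the name and records a data-less unsetter for it. A dictionary sketch of that bookkeeping (plain dicts standing in for the MarkupLanguage attributes):

```python
macros: dict = {}
unsetters: dict = {}

def define(name, method):
    # The "!" prefix is prepended when missing, as documented above.
    if not name.startswith("!"):
        name = f"!{name}"
    macros[name] = method
    # Macro unsetters carry no escape code of their own, hence None.
    unsetters[f"/{name}"] = None

define("shout", lambda text: text.upper() + "!")
print(macros["!shout"]("hey"), "/!shout" in unsetters)  # HEY! True
```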
def alias(self, name: str, value: str) -> None:
    """Aliases the given name to a value, and generates an unsetter for it.

    Note that it is not possible to alias macros.

    Args:
        name: The name of the new tag.
        value: The value the new tag will stand for.
    """

    def _get_unsetter(token: Token) -> str | None:
        """Get unsetter for a token"""

        if token.ttype is TokenType.PLAIN:
            return None

        if token.ttype is TokenType.UNSETTER:
            return self.unsetters[token.name]

        if token.ttype is TokenType.COLOR:
            assert isinstance(token.data, Color)

            if token.data.background:
                return self.unsetters["/bg"]

            return self.unsetters["/fg"]

        name = f"/{token.name}"
        if not name in self.unsetters:
            raise KeyError(f"Could not find unsetter for token {token}.")

        return self.unsetters[name]

    if name.startswith("!"):
        raise ValueError('Only macro tags can always start with "!".')

    setter = ""
    unsetter = ""

    # Try to link to existing tag
    if value in self.user_tags:
        self.unsetters[f"/{name}"] = self.unsetters[f"/{value}"]
        self.user_tags[name] = self.user_tags[value]
        return

    for token in self.tokenize_markup(f"[{value}]"):
        if token.ttype is TokenType.PLAIN:
            continue

        assert token.sequence is not None
        setter += token.sequence

        t_unsetter = _get_unsetter(token)
        unsetter += f"\x1b[{t_unsetter}m"

    self.unsetters[f"/{name}"] = unsetter.lstrip("\x1b[").rstrip("m")
    self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m")

    marked: list[str] = []
    for item in self._cache:
        if name in item:
            marked.append(item)

    for item in marked:
        del self._cache[item]
Aliases the given name to a value, and generates an unsetter for it.
Note that it is not possible to alias macros.
Args
- name: The name of the new tag.
- value: The value the new tag will stand for.
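One implementation detail worth noting: alias stores bare SGR codes, stripping the `\x1b[` prefix and `m` suffix from the accumulated escape sequences. A sketch of that stripping, using removeprefix/removesuffix (the source uses lstrip/rstrip with the same intent):

```python
def bare_code(sequence: str) -> str:
    """Strip the CSI prefix and "m" suffix from one SGR sequence."""
    return sequence.removeprefix("\x1b[").removesuffix("m")

print(bare_code("\x1b[1m"))   # "1" -- the bold setter
print(bare_code("\x1b[22m"))  # "22" -- the bold/dim unsetter
```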
def parse(  # pylint: disable=too-many-branches
    self, markup_text: str
) -> StyledText:
    """Parses the given markup.

    Args:
        markup_text: The markup to parse.

    Returns:
        A `StyledText` instance of the result of parsing the input. This
        custom `str` class is used to allow accessing the plain value of
        the output, as well as to cleanly index within it. It is analogous
        to builtin `str`, only adds extra things on top.
    """

    applied_macros: list[tuple[str, MacroCall]] = []
    previous_token: Token | None = None
    previous_sequence = ""
    sequence = ""
    out = ""

    def _apply_macros(text: str) -> str:
        """Apply current macros to text"""

        for _, (method, args) in applied_macros:
            text = method(*args, text)

        return text

    def _is_same_colorgroup(previous: Token, new: Token) -> bool:
        if not isinstance(new.data, Color) or not isinstance(previous.data, Color):
            return False

        return (
            type(previous) is type(new)
            and previous.data.background == new.data.background
        )

    if (
        self.should_cache
        and markup_text in self._cache
        and len(RE_MACRO.findall(markup_text)) == 0
    ):
        return self._cache[markup_text]

    token: Token
    for token in self.tokenize_markup(markup_text):
        if sequence != "" and previous_token == token:
            continue

        # Optimize out previously added color tokens, as only the most
        # recent would be visible anyways.
        if (
            token.sequence is not None
            and previous_token is not None
            and _is_same_colorgroup(previous_token, token)
        ):
            sequence = token.sequence
            continue

        if token.ttype == TokenType.UNSETTER and token.data == "0":
            out += "\033[0m"
            sequence = ""
            applied_macros = []
            continue

        previous_token = token

        # Macro unsetters are stored with None as their data
        if token.data is None and token.ttype is TokenType.UNSETTER:
            for item, data in applied_macros.copy():
                macro_match = RE_MACRO.match(item)
                assert macro_match is not None

                macro_name = macro_match.groups()[0]

                if f"/{macro_name}" == token.name:
                    applied_macros.remove((item, data))

            continue

        if token.ttype is TokenType.MACRO:
            assert isinstance(token.data, tuple)

            applied_macros.append((token.name, token.data))
            continue

        if token.sequence is None:
            applied = sequence

            if not out.endswith("\x1b[0m"):
                for item in previous_sequence.split("\x1b"):
                    if item == "" or item[1:-1] in self.unsetters.values():
                        continue

                    item = f"\x1b{item}"
                    applied = applied.replace(item, "")

            out += applied + _apply_macros(token.name)
            previous_sequence = sequence
            sequence = ""
            continue

        sequence += token.sequence

    if sequence + previous_sequence != "":
        out += "\x1b[0m"

    out = StyledText(out)
    self._cache[markup_text] = out
    return out
Parses the given markup.
Args
- markup_text: The markup to parse.
Returns
A StyledText instance of the result of parsing the input. This custom str class is used to allow accessing the plain value of the output, as well as to cleanly index within it. It is analogous to builtin str, only adding extra things on top.
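The StyledText return type can be pictured as a str subclass that also knows its sequence-free value. The class below is an illustrative stand-in, not pytermgui's actual StyledText:

```python
import re

# Matches SGR escape sequences, the parts invisible in the plain value.
SGR = re.compile(r"\x1b\[[\d;]+m")

class Styled(str):
    """str subclass exposing the escape-sequence-free plain value."""

    @property
    def plain(self) -> str:
        return SGR.sub("", self)

text = Styled("\x1b[1mbold\x1b[0m")
print(repr(text.plain), len(text.plain))  # 'bold' 4
```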
def get_markup(self, ansi: str) -> str:
    """Generates markup from ANSI text.

    Args:
        ansi: The text to get markup from.

    Returns:
        A markup string that can be parsed to get (visually) the same
        result. Note that this conversion is lossy in a way: there are some
        details (like macros) that cannot be preserved in an ANSI->Markup->ANSI
        conversion.
    """

    current_tags: list[str] = []
    out = ""
    for token in self.tokenize_ansi(ansi):
        if token.ttype is TokenType.PLAIN:
            if len(current_tags) != 0:
                out += "[" + " ".join(current_tags) + "]"

            assert isinstance(token.data, str)
            out += token.data
            current_tags = []
            continue

        if token.ttype is TokenType.ESCAPED:
            assert isinstance(token.data, str)

            current_tags.append(token.data)
            continue

        current_tags.append(token.name)

    return out
Generates markup from ANSI text.
Args
- ansi: The text to get markup from.
Returns
A markup string that can be parsed to get (visually) the same result. Note that this conversion is lossy in a way: there are some details (like macros) that cannot be preserved in an ANSI->Markup->ANSI conversion.
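The conversion loop can be sketched on simplified (type, value) tokens: tag names accumulate until a plain chunk arrives, then get emitted as a single bracket group in front of it.

```python
def to_markup(tokens):
    current_tags, out = [], ""
    for ttype, value in tokens:
        if ttype == "PLAIN":
            if current_tags:  # flush pending tags as one group
                out += "[" + " ".join(current_tags) + "]"
            out += value
            current_tags = []
        else:
            current_tags.append(value)
    return out

print(to_markup([("TAG", "bold"), ("TAG", "141"), ("PLAIN", "Hello")]))
```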
def prettify_ansi(self, text: str) -> str:
    """Returns a prettified (syntax-highlighted) ANSI str.

    This is useful to quickly "inspect" a given ANSI string. However,
    for most real uses `MarkupLanguage.prettify_markup` would be
    preferable, given an argument of `MarkupLanguage.get_markup(text)`,
    as it is much more verbose.

    Args:
        text: The ANSI-text to prettify.

    Returns:
        The prettified ANSI text. This text's styles remain valid,
        so copy-pasting the argument into a command (like printf)
        that can show styled text will work the same way.
    """

    out = ""
    sequences = ""
    for token in self.tokenize_ansi(text):
        if token.ttype is TokenType.PLAIN:
            assert isinstance(token.data, str)
            out += token.data
            continue

        assert token.sequence is not None
        out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b")
        sequences += token.sequence
        out += sequences

    return out
Returns a prettified (syntax-highlighted) ANSI str.
This is useful to quickly "inspect" a given ANSI string. However, for most real uses MarkupLanguage.prettify_markup would be preferable, given an argument of MarkupLanguage.get_markup(text), as it is much more verbose.
Args
- text: The ANSI-text to prettify.
Returns
The prettified ANSI text. This text's styles remain valid, so copy-pasting the argument into a command (like printf) that can show styled text will work the same way.
def prettify_markup(self, text: str) -> str:
    """Returns a prettified (syntax-highlighted) markup str.

    Args:
        text: The markup-text to prettify.

    Returns:
        Prettified markup. This markup, excluding its styles,
        remains valid markup.
    """

    def _apply_macros(text: str) -> str:
        """Apply current macros to text"""

        for _, (method, args) in applied_macros:
            text = method(*args, text)

        return text

    def _pop_macro(name: str) -> None:
        """Pops a macro from applied_macros."""

        for i, (macro_name, _) in enumerate(applied_macros):
            if macro_name == name:
                applied_macros.pop(i)
                break

    def _finish(out: str, in_sequence: bool) -> str:
        """Adds ending cap to the given string."""

        if in_sequence:
            if not out.endswith("\x1b[0m"):
                out += "\x1b[0m"

            return out + "]"

        return out + "[/]"

    styles: dict[TokenType, str] = {
        TokenType.MACRO: "210",
        TokenType.ESCAPED: "210 bold",
        TokenType.UNSETTER: "strikethrough",
    }

    applied_macros: list[tuple[str, MacroCall]] = []

    out = ""
    in_sequence = False
    current_styles: list[Token] = []

    for token in self.tokenize_markup(text):
        if token.ttype in [TokenType.PLAIN, TokenType.ESCAPED]:
            if in_sequence:
                out += "]"

            in_sequence = False

            sequence = ""
            for style in current_styles:
                if style.sequence is None:
                    continue

                sequence += style.sequence

            out += f"{sequence}{_apply_macros(token.name)}\033[0m"
            continue

        out += " " if in_sequence else "["
        in_sequence = True

        if token.ttype is TokenType.UNSETTER:
            if token.name == "/":
                applied_macros = []

            name = token.name[1:]

            if name in self.macros:
                _pop_macro(name)

            current_styles.append(token)

            out += self.parse(
                ("" if (name in self.tags) or (name in self.user_tags) else "")
                + f"[{styles[TokenType.UNSETTER]}]/{name}"
            )
            continue

        if token.ttype is TokenType.MACRO:
            assert isinstance(token.data, tuple)

            name = token.name
            if "(" in name:
                name = name[: token.name.index("(")]

            applied_macros.append((name, token.data))

            try:
                out += token.data[0](*token.data[1], token.name)
                continue

            except TypeError:  # Not enough arguments
                pass

        if token.sequence is not None:
            current_styles.append(token)

        style_markup = styles.get(token.ttype) or token.name
        out += self.parse(f"[{style_markup}]{token.name}")

    return _finish(out, in_sequence)
Returns a prettified (syntax-highlighted) markup str.
Args
- text: The markup-text to prettify.
Returns
Prettified markup. This markup, excluding its styles, remains valid markup.
def get_styled_plains(self, text: str) -> Iterator[StyledText]:
    """Gets all plain tokens within text, with their respective styles applied.

    Args:
        text: The ANSI-sequence containing string to find plains from.

    Returns:
        An iterator of `StyledText` objects, each yielded when a new plain token is found,
        containing the styles that are relevant and active on the given plain.
    """

    def _apply_styles(styles: list[Token], text: str) -> str:
        """Applies given styles to text."""

        for token in styles:
            if token.ttype is TokenType.MACRO:
                assert isinstance(token.data, tuple)
                text = token.data[0](*token.data[1], text)
                continue

            if token.sequence is None:
                continue

            text = token.sequence + text

        return text

    def _pop_unsetter(token: Token, styles: list[Token]) -> list[Token]:
        """Removes an unsetter from the list, returns the new list."""

        if token.name == "/":
            return list(filter(lambda tkn: tkn.ttype is TokenType.POSITION, styles))

        target_name = token.name[1:]
        for style in styles:
            # bold & dim unsetters represent the same character, so we have
            # to treat them the same way.
            style_name = style.name

            if style.name == "dim":
                style_name = "bold"

            if style_name == target_name:
                styles.remove(style)

            elif (
                style_name.startswith(target_name)
                and style.ttype is TokenType.MACRO
            ):
                styles.remove(style)

            elif style.ttype is TokenType.COLOR:
                assert isinstance(style.data, Color)
                if target_name == "fg" and not style.data.background:
                    styles.remove(style)

                elif target_name == "bg" and style.data.background:
                    styles.remove(style)

        return styles

    def _pop_position(styles: list[Token]) -> list[Token]:
        for token in styles.copy():
            if token.ttype is TokenType.POSITION:
                styles.remove(token)

        return styles

    styles: list[Token] = []
    for token in self.tokenize_ansi(text):
        if token.ttype is TokenType.COLOR:
            for i, style in enumerate(reversed(styles)):
                if style.ttype is TokenType.COLOR:
                    assert isinstance(style.data, Color)
                    assert isinstance(token.data, Color)

                    if style.data.background != token.data.background:
                        continue

                    styles[len(styles) - i - 1] = token
                    break
            else:
                styles.append(token)

            continue

        if token.ttype is TokenType.LINK:
            styles.append(token)
            yield StyledText(_apply_styles(styles, token.name))

        if token.ttype is TokenType.PLAIN:
            assert isinstance(token.data, str)
            yield StyledText(_apply_styles(styles, token.data))
            styles = _pop_position(styles)
            continue

        if token.ttype is TokenType.UNSETTER:
            styles = _pop_unsetter(token, styles)
            continue

        styles.append(token)
Gets all plain tokens within text, with their respective styles applied.
Args
- text: The ANSI-sequence containing string to find plains from.
Returns
An iterator of StyledText objects, each yielded when a new plain token is found, containing the styles that are relevant and active on the given plain.
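The mechanism can be sketched on simplified tokens: active style sequences accumulate, every plain chunk is yielded with the currently active sequences prefixed, and unsetters remove styles from the active set. The sketch below uses a blanket reset for brevity; the real _pop_unsetter logic is selective per tag.

```python
def styled_plains(tokens):
    styles = []
    for ttype, value in tokens:
        if ttype == "PLAIN":
            # Prefix every active sequence onto the plain text.
            yield "".join(styles) + value
        elif ttype == "UNSET":
            styles.clear()  # blanket reset; the real logic is per-tag
        else:
            styles.append(value)

tokens = [
    ("STYLE", "\x1b[1m"),
    ("PLAIN", "hi"),
    ("UNSET", None),
    ("PLAIN", "bye"),
]
print(list(styled_plains(tokens)))  # ['\x1b[1mhi', 'bye']
```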