pytermgui.parser
This module provides TIM
, PyTermGUI's Terminal Inline Markup language. It is a simple,
performant and easy to read way to style, colorize & modify text.
Basic rundown
TIM is included with the purpose of making styling easier to read and manage.
Its syntax is based on square brackets, within which tags are strictly separated by one space character. Tags can stand for colors (xterm-256, RGB or HEX, both background & foreground), styles, unsetters and macros.
The 16 simple colors of the terminal exist as named tags that refer to their numerical value.
Here is a simple example of the syntax, using the pytermgui.pretty
submodule to
syntax-highlight it inside the REPL:
>>> from pytermgui import pretty
>>> '[141 @61 bold] Hello [!upper inverse] There '
General syntax
Background colors are always denoted by a leading @
character in front of the color
tag. Styles are just the name of the style and macros have an exclamation mark in front
of them. Additionally, unsetters use a leading slash (/
) for their syntax. Color
tokens have special unsetters: they use /fg
to cancel foreground colors, and /bg
to
do so with backgrounds.
Macros:
Macros are any type of callable that take at least *args; this is the value of the plain text enclosed by the tag group within which the given macro resides. Additionally, macros can be given any number of positional arguments from within markup, using the syntax:
[!macro(arg1:arg2:arg3)]Text that the macro applies to.[/!macro]plain text, no macro
This syntax gets parsed as follows:
macro("Text that the macro applies to.", "arg1", "arg2", "arg3")
macro
here is whatever the name macro
was defined as prior.
Colors:
Colors can be of three general types: xterm-256, RGB and HEX.
xterm-256
stands for one of the 256 xterm colors. You can use ptg -c
to see the all
of the available colors. Its syntax is just the 0-base index of the color, like [141]
RGB
colors are pretty self explanatory. Their syntax is follows the format
RED;GREEN;BLUE
, such as [111;222;333]
.
HEX
colors are basically just RGB with extra steps. Their syntax is #RRGGBB
, such as
[#FA72BF]
. This code then gets converted to a tuple of RGB colors under the hood, so
from then on RGB and HEX colors are treated the same, and emit the same tokens.
As mentioned above, all colors can be made to act on the background instead by
prepending the color tag with @
, such as @141
, @111;222;333
or @#FA72BF
. To
clear these effects, use /fg
for foreground and /bg
for background colors.
MarkupLanguage
and instancing
All markup behaviour is done by an instance of the MarkupLanguage
class. This is done
partially for organization reasons, but also to allow a sort of sandboxing of custom
definitions and settings.
PyTermGUI provides the tim
name as the global markup language instance. For historical
reasons, the same instance is available as markup
. This should be used pretty much all
of the time, and custom instances should only ever come about when some
security-sensitive macro definitions are needed, as markup
is used by every widget,
including user-input ones such as InputField
.
For the rest of this page, MarkupLanguage
will refer to whichever instance you are
using.
TL;DR : Use tim
always, unless a security concern blocks you from doing so.
Caching
By default, all markup parse results are cached and returned when the same input is
given. To disable this behaviour, set your markup instance (usually markup
)'s
should_cache
field to False.
Customization
There are a couple of ways to customize how markup is parsed. Custom tags can be created
by calling MarkupLanguage.alias
. For defining custom macros, you can use
MarkupLanguage.define
. For more information, see each method's documentation.
View Source
0""" 1This module provides `TIM`, PyTermGUI's Terminal Inline Markup language. It is a simple, 2performant and easy to read way to style, colorize & modify text. 3 4Basic rundown 5------------- 6 7TIM is included with the purpose of making styling easier to read and manage. 8 9Its syntax is based on square brackets, within which tags are strictly separated by one 10space character. Tags can stand for colors (xterm-256, RGB or HEX, both background & 11foreground), styles, unsetters and macros. 12 13The 16 simple colors of the terminal exist as named tags that refer to their numerical 14value. 15 16Here is a simple example of the syntax, using the `pytermgui.pretty` submodule to 17syntax-highlight it inside the REPL: 18 19```python3 20>>> from pytermgui import pretty 21>>> '[141 @61 bold] Hello [!upper inverse] There ' 22``` 23 24<p align=center> 25<img src="https://github.com/bczsalba/pytermgui/blob/master/assets/docs/parser/\ 26simple_example.png?raw=true" width=70%> 27</p> 28 29 30General syntax 31-------------- 32 33Background colors are always denoted by a leading `@` character in front of the color 34tag. Styles are just the name of the style and macros have an exclamation mark in front 35of them. Additionally, unsetters use a leading slash (`/`) for their syntax. Color 36tokens have special unsetters: they use `/fg` to cancel foreground colors, and `/bg` to 37do so with backgrounds. 38 39### Macros: 40 41Macros are any type of callable that take at least *args; this is the value of the plain 42text enclosed by the tag group within which the given macro resides. Additionally, 43macros can be given any number of positional arguments from within markup, using the 44syntax: 45 46``` 47[!macro(arg1:arg2:arg3)]Text that the macro applies to.[/!macro]plain text, no macro 48``` 49 50This syntax gets parsed as follows: 51 52```python3 53macro("Text that the macro applies to.", "arg1", "arg2", "arg3") 54``` 55 56`macro` here is whatever the name `macro` was defined as prior. 57 58### Colors: 59 60Colors can be of three general types: xterm-256, RGB and HEX. 61 62`xterm-256` stands for one of the 256 xterm colors. You can use `ptg -c` to see the all 63of the available colors. Its syntax is just the 0-base index of the color, like `[141]` 64 65`RGB` colors are pretty self explanatory. Their syntax is follows the format 66`RED;GREEN;BLUE`, such as `[111;222;333]`. 67 68`HEX` colors are basically just RGB with extra steps. Their syntax is `#RRGGBB`, such as 69`[#FA72BF]`. This code then gets converted to a tuple of RGB colors under the hood, so 70from then on RGB and HEX colors are treated the same, and emit the same tokens. 71 72As mentioned above, all colors can be made to act on the background instead by 73prepending the color tag with `@`, such as `@141`, `@111;222;333` or `@#FA72BF`. To 74clear these effects, use `/fg` for foreground and `/bg` for background colors. 75 76`MarkupLanguage` and instancing 77------------------------------- 78 79All markup behaviour is done by an instance of the `MarkupLanguage` class. This is done 80partially for organization reasons, but also to allow a sort of sandboxing of custom 81definitions and settings. 82 83PyTermGUI provides the `tim` name as the global markup language instance. For historical 84reasons, the same instance is available as `markup`. This should be used pretty much all 85of the time, and custom instances should only ever come about when some 86security-sensitive macro definitions are needed, as `markup` is used by every widget, 87including user-input ones such as `InputField`. 88 89For the rest of this page, `MarkupLanguage` will refer to whichever instance you are 90using. 91 92TL;DR : Use `tim` always, unless a security concern blocks you from doing so. 93 94Caching 95------- 96 97By default, all markup parse results are cached and returned when the same input is 98given. To disable this behaviour, set your markup instance (usually `markup`)'s 99`should_cache` field to False. 100 101Customization 102------------- 103 104There are a couple of ways to customize how markup is parsed. Custom tags can be created 105by calling `MarkupLanguage.alias`. For defining custom macros, you can use 106`MarkupLanguage.define`. For more information, see each method's documentation. 107""" 108# pylint: disable=too-many-lines 109 110from __future__ import annotations 111 112from random import shuffle 113from contextlib import suppress 114from dataclasses import dataclass 115from argparse import ArgumentParser 116from enum import Enum, auto as _auto 117from typing import Iterator, Callable, Tuple, List 118 119from .terminal import get_terminal 120from .colors import str_to_color, Color 121from .regex import RE_ANSI, RE_MARKUP, RE_MACRO, RE_LINK 122from .exceptions import MarkupSyntaxError, ColorSyntaxError, AnsiSyntaxError 123 124 125__all__ = [ 126 "StyledText", 127 "MacroCallable", 128 "MacroCall", 129 "MarkupLanguage", 130 "markup", 131 "tim", 132] 133 134MacroCallable = Callable[..., str] 135MacroCall = Tuple[MacroCallable, List[str]] 136 137STYLE_MAP = { 138 "bold": "1", 139 "dim": "2", 140 "italic": "3", 141 "underline": "4", 142 "blink": "5", 143 "blink2": "6", 144 "inverse": "7", 145 "invisible": "8", 146 "strikethrough": "9", 147 "overline": "53", 148} 149 150UNSETTER_MAP: dict[str, str | None] = { 151 "/": "0", 152 "/bold": "22", 153 "/dim": "22", 154 "/italic": "23", 155 "/underline": "24", 156 "/blink": "25", 157 "/blink2": "26", 158 "/inverse": "27", 159 "/invisible": "28", 160 "/strikethrough": "29", 161 "/fg": "39", 162 "/bg": "49", 163 "/overline": "54", 164} 165 166 167def macro_align(width: str, alignment: str, content: str) -> str: 168 """Aligns given text using fstrings. 169 170 Args: 171 width: The width to align to. 172 alignment: One of "left", "center", "right". 173 content: The content to align; implicit argument. 174 """ 175 176 aligner = "<" if alignment == "left" else (">" if alignment == "right" else "^") 177 return f"{content:{aligner}{width}}" 178 179 180def macro_expand(lang: MarkupLanguage, tag: str) -> str: 181 """Expands a tag alias.""" 182 183 if not tag in lang.user_tags: 184 return tag 185 186 return lang.get_markup(f"\x1b[{lang.user_tags[tag]}m ")[:-1] 187 188 189def macro_strip_fg(item: str) -> str: 190 """Strips foreground color from item""" 191 192 return markup.parse(f"[/fg]{item}") 193 194 195def macro_strip_bg(item: str) -> str: 196 """Strips foreground color from item""" 197 198 return markup.parse(f"[/bg]{item}") 199 200 201def macro_shuffle(item: str) -> str: 202 """Shuffles a string using shuffle.shuffle on its list cast.""" 203 204 shuffled = list(item) 205 shuffle(shuffled) 206 207 return "".join(shuffled) 208 209 210def macro_link(*args) -> str: 211 """Creates a clickable hyperlink. 212 213 Note: 214 Since this is a pretty new feature for terminals, its support is limited. 215 """ 216 217 *uri_parts, label = args 218 uri = ":".join(uri_parts) 219 220 return f"\x1b]8;;{uri}\x1b\\{label}\x1b]8;;\x1b\\" 221 222 223def _apply_colors(colors: list[str] | list[int], item: str) -> str: 224 """Applies the given list of colors to the item, spread out evenly.""" 225 226 blocksize = max(round(len(item) / len(colors)), 1) 227 228 out = "" 229 current_block = 0 230 for i, char in enumerate(item): 231 if i % blocksize == 0 and current_block < len(colors): 232 out += f"[{colors[current_block]}]" 233 current_block += 1 234 235 out += char 236 237 return markup.parse(out) 238 239 240def macro_rainbow(item: str) -> str: 241 """Creates rainbow-colored text.""" 242 243 colors = ["red", "208", "yellow", "green", "brightblue", "blue", "93"] 244 245 return _apply_colors(colors, item) 246 247 248def macro_gradient(base_str: str, item: str) -> str: 249 """Creates an xterm-256 gradient from a base color. 250 251 This exploits the way the colors are arranged in the xterm color table; every 252 36th color is the next item of a single gradient. 253 254 The start of this given gradient is calculated by decreasing the given base by 36 on 255 every iteration as long as the point is a valid gradient start. 256 257 After that, the 6 colors of this gradient are calculated and applied. 258 """ 259 260 if not base_str.isdigit(): 261 raise ValueError(f"Gradient base has to be a digit, got {base_str}.") 262 263 base = int(base_str) 264 if base < 16 or base > 231: 265 raise ValueError("Gradient base must be between 16 and 232") 266 267 while base > 52: 268 base -= 36 269 270 colors = [] 271 for i in range(6): 272 colors.append(base + 36 * i) 273 274 return _apply_colors(colors, item) 275 276 277class TokenType(Enum): 278 """An Enum to store various token types.""" 279 280 LINK = _auto() 281 """A terminal hyperlink.""" 282 283 PLAIN = _auto() 284 """Plain text, nothing interesting.""" 285 286 COLOR = _auto() 287 """A color token. Has a `pytermgui.colors.Color` instance as its data.""" 288 289 STYLE = _auto() 290 """A builtin terminal style, such as `bold` or `italic`.""" 291 292 MACRO = _auto() 293 """A PTG markup macro. The macro itself is stored inside `self.data`.""" 294 295 ESCAPED = _auto() 296 """An escaped token.""" 297 298 UNSETTER = _auto() 299 """A token that unsets some other attribute.""" 300 301 POSITION = _auto() 302 """A token representing a positioning string. `self.data` follows the format `x,y`.""" 303 304 305@dataclass 306class Token: 307 """A class holding information on a singular markup or ANSI style unit. 308 309 Attributes: 310 """ 311 312 ttype: TokenType 313 """The type of this token.""" 314 315 data: str | MacroCall | Color | None 316 """The data contained within this token. This changes based on the `ttype` attr.""" 317 318 name: str = "<unnamed-token>" 319 """An optional display name of the token. Defaults to `data` when not given.""" 320 321 def __post_init__(self) -> None: 322 """Sets `name` to `data` if not provided.""" 323 324 if self.name == "<unnamed-token>": 325 if isinstance(self.data, str): 326 self.name = self.data 327 328 elif isinstance(self.data, Color): 329 self.name = self.data.name 330 331 else: 332 raise TypeError 333 334 # Create LINK from a plain token 335 if self.ttype is TokenType.PLAIN: 336 assert isinstance(self.data, str) 337 338 link_match = RE_LINK.match(self.data) 339 340 if link_match is not None: 341 self.data, self.name = link_match.groups() 342 self.ttype = TokenType.LINK 343 344 if self.ttype is TokenType.ESCAPED: 345 assert isinstance(self.data, str) 346 347 self.name = self.data[1:] 348 349 def __eq__(self, other: object) -> bool: 350 """Checks equality with `other`.""" 351 352 if other is None: 353 return False 354 355 if not isinstance(other, type(self)): 356 return False 357 358 return other.data == self.data and other.ttype is self.ttype 359 360 @property 361 def sequence(self) -> str | None: 362 """Returns the ANSI sequence this token represents.""" 363 364 if self.data is None: 365 return None 366 367 if self.ttype in [TokenType.PLAIN, TokenType.MACRO, TokenType.ESCAPED]: 368 return None 369 370 if self.ttype is TokenType.LINK: 371 return macro_link(self.data, self.name) 372 373 if self.ttype is TokenType.POSITION: 374 assert isinstance(self.data, str) 375 position = self.data.split(",") 376 return f"\x1b[{position[1]};{position[0]}H" 377 378 # Colors and styles 379 data = self.data 380 381 if self.ttype in [TokenType.STYLE, TokenType.UNSETTER]: 382 return f"\033[{data}m" 383 384 assert isinstance(data, Color) 385 return data.sequence 386 387 388class StyledText(str): 389 """A styled text object. 390 391 The purpose of this class is to implement some things regular `str` 392 breaks at when encountering ANSI sequences. 393 394 Instances of this class are usually spat out by `MarkupLanguage.parse`, 395 but may be manually constructed if the need arises. Everything works even 396 if there is no ANSI tomfoolery going on. 397 """ 398 399 value: str 400 """The underlying, ANSI-inclusive string value.""" 401 402 _plain: str | None = None 403 _tokens: list[Token] | None = None 404 405 def __new__(cls, value: str = ""): 406 """Creates a StyledText, gets markup tags.""" 407 408 obj = super().__new__(cls, value) 409 obj.value = value 410 411 return obj 412 413 def _generate_tokens(self) -> None: 414 """Generates self._tokens & self._plain.""" 415 416 self._tokens = list(tim.tokenize_ansi(self.value)) 417 418 self._plain = "" 419 for token in self._tokens: 420 if token.ttype is not TokenType.PLAIN: 421 continue 422 423 assert isinstance(token.data, str) 424 self._plain += token.data 425 426 @property 427 def tokens(self) -> list[Token]: 428 """Returns all markup tokens of this object. 429 430 Generated on-demand, at the first call to this or the self.plain 431 property. 432 """ 433 434 if self._tokens is not None: 435 return self._tokens 436 437 self._generate_tokens() 438 assert self._tokens is not None 439 return self._tokens 440 441 @property 442 def plain(self) -> str: 443 """Returns the value of this object, with no ANSI sequences. 444 445 Generated on-demand, at the first call to this or the self.tokens 446 property. 447 """ 448 449 if self._plain is not None: 450 return self._plain 451 452 self._generate_tokens() 453 assert self._plain is not None 454 return self._plain 455 456 def plain_index(self, index: int | None) -> int | None: 457 """Finds given index inside plain text.""" 458 459 if index is None: 460 return None 461 462 styled_chars = 0 463 plain_chars = 0 464 negative_index = False 465 466 tokens = self.tokens.copy() 467 if index < 0: 468 tokens.reverse() 469 index = abs(index) 470 negative_index = True 471 472 for token in tokens: 473 if token.data is None: 474 continue 475 476 if token.ttype is not TokenType.PLAIN: 477 assert token.sequence is not None 478 styled_chars += len(token.sequence) 479 continue 480 481 assert isinstance(token.data, str) 482 for _ in range(len(token.data)): 483 if plain_chars == index: 484 if negative_index: 485 return -1 * (plain_chars + styled_chars) 486 487 return styled_chars + plain_chars 488 489 plain_chars += 1 490 491 return None 492 493 def __len__(self) -> int: 494 """Gets "real" length of object.""" 495 496 return len(self.plain) 497 498 def __getitem__(self, subscript: int | slice) -> str: 499 """Gets an item, adjusted for non-plain text. 500 501 Args: 502 subscript: The integer or slice to find. 503 504 Returns: 505 The elements described by the subscript. 506 507 Raises: 508 IndexError: The given index is out of range. 509 """ 510 511 if isinstance(subscript, int): 512 plain_index = self.plain_index(subscript) 513 if plain_index is None: 514 raise IndexError("StyledText index out of range") 515 516 return self.value[plain_index] 517 518 return self.value[ 519 slice( 520 self.plain_index(subscript.start), 521 self.plain_index(subscript.stop), 522 subscript.step, 523 ) 524 ] 525 526 527class MarkupLanguage: 528 """A class representing an instance of a Markup Language. 529 530 This class is used for all markup/ANSI parsing, tokenizing and usage. 531 532 ```python3 533 from pytermgui import tim 534 535 tim.alias("my-tag", "@152 72 bold") 536 tim.print("This is [my-tag]my-tag[/]!") 537 ``` 538 539 <p style="text-align: center"> 540 <img src="https://raw.githubusercontent.com/bczsalba/pytermgui/master/assets/\ 541docs/parser/markup_language.png" 542 style="width: 80%"> 543 </p> 544 """ 545 546 raise_unknown_markup: bool = False 547 """Raise `pytermgui.exceptions.MarkupSyntaxError` when encountering unknown markup tags.""" 548 549 def __init__(self, default_macros: bool = True) -> None: 550 """Initializes a MarkupLanguage. 551 552 Args: 553 default_macros: If not set, the builtin macros are not defined. 554 """ 555 556 self.tags: dict[str, str] = STYLE_MAP.copy() 557 self._cache: dict[str, StyledText] = {} 558 self.macros: dict[str, MacroCallable] = {} 559 self.user_tags: dict[str, str] = {} 560 self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy() 561 562 self.should_cache: bool = True 563 564 if default_macros: 565 self.define("!link", macro_link) 566 self.define("!align", macro_align) 567 self.define("!markup", self.get_markup) 568 self.define("!shuffle", macro_shuffle) 569 self.define("!strip_bg", macro_strip_bg) 570 self.define("!strip_fg", macro_strip_fg) 571 self.define("!rainbow", macro_rainbow) 572 self.define("!gradient", macro_gradient) 573 self.define("!upper", lambda item: str(item.upper())) 574 self.define("!lower", lambda item: str(item.lower())) 575 self.define("!title", lambda item: str(item.title())) 576 self.define("!capitalize", lambda item: str(item.capitalize())) 577 self.define("!expand", lambda tag: macro_expand(self, tag)) 578 self.define("!debug", lambda *args: ",".join(ascii(arg) for arg in args)) 579 580 self.alias("code", "dim @black") 581 self.alias("code.str", "142") 582 self.alias("code.none", "167") 583 self.alias("code.global", "214") 584 self.alias("code.number", "175") 585 self.alias("code.keyword", "203") 586 self.alias("code.identifier", "109") 587 self.alias("code.name", "code.global") 588 self.alias("code.comment", "240 italic") 589 self.alias("code.builtin", "code.global") 590 self.alias("code.file", "code.identifier") 591 self.alias("code.symbol", "code.identifier") 592 593 def _get_color_token(self, tag: str) -> Token | None: 594 """Tries to get a color token from the given tag. 595 596 Args: 597 tag: The tag to parse. 598 599 Returns: 600 A color token if the given tag could be parsed into one, else None. 601 """ 602 603 try: 604 color = str_to_color(tag, use_cache=self.should_cache) 605 606 except ColorSyntaxError: 607 return None 608 609 return Token(name=color.value, ttype=TokenType.COLOR, data=color) 610 611 def _get_style_token(self, tag: str) -> Token | None: 612 """Tries to get a style (including unsetter) token from tags, user tags and unsetters. 613 614 Args: 615 tag: The tag to parse. 616 617 Returns: 618 A `Token` if one could be created, None otherwise. 619 """ 620 621 if tag in self.unsetters: 622 return Token(name=tag, ttype=TokenType.UNSETTER, data=self.unsetters[tag]) 623 624 if tag in self.user_tags: 625 return Token(name=tag, ttype=TokenType.STYLE, data=self.user_tags[tag]) 626 627 if tag in self.tags: 628 return Token(name=tag, ttype=TokenType.STYLE, data=self.tags[tag]) 629 630 return None 631 632 def print(self, *args, **kwargs) -> None: 633 """Parse all arguments and pass them through to print, along with kwargs.""" 634 635 parsed = [] 636 for arg in args: 637 parsed.append(self.parse(str(arg))) 638 639 get_terminal().print(*parsed, **kwargs) 640 641 def tokenize_markup(self, markup_text: str) -> Iterator[Token]: 642 """Converts the given markup string into an iterator of `Token`. 643 644 Args: 645 markup_text: The text to look at. 646 647 Returns: 648 An iterator of tokens. The reason this is an iterator is to possibly save 649 on memory. 650 """ 651 652 end = 0 653 start = 0 654 cursor = 0 655 for match in RE_MARKUP.finditer(markup_text): 656 full, escapes, tag_text = match.groups() 657 start, end = match.span() 658 659 # Add plain text between last and current match 660 if start > cursor: 661 yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start]) 662 663 if not escapes == "" and len(escapes) % 2 == 1: 664 cursor = end 665 yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) - 1 :]) 666 continue 667 668 for tag in tag_text.split(): 669 token = self._get_style_token(tag) 670 if token is not None: 671 yield token 672 continue 673 674 # Try to find a color token 675 token = self._get_color_token(tag) 676 if token is not None: 677 yield token 678 continue 679 680 macro_match = RE_MACRO.match(tag) 681 if macro_match is not None: 682 name, args = macro_match.groups() 683 macro_args = () if args is None else args.split(":") 684 685 if not name in self.macros: 686 raise MarkupSyntaxError( 687 tag=tag, 688 cause="is not a defined macro", 689 context=markup_text, 690 ) 691 692 yield Token( 693 name=tag, 694 ttype=TokenType.MACRO, 695 data=(self.macros[name], macro_args), 696 ) 697 continue 698 699 if self.raise_unknown_markup: 700 raise MarkupSyntaxError( 701 tag=tag, cause="not defined", context=markup_text 702 ) 703 704 cursor = end 705 706 # Add remaining text as plain 707 if len(markup_text) > cursor: 708 yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:]) 709 710 def tokenize_ansi(self, ansi: str) -> Iterator[Token]: 711 """Converts the given ANSI string into an iterator of `Token`. 712 713 Args: 714 ansi: The text to look at. 715 716 Returns: 717 An iterator of tokens. The reason this is an iterator is to possibly save 718 on memory. 719 """ 720 721 def _is_in_tags(code: str, tags: dict[str, str]) -> str | None: 722 """Determines whether a code is in the given dict of tags.""" 723 724 for name, current in tags.items(): 725 if current == code: 726 return name 727 728 return None 729 730 end = 0 731 start = 0 732 cursor = 0 733 734 # StyledText messes with indexing, so we need to cast it 735 # back to str. 736 if isinstance(ansi, StyledText): 737 ansi = str(ansi) 738 739 for match in RE_ANSI.finditer(ansi): 740 code = match.groups()[0] 741 start, end = match.span() 742 743 if code is None: 744 continue 745 746 parts = code.split(";") 747 748 if start > cursor: 749 plain = ansi[cursor:start] 750 751 yield Token(name=plain, ttype=TokenType.PLAIN, data=plain) 752 753 name: str | None = code 754 ttype = None 755 data: str | Color = parts[0] 756 757 # Styles & Unsetters 758 if len(parts) == 1: 759 # Covariancy is not an issue here, even though mypy seems to think so. 760 name = _is_in_tags(parts[0], self.unsetters) # type: ignore 761 if name is not None: 762 ttype = TokenType.UNSETTER 763 764 else: 765 name = _is_in_tags(parts[0], self.tags) 766 if name is not None: 767 ttype = TokenType.STYLE 768 769 # Colors 770 if ttype is None: 771 with suppress(ColorSyntaxError): 772 data = str_to_color(code) 773 name = data.name 774 ttype = TokenType.COLOR 775 776 if name is None or ttype is None or data is None: 777 if len(parts) != 2: 778 raise AnsiSyntaxError( 779 tag=parts[0], cause="not recognized", context=ansi 780 ) 781 782 name = "position" 783 ttype = TokenType.POSITION 784 data = ",".join(reversed(parts)) 785 786 yield Token(name=name, ttype=ttype, data=data) 787 cursor = end 788 789 if cursor < len(ansi): 790 plain = ansi[cursor:] 791 792 yield Token(ttype=TokenType.PLAIN, data=plain) 793 794 def define(self, name: str, method: MacroCallable) -> None: 795 """Defines a Macro tag that executes the given method. 796 797 Args: 798 name: The name the given method will be reachable by within markup. 799 The given value gets "!" prepended if it isn't present already. 800 method: The method this macro will execute. 801 """ 802 803 if not name.startswith("!"): 804 name = f"!{name}" 805 806 self.macros[name] = method 807 self.unsetters[f"/{name}"] = None 808 809 def alias(self, name: str, value: str) -> None: 810 """Aliases the given name to a value, and generates an unsetter for it. 811 812 Note that it is not possible to alias macros. 813 814 Args: 815 name: The name of the new tag. 816 value: The value the new tag will stand for. 817 """ 818 819 def _get_unsetter(token: Token) -> str | None: 820 """Get unsetter for a token""" 821 822 if token.ttype is TokenType.PLAIN: 823 return None 824 825 if token.ttype is TokenType.UNSETTER: 826 return self.unsetters[token.name] 827 828 if token.ttype is TokenType.COLOR: 829 assert isinstance(token.data, Color) 830 831 if token.data.background: 832 return self.unsetters["/bg"] 833 834 return self.unsetters["/fg"] 835 836 name = f"/{token.name}" 837 if not name in self.unsetters: 838 raise KeyError(f"Could not find unsetter for token {token}.") 839 840 return self.unsetters[name] 841 842 if name.startswith("!"): 843 raise ValueError('Only macro tags can always start with "!".') 844 845 setter = "" 846 unsetter = "" 847 848 # Try to link to existing tag 849 if value in self.user_tags: 850 self.unsetters[f"/{name}"] = self.unsetters[f"/{value}"] 851 self.user_tags[name] = self.user_tags[value] 852 return 853 854 for token in self.tokenize_markup(f"[{value}]"): 855 if token.ttype is TokenType.PLAIN: 856 continue 857 858 assert token.sequence is not None 859 setter += token.sequence 860 861 t_unsetter = _get_unsetter(token) 862 unsetter += f"\x1b[{t_unsetter}m" 863 864 self.unsetters[f"/{name}"] = unsetter.lstrip("\x1b[").rstrip("m") 865 self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m") 866 867 marked: list[str] = [] 868 for item in self._cache: 869 if name in item: 870 marked.append(item) 871 872 for item in marked: 873 del self._cache[item] 874 875 # TODO: I cannot cut down the one-too-many branch that this has at the moment. 876 # We could look into it in the future, however. 877 def parse( # pylint: disable=too-many-branches 878 self, markup_text: str 879 ) -> StyledText: 880 """Parses the given markup. 881 882 Args: 883 markup_text: The markup to parse. 884 885 Returns: 886 A `StyledText` instance of the result of parsing the input. This 887 custom `str` class is used to allow accessing the plain value of 888 the output, as well as to cleanly index within it. It is analogous 889 to builtin `str`, only adds extra things on top. 890 """ 891 892 applied_macros: list[tuple[str, MacroCall]] = [] 893 previous_token: Token | None = None 894 previous_sequence = "" 895 sequence = "" 896 out = "" 897 898 def _apply_macros(text: str) -> str: 899 """Apply current macros to text""" 900 901 for _, (method, args) in applied_macros: 902 text = method(*args, text) 903 904 return text 905 906 def _is_same_colorgroup(previous: Token, new: Token) -> bool: 907 if not isinstance(new.data, Color) or not isinstance(previous.data, Color): 908 return False 909 910 return previous.data.background == new.data.background and type( 911 previous 912 ) is type(new) 913 914 if ( 915 self.should_cache 916 and markup_text in self._cache 917 and len(RE_MACRO.findall(markup_text)) == 0 918 ): 919 return self._cache[markup_text] 920 921 token: Token 922 for token in self.tokenize_markup(markup_text): 923 if sequence != "" and previous_token == token: 924 continue 925 926 # Optimize out previously added color tokens, as only the most 927 # recent would be visible anyways. 928 if ( 929 token.sequence is not None 930 and previous_token is not None 931 and _is_same_colorgroup(previous_token, token) 932 ): 933 sequence = token.sequence 934 continue 935 936 if token.ttype == TokenType.UNSETTER and token.data == "0": 937 out += "\033[0m" 938 sequence = "" 939 applied_macros = [] 940 continue 941 942 previous_token = token 943 944 # Macro unsetters are stored with None as their data 945 if token.data is None and token.ttype is TokenType.UNSETTER: 946 for item, data in applied_macros.copy(): 947 macro_match = RE_MACRO.match(item) 948 assert macro_match is not None 949 950 macro_name = macro_match.groups()[0] 951 952 if f"/{macro_name}" == token.name: 953 applied_macros.remove((item, data)) 954 955 continue 956 957 if token.ttype is TokenType.MACRO: 958 assert isinstance(token.data, tuple) 959 960 applied_macros.append((token.name, token.data)) 961 continue 962 963 if token.sequence is None: 964 applied = sequence 965 for item in previous_sequence.split("\x1b"): 966 if item == "" or item[1:-1] in self.unsetters.values(): 967 continue 968 969 item = f"\x1b{item}" 970 applied = applied.replace(item, "") 971 972 out += applied + _apply_macros(token.name) 973 previous_sequence = sequence 974 sequence = "" 975 continue 976 977 sequence += token.sequence 978 979 if sequence + previous_sequence != "": 980 out += "\x1b[0m" 981 982 out = StyledText(out) 983 self._cache[markup_text] = out 984 return out 985 986 def get_markup(self, ansi: str) -> str: 987 """Generates markup from ANSI text. 988 989 Args: 990 ansi: The text to get markup from. 991 992 Returns: 993 A markup string that can be parsed to get (visually) the same 994 result. Note that this conversion is lossy in a way: there are some 995 details (like macros) that cannot be preserved in an ANSI->Markup->ANSI 996 conversion. 997 """ 998 999 current_tags: list[str] = [] 1000 out = "" 1001 for token in self.tokenize_ansi(ansi): 1002 if token.ttype is TokenType.PLAIN: 1003 if len(current_tags) != 0: 1004 out += "[" + " ".join(current_tags) + "]" 1005 1006 assert isinstance(token.data, str) 1007 out += token.data 1008 current_tags = [] 1009 continue 1010 1011 if token.ttype is TokenType.ESCAPED: 1012 assert isinstance(token.data, str) 1013 1014 current_tags.append(token.data) 1015 continue 1016 1017 current_tags.append(token.name) 1018 1019 return out 1020 1021 def prettify_ansi(self, text: str) -> str: 1022 """Returns a prettified (syntax-highlighted) ANSI str. 1023 1024 This is useful to quickly "inspect" a given ANSI string. However, 1025 for most real uses `MarkupLanguage.prettify_markup` would be 1026 preferable, given an argument of `MarkupLanguage.get_markup(text)`, 1027 as it is much more verbose. 1028 1029 Args: 1030 text: The ANSI-text to prettify. 1031 1032 Returns: 1033 The prettified ANSI text. This text's styles remain valid, 1034 so copy-pasting the argument into a command (like printf) 1035 that can show styled text will work the same way. 1036 """ 1037 1038 out = "" 1039 sequences = "" 1040 for token in self.tokenize_ansi(text): 1041 if token.ttype is TokenType.PLAIN: 1042 assert isinstance(token.data, str) 1043 out += token.data 1044 continue 1045 1046 assert token.sequence is not None 1047 out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b") 1048 sequences += token.sequence 1049 out += sequences 1050 1051 return out 1052 1053 def prettify_markup(self, text: str) -> str: 1054 """Returns a prettified (syntax-highlighted) markup str. 1055 1056 Args: 1057 text: The markup-text to prettify. 1058 1059 Returns: 1060 Prettified markup. This markup, excluding its styles, 1061 remains valid markup. 1062 """ 1063 1064 def _apply_macros(text: str) -> str: 1065 """Apply current macros to text""" 1066 1067 for _, (method, args) in applied_macros: 1068 text = method(*args, text) 1069 1070 return text 1071 1072 def _pop_macro(name: str) -> None: 1073 """Pops a macro from applied_macros.""" 1074 1075 for i, (macro_name, _) in enumerate(applied_macros): 1076 if macro_name == name: 1077 applied_macros.pop(i) 1078 break 1079 1080 def _finish(out: str, in_sequence: bool) -> str: 1081 """Adds ending cap to the given string.""" 1082 1083 if in_sequence: 1084 if not out.endswith("\x1b[0m"): 1085 out += "\x1b[0m" 1086 1087 return out + "]" 1088 1089 return out + "[/]" 1090 1091 styles: dict[TokenType, str] = { 1092 TokenType.MACRO: "210", 1093 TokenType.ESCAPED: "210 bold", 1094 TokenType.UNSETTER: "strikethrough", 1095 } 1096 1097 applied_macros: list[tuple[str, MacroCall]] = [] 1098 1099 out = "" 1100 in_sequence = False 1101 current_styles: list[Token] = [] 1102 1103 for token in self.tokenize_markup(text): 1104 if token.ttype in [TokenType.PLAIN, TokenType.ESCAPED]: 1105 if in_sequence: 1106 out += "]" 1107 1108 in_sequence = False 1109 1110 sequence = "" 1111 for style in current_styles: 1112 if style.sequence is None: 1113 continue 1114 1115 sequence += style.sequence 1116 1117 out += f"{sequence}{_apply_macros(token.name)}\033[0m" 1118 continue 1119 1120 out += " " if in_sequence else "[" 1121 in_sequence = True 1122 1123 if token.ttype is TokenType.UNSETTER: 1124 if token.name == "/": 1125 applied_macros = [] 1126 1127 name = token.name[1:] 1128 1129 if name in self.macros: 1130 _pop_macro(name) 1131 1132 current_styles.append(token) 1133 1134 out += self.parse( 1135 ("" if (name in self.tags) or (name in self.user_tags) else "") 1136 + f"[{styles[TokenType.UNSETTER]}]/{name}" 1137 ) 1138 continue 1139 1140 if token.ttype is TokenType.MACRO: 1141 assert isinstance(token.data, tuple) 1142 1143 name = token.name 1144 if "(" in name: 1145 name = name[: token.name.index("(")] 1146 1147 applied_macros.append((name, token.data)) 1148 1149 try: 1150 out += token.data[0](*token.data[1], token.name) 1151 continue 1152 1153 except TypeError: # Not enough arguments 1154 pass 1155 1156 if token.sequence is not None: 1157 current_styles.append(token) 1158 1159 style_markup = styles.get(token.ttype) or token.name 1160 out += self.parse(f"[{style_markup}]{token.name}") 1161 1162 return _finish(out, in_sequence) 1163 1164 def get_styled_plains(self, text: str) -> Iterator[StyledText]: 1165 """Gets all plain tokens within text, with their respective styles applied. 1166 1167 Args: 1168 text: The ANSI-sequence containing string to find plains from. 1169 1170 Returns: 1171 An iterator of `StyledText` objects, each yielded when a new plain token is found, 1172 containing the styles that are relevant and active on the given plain. 1173 """ 1174 1175 def _apply_styles(styles: list[Token], text: str) -> str: 1176 """Applies given styles to text.""" 1177 1178 for token in styles: 1179 if token.ttype is TokenType.MACRO: 1180 assert isinstance(token.data, tuple) 1181 text = token.data[0](*token.data[1], text) 1182 continue 1183 1184 if token.sequence is None: 1185 continue 1186 1187 text = token.sequence + text 1188 1189 return text 1190 1191 def _pop_unsetter(token: Token, styles: list[Token]) -> list[Token]: 1192 """Removes an unsetter from the list, returns the new list.""" 1193 1194 if token.name == "/": 1195 return [] 1196 1197 target_name = token.name[1:] 1198 for style in styles: 1199 # bold & dim unsetters represent the same character, so we have 1200 # to treat them the same way. 1201 style_name = style.name 1202 1203 if style.name == "dim": 1204 style_name = "bold" 1205 1206 if style_name == target_name: 1207 styles.remove(style) 1208 1209 elif ( 1210 style_name.startswith(target_name) 1211 and style.ttype is TokenType.MACRO 1212 ): 1213 styles.remove(style) 1214 1215 elif style.ttype is TokenType.COLOR: 1216 assert isinstance(style.data, Color) 1217 if target_name == "fg" and not style.data.background: 1218 styles.remove(style) 1219 1220 elif target_name == "bg" and style.data.background: 1221 styles.remove(style) 1222 1223 return styles 1224 1225 styles: list[Token] = [] 1226 for token in self.tokenize_ansi(text): 1227 if token.ttype is TokenType.COLOR: 1228 for i, style in enumerate(reversed(styles)): 1229 if style.ttype is TokenType.COLOR: 1230 assert isinstance(style.data, Color) 1231 assert isinstance(token.data, Color) 1232 1233 if style.data.background != token.data.background: 1234 continue 1235 1236 styles[len(styles) - i - 1] = token 1237 break 1238 else: 1239 styles.append(token) 1240 1241 continue 1242 1243 if token.ttype is TokenType.LINK: 1244 styles.append(token) 1245 yield StyledText(_apply_styles(styles, token.name)) 1246 1247 if token.ttype is TokenType.PLAIN: 1248 assert isinstance(token.data, str) 1249 yield StyledText(_apply_styles(styles, token.data)) 1250 continue 1251 1252 if token.ttype is TokenType.UNSETTER: 1253 styles = _pop_unsetter(token, styles) 1254 continue 1255 1256 styles.append(token) 1257 1258 1259def main() -> None: 1260 """Main method""" 1261 1262 parser = ArgumentParser() 1263 1264 markup_group = parser.add_argument_group("Markup->ANSI") 1265 markup_group.add_argument( 1266 "-p", "--parse", metavar=("TXT"), help="parse a markup text" 1267 ) 1268 markup_group.add_argument( 1269 "-e", "--escape", help="escape parsed markup", action="store_true" 1270 ) 1271 # markup_group.add_argument( 1272 # "-o", 1273 # "--optimize", 1274 # help="set optimization level for markup parsing", 1275 # action="count", 1276 # default=0, 1277 # ) 1278 1279 markup_group.add_argument("--alias", action="append", help="alias src=dst") 1280 1281 ansi_group = parser.add_argument_group("ANSI->Markup") 1282 ansi_group.add_argument( 1283 "-m", "--markup", metavar=("TXT"), help="get markup from ANSI text" 1284 ) 1285 ansi_group.add_argument( 1286 "-s", 1287 "--show-inverse", 1288 action="store_true", 1289 help="show result of parsing result markup", 1290 ) 1291 1292 args = parser.parse_args() 1293 1294 lang = MarkupLanguage() 1295 1296 if args.markup: 1297 markup_text = lang.get_markup(args.markup) 1298 print(markup_text, end="") 1299 1300 if args.show_inverse: 1301 print("->", lang.parse(markup_text)) 1302 else: 1303 print() 1304 1305 if args.parse: 1306 if args.alias: 1307 for alias in args.alias: 1308 src, dest = alias.split("=") 1309 lang.alias(src, dest) 1310 1311 parsed = lang.parse(args.parse) 1312 1313 if args.escape: 1314 print(ascii(parsed)) 1315 else: 1316 print(parsed) 1317 1318 return 1319 1320 1321tim = markup = MarkupLanguage() 1322"""The default TIM instances.""" 1323 1324if __name__ == "__main__": 1325 main()
View Source
389class StyledText(str): 390 """A styled text object. 391 392 The purpose of this class is to implement some things regular `str` 393 breaks at when encountering ANSI sequences. 394 395 Instances of this class are usually spat out by `MarkupLanguage.parse`, 396 but may be manually constructed if the need arises. Everything works even 397 if there is no ANSI tomfoolery going on. 398 """ 399 400 value: str 401 """The underlying, ANSI-inclusive string value.""" 402 403 _plain: str | None = None 404 _tokens: list[Token] | None = None 405 406 def __new__(cls, value: str = ""): 407 """Creates a StyledText, gets markup tags.""" 408 409 obj = super().__new__(cls, value) 410 obj.value = value 411 412 return obj 413 414 def _generate_tokens(self) -> None: 415 """Generates self._tokens & self._plain.""" 416 417 self._tokens = list(tim.tokenize_ansi(self.value)) 418 419 self._plain = "" 420 for token in self._tokens: 421 if token.ttype is not TokenType.PLAIN: 422 continue 423 424 assert isinstance(token.data, str) 425 self._plain += token.data 426 427 @property 428 def tokens(self) -> list[Token]: 429 """Returns all markup tokens of this object. 430 431 Generated on-demand, at the first call to this or the self.plain 432 property. 433 """ 434 435 if self._tokens is not None: 436 return self._tokens 437 438 self._generate_tokens() 439 assert self._tokens is not None 440 return self._tokens 441 442 @property 443 def plain(self) -> str: 444 """Returns the value of this object, with no ANSI sequences. 445 446 Generated on-demand, at the first call to this or the self.tokens 447 property. 448 """ 449 450 if self._plain is not None: 451 return self._plain 452 453 self._generate_tokens() 454 assert self._plain is not None 455 return self._plain 456 457 def plain_index(self, index: int | None) -> int | None: 458 """Finds given index inside plain text.""" 459 460 if index is None: 461 return None 462 463 styled_chars = 0 464 plain_chars = 0 465 negative_index = False 466 467 tokens = self.tokens.copy() 468 if index < 0: 469 tokens.reverse() 470 index = abs(index) 471 negative_index = True 472 473 for token in tokens: 474 if token.data is None: 475 continue 476 477 if token.ttype is not TokenType.PLAIN: 478 assert token.sequence is not None 479 styled_chars += len(token.sequence) 480 continue 481 482 assert isinstance(token.data, str) 483 for _ in range(len(token.data)): 484 if plain_chars == index: 485 if negative_index: 486 return -1 * (plain_chars + styled_chars) 487 488 return styled_chars + plain_chars 489 490 plain_chars += 1 491 492 return None 493 494 def __len__(self) -> int: 495 """Gets "real" length of object.""" 496 497 return len(self.plain) 498 499 def __getitem__(self, subscript: int | slice) -> str: 500 """Gets an item, adjusted for non-plain text. 501 502 Args: 503 subscript: The integer or slice to find. 504 505 Returns: 506 The elements described by the subscript. 507 508 Raises: 509 IndexError: The given index is out of range. 510 """ 511 512 if isinstance(subscript, int): 513 plain_index = self.plain_index(subscript) 514 if plain_index is None: 515 raise IndexError("StyledText index out of range") 516 517 return self.value[plain_index] 518 519 return self.value[ 520 slice( 521 self.plain_index(subscript.start), 522 self.plain_index(subscript.stop), 523 subscript.step, 524 ) 525 ]
A styled text object.
The purpose of this class is to implement some things regular str
breaks at when encountering ANSI sequences.
Instances of this class are usually spat out by MarkupLanguage.parse
,
but may be manually constructed if the need arises. Everything works even
if there is no ANSI tomfoolery going on.
View Source
Creates a StyledText, gets markup tags.
The underlying, ANSI-inclusive string value.
Returns all markup tokens of this object.
Generated on-demand, at the first call to this or the self.plain property.
Returns the value of this object, with no ANSI sequences.
Generated on-demand, at the first call to this or the self.tokens property.
View Source
457 def plain_index(self, index: int | None) -> int | None: 458 """Finds given index inside plain text.""" 459 460 if index is None: 461 return None 462 463 styled_chars = 0 464 plain_chars = 0 465 negative_index = False 466 467 tokens = self.tokens.copy() 468 if index < 0: 469 tokens.reverse() 470 index = abs(index) 471 negative_index = True 472 473 for token in tokens: 474 if token.data is None: 475 continue 476 477 if token.ttype is not TokenType.PLAIN: 478 assert token.sequence is not None 479 styled_chars += len(token.sequence) 480 continue 481 482 assert isinstance(token.data, str) 483 for _ in range(len(token.data)): 484 if plain_chars == index: 485 if negative_index: 486 return -1 * (plain_chars + styled_chars) 487 488 return styled_chars + plain_chars 489 490 plain_chars += 1 491 492 return None
Finds given index inside plain text.
Inherited Members
- builtins.str
- encode
- replace
- split
- rsplit
- join
- capitalize
- casefold
- title
- center
- count
- expandtabs
- find
- partition
- index
- ljust
- lower
- lstrip
- rfind
- rindex
- rjust
- rstrip
- rpartition
- splitlines
- strip
- swapcase
- translate
- upper
- startswith
- endswith
- removeprefix
- removesuffix
- isascii
- islower
- isupper
- istitle
- isspace
- isdecimal
- isdigit
- isnumeric
- isalpha
- isalnum
- isidentifier
- isprintable
- zfill
- format
- format_map
- maketrans
View Source
528class MarkupLanguage: 529 """A class representing an instance of a Markup Language. 530 531 This class is used for all markup/ANSI parsing, tokenizing and usage. 532 533 ```python3 534 from pytermgui import tim 535 536 tim.alias("my-tag", "@152 72 bold") 537 tim.print("This is [my-tag]my-tag[/]!") 538 ``` 539 540 <p style="text-align: center"> 541 <img src="https://raw.githubusercontent.com/bczsalba/pytermgui/master/assets/\ 542docs/parser/markup_language.png" 543 style="width: 80%"> 544 </p> 545 """ 546 547 raise_unknown_markup: bool = False 548 """Raise `pytermgui.exceptions.MarkupSyntaxError` when encountering unknown markup tags.""" 549 550 def __init__(self, default_macros: bool = True) -> None: 551 """Initializes a MarkupLanguage. 552 553 Args: 554 default_macros: If not set, the builtin macros are not defined. 555 """ 556 557 self.tags: dict[str, str] = STYLE_MAP.copy() 558 self._cache: dict[str, StyledText] = {} 559 self.macros: dict[str, MacroCallable] = {} 560 self.user_tags: dict[str, str] = {} 561 self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy() 562 563 self.should_cache: bool = True 564 565 if default_macros: 566 self.define("!link", macro_link) 567 self.define("!align", macro_align) 568 self.define("!markup", self.get_markup) 569 self.define("!shuffle", macro_shuffle) 570 self.define("!strip_bg", macro_strip_bg) 571 self.define("!strip_fg", macro_strip_fg) 572 self.define("!rainbow", macro_rainbow) 573 self.define("!gradient", macro_gradient) 574 self.define("!upper", lambda item: str(item.upper())) 575 self.define("!lower", lambda item: str(item.lower())) 576 self.define("!title", lambda item: str(item.title())) 577 self.define("!capitalize", lambda item: str(item.capitalize())) 578 self.define("!expand", lambda tag: macro_expand(self, tag)) 579 self.define("!debug", lambda *args: ",".join(ascii(arg) for arg in args)) 580 581 self.alias("code", "dim @black") 582 self.alias("code.str", "142") 583 self.alias("code.none", "167") 584 self.alias("code.global", "214") 585 self.alias("code.number", "175") 586 self.alias("code.keyword", "203") 587 self.alias("code.identifier", "109") 588 self.alias("code.name", "code.global") 589 self.alias("code.comment", "240 italic") 590 self.alias("code.builtin", "code.global") 591 self.alias("code.file", "code.identifier") 592 self.alias("code.symbol", "code.identifier") 593 594 def _get_color_token(self, tag: str) -> Token | None: 595 """Tries to get a color token from the given tag. 596 597 Args: 598 tag: The tag to parse. 599 600 Returns: 601 A color token if the given tag could be parsed into one, else None. 602 """ 603 604 try: 605 color = str_to_color(tag, use_cache=self.should_cache) 606 607 except ColorSyntaxError: 608 return None 609 610 return Token(name=color.value, ttype=TokenType.COLOR, data=color) 611 612 def _get_style_token(self, tag: str) -> Token | None: 613 """Tries to get a style (including unsetter) token from tags, user tags and unsetters. 614 615 Args: 616 tag: The tag to parse. 617 618 Returns: 619 A `Token` if one could be created, None otherwise. 620 """ 621 622 if tag in self.unsetters: 623 return Token(name=tag, ttype=TokenType.UNSETTER, data=self.unsetters[tag]) 624 625 if tag in self.user_tags: 626 return Token(name=tag, ttype=TokenType.STYLE, data=self.user_tags[tag]) 627 628 if tag in self.tags: 629 return Token(name=tag, ttype=TokenType.STYLE, data=self.tags[tag]) 630 631 return None 632 633 def print(self, *args, **kwargs) -> None: 634 """Parse all arguments and pass them through to print, along with kwargs.""" 635 636 parsed = [] 637 for arg in args: 638 parsed.append(self.parse(str(arg))) 639 640 get_terminal().print(*parsed, **kwargs) 641 642 def tokenize_markup(self, markup_text: str) -> Iterator[Token]: 643 """Converts the given markup string into an iterator of `Token`. 644 645 Args: 646 markup_text: The text to look at. 647 648 Returns: 649 An iterator of tokens. The reason this is an iterator is to possibly save 650 on memory. 651 """ 652 653 end = 0 654 start = 0 655 cursor = 0 656 for match in RE_MARKUP.finditer(markup_text): 657 full, escapes, tag_text = match.groups() 658 start, end = match.span() 659 660 # Add plain text between last and current match 661 if start > cursor: 662 yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start]) 663 664 if not escapes == "" and len(escapes) % 2 == 1: 665 cursor = end 666 yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) - 1 :]) 667 continue 668 669 for tag in tag_text.split(): 670 token = self._get_style_token(tag) 671 if token is not None: 672 yield token 673 continue 674 675 # Try to find a color token 676 token = self._get_color_token(tag) 677 if token is not None: 678 yield token 679 continue 680 681 macro_match = RE_MACRO.match(tag) 682 if macro_match is not None: 683 name, args = macro_match.groups() 684 macro_args = () if args is None else args.split(":") 685 686 if not name in self.macros: 687 raise MarkupSyntaxError( 688 tag=tag, 689 cause="is not a defined macro", 690 context=markup_text, 691 ) 692 693 yield Token( 694 name=tag, 695 ttype=TokenType.MACRO, 696 data=(self.macros[name], macro_args), 697 ) 698 continue 699 700 if self.raise_unknown_markup: 701 raise MarkupSyntaxError( 702 tag=tag, cause="not defined", context=markup_text 703 ) 704 705 cursor = end 706 707 # Add remaining text as plain 708 if len(markup_text) > cursor: 709 yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:]) 710 711 def tokenize_ansi(self, ansi: str) -> Iterator[Token]: 712 """Converts the given ANSI string into an iterator of `Token`. 713 714 Args: 715 ansi: The text to look at. 716 717 Returns: 718 An iterator of tokens. The reason this is an iterator is to possibly save 719 on memory. 720 """ 721 722 def _is_in_tags(code: str, tags: dict[str, str]) -> str | None: 723 """Determines whether a code is in the given dict of tags.""" 724 725 for name, current in tags.items(): 726 if current == code: 727 return name 728 729 return None 730 731 end = 0 732 start = 0 733 cursor = 0 734 735 # StyledText messes with indexing, so we need to cast it 736 # back to str. 737 if isinstance(ansi, StyledText): 738 ansi = str(ansi) 739 740 for match in RE_ANSI.finditer(ansi): 741 code = match.groups()[0] 742 start, end = match.span() 743 744 if code is None: 745 continue 746 747 parts = code.split(";") 748 749 if start > cursor: 750 plain = ansi[cursor:start] 751 752 yield Token(name=plain, ttype=TokenType.PLAIN, data=plain) 753 754 name: str | None = code 755 ttype = None 756 data: str | Color = parts[0] 757 758 # Styles & Unsetters 759 if len(parts) == 1: 760 # Covariancy is not an issue here, even though mypy seems to think so. 761 name = _is_in_tags(parts[0], self.unsetters) # type: ignore 762 if name is not None: 763 ttype = TokenType.UNSETTER 764 765 else: 766 name = _is_in_tags(parts[0], self.tags) 767 if name is not None: 768 ttype = TokenType.STYLE 769 770 # Colors 771 if ttype is None: 772 with suppress(ColorSyntaxError): 773 data = str_to_color(code) 774 name = data.name 775 ttype = TokenType.COLOR 776 777 if name is None or ttype is None or data is None: 778 if len(parts) != 2: 779 raise AnsiSyntaxError( 780 tag=parts[0], cause="not recognized", context=ansi 781 ) 782 783 name = "position" 784 ttype = TokenType.POSITION 785 data = ",".join(reversed(parts)) 786 787 yield Token(name=name, ttype=ttype, data=data) 788 cursor = end 789 790 if cursor < len(ansi): 791 plain = ansi[cursor:] 792 793 yield Token(ttype=TokenType.PLAIN, data=plain) 794 795 def define(self, name: str, method: MacroCallable) -> None: 796 """Defines a Macro tag that executes the given method. 797 798 Args: 799 name: The name the given method will be reachable by within markup. 800 The given value gets "!" prepended if it isn't present already. 801 method: The method this macro will execute. 802 """ 803 804 if not name.startswith("!"): 805 name = f"!{name}" 806 807 self.macros[name] = method 808 self.unsetters[f"/{name}"] = None 809 810 def alias(self, name: str, value: str) -> None: 811 """Aliases the given name to a value, and generates an unsetter for it. 812 813 Note that it is not possible to alias macros. 814 815 Args: 816 name: The name of the new tag. 817 value: The value the new tag will stand for. 818 """ 819 820 def _get_unsetter(token: Token) -> str | None: 821 """Get unsetter for a token""" 822 823 if token.ttype is TokenType.PLAIN: 824 return None 825 826 if token.ttype is TokenType.UNSETTER: 827 return self.unsetters[token.name] 828 829 if token.ttype is TokenType.COLOR: 830 assert isinstance(token.data, Color) 831 832 if token.data.background: 833 return self.unsetters["/bg"] 834 835 return self.unsetters["/fg"] 836 837 name = f"/{token.name}" 838 if not name in self.unsetters: 839 raise KeyError(f"Could not find unsetter for token {token}.") 840 841 return self.unsetters[name] 842 843 if name.startswith("!"): 844 raise ValueError('Only macro tags can always start with "!".') 845 846 setter = "" 847 unsetter = "" 848 849 # Try to link to existing tag 850 if value in self.user_tags: 851 self.unsetters[f"/{name}"] = self.unsetters[f"/{value}"] 852 self.user_tags[name] = self.user_tags[value] 853 return 854 855 for token in self.tokenize_markup(f"[{value}]"): 856 if token.ttype is TokenType.PLAIN: 857 continue 858 859 assert token.sequence is not None 860 setter += token.sequence 861 862 t_unsetter = _get_unsetter(token) 863 unsetter += f"\x1b[{t_unsetter}m" 864 865 self.unsetters[f"/{name}"] = unsetter.lstrip("\x1b[").rstrip("m") 866 self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m") 867 868 marked: list[str] = [] 869 for item in self._cache: 870 if name in item: 871 marked.append(item) 872 873 for item in marked: 874 del self._cache[item] 875 876 # TODO: I cannot cut down the one-too-many branch that this has at the moment. 877 # We could look into it in the future, however. 878 def parse( # pylint: disable=too-many-branches 879 self, markup_text: str 880 ) -> StyledText: 881 """Parses the given markup. 882 883 Args: 884 markup_text: The markup to parse. 885 886 Returns: 887 A `StyledText` instance of the result of parsing the input. This 888 custom `str` class is used to allow accessing the plain value of 889 the output, as well as to cleanly index within it. It is analogous 890 to builtin `str`, only adds extra things on top. 891 """ 892 893 applied_macros: list[tuple[str, MacroCall]] = [] 894 previous_token: Token | None = None 895 previous_sequence = "" 896 sequence = "" 897 out = "" 898 899 def _apply_macros(text: str) -> str: 900 """Apply current macros to text""" 901 902 for _, (method, args) in applied_macros: 903 text = method(*args, text) 904 905 return text 906 907 def _is_same_colorgroup(previous: Token, new: Token) -> bool: 908 if not isinstance(new.data, Color) or not isinstance(previous.data, Color): 909 return False 910 911 return previous.data.background == new.data.background and type( 912 previous 913 ) is type(new) 914 915 if ( 916 self.should_cache 917 and markup_text in self._cache 918 and len(RE_MACRO.findall(markup_text)) == 0 919 ): 920 return self._cache[markup_text] 921 922 token: Token 923 for token in self.tokenize_markup(markup_text): 924 if sequence != "" and previous_token == token: 925 continue 926 927 # Optimize out previously added color tokens, as only the most 928 # recent would be visible anyways. 929 if ( 930 token.sequence is not None 931 and previous_token is not None 932 and _is_same_colorgroup(previous_token, token) 933 ): 934 sequence = token.sequence 935 continue 936 937 if token.ttype == TokenType.UNSETTER and token.data == "0": 938 out += "\033[0m" 939 sequence = "" 940 applied_macros = [] 941 continue 942 943 previous_token = token 944 945 # Macro unsetters are stored with None as their data 946 if token.data is None and token.ttype is TokenType.UNSETTER: 947 for item, data in applied_macros.copy(): 948 macro_match = RE_MACRO.match(item) 949 assert macro_match is not None 950 951 macro_name = macro_match.groups()[0] 952 953 if f"/{macro_name}" == token.name: 954 applied_macros.remove((item, data)) 955 956 continue 957 958 if token.ttype is TokenType.MACRO: 959 assert isinstance(token.data, tuple) 960 961 applied_macros.append((token.name, token.data)) 962 continue 963 964 if token.sequence is None: 965 applied = sequence 966 for item in previous_sequence.split("\x1b"): 967 if item == "" or item[1:-1] in self.unsetters.values(): 968 continue 969 970 item = f"\x1b{item}" 971 applied = applied.replace(item, "") 972 973 out += applied + _apply_macros(token.name) 974 previous_sequence = sequence 975 sequence = "" 976 continue 977 978 sequence += token.sequence 979 980 if sequence + previous_sequence != "": 981 out += "\x1b[0m" 982 983 out = StyledText(out) 984 self._cache[markup_text] = out 985 return out 986 987 def get_markup(self, ansi: str) -> str: 988 """Generates markup from ANSI text. 989 990 Args: 991 ansi: The text to get markup from. 992 993 Returns: 994 A markup string that can be parsed to get (visually) the same 995 result. Note that this conversion is lossy in a way: there are some 996 details (like macros) that cannot be preserved in an ANSI->Markup->ANSI 997 conversion. 998 """ 999 1000 current_tags: list[str] = [] 1001 out = "" 1002 for token in self.tokenize_ansi(ansi): 1003 if token.ttype is TokenType.PLAIN: 1004 if len(current_tags) != 0: 1005 out += "[" + " ".join(current_tags) + "]" 1006 1007 assert isinstance(token.data, str) 1008 out += token.data 1009 current_tags = [] 1010 continue 1011 1012 if token.ttype is TokenType.ESCAPED: 1013 assert isinstance(token.data, str) 1014 1015 current_tags.append(token.data) 1016 continue 1017 1018 current_tags.append(token.name) 1019 1020 return out 1021 1022 def prettify_ansi(self, text: str) -> str: 1023 """Returns a prettified (syntax-highlighted) ANSI str. 1024 1025 This is useful to quickly "inspect" a given ANSI string. However, 1026 for most real uses `MarkupLanguage.prettify_markup` would be 1027 preferable, given an argument of `MarkupLanguage.get_markup(text)`, 1028 as it is much more verbose. 1029 1030 Args: 1031 text: The ANSI-text to prettify. 1032 1033 Returns: 1034 The prettified ANSI text. This text's styles remain valid, 1035 so copy-pasting the argument into a command (like printf) 1036 that can show styled text will work the same way. 1037 """ 1038 1039 out = "" 1040 sequences = "" 1041 for token in self.tokenize_ansi(text): 1042 if token.ttype is TokenType.PLAIN: 1043 assert isinstance(token.data, str) 1044 out += token.data 1045 continue 1046 1047 assert token.sequence is not None 1048 out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b") 1049 sequences += token.sequence 1050 out += sequences 1051 1052 return out 1053 1054 def prettify_markup(self, text: str) -> str: 1055 """Returns a prettified (syntax-highlighted) markup str. 1056 1057 Args: 1058 text: The markup-text to prettify. 1059 1060 Returns: 1061 Prettified markup. This markup, excluding its styles, 1062 remains valid markup. 1063 """ 1064 1065 def _apply_macros(text: str) -> str: 1066 """Apply current macros to text""" 1067 1068 for _, (method, args) in applied_macros: 1069 text = method(*args, text) 1070 1071 return text 1072 1073 def _pop_macro(name: str) -> None: 1074 """Pops a macro from applied_macros.""" 1075 1076 for i, (macro_name, _) in enumerate(applied_macros): 1077 if macro_name == name: 1078 applied_macros.pop(i) 1079 break 1080 1081 def _finish(out: str, in_sequence: bool) -> str: 1082 """Adds ending cap to the given string.""" 1083 1084 if in_sequence: 1085 if not out.endswith("\x1b[0m"): 1086 out += "\x1b[0m" 1087 1088 return out + "]" 1089 1090 return out + "[/]" 1091 1092 styles: dict[TokenType, str] = { 1093 TokenType.MACRO: "210", 1094 TokenType.ESCAPED: "210 bold", 1095 TokenType.UNSETTER: "strikethrough", 1096 } 1097 1098 applied_macros: list[tuple[str, MacroCall]] = [] 1099 1100 out = "" 1101 in_sequence = False 1102 current_styles: list[Token] = [] 1103 1104 for token in self.tokenize_markup(text): 1105 if token.ttype in [TokenType.PLAIN, TokenType.ESCAPED]: 1106 if in_sequence: 1107 out += "]" 1108 1109 in_sequence = False 1110 1111 sequence = "" 1112 for style in current_styles: 1113 if style.sequence is None: 1114 continue 1115 1116 sequence += style.sequence 1117 1118 out += f"{sequence}{_apply_macros(token.name)}\033[0m" 1119 continue 1120 1121 out += " " if in_sequence else "[" 1122 in_sequence = True 1123 1124 if token.ttype is TokenType.UNSETTER: 1125 if token.name == "/": 1126 applied_macros = [] 1127 1128 name = token.name[1:] 1129 1130 if name in self.macros: 1131 _pop_macro(name) 1132 1133 current_styles.append(token) 1134 1135 out += self.parse( 1136 ("" if (name in self.tags) or (name in self.user_tags) else "") 1137 + f"[{styles[TokenType.UNSETTER]}]/{name}" 1138 ) 1139 continue 1140 1141 if token.ttype is TokenType.MACRO: 1142 assert isinstance(token.data, tuple) 1143 1144 name = token.name 1145 if "(" in name: 1146 name = name[: token.name.index("(")] 1147 1148 applied_macros.append((name, token.data)) 1149 1150 try: 1151 out += token.data[0](*token.data[1], token.name) 1152 continue 1153 1154 except TypeError: # Not enough arguments 1155 pass 1156 1157 if token.sequence is not None: 1158 current_styles.append(token) 1159 1160 style_markup = styles.get(token.ttype) or token.name 1161 out += self.parse(f"[{style_markup}]{token.name}") 1162 1163 return _finish(out, in_sequence) 1164 1165 def get_styled_plains(self, text: str) -> Iterator[StyledText]: 1166 """Gets all plain tokens within text, with their respective styles applied. 1167 1168 Args: 1169 text: The ANSI-sequence containing string to find plains from. 1170 1171 Returns: 1172 An iterator of `StyledText` objects, each yielded when a new plain token is found, 1173 containing the styles that are relevant and active on the given plain. 1174 """ 1175 1176 def _apply_styles(styles: list[Token], text: str) -> str: 1177 """Applies given styles to text.""" 1178 1179 for token in styles: 1180 if token.ttype is TokenType.MACRO: 1181 assert isinstance(token.data, tuple) 1182 text = token.data[0](*token.data[1], text) 1183 continue 1184 1185 if token.sequence is None: 1186 continue 1187 1188 text = token.sequence + text 1189 1190 return text 1191 1192 def _pop_unsetter(token: Token, styles: list[Token]) -> list[Token]: 1193 """Removes an unsetter from the list, returns the new list.""" 1194 1195 if token.name == "/": 1196 return [] 1197 1198 target_name = token.name[1:] 1199 for style in styles: 1200 # bold & dim unsetters represent the same character, so we have 1201 # to treat them the same way. 1202 style_name = style.name 1203 1204 if style.name == "dim": 1205 style_name = "bold" 1206 1207 if style_name == target_name: 1208 styles.remove(style) 1209 1210 elif ( 1211 style_name.startswith(target_name) 1212 and style.ttype is TokenType.MACRO 1213 ): 1214 styles.remove(style) 1215 1216 elif style.ttype is TokenType.COLOR: 1217 assert isinstance(style.data, Color) 1218 if target_name == "fg" and not style.data.background: 1219 styles.remove(style) 1220 1221 elif target_name == "bg" and style.data.background: 1222 styles.remove(style) 1223 1224 return styles 1225 1226 styles: list[Token] = [] 1227 for token in self.tokenize_ansi(text): 1228 if token.ttype is TokenType.COLOR: 1229 for i, style in enumerate(reversed(styles)): 1230 if style.ttype is TokenType.COLOR: 1231 assert isinstance(style.data, Color) 1232 assert isinstance(token.data, Color) 1233 1234 if style.data.background != token.data.background: 1235 continue 1236 1237 styles[len(styles) - i - 1] = token 1238 break 1239 else: 1240 styles.append(token) 1241 1242 continue 1243 1244 if token.ttype is TokenType.LINK: 1245 styles.append(token) 1246 yield StyledText(_apply_styles(styles, token.name)) 1247 1248 if token.ttype is TokenType.PLAIN: 1249 assert isinstance(token.data, str) 1250 yield StyledText(_apply_styles(styles, token.data)) 1251 continue 1252 1253 if token.ttype is TokenType.UNSETTER: 1254 styles = _pop_unsetter(token, styles) 1255 continue 1256 1257 styles.append(token)
A class representing an instance of a Markup Language.
This class is used for all markup/ANSI parsing, tokenizing and usage.
from pytermgui import tim
tim.alias("my-tag", "@152 72 bold")
tim.print("This is [my-tag]my-tag[/]!")
View Source
550 def __init__(self, default_macros: bool = True) -> None: 551 """Initializes a MarkupLanguage. 552 553 Args: 554 default_macros: If not set, the builtin macros are not defined. 555 """ 556 557 self.tags: dict[str, str] = STYLE_MAP.copy() 558 self._cache: dict[str, StyledText] = {} 559 self.macros: dict[str, MacroCallable] = {} 560 self.user_tags: dict[str, str] = {} 561 self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy() 562 563 self.should_cache: bool = True 564 565 if default_macros: 566 self.define("!link", macro_link) 567 self.define("!align", macro_align) 568 self.define("!markup", self.get_markup) 569 self.define("!shuffle", macro_shuffle) 570 self.define("!strip_bg", macro_strip_bg) 571 self.define("!strip_fg", macro_strip_fg) 572 self.define("!rainbow", macro_rainbow) 573 self.define("!gradient", macro_gradient) 574 self.define("!upper", lambda item: str(item.upper())) 575 self.define("!lower", lambda item: str(item.lower())) 576 self.define("!title", lambda item: str(item.title())) 577 self.define("!capitalize", lambda item: str(item.capitalize())) 578 self.define("!expand", lambda tag: macro_expand(self, tag)) 579 self.define("!debug", lambda *args: ",".join(ascii(arg) for arg in args)) 580 581 self.alias("code", "dim @black") 582 self.alias("code.str", "142") 583 self.alias("code.none", "167") 584 self.alias("code.global", "214") 585 self.alias("code.number", "175") 586 self.alias("code.keyword", "203") 587 self.alias("code.identifier", "109") 588 self.alias("code.name", "code.global") 589 self.alias("code.comment", "240 italic") 590 self.alias("code.builtin", "code.global") 591 self.alias("code.file", "code.identifier") 592 self.alias("code.symbol", "code.identifier")
Initializes a MarkupLanguage.
Args
- default_macros: If not set, the builtin macros are not defined.
Raise pytermgui.exceptions.MarkupSyntaxError
when encountering unknown markup tags.
View Source
Parse all arguments and pass them through to print, along with kwargs.
View Source
642 def tokenize_markup(self, markup_text: str) -> Iterator[Token]: 643 """Converts the given markup string into an iterator of `Token`. 644 645 Args: 646 markup_text: The text to look at. 647 648 Returns: 649 An iterator of tokens. The reason this is an iterator is to possibly save 650 on memory. 651 """ 652 653 end = 0 654 start = 0 655 cursor = 0 656 for match in RE_MARKUP.finditer(markup_text): 657 full, escapes, tag_text = match.groups() 658 start, end = match.span() 659 660 # Add plain text between last and current match 661 if start > cursor: 662 yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start]) 663 664 if not escapes == "" and len(escapes) % 2 == 1: 665 cursor = end 666 yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) - 1 :]) 667 continue 668 669 for tag in tag_text.split(): 670 token = self._get_style_token(tag) 671 if token is not None: 672 yield token 673 continue 674 675 # Try to find a color token 676 token = self._get_color_token(tag) 677 if token is not None: 678 yield token 679 continue 680 681 macro_match = RE_MACRO.match(tag) 682 if macro_match is not None: 683 name, args = macro_match.groups() 684 macro_args = () if args is None else args.split(":") 685 686 if not name in self.macros: 687 raise MarkupSyntaxError( 688 tag=tag, 689 cause="is not a defined macro", 690 context=markup_text, 691 ) 692 693 yield Token( 694 name=tag, 695 ttype=TokenType.MACRO, 696 data=(self.macros[name], macro_args), 697 ) 698 continue 699 700 if self.raise_unknown_markup: 701 raise MarkupSyntaxError( 702 tag=tag, cause="not defined", context=markup_text 703 ) 704 705 cursor = end 706 707 # Add remaining text as plain 708 if len(markup_text) > cursor: 709 yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:])
Converts the given markup string into an iterator of Token
.
Args
- markup_text: The text to look at.
Returns
An iterator of tokens. The reason this is an iterator is to possibly save on memory.
View Source
711 def tokenize_ansi(self, ansi: str) -> Iterator[Token]: 712 """Converts the given ANSI string into an iterator of `Token`. 713 714 Args: 715 ansi: The text to look at. 716 717 Returns: 718 An iterator of tokens. The reason this is an iterator is to possibly save 719 on memory. 720 """ 721 722 def _is_in_tags(code: str, tags: dict[str, str]) -> str | None: 723 """Determines whether a code is in the given dict of tags.""" 724 725 for name, current in tags.items(): 726 if current == code: 727 return name 728 729 return None 730 731 end = 0 732 start = 0 733 cursor = 0 734 735 # StyledText messes with indexing, so we need to cast it 736 # back to str. 737 if isinstance(ansi, StyledText): 738 ansi = str(ansi) 739 740 for match in RE_ANSI.finditer(ansi): 741 code = match.groups()[0] 742 start, end = match.span() 743 744 if code is None: 745 continue 746 747 parts = code.split(";") 748 749 if start > cursor: 750 plain = ansi[cursor:start] 751 752 yield Token(name=plain, ttype=TokenType.PLAIN, data=plain) 753 754 name: str | None = code 755 ttype = None 756 data: str | Color = parts[0] 757 758 # Styles & Unsetters 759 if len(parts) == 1: 760 # Covariancy is not an issue here, even though mypy seems to think so. 761 name = _is_in_tags(parts[0], self.unsetters) # type: ignore 762 if name is not None: 763 ttype = TokenType.UNSETTER 764 765 else: 766 name = _is_in_tags(parts[0], self.tags) 767 if name is not None: 768 ttype = TokenType.STYLE 769 770 # Colors 771 if ttype is None: 772 with suppress(ColorSyntaxError): 773 data = str_to_color(code) 774 name = data.name 775 ttype = TokenType.COLOR 776 777 if name is None or ttype is None or data is None: 778 if len(parts) != 2: 779 raise AnsiSyntaxError( 780 tag=parts[0], cause="not recognized", context=ansi 781 ) 782 783 name = "position" 784 ttype = TokenType.POSITION 785 data = ",".join(reversed(parts)) 786 787 yield Token(name=name, ttype=ttype, data=data) 788 cursor = end 789 790 if cursor < len(ansi): 791 plain = ansi[cursor:] 792 793 yield Token(ttype=TokenType.PLAIN, data=plain)
Converts the given ANSI string into an iterator of Token
.
Args
- ansi: The text to look at.
Returns
An iterator of tokens. The reason this is an iterator is to possibly save on memory.
View Source
795 def define(self, name: str, method: MacroCallable) -> None: 796 """Defines a Macro tag that executes the given method. 797 798 Args: 799 name: The name the given method will be reachable by within markup. 800 The given value gets "!" prepended if it isn't present already. 801 method: The method this macro will execute. 802 """ 803 804 if not name.startswith("!"): 805 name = f"!{name}" 806 807 self.macros[name] = method 808 self.unsetters[f"/{name}"] = None
Defines a Macro tag that executes the given method.
Args
- name: The name the given method will be reachable by within markup. The given value gets "!" prepended if it isn't present already.
- method: The method this macro will execute.
View Source
810 def alias(self, name: str, value: str) -> None: 811 """Aliases the given name to a value, and generates an unsetter for it. 812 813 Note that it is not possible to alias macros. 814 815 Args: 816 name: The name of the new tag. 817 value: The value the new tag will stand for. 818 """ 819 820 def _get_unsetter(token: Token) -> str | None: 821 """Get unsetter for a token""" 822 823 if token.ttype is TokenType.PLAIN: 824 return None 825 826 if token.ttype is TokenType.UNSETTER: 827 return self.unsetters[token.name] 828 829 if token.ttype is TokenType.COLOR: 830 assert isinstance(token.data, Color) 831 832 if token.data.background: 833 return self.unsetters["/bg"] 834 835 return self.unsetters["/fg"] 836 837 name = f"/{token.name}" 838 if not name in self.unsetters: 839 raise KeyError(f"Could not find unsetter for token {token}.") 840 841 return self.unsetters[name] 842 843 if name.startswith("!"): 844 raise ValueError('Only macro tags can always start with "!".') 845 846 setter = "" 847 unsetter = "" 848 849 # Try to link to existing tag 850 if value in self.user_tags: 851 self.unsetters[f"/{name}"] = self.unsetters[f"/{value}"] 852 self.user_tags[name] = self.user_tags[value] 853 return 854 855 for token in self.tokenize_markup(f"[{value}]"): 856 if token.ttype is TokenType.PLAIN: 857 continue 858 859 assert token.sequence is not None 860 setter += token.sequence 861 862 t_unsetter = _get_unsetter(token) 863 unsetter += f"\x1b[{t_unsetter}m" 864 865 self.unsetters[f"/{name}"] = unsetter.lstrip("\x1b[").rstrip("m") 866 self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m") 867 868 marked: list[str] = [] 869 for item in self._cache: 870 if name in item: 871 marked.append(item) 872 873 for item in marked: 874 del self._cache[item]
Aliases the given name to a value, and generates an unsetter for it.
Note that it is not possible to alias macros.
Args
- name: The name of the new tag.
- value: The value the new tag will stand for.
View Source
878 def parse( # pylint: disable=too-many-branches 879 self, markup_text: str 880 ) -> StyledText: 881 """Parses the given markup. 882 883 Args: 884 markup_text: The markup to parse. 885 886 Returns: 887 A `StyledText` instance of the result of parsing the input. This 888 custom `str` class is used to allow accessing the plain value of 889 the output, as well as to cleanly index within it. It is analogous 890 to builtin `str`, only adds extra things on top. 891 """ 892 893 applied_macros: list[tuple[str, MacroCall]] = [] 894 previous_token: Token | None = None 895 previous_sequence = "" 896 sequence = "" 897 out = "" 898 899 def _apply_macros(text: str) -> str: 900 """Apply current macros to text""" 901 902 for _, (method, args) in applied_macros: 903 text = method(*args, text) 904 905 return text 906 907 def _is_same_colorgroup(previous: Token, new: Token) -> bool: 908 if not isinstance(new.data, Color) or not isinstance(previous.data, Color): 909 return False 910 911 return previous.data.background == new.data.background and type( 912 previous 913 ) is type(new) 914 915 if ( 916 self.should_cache 917 and markup_text in self._cache 918 and len(RE_MACRO.findall(markup_text)) == 0 919 ): 920 return self._cache[markup_text] 921 922 token: Token 923 for token in self.tokenize_markup(markup_text): 924 if sequence != "" and previous_token == token: 925 continue 926 927 # Optimize out previously added color tokens, as only the most 928 # recent would be visible anyways. 929 if ( 930 token.sequence is not None 931 and previous_token is not None 932 and _is_same_colorgroup(previous_token, token) 933 ): 934 sequence = token.sequence 935 continue 936 937 if token.ttype == TokenType.UNSETTER and token.data == "0": 938 out += "\033[0m" 939 sequence = "" 940 applied_macros = [] 941 continue 942 943 previous_token = token 944 945 # Macro unsetters are stored with None as their data 946 if token.data is None and token.ttype is TokenType.UNSETTER: 947 for item, data in applied_macros.copy(): 948 macro_match = RE_MACRO.match(item) 949 assert macro_match is not None 950 951 macro_name = macro_match.groups()[0] 952 953 if f"/{macro_name}" == token.name: 954 applied_macros.remove((item, data)) 955 956 continue 957 958 if token.ttype is TokenType.MACRO: 959 assert isinstance(token.data, tuple) 960 961 applied_macros.append((token.name, token.data)) 962 continue 963 964 if token.sequence is None: 965 applied = sequence 966 for item in previous_sequence.split("\x1b"): 967 if item == "" or item[1:-1] in self.unsetters.values(): 968 continue 969 970 item = f"\x1b{item}" 971 applied = applied.replace(item, "") 972 973 out += applied + _apply_macros(token.name) 974 previous_sequence = sequence 975 sequence = "" 976 continue 977 978 sequence += token.sequence 979 980 if sequence + previous_sequence != "": 981 out += "\x1b[0m" 982 983 out = StyledText(out) 984 self._cache[markup_text] = out 985 return out
Parses the given markup.
Args
- markup_text: The markup to parse.
Returns
A
StyledText
instance of the result of parsing the input. This customstr
class is used to allow accessing the plain value of the output, as well as to cleanly index within it. It is analogous to builtinstr
, only adds extra things on top.
View Source
987 def get_markup(self, ansi: str) -> str: 988 """Generates markup from ANSI text. 989 990 Args: 991 ansi: The text to get markup from. 992 993 Returns: 994 A markup string that can be parsed to get (visually) the same 995 result. Note that this conversion is lossy in a way: there are some 996 details (like macros) that cannot be preserved in an ANSI->Markup->ANSI 997 conversion. 998 """ 999 1000 current_tags: list[str] = [] 1001 out = "" 1002 for token in self.tokenize_ansi(ansi): 1003 if token.ttype is TokenType.PLAIN: 1004 if len(current_tags) != 0: 1005 out += "[" + " ".join(current_tags) + "]" 1006 1007 assert isinstance(token.data, str) 1008 out += token.data 1009 current_tags = [] 1010 continue 1011 1012 if token.ttype is TokenType.ESCAPED: 1013 assert isinstance(token.data, str) 1014 1015 current_tags.append(token.data) 1016 continue 1017 1018 current_tags.append(token.name) 1019 1020 return out
Generates markup from ANSI text.
Args
- ansi: The text to get markup from.
Returns
A markup string that can be parsed to get (visually) the same result. Note that this conversion is lossy in a way: there are some details (like macros) that cannot be preserved in an ANSI->Markup->ANSI conversion.
View Source
1022 def prettify_ansi(self, text: str) -> str: 1023 """Returns a prettified (syntax-highlighted) ANSI str. 1024 1025 This is useful to quickly "inspect" a given ANSI string. However, 1026 for most real uses `MarkupLanguage.prettify_markup` would be 1027 preferable, given an argument of `MarkupLanguage.get_markup(text)`, 1028 as it is much more verbose. 1029 1030 Args: 1031 text: The ANSI-text to prettify. 1032 1033 Returns: 1034 The prettified ANSI text. This text's styles remain valid, 1035 so copy-pasting the argument into a command (like printf) 1036 that can show styled text will work the same way. 1037 """ 1038 1039 out = "" 1040 sequences = "" 1041 for token in self.tokenize_ansi(text): 1042 if token.ttype is TokenType.PLAIN: 1043 assert isinstance(token.data, str) 1044 out += token.data 1045 continue 1046 1047 assert token.sequence is not None 1048 out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b") 1049 sequences += token.sequence 1050 out += sequences 1051 1052 return out
Returns a prettified (syntax-highlighted) ANSI str.
This is useful to quickly "inspect" a given ANSI string. However,
for most real uses MarkupLanguage.prettify_markup
would be
preferable, given an argument of MarkupLanguage.get_markup(text)
,
as it is much more verbose.
Args
- text: The ANSI-text to prettify.
Returns
The prettified ANSI text. This text's styles remain valid, so copy-pasting the argument into a command (like printf) that can show styled text will work the same way.
View Source
1054 def prettify_markup(self, text: str) -> str: 1055 """Returns a prettified (syntax-highlighted) markup str. 1056 1057 Args: 1058 text: The markup-text to prettify. 1059 1060 Returns: 1061 Prettified markup. This markup, excluding its styles, 1062 remains valid markup. 1063 """ 1064 1065 def _apply_macros(text: str) -> str: 1066 """Apply current macros to text""" 1067 1068 for _, (method, args) in applied_macros: 1069 text = method(*args, text) 1070 1071 return text 1072 1073 def _pop_macro(name: str) -> None: 1074 """Pops a macro from applied_macros.""" 1075 1076 for i, (macro_name, _) in enumerate(applied_macros): 1077 if macro_name == name: 1078 applied_macros.pop(i) 1079 break 1080 1081 def _finish(out: str, in_sequence: bool) -> str: 1082 """Adds ending cap to the given string.""" 1083 1084 if in_sequence: 1085 if not out.endswith("\x1b[0m"): 1086 out += "\x1b[0m" 1087 1088 return out + "]" 1089 1090 return out + "[/]" 1091 1092 styles: dict[TokenType, str] = { 1093 TokenType.MACRO: "210", 1094 TokenType.ESCAPED: "210 bold", 1095 TokenType.UNSETTER: "strikethrough", 1096 } 1097 1098 applied_macros: list[tuple[str, MacroCall]] = [] 1099 1100 out = "" 1101 in_sequence = False 1102 current_styles: list[Token] = [] 1103 1104 for token in self.tokenize_markup(text): 1105 if token.ttype in [TokenType.PLAIN, TokenType.ESCAPED]: 1106 if in_sequence: 1107 out += "]" 1108 1109 in_sequence = False 1110 1111 sequence = "" 1112 for style in current_styles: 1113 if style.sequence is None: 1114 continue 1115 1116 sequence += style.sequence 1117 1118 out += f"{sequence}{_apply_macros(token.name)}\033[0m" 1119 continue 1120 1121 out += " " if in_sequence else "[" 1122 in_sequence = True 1123 1124 if token.ttype is TokenType.UNSETTER: 1125 if token.name == "/": 1126 applied_macros = [] 1127 1128 name = token.name[1:] 1129 1130 if name in self.macros: 1131 _pop_macro(name) 1132 1133 current_styles.append(token) 1134 1135 out += self.parse( 1136 ("" if (name in self.tags) or (name in self.user_tags) else "") 1137 + f"[{styles[TokenType.UNSETTER]}]/{name}" 1138 ) 1139 continue 1140 1141 if token.ttype is TokenType.MACRO: 1142 assert isinstance(token.data, tuple) 1143 1144 name = token.name 1145 if "(" in name: 1146 name = name[: token.name.index("(")] 1147 1148 applied_macros.append((name, token.data)) 1149 1150 try: 1151 out += token.data[0](*token.data[1], token.name) 1152 continue 1153 1154 except TypeError: # Not enough arguments 1155 pass 1156 1157 if token.sequence is not None: 1158 current_styles.append(token) 1159 1160 style_markup = styles.get(token.ttype) or token.name 1161 out += self.parse(f"[{style_markup}]{token.name}") 1162 1163 return _finish(out, in_sequence)
Returns a prettified (syntax-highlighted) markup str.
Args
- text: The markup-text to prettify.
Returns
Prettified markup. This markup, excluding its styles, remains valid markup.
View Source
1165 def get_styled_plains(self, text: str) -> Iterator[StyledText]: 1166 """Gets all plain tokens within text, with their respective styles applied. 1167 1168 Args: 1169 text: The ANSI-sequence containing string to find plains from. 1170 1171 Returns: 1172 An iterator of `StyledText` objects, each yielded when a new plain token is found, 1173 containing the styles that are relevant and active on the given plain. 1174 """ 1175 1176 def _apply_styles(styles: list[Token], text: str) -> str: 1177 """Applies given styles to text.""" 1178 1179 for token in styles: 1180 if token.ttype is TokenType.MACRO: 1181 assert isinstance(token.data, tuple) 1182 text = token.data[0](*token.data[1], text) 1183 continue 1184 1185 if token.sequence is None: 1186 continue 1187 1188 text = token.sequence + text 1189 1190 return text 1191 1192 def _pop_unsetter(token: Token, styles: list[Token]) -> list[Token]: 1193 """Removes an unsetter from the list, returns the new list.""" 1194 1195 if token.name == "/": 1196 return [] 1197 1198 target_name = token.name[1:] 1199 for style in styles: 1200 # bold & dim unsetters represent the same character, so we have 1201 # to treat them the same way. 1202 style_name = style.name 1203 1204 if style.name == "dim": 1205 style_name = "bold" 1206 1207 if style_name == target_name: 1208 styles.remove(style) 1209 1210 elif ( 1211 style_name.startswith(target_name) 1212 and style.ttype is TokenType.MACRO 1213 ): 1214 styles.remove(style) 1215 1216 elif style.ttype is TokenType.COLOR: 1217 assert isinstance(style.data, Color) 1218 if target_name == "fg" and not style.data.background: 1219 styles.remove(style) 1220 1221 elif target_name == "bg" and style.data.background: 1222 styles.remove(style) 1223 1224 return styles 1225 1226 styles: list[Token] = [] 1227 for token in self.tokenize_ansi(text): 1228 if token.ttype is TokenType.COLOR: 1229 for i, style in enumerate(reversed(styles)): 1230 if style.ttype is TokenType.COLOR: 1231 assert isinstance(style.data, Color) 1232 assert isinstance(token.data, Color) 1233 1234 if style.data.background != token.data.background: 1235 continue 1236 1237 styles[len(styles) - i - 1] = token 1238 break 1239 else: 1240 styles.append(token) 1241 1242 continue 1243 1244 if token.ttype is TokenType.LINK: 1245 styles.append(token) 1246 yield StyledText(_apply_styles(styles, token.name)) 1247 1248 if token.ttype is TokenType.PLAIN: 1249 assert isinstance(token.data, str) 1250 yield StyledText(_apply_styles(styles, token.data)) 1251 continue 1252 1253 if token.ttype is TokenType.UNSETTER: 1254 styles = _pop_unsetter(token, styles) 1255 continue 1256 1257 styles.append(token)
Gets all plain tokens within text, with their respective styles applied.
Args
- text: The ANSI-sequence containing string to find plains from.
Returns
An iterator of
StyledText
objects, each yielded when a new plain token is found, containing the styles that are relevant and active on the given plain.