--- title: "GitHub Flavored Markdown" output: rmarkdown::html_vignette: toc: true toc_depth: 3 vignette: > %\VignetteIndexEntry{GitHub Flavored Markdown} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup, warning=FALSE, message=FALSE} library(gluedown) library(stringr) library(rvest) ``` In this vignette, we will explore how the functions in `gluedown` enable users to transition from R vectors to the kind of formatted markdown text used in the GitHub Flavored Markdown (GFM) specification. Functions like `md_bullet()` and `md_quote()` may be used more often, but there are functions for practically every section of the GFM spec including, some of which have limited practical value (e.g., `md_blank()` and `md_text()`). In each section of this vignette, the GFM spec is quoted first and followed by `gluedown` function usage. For each R code block containing `gluedown` usage, the raw text is shows as `#>` comments followed by the rendered HTML version, which is printed using the `results='asis'` option of the `knitr` package. (To achieve this output, each block is actually repeated twice, with the code of the second result printing block hidden via `echo=FALSE`.) ## 3 Blocks and Inlines > We can think of a document as a sequence of blocks—structural elements like paragraphs, block quotations, lists, headings, rules, and code blocks. Some blocks (like block quotes and list items) contain other blocks; others (like headings and paragraphs) contain inline content—text, links, emphasized text, images, code spans, and so on. > We can divide blocks into two types: container blocks, which can contain other blocks, and leaf blocks, which cannot. ## 4 Leaf blocks There are nine leaf block functions: 1. `md_rule()` 1. `md_heading()` 1. `md_setext()` 1. `md_indent()` 1. `md_fence()` 1. `md_convert()` 1. `md_reference()` 1. `md_paragraph()` 1. `md_blank()` 1. `md_table()` ### 4.1 Thematic breaks > A line consisting of 0-3 spaces of indentation, followed by a sequence of three or more matching `-`, `_`, or `*` characters, each followed optionally by any number of spaces or tabs, forms a thematic break. We can create thematic breaks with the `md_rule()` function, which allows users to define the type and number of valid characters to use. All these options create the same `

` HTML tag, but look different in the underlying `.md` document. ```{r} md_rule() ``` ```{r echo=FALSE, results='asis'} md_rule() ``` ```{r} md_rule(char = "-", n = 10, space = TRUE) ``` ```{r echo=FALSE, results='asis'} md_rule(char = "-", n = 10, space = TRUE) ``` ```{r echo=FALSE, results='asis'} md_heading("4.2 ATX Headings", level = 3) ``` > An ATX heading consists of a string of characters, parsed as inline content, between an opening sequence of 1–6 unescaped `#` characters and an optional closing sequence of any number of unescaped `#` characters... The raw contents of the heading are stripped of leading and trailing spaces before being parsed as inline content. The heading level is equal to the number of `#` characters in the opening sequence. The heading for this section was created with `md_heading()`. ```{r} md_heading("4.2 ATX Headings", level = 3) ``` ### 4.3 Setext headings > A setext heading consists of one or more lines of text, each containing at least one non-whitespace character, with no more than 3 spaces indentation, followed by a setext heading underline... A setext heading underline is a sequence of `=` characters or a sequence of `-` characters, with no more than 3 spaces indentation and any number of trailing spaces... The heading is a level 1 heading if `=` characters are used in the setext heading underline, and a level 2 heading if `-` characters are used. I'm going to forgo printing these headings, just to preserve the structure of this document. The `md_setext()` function can only be used to create level 1 (`

`) and level 2 (`

`) headings. A level 1 Setext heading is sometimes used as the very first, initial title heading. ```{r} md_setext("Headings", level = 1) ``` ```{r} md_setext("4.3 Setext headings", level = 2) ``` ```{r} md_setext("4.3 Setext headings", width = 50) ``` ### 4.4 Indented code blocks > An indented code block is composed of one or more indented chunks separated by blank lines. An indented chunk is a sequence of non-blank lines, each indented four or more spaces. The contents of the code block are the literal contents of the lines, including trailing line endings, minus four spaces of indentation... This function, along with the fenced code block alternative, are useful for creating displayed code blocks (as opposed to executed chunks) containing R source code, command line content, or any other pre-formatted fixed-width content. ```{r} rescale01 <- function(x) { rng <- range(x, na.rm = TRUE) (x - rng[1]) / (rng[2] - rng[1]) } # use deparse() to turn unevaluated expressions into character strings. source <- deparse(rescale01) # this new object is a regular character vector print(source) ``` ```{r} md_indent(source) ``` ```{r echo=FALSE, results='asis'} md_indent(source) ``` ### 4.5 Fenced code blocks > A code fence is a sequence of at least three consecutive backtick characters ... or tildes (`~`). (Tildes and backticks cannot be mixed.) A fenced code block begins with a code fence, indented no more than three spaces... > > The line with the opening code fence may optionally contain some text following the code fence; this is trimmed of leading and trailing whitespace and called the info string... > > The content of the code block consists of all subsequent lines, until a closing code fence of the same type as the code block began with (backticks or tildes), and with at least as many backticks or tildes as the opening code fence.... > > The content of a code fence is treated as literal text, not parsed as inlines. The first word of the info string is typically used to specify the language of the code sample, and rendered in the class attribute of the code tag. However, this spec does not mandate any particular treatment of the info string. The man feature of `md_fence()` is the ability to specify the info string which defines the code language used to highlight syntax. Compare the following output to the one produced by `md_indent()`. ```{r} md_fence(source) ``` ```{r echo=FALSE, results='asis'} md_fence(source) ``` ```{r} md_fence(source, char = "~") ``` ```{r echo=FALSE, results='asis'} md_fence(source, char = "~") ``` ```{r} md_fence("$ sudo apt install r-base-dev", info = "bash") ``` ```{r echo=FALSE, results='asis'} md_fence("$ sudo apt install r-base-dev", info = "bash") ``` ### 4.6 HTML blocks > An HTML block is a group of lines that is treated as raw HTML (and will not be escaped in HTML output). Markdown rendering engines (like the one used to convert this `.Rmd` document to `.md` and then to `.html`) allow users to write valid HTML code alongside regular markdown text. ```{r} lines <- c( "

", "

",
  "**Hello**,",
  "_world_.",
  "

", "

" ) md_text(lines) ``` ```{r echo=FALSE, results='asis'} lines <- c( "

", "

",
  "**Hello**,",
  "_world_.",
  "

", "

" ) md_text(lines) ``` ### 4.7 Link reference definitions > A link reference definition consists of a link label, indented up to three spaces, followed by a colon (`:`), optional whitespace (including up to one line ending), a link destination, optional whitespace (including up to one line ending), and an optional link title, which if it is present must be separated from the link destination by whitespace. No further non-whitespace characters may occur on the line. > > A link reference definition does not correspond to a structural element of a document. Instead, it defines a label which can be used in reference links and reference-style images elsewhere in the document. Link reference definitions can come either before or after the links that use them. The `md_reference()` function is used to create the link reference definition _only_, which must then be paired with the same label elsewhere in the document. We can put the definitions anywhere in the document, so when I type `[CRAN]` here in this paragraph (like [CRAN]), that text looks for the label created below to create a hyperlink. The reference definition printed to the document actually doesn't show up anywhere in the rendered `.html` file. ```{r} a <- c("cran", "tidy") b <- c("https://cran.r-project.org/", "https://www.tidyverse.org/") c <- c("CRAN Home", "Tidyverse Home") ``` ```{r} md_reference(label = a, url = b, title = c) ``` ```{r echo=FALSE, results='asis'} md_reference(label = a, url = b, title = c) ``` ### 4.8 Paragraphs > A sequence of non-blank lines that cannot be interpreted as other kinds of blocks forms a paragraph. The contents of the paragraph are the result of parsing the paragraph’s raw content as inlines. The paragraph’s raw content is formed by concatenating the lines and removing initial and final whitespace. Paragraphs are one instance are where it's easy for markdown newcomers can have difficulty writing the plain text in the proper format to produce their desired outcome. Words separated by a single newline are rendered as a single paragraph `

` tag. Using `md_paragraph()`, we can take a vector of character text and output _distinct_ paragraphs. In section 6.12 and 6.13 you can compare the output of `md_paragraph()` with the other line break functions. ```{r} sentences <- sample(stringr::sentences, size = 5) ``` ```{r} md_paragraph(sentences) ``` ```{r echo=FALSE, results='asis'} md_paragraph(sentences) ``` ### 4.9 Blank lines > Blank lines between block-level elements are ignored, except for the role they play in determining whether a list is tight or loose. > > Blank lines at the beginning and end of the document are also ignored. Blank lines are, well, not super useful... but you can create them with `md_blank()`! ```{r} md_blank() ``` ```{r echo=FALSE, results='asis'} md_blank() ``` ### 4.10 Tables (extension) > GFM enables the `table` extension, where an additional leaf block type is available. > > A table is an arrangement of data with rows and columns, consisting of a single header row, a delimiter row separating the header from the data, and zero or more data rows. > > Each row consists of cells containing arbitrary text, in which inlines are parsed, separated by pipes (`|`). A leading and trailing pipe is also recommended for clarity of reading, and if there’s otherwise parsing ambiguity. Spaces between pipes and cell content are trimmed. Block-level elements cannot be inserted in a table. > >The delimiter row consists of cells whose only content are hyphens (`-`), and optionally, a leading or trailing colon (`:`), or both, to indicate left, right, or center alignment respectively. The `md_table()` function wraps around `knitr::kable()` to return R objects (Typically data frames or matrices) as markdown tables. This is useful for presenting data created using R in a format that's more readable than a plain-text data frame (or tibble) printout. ```{r message=FALSE, warning=FALSE} states <- head(state.x77) print(states) ``` ```{r} options(knitr.kable.NA = "") md_table(states, align = "c") ``` ```{r echo=FALSE, results='asis'} options(knitr.kable.NA = "") md_table(states, align = "c") ``` ## 5 Container Blocks > A container block is a block that has other blocks as its contents. There are two basic kinds of container blocks: block quotes and list items. Lists are meta-containers for list items. ### 5.1 Block quotes > A block quote marker consists of 0-3 spaces of initial indent, plus (a) the character > together with a following space, or (b) a single character > not followed by a space. I've been copy-pasting block quotes from the GFM spec in this vignette, but we can also use `md_quote()` to grammatically create or manipulate strings and print them as block quotes. ```{r eval=FALSE} read_html("https://w.wiki/A58") %>% html_element("blockquote") %>% html_text(trim = TRUE) %>% str_remove("\\[(.*)\\]") %>% md_quote() ``` ```{r echo=FALSE, results='asis'} w <- "https://en.wikipedia.org/wiki/Preamble_to_the_United_States_Constitution" x <- tryCatch( expr = read_html(w), error = function(e) NULL ) if (!is.null(x)) { x %>% html_element("blockquote") %>% html_text(trim = TRUE) %>% str_remove("\\[(.*)\\]") %>% md_quote() } ``` ```{r} md_quote(sentences) ``` ```{r echo=FALSE, results='asis'} md_paragraph(md_quote(sentences)) ``` ### 5.2 List items > A list marker is a bullet list marker or an ordered list marker. > > A bullet list marker is a -, +, or * character. > > An ordered list marker is a sequence of 1–9 arabic digits (`0-9`), followed by either a `.` character or a `)` character. (The reason for the length limit is that with 10 digits we start seeing integer overflows in some browsers.) List items are automatically created in list blocks (section 5.4) via `md_order()` and `md_bullet()`. ### 5.3 Task list items (extension) > GFM enables the `tasklist` extension, where an additional processing step is performed on list items. > > A task list item is a list item where the first block in it is a paragraph which begins with a task list item marker and at least one whitespace character before any other content. > > A task list item marker consists of an optional number of spaces, a left bracket (`[`), either a whitespace character or the letter x in either lowercase or uppercase, and then a right bracket (`]`). > > When rendered, the task list item marker is replaced with a semantic checkbox element; in an HTML output, this would be an `` element. > > If the character between the brackets is a whitespace character, the checkbox is unchecked. Otherwise, the checkbox is checked. Markdown extensions like this one are optionally supported on different venues. On GitHub (in `README.md` files, issues, or comments, etc.) these tasks list are rendered as check boxes, sometimes even interactable. On other venues, like this very HTML vignette, these check boxes just render are bullet lists with weird bracket boxes on each line. ```{r} md_task(sentences, check = 3:4) ``` ```{r echo=FALSE, results='asis'} md_task(sentences, check = 3:4) ``` ### 5.4 Lists > A list is a sequence of one or more list items of the same type. The list items may be separated by any number of blank lines. > > Two list items are of the same type if they begin with a list marker of the same type. Two list > markers are of the same type if (a) they are bullet list markers using the same character (`-`, `+`, or `*`) or (b) they are ordered list numbers with the same delimiter (either `.` or `)`). > > A list is an ordered list if its constituent list items begin with ordered list markers, and a > bullet list if its constituent list items begin with bullet list markers. > > The start number of an ordered list is determined by the list number of its initial list item. The numbers of subsequent list items are disregarded. The `md_order()` and `md_bullet()` functions let users display character vectors as list blocks. The functions let users adjust the which list item markets are used, but all the options are rendered as the same `

` > * `<textarea>` > * `<style>` > * `<xmp>` > * `<iframe>` > * `<noembed>` > * `<noframes>` > * `<script>` > * `<plaintext>` > > Filtering is done by replacing the leading < with the entity <. These tags are chosen in particular as they change how HTML is interpreted in a way unique to them (i.e. nested HTML is interpreted differently), and this is usually undesireable in the context of other rendered Markdown content. ```{r} lines <- c( "<blockquote>", " <xmp> is disallowed. <XMP> is also disallowed.", "</blockquote>" ) md_disallow(lines) ``` ```{r echo=FALSE, results='asis'} lines <- c( "<blockquote>", " <xmp> is disallowed. <XMP> is also disallowed.", "</blockquote>" ) md_disallow(lines) ``` All other HTML tags are left untouched. ### 6.12 - 6.13 Line breaks The difference between _hard_ line breaks, _soft_ line breaks, paragraphs, and regular text can be confusing. New markdown users often struggle with knowing exactly how many invisible newlines or spaces are needed to present each line in the way they want. The line break functions below make it easier. > A line break (not in a code span or HTML tag) that is preceded by two or more spaces and does not occur at the end of a block is parsed as a hard line break (rendered in HTML as a `<br />` tag): ```{r} # 6.12 Hard line breaks md_hardline(sentences) ``` ```{r echo=FALSE, results='asis'} md_hardline(sentences) ``` *** > A regular line break (not in a code span or HTML tag) that is not preceded by two or more spaces or a backslash is parsed as a softbreak. (A softbreak may be rendered in HTML either as a line ending or as a space. The result will be the same in browsers. In the examples here, a line ending will be used.) ```{r} # 6.13 Soft line breaks md_softline(sentences) ``` ```{r echo=FALSE, results='asis'} md_softline(sentences) ``` *** You can see how both hard line and soft line breaks are different from the paragraphs as created using `md_paragraph()` ```{r} md_paragraph(sentences) ``` ```{r echo=FALSE, results='asis'} md_paragraph(sentences) ``` ### 6.14 Textual content > Any characters not given an interpretation by the above rules will be parsed as plain textual content. The `md_text()` function is a catch-all wrapper around `glue::as_glue()` with simply converts any character string into a `glue` object, which when used in an `.Rmd.` document alongside the `results='asis'` allows users to simply print text to the body of the document. ```{r} # 6.14 Textual content md_text(sentences) ``` ```{r echo=FALSE, results='asis'} md_text(sentences) ``` *** Compare this to the regular handling of character strings. ```{r} print(sentences) ``` ```{r echo=FALSE, results='asis'} print(sentences) ``` *** The `md_text()` function creates strings with the `glue` S3 class. This class works in the same way as the `cat()` functions. ```{r} cat(sentences) ``` ```{r echo=FALSE, results='asis'} cat(sentences) ```