Cui Mingda

Sep 30, 2020

Syntax Highlighting With Triple Backticks In Jekyll 4

Jekyll uses the minima theme by default. When we create a project with the command jekyll new, everything is already set up, and code blocks are syntax highlighted without any extra configuration.

But what if we do not use minima, for example after removing the following line from _config.yml? How does Jekyll 4 turn code snippets wrapped in triple backticks into a web page with syntax highlighting? Let us review the overall process.

theme: minima

First, write some code within triple backticks in a Markdown file (remember to delete the ^ in my code). Here javascript is the language type. Jekyll uses kramdown to parse Markdown and Rouge to generate the syntax-highlighted HTML. Many language types are supported; you can find the full list on the Rouge home page.

^``` javascript
const enabled = true;
^```

In fact, Rouge itself does not handle triple-backtick code blocks (the GitHub Flavored Markdown syntax). That conversion is done by kramdown-parser-gfm, the plugin that turns triple-backtick blocks into HTML. Rouge then highlights the code and produces the following markup:

<div class="language-javascript highlighter-rouge">
  <div class="highlight">
    <pre class="highlight">
      <code>
        <span class="kd">const</span>
        <span class="nx">enabled</span>
        <span class="o">=</span>
        <span class="kc">true</span>
        <span class="p">;</span>
      </code>
    </pre>
  </div>
</div>
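
To make the structure of this markup concrete, here is a minimal Python sketch of the last step only: wrapping already-classified tokens in span elements. It is not Rouge's implementation. The token list is typed in by hand to mirror the HTML above, render_tokens is a hypothetical helper, and whitespace is emitted as plain text for simplicity.

```python
from html import escape

# Hand-written tokens for `const enabled = true;` as (css_class, text) pairs.
# In a real build the lexer produces these; here they are assumptions that
# simply mirror the HTML shown above.
TOKENS = [
    ("kd", "const"),
    ("", " "),
    ("nx", "enabled"),
    ("", " "),
    ("o", "="),
    ("", " "),
    ("kc", "true"),
    ("p", ";"),
]


def render_tokens(tokens, language):
    """Wrap each token in a <span> and add the outer wrappers (hypothetical helper)."""
    parts = []
    for css_class, text in tokens:
        if css_class:
            parts.append(f'<span class="{css_class}">{escape(text)}</span>')
        else:
            parts.append(escape(text))  # unclassified text, e.g. whitespace
    code = "".join(parts)
    return (
        f'<div class="language-{language} highlighter-rouge">'
        f'<div class="highlight"><pre class="highlight"><code>{code}</code></pre></div>'
        "</div>"
    )


html = render_tokens(TOKENS, "javascript")
print(html)
```

The point of the sketch is that the highlighter does not add any colors itself; it only attaches short class names, which a stylesheet must then style.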

What do kd and nx mean? They are the short names Pygments uses for its token types. Pygments is another generic syntax highlighter, and Rouge is fully compatible with Pygments styles. In the Pygments source code we can find the short names for all token types.

STANDARD_TYPES = {
    Token:                         '',

    Text:                          '',
    Whitespace:                    'w',
    Escape:                        'esc',
    Error:                         'err',
    Other:                         'x',

    Keyword:                       'k',
    Keyword.Constant:              'kc',
    Keyword.Declaration:           'kd',
    Keyword.Namespace:             'kn',
    Keyword.Pseudo:                'kp',
    Keyword.Reserved:              'kr',
    Keyword.Type:                  'kt',

    Name:                          'n',
    Name.Attribute:                'na',
    Name.Builtin:                  'nb',
    Name.Builtin.Pseudo:           'bp',
    Name.Class:                    'nc',
    Name.Constant:                 'no',
    Name.Decorator:                'nd',
    Name.Entity:                   'ni',
    Name.Exception:                'ne',
    Name.Function:                 'nf',
    Name.Function.Magic:           'fm',
    Name.Property:                 'py',
    Name.Label:                    'nl',
    Name.Namespace:                'nn',
    Name.Other:                    'nx',
    Name.Tag:                      'nt',
    Name.Variable:                 'nv',
    Name.Variable.Class:           'vc',
    Name.Variable.Global:          'vg',
    Name.Variable.Instance:        'vi',
    Name.Variable.Magic:           'vm',

    Literal:                       'l',
    Literal.Date:                  'ld',

    String:                        's',
    String.Affix:                  'sa',
    String.Backtick:               'sb',
    String.Char:                   'sc',
    String.Delimiter:              'dl',
    String.Doc:                    'sd',
    String.Double:                 's2',
    String.Escape:                 'se',
    String.Heredoc:                'sh',
    String.Interpol:               'si',
    String.Other:                  'sx',
    String.Regex:                  'sr',
    String.Single:                 's1',
    String.Symbol:                 'ss',

    Number:                        'm',
    Number.Bin:                    'mb',
    Number.Float:                  'mf',
    Number.Hex:                    'mh',
    Number.Integer:                'mi',
    Number.Integer.Long:           'il',
    Number.Oct:                    'mo',

    Operator:                      'o',
    Operator.Word:                 'ow',

    Punctuation:                   'p',

    Comment:                       'c',
    Comment.Hashbang:              'ch',
    Comment.Multiline:             'cm',
    Comment.Preproc:               'cp',
    Comment.PreprocFile:           'cpf',
    Comment.Single:                'c1',
    Comment.Special:               'cs',

    Generic:                       'g',
    Generic.Deleted:               'gd',
    Generic.Emph:                  'ge',
    Generic.Error:                 'gr',
    Generic.Heading:               'gh',
    Generic.Inserted:              'gi',
    Generic.Output:                'go',
    Generic.Prompt:                'gp',
    Generic.Strong:                'gs',
    Generic.Subheading:            'gu',
    Generic.Traceback:             'gt',
}
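
Reading the mapping in the other direction answers the question above. The following is a small sketch that inverts a subset of the mapping; the token types are written as plain strings here (an assumption for self-containment, since in Pygments they are token objects), and describe_class is a hypothetical helper name.

```python
# A subset of the mapping above, with token types as plain strings.
STANDARD_TYPES_SUBSET = {
    "Keyword.Constant": "kc",
    "Keyword.Declaration": "kd",
    "Name.Other": "nx",
    "Operator": "o",
    "Punctuation": "p",
}

# Invert it: CSS class -> token type.
CLASS_TO_TOKEN = {short: token for token, short in STANDARD_TYPES_SUBSET.items()}


def describe_class(css_class):
    """Return the token type for a CSS class, or None if it is not in the subset."""
    return CLASS_TO_TOKEN.get(css_class)


for css_class in ["kd", "nx", "o", "kc", "p"]:
    print(css_class, "->", describe_class(css_class))
```

So in the JavaScript example, kd marks const as Keyword.Declaration and nx marks enabled as Name.Other.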

Next question: what does the Pygments token type Keyword.Constant mean? The Pygments documentation describes each token type in detail.

Keyword
For any kind of keyword (especially if it doesn’t match any of the subtypes of course).

Keyword.Constant
For keywords that are constants (e.g. None in future Python versions).

Keyword.Declaration
For keywords used for variable declaration (e.g. var in some programming languages like JavaScript).

...

To make it easier for everyone to read, I organized all the information into a table.

| Short name | Token type | Description |
| --- | --- | --- |
| k | Keyword | For any kind of keyword (especially if it doesn’t match any of the subtypes of course). |
| kc | Keyword.Constant | For keywords that are constants (e.g. None in future Python versions). |
| kd | Keyword.Declaration | For keywords used for variable declaration (e.g. var in some programming languages like JavaScript). |
| kn | Keyword.Namespace | For keywords used for namespace declarations (e.g. import in Python and Java and package in Java). |
| kp | Keyword.Pseudo | For keywords that aren’t really keywords (e.g. None in old Python versions). |
| kr | Keyword.Reserved | For reserved keywords. |
| kt | Keyword.Type | For builtin types that can’t be used as identifiers (e.g. int, char etc. in C). |
| n | Name | For any name (variable names, function names, classes). |
| na | Name.Attribute | For all attributes (e.g. in HTML tags). |
| nb | Name.Builtin | Builtin names; names that are available in the global namespace. |
| bp | Name.Builtin.Pseudo | Builtin names that are implicit (e.g. self in Ruby, this in Java). |
| nc | Name.Class | Class names. Because no lexer can know if a name is a class or a function or something else this token is meant for class declarations. |
| no | Name.Constant | Token type for constants. In some languages you can recognise a token by the way it’s defined (the value after a const keyword for example). In other languages constants are uppercase by definition (Ruby). |
| nd | Name.Decorator | Token type for decorators. Decorators are syntactic elements in the Python language. Similar syntax elements exist in C# and Java. |
| ni | Name.Entity | Token type for special entities (e.g. &nbsp; in HTML). |
| ne | Name.Exception | Token type for exception names (e.g. RuntimeError in Python). Some languages define exceptions in the function signature (Java). You can highlight the name of that exception using this token then. |
| nf | Name.Function | Token type for function names. |
| fm | Name.Function.Magic | Same as Name.Function but for special function names that have an implicit use in a language (e.g. __init__ method in Python). |
| py | Name.Property | |
| nl | Name.Label | Token type for label names (e.g. in languages that support goto). |
| nn | Name.Namespace | Token type for namespaces (e.g. import paths in Java/Python), names following the module/namespace keyword in other languages. |
| nx | Name.Other | Other names. Normally unused. |
| nt | Name.Tag | Tag names (in HTML/XML markup or configuration files). |
| nv | Name.Variable | Token type for variables. Some languages have prefixes for variable names (PHP, Ruby, Perl). You can highlight them using this token. |
| vc | Name.Variable.Class | Same as Name.Variable but for class variables (also static variables). |
| vg | Name.Variable.Global | Same as Name.Variable but for global variables (used in Ruby, for example). |
| vi | Name.Variable.Instance | Same as Name.Variable but for instance variables. |
| vm | Name.Variable.Magic | Same as Name.Variable but for special variable names that have an implicit use in a language (e.g. __doc__ in Python). |
| l | Literal | For any literal (if not further defined). |
| ld | Literal.Date | For date literals (e.g. 42d in Boo). |
| s | String | For any string literal. |
| sa | String.Affix | Token type for affixes that further specify the type of the string they’re attached to (e.g. the prefixes r and u8 in r"foo" and u8"foo"). |
| sb | String.Backtick | Token type for strings enclosed in backticks. |
| sc | String.Char | Token type for single characters (e.g. Java, C). |
| dl | String.Delimiter | Token type for delimiting identifiers in “heredoc”, raw and other similar strings (e.g. the word END in Perl code print <<'END';). |
| sd | String.Doc | Token type for documentation strings (for example Python). |
| s2 | String.Double | Double quoted strings. |
| se | String.Escape | Token type for escape sequences in strings. |
| sh | String.Heredoc | Token type for “heredoc” strings (e.g. in Ruby or Perl). |
| si | String.Interpol | Token type for interpolated parts in strings (e.g. #{foo} in Ruby). |
| sx | String.Other | Token type for any other strings (for example %q{foo} string constructs in Ruby). |
| sr | String.Regex | Token type for regular expression literals (e.g. /foo/ in JavaScript). |
| s1 | String.Single | Token type for single quoted strings. |
| ss | String.Symbol | Token type for symbols (e.g. :foo in LISP or Ruby). |
| m | Number | Token type for any number literal. |
| mb | Number.Bin | Token type for binary literals (e.g. 0b101010). |
| mf | Number.Float | Token type for float literals (e.g. 42.0). |
| mh | Number.Hex | Token type for hexadecimal number literals (e.g. 0xdeadbeef). |
| mi | Number.Integer | Token type for integer literals (e.g. 42). |
| il | Number.Integer.Long | Token type for long integer literals (e.g. 42L in Python). |
| mo | Number.Oct | Token type for octal literals. |
| o | Operator | For any punctuation operator (e.g. +, -). |
| ow | Operator.Word | For any operator that is a word (e.g. not). |
| p | Punctuation | For any punctuation which is not an operator (e.g. [, (…) |
| c | Comment | Token type for any comment. |
| ch | Comment.Hashbang | Token type for hashbang comments (i.e. first lines of files that start with #!). |
| cm | Comment.Multiline | Token type for multiline comments. |
| cp | Comment.Preproc | Token type for preprocessor comments (also <?php/<% constructs). |
| cpf | Comment.PreprocFile | |
| c1 | Comment.Single | Token type for comments that end at the end of a line (e.g. # foo). |
| cs | Comment.Special | Special data in comments. For example code tags, author and license information, etc. |
| g | Generic | A generic, unstyled token. Normally you don’t use this token type. |
| gd | Generic.Deleted | Marks the token value as deleted. |
| ge | Generic.Emph | Marks the token value as emphasized. |
| gr | Generic.Error | Marks the token value as an error message. |
| gh | Generic.Heading | Marks the token value as headline. |
| gi | Generic.Inserted | Marks the token value as inserted. |
| go | Generic.Output | Marks the token value as program output (e.g. for python cli lexer). |
| gp | Generic.Prompt | Marks the token value as command prompt (e.g. bash lexer). |
| gs | Generic.Strong | Marks the token value as bold (e.g. for rst lexer). |
| gu | Generic.Subheading | Marks the token value as subheadline. |
| gt | Generic.Traceback | Marks the token value as a part of an error traceback. |

We now have the HTML structure for syntax highlighting, but we still need CSS styles. Let me introduce a new friend, base16, an architecture for building syntax highlighting themes. Here are its styling guidelines.

| Variable | Description |
| --- | --- |
| base00 | Default Background |
| base01 | Lighter Background (Used for status bars) |
| base02 | Selection Background |
| base03 | Comments, Invisibles, Line Highlighting |
| base04 | Dark Foreground (Used for status bars) |
| base05 | Default Foreground, Caret, Delimiters, Operators |
| base06 | Light Foreground (Not often used) |
| base07 | Light Background (Not often used) |
| base08 | Variables, XML Tags, Markup Link Text, Markup Lists, Diff Deleted |
| base09 | Integers, Boolean, Constants, XML Attributes, Markup Link Url |
| base0A | Classes, Markup Bold, Search Text Background |
| base0B | Strings, Inherited Class, Markup Code, Diff Inserted |
| base0C | Support, Regular Expressions, Escape Characters, Markup Quotes |
| base0D | Functions, Methods, Attribute IDs, Headings |
| base0E | Keywords, Storage, Selector, Markup Italic, Diff Changed |
| base0F | Deprecated, Opening/Closing Embedded Language Tags, e.g. <?php ?> |

In the end we are left with only one task: linking the Rouge class names to the base16 theme colors. We will implement this with Sass.
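
Conceptually, this link is just a lookup table from class name to base16 variable. The sketch below illustrates the idea in Python by generating SCSS rules from such a table. It is only an illustration and not how rouge-base16-scss is built: the pairs are copied from part of the _syntax-base16.scss excerpt shown later in this post, and scss_rules is a hypothetical helper.

```python
# Class name -> base16 variable. None means "inherit the surrounding color".
CLASS_COLORS = {
    "k": "base0E",
    "kc": "base0E",
    "kd": "base0E",
    "n": None,
    "na": "base09",
    "nb": "base08",
}


def scss_rules(class_colors):
    """Build a nested SCSS block for the given class -> variable table."""
    lines = [
        ".highlighter-rouge {",
        "  color: $base05;",             # default foreground
        "  background-color: $base00;",  # default background
        "",
    ]
    for css_class, variable in class_colors.items():
        color = f"${variable}" if variable else "inherit"
        lines.append(f"  .{css_class} {{ color: {color}; }}")
    lines.append("}")
    return "\n".join(lines)


rules = scss_rules(CLASS_COLORS)
print(rules)
```

Because the rules refer to variables such as $base0E rather than fixed colors, switching the base16 theme changes every color without touching the rules.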

First we need to clone samme/base16-styles locally; it provides the base16 themes as Sass/SCSS files.

git clone --depth 1 https://github.com/samme/base16-styles.git ~/github.com/samme/base16-styles

There are several editions in the repository; copy the sass directory into our Jekyll project.

mkdir _sass
cp -R ~/github.com/samme/base16-styles/sass _sass/themes

Next, clone cuimingda/rouge-base16-scss locally and copy _syntax-base16.scss into the project's _sass directory.

git clone --depth 1 https://github.com/cuimingda/rouge-base16-scss.git ~/github.com/cuimingda/rouge-base16-scss
cp -R ~/github.com/cuimingda/rouge-base16-scss/_syntax-base16.scss _sass

Create assets/css/app.scss; Jekyll will convert it to _site/assets/css/app.css. Jekyll only processes files that begin with front matter, so we add two --- lines at the top to make it process this file. The syntax-base16 import corresponds to _sass/_syntax-base16.scss.

---
---
@import
  "base",
  "layout",
  "themes/base16-rebecca.sass",
  "syntax-base16";

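Notice that the theme name in the import is driven by a variable (the snippet above appears with the theme already resolved to rebecca). If base16_theme is not set in _config.yml, the default value is default-dark, which corresponds to _sass/themes/base16-default-dark.sass. Of course, we can set it in _config.yml: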

base16_theme: rebecca
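
The fallback rule can be spelled out as a tiny Python sketch. This only illustrates the logic described above; in the actual site the lookup happens inside Jekyll's build, and theme_import_path is a hypothetical helper that takes the parsed configuration as a dictionary.

```python
def theme_import_path(config):
    """Resolve the theme partial for a parsed _config.yml (hypothetical helper)."""
    # Fall back to default-dark when base16_theme is missing or empty.
    theme = config.get("base16_theme") or "default-dark"
    return f"_sass/themes/base16-{theme}.sass"


print(theme_import_path({}))                           # _sass/themes/base16-default-dark.sass
print(theme_import_path({"base16_theme": "rebecca"}))  # _sass/themes/base16-rebecca.sass
```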

Now the rendered code snippets have syntax highlighting. If you find that some code is not highlighted, just modify _syntax-base16.scss:

.highlighter-rouge {
  color: $base05;
  background-color: $base00;

  .k { color: $base0E; }
  .kc { color: $base0E; }
  .kd { color: $base0E; }
  .kn { color: $base0E; }
  .kp { color: $base0E; }
  .kr { color: $base0E; }
  .kt { color: $base0E; }
  .n { color: inherit; }
  .na { color: $base09; }
  .nb { color: $base08; }
  .bp { color: inherit; }
}
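
To find out which classes still need a rule, one option is to compare the classes used in the rendered HTML with the selectors in the stylesheet. The following Python sketch does that with the standard library only. The HTML and SCSS strings are small made-up samples, unstyled_classes is a hypothetical helper, and the regular expression only recognizes simple one-class selectors such as .kd { ... }.

```python
import re
from html.parser import HTMLParser


class SpanClassCollector(HTMLParser):
    """Collect the class names used on <span> elements."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            for name, value in attrs:
                if name == "class" and value:
                    self.classes.update(value.split())


def unstyled_classes(html, scss):
    """Return span classes in the HTML that have no matching selector in the SCSS."""
    collector = SpanClassCollector()
    collector.feed(html)
    # Match selectors of the form ".kd {" at the start of a line.
    styled = set(re.findall(r"^\s*\.([\w-]+)\s*\{", scss, flags=re.MULTILINE))
    return sorted(collector.classes - styled)


html = '<span class="kd">const</span> <span class="nx">enabled</span> <span class="o">=</span>'
scss = """
.highlighter-rouge {
  .kd { color: $base0E; }
  .nb { color: $base08; }
}
"""
missing = unstyled_classes(html, scss)
print(missing)  # ['nx', 'o']
```

Each class it reports is a candidate for a new line in _syntax-base16.scss, using the base16 guidelines to pick the variable.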

Conclusion

Jekyll embeds kramdown to parse Markdown files and Rouge to generate syntax-highlighted code, and with the default minima theme all of this works without any configuration. If you do not use minima, you can do it yourself with base16 and rouge-base16-scss, and the resulting styles will fit your website better.