I introduced the idea of CI integration for document writing in a previous post. This spring, we finally built a continuous checking environment with Travis for the RedPen user’s manual. The following is an image of the RedPen manual.

The manual is written in AsciiDoc and the source file is maintained in GitHub.

What do we check?

In the CI system, we checked two aspects of the document.

Document build
The RedPen user’s manual is written in AsciiDoc. The AsciiDoc files are converted to HTML with Asciidoctor. In the CI system, we check whether the conversion succeeds.

Document quality
The quality of the document is checked with RedPen, a linting tool for markup text. We checked the document with the following settings. For the details of the RedPen configuration, please refer to the RedPen user’s manual.
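The two checks above can be sketched as a small script that a CI job runs. This is only an illustration, not the actual Travis configuration of the manual: the file name manual.adoc is hypothetical, and the sketch assumes that both tools report failure through a non-zero exit status.

```python
import subprocess


def failed_commands(commands):
    """Run each command in order and return those that exit with a non-zero status."""
    failures = []
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            failures.append(command)
    return failures


# Hypothetical check list: one build step and one quality step.
# "manual.adoc" is a placeholder name, not the real source file of the manual.
CHECKS = [
    ["asciidoctor", "manual.adoc"],  # document build
    ["redpen", "manual.adoc"],       # document quality
]
```

A CI job could call failed_commands(CHECKS) and mark the build as failed when the returned list is not empty.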

Check documents in an editor

I have been writing the document in IntelliJ IDEA with plugins for document writing. IntelliJ IDEA is a popular Java IDE, but it is also useful for writing documents. In addition, the IntelliJ IDEA plugin for RedPen can check documents in the editor, reusing the configuration file from the CI. For details of the IntelliJ IDEA plugin for RedPen, please see this blog post.

The image is shown below.

Results

We got the green badge from Travis, as shown in the following image.

Travis Badge

Summary and Future Work

This article shows how to implement continuous checking of markup documents with CI and editors. The system checks the build with Asciidoctor and the quality with RedPen. We will enhance the checks by adding more validators.

RedPen version 1.6 supports error suppression with text annotations. This feature gives us a good balance between quality and productivity in document writing.

Sometimes we do not want to fix errors reported by RedPen, a text linting tool. In most cases, the reason is that the cost of removing the error is high, or that the writer breaks the writing standard at particular points on purpose. For such cases, error suppression by annotation is useful. The annotations are added just before the sections containing the errors.

Currently, error suppression is supported for four formats (AsciiDoc, Markdown, Re:VIEW, LaTeX). In the following section, I will show a sample text containing the annotation for error suppression.

The sample is written in AsciiDoc, a popular format that is also adopted by GitBook.

Sample: error suppression in AsciiDoc text

For AsciiDoc text, writers add the suppress annotation in an attribute block. The annotation is [suppress]. For example, the following AsciiDoc text suppresses all the errors in the section.

[suppress]
= Instances
Some software tools work in more than one machine, and such distributed (cluster)systems can handle huge data or tasks, because such software tools make use of large amount of computer resources, such as CPU, Disk, and Memory.

When we apply RedPen to the AsciiDoc file, we get the following messages.

When we want to suppress only specific errors, we add the validator names after suppress. The following example suppresses only two types of errors (Contraction and WeakExpression) in the section.

[suppress='Contraction WeakExpression']
= Instances
Some software tools work in more than one machine, and such distributed (cluster)systems can handle huge data or tasks, because such software tools make use of large amount of computer resources, such as CPU, Disk, and Memory.

When we apply RedPen to the AsciiDoc file, we get the following messages.

redpen sample2.asciidoc
[2016-06-14 16:13:38.005][INFO ] cc.redpen.Main - Configuration file: /usr/local/Cellar/redpen/1.6.1/libexec/conf/redpen-conf-en.xml
[2016-06-14 16:13:38.010][INFO ] cc.redpen.config.ConfigurationLoader - Loading config from specified config file: "/usr/local/Cellar/redpen/1.6.1/libexec/conf/redpen-conf-en.xml"
[2016-06-14 16:13:38.019][INFO ] cc.redpen.config.ConfigurationLoader - Succeeded to load configuration file
[2016-06-14 16:13:38.019][INFO ] cc.redpen.config.ConfigurationLoader - Language is set to "en"
[2016-06-14 16:13:38.019][WARN ] cc.redpen.config.ConfigurationLoader - No variant configuration...
[2016-06-14 16:13:38.020][INFO ] cc.redpen.config.ConfigurationLoader - No "symbols" block found in the configuration
[2016-06-14 16:13:38.023][INFO ] cc.redpen.config.SymbolTable - Default symbol settings are loaded
[2016-06-14 16:13:38.082][INFO ] cc.redpen.parser.SentenceExtractor - "[., ?, !]" are added as a end of sentence characters
[2016-06-14 16:13:38.083][INFO ] cc.redpen.parser.SentenceExtractor - "[', "]" are added as a right quotation characters
[2016-06-14 16:13:38.200][INFO ] org.reflections.Reflections - Reflections took 63 ms to scan 1 urls, producing 4 keys and 46 values
[2016-06-14 16:13:38.349][INFO ] cc.redpen.util.DictionaryLoader - Succeeded to load UnexpandedAcronymValidator default dictionary.
[2016-06-14 16:13:38.353][INFO ] cc.redpen.util.DictionaryLoader - Succeeded to load weak expressions.
[2016-06-14 16:13:38.361][INFO ] cc.redpen.util.DictionaryLoader - Succeeded to load word frequencies.
[2016-06-14 16:13:38.363][INFO ] cc.redpen.validator.JavaScriptValidator - JavaScript validators directory: js
redpen-suppress-2.asciidoc:3: ValidationError[SentenceLength], The length of the sentence (226) exceeds the maximum of 120. at line: Some software tools work in more than one machine, and such distributed (cluster)systems can handle huge data or tasks, because such software tools make use of large amount of computer resources, such as CPU, Disk, and Memory.
redpen-suppress-2.asciidoc:3: ValidationError[CommaNumber], The number of commas (6) exceeds the maximum of 3. at line: Some software tools work in more than one machine, and such distributed (cluster)systems can handle huge data or tasks, because such software tools make use of large amount of computer resources, such as CPU, Disk, and Memory.
redpen-suppress-2.asciidoc:3: ValidationError[SymbolWithSpace], Need whitespace after symbol ")". at line: Some software tools work in more than one machine, and such distributed (cluster)systems can handle huge data or tasks, because such software tools make use of large amount of computer resources, such as CPU, Disk, and Memory.
[2016-06-14 16:13:38.411][ERROR] cc.redpen.Main - The number of errors "3" is larger than specified (limit is "1").

We can see that only the errors not specified in the annotation block are reported.
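To make the relationship between the annotation and the report concrete, here is a minimal Python sketch. It is not RedPen's implementation. The helper names is_suppressed and count_errors are hypothetical, the regular expressions only cover the annotation forms and report lines shown in this post, and the sample messages are abbreviated.

```python
import re
from collections import Counter

# Matches the two annotation forms shown above: [suppress] and [suppress='A B'].
SUPPRESS = re.compile(r"^\[suppress(?:='(?P<names>[^']*)')?\]$")

# Matches report lines such as
#   redpen-suppress-2.asciidoc:3: ValidationError[CommaNumber], The number of ...
ERROR_LINE = re.compile(r"^\S+:\d+: ValidationError\[(?P<validator>\w+)\], ")


def is_suppressed(annotation: str, validator: str) -> bool:
    """Return True if the annotation line hides errors from the given validator."""
    match = SUPPRESS.match(annotation.strip())
    if match is None:
        return False  # not a suppress annotation at all
    names = match.group("names")
    if names is None:
        return True  # bare [suppress] hides every error in the section
    return validator in names.split()


def count_errors(output: str) -> Counter:
    """Count reported errors per validator name, ignoring the log lines."""
    counts = Counter()
    for line in output.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            counts[match.group("validator")] += 1
    return counts


annotation = "[suppress='Contraction WeakExpression']"

# Abbreviated copy of the report lines above (messages shortened).
sample = """\
[2016-06-14 16:13:38.363][INFO ] cc.redpen.validator.JavaScriptValidator - JavaScript validators directory: js
redpen-suppress-2.asciidoc:3: ValidationError[SentenceLength], The length of the sentence (226) exceeds the maximum of 120.
redpen-suppress-2.asciidoc:3: ValidationError[CommaNumber], The number of commas (6) exceeds the maximum of 3.
redpen-suppress-2.asciidoc:3: ValidationError[SymbolWithSpace], Need whitespace after symbol ")".
"""

reported = count_errors(sample)
```

In this sketch, none of the validators in reported is covered by the annotation, which matches the behavior described above.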

Summary and Future work

This article demonstrated error suppression by text annotation. The next release of the RedPen IntelliJ plugin is going to support a quick fix that inserts the suppress annotation.

RedPen proved to be an extremely flexible and universal tool. We managed to add basic Russian language support without any major changes to the code. In fact, the only thing that we had to implement was Russian language auto-detection. The rest worked out of the box with a simple modification to the default symbols configuration: in Russian, «quote» is used instead of “quote” and № instead of #. Most likely, it will work equally well for Ukrainian and Belarusian, and maybe for some other languages using the Cyrillic script. This is how it looks in the RedPen IntelliJ plugin:

Actually, adding custom languages to RedPen can be done in IntelliJ IDEA without making any modifications to RedPen itself. Use Settings -> Editor -> RedPen -> Import.

For example, to add the Russian language with correct double quotation marks, number sign, and a couple of validators, all you need to do is import a file with the following contents:

Only symbols that differ from the English configuration should be listed. As for validators, you need to list all that you would like to use.

Most European/Western languages should also work fine with RedPen, using either the default English configuration or a slight modification of it.

If you decide to add your own custom language, it is a good idea to export the English language configuration via Settings -> Editor -> RedPen -> Export and to change the language name in the lang attribute of the redpen-conf tag of the resulting file. It will serve as a good template to start with.
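The lang change described above can also be scripted. The following Python sketch is only an illustration: the helper name retarget_language and the tiny sample configuration are hypothetical, and a real exported file will contain more validators and possibly a symbols block.

```python
import xml.etree.ElementTree as ET


def retarget_language(xml_text: str, lang: str) -> str:
    """Return the configuration with the lang attribute of redpen-conf replaced."""
    root = ET.fromstring(xml_text)
    if root.tag != "redpen-conf":
        raise ValueError("expected a <redpen-conf> root element")
    root.set("lang", lang)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily shortened stand-in for an exported English configuration.
exported = (
    '<redpen-conf lang="en">'
    '<validators><validator name="SentenceLength" /></validators>'
    "</redpen-conf>"
)

russian_template = retarget_language(exported, "ru")
```

The rest of the file is left untouched, so the result can be edited further and imported back through the Settings dialog.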

Then, optionally, you can add spelling and/or other dictionaries by specifying file names in either dict or list validator properties. Dictionary files are just text files with words listed one per line.

To make RedPen even easier for developers to use, we created an IntelliJ IDEA plugin that also works with recent releases of other JetBrains IDEs. This plugin integrates RedPen text validation by adding a new RedPen inspection.

By default, RedPen validation errors are underlined in red (IntelliJ error style), but you can change this to yellow (warning) or any other highlighting style in Settings -> Editor -> Inspections -> Code style issues -> RedPen Validation.

Alternatively, raw validation error messages can be listed by pressing Ctrl+Alt+Shift+R or via IDEA menu Analyze -> RedPen: List Errors having a file selected either in editor or in the Project pane.

Installation

Just open Settings -> Plugins -> Browse Repository, and search for RedPen to install.

File formats

The RedPen plugin supports the following file formats, provided that the relevant plugins are installed:

Plain Text

Properties and Resource Bundles

Markdown

AsciiDoc

Language support

The plugin supports all default RedPen languages and variants (currently, English and Japanese). Language and variant are auto-detected for each file, but can be manually overridden per file via the status bar widget. A manually chosen language is saved to .idea/redpen/files.xml, so the selection is preserved within the project.

Quick fixes

Some validation errors can be fixed via a quick fix (Alt+Enter when the cursor is on an error). If no specific fix is available, the plugin will at least offer to remove the erroneous text. We will be adding more specific quick fixes in later releases.

RedPen configuration

RedPen is highly customizable with its configuration files, where you can define specific validators, change their properties or configure valid and invalid symbols for your writing style.

The same can be done using the RedPen configuration in Settings -> Editor -> RedPen.

Validators can be disabled by unchecking them, and their properties can be edited by double-clicking on them in the table. Different properties are separated by semicolons, so you can use comma-separated values for e.g. list properties, which allows short custom dictionaries (see Advanced Topics for details). Spaces after = are not trimmed, which allows you to have space-only values for e.g. the start_from property.

If you already have configuration files in XML format that you previously used with the command-line version of RedPen, you can import them in the Settings dialog using the Import button. In a similar way, the Export button allows you to save a snapshot of the current configuration for future use in other projects.
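The property syntax described above can be illustrated with a short Python sketch. This is not the plugin's actual parser. The helper name parse_properties is hypothetical, and the sketch assumes a key=value;key=value layout in which only the key is trimmed and the value is kept verbatim.

```python
def parse_properties(text: str) -> dict:
    """Split 'key=value;key=value' into a dict, keeping values exactly as written."""
    properties = {}
    for item in text.split(";"):
        if "=" not in item:
            continue  # skip empty or malformed fragments
        key, value = item.split("=", 1)
        # Only the key is trimmed; spaces after '=' stay part of the value.
        properties[key.strip()] = value
    return properties


parsed = parse_properties("list=apples,oranges;start_from= ")
# parsed["list"] is "apples,oranges" and parsed["start_from"] is a single space
```

Because semicolons separate the properties, the commas inside the list value survive untouched, and the space-only start_from value is preserved.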

Configuration is edited or imported for each language and variant separately. If you have changed the default configuration for some language and variant pair, it will be stored per project under .idea/redpen directory, so it can be shared with fellow developers by committing it to version control.

Advanced topics

In case you want to edit raw xml configuration files under .idea/redpen, make sure you either reload the project or switch focus away from IDEA for the changes to take effect.

Many RedPen validators support custom dictionaries. In most cases, they provide two properties, list and dict.

You can use the list property to provide a short inline dictionary. Just separate the words with commas, e.g. list=apples,oranges. Do not put spaces between the words.

Longer custom dictionaries can be put into separate files under .idea/redpen directory. Once the file is there, you can use the dict property to specify its name, e.g. dict=mywords.txt
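As a rough illustration of how the two properties relate, the Python sketch below turns the contents of a dictionary file (one word per line) into the inline list form. The helper names load_dictionary and to_list_property are hypothetical and are not part of RedPen or the plugin.

```python
def load_dictionary(text: str) -> list:
    """Read dictionary file contents: one word per line, blank lines ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def to_list_property(words: list) -> str:
    """Build the inline form: comma-separated words without spaces in between."""
    return "list=" + ",".join(words)


# Contents of a hypothetical mywords.txt
file_contents = "apples\noranges\n"

inline = to_list_property(load_dictionary(file_contents))
```

For a handful of words, the inline form is convenient. For anything longer, keeping the file and pointing dict at it is easier to maintain.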

JavaScriptValidator is a special one: it allows you to write additional custom validators in JavaScript. By default, you can put such scripts in the .idea/redpen/js directory, or override the location using the script_path property. All custom validators from *.js files will be activated if JavaScriptValidator is enabled in Settings.

As a major step towards the popularization of RedPen, we created a WordPress plugin. It has never been easier to auto-detect mistakes in posts before they get published. The RedPen plugin validates posts as you type by marking validation errors in place in the Visual WordPress editor, or by highlighting them on click in the Text WordPress editor. An explanation of each mistake can be found by hovering over the marked text in the post, or in a list below, which shows all currently present mistakes.

To make it easier to maintain multilingual websites the language of the current post is detected automatically up to a variant (e.g. different Japanese symbol widths: zenkaku or hankaku). However, manual language change is also available if auto-detection fails (e.g. for cases when multiple languages are used in a single post).

Validation starts working out of the box after a simple installation from the WordPress Plugin Directory. Advanced users of RedPen will find it easy to customize the RedPen settings to match their needs: all validators and symbols can be configured via a convenient GUI. Don’t hesitate to modify the configuration. It can always be reset to the default with a single click.

The plugin is integrated with the RedPen Server via REST API. By default, the plugin uses the public RedPen installation at Heroku for validation. However, if you are uncomfortable sending your text for validation to an external server, you can easily configure the plugin to use your own instance of RedPen Server. The server location is configurable via Settings > Writing.

Pandoc correctly converted in-line formatting, and automatically provided anchors for all headings. It quickly enabled us to get basic AsciiDoc versions of the existing files.

However, although pandoc is a very useful and powerful tool, it encountered a few problems converting our RST files to AsciiDoc. This was not totally unexpected, since there are markup options that cannot be directly translated between reStructuredText and AsciiDoc. However, some of the problems encountered meant that a significant amount of text had to be reconverted by hand.

The issues we encountered were:

Missing tables

Several of the tables in the source documents were totally absent in the AsciiDoc files pandoc created. For example, this RST text:

Other considerations

Asciidoctor does not currently support the creation of a table of contents that spans multiple source documents. Given this limitation, we decided to combine all RedPen documents into a single HTML page. Although the page is larger, it is easier to navigate on some devices, and does not require any additional tools to build a surrounding multi-document index or menu.

Summary

Converting documents between different markup formats is reasonably straightforward, and tools such as pandoc are excellent choices for making headway quickly and easily. However, such tools do not always support all markup formats equally, and there may still be plenty of manual validation and editing to get your documents in good order.

Although pandoc v1.13.2 is not the current version, it was the version available at the time on Fedora 23. However, at the time of writing, the latest version available via http://pandoc.org/try/ still appears to produce the same translation as we encountered.

LaTeX support

In this release, we provided experimental support for LaTeX as an input format. Many people have requested LaTeX support, starting from the initial development of RedPen. Since the v0.6 release, we took one year to support LaTeX, and we finally succeeded.

Unfortunately, the LaTeX support is limited in the following ways:

The RedPen LaTeX parser does not work well when macros are used to add your own tags

It does not support a complete check of sentences in lists and tables

Although these are big constraints, we believe that the LaTeX support can still be used to inspect papers and documents.

Enhancement of functions (Validators)

In v1.4, we also concentrated on enhancing the functions (Validators). The added functions fall into three groups by language support: supported in both Japanese and English, supported in English only, and supported in Japanese only.

Functions supported in both Japanese and English

DoubleNegative In both Japanese and English, double negative statements are difficult to understand. If a double negative is present in the text, an error is output.

Functions supported in English only

FrequentSentenceStart When writing a document in English, many sentences tend to start with We. Even when there is no problem with the content, this looks bad, so it is good to rephrase such sentences promptly. Consider the following example.

We propose a novel method. We demonstrate the effectiveness of the method.

In the above example, We is used twice in a row. Without changing the meaning, we edit the sentences to avoid using the same subject repeatedly.

We propose a novel method. The effectiveness of the method is demonstrated in the experiments.

UnexpandedAcronym This function checks documents for the presence of acronyms and also for the original words that they represent.

WordFrequency If the word frequency within the document differs from the usual, an error is output.

Hyphenation If hyphen usage is not correct, an error is output.

NumberFormat If number formats differ from correct usage in English, an error is output.

ParenthesizedSentence This function inspects for usage of parentheses. If there are nested parentheses or more parentheses than specified, an error is output.

WeakExpression If the text has an ambiguous English expression, an error is output. For example, words such as completely and huge should be replaced with more accurate representations.
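The FrequentSentenceStart example above can be sketched in a few lines of Python. This is a deliberate simplification that only flags consecutive sentences starting with the same word. It is not RedPen's validator, and the helper name repeated_starts is hypothetical.

```python
import re


def repeated_starts(text: str) -> list:
    """Return sentences whose first word repeats the first word of the previous sentence."""
    # Naive sentence split: break after '.', '?' or '!' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]
    flagged = []
    for previous, current in zip(sentences, sentences[1:]):
        if previous.split()[0].lower() == current.split()[0].lower():
            flagged.append(current)
    return flagged


before = "We propose a novel method. We demonstrate the effectiveness of the method."
after = (
    "We propose a novel method. "
    "The effectiveness of the method is demonstrated in the experiments."
)

flagged_before = repeated_starts(before)  # the second sentence is flagged
flagged_after = repeated_starts(after)    # nothing is flagged
```

The original pair is flagged because both sentences start with We, while the edited pair passes.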

Functions supported in Japanese only

Okurigana If Japanese okurigana word endings are used incorrectly, an error is output.

DoubledJoshi If a particle is used more than once in a sentence, the sentence might be difficult to read.

Prospect of version 1.5

We will continue the development of RedPen with v1.5 and beyond. We have not yet set the priorities, but v1.5 is expected to include a mechanism for easily testing functions written in JavaScript.