Artifact Content

Artifact b6ddd8d5a8c53fe080ff80fa0c6a4dc5c93e9917:

D 2014-10-12T04:25:19.625
L AutoupdateRegex
N text/x-markdown
P c24bbc1b08d86e26959b70fc6a3a9413cee9cd5d
U mario
W 5495
<img src="http://freshcode.club/img/drchangelog.png" align=right height=150 width=150 alt=birdy>
The [Autoupdate](wiki/Autoupdate) "regex" module is the most versatile option for collecting release information from project pages. Besides **RegExp** matching (for text sources), it now also supports **XPath** and **jQuery**-style selections, which ease scraping of HTML project websites.
See also <a href="http://freshcode.club/drchangelog">Dr. Changelog</a> to try it out.
### Field Rules
It can be configured in the *Autoupdate Rules/Regex* project field, where it expects a list of `key = ...` entries. Each key can list a URL and one or more RegExp, XPath or jQuery expressions.
version = http://example.com/download.html
version = /(\d+\.\d+(\.\d+)+)/
changes = http://example.com/news.html
changes = $("#main .release div.current")
changes = /Summary:\s*(.+?)\R\R/smix
scope = ~((minor|major) (bugfix|cleanup|security))~
state = ~(stable|beta|prerelease)~i
download = $("a.download").attr("href")
It does not update general project descriptions; it only sets `version=` and `changes=`, and optionally `scope=`, `state=` and `download=`.
* URLs should precede the extraction expressions.
* For regex rules the first capture group `(..)` will be used as result.
* All regex flags `/Umixus` are allowed, and a special `/*` match-all flag is provided.
* Use line breaks to separate rule assignments. Comments in between will effectively be ignored.
* XPath expressions for example take the form `changes = (//ul)[1]/li`
* jQuery-style selectors can chain multiple selector functions, as in `$("div").find("#first")`.
* Field/key names may be prefixed with `$` or `%` as in `$version = /([\d.]+)/`.
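
To make the rule format above more concrete, here is a minimal Python sketch of how such a field could be split into ordered entries. It is not the module's actual parser: the names `RULE_LINE`, `classify` and `parse_rules` are hypothetical, and the way expression types are told apart is only a guess based on the examples on this page.

```python
import re

# One rule per line: optional `$` or `%` prefix, key name, `=`, expression.
RULE_LINE = re.compile(r"^\s*[$%]?(\w+)\s*=\s*(\S.*?)\s*$")

def classify(expr):
    """Guess the expression type; a heuristic, not the module's real logic."""
    if re.match(r"https?://", expr):
        return "url"
    if expr.startswith("$("):
        return "jquery"
    if expr.startswith(("//", "(")):
        return "xpath"
    # Regex: same delimiter at both ends, optionally followed by flags or `*`.
    if re.match(r"^([/~#]).*\1[A-Za-z*]*$", expr, re.S):
        return "regex"
    return "unknown"

def parse_rules(text):
    """Return ordered (key, kind, expression) tuples; other lines are skipped."""
    rules = []
    for line in text.splitlines():
        m = RULE_LINE.match(line)
        if m:
            rules.append((m.group(1), classify(m.group(2)), m.group(2)))
    return rules
```

Keeping the entries in order matters, because a URL entry is expected to come before the expressions that work on it.
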
### URL sources
Initially the primary *Autoupdate URL* is used as source for extraction. It's equivalent to listing a URL for `version =`. Each subsequent field extraction will reuse the most recently retrieved page. Like-named URL entries in *[Other URLs](wiki/Other+URLs)* will also be recognized.
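
The page-reuse behaviour described above can be pictured with a small Python sketch. It only tracks which URL each extraction rule would read from and does no fetching; `sources_for_rules` is a hypothetical helper, and the lookup of like-named *Other URLs* entries is left out.

```python
def sources_for_rules(rules, primary_url):
    """Return (key, expression, source_url) for each extraction rule, in order.

    `rules` is an ordered list of (key, expression) pairs.
    """
    current = primary_url          # the Autoupdate URL is the initial source
    resolved = []
    for key, expr in rules:
        if expr.startswith(("http://", "https://")):
            current = expr         # a URL entry switches the active page
        else:
            resolved.append((key, expr, current))
    return resolved
```

With the example rules from *Field Rules*, the `scope`, `state` and `download` expressions would all read from the `news.html` page, since that is the last URL listed before them.
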
### Regex multi-match /* flag
There's a special regex flag `/*` for a `preg_match_all` mode. It's used by the listing for the Linux kernel (which is a git log) for instance:
changes = /^Date:.+\R\R\s+(.+)\s+[ ]commit/m*
Here multiple occurrences will be found and merged into a changelog list. (So it's somewhat like the `/g` flag in JavaScript.)
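
As a rough illustration of the match-all mode, the following Python sketch collects the first capture group of every match and joins the results. It uses a simplified pattern rather than the kernel rule above, the git log sample is invented, and `match_all` is a hypothetical name; the real module relies on PHP's `preg_match_all`.

```python
import re

def match_all(pattern, text, flags=0):
    """Collect the first capture group of every match, trimmed."""
    return [m.group(1).strip() for m in re.finditer(pattern, text, flags)]

# Invented sample in the shape of `git log` output.
GIT_LOG = """commit 1111111
Author: A <a@example.com>
Date:   Mon Oct 6 10:00:00 2014 +0200

    Fix buffer overflow in parser

commit 2222222
Author: B <b@example.com>
Date:   Sun Oct 5 09:00:00 2014 +0200

    Add --verbose option
"""

# Simplified rule: the first indented line after each Date: header.
subjects = match_all(r"^Date:.+\n\n[ \t]+(.+)", GIT_LOG, re.M)
changelog = "\n".join(subjects)
```
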
### Slicing
Often it's simpler to narrow down the extraction area step by step. Repeating `key=/regex/` specifiers is useful for that:
changes = /Changelog(.+?)\Z/s
changes = /(.+)---/
It's sometimes sensible to apply an XPath/jQuery extraction first and a regex afterwards to cut out the actual result:
version = $("article h4")
version = ~Version ([\d.]+)~
Matching rules thus iteratively isolate the field to be populated.
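
The iterative narrowing can be sketched in Python as a loop that feeds each rule's first capture group into the next rule. This is an illustration under assumptions: `slice_field` is a hypothetical helper, only regex rules are modelled, and Python's `re` stands in for PCRE.

```python
import re

def slice_field(text, patterns):
    """Apply each (pattern, flags) pair in turn on the previous result.

    Returns None as soon as one rule fails to match.
    """
    for pattern, flags in patterns:
        m = re.search(pattern, text, flags)
        if m is None:
            return None
        text = m.group(1)
    return text

# The two `changes` rules from above, applied to an invented page text.
page = "Intro\nChangelog\n* Fixed login bug --- see tracker\nOlder notes\n"
changes = slice_field(page, [(r"Changelog(.+?)\Z", re.S), (r"(.+)---", 0)])
```

Each step only sees what the previous step left over, so later rules can stay short and simple.
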
### jQuery-style selector chaining
Often it suffices to call the main `$()` CSS selector function. One could again use multiple slicing rules, but many jQuery-style subfunctions can be chained in one line:
changes = $(".article .first").next().find("li")
XPath and jQuery rule assignments can only be single-line directives (unlike RegExps with the `/x` flag, which can wrap across line breaks).
### References
See [regular-expressions.info](http://www.regular-expressions.info/) for a simple RegExp introduction. Otherwise check out [jQ & CSS selectors](http://standardista.com/jquery/) and the [w3.org spec](http://www.w3.org/TR/CSS2/selector.html) or [jQuery pseudo selectors](http://api.jquery.com/category/selectors/) for CSS selectors. And the [XPath / Selenium cheat sheet](https://www.simple-talk.com/dotnet/.net-framework/xpath,-css,-dom-and-selenium-the-rosetta-stone/) or an [XPath/Regex overview](http://xpath.alephzarro.com/content/cheatsheet.html) for XPath examples.
### Examples Regex
If you use semantic versioning, you can keep the `\d+\.\d+\.\d+` pattern in the `version=` field. To also allow `-beta` or `-dev.2` suffixes:
version = /(\d+\.\d+(\.\d+)+(-\w+(?:\.\w+)*)*)/
You can of course precede this regex with more concrete context matches. If for example you were to use meta data comments:
version = ~ ^\h* [/#*]+ \h*version:\h* (\d+(?:\.\d+)+[-.\w]+) ~mix
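
To try the two version patterns outside of freshcode, here is a small Python check. It is only an approximation: the version pattern is written with balanced parentheses, PCRE's `\h` is replaced by `[ \t]` because Python's `re` lacks it, and the sample strings and the `first_group` helper are made up.

```python
import re

# Version with optional -beta / -dev.2 style suffix.
SEMVER = re.compile(r"(\d+\.\d+(\.\d+)+(-\w+(?:\.\w+)*)*)")

# Meta data comment such as ` * Version: 1.4.2-rc1`, using the m, i, x flags.
META = re.compile(
    r"^[ \t]* [/#*]+ [ \t]*version:[ \t]* (\d+(?:\.\d+)+[-.\w]+)",
    re.M | re.I | re.X,
)

def first_group(regex, text):
    """Return the first capture group, mirroring how regex rules pick results."""
    m = regex.search(text)
    return m.group(1) if m else None

release = first_group(SEMVER, "Release 1.2.3-dev.2 is out")
header = first_group(META, "/**\n * Version: 1.4.2-rc1\n */\n")
```

Note that the suffix part is greedy: applied to a file name like `foo-1.2.3-beta.tar.gz` it also swallows `.tar.gz`, so a preceding context match or a slicing rule helps there.
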
Extracting a Changelog summary is more difficult. If you want to eschew manual release submissions on *freshcode.club* you may wish to adopt a coherent README or CHANGELOG scheme.
For example I use a `history\n------\n` marker in the README, where it's easy to match the pre-summarized changes:
changes = /history\R-----+\R+[\d.]+\R(.+?)\R\R/s
The `\R` is a linebreak placeholder (all CR, LF, CRLF variants), and `\R\R` hence an empty line.
Incidentally, for the `changes` field any `-`, `#` or `*` at the start of lines gets stripped.
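
The `history` rule and the list-marker stripping can be approximated in Python as follows. Python's `re` has no `\R`, so a stand-in group is used; the README text is invented, `extract_changes` is a hypothetical name, and the exact stripping done by freshcode may differ from this guess.

```python
import re

# Stand-in for PCRE's \R; a lone \r only counts when not part of \r\n.
NL = r"(?:\r\n|\n|\r(?!\n))"

# Equivalent of /history\R-----+\R+[\d.]+\R(.+?)\R\R/s
HISTORY = re.compile(
    "history" + NL + "-----+" + NL + "+" + r"[\d.]+" + NL + "(.+?)" + NL + NL,
    re.S,
)

def extract_changes(readme):
    """Return the newest changelog entry with leading list markers removed."""
    m = HISTORY.search(readme)
    if m is None:
        return None
    # Drop `-`, `#` or `*` markers (and surrounding blanks) at line starts.
    return re.sub(r"^[ \t]*[-#*]+[ \t]*", "", m.group(1), flags=re.M)

README = (
    "mytool\n======\n\n"
    "history\n-------\n"
    "1.3.0\n- Added XPath support\n* Fixed crash on empty pages\n\n"
    "1.2.0\n- Initial release\n"
)
latest = extract_changes(README)
```
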
You still ought to keep the changelog in an end-user approachable writing style.
### Hidden releases
If you can't uncover a suitable source for `$changes=` then your automated release submission will be classified as *hidden*. Thus the project entry will stay current, but no frontpage listing (or notification) will occur.
The regex module will also likely be rate limited, so it won't rescan your website daily.
### interval= rule
All Autoupdate modules additionally support the `interval = 7` rule; the number specifies a minimum number of days before any new release lookup is attempted.
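
In effect the interval acts as a simple time gate. A minimal Python sketch, assuming the last lookup time is stored per project (the function name `lookup_due` and the exact comparison are assumptions, not taken from the freshcode source):

```python
from datetime import datetime, timedelta

def lookup_due(last_lookup, interval_days, now):
    """True once at least `interval_days` days have passed since the last lookup."""
    return now - last_lookup >= timedelta(days=interval_days)

# With `interval = 7`, a project checked on Oct 5 is skipped on Oct 11.
last = datetime(2014, 10, 5, 4, 0)
skipped = lookup_due(last, 7, datetime(2014, 10, 11, 4, 0))
due = lookup_due(last, 7, datetime(2014, 10, 12, 4, 0))
```
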
Z e6343c9428edbadb68a87b3f95ff013e