Description

Brief summary

MediaWiki's Linter extension surfaces markup issues so that editors can fix them. Parsoid identifies these issues while parsing wikitext, and each issue is classified into a Linter category.

At the very minimum, this project involves two pieces of work: (a) writing code in Parsoid to detect links nested inside other links, which are not semantically meaningful and cannot be rendered in HTML, and (b) writing code in the PHP Linter extension to add this new category.

Example wikitext with this markup error:

[http://google.com This is [[Google]]'s search page]

In the above example, [[Google]] is a wikilink inside the link text (This is [[Google]]'s search page) of the external link to http://google.com. This is invalid and should be flagged by the Linter code in Parsoid.

Skills required

Both Node.js and PHP skills would be ideal, but either one alone would be good. Familiarity with wikitext and/or DOM manipulation would be a bonus, but is not required for this project; you will pick up the necessary skills as you go.

Is this specifically about link syntax in external links, or in any links? If the latter, keep in mind that descriptions in file embeds may contain links, e.g. [[File:Example.png|an [[example]] image]] or [[File:Example.png|an [https://en.wikipedia.org example] image]]; the linter category should not pick these up.

This is about invalid HTML output rather than about wikitext syntax as such. Image captions can contain links because the captions aren't embedded in other links. Wikitext syntax does make this confusing, though: in some cases you can embed links in link syntax, and in other cases you cannot.
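To make the distinction above concrete, here is a minimal sketch of a check done on the output tree rather than on wikitext: flag any a element that has another a element as an ancestor. It is written in Python purely for illustration (Parsoid itself is not Python). The function name find_nested_links and the sample markup are hypothetical and are not Parsoid's actual DOM output or API. The fragments are parsed as XML so that the nested anchors survive parsing, which an HTML parser would not guarantee.

```python
import xml.etree.ElementTree as ET


def find_nested_links(xml_fragment):
    """Return the href of every <a> that sits inside another <a>.

    Hypothetical helper: it only illustrates the "link inside a link"
    condition on a tree, not how Parsoid implements its lint passes.
    """
    root = ET.fromstring(xml_fragment)
    nested = []

    def walk(node, inside_link):
        is_link = node.tag == "a"
        if is_link and inside_link:
            nested.append(node.get("href"))
        for child in node:
            walk(child, inside_link or is_link)

    walk(root, False)
    return nested


# Assumed shape of the tree for the external-link example: a link in a link.
EXTLINK_CASE = (
    '<p><a href="http://google.com">This is '
    '<a href="./Google">Google</a>\'s search page</a></p>'
)

# Assumed shape for an image with a caption: the caption link is a sibling
# of the image link, not a descendant of it, so nothing is flagged.
CAPTION_CASE = (
    '<figure><a href="./File:Example.png"><img src="Example.png"/></a>'
    '<figcaption>an <a href="./Example">example</a> image</figcaption></figure>'
)

if __name__ == "__main__":
    print(find_nested_links(EXTLINK_CASE))
    print(find_nested_links(CAPTION_CASE))
```

Under these assumptions the external-link case reports the inner ./Google link, while the caption case reports nothing, which matches the point that captions are not embedded in other links.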

The wikitext preprocessor does not currently "see" [ and ]. So any [... [[ ... ]] ... ] construct is an invalid link, but the preprocessor can't tell that currently. That is the specific case which https://gerrit.wikimedia.org/r/396049 would help with.
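As a rough illustration of the [... [[ ... ]] ... ] case, the following Python sketch scans a string for a single-bracket external link whose text contains a [[ ... ]] wikilink. This is not how the preprocessor or Parsoid tokenizes wikitext. The function name, the protocol list (http, https, protocol-relative) and the rule that an external link ends at the first unmatched ] on the same line are all simplifying assumptions.

```python
import re

# A "[" that is not part of "[[" and is followed by a URL-like prefix.
# The protocol list is an assumption; real wikitext accepts more protocols.
EXTLINK_START = re.compile(r"(?<!\[)\[(?=(?:https?:)?//)")


def find_wikilinks_in_extlinks(wikitext):
    """Return (extlink_source, [inner_wikilinks]) for each offending link."""
    hits = []
    for match in EXTLINK_START.finditer(wikitext):
        i = match.end()
        inner = []
        closed = False
        # Assumption: an external link does not span lines.
        while i < len(wikitext) and wikitext[i] != "\n":
            if wikitext.startswith("[[", i):
                end = wikitext.find("]]", i + 2)
                if end == -1:
                    break  # unterminated wikilink, give up on this candidate
                inner.append(wikitext[i:end + 2])
                i = end + 2
            elif wikitext[i] == "]":
                closed = True
                break
            else:
                i += 1
        if closed and inner:
            hits.append((wikitext[match.start():i + 1], inner))
    return hits


if __name__ == "__main__":
    sample = "[http://google.com This is [[Google]]'s search page]"
    print(find_wikilinks_in_extlinks(sample))
```

Because the scan only starts at a single [ followed by a URL, file embeds such as [[File:Example.png|an [[example]] image]] are not reported by this sketch.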

Agreed with @Dinoguy1000 and @ssastry that [[ ... [[ ... ]] ... ]] is a little more subtle, since some of those are valid. But we should try to make the behavior consistent for the invalid cases, instead of emitting broken HTML and letting tidy fix it up arbitrarily. The "wikitext way" is probably to emit literal [[ characters in the output for the inner link, which will make it obvious to editors that there's a problem that needs to be fixed.

This message is for all candidates interested in working on this project for Outreachy. Before you start working on this project, please make sure you've filled out an initial application so that Outreachy organizers can verify whether you are eligible to participate in the program: https://www.outreachy.org/eligibility/. It should take only 5 to 30 minutes to complete.