Abstract

Wikipedia is expanding its horizons, and every day millions of new users engage with it. Many of the people who contribute, or who would like to, are not comfortable with the wikitext source editor. VisualEditor gives them a WYSIWYG interface, and adding language proofing to it should reduce the grammatical and spelling errors that often go unnoticed through several rounds of edits. So far VisualEditor has no integrated tool for language proofing. Some Wikipedia communities have isolated bots or gadgets, but there is no single, uniform implementation. From a developer's perspective, I believe this integration will significantly improve the VisualEditor user experience.

Design Details

I plan to add a button to the VisualEditor toolbar. Once a user has finished editing the document, they can click the button to scan for possible grammatical or spelling errors.
Here are a couple of mockups that illustrate the idea.

Implementation Details

Approach
I will listen for changes to the document.
Once the user presses the 'check' button, all the text is collected and sent to the LanguageTool server running locally.
Once the response from the LanguageTool server is received, the text is annotated accordingly.
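
The collect, query and annotate steps above can be sketched roughly as follows. This is only an illustration in Python, not VisualEditor code: the `Match` class and both helper functions are hypothetical, and the form fields (`language`, `text`) and XML attributes (`offset`, `errorlength`, `msg`, `replacements` separated by `#`) are assumptions about the LanguageTool XML interface that should be checked against the server version actually deployed. No network call is made; a hand-written sample reply stands in for the server.

```python
from dataclasses import dataclass
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


@dataclass
class Match:
    offset: int          # start of the flagged span in the submitted text
    length: int          # number of characters flagged
    message: str         # explanation returned by the checker
    replacements: list   # suggested corrections, possibly empty


def build_query(text: str, language: str) -> str:
    """Form-encode the collected text for a POST to the proofing server."""
    # Field names are assumptions, not confirmed by the proposal.
    return urlencode({"language": language, "text": text})


def parse_response(xml_text: str) -> list:
    """Turn the XML reply into Match objects, ordered by offset."""
    matches = []
    for err in ET.fromstring(xml_text).iter("error"):
        raw = err.get("replacements", "")
        matches.append(Match(
            offset=int(err.get("offset", "0")),
            length=int(err.get("errorlength", "0")),
            message=err.get("msg", ""),
            replacements=[r for r in raw.split("#") if r],
        ))
    return sorted(matches, key=lambda m: m.offset)


# Hand-written sample reply standing in for the server (assumed format).
SAMPLE = """<matches>
  <error offset="10" errorlength="4" msg="Possible spelling mistake"
         replacements="test#text" />
</matches>"""

text = "This is a tset sentence."
query = build_query(text, "en-US")
found = parse_response(SAMPLE)
flagged = text[found[0].offset:found[0].offset + found[0].length]
```

Keeping the offsets relative to the exact string that was submitted is what would let the editor map each match back onto the document when annotating.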

Project Architecture

The LanguageTool server will run as a backend service, similar to Parsoid. The data model (DM) is the layer above it, with the user interface (UI) and ContentEditable (CE) layers on top. Any text rendered in CE will be sent to the LanguageTool server for proofing, and the server's response is passed back to the editor.

ve.ui : Toolbars and Inspectors (User Interface)

ve.ce : Rendering, selection and Input (Content Editable)

ve.dm : Linear model and Transaction System (Data Model)

Development Plan

Phase 1 :

Set up basic infrastructure (this shouldn’t take much time, as I have already set up both LanguageTool and MediaWiki with VisualEditor successfully)

Set up the LanguageTool server inside MediaWiki

Phase 2 :

Support within ve.ce to extract text

Support within ve.ce to query LT server

Phase 3 :

Support within ve.ce to process the response from LT server to annotate text

Add a toolbar button to VisualEditor to turn language proofing on or off.

Parts that might require extensive work :

Annotating text according to the response from LT server

UI integration

Once the integration is functional, an optimized algorithm for sending text to the LT server will be required. Sending data on every update would be very expensive and unnecessary.

Extensive testing to see that all the supported languages function smoothly
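
One possible way to avoid sending the whole document on every update, mentioned in the list above, is to check only paragraphs whose text has not been seen before. The sketch below is a minimal illustration in Python under that assumption; `ParagraphCache` and its methods are hypothetical names, and the proposal does not commit to this particular strategy.

```python
import hashlib


class ParagraphCache:
    """Remembers which paragraphs were already checked (hypothetical helper)."""

    def __init__(self):
        self._results = {}  # digest of paragraph text -> cached matches

    @staticmethod
    def _digest(paragraph: str) -> str:
        return hashlib.sha1(paragraph.encode("utf-8")).hexdigest()

    def pending(self, paragraphs):
        """Return paragraphs that have no cached result, without duplicates."""
        seen = set()
        todo = []
        for p in paragraphs:
            d = self._digest(p)
            if d not in self._results and d not in seen:
                seen.add(d)
                todo.append(p)
        return todo

    def store(self, paragraph, matches):
        """Record the server's matches for a paragraph."""
        self._results[self._digest(paragraph)] = matches

    def lookup(self, paragraph):
        """Return cached matches, or None if the paragraph was never checked."""
        return self._results.get(self._digest(paragraph))


cache = ParagraphCache()

doc_v1 = ["First paragraph.", "Second paragraf."]
first_round = cache.pending(doc_v1)      # everything is new
for p in first_round:
    cache.store(p, [])                   # placeholder for the server's matches

doc_v2 = ["First paragraph.", "Second paragraph."]
second_round = cache.pending(doc_v2)     # only the edited paragraph
```

With this approach an edit to one paragraph costs one small request, while unchanged paragraphs reuse their earlier results.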

Components/modules the proposed work will modify or create
I am going to work on three aspects:

Setting up the LanguageTool server inside MediaWiki and integrating it with ve.ce

Integration with the UI

Testing

Tentative Project Timeline

May 25 - June 3

Implement LanguageTool server in MediaWiki

June 4 - June 14

Extraction of text nodes from ve.ce

June 15 - June 25

Querying LT server

June 26 - July 10

Processing of response from LT server

July 11 - July 31

Annotation of the text flagged in the XML response processed above

August 1 - August 10

Adding toolbar button to the Editor View.

August 11 - August 21

Extensive testing, documentation and code cleanup.

Deliverables at mid-term evaluation

Partially integrated language proofing: VisualEditor will be able to query the LanguageTool server and receive its XML response.

Tests for all the modules implemented so far.

Final Deliverables

VisualEditor with integrated LanguageTool. The toolbar will have an additional button for proofing. When the button is clicked, grammatical mistakes will be highlighted in green and spelling mistakes in red. When the user clicks on a highlighted word, they are shown a list of alternatives and example uses of the word or phrase.

Documentation of the code, detailing all the steps taken and the changes made to the original codebase.

Tests for the new feature, so that it can be tested across a large number of wikis in the supported languages.
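
The highlighting behaviour described in the first deliverable could look roughly like the sketch below. It is written in Python for illustration only; the `Highlight` class and both functions are hypothetical, and telling spelling from grammar issues by an issue type of `"misspelling"` is an assumption about what the LanguageTool response exposes.

```python
from dataclasses import dataclass, field

SPELLING_COLOR = "red"    # spelling mistakes, as stated in the deliverable
GRAMMAR_COLOR = "green"   # grammatical mistakes, as stated in the deliverable


@dataclass
class Highlight:
    start: int                 # first character of the highlighted span
    end: int                   # one past the last character
    color: str
    suggestions: list = field(default_factory=list)


def to_highlight(offset, length, issue_type, replacements, max_suggestions=5):
    """Map one checker match to a coloured span with a short suggestion list."""
    # Assumption: the checker labels spelling issues as "misspelling".
    color = SPELLING_COLOR if issue_type == "misspelling" else GRAMMAR_COLOR
    return Highlight(offset, offset + length, color,
                     list(replacements[:max_suggestions]))


def highlight_at(highlights, cursor):
    """Return the highlight under a click position, or None."""
    for h in highlights:
        if h.start <= cursor < h.end:
            return h
    return None


text = "She go to the libary."
highlights = [
    to_highlight(4, 2, "grammar", ["goes"]),
    to_highlight(14, 6, "misspelling", ["library"]),
]
clicked = highlight_at(highlights, 16)   # a click inside "libary"
clicked_word = text[clicked.start:clicked.end]
```

Capping the number of suggestions keeps the pop-up list short; the limit of five used here is arbitrary.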

Work done so far

I have MediaWiki and VisualEditor set up both locally and on Vagrant. I contributed some documentation for setting up MediaWiki on Vagrant from behind a proxy.
I have been tinkering with VisualEditor to get familiar with the codebase. I submitted a patch to fix the empty transclusion box problem.
I have successfully set up LanguageTool as a network service on my system. I used it to add language proofing support to a locally hosted website. I also added support for Hindi to LanguageTool. This was done as a proof of concept to determine whether language proofing can be extended to new languages with ease.

Skills relevant for the project :
I have elementary experience with JavaScript and PHP, which are the languages mostly used in VisualEditor. I am very comfortable with Java, which is what LanguageTool is based on.
Apart from this, I am a quick learner and can easily adapt to new languages and frameworks.

Please describe any relevant projects that you have worked on previously and what knowledge you gained from working on them :

I have some past experience working with Wikidata. One relevant project was building a search engine over 40 GB of Wikipedia data, with efficient indexing of documents and retrieval of results for search queries. This project gave me a fairly detailed picture of how text is structured in Wikipedia pages, which will come in handy when extracting text for language proofing.
I also worked on a project that detected subtopics related to an entity in tweets. As part of this project I became fairly familiar with text parsing and with tools such as the TagMe API and Lucene. This knowledge will help me find my way around LanguageTool.

About Me

Education : Computer Science and Engineering undergraduate student at International Institute of Information Technology
Commitments : As of now I have no prior commitments between 25th May and 25th August. I intend to contribute about 35 to 40 hours per week to the project.
What drives me to do this?
I have been using open source software for quite some time now, but I am a beginner when it comes to contributing to open source. I am a Linux enthusiast and I love scripting little tasks that make my life easier on my Linux box.
I chose the Wikimedia Foundation as an organization because I strongly believe that knowledge should be free for all to use. I am also very interested in products that promote social engagement. I wouldn’t exactly call myself a grammar nazi, but tiny grammatical mistakes do annoy me a little. So I feel that by doing this, I would be doing my bit towards promoting education and knowledge for all.