A comparison test

We thank the software publishers – Compilatio and Urkund – for their willingness to participate.

Please let us know if you use another program and would like to share your experiences with all our newsletter subscribers.

Compilatio

Urkund

General description of the program

Compilatio is a French program.

“In 2005, teachers in France told the management of Six Degrés, a company specializing in web design, what they needed in order to check for plagiarism. The developers and teachers considered the possible options together. Frédéric Agnès, one of the two partners of Six Degrés, then decided to take on the project. The first version of Compilatio was released in 2008. In 2009, the team working on Compilatio created a new company with the same name, which was integrated into the holding company Six Degrés.” (Source: http://cursus.edu/article/17669/comment-utiliser-logiciel-anti-plagiat/#.VrS2tlLz8cs).

Urkund is a Swedish program.

“URKUND is owned and developed by PrioInfo AB. PrioInfo is a company with over 25 years’ experience of the requirements and needs of information-intensive organisations. URKUND originated in the academic world. A team of teachers developed the idea of a web-based service that would help them detect and deter plagiarism and URKUND was born in the autumn of 2000… URKUND continued to grow and develop over the years and came to be recognised as Sweden’s foremost anti plagiarism service.” (Source: http://www.urkund.com/en/about-urkund/272-about-urkund).

Ergonomics

Compilatio

Pros

Intuitive and easy to use

Cons

Offers fewer features than Urkund (for example, it does not give simultaneous access to the other sources in which the same similarities were found).

Urkund

Pros

Allows users to access many features on a single page.

Cons

Less intuitive than Compilatio, because its layout is more complex and sophisticated.

Display of similarities

Compilatio

Pros

Both the analyzed text and the source text appear in their entirety, so teachers reviewing them can spot similarities that the program did not recognize and therefore did not highlight in a different color. Paraphrases and more sophisticated plagiarism are also easier to detect because the two texts appear side by side.

Words in bold (and red) indicate 100% similarities.

Cons

Not all the sentence segments that show verbatim plagiarism appear in a different color, so the analysis takes more time. In addition, some words that appear in a different color are not in fact verbatim plagiarism.

Urkund

Pros

For a single sentence, the display gives simultaneous access to the other sources in which the same similarities were detected. “URKUND always shows the best resource in the first layer, closest to the text, but also reports on up to five other sources. These other sources are considered alternative and are shown in the left margin” (“URKUND Administrator Guide”).

Cons

The analyzed document appears in its entirety, but only the matching passages from the source appear next to it: the source side shows only the portion of the source that also appears in the document being examined, not the full text of the source.

It is therefore not possible to see passages that the program may have missed and that might be paraphrases: the user must click on the source link to see the original article. The review therefore takes longer, especially since the two texts are not displayed side by side in this case.

The review is also harder because Urkund converts all text to Verdana and removes formatting, which makes it more difficult, for example, to spot chapter titles: “During the analysis process, any italics, underlining, and bold has been removed. The font has been replaced by Verdana, to facilitate the on-screen review. All images and tables that cannot be converted to text are also removed” (“URKUND Administrator Guide”).

Percentages of similarity

The document submitted for analysis is broken down into “parts” that can be quite numerous depending on the length of the text.

Compilatio provides an overall percentage of similarities for the entirety of the text and also a percentage per “part.”

The percentages are in relation to the document analyzed: 28% similarity, for example, means that 28% of the text contained in the document submitted for analysis was recognized as being similar to the sources.

Each source is also given its own percentage: X% for a source means that X% of the analyzed document is similar to that source.

The total similarity for an analyzed document combines the similarities found for each source, with sections of the text that are similar to several sources counted only once. Users can ignore sources they do not want to take into account; those sources are then left out of the calculation of the percentage of similarity. (The user just has to check the box next to the source in question and then click the “ignore” button.)

All the other sources, whether they are “very likely” or “somewhat likely,” are taken into account in calculating the percentage of similarity.

As a result, the user will obtain:

– a percentage of similarity for each part

– a percentage of similarity for the document in its entirety
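To make this counting rule concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Compilatio's actual algorithm: the function name `similarity_percentages` and the idea of recording each match as a range of word positions in the analyzed document are hypothetical. It shows why the overall percentage can be lower than the sum of the per-source percentages (overlapping passages count once) and how ignoring a source lowers the total.

```python
# Hypothetical sketch: NOT Compilatio's real implementation.
# Each match is (source, start, end): a half-open range of word positions
# in the analyzed document that was found to be similar to that source.

def similarity_percentages(doc_length, matches, ignored=frozenset()):
    """Return (overall %, {source: %}) for a document of doc_length words."""
    if doc_length <= 0:
        raise ValueError("doc_length must be positive")
    covered = set()     # positions matched by at least one non-ignored source
    per_source = {}     # source -> positions matched by that source
    for source, start, end in matches:
        if source in ignored:  # sources the user chose to "ignore" are skipped
            continue
        positions = set(range(start, end))
        per_source.setdefault(source, set()).update(positions)
        covered |= positions   # a word shared by several sources counts once
    overall = 100 * len(covered) / doc_length
    by_source = {s: 100 * len(p) / doc_length for s, p in per_source.items()}
    return overall, by_source


matches = [("A", 0, 20), ("B", 10, 30), ("C", 50, 58)]
overall, by_source = similarity_percentages(100, matches)
print(overall)     # 38.0
print(by_source)   # {'A': 20.0, 'B': 20.0, 'C': 8.0}
print(similarity_percentages(100, matches, ignored={"C"})[0])  # 30.0
```

In this example, sources A and B overlap on ten words, so the overall figure (38%) is lower than the sum of the three per-source figures (48%); ignoring source C brings it down to 30%.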

Pros

– The selections made to refine the analysis (for example, whether or not to remove sources) are kept when the program is closed. They can, however, be modified at any time with a single click.

– Passages in quotation marks can easily be excluded from the calculation of the percentages: The user just has to select the option to exclude text in quotation marks from the percentage of similarity.

Urkund provides:

1) An overall percentage of similarity: 12% similarity, for example, in a 700-page text means that 12% of the document submitted for analysis is identical to the sources found by Urkund

2) A percentage for each source where the program found similarities

3) Within a source, a percentage for each excerpt of text where similarities were detected

In this last case, the percentage “is the degree of similarity in detail that the text share[s] with the excerpt from the source. This value helps in detailing the review process:

“100% means that the text is identical with the excerpt from the source.

“50% means, simply put, that half of the words in the text in some way [differ] from the excerpt from the source.

“Similarities below 20% are normally not highlighted.”
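The guide does not explain how this per-excerpt value is computed. As a rough, hypothetical illustration only (not Urkund's method), the Python sketch below scores a passage against a source excerpt using the standard library's `difflib.SequenceMatcher` on word lists, and applies the 20% highlighting threshold mentioned in the guide. The function names `excerpt_similarity` and `is_highlighted` are invented for the example.

```python
from difflib import SequenceMatcher

# Hypothetical sketch, NOT Urkund's algorithm: word-level similarity between
# a passage of the examined document and an excerpt from a source.
HIGHLIGHT_THRESHOLD = 20  # "Similarities below 20% are normally not highlighted"

def excerpt_similarity(text, excerpt):
    """Return a 0-100 score; 100 means the two word sequences are identical."""
    words_text = text.lower().split()      # punctuation is not stripped here
    words_excerpt = excerpt.lower().split()
    ratio = SequenceMatcher(None, words_text, words_excerpt).ratio()
    return round(100 * ratio)

def is_highlighted(text, excerpt):
    """Apply the assumed threshold below which a match is not highlighted."""
    return excerpt_similarity(text, excerpt) >= HIGHLIGHT_THRESHOLD


print(excerpt_similarity("The results vary widely", "the results vary widely"))     # 100
print(excerpt_similarity("the results vary widely", "the results differ greatly"))  # 50
print(is_highlighted("plagiarism is common", "students rarely cite"))               # False
```

With two four-word passages that share only their first two words, the score is 50, which is in line with the guide's description of 50% as roughly half of the words differing.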

“If a highlight is considered correct or irrelevant it is easily deactivated… An inactive highlight turns grey and eventually disappears” (“URKUND Administrator Guide”). The overall highlighting will be modified as a result.

Pros

The overall percentage can be refined by ignoring the segments of text detected as being similar within a single source.

Cons

– Although the results of the selections made (whether or not to remove certain passages) can be sent by e-mail, these selections disappear when the program is closed.

Tip: saving the report's web link (URL) will help you return to your previous selections.

– Passages in quotation marks can be shown in a different color if the user wishes, but if a single passage contains both verbatim copying and properly quoted text, the quoted sections cannot be excluded from the calculation of the percentages.

Limits of percentages: general comment

Except in cases of verbatim plagiarism, the percentages do not reveal the extent of the plagiarism. They are only leads for a subsequent (and necessary) analysis, since the programs cannot detect paraphrases or sophisticated forms of verbatim copying, nor any graphics, images or other non-textual data.


Note, however, that Urkund shows the words that differ between the two texts where it detected similarities. (See “Particularities” below.)

Detection of attempts to manipulate the text so that the program does not recognize the similarities

Compilatio can detect attempts at manipulation, which are indicated with a pictogram.

According to Compilatio’s customer support, “New features have been implemented to counter attempts to get around our program, including the detection of non-analyzable text (a pictogram of a triangle with an exclamation point, meaning that a portion of the document may have been modified so that a source would not be detected).

“New search functions will also be implemented in our software in 2016, including significantly improved functions for detecting reformulations.”

URKUND can detect attempts at manipulation, which are flagged as “Warnings.”

The warnings also flag the manipulation of spaces (for example, the addition of extra blank spaces). “We are also testing a new function” that would display what is in parentheses in the analyzed texts.

Limits of plagiarism detection

Compilatio

– Does not take translations into account

– Does not recognize tables / graphics / images

– Not all sources are accessible (for example, if the person running the analysis chooses to remove a document from the “reference library,” “this action corresponds to erasing it entirely from your database and from that of Compilatio.net”) (Source: Magister by compilatio.net).

– Documents that are not available for free cannot be accessed.

Urkund

– Does not take translations into account

– Does not recognize tables / graphics / images

– Not all of the sources submitted to Urkund for analysis are accessible (for example, if the user or student chose the “anonymous” option in order to respect copyright). The “auto-delete” option also allows the analyzed text to be deleted in its entirety.

– Documents that are not available for free cannot be accessed. However, Urkund is creating many partnerships (with scientific journals, encyclopedias, etc.) in order to expand its database.

Analysis reports

Compilatio offers “3 levels of precision in your report:

1. The report’s ‘summary’ tab: an overview of your document, with the top sources (the main sources found) and the corresponding similar passages. You can access the website directly by clicking on the source.

2. The ‘whole text’ tab: your document in its entirety with the similarities found.

3. The report’s ‘sources’ tab: all sources similar to your document, ranked by percentage and by degree of relevance.”

Compilatio allows you to “decipher the categories of sources:

– The ‘very likely’ sources: a list of the sources that can be the most easily copied by the student (the most common sites) and where the program detected an abnormally high rate of similarity.

– The ‘somewhat likely’ sources: a list of the sources that can be somewhat easily copied by the student and where the program detected some suspicious similarities.

– The ‘accidental’ sources: a list of the sources where the program detected a very low rate of similarity with the student’s document.” (Source: Compilatio Magister “Aide au démarrage” (Help getting started))

Urkund's report presents the text of the analyzed document, with the similarities shown in color and the source references placed at the exact point in the text where Urkund detected them. The corresponding percentages are also indicated.

Particularities

In Compilatio, certain sources are indicated as belonging “to another user”: these are either sources submitted by authors who chose to remain anonymous or “external sources,” that is, sources from a Compilatio user outside your university.

To preserve the requested anonymity, the data are encrypted, but Compilatio still shows the portions of similar text. This display is invaluable in cases of extensive plagiarism.

This is all the more true because the document itself can be obtained through Compilatio.

These are the steps to follow:

– send certain information to Compilatio (name of the account / name of the file / name of the document / source in question)

– once Compilatio has passed your contact information on to the author, wait for them to agree to send you the source in question and to contact you.

In Urkund, when two similar passages appear opposite each other, you can view the differences between the two texts in detail. These could be, for example, words missing from one of the two texts, differences in tense, or the use of synonyms.

“When [Show detailed text differences] is On these will be displayed on the source side in the form of colored boxes around the words that differ from the examined document.”

For example, if “[t]here is a word in the examined document that is not in the source,” the colored box is empty.

“There are one or more sentences in the examined document that is not present in the source.

“There are one or more words in the source that is not present in the examined document.

“There is a word in the source that is also in the examined document, but in another form[, i.e., a] synonym, a changed [tense], misspelled or similar.” For example, “In some cases” becomes “in some circumstances.”

(“URKUND Administrator Guide”)

Language analysis

Compilatio can analyze all documents written in the Latin alphabet, in any language.

URKUND can analyze documents in all languages that use the Latin alphabet and offers “the possibility of analyzing Arabic, Mandarin and Hebrew, among others.” (Source: Urkund customer support.)

Database

The response from Compilatio customer support

“Our service includes a three-level comparison:

– the freely accessible Internet

– documents submitted in your university

– documents submitted by all Compilatio users (while respecting the confidentiality of the documents).

We can add the archives of papers from previous years or document collections that you send us. Users can also enhance their own ‘reference libraries’ with any documents at their disposal, at any time.”

The response from Urkund customer support

“1. All sources available on the Internet, 45 billion Internet sites.

2. Documents already received by URKUND, in its archives, approximately 17 million documents (2016-02-15).

3. Publications accessible in the databases of our partners. 4,000 sources of information, a database of more than 1,000,000 newspapers.”

Storage

The response from Compilatio customer support

“There is no restriction on the number of documents whose content is in the ‘reference library.’ The storage limit for the original files of the documents analyzed by the users depends on the service chosen.

You can analyze as many documents as you would like, with no restrictions, in the context of an individual user in a normal university setting.”

The response from Urkund customer support

“In order to use URKUND, the university or school must have a license and a contract with us. With that license, they can analyze as many documents as they wish and they can have as many users (professors) as they wish. We do not have limits per year or per student. Even if they use URKUND’s web inbox, they can store an unlimited number of documents. There is no limit on the size of the documents that can be analyzed.

Document confidentiality and intellectual property:

URKUND allows you to ensure the confidentiality of certain documents by deleting them completely, so that they can no longer be accessed or shared externally. If you choose to keep the content of confidential student papers, URKUND will not give copies of the documents to others. At the end of the contract, URKUND may return all of the data to the university and destroy all of the files in its storage.”

Training available

The response from Compilatio customer support

“Compilatio provides institutions with different types of assistance:

– training the user who will be the point person for other members of the institution

– setting up a ‘plagiarism prevention’ action plan for the institution, including:

  – formulating the regulatory framework

  – setting up a program to train and educate teachers and students

  – procedures for monitoring and handling situations

  – communicating with teachers/students regarding all activities undertaken

As a user, you will have guides available in your account that will help you when you use the service.

We also provide on-site training.

Technical support

Users of the service can access technical support from their Magister account (by submitting a form).

Our team will respond within 48 hours (working days).”

The response from Urkund customer support

“Urkund makes downloadable manuals available to users on its website. They are in English, except for the ‘quick start’ guide, which is available in French.” However, we were told that:

“If you write to us in French, your message will be forwarded to someone who can respond in French.

Telephone support is available in English and in French.

Our ‘Help Desk’ service can assist customers in French by e-mail or telephone.

The department will respond quickly, in less than 24 hours (Monday to Friday).”