Technical description

Algorithm

Selection

Set2: All the elements from Set1 with an "href" attribute that does not contain a fragment (i.e. that does not contain the hash sign (#))

Set3: All the elements from Set2 that have a proper extension (no parameters, and a path after the domain that contains a "." character)

Set4 : All the <form> tags (form)
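The selection step can be sketched as follows. This is only an illustration using the Python standard library, not the actual implementation of the rule. Set1 is not defined in this section, so the sketch assumes it is the set of <a> tags of the page. The names LinkAndFormCollector, has_proper_extension and select are hypothetical, and reading "a path that contains a '.' character" as "the last path segment contains a '.'" is an interpretation.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import posixpath


class LinkAndFormCollector(HTMLParser):
    """Collects the href of every <a> tag and counts the <form> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []  # assumed Set1, restricted to links carrying an href
        self.forms = 0   # size of Set4

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.hrefs.append(attributes["href"])
        elif tag == "form":
            self.forms += 1


def has_proper_extension(href):
    """True when the href has no parameters and its last path segment has a '.'."""
    parsed = urlparse(href)
    if parsed.query or parsed.params:
        return False
    return "." in posixpath.basename(parsed.path)


def select(html):
    """Returns (Set2, Set3, size of Set4) for the given HTML source."""
    collector = LinkAndFormCollector()
    collector.feed(html)
    set2 = [h for h in collector.hrefs if "#" not in h]   # no fragment
    set3 = [h for h in set2 if has_proper_extension(h)]   # proper extension
    return set2, set3, collector.forms
```

For instance, on a page with the links "#top", "/doc/report.pdf", "/page?id=1" and "/about", Set2 keeps the last three and Set3 keeps only "/doc/report.pdf".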

Process

Test1

For each element of Set3, we check whether the content of the "href" attribute of the link ends with an extension that belongs to the office document extension list

For each element returning true in Test1, raise Message1

Test2

If Test1 returns false, we check whether the size of Set2 equals the size of Set3. In other words, we verify that all the links of the page have a well-defined extension.

If Test2 returns false (some links on the page have no extension), raise Message2.

Test3

If Test2 returns true (all the links have a well-defined extension, none of which belongs to the office document extension list since Test1 returned false), we check whether Set4 is empty (i.e. whether the page contains no form that may lead to a downloadable document).

If Test3 returns false (some forms are found on the page), raise Message3.
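The chaining of the three tests can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: run_tests and extension_of are hypothetical names, "Test1 returns false" is read as "no element of Set3 matched", the extension comparison is assumed to be case-insensitive, and the office document extension list is passed in as a parameter because its content is not reproduced here.

```python
import posixpath
from urllib.parse import urlparse


def extension_of(href):
    """Returns the lower-cased extension of the href path, without the dot."""
    path = urlparse(href).path
    return posixpath.splitext(path)[1].lstrip(".").lower()


def run_tests(set2, set3, form_count, office_extensions):
    """Returns the list of (message code, href or None) raised by the process."""
    messages = []

    # Test1: one Message1 per link whose extension is in the office list.
    office_links = [h for h in set3 if extension_of(h) in office_extensions]
    for href in office_links:
        messages.append(("OfficeDocumentDetected", href))
    if office_links:
        return messages

    # Test2: every link without fragment must have a well-defined extension.
    if len(set2) != len(set3):
        messages.append(("CheckManuallyLinkWithoutExtension_Aw22-13071", None))
        return messages

    # Test3: the page must not contain any form.
    if form_count > 0:
        messages.append(("CheckDownloadableDocumentFromForm_Aw22-13071", None))
    return messages
```

Calling run_tests(["/a.html", "/b"], ["/a.html"], 0, {"odt"}) stops at Test2, since one link has no extension, and raises only Message2. The set {"odt"} is an illustrative value, not the real list.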

Message1: Office Document Detected

code : OfficeDocumentDetected

status: NMI

parameter : href attribute, title attribute, snippet

present in source : yes

Message2: Check manually links without extension

code : CheckManuallyLinkWithoutExtension_Aw22-13071

status: NMI

present in source : no

Message3: Check downloadable document from form

code : CheckDownloadableDocumentFromForm_Aw22-13071

status: NMI

present in source : no

Analysis

NA

Set2 is empty (the page has no link other than anchors)

Test3 returns true (all the links of the page have a well-defined extension AND none of these extensions is of office document type AND the page has no form)

NMI

In all other cases
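The mapping from the test outcomes to the final result can be sketched as follows. The function name analyse and its parameters are hypothetical, and "Test3 returns true" is taken to imply that Test1 returned false and Test2 returned true, as described in the Process section.

```python
def analyse(set2_size, set3_size, form_count, office_link_count):
    """Returns "NA" or "NMI" from the set sizes and the Test1 match count."""
    # First NA case: the page has no link other than anchors.
    if set2_size == 0:
        return "NA"

    # Second NA case: Test3 returns true, which requires that
    # Test1 found no office document and Test2 found no link without extension.
    test1_false = office_link_count == 0
    test2_true = set2_size == set3_size
    test3_true = test1_false and test2_true and form_count == 0
    if test3_true:
        return "NA"

    # In all other cases, a manual check is needed.
    return "NMI"
```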

Notes

We assume that a targeted document (pointed to by the "href" attribute of the link) can be characterized by its extension.

Here is the content of the office document extension list (feel free to help us improve it or to criticise it):