The Majestic Documents: A Forensic Linguistic Report (Pt 1)

The term “Majestic documents” refers generally to thousands of pages of purportedly classified government documents said to prove the existence of a Top Secret group of scientists and military personnel, Majestic 12, formed in 1947 under President Harry Truman and charged with investigating crashed extraterrestrial spacecraft and their occupants. Majestic 12 personnel allegedly included a number of noteworthy political, scientific, and military figures, including: Rear Admiral Roscoe Hillenkoetter, the first CIA Director; Dr. Vannevar Bush, wartime director of the Office of Scientific Research and Development; James Forrestal, Secretary of the Navy and first Secretary of Defense; General Nathan Twining, head of Air Materiel Command at Wright-Patterson Air Force Base and later Chairman of the Joint Chiefs of Staff; and Dr. Donald Menzel, an astronomer at Harvard University. More specifically, the Majestic documents refer to a series of allegedly classified documents leaked from 1981 to the present day by unidentified sources concerning Majestic 12 and the United States government’s knowledge of intelligent extraterrestrials and their technology.1 The documents date from 1942 to 1999.

Due to the explosive nature of their content, the Majestic documents are considered by many to be the core evidence for a genuine extraterrestrial reality and alien visitation of planet Earth in the 20th century. United States government personnel have denied their authenticity, primarily on the basis of an opinion rendered by AFOSI, the U.S. Air Force counterintelligence office. The AFOSI report focused on certain features of the documents it considered historically anachronous and on other historical inconsistencies (see Section 1.2 below). The charges of the AFOSI have been coherently rebutted, and so validation and debunking efforts have resulted in a stalemate.

This impasse notwithstanding, other documents discovered before and after the alleged leaking of the Majestic documents appear to validate the existence of the group Majestic-12. In 1985, a document referring to a joint National Security Council (NSC) MJ-12 “Special Studies Project” group was discovered by Jaime Shandera in the National Archives.2 This document, a 1954 memorandum from Robert Cutler to General Nathan Twining, became known among UFO researchers as the Cutler-Twining memo. The Cutler-Twining memo shared certain stylistic traits with a 1953 memorandum between Cutler and Twining discovered in 1981 among General Twining’s papers at the Library of Congress. Canadian documents discovered in 1978, three years before the alleged leak of the first Majestic documents, note the existence of a highly-classified UFO study group operating within the Pentagon’s U.S. Research and Development Board and headed by Dr. Vannevar Bush. Although the name of the group is not given, these Canadian documents appear to support the existence of Majestic 12. While this may be the case, proof for the existence of Majestic 12 does not logically translate into authentication of the Majestic documents themselves or of their content on other points.

Previous Research on the Majestic Documents

The Majestic documents have undergone thorough forensic authentication with respect to non-linguistic issues and methods.3 The primary researchers who have put considerable effort into authenticating the documents are Stanton Friedman4 and the father-son team of Dr. Robert and Ryan Wood.5 These researchers have tested the documents in the following ways:6

1. Physical dating of the ink, pencil and paper.

2. Dating by matching the reproductive process (typography) of the typewriter, printer, copy machine, or mimeographic machine.

3. Dating by use of language of the period.

4. Watermarks and chemical composition of paper.

5. Comparison of handwriting.

6. Comparison with known events of record.

7. Comparison with known styles for government memoranda and correspondence.

8. Comparison with known or expected security procedures.

9. Logic of content.

10. Records of provenance.

11. Eyewitness testimony of individuals mentioned in documents.

The Wood team was able to solicit the expertise of specialists in their authentication effort. For comparison of typewriter impressions and watermarks, James Black served as their primary expert. Mr. Black is a Fellow of the Questioned Documents Section of the American Academy of Forensic Sciences and a former chairman of the Questioned Documents Subcommittee of the American Society of Testing and Materials.7 For examination of paper, ink, and watermarks, the Wood team sought the services of the Speckin Forensic Laboratories. The Speckin website states that the laboratory is:

A variety of concerns have been raised in the course of forensic authentication procedures and publication of these efforts, such as apparent anachronistic statements, possible typewriter impression inconsistencies, grammatical errors, departures from standard styles, printing flaws, and virtually identical signatures on different documents. Examples of each of these concerns have been catalogued and answered by the Wood team.9

To date, such criticisms of the Majestic documents have failed to deliver conclusive evidence of forgery. However, Stanton Friedman has successfully detected several fakes among the cache. The forgeries were photocopies of authentic documents with certain content and vocabulary changes designed to steer the content toward a discussion of Majestic 12. These forgeries are explained and illustrated on Friedman’s website.10 The presence of these forgeries does raise the spectre that all the Majestic documents may be contrived, especially since an estimated seventy percent of the documents are photocopies. However, it is important to note that no other fakes have been conclusively detected.

Notwithstanding the examinations noted above, the Majestic documents have never been subjected to scientific linguistic analysis to determine the validity of their authorship. While the Wood team and Mr. Friedman mention in several of the cited publications and websites that the Majestic documents have also undergone “linguistic” testing, the same publications and online sources offer no evidence of such testing. The Wood team and Mr. Friedman fail to define what they mean by terms like “linguistic testing” or “linguistic analysis,” and offer no proof that genuine forensic linguistic analysis of the type conducted for this paper ever took place as part of their authentication efforts. Additionally, while the Speckin Forensic Laboratories website mentions that the company does work in “computer forensics” (see above), the Woods offer no evidence in their writings or website that Speckin ever tested the Majestic documents in this way.

Only Stanton Friedman makes any attempt to describe an effort to have the Majestic documents tested linguistically and, as his description makes clear, no modern forensic computational linguistic work was actually done:

At the suggestion of attorney Bob Bletchman, I had obtained 27 examples of Hillenkoetter’s various writings from the Truman Library. Dr. Wescott reviewed these and the EBD [Eisenhower Briefing Document] and stated in an April 7, 1988, letter to Bob . . . “In my opinion there is no compelling reason to regard any of these communications as fraudulent or to believe that any of them were written by anyone other than Hillenkoetter himself. This statement holds for the controversial presidential briefing memorandum of November 18, 1952, as well as for the letters, both official and personal.”11

The above account contains no information on what Dr. Wescott (now deceased) did with the documents given to him. Several considerations suggest that Dr. Wescott likely did little more than look at the documents, rather than conducting actual tests. First, the development of the field of computational linguistics and the use of computers for natural language processing of necessity followed the development of computers and processing power. In 1988 these research methods were known, but not widely available. Second, Dr. Wescott’s areas of expertise included neither authorship attribution research nor computer forensic linguistics. Rather, the focus of Dr. Wescott’s work was anthropological linguistics.12 Despite his distinguished academic career, a search of linguistics databases produces no evidence that Dr. Wescott ever did any work in these areas. This is no doubt because his teaching career ended at roughly the time these fields were beginning to blossom.

These observations are significant, since training as a linguist, especially one who earned his Ph.D. in 1948, does not guarantee that one has any knowledge of a given subfield within one’s discipline. For example, what would a podiatrist know about heart surgery? A cardiologist about neuro-medicine? A defense attorney about patent law? A microbiologist about frogs? The answer to all would be very little: enough perhaps to converse with other nonspecialists, but not nearly enough to be considered competent by specialists. The point is that a doctoral degree in linguistics hardly guarantees any sort of expertise in a specific sub-discipline of linguistics, especially one that dovetailed with computer science. Dr. Wescott had perhaps used a computer by 1988, but his academic record gives no indication that he was either proficient in their use or involved in applying computers to language processing and authorship attribution. Consequently, he would be disqualified from having anything meaningful to contribute to any discussion of computational methods of authorship attribution.

It should also be noted that Dr. Wescott’s assessment lacks conviction. At best his amateur opinion in this sub-discipline of linguistics offers the conclusion that he has no basis to draw an actual conclusion. As UFO researcher Paul Kimball points out, Wescott himself made it clear that he had given no conclusive answer or endorsement to authenticity. In a letter to the International UFO Reporter, Wescott wrote: “I have no strong conviction favoring either rather polarized position in the matter . . . I wrote that I thought its [the EBD] fraudulence [was] unproved . . . I could equally well have maintained that its authenticity is unproved . . . inconclusiveness seems to me to be of its essence.”13

This is all that is offered in terms of linguistic testing and evidence for the Majestic documents. The thoroughness and care with which Friedman and the Woods have addressed other forensic issues are sorely lacking with respect to modern methods of linguistic analysis specifically designed to determine (or rule out) the authorship of documents. The absence of demonstrable testing data in any form of publication puts the burden of proof on these and other researchers to show that they have indeed subjected the Majestic documents to linguistic analysis.

Nature and Objectives of the Current Study

This study fills the existing research void created by the absence of strictly linguistic approaches to the problem of authenticating the Majestic documents. The goal of the research presented in this study was to determine whether the Majestic documents that carry a signature were indeed written by the people to whom authorship is attributed. Toward achieving this goal, the study employed state-of-the-art computational linguistic methods of authorship attribution. In some cases, these techniques have been pioneered by Dr. Carol Chaski, a recognized leader in this type of linguistic research.14 These methods have been employed, validated, and approved numerous times in various courts of law. It is the opinion of the authors that the utilization of these methods is the most reliable and testable means of authenticating or refuting the authorship attribution of those Majestic documents that bear the name of an author.

The focus of this study, as noted, is validation or falsification of the authorship attributions of the Majestic documents. As such, the scientific methods employed for this study cannot be used to validate the content of any of the Majestic documents whose authorship proves genuine. The computational methods of the research cannot determine the truth of written content. They can only determine whether or not that content was written by the attributed author. Refutation of attributed authorship would prove a document is a forgery, and so the content of that document would therefore be considered spurious. The converse is not true, however. Authentication of authorship means only that the document was written by the person to whom it is attributed. This suggests that the content is genuine, but does not actually prove that to be the case. Additionally, computational methods of authorship attribution lend nothing to the necessary enterprise of interpreting written content. The study should be characterized as preliminary because further testing that could be applied to the documents is currently cost prohibitive. As funding becomes available, other methods will be applied for redundancy and validation of the results presented in this paper.

The remainder of this paper details the application of computational linguistic methods to determine the authenticity of authorship attributions of the Majestic documents. The paper is divided into the following sections:

• Description of the Majestic documents included and excluded in the study.

• Overview of the linguistic testing methods used in the study.

• Explanation and interpretation of the test results.

• Overview of how these same methods have held up in courts of law.

• Suggestions for future linguistic research of the Majestic documents.

Authorship Attribution Study of the Majestic Documents

Source of the Majestic Documents for Testing

The Majestic documents tested were obtained online via www.majesticdocuments.com, the website repository for the Majestic documents maintained by Dr. Robert Wood and his son Ryan Wood. The Woods have had the Majestic documents posted free to the public for several years as part of their efforts to expose the public to this material.

Selection of the Majestic Documents for Testing

For authorship attribution testing to be undertaken, the document under question must have been attributed to some author. As such, only those documents among the Majestic documents that specifically bear the name of a signatory author were considered for testing. Famous Majestic documents such as the Eisenhower Briefing, for example, were not tested because the briefing makes no claim as to its author. Researchers and amateurs refer to the Eisenhower Briefing as though its authorship by Dwight D. Eisenhower were self-evident. The document itself makes clear that Eisenhower was not the author, as the very first page informs the reader that the briefing was “prepared for President-elect Dwight D. Eisenhower.” Another famous Majestic document not bearing an author name and therefore excluded from testing is the SOM1-01 manual for Extraterrestrial Entities and Technology, Recovery and Disposal. Additionally, the Einstein-Oppenheimer document could not be tested because it represents overlapping authorship.

Another criterion was applied to the list of documents that passed the initial litmus test of bearing an author name. The testing methods employed require that a document be more substantial than a couple of sentences, and so length was an issue. The need for length notwithstanding, a document of this brevity that met the third criterion below would have been included in testing due to the importance of its content. There was no instance, however, of a document of insufficient length being important enough in terms of content to warrant testing. An example of a document too brief for testing is “Malcolm Grow to Lt. Gen. Twining – Aero Medical Laboratory” (20 September 1947), which is a single sentence.

The third criterion was pragmatic, and driven in part by cost considerations. Of those documents that bore a signature and were of sufficient length (more than a sentence or two), preference for testing was given to those documents that contained specific reference to the existence of extraterrestrial biological entities (EBEs) or claims of an extraterrestrial origin for salvaged wreckage. Any document that appeared important for validating the extraterrestrial hypothesis (ETH) as an explanation for UFOs was included in the testing. For example, a document that mentioned the retrieval or transport of wreckage from Roswell or some other event famous for its connection to the UFO question may have been deferred for testing if there was nothing in the document that specifically pointed to the ETH or an EBE. The mere mention of “Roswell” or “Wright Patterson” would not be sufficient to mandate testing. In brief, there had to be something compelling about the document for it to merit testing.

Fourth, some of the Majestic documents could not be tested because they contained no prose text. An example is the document entitled “Majestic Twelve Project, Purpose and Table of Contents (Summer, 1952?).” This document is simply a table of contents. Even if a document of this nature had an attributed author, it could not be tested by linguistic means.

Lastly, documents that were clearly secondhand in nature were not chosen for testing. An example is the lengthy Bowen manuscript. While the Wood team labels this document as “high interest,” it is not written by a person who would be “in the know” with respect to the high levels of security needed to be a primary witness either to evidence for the ETH and EBEs or to discussions within Majestic Twelve. While it may be true that, as the Wood team states, “Bowen was personally connected to many top people,”15 it defies coherence to argue, on one hand, that Majestic-12 and its activities were so secret that evidence of its existence only became available in the 1980s, and on the other, to suggest that members of Majestic-12 were sharing the nation’s most highly classified secrets with an outsider like Mr. Bowen. The secondary nature of the Bowen manuscript is acknowledged by the Wood team, as they note its status as “a well written snapshot of the public history of flying saucers from 1947 to 1954.”16 The operative word in this comment is “public,” which reveals its peripheral importance in terms of content.

Preparation of the Majestic Documents for Testing

The Majestic documents tested by Dr. Chaski were typed and proofed by Dr. Michael S. Heiser, Amy C. Ward, and Joe E. (“Free”) Ward, of Roswell, NM. Only the prose content of the documents was typed out for testing, along with salutations and valedictions. Date formulas, stamps, handwritten annotations, military file numbers, memoranda headings, etc. were not typed out, since authorship attribution testing concerns the testing of written prose content for author-particular stylistics. Misspellings and grammatical errors in usage were preserved in the prose content reproduced for testing. Documents were saved as text (.txt) files.

The Majestic Documents Chosen for Authorial Verification

The following spreadsheet chart (Chart 1) contains the seventeen documents allegedly written by nine authors that were tested by Dr. Chaski. Unknown to Dr. Chaski, I included several documents previously demonstrated as fraudulent by Stanton Friedman (see Section 2.7). I did so to test Dr. Chaski’s analysis independently. The identity of these fraudulent documents is revealed below under the test results.

Chart 1


Documents of Verified Authorship Against Which the Majestic Documents Were Tested

Thirty documents whose composition by the nine authors to whom the Majestic documents were attributed is verified served as the data pool for computational stylistic comparison.17 The chart below (Chart 2) shows that these “known author” documents were selected with sensitivity to sameness of word and character count, genre, chronological era, and recipient. While the enterprise of authorship attribution by computational linguistic methods does not require sameness of subject matter for document comparison, several of the “known author” documents contained similar subject matter (e.g., space technology). In some instances, the “known author” document references an event in one of the unverified documents (e.g., the 1942 Los Angeles sighting).

Chart 2

Overview of the Linguistic Testing Methods Used in the Study

The material in this section draws heavily upon the peer-reviewed article by Dr. Chaski.18

Dr. Chaski explains that, when it comes to document attribution in the legal world, methods for determining authorship “must work in conjunction with the standard investigative and forensic techniques which are currently available.”19 Determining authorship of a typewritten document, whether originally or subsequently put into electronic form, can be approached three ways: “. . . biometric analysis of the computer user; qualitative analysis of ‘idiosyncrasies’ in the language in questioned and known documents; and quantitative, computational stylometric analysis of the language in questioned and known documents.”20

With respect to the Majestic documents, the first method is not possible, since there is no way to analyze actual keystroke pattern dynamics. This method is technically non-linguistic. The second method “assesses errors and ‘idiosyncrasies’ based on the examiner’s experience.”21 This method also has the disadvantage of requiring the pre-existence of a stylistic database against which to measure presumed idiosyncrasies. Chaski elaborates:

This approach, known as forensic stylistics, could be quantified through databasing, as suggested by McMenamin (2001), but at this time the databases which would be required have not been fully developed. Without the databases to ground the significance of stylistic features, the examiner’s intuition about the significance of a stylistic feature can lead to methodological subjectivity and bias. Another approach to quantifying is counting particular errors or idiosyncrasies and inputting this into a statistical classification procedure. When the forensic stylistics approach was quantified in this way by Koppel and Schler (2000), using 100 “stylemarkers” in a Support Vector Machine (Vapnik 1995) and C4.5 (Quinlan 1993) analysis, the highest accuracy for author attribution was 72%.22

The third approach, stylometry, “is quantitative and computational, focusing on readily computable and countable language features, e.g. word length, phrase length, sentence length, vocabulary frequency, distribution of words of different lengths.”23 Stylometric analysis also may include analysis of function word frequency and punctuation.24
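The kinds of countable features named above can be illustrated with a short Python sketch. It is offered only as an illustration of generic stylometric counting and is not the method used in this study. The function name `stylometric_profile`, the tiny `FUNCTION_WORDS` list, and the crude sentence splitting are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately small function-word list chosen for illustration.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "that", "it", "is")


def stylometric_profile(text):
    """Return a few simple, readily countable style features for one document."""
    # Crude sentence split on terminal punctuation (an assumption; abbreviations
    # such as "Dr." will be miscounted as sentence boundaries).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        raise ValueError("text must contain at least one word")

    counts = Counter(words)
    profile = {
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "mean_sentence_length": len(words) / len(sentences),
        "commas_per_word": text.count(",") / len(words),
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        profile["freq_" + fw] = counts[fw] / len(words)
    return profile
```

Profiles computed this way for a questioned document and for documents of known authorship could then be compared feature by feature. As Chaski notes below, such features are easy to compute but linguistically shallow.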

As one of the leaders in the development of authorship attribution techniques that meet legal standards for evidence, Dr. Chaski has developed “a computational, stylometric method which has obtained 95% accuracy and has been successfully used in investigating and adjudicating several crimes involving digital evidence.”25 Chaski elaborates on her method (ALIAS26):

[My] syntactic analysis method (Chaski 1997, 2001, 2004) has obtained an accuracy rate of 95%. The primary difference between the syntactic analysis method and other computational stylometric methods is the syntactic method’s linguistic sophistication and foundation in linguistic theory. Typical stylometric features such as word length and sentence length are easy to compute even if not very interesting in terms of linguistic theory, but the more difficult to compute features such as phrasal type are also more theoretically grounded in linguistic science and experimental psycholinguistics.27

As noted above (Sec. 1.3), with respect to the Majestic documents, Dr. Chaski’s testing was not as thorough as it could have been due to expense. Variations on the capabilities of ALIAS were employed to test the Majestic documents. The testing is therefore referred to as preliminary in this paper. Future testing will allow a full exploitation of the capabilities of ALIAS.

Specifically, the method employed in this initial round of testing by Dr. Chaski was an “n-gram” approach. N-gram approaches involve pattern detection of a specific number (n) of parts-of-speech labels or words in sequence. Once these sequences are found, they can be sorted by similarity.28 (Chaski, “Keyboard,” 5). In regard to her own pioneering techniques in the field, which were used for testing the Majestic documents, Dr. Chaski noted:

“N-gram approaches for author identification have been very successful on large documents, approaching 98% accuracy verified. I wanted to make sure that an n-gram approach would also work on short documents. Another problem is that some n-gram approaches are very biased toward document length, so the wordier person always gets selected as the author. I was able to fix both problems and get ~90% accuracy on short documents with verbose known authors not being favored over concise known authors or vice versa. The exact details are proprietary, as this is a real advance in the field.”29
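Because the details of Dr. Chaski’s technique are proprietary, they cannot be reproduced here. The general idea of an n-gram comparison can nevertheless be sketched in Python under stated assumptions. The sketch uses word n-grams rather than parts-of-speech labels, and it converts counts to relative frequencies before comparing them by cosine similarity, which is one common and simple way of reducing sensitivity to document length. It is not the way ALIAS handles that problem. The function names `ngram_profile`, `cosine_similarity`, and `rank_candidates` are hypothetical.

```python
import math
import re
from collections import Counter


def ngram_profile(text, n=2):
    """Map each word n-gram in the text to its relative frequency."""
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(grams.values())
    # Relative frequencies (an assumption of this sketch) so that a longer
    # document does not score higher merely for containing more n-grams.
    return {g: c / total for g, c in grams.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine of the angle between two sparse frequency profiles."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def rank_candidates(questioned, known_by_author, n=2):
    """Sort candidate authors by similarity to the questioned text, best first."""
    target = ngram_profile(questioned, n)
    scores = {
        author: cosine_similarity(target, ngram_profile(text, n))
        for author, text in known_by_author.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Here `rank_candidates` would take the text of a questioned document and a dictionary mapping each candidate author to that author’s verified writing. A ranking of this kind only orders the candidates supplied to it. It cannot by itself show that none of them wrote the questioned document.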

One final word on the testing enterprise is necessary. It is acknowledged that many of the Majestic documents were not handwritten or even typed by the author to whom they are attributed. The typical practice, especially for presidents, would be to dictate the content of correspondence to a secretary who would type and reproduce the content. This reality is not at odds with Dr. Chaski’s testing methods, since dictated and self-typed memoranda and correspondence are not produced by distinct psycholinguistic processes. In other words, there is no significant linguistic difference between dictating a letter as one would desire it to be written and typing those thoughts oneself.

– end part I –

(pt II pending)

* Special Thanks To Dr. Michael S. Heiser

1). See the chronological listing of the reception of the Majestic documents reconstructed by Dr. Robert Wood and Ryan Wood, http://www.majesticdocuments.com/sources.php, accessed June 5, 2007. A table summary of the circumstances of the source and provenance of each document can be found in Dr. Robert Wood, “Mounting Evidence for the Authenticity of MJ-12 Documents,” paper presented at the International MUFON Symposium, Irvine, CA; July 21, 2001, 5. Accessed at 209.132.68.98/pdf/rmwood_mufon2001.pdf on June 5, 2007.

2). Shandera was one of the early recipients of the Majestic documents.

4). Stanton Friedman’s website biography reads in part: “Stanton Friedman received the BSc and MSc degrees in physics from the University of Chicago in 1955 and 1956. He was employed for 14 years as a nuclear physicist for such companies as GE, GM, Westinghouse, TRW Systems, Aerojet General Nucleonics, and McDonnell Douglas on such advanced, classified, eventually cancelled, projects as nuclear aircraft, fission and fusion rockets, and nuclear powerplants for space.” Accessed at www.v-j-enterprises.com/sfbio.html on June 5, 2007.

5). Dr. Robert Wood holds a B.S. in Aeronautical Engineering from the University of Colorado and a Ph.D. in Physics from Cornell University. He spent 43 years in research and development with Douglas Aircraft and McDonnell Douglas before retiring in 1993. Ryan Wood holds a B.S. in Mathematics and Computer Science from California Polytechnic State University at San Luis Obispo. He has held various positions in marketing, consulting, and sales for Intel Corporation, Digital Equipment, and Toshiba.

13). International UFO Reporter, vol. 13, no. 4, July / August 1988, p. 19. Cited by Paul Kimball, “MJ-12 – The Wescott ‘Analysis’ Red Herring,” The Other Side of the Truth, July 14, 2005, accessed at redstarfilms.blogspot.com/2005_07_01_archive.html on June 6, 2007.

14). Dr. Chaski holds an M.A. and Ph.D. in linguistics from Brown University. Computational linguistics is one of her specialties, and her work in this field has been recognized and validated through peer review, numerous legal cases, and scientific grant funding. See www.linguisticevidence.org/FLCV.htm.

17). Numbers and names of these documents were invented by Dr. Heiser as a means of categorization. The “known author” documents in the spreadsheet above were drawn from a larger number of possible documents.