Wegman report update, part 2: GMU dissertation review

Several posts in past months have highlighted highly questionable scholarship in the 2006 Wegman report on the “hockey stick” temperature reconstruction (and revelations of much more will come soon, with the imminent release of John Mashey’s massive analysis). Today I present yet another analysis of background material of “striking similarity” to antecedents, this time found in a trio of dissertations by recent George Mason University PhD students under the supervision of Edward Wegman.

Wegman Report co-author Yasmin Said’s 2005 dissertation on the “ecology” of alcohol consumption appears to presage some of the questionable scholarship techniques employed in the Wegman Report. And later dissertations from two other Wegman protégés, Walid Sharabati (2008) and Hadi Rezazad (2009), both have extensive passages that closely follow the Wegman Report’s social networks background section, which in turn is based on unattributed material from Wikipedia and two widely used textbooks. Thus, as in the case of Donald Rapp, there appears to be serial propagation of unattributed, “strikingly similar” material. Astonishingly, all three Wegman acolytes were honored with an annual GMU award for outstanding dissertations in statistics and computational science. However, a closer look betrays not only scholarship problems in the work, but a clear failure in the PhD supervision process itself.

It may also be that some heat is being felt behind the scenes. For one thing, Said’s 2005 dissertation was recently deleted from the George Mason University website. And around the same time, most traces of Said’s eye-opening presentation on the Wegman panel process [PDF] were also deliberately removed. That appears to be a clumsy attempt to cover up embarrassing details about the U.S. House Energy and Commerce Committee 2005-2006 climate investigation, including the key role of Republican staffer Peter Spencer, Representative “Smoky” Joe Barton’s longtime point man on climate change issues. (These disappearances were pointed out to me by the ever-vigilant John Mashey.)

Before diving into the details of Wegman’s protégés, here is a statistical summary of the material in section 2 of the Wegman report, found to be “strikingly similar” to various unattributed sources.

The section column has links to each relevant post, along with links to the detailed side-by-side comparisons. The other columns list total word count (WC), along with percentage of identical words (ID), identical plus trivially changed words (ID+TC), and overall percentage of “strikingly similar” material (SS). The final column gives the average block length (BL), i.e. average length of all exactly identical phrases.
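As an aside, word-level statistics of this sort can be computed mechanically. Here is a minimal Python sketch (my own illustration, not the tool actually used for these comparisons) that measures the identical-word percentage and the average matching-block length between a suspect passage and its putative source:

```python
from difflib import SequenceMatcher

def similarity_stats(source: str, suspect: str):
    """Word-level comparison of a suspect text against its putative source.

    Returns the percentage of the suspect's words that fall inside
    exactly matching word blocks, plus the average matching-block length.
    """
    src_words = source.lower().split()
    sus_words = suspect.lower().split()
    matcher = SequenceMatcher(None, src_words, sus_words, autojunk=False)
    # get_matching_blocks() ends with a zero-size sentinel; drop it.
    blocks = [b for b in matcher.get_matching_blocks() if b.size > 0]
    identical = sum(b.size for b in blocks)
    pct_identical = 100.0 * identical / len(sus_words)
    avg_block_len = identical / len(blocks) if blocks else 0.0
    return pct_identical, avg_block_len

# Illustrative pair loosely modeled on the examples discussed below.
pct, bl = similarity_stats(
    "the germination of barley called malting is the first step in brewing beer",
    "the germination of barley is the first step in producing alcohol",
)
```

A real analysis would also need the “trivially changed” category (synonym swaps and word-order shuffles), which requires fuzzier matching than this exact-block approach.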

Yasmin Said: In the beginning

Yasmin Said’s 2005 dissertation, Agent–Based Simulation of Ecological Alcohol Systems, seeks to “establish a modeling framework for alcohol abuse that allows evaluation of interventions meant to reduce adverse effects of alcohol overuse without the financial, social and other costs of imposing interventions that are ultimately ineffective (or even simply not cost effective)” [Cached Google doc version].

I’d skimmed the dissertation and had the impression of a less than fully-developed concept with little practical application. As there was no apparent connection to the Wegman report, I gave it no further consideration.

Previously Prof. D. Climate found that many strikingly similar passages in the Wegman Report are in the background sections. These passages are characterized by minor alterations to large blocks of text copied from other sources together with insufficient attribution. With this in mind, I had a quick look at Said’s Ph.D. thesis, “AGENT-BASED SIMULATION OF ECOLOGICAL ALCOHOL SYSTEMS” and observed a similar pattern of strikingly similar text.

“Terry” went on to note that Said’s background section on alcohol “follows both the structure and phrasing” of the web page Chemical of the Week: Ethanol by University of Wisconsin professor Bassam Shakhashiri “extremely closely”.

Notice how many of the same words have been used, but have now been split up into single words or two-word phrases. This reworking has even introduced an obvious error as the revised sentence has classified potatoes and corn as forms of a mysterious group called “wheat plants”.

Shakhashiri concludes that same paragraph:

Thus, the germination of barley, called malting, is the first step in brewing beer from starchy plants, such as corn and wheat.

Said has slightly reworked this:

… the germination of barley, which is therefore required to be the first step in producing alcohol from starchy plants (Petrucci, 2001; Shakhashiri, 2005).

Here the rephrasing has introduced yet another error. Of course, there are other methods of producing alcohol from starchy plants. The germination of barley is used in the production of beer, but not in the production of vodka from potatoes. And this paragraph ends with a strange double citation, a pattern repeated for some of the paragraphs in this section, although most paragraphs have no citation at all.

Here are a couple more examples, both of which appear in paragraphs without any citation.

Said: Impairment of brain functions for most people begin to become noticed at around a blood alcohol percentage of 0.05 …

Interestingly, Said decided to remove the one explicit reference to alcohol acting as a drug, even though this fact would appear to be at the center of the motivation for research, and is discussed at length in the abstract.

The final example comes in the penultimate sentence (once again, this final paragraph is unattributed).

Here, the order of the identical wording has been preserved. But the number of trivial changes (no less than six!) is truly astonishing:

Above -> at a level of

of the brain -> the brain’s

or -> and

beating action -> pumping

be -> become

death -> their impediment

And what can one say about that last change? The reference to the mortal consequences of alcohol overdose has been transformed into “impediment” of vital organs. It’s doubtful that this should be construed as extreme scholarly understatement; rather, it appears to be yet another example of error introduced by the poorly conceived changes. In any case, it’s a howler for the ages (once again, hat tip to John Mashey).

The GMU Writing Center weighs in

Overall, nearly 50% of the wording in Said is identical to the Shakhashiri source, with another 20% involving trivial changes of the sort highlighted in yellow above, and the rest consisting of expanded verbiage that adds little of substance.

First, for paraphrasing, a good idea is to read the original, make sure that you understand it, lay it aside, and then write it down in your own words imagining that you are explaining it to someone who will read your paper. If you are having trouble putting it into your own words, then you probably don’t understand it well enough to write about it. When you are finished, cite the author according to the style you are using.

Always remember, borrowing (both language and syntax) too heavily from a source, even if you cite it, is plagiarism. A good thing to keep in mind is to use no more than two of the author’s original words.

Clearly, Said did not follow any of this sage advice. Here is a telling example given by the GMU Writing Center. First, here is the original:

The park [Caspers Wilderness Park] was closed to minors in 1992 after the family of a girl severely mauled there in 1986 won a suit against the county. The award of $2.1 million for the mountain lion attack on Laura Small, who was 5 at the time, was later reduced to $1.5 million. – Reyes and Messina, “More Warning Signs,” p. B1.

And now here is one attempted example at paraphrase that falls short, with identical and trivially changed wording marked in the same manner as I have done above:

Reyes and Messina report that Caspers Wilderness Park was closed to children in 1992 after the family of a girl brutally mauled there in 1986 sued the county. The family was ultimately awarded $1.5 million for the mountain lion assault on Laura Small, who was 5 at the time (B1).

This example has a proper citation and is rather short. Yet it still is a clear example of unacceptable paraphrasing rising to plagiarism.

In Said’s case, the entire section of five pages consists of a similar sort of supposed paraphrasing strung together from one source. The source is cited at the end of some of the paragraphs, but not the majority of them.

Moreover, Said has made no effort to distill, as it were, her chosen source into a summary of material actually relevant to her own study, choosing instead to indulge in what “terry” rightly called a “minimal rewrite” of the entire original article.

All of this eerily presages the Wegman Report background sections, where, for example, section 2.3 on social networks contained much copied unattributed material that was clearly irrelevant (or at least not adduced in any way) in Wegman et al’s use of SNA to characterize co-author relationships in paleoclimatology.

Nevertheless, the possibility exists that Said (or even Wegman himself) does not understand that this kind of “scholarship” is entirely unacceptable. That would be very sad indeed.

Walid Sharabati joins the team

At the time of the Wegman report, Walid Sharabati was an up-and-coming Wegman protégé; after receiving his PhD he went on to a position at Purdue University. Although he was not involved in the report itself, Wegman tapped him to produce an appendix to Wegman’s supplementary congressional testimony, a written response to supplementary questions from Rep. Bart Stupak [PDF 2.7 Mb].

In the original report, Wegman et al had claimed that “hockey stick” author Michael Mann’s wide social network of co-authors, including several connected co-author “cliques”, implied that these same co-authors were reviewing each others’ work, to the detriment of the quality of paleoclimatology research. Wegman’s response to Stupak further claimed that Wegman’s own social network of co-authors (many of whom were ex-students) exhibited a so-called “mentor-scholar” style of co-authorship that was much less prone to this problem than Mann’s “entrepreneurial” style.

The claim seems absurd on its face, as Mann’s seminal (if controversial) 1998 and 1999 papers were written before he had ever collaborated with most of the co-authors in his “social network”.

Of all the work that has been done on social networks, very few investigators have considered coauthorship network. Therefore, what we are about to observe in this paper is a brand new approach in the social networks field.

Eventually, this highly speculative and weak social network analysis of co-author relationships was developed into an article and submitted to the journal Computational Statistics and Data Analysis in mid-2007. Lead author Said was joined by Wegman, Sharabati, and yet another Wegman protégé, John Rigsby, who had performed the original analysis of Mann’s co-author network. Social Networks of Author–Coauthor Relationships was published in early 2008.

Exploratory Social Network Analysis with Pajek by W. de Nooy, A. Mrvar and V. Batagelj (2005)

Not only that, but the background material was largely irrelevant to the article’s analysis, and the analysis no less fatuous and unconvincing than the original found in Wegman et al.

Yet despite these problems, not to mention the veritable paucity of citations, the article sailed through peer review in six days! Perhaps Wegman’s presence on the journal’s advisory board and Said’s previous tenure as an editor had something to do with the lightning acceptance. In any event, as John Mashey has pointed out, the article and its history stand out as an excellent example of a self-refuting paper.

In fact, I have identified only two small changes in the actual text of Sharabati’s section 1.1, compared to the earlier Said et al.

First, Sharabati’s version has no attribution whatsoever, just like the Wegman material it was based on, whereas Said et al has added a citation of Granovetter (1973) in a section discussing “weak ties”.

The second small difference is quite telling. Here is a passage mostly identical to Wasserman and Faust:

Social ties link actors to one another. The range and type of social ties can be quite extensive. A tie establishes a linkage between a pair of actors. Linkages are represented by edges of the graph. Examples of linkages include the evaluation of one person by another (such as expressed friendship, liking, respect), transfer of material resources (such as business transactions, lending or borrowing things), association or affiliation (such as jointly attending the same social event or belonging to the same social club), behavioral interaction (talking together, sending messages), movement between places or statuses [states] {statues} (migration, social or physical mobility), physical connection (a road, river, bridge connecting two points), formal relations such as authority and biological relationships such as kinship or descent.

As before, we see the same pattern of identical text (highlighted in cyan), interspersed with truly trivial changes (highlighted in yellow), all carried over from Wegman et al. The phrase “for example” was carefully changed to “such as” four times. That suggests that Said may have done the original rendition found in the Wegman report.

However, Wegman et al had the nonsensical “movement between places or statues” instead of Wasserman and Faust’s “movement between places and statuses” (a mistake treated with much-deserved derision last time I discussed this).

In Said et al, this was “corrected” to read “movement between places and states” (shown in square-brackets above). But notice that in Sharabati’s dissertation this has reverted back to “statues” (in curly-brackets).

One plausible explanation is that Sharabati performed the reduction from Wegman et al for Said et al, and then used his own reduction in his dissertation. Meanwhile, someone else, possibly Said herself, added the Granovetter citation and attempted to correct the obvious error in “statues”.

Sharabati did redeem himself somewhat by omitting the Said et al passage on centrality and writing his own section on this concept, properly citing Wasserman and Faust and even relating the section to subsequent analysis.

Nevertheless, Sharabati’s section 1.1 is noteworthy as a possible example of apparent triple serial plagiarism, a unique occurrence to my knowledge. And it’s also important to observe that in this case, both Wegman and Said played the role of mentor, as they acted as Sharabati’s joint dissertation advisors. Indeed, it is hard to avoid the conclusion that the supreme weakness of the “mentor-scholar” style of co-authorship, at least as practiced at the Computational Data Sciences department of GMU, has now been exposed.

Through this work, I develop a novel method to assess and improve the robustness and efficiency of computer networks. This method uses computer network analysis, social network analysis, evolutionary computing, statistical methods, and graph theory. Specifically, my aim is to achieve enhanced network robustness and efficiency with a primary focus on architecture and topology of networks.

In the scholarly literature, social network analysis has generally been applied to computer networks in such intuitively apt ways as studies of collaboration via the internet, or development of security protocols based on trust.

Curiously, though, the only article cited as a previous example of SNA applied to Rezazad’s focus is a brief article from d’Ambrosio and Birmingham, Achieving agent coordination via Distributed Preferences. As Rezazad explains:

A research effort (D’Ambrosio, et al, 1996) for applying social network analysis to Local Area Network (LAN) topologies takes a close look at the network activity, betweenness, and centrality. Here, a link between a pair of nodes represents a bidirectional information flow or knowledge exchange between two nodes. The total number of direct connections of a node is referred to as the degree of that node.

But as the title implies, the cited article discussed “agent-based systems”, in this case focusing on concurrent engineering, not LAN topologies. The relevance is unclear, and in any event there is no reference in the actual article by d’Ambrosio and Birmingham to SNA, nor to SNA concepts such as centrality and betweenness. It is possible that Rezazad meant to cite some other work, as Rezazad cites “d’Ambrosio et al” but only lists d’Ambrosio and Birmingham; however, chasing down these citations and references is beyond the scope of my present analysis. Suffice it to say that Google Scholar returns no hits for a search on the author “J d’Ambrosio” and the term “centrality”. And the same reference is cited, more sensibly, in Rezazad’s later discussion of “agent-based” systems.

The side-by-side comparison leaves no doubt that these sections have largely reproduced Wegman et al (which needless to say is not cited or listed as a reference). Of the three underlying sources, only Wasserman and Faust is listed as a reference, albeit with the mistaken date of 1999.

As before, I use highlighting to show the relationship to unattributed antecedents, as well as curly brackets {} for additions by Rezazad and square brackets [with strikeout] to show material in Wegman, but omitted by Rezazad. The opening of Rezazad (and Wegman) is one of the very few passages not copied from any of the three sources, as far as I can tell.

{Networks are useful mechanisms for modeling and understanding
the existing relationships in the world.}

Networks operate anywhere that energy and information are
exchanged: between neurons and cells, computers and people,
genes and proteins, atoms and atoms, and people and people.
{ (Wasserman, 1999) }

Oddly, this opener from Wegman is the one sentence for which I could not find an antecedent; yet, it is the only one where Rezazad has added a citation! (The bibliography lists Wasserman and Faust, not Wasserman, and the date appears to be mistaken).

For the rest, Rezazad has been careful to change many (but by no means all) references to “persons” or “people” to “actors” or “nodes”; either that, or simply to omit occasional sentences that are an especially poor fit. In some cases, a sentence or two attempting to tie the material to computer network topology has been added, as in the following definition originally from Wasserman and Faust, via Wegman et al:

Actor: Social network analysis is concerned with understanding the linkages among social entities and the implications of these linkages. The social entities are referred to as actors. Actors do not necessarily have the desire or the ability to act. Most social network applications consider a collection of actors that are all of the same type. These are known as one-mode networks.

{ In the domain of computer networks, an actor is a network
component, which may be a server, hub, a router, or a workstation.}

Since the concepts of centrality and closeness are actually used in subsequent analysis, Rezazad has made a special effort to eschew references to “people” (or, even worse, the second person “you”), in favour of “actors” and “nodes” (p. 17):

The concepts of vertex centrality and network centralization are best understood by considering undirected communication networks. If social relations are channels that transmit information between [people] {actors}, central [people] {actors} are those [people] {actors} who have access to information circulating in the network or who may control the circulation of information.

Closeness – The accessibility of information is linked to the concept of distance. If [you are] {a node A is} closer to the other [people] {nodes} in the network, the paths that information has to follow to reach [you] {node A} are shorter, so it is easier for [you] {node A} to acquire information.
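For what it’s worth, the closeness measure quoted above is standard graph arithmetic, whatever one thinks of its application here. A minimal sketch (my own illustration, not taken from any of the dissertations) computes it via breadth-first search on a toy undirected network:

```python
from collections import deque

def closeness_centrality(adj, node):
    """Closeness of `node` in an unweighted, undirected graph.

    Defined as (n - 1) divided by the sum of shortest-path distances
    from `node` to every reachable node; higher means "closer" to the rest.
    """
    dist = {node: 0}
    queue = deque([node])
    while queue:                      # breadth-first search from `node`
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Toy "star" network: the hub reaches every leaf in one hop,
# while each leaf must route through the hub to reach the others.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
hub_score = closeness_centrality(star, "hub")   # 3 / (1+1+1) = 1.0
leaf_score = closeness_centrality(star, "a")    # 3 / (1+2+2) = 0.6
```

The hub scores highest, which is the intuition behind calling such nodes “central”.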

And Rezazad (and possibly even Sharabati) may not have realized there was anything problematic with the provenance of the large swathes of unattributed background material. It could even be that Wegman or Said actually encouraged the wholesale use of this material of dubious provenance from the Wegman report.

That makes the apparent failure of PhD supervision even worse; quite simply, Wegman and Said have failed to uphold, much less instill, minimal standards of scholarship. The reputation of the current cohort of PhD students at the Center for Computational Statistics and Computational Data Sciences, and indeed George Mason University itself, will undoubtedly be damaged by their actions.

The pattern of shoddy scholarship outlined here cannot be excused or easily explained away. Such a pattern not only bespeaks a lack of integrity, but also a willingness to cut corners and substitute ignorance, obfuscation and incompetence for diligent scholarship.

And now that the sad truth is emerging, it would seem that an attempted cover up is under way. Not only was Yasmin Said’s dissertation removed from the GMU website after the initial revelations of its dubious scholarship, but a key Said presentation on the Wegman panel was also recently excised.

That talk was the very first event in the GMU Fall 2007 Statistics Colloquium Series. The original colloquium web page linked to the abstract and presentation slides, but on August 20, all references to the September 7, 2007 talk were suddenly removed from the revised colloquium web page.

No wonder Wegman and Said received a “bad invitation” from top administrators at GMU “to explain our testimony”. Perhaps it is time for another such meeting, in light of the mounting evidence of shoddy scholarship and poor leadership exhibited by Wegman and Said.

GMU can no longer ignore the obvious. Two faculty members have not only exhibited a lack of scholarly standards, but they have also participated in an unscrupulous attack on climate science and scientists, part of a blatantly partisan and dishonest effort to mislead the U.S. Congress and the public at large.

LOL…It isn’t just Wegman who figured out that Mann’s scholarship was terrible. That is obvious for anyone who cares to look at the statistical methods and the data being used. Why is it that the AGW movement can’t accept the fact that the Hockey Stick is dead and move on to some of the areas where it has a chance of being convincing?

1) Re: triple serial plagiarism
Well, it might not be that rare, given that
original => SWSR => {Sharabati, Rezazad} is a pair of triples
Alternatively, there is
original => (internal version) => SWSR
(internal version) => {Sharabati, Rezazad}

2) An odd effect in Rezazad (and earlier in Rigsby’s work, and I think he’s currently doing a PhD at GMU) is the seeming attempt to jam a bunch of SNA terminology into work where it isn’t particularly relevant and doesn’t even get used that much. It is really weird to copy terminology like dyad and triad and never use them.

It is really weird to use SNA terminology for *computer* networks, not for the human networks that happen to use computers, i.e., including the “social network applications” people now use. Really, graph theory has been around a long time, and people have been using it to model computer networks since they’ve existed. For example, I grabbed my wife’s Andrew S. Tanenbaum, Computer Networks, 1981 off the shelf. Chapter 2, “Network Topology”, has nodes and networks, but uses the typical graph theory terminology. It doesn’t try to call nodes actors. It was hardly new then. It certainly goes back many decades, before modern computers, into telephone switching machines and their topologies. I worked at Bell Labs 1973-1983. This wasn’t new then.

3) Sharabati has a similar citation/reference problem to that of the Wegman report: half (or more) of the references are uncited.

4) Without commenting on the *quality or novelty* of most of the work in Rezazad or Sharabati (that takes way more expertise to check), both of them clearly spent a lot of time.
They each wrote 200+-page dissertations covering much ground. This is pretty sad.

They each plagiarized a few pages of text that was marginal for Sharabati and irrelevant/distracting for Rezazad. Quite possibly, both will end up with PhDs revoked, and while they hardly seem blameless, this is *not* good PhD supervision, especially since it almost seems they were “pushed” into this.

The NAS and Wegman came to the same conclusion about the statistical methods and use of bristlecones and Foxtail Pines. You can’t get around the fact that Mann’s paper reached conclusions that were not based on the evidence by attacking Wegman for something else that may or may not be valid. Stick to what is material and make your case there. Unless you can show how short centering is appropriate, why stripbark proxies should be used, why it is OK to backfill missing data and cut off data series when real data is available, etc., etc., etc., you have no case and can’t defend Mann’s fraud.

WHAT?
This article has to do with Wegman et al. [edit] articles and Phd Theses.
nice try at diverting the discussion without providing any evidence.
It is obvious from your comments that you have not stopped beating your wife. TRUE or FALSE!!!
SEE? I too can make comments pulled out of the hat.

Oh look, yet another Hockey Stick! This one is from Thibodeau et al. (2010, GSL), “Twentieth century warming in deep waters of the Gulf of St. Lawrence: A unique feature of the last millennium”

Their conclusion:

“We conclude that the 20th century warming of the incoming intermediate North Atlantic water has had no equivalent during the last thousand years.”

And the folks at ClimateAudit [edited] say that they do not know whether or not we have a problem.

If Mann et al. had committed fraud they would have been found out by their peers long before McIntyre and Barton arrived on the scene. Funny how Mann et al. keep getting vindicated (NAS, PSU, HoC, Muir etc.), but some are just so obsessed with ideology that they can’t see the forest for the trees or is that the trees for the forest of political spin?

The really juicy story here is the convincing case for plagiarism committed by groups with ties to McIntyre.
[DC: That looks to be an interesting paper. If you post some commentary on it on the Open Thread, no doubt some will comment on it there (hint, hint). ]

But I’ll briefly respond to Vangel before moving on and returning to the actual topic under discussion.

Both the NAS and the later Wegman report agreed that “short-centred” PCA was inappropriate and should not have been used. The NAS also agreed that firm probabilistic statements about specific years or specific decades in temperature reconstructions could not be sustained.

However, the NAS disagreed with Wegman in several important respects.

– The NAS was careful to note that other proxy studies, whether using corrected PCA (Wahl and Ammann) or non-PCA methodologies, arrived at similar findings of anomalous late 20th century warmth. The NAS also noted several other lines of evidence for this assertion.

– The NAS recommended that “strip-bark samples” be “avoided”, implying that the bristlecone and foxtail proxies could yield useful climatic information, if properly handled.

– The NAS overview of paleoclimatological methods and findings avoided the errors and bias evident in the Wegman report.

– Lead author Gerald North vehemently disagreed with Wegman’s assertion of the failure of peer review in paleoclimatology. North also denigrated the social network analysis on which the assertion was based.

Finally, there is no evidence whatsoever of fraud on the part of Michael Mann. But in the case of Wegman and Said, there is ample evidence of questionable scholarship, apparently rising to research misconduct.

The latter is the subject of this thread, and I expect all future comments to address this issue or else be subject to moderation at my discretion. Thanks!

There may well be extenuating circumstances for Rezazad, and even for Sharabati. The unattributed material on social network analysis was not crucial to the dissertations, and there is no evidence of a wider pattern elsewhere in the dissertations or other work.

[DC: This comment ignores my previous warnings. It is off-topic, full of false accusations and way too long. You’ll have to discuss Wegman’s cherrypicking and misinterpretation of Wahl and Ammann somewhere else. Thanks! ]

Of the 3 PhDs, Said is pretty clear.
1) and I recommend going back and reviewing her Sept 2007 talk that has been disappeared. Of course, that didn’t quite happen… and actually, someone missed another seminar list:
the Washington Statistical Society, and happily, it even links to an abstract, which is informative:
“Rarely does the federal government need advice on theoretical statistics. I would like to talk about one exception. Efforts to persuade Congress to enact legislation that affects public policy are constantly being made by lobbyists who are paid by special interests. While this mode of operation is frequently extremely effective for achieving the goals of the special interest groups, it often does not serve the public interests in the best possible way. As counterpoint to this mode of operation, pro bono interaction with individual legislators and especially testimony in Congressional hearings can be remarkably effective in presenting a balanced picture. The debate on anthropogenic global warming has in many ways left scientific discourse and landed in political polemic. In this talk I will discuss our positive and negative experiences in formulating testimony on this topic. ”

2) And every time I look at that, more things pop up. We’d noticed a while back that the WR used a distorted version of the long-obsolete 1990 IPCC FAR graph. By chance, it over-emphasized the MWP. They didn’t have a copy of the FAR, so got it from someone else. I always wondered if it was distorted before or afterwards.

Slide 8 of Said’s talk has a copy of the 1990 IPCC FAR temperature sketch, the right one.

Hmm DC
What this reminds me of is forensics performed in trying to determine if computer software has been copied from a previous employer and used at a new company. I have been involved in analyzing some code in support of legal actions for copyright infringement based on this. At what point does plagiarism become copyright infringement? As I understand the current laws, just the publishing of your text invokes copyright so that you now own what you have written. The fact that you change a few words, does not invalidate the ownership of the original copyright holder.

e.g. in one famous case, the infringement was determined by the tabs/spaces found at the beginning of the lines of code, which were identical, even if the code had had variables etc. changed.
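That whitespace-fingerprint technique is easy to sketch. The following is a hypothetical illustration (not the actual forensic tool from any case): it records only the leading tabs and spaces of each line, so renaming variables and functions leaves the fingerprint unchanged:

```python
def indent_fingerprint(code: str):
    """Per-line fingerprint of leading whitespace: a (tabs, spaces)
    count for each line, ignoring the code content itself."""
    fp = []
    for line in code.splitlines():
        stripped = line.lstrip(" \t")
        lead = line[: len(line) - len(stripped)]
        fp.append((lead.count("\t"), lead.count(" ")))
    return tuple(fp)

# Two snippets with identifiers changed but layout copied verbatim.
original = "if (x) {\n\tdoWork();\n\t\tlog(x);\n}"
renamed = "if (y) {\n\tprocess();\n\t\ttrace(y);\n}"

same_layout = indent_fingerprint(original) == indent_fingerprint(renamed)
```

A matching fingerprint across nontrivial stretches of code is circumstantial but surprisingly strong evidence, since incidental formatting choices rarely coincide by chance.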

Hmm, this is interesting. I guess, given the level of scholarship in Wyner & McShane, that this sort of problem may be endemic to the sort of “scholarship”

It’s sort of like the old cliché: “we’re from the statistics department, and we’re here to help”. There seem to be a lot of statisticians who are more than willing to leap into a field w/o understanding the science, the data problems, or the questions the scientists in a given field (mostly climate science) face. I’d like to see Mashey (he has just a wee bit of expertise in this area) comment more on network architecture and SNA. This seems a rather odd application of SNA and, as he points out, graph theory has been the basic method for analyzing network architecture pretty much since computer networks were invented. Does SNA add anything?

I suppose this adds fuel to the ASA ethics requirement that the statistician should understand the underlying science. Wegman (and his students) clearly did not.

1) Recall that last year I found 200+ physicists willing to sign that silly Austin/Happer/Singer anti-AGW petition for the American Physical Society.

2) I don’t know that statisticians are any more ready to jump into unfamiliar turf than physicists. After all, so far we’ve only identified a very small set of them: basically Wegman and a few of his students, plus McShane & Wyner.

3) The last thing in the world we need is the “faux fight between statisticians and climate scientists” that Wegman tried to start. In any case, the ASA has some good folks, and I recommend the ASA Workshop 2007.
Jim Berger’s talk, particularly the part on the role of statisticians in climate, is good.
In fact, all but one of the talks are good.

On the other issues around SNA, I’ve written a few pages on that, Coming Very Soon.

DC: — “Experiences with Congressional Testimony: Statistics and The Hockey Stick gave an eye-opening account of the Wegman panel saga.”

Interesting presentation. Does anyone know who dropped out and why they dropped out (page 5)?

Page 6: “None of our team had any real expertise in paleoclimate reconstruction,…”, meaning NO expertise in paleoclimate reconstruction, surely?

Page 8: “The 1990 IPCC report showed a very different curve with a warmer-than-current period from 1000 to about 1450.” Wasn’t that based on Central England alone? Was this pointed out orally at the presentation?

On page 20: “This was obviously coached by the “Hockey Team” asking very detailed statistical questions.” Big deal. Were the questions invalid? Not likely.

Nice to see she got a load of contacts, though. It was obviously good for her career.

Does anyone on the contrarian side of the fence take this climate s**t seriously?

Why yes, it does. However Dr Said didn’t feel it was important enough to mention on the slide that apparently these preferentially selected “hockey stick” shapes are *much* smaller than the one in Mann et al, and that they are equally likely to have the blade pointing down as pointing up. If this is true and she didn’t mention it, one wonders about the “honest broker” bit…

Hi,
It isn’t just Wegman who figured out that Mann’s scholarship was terrible. [etc. ….]

[DC: Here’s a bit of comic relief. This comment reproduced a previous one from Vangel, and its URL (now removed) pointed to a website called DissertationMaster, where you can get “dissertation writing help” and even “a list of suggested topics”. ]

Yes, I believe that Vangel’s comment is a perfect example of the bandwagon technique.

How is that? I figured out the problems with the Mann paper years ago and was arguing that the verification statistics showed that there was no ‘there’ there. It is the AGW clowns who are jumping on bandwagons. They were unable to argue against the statisticians’ critiques of the MBH papers so they tried to smear them by making up charges that divert attention from the actual issues.

[DC: You are grossly mistaken. There are several substantive critiques of M&M and Wegman. For example, M&M and Wegman never addressed the need for an objective criterion for retention/selection of principal components, an issue I examined (among many others) in my discussion of McShane and Wyner.

And the poor and biased scholarship of the Wegman report has now been amply supported by reams of evidence. Should we believe you, or our own “lying eyes”? ]

> That is obvious for anyone who cares to look at the statistical methods and the data being used. Why is it that the AGW movement can’t accept the fact that the Hockey Stick is dead and move on to some of the areas where it has a chance of being convincing?

The first sentence amounts to saying: we won. The second sentence amounts to saying: if you join the AGW movement, whatever that is, you’re a loser. The bandwagon technique relies on the fact that people want to be part of the winners, not the losers. The bandwagon here is the anti-AGW movement.

There are lots of entailments presupposed by that bit. The most important one is that Mann’s work is essential for the AGW movement to stay alive. So we have the usual “we broke the hockey stick, so we won” meme.

Aside: DC, I wasn’t criticizing your moderation, which is commendably patient — you give people the chance to think and improve what they write, no matter how poor their initial posts.
I was noting the persistently poor posts by ‘vangel’ — the same userid (though we can’t tell if it’s the same person) has been trolling and gotten moderated down on other blogs, on both climate and US politics, in the last few years.

Although it’s not clear, I think you are referring to the 90 or so proxies used by McShane and Wyner in their reconstruction back to 1000. That’s because only 90 are available back to that century. (Alternatively one can build the reconstruction in “steps” using progressively fewer proxies as one goes further back).

Could someone provide a bit of clarity on what appears to be the core textual analysis technique being used in this post?

AFAIK, we’re dealing with comparison of original and paraphrased texts. The complaints seem to be a) Too much similarity in the paraphrase; b) insufficient citation/credit given.

The three confusing things I see are:
a) When paraphrasing a factual data-based statement, I would expect the word count of “data communication” to be passed through pretty much intact. Wouldn’t analysis of ANY such brief section produce a similar result? OR… is the unstated complaint that a paraphrase should cover a larger section of text than a paragraph?
b) If GMU’s own example fails your test… I wonder if you have ever applied this test to a variety of other “training texts”? After all, if your test fails on the instructional examples provided to students, is it any surprise that the students’ papers produce a similar result?
c) AFAIK, each “chunk” of material bringing in information from a source should properly cite the original. Two questions about situations like this where there are a number of citations: 1) with lots of citations, is it still required for the “citation rate” to be 100%, or are you just being picky? and 2) what do we see in other typical papers?

My bottom line query: we have incredibly powerful textual analysis tools today, to tear into written material… before we can confidently claim these tools have found something significant, it seems to me they need to be applied to a variety of similar original/paraphrased texts to understand what ought to be expected in the first place.

I had no real question about this until I saw that your technique produced a “fail” even for the GMU “good” example. At that point I began to wonder if the problem lay more with your measuring stick than the thing being measured :)

I’m afraid you have misconstrued the GMU Writing Centre example (which, by the way, was a constructed example, not student work).

The writing centre proffered this as an example of improper paraphrasing rising to plagiarism. The point of the Writing Centre example is that using mostly the same wording in a paraphrase is unacceptable, even if the citation itself is properly done, and even if the paraphrase is relatively short.

All I did was to show clearly why this example is plagiarism. Obviously, if the GMU Writing Centre considers that example plagiarism, there can be little doubt about their finding on the Said passage.

In general, use of key technical terms is sometimes inescapable, so when I was doing the exhaustive analysis of the Summaries, I did not think “plagiarism” just because I saw a few of the obvious terms. Since I was looking at Summaries, whose sources were identified, I started with a slightly different and more restrictive approach, to be as conservative as possible and give them every benefit of the doubt.

Mine worked like this:
1) Use a manual approximation to the “longest common subsequence” problem, i.e., UNIX “diff” looking for *exact* word matches, in order, when comparing the WR against the antecedent; mark these ID, in cyan. The common technical terms were generally nowhere near as useful in locating the antecedent sentences as were unusual words. However, inescapable technical terms adjacent to a lot of cyan got marked cyan as well, since the match was obvious.

2) I gave them the benefit of the doubt on inescapable common words, and never started a match with such words.

3) Obvious Trivial Changes, in order, were marked in yellow and included in SS, which also picked up obvious rearrangements and rephrasings.

In practice, I think my version tends to be a little more restrictive on ID, but we end up roughly the same on SS. Mine is slightly more algorithmic, with less low-level judgment; DC’s captures “cut-paste-rearrange” plagiarism a little better. Think of them as two slightly different reconstruction techniques.
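The word-level “diff” step described above can be sketched in a few lines (my own rough illustration, not Mashey’s actual procedure; the three-word minimum run is an arbitrary threshold I chose, and real analysis involved manual judgment on technical terms):

```python
# Sketch of the word-level diff idea: find in-order exact word matches
# between a suspect passage and a candidate antecedent, analogous to
# marking identical text ("ID") in cyan.
from difflib import SequenceMatcher

def identical_runs(antecedent: str, suspect: str, min_run: int = 3):
    """Return in-order runs of at least min_run identical words."""
    a, b = antecedent.lower().split(), suspect.lower().split()
    sm = SequenceMatcher(None, a, b, autojunk=False)
    return [" ".join(a[m.a:m.a + m.size])
            for m in sm.get_matching_blocks() if m.size >= min_run]

def identical_fraction(antecedent: str, suspect: str, min_run: int = 3) -> float:
    """Fraction of the suspect's words that fall inside identical runs."""
    words = suspect.lower().split()
    matched = sum(len(r.split())
                  for r in identical_runs(antecedent, suspect, min_run))
    return matched / len(words) if words else 0.0
```

On a made-up pair like “social networks are analyzed using graph theory and matrix methods” versus “such networks are analyzed using graph theory and related matrix methods”, the seven-word run “networks are analyzed using graph theory and” would be flagged, while the short “matrix methods” match falls below the threshold.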

By my (restrictive) rules, 50% of the WR Summaries’ words (as a group) are ID and another 31% SS, and of course, I was giving them every break.

In any case, it doesn’t matter. This stuff is so clearly plagiarism that a few percentage points here or there are irrelevant. The most amazing one was MBH99, which was 100% SS … obviously, that was an unimportant paper that could be dealt with very quickly.

Of course, the hard part, of which DC is the master, is *finding* the antecedents in the first place, especially when not available online. Of course, that step wasn’t needed for the Summaries.

However, fairly soon, you will be able to see even more evidence of the visual similarity of plagiarism style between the WR and Said’s dissertation. DC has shown 10 WR pages so far; I’ll add another 25 for anyone not yet convinced. That’s 35 of 91 pages … that is mostly plagiarism. The plagiarism in MW is more sophisticated, as it blends 2 sources (sometimes wrongly), but its problem is odd tip-off words.
Said’s stuff (except for the weird howlers) mostly looks OK because the *words* are OK words, since they are mostly cut-and-paste. As best I can tell, even experts miss this stuff because they skim introductory material, see what they expect, and go on … without studying every sentence to notice the silly things.

[DC: For the record, I don’t agree there is a strong case for plagiarism in M&W. But I would say the evidence is overwhelming that they have cited sources that were not in fact consulted, while not fully disclosing the actual sources. While that is not acceptable, I’m not quite sure what might be the correct term for such lapses of scholarship standards. ]

Probably it’s ‘unethical citation practices’, which comes generally under academic misconduct, though it generally seems to be considered a minor type. On the other hand, as correct referencing is meant to be your protection from allegations of plagiarism, consistently using ‘unethical citation practices’ may leave you in a sticky situation.

It’s OK to use secondary referencing, but it should be made crystal clear that this is what is occurring (e.g., “A reports … (as cited in B, 19xx)”), and only the material actually read should be listed in the reference list, not the original source.

Oh, to be clear, this is nothing like the massive obvious plagiarism of the WR. MW are much more subtle.

The correct term for most of the issues is fabrication. The plagiarism is relatively minor, but simply shows how they mangled two sources together, but with tip-offs that show where the ideas came from. You might post the new source you found, and see what people think.

Anyway, in some cases, an “as cited” reference should have been used. This appears to be the case for the Bradley references, which were almost certainly not consulted directly, but rather really “as cited” in Wegman et al.

In other cases, the actual source was likely something even less “citable” than Wegman et al. For example, M&W’s mixup over the unpublished Mann et al 2004 reply, was almost certainly the result of poor understanding of ClimateAudit.

As John mentions, there is a rather embarrassing apparent antecedent for M&W. I’ll be posting on this within the next day or two.

Who would this complaint be submitted to? Is the Wegman report considered a scholarly (peer-reviewed) work? If it is just an opinion piece and not research, then it is not research misconduct. Copyright violation maybe, but that is entirely different.

With that said, plagiarism in a scholarly publication or dissertation is considered research or academic misconduct, respectively. Research misconduct would be under the purview of the federal agency that funded the work (NIH in the case of Said) and the university. Academic misconduct would only concern the university. Universities handle thesis plagiarism differently, as indicated in the link included in my earlier post, reposted below. Some universities look the other way, while others take concrete action.

BTW, the Board of Trustees of Ohio State University voted on Friday to revoke Nixon’s doctoral degree.

The venue of the plagiarism is immaterial. Professors have been disciplined for plagiarizing in op-ed pieces and I would think a report to Congress would be held to as high a standard as other scholarly work.

Isn’t Ken Cuccinelli coming down hard on Virginia university staff who may be involved in dodgy goings on? Will he be producing a civil investigative demand for all documents pertaining to Wegman and demanding his emails? Is this an ideal case for him to sink his teeth into?

a) Cuccinelli and his assistant, Wesley Russell, both got their JDs at GMU.

b) The Kochs have funded GMU well, including its Mercatus Center and its Institute for Humane Studies (where Fred Singer used to be attached); see pp. 93-95 in CCC.

c) The Kochs have also given to Cuccinelli.
See energy > natural gas, although of course coal does even more. I doubt the Kochs need to spend a lot of money on Cuccinelli.

An interesting, and totally non-obvious, funder is Qustfore Communications, run by Ken’s father, Ken Cuccinelli, Sr., previously a natural gas executive who, if you look closely, still does consulting for Latin America and Europe. (Europe: hmm.)

I just had a thought. I bet that Peter Sinclair would love to do a “Crock of the week” on this. Perhaps one of you should contact him and try and work something out. Your story would get a lot of exposure that way.

I was thinking specifically of the PhD dissertation problems. But, yes, in the larger context, there is strong evidence that Congress was misled. The real villains here are Joe Barton and Ed Whitfield, with Peter Spencer co-ordinating the attack on climate science and scientists on their behalf.

I am, together with many others, very grateful for your discovery of plagiarism in the WR, your investigation of the associated theses following a tip-off, and the fact that your diligent and tireless groundwork inspired and ultimately grew into Dr Mashey’s magnum opus.

I truly hope that the DoJ and GMU both investigate this to the fullest possible extent, and that justice is then served and the right and proper consequences follow.

The can of worms that has been revealed shows the extent of the forces arrayed against science. When this conspiracy is combined with the Kochtopus network revealed by Greenpeace [1], The New Yorker [2], The New York Times [3], and DirtyEnergyMoney [4], it shows that democracy itself is under attack.