Posted
by
kdawson
on Sunday July 11, 2010 @09:19PM
from the copied-by-a-spider-on-lsd dept.

walterbyrd writes "SCO's ex-CEO's brother, a lawyer named Kevin McBride, has finally revealed some of the UNIX code that SCO claimed was copied into Linux. Scroll down to the comments where it reads: 'SCO submitted a very material amount of literal copying from UNIX to Linux in the SCO v. IBM case. For example, see the following excerpts from SCO's evidence submission in Dec. 2005 in the SCO v. IBM case:' There are a number of links to PDF files containing UNIX code that SCO claimed was copied into Linux (until they lost the battle by losing ownership of UNIX)." Many of the snippets I looked at are pretty generic. Others, like this one (PDF), would require an extremely liberal view of the term "copy and paste."

"In a blog post dated July 10th, 2010, Kevin McBride has leaked almost 50 of the code comparisons that were submitted in evidence in SCO vs Novell. You can download the archive. [slushdot.com]

Read on to view individual files if you don't want to download the whole thing.

Linux STREAMS

We also learned that the whole STREAMS fuss was not about linux, but about a product distributed by gcom, a provider of legacy solutions.

Their Linux STREAMS (LiS) product provides a couple of loadable drivers that would intercept calls to the old streams api and convert them. In other words, far from the allegations that the linux kernel contained code that infringed streams, it's evident from the need of an add-on loadable module that the linux kernel does not contain any STREAMS code.

Of particular note, and probably a source of much consternation to SCO and their proponents, is that LiS itself doesn't implement streams either, just does protocol translation. So neither linux nor LiS contains infringing code.

The whole end-user $699 license was a scam

In my view, contract violations by IBM would not result in liabilities by other Linux users.

So according to Kevin McBride, one of the lawyers who worked on the case, there was no reason for end users to take out a license. It's logical to conclude that SCOsource was a protection scam. So what happened? To me, it looks like SCO lawyer-shopped until they found attorneys who were willing to go along with the scheme for a price - everyone has their price, and in this case, it was $30,000,000.00.

The Appeal of SCO's loss to Novell - Novell will probably win.

Will Novell win the current SCO appeal? Probably. Will Novell donate the UNIX copyrights to the Linux community if it wins the current appeal? Probably-although Novell's Linux activities have been difficult to predict in recent years.

I'm not a coder. I couldn't create a kernel if my life depended on it. I couldn't code a hungry cat to catch a mouse. 1/2 or more of what I read in code is gibberish to me.

But, one thing is pretty sure. Linus Torvalds wrote Linux, and his programming background came directly from Unix. OF COURSE he is going to write the same commands he has used a thousand times in the same way. OF COURSE there are going to be lines that look very much the same, sometimes even identical.

Now, open any dozen books that are 50,000 words in length. Search for strings that are duplicated between the books. Entire sentences, or phrases, it hardly matters. Just do the search. Anyone who is used to playing with databases can probably search those dozen books, and find numerous instances of phrases that were copy/pasted from one author's book to another. In fact, I'll bet that technical and factual books will have a higher incidence of matching phrases and sentences than works of fiction - but fiction will have it's share as well.

And, before we do this data base mining, we need to set up some method of assigning a variable string for proper names. In the wife's romance novels, we would be looking for " $ kissed $ ". It will be repeated so often that you can't help seeing the plegiarism. One author after another rips it from his predecessors.

50 instances are claimed for "copying". Out of how many lines of code? Good grief. I guess it should have been mandatory that Linus write any code not only in a different programming language, but in a different language than English. Then, he MIGHT have been safe. Maybe. Not likely though, because SCO can probably read Chinese, or hire some scumbag lawyer who can.

I've seen cases where me and another person are working on code independently, and when it came time to merge, we had both ended up creating the same variable names, and pretty much the same code.

About the only difference was in indentation - mine is "always put the opening brace on the same line, one true tab, else in same column as if, no braces for any single-line condition to a control structure (for, if, else, while, etc)". Even the comments were pretty much the same.

In this case, though, some of the code is from BSD - which is perfectly fine.

I learned perl from someone who named all his variables with variations on "foo" and "bar". Back in those days, if I was writing something short and simple enough, it was hard for me to break the habit of naming things $foo, $bar, $boo, $far, $foofoo, etc. I'll bet a lot of our code looked like it was from the same person:)

And heres the Magic. Linus learned his style by closely reading Andrew Tanenbaum's books, and reading the Minix code. Which of course is what your supposed to do with Minix. So have most OS coders who had their education back then.

The end result of course is that everyones code ends up looking like Tanenbaums , which is not a bad thing, the guy is up there with the gods in terms of importance to O/S theory.

Actually it is even simpler than that. The code in the PDF I saw for for ELF.They where all typedefs. Elf is a well documented format and not of the code that shows copying was actual functional code.As I was reading the code I was thinking just how trivial the example was but also how well written both.h files where. You could tell exactly what each variable in the type def did. It also looked like a lot of my own code when I am having a good day.Also these are.h files! they are not functional code blocks just definitions. Of course the definitions for the typedefs of a well documented file format will look a lot alike!It is a huge duh but an attorney that knows nothing about programing might not understand that.If this was an example of the infringement I would say the court did a great job when they tossed it out.

Because in FORTRAN, variable names beginning with letters I through N meant integers, and all other meant floating-point. For those with some FORTRAN experience, naming the first 6 loop counters "i,j,k,l,m,n" is automatic.

mine is "always put the opening brace on the same line, one true tab, else in same column as if, no braces for any single-line condition to a control structure (for, if, else, while, etc)"

Coding style like this makes me cringe, particularly the thing about no braces for single-line conditionals -- it makes it far too easy to make mistakes because you indent code and forget that indentation doesn't mean it's part of the conditional (unless you are using python, of course).

So use a REAL tab character in your code like $DIETY intended, and set your editor to "show tab character". We have wide screens now - there's NO excuse for using anything except a real tab any more.

Actually, there are several huge reasons for not using real tab.

1) Tab means different things to different people. Even when you spell it out, people interpret it differently. In the original sense (i.e. old Underwood typewriters, and the like), tab meant to release the carriage and let it move thanks to the spring, until it was stopped by a tab stop. This means that if you set your tab stop at position 5, and 60, and you pressed tab when positioned anywhere from position 1 to 4, it skipped to 5. Pressing tab when positioned at 5 through 59 (in this example) skipped you to position 60. So, in it's original sense, tab relied on tab stops (literally tiny "tabs" on the top of the typewriter). However, there are few standard document formats (especially for source code) that define the tab stops. You don't see a line in an ASCII or Unicode source code file that says "the tabs for this document are at position 5 and 60". There's no common convention for this.

So people invented arbitrary tab stop conventions. Like "tab stops are every 4 characters" or "tab stops are every 8 characters". But a small difference like this can change the meaning of your document! If you line up code and comments with "real tab characters" every 4 characters, and then someone opens your document with tab stops every eight characters, then the issue is NOT just that things are moved right. The issue is that things do not line up! If I create a nice comment section with a table explaining something, and use tab characters counting on a tab stop every 4 characters, and you open it with tab stops every 8 characters, the MEANING of the comments may change.

Example:/* Here is a table of all the fields and whether they are changed by this function:

[tab][tab]Passed to function[tab]Returned from function[tab]Changed by function[tab][tab]------------------[tab]----------------------[tab]-------------------A[tab][tab][space][space][space]Yes[tab][tab][tab][space][space][space]Yeslonger[tab][tab][space][space][space]Yes

*/

These comments will mean different things depending on the tab stop assumptions!

2) The designers of some editors mis-understood how tab-stops worked, and instead, some made tabs equivalent to a fixed number of spaces. For instance, for some editors a tab is instantly interpreted as 4 spaces. But in the original definition of tabs, it was a "variable" amount of spacing, which took you to a predictable column. These are two vastly different concepts.

3) Those who are smart enough to realize that there is confusion are really annoyed by those who are clueless and inserting real tab characters without knowing that there's an issue.

OF COURSE he is going to write the same commands he has used a thousand times in the same way

I'm sure this is one of the reasons it's best to call the system GNU: Linus didn't write any of the "commands". Linus wrote a kernel and GNU ended up adopting it. The GNU project wrote the system "commands".

Just do the search.

Trivia: Actually, the people at exbiblio found that there is very little repetition of text in literature. Any four or five word sequence in a common magazine article is likely to appear in very few or no other texts. That fact is foundational to their technology.

Now, open any dozen books that are 50,000 words in length. Search for strings that are duplicated between the books. Entire sentences, or phrases, it hardly matters. Just do the search. Anyone who is used to playing with databases can probably search those dozen books, and find numerous instances of phrases that were copy/pasted from one author's book to another. In fact, I'll bet that technical and factual books will have a higher incidence of matching phrases and sentences than works of fiction - but fiction will have it's share as well.

Actually, that's not true. There is some evidence to suggest you only need a remarkably short string of words to uniquely identify a piece of English prose - it's this kind of thing that cheating-detection algorithms rely on.

But we're talking about a structured programming language - with far more structure and rules than the English language - and the things that are at issue are by and large implementations of existing standards. The final link in TFS is a comparison of ELF utility header files, FFS. They've got to look fairly similar or they won't be any use for dealing with ELF executables! Even then they're sufficiently different that it would probably have been easier to write from scratch than it would be to execute the "copy/paste/obfuscate" cycle that is being alleged.

>But we're talking about a structured programming language - with far more structure and rules than the English language

Not to mention a far smaller vocabulary, the complete absense of abstract forms of speech (no metaphors, similes)and in fact of even fundemental sentences.The vast majority of sentences in a programming language are verb(subject); THAT'S it, a rare few have an "object" (e.g. substr(S1,S2)) but at heart, that's 99% of the lines in a program. Simple commands. There are identifiers, control concepts (loops and conditionals) and structural stuff (classes, functions and the like) but these make up very little of the bulk. The implementation section consists of commands and variables for them to act on.Thus for the same algorythmic task, barring minor changes in indentation and identifier naming (which will be minor because both are matters on whic standards exist and within organisations some or other standard is usually enforced) the statistical likelihood of two programmers writing and identical solution to the problem is very high. After all, programming is maths and there is only so many ways you can calculate the same equation - which is basically all any algorithm does.

You need a lot more than a few functions with identical structures to prove copyright violation when the scope for individual change is that much more limited. Creativity in programming is VERY rarely coming up with a NEW algorythm for an old task. Nearly always it lies in how we combine algorythms with one another to solve the bigger problems. The bits and pieces of code are like nuts and bolts, every engine has a million of them and they all look pretty much the same.

Is that so? Let's see if we take a phrase from your own comment: "a higher incidence of matching phrases [google.com]". One hit. Not bothering with linking to them all, but how about "rips it from his predecessors"? One hit. "strings that are duplicated between the books"? One hit. "his programming background came directly from Unix"? One hit. "open any dozen books"? One.

I have, of course, duplicated them in this comment, meaning there will be two hits very soon. BTW, these are all the strings I searched for, giving your comment a 100% originality rating (admittedly, I didn't search for "I'm not a coder", which I expect would show up several times).

Duplication of whole sentences in ordinary human language is actually quite uncommon for all but the most trivial declarations and stock phrases ("Just do the search" gives 3 million hits; "Just do the twist" gives 105 000).

You only missed a verdict if you haven't looked up for seven years! Recently a jury in Utah confirmed what a judge found in a bench trial: Caldera (later SCO Group) did not get, and was not entitled to get, the UNIX copyrights in the 1995 deal they did with Novell. Unless you think that the jury was unreasonable in that finding (and guess what, SCOG and its lawyers do), SCOG does not 'own' UNIX in any useful sense.

I'm all for strict ethical standards, and I won't claim to have gone through more than a half dozen of these PDF's checking them.

That said - the six or so files I looked at were exclusively issues about some fairly well established naming conventions, the repetition of which is about as unlikely as finding an elm street and sycamore street 'coincidentally' close to each other in both our home towns and calling it 'evidence' the city planner of one stole the plans of the city planner of the other.

I've heard people claim in several posts that "Well, if you looked at the PDF's that weren't cherry picked for ridiculousness..." there are 'obvious' copying of code. That may be, but I'd like someone to actually link to a pdf with this obvious copying of code - burying the evidence in bullshit data is what you do when the evidence is against you and you *don't* want the other side to find it, not when it's in your favor and you're trying to make your case.

At a minimum, the fact that he has buried his 'evidence' in with other bs data doesn't make it very clear he knows what actual evidence of copying would look like.

These are not small words like "kissing" that are under dispute, this is not about reusing some very common routines that everyone uses, that's just silly. Rather it's about companies wanting to maintain compatibility with legacy versions of UNIX and doing so by referring directly to the legacy UNIX at best, and plagiarizing their code at worst.

Except it's not about that at all. It's about implementing standard interfaces that are defined by POSIX. POSIX defines things down to the variable type so it's natural that the resulting header files will look similar. In fact, some of the differences I'm seeing in these files are from SCO not implementing POSIX properly ex return type int where it should be size_t.

You also need to keep in mind that SCO's predecessor (AT&T) was itself caught copying code from Berkely.

There was exactly one case where there was copying shown between SCO and Linux. In that case the code was from Berkely (licensed open source) copied into SCO and copied into Linux by SGI as one of their internal filesystem driver headers. The code was determined to be non infringing due to it's history but deleted because it was old and reimplemented in a better way elsewhere.

From working with the Linux kernel maintainers I know they take copyright very seriously and investigate even the possibility that code was copied and you owe them an huge apology for that uninformed set of accusations.

The copyright notices in the comments, etc were then replaced with AT&T ones, replacing
the Berkeley ones (also replacing the earlier AT&T ones, btw.) I can vouch for this personally, having worked on the "vi" source code both at Purdue (original BSD 4.3 code) and at AT&T (System Vr4 code) -- all of the BSD copyrights, as well as the (bad) poetry, had been removed from the comments in the vi sources.

The Folks from the UCB law school took advantage of this in the counter suit,
since the AT&T folks, having changed the copyright notices in the troff sources, ended up doing this this then in the printed manuals.
So while AT&T was suing about vague things like including code derived from code derived from code they wrote; UC Berkeley countersued about printed, published, paper manuals, where AT&T was clearly publishing them without the UCB copyright and license info. Clear, obvious, game-set-match, paper copyright violations.

So rather than have to find and "Destroy all Copies" of SystemVr4 manuals (including those published in turn by licencees like HP, IBM, etc.) AT&T agreed to drop their
initial suit and make the countersuit go away.

Trying to imply that this is some nonsense that should be dismissed just because you like Linux is like playing down and ridiculing the evidence of the murder of Hans Reiser's wife because you like ReiserFS. It's even sillier in some ways because Linux isn't at stake in the case like ReiserFS was. (An extreme analogy I know, but valid).

I read the part in Groklaw about discovery in SCO v. IBM. They spent a year in discovery to see if there was similar code and once the infringing code was revealed, the response from the judge was "Is that all ya got?" After some pretty detailed analysis and dissection of the code in court, I think the result amounted to 230 lines of code out of several million.

Since they couldn't find examples of literal copying, SCO really wanted to go after methods and concepts as if that was protected by copyright. And they might have done it if they had been more forthcoming. But they were always late, delaying discovery as much as possible out of fear that any code revealed would be code removed from Linux. In a sense, they wanted a permanent tax on Linux without revealing the code so that it couldn't be removed.

Since you write code and you appear to have greater ethics than SCO, you wouldn't do what SCO did in order to recoup your investment in your code. What SCO did was try to reach something far beyond copyright enforcement in any reasonable sense of what the law actually says. And that was without owning the copyrights.

Having said all that, I can say that I am too, opposed to plagiarism in Linux. But I am also fairly confident that plagiarism isn't a problem in Linux since the code is open for anyone to see, and vetted by hundreds if not thousands of other programmers.

The far more compelling case of copyright infringement will come out in the counterclaims that IBM has against SCO. Right now, that giant is sleeping, but when it wakes, it's going to keep Boise Schiller very busy.

The truth is that code was reused (if not copied, exactly, in the same way you don't submit a copied essay which you've taken from a classmate) from a UNIX derivative, which is now (somewhat disputably) owned by SCO.

The truth is that SCO does not own the copyrights to UNIX code, as ruled by a judge, a jury and a second judge in Utah.

Beyond that, SCO already turned over all their evidence to IBM several years ago, where it was analyzed by an expert named Dr. Brian W. Kernighan (if the name doesn't ring a bell, you're not qualifed to be commenting on this topic), and he examined the comparisons and came to the conclusion that no illegal copying had taken place. Note, that's not "no copying", but "no illegal copying". UNIX is based on a number of sources. Most famously, they stole (and I use that word advisedly, since they removed copyright notices, which was illegal) from BSD. They also contributed parts of UNIX to public standards, including the ELF standard which is one of the examples shown here.

The problem is that you seem to think there's a single copyright to UNIX. There isn't. The three biggest stakeholders are Novell (inherited from AT&T), the University of Cal. regents (all the BSD code in UNIX), and Sun, but IBM and SGI also own largish chunks. The people you're accusing of being "lazy and careless" actually own parts of the code you're claiming they illegally copied. Do you know who actually owns the parts in question? I don't, but no evidence of illegal copying has been shown in court!

Releasing this thoroughly debunked information at this late date can only be a desperate attempt to spread FUD. Don't fall for it. I'm sorry that you once had your code plagiarized, but this case is nothing like yours.

Except that gcom didn't ship Caldera's STREAMS implementation. They shipped a loadable module (driver) that intercepted calls made by legacy apps to the STREAMS api and translated them to calls that linux could fulfill. In other words, no STREAMS code necessary in either the LiS product or in linux, and not even code to emulate STREAMS - just translate the call and forward it.

Comparing a variable named elf_t_arname to one names elf_c_arname is not very convincing. The suffix is generic, the prefix is activity specific, and the middle letter is presumably some datatype indicator.Where it gets dicey is when there are structs and every variable in the struct has a somewhat similarly named variable in the other one. This does arouse suspicion. even if you forget the variable names for a moment, any pattern like bool,real,real, *real, int, *char,*char,*bool,.... that is identical between two structs would be an improbable occurence. and when you see it in back to back structs it becomes nearly impossible to happen by chance.

The key question then is if there is some structural reason why the two might share an identical stuct? for example, is there an elf spec that defines a protocol for communication or the way a record on disk is serialized (i.e. packed)? if so then of course these will occur like this. Or perhaps both are derived from a common BSD ancestor so both vary only slightly.

if the answer is no, there was no reference implementation and no ancestor then I'd say that for examples like 251, Mcbride has some evidence.

This does arouse suspicion. even if you forget the variable names for a moment, any pattern like bool,real,real, *real, int, *char,*char,*bool,.... that is identical between two structs would be an improbable occurence. and when you see it in back to back structs it becomes nearly impossible to happen by chance.

Umm, no. This is only logical. If he had the documentation to data types, or even if he reverse engineered them, they would be identical. I have done some trampolining of functions to/from code which i only have binary access. If I am handling a struct, I would examine the structure in memory and make my struct identical so I could copy it with simple memcpy(to, fro, sizeof(struct)), or replace references by just changing the pointer to different place.

An executable file using the ELF file format consists of an ELF header, followed by a program header table or a section header table, or both. The ELF header is always at offset zero of the file. The program header table and the section header table's offset in the file are defined in the ELF header. The two tables describe the rest of the particularities of the file.

Applications which wish to process ELF binary files for their native architecture only should include.In elf.h in their source code. These applications should need to refer to all the types and structures by their generic names "Elf_xxx" and to the macros by "ELF_xxx". Applications written this way can be compiled on any architecture, regardless whether the host is 32-bit or 64-bit.

Should an application need to process ELF files of an unknown architecture then the application needs to include both.In sys/elf32.h and.In sys/elf64.h instead of.In elf.h . Furthermore, all types and structures need to be identified by either "Elf32_xxx" or "Elf64_xxx". The macros need to be identified by "ELF32_xxx" or "ELF64_xxx".

Whatever the system's architecture is, it will always include.In sys/elf_common.h as well as.In sys/elf_generic.h .

These header files describe the above mentioned headers as C structures and also include structures for dynamic sections, relocation sections and symbol tables.

...

[snippage]

...

HISTORY

The ELF header files made their appearance in Fx 2.2.6 . ELF in itself first appeared in AT&T V . The ELF format is an adopted standard.

This is the problem with SCO's case - OldSCO/Caldera only could have gotten what Novell originally had to give, if Novell HAD assigned copyrights to OldSCO. A lot of the stuff was from BSD.

"In the ten years that we have worked together, I can recall only one case of miscoordination of work. On that occasion, I discovered that we both had written the same 20-line assembly language program. I compared the sources and was astounded to find that they matched character-for-character."

This was the first time I had a chance to see any of their "evidence". How exactly did this make it all the way to court?

Especially given that "original" files appear to have been altered to contain comments to the effect that a USL copyright statement somehow proves that the information is "UNPUBLISHED PROPRIETARY SOURCE CODE" together with a meaningless statement about the existance of a copyright statement not implying publication.It looks as though they have "mailmerged" part of their claims into their supposed evidence. Shouldn't this have resulted in the judge throwing this out as null and void?

Sure, they copied all those those blank lines, and in at least one case (tab 247) they also copied the BSD copyright header. Shocking! Funnily enough SCOX removed those lines on both sides. Kind of them.

They also copied strn?casecmp definition (tab 241). For quite astonishing values of "copied":

This is clearly an extremely grave violation. However, it is interesting to note that SCOX did not complain about the definitions of all the other string functions. Maybe because the header of their file specifies

In addition, portions of such source code were derived from Berkeley4.3 BSD under license from the Regents of the University ofCalifornia.

I find it rather funny that the Linux code is well commented but the SVR4 code has little to no comments at all. Just because the function names are the same doesn't mean it was copied. It just means that the coders implemented functions with the same names (and I bet that the Linux versions worked rather differently than the original SVR4 code).

I agree. As a programmer myself, I saw significant differences between the two sets of code in each example that SCO claimed was "evidence". I would have to look at more of the examples, but from what I saw, if I were the judge, I'd tell SCO to stop wasting everybody's time.

I actually find it ironic that libelf was picked as an example of infringement. I can tell you first hand that the (more standard) UNIX/Solaris libelf is NOT compatible with the Linux/libc libelf. And I can also tell you that after pointing this out to Ulrich Drepper he really didn't give a shit... (I think his approximate words were "It's been like that for a while, too late, I won't change it").

Their only mistake was actually naming it "libelf"... since it is most definitely NOT the same library...

More news about imaginary property. How much time and money does our society waste on propping up this outdated concept that you can own an idea? "#include " constitutes copy and pasting? I guess every program on earth violates the copyright of the guy who first wrote "int main(", and whoever started the convention of naming C++ files cpp or cxx should be hiring a lawyer about now. Money is to be made.

It's not copyright infringement to write something required for a specific functionality (e.g. POSIX compliance or API compatibility) even if it's exaclty the same words. Copyright only covers creative expression. If the ways of saying something are limited, that's not covered by copyright.
Of course that didn't stop the SCO trolls from trying and wasting everyone's time.
Delayed justice is no justice at all.

And were it some small group of developers that SCO sued rather than Red Hat/Novell/IBM, it would have been worse than delayed justice. Copyright does not protect the little guy, much like the majority of our legal system.

"It's not an "outdated idea". If you really think that copyrights and patents are a bad idea, you need only look at countries where they did not exist -- like Russia during its peak of Socialist power -- to see how that works out economically. Hint: it doesn't. "

You're seriously assigning all the problems with the Soviet economy to their lack of copyright? You, sir, take the cake. That's the most absurd statement in support of copyright I have ever heard.

"The idea being that those who ignore (or don't know) their history, will be doomed to repeat it. And you obviously don't know your history, or I am about as certain as the sun will come up tomorrow that you would change your mind on that issue. "

Given your interpretation of the history of economics is rather... I'll be nice and say 'intriguing', I guess I have to agree that I don't know it. I don't see how the lack of copyright led to Stalin screwing the country over, but I guess I am just dense.

"The fact that some laws have been corrupted, like the duration of copyrights for example, is not justification for elimination of all such laws. Sure, it needs to put back the way it was, but not eliminated."

But the fact they are built on the concept of preventing free flow of information - and the fact they can no longer be enforced even by companies with more power than some countries - DOES make copyright and patents outdated. That ship has sailed.

"Copyrights and patents were established for the public good, and believe me, in the places where there are none, there is also damned little public good."

The entire world prior to about the 18th century begs to differ with you. On the other hand, I don't know history, so don't listen to me; nothing of merit was produced before copyright! The Greek epics, countless thousands of books, paintings, songs, architecture, machine designs, all came to exist about 300 years ago, right around the time copyright was invented. Not one before it was established. Thank you for setting the record straight!

"It's not an "outdated idea". If you really think that copyrights and patents are a bad idea, you need only look at countries where they did not exist -- like Russia during its peak of Socialist power -- to see how that works out economically. Hint: it doesn't. "

A wild thought. It is *JUST* possible that some other aspect of communism, besides the lack of copyright, was responsible for the slow rate of development in the Soviet Union.

Your reasoning would lead us to conclude that the high rate of alcoholism and consumption of vodka was caused by the lack of copyrights... and that Mr Gorbachev need only turn back the clock, implement a proper copyright system, and become a Hero of the Soviet Union by saving them all from economic perdition.

Sorry, Jane. You are wrong. Not your fault, really, since you are just repeating what you've been told many times over the years. But it is wrong.

Your appeal to socialist Russia as an example of why copyrights are needed is laughable. The Soviet system failed mainly for many reasons, but lack of copyright recognition wasn't one of them.

The fact is that the idea that copyrights and patents are a benefit to society is based on no evidence at all. It is one of those ideas that were accepted for a long time without being examined in any detail. There are many examples of places and times where copyrights and patents did not exist and innovation there certainly was not harmed, and careful studies have found pretty good evidence that innovation was helped by their absence. The site techdirt.com discusses these issues regularly, frequently pointing to academic research in peer reviewed journals so you can check out the research yourself if you don't trust the reporting. But you don't have to depend only on research. There are examples today where industries thrive in the absence of copyrights or patents. One easy to understand example is the fashion business. No copyrights or patents. Lots of copying, yet good money is being made by many players. And there's certainly no lack innovation there.

It certainly is true that eliminating copyrights and patents would be kind of disruptive, since some large businesses have come to depend on the artifical monopoly authorized by copyright and patents. But there is reason to believe that we would be better off now had they never been invented. Whether the disruption that would come from eliminating them is reason enough to keep them is questionable. I am pretty sure we would quickly adjust to their absence if they were abolished tomorrow, and be better off, on the whole. Of course, that is unlikely to happen because our government is bought and paid for by the businesses that depend on copyrights and patents, so we will have to suffer the burden of these government-granted monopolies for some considerable time to come.

This code was the last big unknown in this long sorry saga. Even if SCO owned the copyrights, (and hadn't distributed it under the GPL, and hadn't signed the UnitedLinux agreement, etc.) it is now crystal clear that SCO's Microsoft-funded anti-Linux campaign was based on a stack of frivolous law suits.

I think Darl's brother is scrambling to cover his backside so that when the disbarments and criminal charges come down, he has a chance to escape.

Groklaw (of course) has IBM's response [groklaw.net] to SCO's claims that these paltry examples are worth BILLIONS of dollars in copyright damages. None of the code they offered is protectable under copyright law. Some of it is BSD code that everyone is free to use however they want (if they include the copyright notice). A lot of it is header files that were not copy-and-pasted which are nearly impossible to protect under copyright law. Then they have some snippets of generic code. Given the size of the source code for Linux, it would be astounding if there weren't some similar snippets. The idea that this is proof that Linux violated any Unix copyrights is totally absurd. The idea that these generic snippets are what made Linux enterprise-ready is beyond insane.

The recent SCO v. Novell case decided that SCO never even owned the copyrights it was suing about. And then instead of the millions of lines of code they claimed were infringing, they presented this meager collection of totally unprotectable snippets. I sure hope SCO's lawyers get severely punished for perpetrating this fraud on the court for the past seven years.

In Gates v Bando [groklaw.net] the Tenth Circuit established the abstraction-filtration-comparison test that would become the standard in software copyright infringement. Specifically in the filtration step, all elements which are not protected by copyright must be removed from consideration. In this case, most of the code falls under scenes a faire: "expressions that are standard, stock, or common to a particular topic or that necessarily follow from a common theme or setting ..these external factors may include: hardware standards and mechanical specifications." Most of the code were simply declarations needed for compatibility and cannot be copyrighted.

"until they lost the battle by losing ownership of UNIX"
Those are all the words you needed to read. You cannot loose ownership of that which you never did own. SCOs gambit was to gain ownership by bamboozling everyone. You know when one party is blowing smoke in these issues when they refuse to point to the infringing code outside of court. If the quote included is to be believed, they lost on appeal and now that they have filed yet another appeal they are suddenly going to show us all the holy grail of "infringing" code. In each case where they've brought up "infringing" code, it was either released by themselves, or was code they didn't own in the first place.

Just because Linux and Unix have some of the same lines of code, does NOT mean that linux copied the code from unix.The code could have come from BSD for example and in fact there are several instances where linux and Unix share (or shared) the same BSD code.

The code could also have come from implementing the Posix Standard. The PDF linked to seems to be an implementation of errno.h which I believe is part of the POSIX standard.So again just because the code appears in Unix, does NOT mean that Unix had copyright ownership of that code.

To prove its case SCO would have had to prove that:a) Linux had lines of code that were substantially similar to Unix. (some minor examples provided but even that was not definitive)In fact the judge who supervised the discovery kept asking for details and at the end of the multi year discovery process, said, "Is this all you've got?"

b) Unix had copyrights to the code in question (again not proven)

c) SCO owned the Unix copyrights (again not proven)

d) SCO never granted the rights to use that code in any way. In fact Caldera (aka SCO) distributed a version of Linux under the GPL which in effect granted GPL license to any of their code that happened to be in Linux.

So even if all of a, b, and c were true,they STILL did not have a case for infringement.I almost wish that SCO had owned the UNIX copyrights, because then this whole issue would have been resolved by now, instead of relying on Novell.

The courts have established that in order to determine software copyright infringement (for non-literal copying, which is what we have here but filtration is required even for literal copying), one must perform what is called the
Abstraction, Filtration, Comparison Test [ladas.com]. In court documents related to the code in question, SCO admitted the did not perform this test on this code. They claimed that that was IBM's job. The article linked to above explains the test:

1. break down the plaintiff’s program into its constituent structural parts (“abstraction”);

2. examine each part for incorporated “ideas,” elements taken from the public domain, methods of operation, processes or procedures, or otherwise unprotected material (“filtration”); and

3. compare the remaining kernel of creative expression, if any, to the work alleged to infringe at each level of abstraction (“comparison”).

They further explain:

The scenes à faire doctrine is often applied in software cases because it is frequently impossible to write a program in a particular computing environment without employing certain standard programming techniques and design elements. This is because certain functions, data elements, and the order of operation of a program can be dictated by such things as the type of computer on which the program will run, the programming language used, the operating system environment, governmental requirements, industry demands and standards, and widely accepted programming practices.

I suspect the reason SCO didn't filter this code is because if they did, there would be nothing at all left to present to the court as their fig leaf to avoid being charged with perpetrating a fraud on the court.

I spent several years as a Unix kernel hacker, working extensively with AT&T source code. I also went to law school and was one bad case of writer's block away from becoming a copyright lawyer. Thus I found those code snippets quite interesting, both from my Unix kernel hacker persepective and my almost-became-a-copyright-laywer perspective.

My conclusion, from the half dozen or so of his samples that I looked at? They show nothing remotely resembling copyright violation.

Copyright covers expression, not ideas.
What that means when dealing with functional works, such as computer programs, is that things that anyone implementing that functionality will have to do are unlikely to be covered by copyright.

All of the functions I saw that were allegedly copied were very simple functions.
All they did was check arguments to make sure they were legal, return the expected error code if not, or return some very simply value otherwise.

Even if the corresponding functions in Linux were exact matches to the SCO code, it would probably not be enough to support an inference of copying, because there just aren't a lot of ways to reasonably express such simple functions. And they were not exact matches. One would check for a null pointer by comparing to NULL, one would use if(!p), for instance.

The header files are more similar, so copying is more believable there.
The problem with SCO's case there is that the elements in the header files I looked at are entirely dictated by compatibility requirements. There's no copyrightable expression in them.

To summarize, SCO's claims appear to fall into two groups. First, things where the implementation is so simple that it is not possible to infer copying from similarity since the similarity is imposed by the nature of the function. Second, things where there may have been copying--of things that aren't protected by copyright.

Header files are public - but they seldom contains any advanced functionality. They are just a definition of the calls available, defined data types and constants.

If the header files didn't contain the same (or very similar) definitions then the API wouldn't work. I expect the same header definitions to reappear in many other operating systems with minor differences - many even in Windows (Which do have a Posix API)

But of course - a lawyer wouldn't understand that, it's just a question of money.

They both had a common parent which was public with those structures. Remember these were public APIs, no one is disputing that the structures were similar. The question is whether the implementations were copyright violations.

That's true, but in the push to get UNIX into the commercial space the SysV interfaces were released as an open specification. This was actually covered during the trial.

The fact of the matter was that the Linux folk didn't copy code, something that would have been obvious to any observer following it's development. The idea that there were vast amounts of stolen code was ludicrous if you knew anything at all about the internal structure of the two operating systems.

There was always the possibility of code that got injected during the large commercial code donations by e.g. IBM or SGI, and in fact the only piece of code that showed actual derivation came from SGI... But it turned out to be both a very small amount of code and buggy to boot. As soon as people got a look at it they excised it in favor of working, original, code.

I personally expected it to go more the way of the AT&T veresus BSD case, where it turned out that AT&T had stolen tons of code from BSD, not the other way around. The Linux emulation layer in SCO UNIX seemed a particularly likely candidate. Either that turned out not to be the case or IBM simply didn't push the issue (perhaps because SCO was having so much trouble proving anything in their claims) though.

SCO's strategy always seemed to me to be a shakedown, scare companies into license agreements. Why they went after one of the deepsest pockets first is beyond me, IBM was very likely to fight given their investment, but it was clear early on that management was not very competent.

"it was certainly a matter of one window with unix source to read from, the other window with linux source to write"Hell no. That's a bullshit accusation.

Here we likely have a programmer who has never seen the SCO code (was it even released publicly back then?) but knows that applications can call functionY from libraryX from the interface documentation. If he creates a new libraryX containing a compatible functionY he isn't violating copyright. It is established that creating code with a compatible API interface isn't a copyright violation unless you actually do copy the code.Indeed it appears this is exactly what is happening in the linked elflib files. The filenames are the same because they have to be (note both the redhat_libelf.h and SCO_libelf.h are actually referred to by just libelf.h in code and elflib.h is what applications need to link to) and the functions are the same because they have to be or else applications will be calling functions that don't exist.

This also happens with the WINE project. People there are creating the same header filenames and inside those headers the exact same function names. They aren't working off a split screen setup and blatantly copying Microsofts closed source code. They are merely recreating the functions that applications can call in Windows.

A similar thing happened years ago with Compaqs IBM compatible BIOS. Compaq re-created the IBM BIOS to get around having to buy the IBM BIOS. Both BIOSes responded to the same application calls in the same way with the same return values and messages. Despite this Compaq didn't break copyright because they still re-wrote the code. In fact Compaq hired engineers specifically for the fact they had never seen IBM BIOS code. This way it could be easily established that they didn't copy the code but instead just wrote the same code that performed the same task.

And the idea that this key book to early '80s PC tech (still worryingly relevant today!) was somehow missing from all the bookshelves reachable by the Compaq BIOS writing department is just silly.

You don't know what you're talking about. I was there at the time: Compaq had administrative staff remove the BIOS listings from all IBM tech ref manuals before they were given to the engineers. (This was especially easy to do because they came in the form of ring binders.)

At one point, since I didn't work on writing BIOS code, I was assigned to be the one designated guy who could disassemble the IBM BIOS for a certain model. When the BIOS developers got stumped by a compatibility problem, they could send me a question, and I was allowed to poke around in the IBM ROM and then give a "Magic 8 Ball" type vague answer.

Here's a bit of trivia: A few PC applications wouldn't work unless the ID string "IBM" appeared at a certain address within the BIOS code. Compaq developers worked out a way to make those bytes at that address appear in part of an actual executable code sequence instead.

Reading the man pages for (random example) memcpy tells you how to define it. It also describes what it does. Tell me how you'd re-implement this function without your header definition being identical to SCO's (and yes, the definition is included verbatim right there):

NAME
memcpy - copy memory area

SYNOPSIS
#include <string.h>

void *memcpy(void *dest, const void *src, size_t n);

DESCRIPTION
The memcpy() function copies n bytes from memory area src to memory area
dest. The memory areas should not overlap. Use memmove(3) if the memory
areas do overlap.

Since we have to rely on this PDF , and there is no way to prove that that code was actually in UNIX at the moment ( as it's closed , there's no way to check ) .

I am left to conclude that it's much easier to copy from Linux (which is open ) , than from UNIX ( which is closed ).In other words , using this PDF , i could also claim to SCO stole from Linux , and then implemented it in their commercial product , thus violating the GPL .

The pdf linked in the document is a snippet for what looks like a struct for the elf API interface. This specification is open and judging by the code they are using it exactly as intended.

I'm going to guess the majority of their findings are specifically computer generated. They may have known first hand what the code was or even where it came from. However, if pressed to say how they discovered these violations I'm quite sure they would fall back on "the program made the mistake your honor." This would generate a plausible stance when the foundation began to crumble.

Going further on a limb I'm also guessing this is why they would never release any of the alleged violations. In days a website similar to groklaw would be up in for everyone to review, identify and mark the source of the "violation." ie, this is a struct for the elf library specification or this is a header of a BSD library. (Remember that BSD ancestry is likely still there in large chunks)

All of this happening in the court room and they had to know there were big holes in the allegations. Even a cursory glance reveals that some of the crap submitted is just that. This was a court room poker face with a huge bluff that many parties would just settle. I suppose it worked because too many people rolled over and handed out free cash.

Going further on a limb I'm also guessing this is why they would never release any of the alleged violations. In days a website similar to groklaw would be up in for everyone to review, identify and mark the source of the "violation." ie, this is a struct for the elf library specification or this is a header of a BSD library. (Remember that BSD ancestry is likely still there in large chunks)

If this paragraph is indicative of your general level of knowledge about the history of Unix, you might want to look into ELF and how it got into Unix. I can tell you this however: it didn't came from Berkeley. The various BSD derivatives got it from SysV long after the fact.

Also you might want to spend about ten more seconds checking out the PDFs at the linked site, not just the one linked by the knucklehead kdawson. It's not simply header files. There is actual code. The similarities in code flow, layout, variable names, filenames, etc. are conclusive. The linux contributor didn't implement this code clean room-style based on the specification, plainly having used the Unix implementation as the source.

The PDFs are machine generated. I ran the same program using a copy of Moby Dick and the Microsoft EULA, it produced damning evidence that the programmers had copied them word for word into their source code.

I did look at the code on the linked site and what I saw looked entirely like a clean room implementation to me. The similarities were superficial and much less impressive than similarities I have seen in code that I know was written independently (because I wrote it). Two programmers working on the same problem can easily come up with strikingly similar looking solutions which a non-programmer (or an inexperienced programmer) would never believe was independent. I was astounded at how pathetic the supposed similarities were.

Since we have to rely on this PDF , and there is no way to prove that that code was actually in UNIX at the moment ( as it's closed , there's no way to check ).

I am left to conclude that it's much easier to copy from Linux (which is open ) , than from UNIX ( which is closed ).
In other words , using this PDF , i could also claim to SCO stole from Linux , and then implemented it in their commercial product , thus violating the GPL.

While I disagree with the SCO case, your assertion that its impossible to check is wrong - UNIX may be 'closed' but that doesn't mean that the code base is impossible to get hold of. One of the licenses you could buy was full source code access, and many companies and academic institutes took this license (I for example have seen the code base to AIX and HPUX several times at different places).

'Closed' doesn't always mean 'you can never have the source code', it can just mean 'you can't do what you want with the source code'. UNIX is one of these cases.

The difference is that there are only a handful of solutions for the same problem when it comes to computer code and still make sense.

On the other hand there are tons of ways of conveying a simple fact in English, consider the statement that "George Washington was the first president of the US" the same statement might read that The first president of the US was George Washington. George Washington was elected as the first president of the US. The first president in office in the US was George Washingto

Now I'm not a system programmer, so I may be completely off base, but that looks like a system call for supporting ELF binaries. Meaning it's an implementation of a public standard. Meaning there's a very limited number of correct ways of doing this. Possibly only one correct way of doing this.Every file I've randomly selected has been like this; either references to the ELF format (which is a public standard), or ABI type stuff (which is also a public standard). That's what was rumored to be SCO's "copying

You're not a programmer are you.
The header files are pretty much just a bunch of definitions. there is no programming to speak of in either of those files. From reading the Posix standards you will end up with the same code but with your own comments.
The header files are a lot like the ingredients section of a recipe.
So it's like looking at two recipes for omelets then complaining that both have eggs listed in the ingredients section.

"ELF_C_..." - each of these is the name of a type in C. I don't see how this is even a bit creative. I had a very similar enum in a program I wrote, except with data types from a 3D engine. My guess is that ELF_C means it would be the ELF binary format's C data type. Nothing to see here.

"ELF_K/ELF_T" - It says in the open source one that these are descriptors as well. More or less the same; universal concepts if you're going to be programming a C compiler. I bet you can find an enum just

There's no coincidence involved. If you write applications that use the ELF_Type enumeration and you decided to write a new elf library to support that app you'd end up having the same enumeration names to maintain compatibility.

Copyright allows you to recreate something that's compatible as long as it isn't copied directly.

Ostriches aren't Australian, they're African. Omelettes can be made from emu eggs and I have tasted one. It really wasn't any different to one prepared from hen's eggs. It looked no different to this observer. Compare an emu egg to a hen's egg and they are quite different in size, colour and even texture internally and externally. The formula (recipe) however was just for a standard omelette that we would all recognise by sight instantly. Interestingly, it tasted like one prepared from hen's eggs as well. Couldn't tell the finished product apart.

Posix header files also look remarkably similar to this observer. If code is being written to a required formula so that it interacts correctly with other code (a standard) then there should be little surprise that it looks the same.

It's a header file for a standardized interface. All this stuff needs to be the same for any *NIX-like operating system to be *NIX-like, otherwise, you're making an incompatible operating system. To make source-compatible operating systems you need to have common interfaces, and those interfaces lie in the header files. Saying that this is copyright infringement is like saying that they patented a hole in the wall as a way of getting in and out of a room.

In case you are not trolling, virtually all of the allegedly copied code is boilerplate stuff defining types and structs or function interfaces. These have to be the same for Linux to be posix compatible. The little actual code there is, it isn't similar at all. Copyright can't keep you from writing a function that acts like another, that is for software patents, there should be actual copying and for such tiny functions it would be pretty hard to demonstrate (!s) was copied from (s==NULL).

I still think the jury, knowing nothing about computers, would have ruled against Linux, but the claims were ridiculous.

No one cares whether it's a public spec or not (it may be, I do not know). It's a OS header file. The functions have to be named the same way in order for end-user programs to be source-compatible. If you've been programming for 20 years, this shouldn't be too hard to grasp.

If SCO were allowed to claim copyright over this, then it would be simply impossible for Linux to provide a compatible libelf. This means you'd essentially prevent anyone from ever making compatible OS libraries, as they'd be infringing on the original author's copyright. That would be ridiculous. Public function names and prototypes (documented or not, standardized or not) are not considered copyrightable.

Another example: is Wine, according to you, a humongous violation of Microsoft's copyrights? After all, it implements the Windows API with identical function names and prototypes, undocumented features and all (which is nowhere near a published standard of any sort).

Interface names needed for interoperability are fair game. Interfaces once published are public domain. The specific document is not.

Software is copyrightable but it is not a work of fiction where more protection is given. Writing a book on physics protects the specific wording in that book but it doesn't keep others from using the word "relativity" with the same meaning or your specific name for the speed of light -c- to explain the same exact phenomenon your original book described.

Of course it looks rearranged. It's a header file. Some of the ELF constants come straight from the ELF spec. The #ifndef stuff is bog standard code, there are a finite number of ways of writing that and the one presented happens to be the most common. The #include is another "duh" - of course you have to #include the right header, that doesn't mean it's copied. The header file is presumably deliberately compatible with the original, hence the function definitions are prototype-compatible (while being considerably different in style).

There is nothing indicative of code copying in that PDF. The Linux header is just about as different as it can be while remaining source-compatible, as it should be.

Of course it looks rearranged. It's a header file. Some of the ELF constants come straight from the ELF spec. The #ifndef stuff is bog standard code, there are a finite number of ways of writing that and the one presented happens to be the most common. The #include is another "duh" - of course you have to #include the right header, that doesn't mean it's copied. The header file is presumably deliberately compatible with the original, hence the function definitions are prototype-compatible (while being considerably different in style).

There is nothing indicative of code copying in that PDF. The Linux header is just about as different as it can be while remaining source-compatible, as it should be.

Like you said, any author wishing to build an ELF-capable system would almost have to have that exact same code. There are only so many ways to build an enum or struct following the exact TIS specifications, and there is no virtue in paraphrasing C code.

Much of the rest of the code is libc and POSIX prototypes (and more headers), all of which are covered in the System V ABI [freestandards.org] specification. Anybody wishing to build a POSIX [wikipedia.org]-compatible system would have to define those prototypes.

Several of the function implementations with similarities are very basic functions. Most of the similarities are in the constant names (rather than the specific implementation of those simple functions), and the constant names are defined by... the TIS spec. The remainder is a no-brainer. See, for example, Tab 422 [mcbride-law.com]. This is a simple accessor method. There are only so many ways to retrieve a value from a structure...

Computer programs pose unique problems for the application of the
"idea/expression distinction" that determines the extent of copyright
protection. To the extent that there are many possible ways of
accomplishing a given task or fulfilling a particular market demand,
the programmer's choice of program structure and design may be highly
creative and idiosyncratic. However, computer programs are, in essence,
utilitarian articles -- articles that accomplish tasks. As such, they
contain many logical, structural, and visual display elements that are
dictated by external factors such as compatibility requirements
and industry demands... In some circumstances, even the exact set of
commands used by the programmer is deemed functional rather than
creative for the purposes of copyright. When specific instructions,
even though previously copyrighted, are the only and essential means
of accomplishing a given task, their later use by another will not
amount to infringement.

It is nearly impossible to win a copyright suit over a header file. The only chance you would have would be if it was a straight copy-and-paste which this was clearly not. The reason for this is that there is just not much room for creative expression in header files. Likewise, you can't copyright a word or a short sentence.

There are a very limited number of ways to declare functions. If someone was allowed to copyright certain function declarations then they would have control over a large segment of the software industry. Likewise, if someone was allowed to copyright particular words, they would have control over a segment of the publishing industry.

Likewise, if someone was allowed to copyright particular words, they would have control over a segment of the publishing industry.

If the RIAA can claim copyright infringement because someone sung a copyrighted song in public, one could claim copyright infringement if those copyrighted words were spoken in public. You'd have control over a lot more than just the publishing industry.

To drive that point home, take a look at BSD kernel source code, libs, headers, etc etc.

There was this litigation, long ago, University of California Regents, if I recall.... and it caused this schism.... and more free code than had probably ever been produced with C before Torvalds and RMS came along.

Nah it's a joke. The only thing highlighted were the function / subroutine definitions and not even across the board. Just because 2 programs have hooks or functions called "ReadX" does not mean there was any copying involved. They even highlighted include statements and data structures. It's almost like suing an author for starting with "Once upon a time". I guess they just figured no judge would be able to clue in on this kind of stuff and they would be able to sift through junk like this for decades. Hell they could use the same reasoning against pretty much any software and win if that is all the proof he needed. I guess Unix was truly the precursor to all the code ever written so Novell can truly say "all your base are belong to us".

No, read the POSIX interface standard (or in this case specifically the ELF executable standard).You have to give your functions certain names to be compliant to the specification. The code shown is interface code, the implementation is somewhere else. Interface code simply names the functions, parameters and variables. As the functions must have certain names and parameters to fit the standard you will get the exact same line that declares a function. Any C programmer could recreate that same block of code with just a list of functions names and parameters that must be declared.

eg. If you have to have a global function called elf_version with return of unsigned int and parameter of the version you'll get the lineextern unsigned in elf_version( unsigned int __version );

We see that same line of code in both files as they both implement the same specification. I'm sure there's a ton of other UNIXes out there that have the same line of code.

Dude, you ruined it. As written, you have a brilliant joke in C: the premise is bad (we're beating a dead horse), the plan is questionable (we're directly comparing a pointer and a string literal), and the execution is sloppy (thanks to the typo, we're testing the result of a non-zero assignment, so the horse is beat forever regardless of liveliness).