Re: CRAN policies

>>>>> William Dunlap <wdunlap <at> tibco.com>
>>>>> on Fri, 30 Mar 2012 16:07:52 +0000 writes:
> It looks like you define a few functions that use substitute() or sys.call()
> or similar functions to look at the unevaluated argument list. E.g.,
> "cq" <-
> function( ...) {
> # Saves putting in quotes!
> # E.G.: cq( first, second, third) is the same as c( 'first', 'second', 'third')
> # wrapping by as.character means cq() returns character(0) not list()
> as.character( sapply( as.list( match.call( expand.dots=TRUE))[-1], as.character))
> }
> %such.that% and %SUCH.THAT% do similar things.
> Almost all the complaints from check involve calls to a
> handful of such functions. If you could tell
> codetools:::checkUsage that these functions did
> nonstandard evaluation on all or some of their arguments
> then the complaints would go away and other checks for
> real errors like misspellings would still be done.
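A minimal R sketch of the false positive described above (not from the original thread). It assumes the codetools package, which ships with R as a recommended package; the caller use_cq() and the symbols first and second are hypothetical, and the exact wording of the checkUsage() report may vary between codetools versions.

```r
library(codetools)  # provides checkUsage(), used by R CMD check's code-usage checks

# Quoting helper from the message above: turns unevaluated symbols into strings.
cq <- function(...) {
  # match.call() returns the call unevaluated; [-1] drops the function name,
  # and each remaining argument is converted to character.
  # The outer as.character() makes cq() return character(0), not list().
  as.character(sapply(as.list(match.call(expand.dots = TRUE))[-1], as.character))
}

# Hypothetical caller: 'first' and 'second' are never defined as variables.
use_cq <- function() cq(first, second)

print(use_cq())  # "first" "second"

# Static analysis cannot know that cq() never evaluates its arguments, so it
# is expected to report 'first' and 'second' as undefined global variables.
checkUsage(use_cq)
```

Because codetools inspects code without running it, it only sees bare symbols passed to cq() and cannot tell that they are never evaluated; this is why such notes are false positives rather than real errors.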
I agree very much with you, Bill.
Many (if not the majority) of my packages have given these false
positive notes for many months now... and I have to admit that
the effect indeed has been that I take notes much less seriously
nowadays. This of course has never been the intention.
I'm pretty sure that most of us agree that it would be very
useful if not desirable to have a simple and robust way for

R datasets ownership(copyright) and license

Dear R Developers,
The recently filed (and dismissed ;) ) lawsuit by Astrolabe against the tz
database developers caused a lot of media coverage and discussion and
created some kind of precedent in the USA [3]. But IMHO it also showed
that similar attacks might happen in the future, possibly against
data sets which are not so obviously "factual" and thus might, after all,
fall under copyright or IP protection, if not in the States then in
some other jurisdictions.
Since the 'data copyright/license' question comes up over and over again, I just
wanted to ask on the basis of what policies or advisories the datasets shipped
with R were selected. From a very brief look at the
datasets, many of them appear to be factual data and thus, at least at the
moment, are probably not copyrightable in the States. But is there any
guarantee that they are not protected by copyright elsewhere if they
originate abroad? And some seem to come from published works (still)
under copyright with "All rights reserved", e.g. the datasets Harman23
and Harman74 [4].
Although questions similar to mine have been raised before [e.g. 1,2], I
have not found a straight answer, e.g. one of the options below or a mix
of them:
1. we simply did not look into it and adopted them with the idea that if
someone complains, we remove the corresponding pieces
2. we considered all datasets factual data thus not copyrightable (in
USA? around the globe?)

Re: R datasets ownership(copyright) and license

Yaroslav,
coming from an experimental field, I use options 4 and 4a:
4. I measure the data myself, so I am the copyright holder.
4a. I publish data sets that were given to me for publication by the
person(s) who did the measurements. This is properly annotated in the
authors field.
So far, the data sets I put as example data into packages are small
subsets of real studies or data collected in pre-tests, so they are not
that sensitive/valuable. I plan to publish at least one "real" data set
(as its own package) eventually, but we're not there yet.
Claudia
On 03.04.2012 00:06, Yaroslav Halchenko wrote:
> Dear R Developers,
>
> The recently filed (and dismissed ;) ) lawsuit by Astrolabe against the tz
> database developers caused a lot of media coverage and discussion and
> created some kind of precedent in the USA [3]. But IMHO it also showed
> that similar attacks might happen in the future, possibly against
> data sets which are not so obviously "factual" and thus might, after all,
> fall under copyright or IP protection, if not in the States then in
> some other jurisdictions.
>
> Since the 'data copyright/license' question comes up over and over again, I just
> wanted to ask on the basis of what policies or advisories the datasets shipped
> with R were selected. From a very brief look at the

Re: R datasets ownership(copyright) and license

Hadley Wickham <hadley <at> rice.edu>
2012-04-03 21:00:58 GMT

> 2. we considered all datasets factual data thus not copyrightable (in
> USA? around the globe?)
This is definitely true in the US, but not true globally. I have no
idea under which jurisdiction a lawsuit would apply.
Hadley
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Re: R datasets ownership(copyright) and license

On 4/3/2012 2:00 PM, Hadley Wickham wrote:
>> 2. we considered all datasets factual data thus not copyrightable (in
>> USA? around the globe?)
> This is definitely true in the US, but not true globally. I have no
> idea under which jurisdiction a lawsuit would apply.
I'd be careful with the word "definitely". The major media
conglomerates and their industry associations have successfully
destroyed competition to their hegemony in many areas. For example,
they sued college students for close to $100 billion because the students'
improvements to search engines made it easier for people on a university
intranet to find copyrighted music that others had placed in their "public"
folders. They successfully sued lawyers who had advised MP3.com that it had
reasonable grounds to believe that what it did would be legal, and venture
capitalists who funded Napster. In each case, they won not on the law
but because they had larger budgets for lawyers. See Lessig
(2004), Free Culture [book available from Amazon and also free under
a Creative Commons license; see Wikipedia, "Free Culture (book)",
http://en.wikipedia.org/wiki/Free_Culture_(book)].
Spencer Graves
>
> Hadley

--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.

Re: R datasets ownership(copyright) and license

Ted Byers <r.ted.byers <at> gmail.com>
2012-04-03 22:03:31 GMT

> -----Original Message-----
> From: r-devel-bounces <at> r-project.org [mailto:r-devel-bounces <at> r-project.org]
> On Behalf Of Hadley Wickham
> Sent: April-03-12 5:01 PM
> To: r-devel <at> r-project.org; pystatsmodels <at> googlegroups.com; Dirk Eddelbuettel
> Subject: Re: [Rd] R datasets ownership(copyright) and license
>
> > 2. we considered all datasets factual data thus not copyrightable (in
> > USA? around the globe?)
>
> This is definitely true in the US, but not true globally. I have no idea
> under which jurisdiction a lawsuit would apply.
>
> Hadley
Why worry about jurisdictions in which you neither work nor live?
I would expect such rationality (factual data not being copyrightable) in
the US, Canada, Europe and Australasia, so they're not likely an issue. And
I doubt any such country would try to impose its laws on someone living
and working elsewhere. In many parts of Asia, where I have lived and worked
at least, copyright violation is rampant, and the perpetrators face no real
consequences; at least none I could see.
As for banana republics, such as many countries in the Muslim world, like
Iran, I really don't care what their laws have to say. They do have a
history of trying to impose their lunacy on the rest of the world (as
illustrated in the death threats from Muslim religious authorities against

Re: R datasets ownership(copyright) and license

I somewhat agree with Spencer. As I have mentioned, the recent tz database
precedent shows that such claims would not be dismissed as groundless right
away, and things could easily go all the way to court, which might be a
really costly endeavor regardless of who is right or wrong. Proving that
data are factual, and not fictional/creative/original, might be another
challenge in quite a few cases, I bet.
While searching for more information, I found a very nice (although a
bit dated) summary, IMHO: http://www.bitlaw.com/copyright/database.html, which
summarizes the situation outside the USA nicely:
"sui generis right that prohibits the extraction or reutilization of any
database in which there has been a substantial investment in either obtaining,
verification, or presentation of the data contents. Under this second right,
there is no requirement for creativity or originality."
So I would be especially careful with data from the EU.
On the other hand, the link above clarifies for me that it is OK to claim
copyright (e.g. as is done in R) on a collection of factual, unprotected data
(I am still unsure whether that is the case with the R datasets).
On Tue, 03 Apr 2012, Spencer Graves wrote:
> On 4/3/2012 2:00 PM, Hadley Wickham wrote:
> >>2. we considered all datasets factual data thus not copyrightable (in
> >> USA? around the globe?)
> >This is definitely true in the US, but not true globally. I have no
> >idea under which jurisdiction a lawsuit would apply.
> I'd be careful with the word "definitely". The major media

Re: R datasets ownership(copyright) and license

Hadley Wickham <hadley <at> rice.edu>
2012-04-03 22:36:30 GMT

> I would expect such rationality (factual data not being copyrightable) in
> the US, Canada, Europe and Australasia, so they're not likely an issue. And
> I doubt any such country would try to impose its laws on someone living
> and working elsewhere. In many parts of Asia, where I have lived and worked
> at least, copyright violation is rampant, and the perpetrators face no real
> consequences; at least none I could see.
My understanding is that this rationality does not apply in Europe; see,
e.g., http://en.wikipedia.org/wiki/Database_Directive.
Hadley

Re: R datasets ownership(copyright) and license

Hadley Wickham <hadley <at> rice.edu>
2012-04-03 22:37:40 GMT

> I somewhat agree with Spencer. As I have mentioned, the recent tz database
> precedent shows that such claims would not be dismissed as groundless right
> away, and things could easily go all the way to court, which might be a
> really costly endeavor regardless of who is right or wrong. Proving that
> data are factual, and not fictional/creative/original, might be another
> challenge in quite a few cases, I bet.
I think it's generally easy to tell if something is a fact or not, and
I doubt any of the datasets in R are fictional.
Hadley