Hi. As I wrote earlier, I will be away until the end of the week. Here are a few thoughts on the topic of what Wikisource includes, or should or shouldn't include. It seems to me that for the most part it won't be hard to agree on a pretty wide consensus, and once we do things will be clearer.

To understand how we got here, remember that this project started at ps.wikipedia.org. In other words, "Project Sourceberg" was initially thought of as a repository not exactly of "source texts" but of "sources" for wikipedia articles. If someone read an article on Mark Twain that mentions something from chapter ten of Huckleberry Finn, he could click on a link to the actual chapter and find it at Project Sourceberg.

The same thing was true of math. Someone could read an article on pi at Wikipedia, and "pi to 10000 places" would be stored for reference at ps.wikipedia.

It seems to me that this has changed. While our texts here at the "modern" Wikisource can certainly still serve Wikipedia and the other projects, we are no longer just a source of data for them. We are trying to build an independent library, and we have to decide what such a library should (or should not) include.

That is why I think the initial question of whether "data" like math, cryptography, etc. should be stored here is a legitimate one. I'm not against deleting it. I'm also not for deleting it. I'm simply not sure where we should draw the lines of our library. I just think we should all collectively think about it some more.

One thing I am against is unilaterally deleting something that did have a home here without first figuring out where it does best belong. I think we would all agree that lists of data should have a home somewhere, within a Wikimedia that supports giving "every single person... free access to the sum of all human knowledge". The question becomes stronger when it is recalled that until now, that place was here, and was agreed upon from the very beginning of the project. So before deleting, we should first figure out what that home should be.

It seems to me that data like this is arguably appropriate for any one of three projects: Wikipedia, which could legitimately include data related to its articles; Wikibooks, arguing that data is supporting material for a course of study in the subject at hand; Wikisource, arguing that in a library one would expect to be able to look up various kinds of data. I personally lean towards the last option, but in principle they are all OK. The main thing is - we have to decide which one, and it has to be a clear decision by whichever project saying: Yes, we agree in principle that this type of material belongs on our website, whichever one it is.

I also think we should get feedback from other Wikisource languages about this.

A separate argument on principle might be that collections of data and computer source code are not "source texts" and that the latter are actually "original." This also requires some history.

There were two reasons why Project Sourceberg was defined as "source texts":

So that people wouldn't write their own novels and post them here.

In order to differentiate Wikisource from Wikibooks. The basis of the differentiation was that Wikibooks are "wiki-creations" - collaborations just like Wikipedia articles - while Wikisource stuff is unchanging texts that were produced at a certain point in time by one or more authors.

I propose that neither of these criteria applies here, and that furthermore we would do better by starting to define ourselves more as a "Library" like our new slogan says, and less as "a collection of source texts."

The first argument obviously doesn't apply to math data and computer code. Even computer code, while somewhat original, is just trying to supply examples of algorithms mentioned in Wikipedia articles. While a personal creation, it is not the same as writing your own novel.

The second argument also doesn't apply. The truth is that textbooks are part of a library, and so the whole of Wikibooks could be included within Wikisource. This won't happen in reality because the nature of the two things is so different: Writing your own textbooks from scratch in collaboration with others is an entirely different experience than organizing other people's books into a library. That's why I think these projects will always remain separate, even though the distinction between them is somewhat artificial. Nevertheless, it seems to me that the "data and source code" experience fits better into the Wikisource experience (archiving, classifying, editing) than the Wikibooks experience (creating instructional texts from scratch).

It seems to me that all the data and related stuff can legitimately be kept here or moved to a different project (if that project agrees). When all is said and done it doesn't really matter which project it is kept on, it just isn't that important, but it also shouldn't just be lost.

Regardless of whether these texts are kept here or put elsewhere, we really should start thinking of ourselves more in terms of "Free Library" and less in terms of technical "source texts." This includes:

How to add value through the wiki system to the texts that we host. (Doing so will also mute the frequently and recently asked question: "Why do we need Wikisource if there is already Project Gutenberg?")

Defining criteria for when even an original text is "good enough" to be included here (in the past, for instance, academically certified papers like a master's thesis were uploaded).

Hopefully will be back at the end of the week. Dovi 08:40, 3 October 2005 (UTC)

My problem with source code and math tables is where to draw the line about what we accept and what we don't. For texts, the line is easy. It has to be free and published somewhere else before (except for translations). For math tables and source code, it is very difficult to draw such a line. You can't put copyright on a math table (so essentially any math table is free) and IMO, math tables are useful if they are difficult to calculate (i.e. there is no point storing a table of even numbers), so if they are available somewhere else or easily calculated, there is no use in adding them here. So math tables look more like new material to me, so they should be hosted on Wikibooks or Wikipedia if needed.

You don't read a table of prime numbers or the source code of the Linux kernel like you read a text from Shakespeare. Math tables and source code are not useful in themselves, but as a base for learning, further programming or calculations.

Also I don't believe that Wikimedia should host everything which can be found on the Internet. There are formats that are much more useful for some data than a wiki, and math tables and source code are in this category. Should we host all source code which is free? The Linux kernel? Etc. You see my point. So what is interesting is short programs explaining or demonstrating an algorithm, so that looks more like teaching material, so it belongs on Wikibooks. Yann 18:22, 3 October 2005 (UTC)

Thanks for the feedback, that was interesting.

First of all, a basic question: Has Wikibooks actually agreed to take this stuff? If so, with what conditions? In other words, will they take all pages of this type, or will they say that some meet their needs as "instructional materials" but others don't? (It is for this reason that I personally lean towards the library option, because if they belong in a library then all of them belong in a library!)

Secondly, the following is an alternative to deletion, that I would really like to get feedback on from others. (The idea is borrowed from the deletion reform debate going on at en.wikipedia, but I think it might actually serve our needs here even better than theirs.) The idea is based on the notion that Yann mentioned, namely that reading a math table (if it is read at all) is completely different than reading Shakespeare (even if you could find both of them in a library).

The idea is to remove all data, source code, etc., from the main namespace (better: to remove it from the main bookshelf reserved for "regular" texts) and instead keep it in its own special namespace: "Data:" namespace, "Source code:" namespace, etc. It would not be "counted" with the regular texts of normal books, nor listed amongst them. All such texts would be stored in a strictly separate space that clearly defines them for exactly what they are. You get the advantages of deletion through a process less radical than deletion. The texts are still there for those who want to use them or link to them from other projects, but are clearly demarked as a class unto themselves, and somewhat separate from the main project.

Would love to get feedback on the namespace idea. As to the question of whether everything needs to be on a wiki, I agree with Yann that it is not the most important thing in the world. Nevertheless, many times the value of things being on a wiki only becomes evident later on when projects go in a certain direction. Remember that in the beginning, many people said there was no value to having "source texts" on a wiki! Dovi 09:38, 6 October 2005 (UTC)

Hm, that's a good idea, Dovi. While I still agree with Yann, and would personally love to see it all removed from WS, if we could have a new namespace created for math tables (would we put source code and cryptography under those namespaces or create new ones?), then they would not be counted as articles, and I would be willing not to raise any more issues with their inclusion (as long as they don't raise any problems I'd have with a "regular" text). Also, after some thinking, physical libraries do have a reference section, and these new namespaces would sort of serve that purpose: people reference lists of constants (we'll still need to talk about constants and if we are going to include "Phi/pi/e to the Nth place") and source code, etc.

I'm assuming for this that "Data:" or whichever we decide would actually be a true namespace? If that's the case, can we have "Author:" created as a new namespace as well?—Zhaladshar(Talk) 14:07, 6 October 2005 (UTC)

Are you saying that "Author:" is still only a pseudo-namespace? I didn't know that. If so, we should get that fixed pronto! Let's ask around about how to request namespaces (maybe ThomasV knows, and actually Yann probably knows, too). Things like "Author" and "Title:" should be requested immediately. That's far more important than 5,000 places of pi... :-)

As to the matter at hand, my initial hunch would be that each of the distinct things we have been discussing would get a distinct namespace: "Math:", "Cryptography:", "Source code:" etc. "Data:" for plain old lists of basically anything... Dovi 14:40, 6 October 2005 (UTC)

Yeah, "Author:" is just a pseudo-namespace. Having "Math:", "Cryptography:", etc. is fine with me. Now, we just have to get consensus and go about getting the namespaces created.—Zhaladshar(Talk) 14:59, 6 October 2005 (UTC)

I do not know if there is a specific procedure for requesting a namespace. I suppose you have to ask developers. But before that, I believe someone should install mediawiki on his/her computer, and play with it a little bit (configuring namespaces, etc), in order to get an idea of what it is possible to do with namespaces, and what has to be requested. ThomasV 15:12, 6 October 2005 (UTC)

I know nothing about php, but if no one else does it, I'll give it a try.—Zhaladshar(Talk) 15:44, 6 October 2005 (UTC)

See m:Help:Custom_namespaces. Since constants are language generic, maybe we should try to get these created on the main wikisource namespace instead of here. --CSN 22:09, 6 October 2005 (UTC)

I just stumbled on this discussion (I normally hang out at English Wikipedia) and I must say I am appalled at the notion that a major field of human knowledge is proposed for deletion from Wikisource. To begin with, Wikisource:What_is_Wikisource? plainly states:

" What do we include?

Some things we include are:

1. Source texts previously published by any author

2. Translations of original texts

3. Historical documents of national or international interest

4. Mathematical data, formulas and tables

5. Statistical source data (such as election results)

6. Bibliographies of authors whose works are in Wikisource

7. Source code (for computers) that is in the public domain or compatible with the GFDL

Contributions are not limited to this list, of course."

I believe this statement is a commitment, not just to Wikisource, but to the entire Wikimedia community, for which Wikisource is an important resource. If it is to be materially changed, it should receive community-wide attention and discussion, the same attention that, say, dropping minor languages from Wikipedia would receive.

Also, I do not see the point of creating namespaces for mathematical material. As I understand it, namespaces are already being used for different human languages. Much mathematics source material will have accompanying explanatory text, which would have to be categorized by language. Hopefully at some point important math texts and papers will begin to appear, mathematical writing going back thousands of years in many languages. What is wrong with the category mechanism?

The fact that the mathematics category is sketchily filled in at present is no excuse for deleting what little material has been submitted. On the contrary, it only discourages future contributions. If there is need for further clarification about what material is appropriate for the mathematics category, there should be an effort to involve contributors to the mathematics sections of other Wikipedia projects. From what I have seen, that discussion is premature at the moment. In any case the deletions that have eviscerated (cut-down) this category, should be reversed, pending a widely discussed and accepted policy change. --ArnoldReinhold 21:36, 6 October 2005 (UTC)

First off, about the comments in your first paragraph, this debate's been going on for months now. Only a select few want to take part in coming to a resolution. The rest want to complain about any action or possible action, but not help in the actual discussion. This has been a community-wide discussion coming over from the main Wikisource.

I'm unsure about what you are saying when you say "namespaces are already being used for different human languages." I can say that that is not the case here. The only language to be found on this wiki is English. Note that there is also a difference between a mathematics table and a source. If a mathematics paper or selection from a mathematics journal ends up here, that's great! What the discussion centers around is a page that goes to the 100,000th place of pi or e. Or lists of prime factors. This is only what our discussion centers around.

About creating namespaces, is there any reason why you oppose them? As they are more reference material than articles, they should not be in the main namespace, as they should not be counted as articles Wikisource includes.—Zhaladshar(Talk) 23:15, 6 October 2005 (UTC)

Has any effort been made to involve the mathematics or cryptography communities on Wikipedia? Both have active portals. I suspect you would get a lot more interest. I am relieved to hear that mathematical texts are welcome, but I would note that the entire category of mathematics is currently marked for deletion.

Thanks for correcting me on the (non) use of namespaces for languages. I withdraw that objection. Here is my remaining concern about creation of namespaces for mathematical data. In Wikipedia, at least, namespaces are used exclusively for "behind the scenes" information used in constructing the encyclopedia: users, categories, images, talk, etc. All content sought by end users is in the main namespace. End users who are simply seeking information usually have no reason to know namespaces exist, much less learn their special syntax. I see no justification for adding such a burden to people looking for mathematical formulas or data. A school child should be able to go to wikisource and simply type in "Pi", not "Number:Pi."

I'd be happy to help in coming up with a solution, but I don't see what the problem is. Perhaps you could explain your concern about data tables "counting" as articles. Why shouldn't they? I would note that all 154 Sonnets of Shakespeare count as separate articles. What is the big deal? --ArnoldReinhold 12:59, 7 October 2005 (UTC)

Hi Arnold. First of all, you should be aware that this discussion already began earlier at Wikisource talk:Proposed deletions, so my comments above about the possible use of namespaces (providing the community agrees to it) are somewhat out of context. Actually, at that time I seem to have been nearly the only person to wonder whether this stuff should really be deleted (not opposing, just questioning), and thus the "namespace" proposal was actually an idea for a compromise, not an attempt on my part to get rid of mathematics! As Zhaladshar pointed out, the namespaces don't create any language problems. Nor is there really such a problem with "Number:Pi" - "Pi" can always be a redirect.

From the point of view of those who favour deletion, the advantage of namespaces has to do with the origins of the text - see Yann's comments above. Yann is concerned that Math tables, for instance, are not the original creations of an author which he published in his own personal format in one particular edition, but something that is by definition not copyright-able and has no single printed "edition" that we are basing our text on. Since it is such a different kind of text from Charles Dickens, he wonders whether it should be here at all. A namespace would at least make it absolutely clear, from his point of view, that this is a completely different kind of text.

Zhaladshar, you wrote: "This has been a community-wide discussion coming over from the main Wikisource." I looked a little bit, and I'm embarrassed to say that it seems I was so focused on he: and on the language domains business that I didn't even pay any attention to this, and somehow missed the whole thing. Sorry.

I still think that for the Math/Data/Code question we are discussing here, as well as other questions that are likely to arise in the future, we should be framing the question in terms of: "What should our library include and how should it be classified?" with an attempt to create a library that is as comprehensive a resource as possible. And less in terms of "What exactly is a source text?" though we should still be avoiding things that are entirely wiki-creations with no relationship at all to a real source-text (like somebody's own novel). Dovi 15:11, 9 October 2005 (UTC)

I agree with the last comment of Dovi. In my own field of technical history there is data like historic screw thread, wire and sheet gauge tables which are pretty well non-existent on the web, and Wikisource would be an admirable place to place them. Similarly, data like historic sizes of hand-made paper. But I would not wish to spend time on such a project only to find that I was having a battle with editors who wished to delete them since they were not 'literary texts' or some such argument. Where might unpublished transcripts of historic documents (in paper form) fit? Many scholars might welcome the possibility of having an on-line home for this kind of thing. Apwoolrich 16:13, 9 October 2005 (UTC)

Things like historic screw thread can easily be covered by digitizing some really old edition of Machinery's Handbook. It's not like these things have never been written down before. I have a problem with the inclusion of "source-less" facts or new translations in "Wikisource". Facts, such as a list of historic paper sizes, can be stored in Wikipedia. But if you want to author a free alternative to Machinery's Handbook, or write a free modern Polish translation of Beowulf, this can be done within Wikibooks. Because any such effort is indeed subjective authoring, not merely an objective representation of eternal facts. --LA2 17:07, 12 October 2005 (UTC)

I also agree with Dovi. I am beginning to see this debate as a way to kill two birds with one stone. One big question that I'm sure is asked a lot is how we are different from Gutenberg. Including "non-literary" articles (preferably under a different namespace) is one way that would make us different. As Gutenberg does not contain this information, inclusion here (especially if we amassed a great number of these kinds of articles) can help distinguish us from Gutenberg. Also, we can reach an agreement of some sort with non-Wikisource editors who constantly publish this stuff here and create a large stir, by finally giving them no reason to fight with us: we just let them post it. And to satisfy the WS editors who do not feel that this is the place for that information (for various reasons), such pages will not be counted as articles, and we can at least say that we carry "N number of literary articles." Of course, even if such data is allowed, there must be guidelines that must be followed for their inclusion here, just as there are guidelines for the "literary" articles (e.g., no self-publication).—Zhaladshar(Talk) 16:50, 9 October 2005 (UTC)

I have no particular objection to a separate namespace. But I don't see much benefit to it either. I view wikisource primarily as a collection of reference materials, part of the mission being to support other projects. What's wrong with having a table of Mersenne primes, for example? They are genuinely useful to know for some purposes, and are essentially impossible to calculate in finite time. At any rate, it seems to me that a compelling case should affirmatively be made for moving this stuff elsewhere. I don't know all the arguments, but from reading the above, I'm not convinced that there is such a case. In short Wikisource:What_is_Wikisource, referenced above, lays out the mission, I presume as stated by the Wikimedia board. What do we know that the board didn't? Wolfman 20:00, 9 October 2005 (UTC)

I recognize that using separate namespaces for Math was proposed as a compromise, but it feels like an "ok, we'll let you on the bus, but you have to sit in the back" type of compromise. From the discussion, the use of separate namespaces is proposed as an answer to some who question the suitability of the material and how needed decisions about it can be made, not a technical need to avoid naming conflicts, which is the problem namespaces are intended to solve.

I'd like to focus on the broader question of which "mathematical data, formulas, and tables" are appropriate for inclusion in Wikisource. The basic criterion in Wikisource is that material has been published elsewhere. There may be some cut-off for really obscure publications, but mainstream publications certainly belong there. All of the mathematical reference material I would expect to appear in Wikipedia would have been published before. With rare exceptions, we should be able to cite books or peer-reviewed papers that contain the same information.

There are many mathematical reference books that have been published. Unfortunately, scanning technology, as far as I know, is not up to the task of accurately transcribing either data tables or formulas. Nor is it necessary to import entire reference works, though a few would be nice to have. The information in them generally represents a consensus and will typically be found in multiple books.

Even if we did import a single reference, its contents should be spread over many articles, just as The Complete Works of William Shakespeare are split into separate articles on Wikisource. No brick and mortar library has a separate book for each of his Sonnets, but Wikisource has separate articles for them. It's more convenient for the reader and we don't have the physical problems traditional libraries have shelving and cataloging one or two sheets of paper.

Then there is the question of how to digitize data tables and formulas. Numerical tables are best re-created using computers. It would be ideal if the computer program used to create a table were published along with the data table itself, either in the article or in the talk page, or some new validation page that we could dream up. Formulas can be typed in manually using a TeX editor (Wikimedia accepts TeX markup) or, in some cases (perhaps a table of integrals), computer-generated. Contributors should be free to make editorial judgement as to format, precision and organization subject to the usual Wiki revision process.
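As an illustration of publishing the generating program alongside a table, here is a minimal Python sketch that re-creates a short table of primes. The function names `primes_up_to` and `format_table`, the limit of 100, and the ten-column layout are assumptions made for this example only; nothing here is an agreed Wikisource convention.

```python
def primes_up_to(limit):
    """Return all primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            # Every multiple of n from n*n upward is composite.
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


def format_table(values, columns=10):
    """Lay the values out in right-aligned, fixed-width rows of plain text."""
    width = len(str(max(values)))
    rows = []
    for start in range(0, len(values), columns):
        row = values[start:start + columns]
        rows.append(" ".join(str(v).rjust(width) for v in row))
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical example: the table a page "Primes below 100" might carry.
    print(format_table(primes_up_to(100)))
```

A reader who doubts an entry can rerun the program and compare its output with the posted table, which is the kind of validation suggested above.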

That brings us to the copyright issue. Current copyright law, at least in the US, makes expanding Wikisource's literary collection beyond the first two decades of the 20th century problematic. Mathematics has an advantage in this respect. As I understand U.S. law (and I am not a lawyer) the content of mathematical data, formulas, and tables are ideas which cannot be protected by copyright. However, slavishly copying the selections and layout of a copyrighted source is potentially a problem.

This suggests three possible strategies:

Use reference works whose copyright has expired

Put together reference pages based on information in more modern reference works still under copyright in ways that do not raise copyright concern, e.g. not following any one (i.e. particular) work. Still more recent material may have to be gleaned (i.e. collected in small quantities) from scholarly publications.

Find relatively recent works that are in the public domain.

One particular example in the last category is w:Abramowitz and Stegun's Handbook of Mathematical Functions published by the U.S. National Bureau of Standards (NBS) in 1964 and recently called "perhaps the most successful work of mathematical reference ever published." [1]. It is still available at Amazon.com. Most of its 1046 large format pages are devoted to numerical tables, but each chapter begins with a couple of dozen pages of formulas and graphs. A lot of arguments about whether material is appropriate can be settled using this book alone.

In the related field of cryptography, there is quite a bit of material published by the U.S. Government in the FIPS Pub series. Again it may be best for reference purposes to extract sections rather than simply mirror the publications, especially if we don't want to include PDF format files.

So here is a proposed set of guidelines, to serve as a starting point toward developing a policy:

Content should be in the main Wikisource namespace

Information on provenance, methods, validations etc. should be recorded. This could be done in the article, but the talk page or a separate page created for the purpose might be a better approach.

All mathematical material must have appeared in a recognized, published source.

Public domain or open-licensed sources are preferred.

Accuracy, reliability and verifiability are major goals.

Where possible numerical data should be recreated algorithmically, rather than being scanned or manually input.

What data tables to include, and their precision and format are subject to editorial discretion, taking into account both the desire to preserve history and the needs of modern users.

Data and formulas should be accompanied by a source and, where possible, the program used to create data tables should be exhibited.

Data and formulas should be validated against published sources by the original contributor. Additional efforts at verification should be recorded, even if no discrepancies were found.

Again this is intended as a first cut. I'm also trying to put together a draft table of contents, showing the kind of material I think should be included.--ArnoldReinhold 15:59, 12 October 2005 (UTC)

Concur. Slavishly restricting ourselves to exact reproductions of published works is needless, pointless, and, in this case, difficult. The important part is that the information contained be well-referenced and verifiable. This poses no more obvious difficulties than ensuring that a reproduction is exact. In the case of easily computable tables, e.g. 'sin', I think we're better off just reproducing (with cites) any common formulae to calculate it to arbitrary precision. No need for tables there. Anyone who simply wants to know the value for ordinary purposes can find it much more quickly and accurately using commonly & freely available software.
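To make the 'sin' point concrete, here is a minimal Python sketch of one common formula, the Taylor series sin x = x - x^3/3! + x^5/5! - ..., evaluated with the standard `decimal` module. The function name `sin_taylor`, the default of 30 digits, and the ten guard digits are assumptions for this example; it does no argument reduction, so it is only suitable for moderate values of x.

```python
from decimal import Decimal, localcontext


def sin_taylor(x, digits=30):
    """Approximate sin(x) to `digits` decimal places via its Taylor series.

    x is given as a string or Decimal. No argument reduction is done,
    so this sketch assumes |x| is of moderate size (say, below 10).
    """
    with localcontext() as ctx:
        ctx.prec = digits + 10          # extra guard digits for the summation
        x = Decimal(x)
        term = x                        # first term of the series: x
        total = x
        n = 1
        threshold = Decimal(10) ** -(digits + 5)
        while abs(term) > threshold:
            # Next term: multiply by -x^2 / ((2n)(2n+1)).
            term = -term * x * x / ((2 * n) * (2 * n + 1))
            total += term
            n += 1
        # Round the result to the requested number of decimal places.
        return total.quantize(Decimal(10) ** -digits)


if __name__ == "__main__":
    print(sin_taylor("1"))
```

A page could cite the series from a published reference and let readers compute whatever precision they need, rather than hosting a fixed table.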

Such material does not belong in wikibooks; that is a place for explaining how to do things at length. This is reference material, the purpose of wikisource.

Btw, some attempt to include at least the more important parts of Abramowitz and Stegun, formulae & graphs etc, would be phenomenal. I'd pitch in, if anyone wanted to start such a project. Wolfman 17:42, 12 October 2005 (UTC)

Hi,

The discussion is interesting and I think we are going to find a consensus. Separate namespaces can help in this regard. My concern is that we set clear guidelines as Arnold has done above, to avoid a future situation where someone adds something like

#!/usr/bin/my-favorite-language
print "My own novel (ten pages or more)";
end

People will claim that it was published before on their personal web pages (how is it different than the 1,000,000 decimals of pi, which were never published on paper), and it can therefore be included in Wikisource. So in brief, we should be clear in order to avoid any future conflict, as far as we can. Yann 18:21, 12 October 2005 (UTC)

I still believe different namespaces should be used to remove them from the main namespace, as these are reference materials and not pre-published, literary works. However, by doing this, I do not think they are relegated to "the back of the bus" but are set apart from our normal articles. (Of course, this is all assuming that we actually do get namespaces; we might have to rely on pseudo-namespaces).

I do agree with Arnold, however, about having a set of guidelines, and aside from the first, I have no problem with them. I think we should add another that will...limit the extent of works that are published. That is, we need guidelines on how long these tables/constants/etc. should be. Specifically, I'm referring to a constant like pi or phi. Why does Wikisource need "pi to 20,000 places," "pi to 30,000 places," or "pi to 40,000 places" when it already has one that goes to 50,000 places? This sort of thing should be excluded.—Zhaladshar(Talk) 23:09, 12 October 2005 (UTC)

Is there any danger of our being held to run foul of the WP No Original Research guidelines in any of this? I am thinking of the use of the term 'pre-published' in —Zhaladshar(Talk)'s contribution above. Apwoolrich 15:15, 13 October 2005 (UTC)

I interpret the "no original research" policy as meaning everything in Wikisource should have first been published somewhere else. Self-publication would seem less of a problem for math and science than it is for literary works. For math and science "published" should normally mean in a book by an established publisher or a peer reviewed journal, paper or electronic. Exceptions might be made in unusual circumstances, as when a result is particularly notable and easily verified, e.g. factoring an important integer, where one can simply multiply the two factors.
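The "simply multiply the two factors" check can be sketched in a few lines of Python. The function name `verify_factorization` is hypothetical, and the sample number (2^67 - 1, whose factorization F. N. Cole presented in 1903) is chosen only as an illustration; it is not taken from this discussion. Note that the sketch checks the product only, not whether the factors are themselves prime.

```python
from math import prod


def verify_factorization(n, factors):
    """Return True if all factors are nontrivial and their product equals n.

    This confirms a claimed factorization by multiplication alone;
    it does not test the primality of the individual factors.
    """
    return all(1 < f < n for f in factors) and prod(factors) == n


if __name__ == "__main__":
    # Illustrative example: Cole's factorization of the Mersenne number 2^67 - 1.
    n = 2 ** 67 - 1
    print(verify_factorization(n, [193707721, 761838257287]))
```

Because the check is so cheap, any reader can repeat it, which is what makes such a result "easily verified" in the sense described above.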

[Illustration: Equilateral triangle]

The thornier problem for me are results that have been published in hundreds of reference works, e.g. common formulas for area and volume. Should we find a pre-1920 reference work and copy that or simply make our own selection and arrangement of the material, citing multiple references? The former might be fun, but the latter approach seems in keeping with the original Wikisource charter. I also see no problem in adding original illustrations. See right e.g.

By the way, The Joy of Pi, ISBN 0802713327, published in the USA by Walker & Company and in the UK and Overseas by Penguin Books, contains the first million digits of Pi behind the decimal point, printed on actual paper, according to the books web site http://www.joyofpi.com/pi.html.

It seems like the conversation has died down a bit on this page, yet we have come to no real resolution. We need to finish this once and for all. I can see the benefits of having a reference section here at Wikisource. Unlike the main section here, the "pre-published" guidelines might not be best, unless we allow information to be compiled from numerous sources.

These are the guidelines mentioned above:

All mathematical material must have appeared in a recognized, published source.

Public domain or open-licensed sources are preferred.

Accuracy, reliability and verifiability are major goals.

Where possible numerical data should be recreated algorithmically, rather than being scanned or manually inputted.

What data tables to include, and their precision and format are subject to editorial discretion, taking into account both the desire to preserve history and the needs of modern users.

Data and formulas should be accompanied by a source and, where possible, the program used to create data tables should be exhibited.

Data and formulas should be validated against published sources by the original contributor. Additional efforts at verification should be recorded, even if no discrepancies were found.
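As a rough illustration of the guidelines on algorithmic recreation and validation, here is a small Python sketch that regenerates a table of primes with the sieve of Eratosthenes and checks it against the widely published counts of primes below 100 and below 1,000 (25 and 168). The function name and the choice of check values are assumptions made for the example, not part of the proposed guidelines.

```python
from math import isqrt


def primes_up_to(limit):
    """Return all primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for n in range(2, isqrt(limit) + 1):
        if is_prime[n]:
            # Cross out every multiple of n, starting at n*n.
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


# Validation step: compare against counts found in standard references.
published_counts = {100: 25, 1000: 168}
for limit, expected in published_counts.items():
    assert len(primes_up_to(limit)) == expected
```

Posting the generating program next to the table, together with a note on what it was checked against, would cover the last two guidelines at once.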

The first question is "What to include?" - Formulae, constants, tables, lists of numbers, etc.? Let's try to get a consensus on this before we proceed further.—Zhaladshar(Talk) 20:12, 5 November 2005 (UTC)

I have no problem with the criteria set out above. WS exists as a service and we cannot really exclude whole chunks of reference material on the grounds that it might not be "literary". We should not exclude material of value. I do have difficulties with some of the literary material we accept, simply because I am not familiar with it and never want to read it. But who am I to deny it a place?

One point that I have been having difficulty with is actually finding stuff on WS. If we adopted a proper library classification as the basis of sorting out material, there should be slots for the sort of reference material we are discussing here. I know that the "search" function should find everything, but that pre-supposes that the right question is asked in the first place. It is my impression that we do not make sufficient use of "see also" links within Wikisource at the bottom of the pages. In other words, editors are posting text with no links to other WS articles by the same writer or genre. This is perhaps the wrong place to pursue this, so it might be aired elsewhere. Apwoolrich 20:50, 5 November 2005 (UTC)

For additional texts by the same author, there should be an author byline-link at the top of the page. But I do agree about classification and believe that categories are under-utilized. However, I assume that most users of Wikipedia are not professional librarians and probably wouldn't always do a very good job trying to put things in order. —Mike 23:20, 5 November 2005 (UTC)

What to include? Mostly formulae & algorithms. Lists of numbers only when computational software is not ubiquitous -- no sin, exp, arctan, etc. For those, we can display the algorithm ... both infinite precision and the rational polynomial 16, 32, & 64 bit precision approximations.

Mersenne primes & the like go in -- essentially uncomputable. Sequences such as the Fibonacci up to some reasonable limit (a hundred maybe) ... it's easily computed, but not ubiquitous. Anyone who needs more than a hundred is serious enough that they'll use software.
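For a sense of how little code the "easily computed" case needs, here is a minimal Python sketch that produces the first hundred Fibonacci numbers. The function name and the convention F(1) = F(2) = 1 are assumptions of this example.

```python
def fibonacci_table(count):
    """Return the first `count` Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    table = []
    a, b = 1, 1
    for _ in range(count):
        table.append(a)
        a, b = b, a + b
    return table


table = fibonacci_table(100)
print(table[:10])   # first ten entries
print(table[-1])    # the hundredth entry, a 21-digit number
```

Python integers are arbitrary precision, so the later entries are exact rather than floating-point approximations.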

One possibility is to actually include functional Java or JavaScript programs to calculate common tables, e.g. of erf. I'm not sure of that, but we do have a section for software ... so why not just make it browser functional?

In short, I agree with all of Zhaladshar's guidelines. I like formulae. I'm not a big fan of tables; no one will use them. I think maybe some functional software would be helpful, but that's a stretch. Wolfman 02:13, 6 November 2005 (UTC)

I admit I feel swamped by the sheer volume of discussion that has taken place here (in this section and below). It's great, but it is impossible to address everything. So I'll simply state that I think Zhaladshar's guidelines are excellent, and I hope they (or some variation of them) are adopted.

As for his question, "What to include?", maybe the best thing would be to make sure we have some educated Math-enthusiasts working on this, so that they themselves can make reasonable decisions as to what sorts of materials should reasonably be kept at Wikisource. Dovi 08:39, 8 November 2005 (UTC)

I would like to add my thoughts about computer code. Suppose code is submitted that purports to be a valid implementation of some algorithm when in fact the code is defective. It seems that the Wikisource community should not be in a position where it becomes the arbiter of correctness in such a case. In fact there are code libraries that exist in the public domain where this sort of issue can be better addressed. I have no objection to the inclusion of code of historical interest (perhaps with a disclaimer) or code which might be found in published texts. In these cases and, perhaps, in other cases the Wikisource community would only be quoting an existing document and thus would avoid the above mentioned difficulty.

I believe that if wiki could at some future time implement a repository for verified (e.g. proofed) code it would be a meritorious project. However, the field of Computer Science is vast and growing daily.

Lists such as my List of victims of the 1913 Great Lakes storm, which survived VFD, are basically just selected source material combined together into one document, to save other people countless hours of research on their own. These lists are just like lists of numbers that are generated on a Wikipedian's computer (in terms of who created the list and how verifiable it is). --Brian0918 05:03, 6 November 2005 (UTC)

In support of inclusion of mathematical material—even in a broader sense

I think Wikisource is an appropriate place to store mathematical material. By material, I mean not only numerical tables/lists, but also definitions and theorems. So, it can host the collection of current accepted mathematical knowledge.

I agree with the suggested guidelines.

My view is like this.

The definitions and theorems shouldn't be original, rather they should match those from some peer-reviewed publications. The publication must be referenced. The researchers that were first to suggest the definition or prove the theorem must be specified.

Also, famous hypotheses can be included, also with references to where they were published and discussed, and who suggested them.

The theorems can be accompanied by proofs (also originating from peer-reviewed publications). There may be several proofs of a single theorem. Analogously, definitions can be accompanied by comments (giving equivalent definitions and explaining why they are equivalent). Yes, probably, comments are appropriate for theorems as well. Additionally, examples can be supplied.

By match, I mean that the wordings may or may not be the same as in the sources but the formal mathematical content must be exactly the same.

I suggest that the mathematical material is presented in an exact and formal fashion. Nothing relevant can be omitted in favor of better presentation.

(Benefits of the wiki.) Every occurrence of a formal concept must be linked to the page that defines it. And every use of a known mathematical result in a proof must be linked to the page that formulates it (and hosts the proofs).

(Difference from Wikipedia or Wikibooks.) The mathematical material at Wikisource would be a collection of exact mathematical material, split into small individual pages, which might be difficult to read and understand quickly, and it might be of some sense only to specialists, whereas the corresponding articles about the mathematical results at Wikipedia try to present them to a reader in a way that is best for getting the idea of it (plus all other non-formal information about the history etc.), and Wikibooks try to explain some theory by re-ordering the material, giving informal high-level comments etc. So, Wikisource can really serve as the source for the other two projects.

There might be alternative formal concept systems in mathematics. Well, let them all exist under Wikisource, as long as they are not original research and are accepted by peers. Use some kind of disambiguation to resolve the conflicts.

So, my view of mathematics under Wikisource approaches some characteristics of the books by Bourbaki,

with the additional benefit of wiki-technology for linking and presenting the material,

with the additional benefit of Wikimedia project for collaborative work on maintaining it, checking it, and extending with the constantly growing mathematical knowledge,

with the additional benefit of not requiring making the difficult and controversial decision on the linear order of the presentation of the mathematical material as well as not requiring the enormous effort to first fill in the pages on the more basic areas of mathematical theory,

with the difference in that Wikisource could host alternative mathematical "systems" side-by-side, whereas Bourbaki present only a single point of view.

Also, there is an example of a similar sort of work in linguistic morphology by w:Igor Mel'čuk: Cours de morphologie générale. The wiki-format would be wonderful for it, if it could one day be put on Wikisource, since it consists of linked formal definitions accompanied by examples and comments (probably, now it cannot because of the copyright).

Probably, both of the works have a wonderful introduction about the organization of the works, which could serve as a basis for Wikisource guidelines concerning the organization of such material.

I think that nothing about this view is against the general Wikisource idea and guidelines. The formal content (formal relations between concepts, either given/invented by humans, such as definitions, or derived, such as theorems) is the source in the case of mathematical material. In other words, in mathematical material, (to a certain extent) the matter is not about how something was said (with which words), but is about what was said (what is the formal meaning), and that formal meaning should become the content stored under Wikisource, but, in a "human-oriented" form ;) (so that the formalistics is not exaggerated).

Of course, obtaining high quality of mathematical material placed under Wikisource this way would require participation/review of corresponding specialists (but there will be some out there, won't they?).--Imz 22:15, 6 November 2005 (UTC)

The books by Bourbaki would come close to meeting, after an imagined wiki-reformatting (distribution over many individual pages and the addition of appropriate references to each individual page), the general requirements that could be imposed upon mathematical material in Wikisource.--Imz 23:40, 6 November 2005 (UTC)

I wonder whether the major reorganization of the referenced publication (into individual linked pages) and "simplification" of the wordings (cutting out informal things) would solve the copyright issue. The mathematical ideas are not subject to copyright, and as long as the references are OK, the presentation of the mathematical content at Wikisource in a suitably wiki-reformatted fashion would be OK, wouldn't it?--Imz 23:00, 6 November 2005 (UTC)

Yes, as long as we don't present something exactly the same way someone else did (i.e., we made the presentation a full Wikisource one) we should not run afoul of copyright infringement.—Zhaladshar(Talk) 03:03, 8 November 2005 (UTC)

I've been thinking further, if the general idea of inclusion of mathematical material in this way is accepted, what might be the useful set of kinds of such material (kinds of individual pages) and the useful namespace divisions.

The main kinds are definitions and theorems. Theorems are supplied with proofs. Is there really a need for commentaries/remarks to definitions or theorems, like "def1 is equivalent to def2"? Actually, this is a theorem, but perhaps an obvious one. So, commentaries as a separate kind shouldn't be included perhaps.

Some additional things are hypotheses, axioms, named properties (that are neither true nor false; perhaps "formulas" is a better, although unusual, name). Probably, axioms are not a good kind (definitions + named formulas would replace them).

Examples are a reasonable kind (although being probably a subkind of theorems).

So, these kinds might be reasonable namespaces for mathematical material. The discussed numerical tables and other such data can belong either to examples or theorems.--Imz 02:34, 7 November 2005 (UTC)

Are you maybe over-thinking things a little? I thought the original discussion was whether to include numbers like Pi, Phi, etc. and tables of numbers like factors, constants, etc. If I wanted to know about the Pythagorean theorem, I could look at the Wikipedia article. And if you were adding some kind of mathematical textbook to Wikisource that describes the theorem, you would include that with the other content here and not use any kind of namespace.

If we want to describe the documents that can be found in Wikisource, that is one thing, but we should leave the original writing about mathematical topics to Wikipedia and Wikibooks. —Mike 03:02, 8 November 2005 (UTC)

I was hoping we could just keep this initially about numbers/lists of factors and such. This is a good discussion, but I think we need to nail a few things down first. I think it would be best to talk about only lists of math information, lists of constants, etc. As, historically, these have caused the most problems, I think they should be addressed first.

So, for the time being, let's just try to get first things first, and restrict our discussion to what Mike mentioned above. After that, we can bring up what Imz raised.—Zhaladshar(Talk) 03:18, 8 November 2005 (UTC)

Yes, I agree, I was over-thinking things, and probably, in that form, that's a topic for another discussion. Let me just make a few remarks. It seemed interesting and useful to me to think on what can really be assumed a "source" in the case of mathematical material. So, I thought that it is the formal content without any presentational/explanational and so on "toppings", and I thought that's nice if there is such a collection (perhaps, at Wikisource).

As to the tables, lists, etc. -- they are, of course, just a particular case of that kind of content.

As to the relation to other places you mention (Wikipedia, Wikibooks), I meant that the form of mathematical material that can be stored at Wikisource is different from those places. For instance, in the case of Pythagoras theorem, Wikipedia's article on it presents the idea of the theorem and tells about the relation of the theorem to other parts of human knowledge, Wikibooks teach the theorem presenting it in a way suitable for learning and understanding it. Wikisource can have a much shorter entry on it, with only the bare formulation of the theorem (one sentence), but formally exact and grounded on other Wikisource entries for the concepts used in the theorem (triangle, etc.); and with a list of references to articles/books where this formulation was used. It's not the place one looks for if one wants to know what the Pythagoras theorem is (one looks in Wikipedia or Wikibooks in that case), but one looks here if one wants to see what the place of the theorem was in the suggested formal mathematical systems, if one wants to work with the formally complete mathematical systems. (There might be several alternative formulations stored, if there are different formal systems of mathematical concepts.)

Ok, I'm sorry for writing another long passage on a slightly different topic, but I think there is a task for Wikisource (or perhaps a separate "Wikimaths") project, and not only Wikipedia or Wikibooks, in this area.--Imz 22:58, 8 November 2005 (UTC)

I remember from my distant past having to use little handbooks of mathematical tables. I assume they still exist. Now on the one hand, perhaps the availability has made something like a table of logarithms obsolete. On the other hand, the more esoteric tables, like tables of probability for the t statistic or some such, which can of course be calculated by the user with a computer, on the other hand most people would need to look up the formula, on the other hand.... ? Gzuckier 18:45, 9 November 2005 (UTC)

Just to make it clear, have we reached a consensus where we will allow mathematical tables (such as prime numbers, Pascal's triangle, etc.), and constants (as these are what initially started this debate, I think we just need to get this down now--we can move on from there) to a limited extent? Please say whether you support or oppose their inclusion. I think we should do it this way, because right now, we're just doing a lot of talking and nothing is actually getting done.—Zhaladshar(Talk) 17:10, 9 November 2005 (UTC)

Well, seeing as nobody is opposed to this, I say we create a new page at Wikisource:Mathematical and scientific guidelines or a similar such page, where we draft a set of guidelines for the inclusion of mathematical/scientific data (I'm including science as well, because they have numerous tables and such that will probably end up being added over time). Of course, we should use the talk page to begin formulating these rules. But this page should be a focused discussion about the rules mathematical data should follow to be accepted here.—Zhaladshar(Talk) 19:32, 13 November 2005 (UTC)

I urge this goes as part of the Help corpus. Since I have not been involved in the discussions I will have a stab at making a first draft unless anyone else is volunteering:-) Apwoolrich 13:57, 16 November 2005 (UTC)

Please, take a stab at it. This is something that should be in a help page, and I'll do my best to contribute (after Thanksgiving I'll have more time). If you don't get around to drafting something I'll begin the work then. Otherwise, I'll just help edit what you've created.—Zhaladshar(Talk) 17:14, 16 November 2005 (UTC)

Not only math and source code, but also music, chess, diagrams, and more

Having found this discussion accidentally, I would like to remind everyone that there are even more types of material that should be put here at Wikisource. In my POV, mathematical contents should be allowed, both mathematical proofs and tables ("lists of prime numbers", "astronomical coordinates",...). And much more.

Have you ever read anything about WikiTeX? Please read about this MediaWiki extension on its home page. It's really, really awesome! It brings a lot of potential, especially to Wikisource (I can't wait!). Think Wikisource hosting…:

Musical scores: like the Mutopia project, a library of sheet music of the greatest composers of all times.

Chess games: all chess moves of a historical game, like the tournament of Kasparov vs. Deep Blue.

Historical source code: why not have the code of Minix V1 or the first FORTRAN I compiler? (This one doesn't need any MediaWiki extension.)

Maybe namespaces could be used to have special editors in each of them (e.g. an editor for musical scores), as proposed in a WikiTeX usability review. But I don't know whether namespaces are necessary for that.

Of course, not everything has a place at Wikisource (only "published" material), but IMHO we cannot create a new project when we want to host a new published topic. Hope this helps--surueña 13:51, 10 November 2005 (UTC)

I'm tempted to say, "include everything", but then someone would take that to an absurd extreme. But I think no limit should be set on topic. If it is factual; an existing part of our real knowledge store; then why exclude it? No argument against inclusion meets my approval. Xiong 17:32, 11 November 2005 (UTC)

The initial purpose of Wikisource has been fulfilled. The value of Wikisource has progressed past the initial purpose of Wikisource. A purpose is only valuable as long as it precedes the implementation. If you were traveling from Warsaw to New York and you were crossing the Atlantic you would not say "My purpose is to reach Paris". When the implementation exceeds the purpose and you try to use the purpose as a guide, you are lost. It would be similar to putting the horse behind the cart instead of in front of the cart. The initial purpose of Wikisource is irrelevant. Update the purpose!

The objective of all references is to improve the ability of mankind to predict the future. This started with predicting the seasons for harvesting crops, but quickly advanced to predicting human interactions. Eventually the predictions became good enough that we could predict, in part, the results of interactions between the very small, an atom, and the very large, a galaxy. A big change occurred for mankind when we could quantify precisely what we knew and to what degree we knew it. This required mathematics.

To throttle the knowledge of mathematics in any way is to decide that only a small group should have access to THE TOOL of accurately predicting the future of constrained systems. Hence, the proposal is not about reducing or preventing the flow of mathematical information. It is about deciding to constrain access to the tool that enabled mankind to design almost everything invented in the last two centuries.

Where should we draw the line? Never accept proposals that would limit mankind's ability to predict the future.

Now, pragmatically, how do we prioritize what is included in Wikisource? This is obvious: 1. Include anything that helps mankind to predict the future. Obviously, if you do not know the past it is almost impossible to predict the future. 2. Include ideas that might help predict the future. These ideas could be from past literature or current literature.

The mathematical works because of their direct importance to mankind are included in level 1.

I think this whole discussion gets into the heart of what the distinction between Wikibooks and Wikisource should be. I have been involved with moving a couple of Wikibooks over to Wikisource (in fact, just did so today). Bizarrely, somebody deleted text of the Bible from Wikisource and put it on Wikibooks. Go figure and try to understand what other purpose of Wikisource would be than to have the actual text of the Christian Bible and not be duplicated elsewhere.

In regards to mathematical material on Wikisource, I think the general rule of thumb should be more toward whether it has been previously published. If the content is to be a scholarly review of mathematical formula that contains original content or commentary, it should go on Wikibooks. If you are doing a copy of material that has been previously published elsewhere, such as Principia Mathematica by Isaac Newton or Einstein's original paper on Relativity, that should go to Wikisource instead. Translations of these classical papers should clearly stay on Wikisource as well.

The larger issue would then be for things like a list of formulas like b:Calculus:Tables of Integrals or for computer software source code. The two issues do need a little bit of separation, however.

Computer software source code is a huge issue on Wikibooks right now because of the GFDL/GPL conflicts. In short, you can't have GPL'd source code in GFDL documents and the other way as well. That is something that IMHO needs to be fixed in the GFDL, but something relevant here as well. Many Wikibooks projects are now instead placing the source code on SourceForge, because it is a public repository that allows GPL'd software. I consider that to be a loss for Wikimedia projects, but then again MediaWiki software doesn't do a good job of doing software versioning either like a good CVS system. Perhaps a SourceForge instance on Wikimedia servers would be a better alternative here?

Classical pieces of software, such as the source for the original Crowther & Woods Colossal Cave Adventure or Weizenbaum's ELIZA would be clearly of interest to Wikisource and something that should stay here. Some other source code for classical programs, such as id Software's Wolfenstein 3D (having been placed in public domain) is also available. The question then becomes if it makes sense to put stuff of that nature on Wikisource when it can chew up a huge amount of server space for questionable utility to anybody as an HTML page of source code. That is previously published information, which should be one test to see if it belongs on Wikisource.

As far as tables of mathematical values, such as logarithmic tables or similar sorts of classical tables that are now done through calculators or math CPUs, the utility of those can be questioned quite a bit. There is no real practical use for a book of the first million digits of Pi, but some people do find it interesting. Other irrational constants like e or the square root of two are of similar nature. On the positive side, there are relatively few of these numbers to worry about too much. As long as they don't get too far out of hand (like over 1 million digits) I don't see the harm of having them sitting around on Wikisource. They are constants, and don't ever change after being verified for accuracy. Wikisource is better equipped for policies to deal with information of that nature than any other Wikimedia project. --Robert Horning 08:28, 13 November 2005 (UTC)

I'm not sure this really does get to the heart of the distinction. The discussion on this page has been more or less about overall subject areas, not specific texts, and the conclusion seems to be that we will welcome texts on most topics, among them mathematics.

As for the distinction: "If the content is to be a scholarly review of mathematical formula that contains original content or commentary, it should go on Wikibooks" - I question whether that is correct. Wikibooks does not host scholarly reviews. It hosts instructional resources, such as a guide to learning and applying mathematical formulas. It is not a place for scholarship per se. This, to my mind, is an entirely separate topic, which is very important but not appropriate for this page. For those interested, please see:

I initially wrote both pages, but my thinking on this changed a bit over time and I currently have a slight preference for the way things are described on the Wikibooks page. One important suggestion on this whole topic is that slight overlap is preferable to falling through the cracks (i.e. that some kinds of text projects with source-texts can be reasonably put either on Wikisource or on Wikibooks, and we should respect the gray area here). In any case, this is an important topic, and if people are interested in discussing it the best place would be on the above two pages. Dovi 09:25, 13 November 2005 (UTC)

In replying to the invitation to join in the mathematics debate I encounter the above statement which as a joke clearly highlights why Mathematics should be included in Wikisource but only if it IS SOURCE MATERIAL ONLY i.e. historical formulae, hypotheses and data etc. Wikisource is not paper so why not, unless contributed material drains resources. My usual handle to Wiki projects is the NORWIKIAN but in attempting to register am no longer recognised as a contributor! Best not take any of these meta-projects TOO seriously in the larger scheme of things tho' i do not quite subscribe to the views of the above! NORWIKIAN

Mathematics has as much of a place here as does anything else. Physics, Astronomy, Chemistry, Astrology, Theology, Cryptography, Geography, Geology, Chronology, the study of everything should be included. I don't believe in Greek Mythology or Christianity, but I find it interesting, helpful, and enlightening to study both. Wikisource has the potential to be the ultimate compilation of information, thoughts, ideas, methods, a 411 at the center of the universe, and I certainly do not see why closed-mindedness should prevent you, or us rather, from accomplishing such a goal. By the same token, I do not believe that it is appropriate to censor material found here - hate is perhaps the worst thing in the world, but that doesn't mean we should prevent material from surfacing about skinheads or kamikaze or anything else. If this is truly to be an opensource community, we must omit nothing and censor nothing.

The issue is not "math or not math". The works of Newton or Fermat would be very welcome here. The question is "mathematical tables". Yann 17:05, 15 November 2005 (UTC)

I don't understand your comment. The point here is to discuss what should be in Wikisource. Yann 20:42, 22 November 2005 (UTC)

My point is that there is already a policy in place and, as the above discussion shows, there is no consensus to change it, quite the opposite. At some point this question must be considered settled. --ArnoldReinhold 04:59, 7 December 2005 (UTC)

Yes, you are right, there is no consensus to change it. It looks like it has been agreed that "tables" and "constants" are to be allowed. What we now need is to formulate a set of guidelines for the addition of such tables: for example, how many digits to take a constant to, and whether we should have pages containing a constant to the 10,000th decimal, to the 20,000th decimal, and to the 100,000th decimal all at the same time. Devising that will be the most productive discussion at this point.—Zhaladshar(Talk) 22:10, 10 December 2005 (UTC)

As suggested by Zhaladshar, here is a proposal for guidelines on the level of precision in mathematical tables and constants to be included in Wikisource. Many of the parameters I am proposing are judgement calls on my part and, obviously, open to discussion. Whatever we come up with should be included in a guidelines page, along with other issues discussed above, and with the understanding the guidelines are not absolutely rigid and that contributors who wish to deviate from the guidelines should propose changes or exceptions on the discussion page.

There are several types of tables and constants:

Historical tables. For centuries people who did calculations relied on mathematical tables. These range from the simple tables of logs, sines, cosines and tangents found in the back of high school math texts, to the elaborate tables used by navigators. Exhibiting such tables, either completely or as sample pages has clear historical value. For these, the precision question is easily answered: reproduce what was originally done in some published exemplar.

Cultural artifacts. Certain numbers have a special place in mathematics and the popular imagination. Pi and e are clearly at the top of the list. A second tier might comprise the square root of 2, the en:golden ratio, etc. Pi deserves its own categories, which might include:

First million decimal digits. As noted above, there is precedent for this in a published book. These expansions also have some utility in cryptography, see w:nothing up my sleeve number. Note that the space required for a million digits is about that needed for one megapixel resolution jpeg image.

Values in other bases, from, say, 2 to 20, to, say, 1000 digits with perhaps a much longer expansion in base 16, for various reasons

Number series, such as factorials or Bernoulli numbers, should stop when a single entry takes up most of an 800 × 600 resolution screen, or sooner.

Values that are difficult to compute and only known to modest precision (e.g. w:Euler's constant) should be shown to full precision

Current utility. Some important mathematical functions are not available on scientific calculators nor in most subroutine libraries, e.g. statistical integrals, Bessel functions, etc. For these, a useful limit might be the maximum accuracy attainable with w:Quad precision floating point under the w:IEEE 754r standard, which is 113 bits or about 34 digits. Note that any table that displayed 34 digits would have to be computed at a higher precision to achieve full accuracy. Obviously we should never display tables to a precision that is greater than their accuracy.

Validation of computer algorithms. It is potentially useful to have selected values of common functions computed to high accuracy. For example, values of the common trigonometric functions computed for every whole degree to, say, 100 digit accuracy.
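To illustrate what such reference values might look like, here is a minimal sketch that computes the sine of a whole-degree angle to roughly 100 digits using only Python's standard decimal module. The pi and sine routines follow the series recipes given in the decimal module documentation; the helper name sin_degrees and the choice of ten guard digits are assumptions made for this example, not a proposal for how Wikisource tables should be generated.

```python
from decimal import Decimal, getcontext, localcontext


def pi_decimal():
    """Compute pi to the current context precision (series recipe)."""
    getcontext().prec += 2  # extra digits for intermediate steps
    three = Decimal(3)
    lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24
    while s != lasts:
        lasts = s
        n, na = n + na, na + 8
        d, da = d + da, da + 32
        t = (t * n) / d
        s += t
    getcontext().prec -= 2
    return +s  # unary plus rounds to the restored precision


def sin_decimal(x):
    """Taylor series for sin(x), summed until the value stops changing."""
    getcontext().prec += 2
    i, lasts, s, fact, num, sign = 1, 0, x, 1, x, 1
    while s != lasts:
        lasts = s
        i += 2
        fact *= i * (i - 1)
        num *= x * x
        sign *= -1
        s += num / fact * sign
    getcontext().prec -= 2
    return +s


def sin_degrees(degrees, digits=100):
    """Sine of a whole-degree angle, computed with ten guard digits."""
    with localcontext() as ctx:
        ctx.prec = digits + 10
        x = pi_decimal() * Decimal(degrees) / Decimal(180)
        return sin_decimal(x)


print(sin_degrees(30))
```

Computing with guard digits and then rounding is the same point made above about tables: the working precision has to exceed the displayed precision.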

Proposals to include other constants to very high accuracy, say, more than 100 digits, would be subject to advance discussion and consensus formation.

Sorry, I just noticed that the above comment, which I posted 15:12, 12 December 2005, was not signed. Sometimes I get logged out but do not notice. Anyway, what do we need to do to move this discussion toward a conclusion? --User:ArnoldReinhold 21:42, 18 January 2006 (UTC)

I'm responding to the addition of the bit under self-contributions about the user namespace. While it seems like a good idea to allow people to submit their works in their user namespace, this really seems no different than having them put it in the main namespace. They can still link their websites to those pages, and people can still read them. Entire libraries of bad and unimportant fiction can be added if this is allowed. I think it would be better (and safer) if this weren't allowed. Unless, of course, I'm misunderstanding the terms "with reason." And if I am, then let's please get a good grasp of what's reasonable and what isn't.—Zhaladshar(Talk) 20:33, 28 January 2006 (UTC)

I agree that this could be abused with people unloading huge amounts of material into user namespaces. I was thinking more along the lines of limited quality text as in Apwoolrich's example, that could provide a personal version of a text in the main namespace.

As to the idea itself, I was thinking along the lines of Wikinews: NPOV doesn't allow for editorials reflecting a personal vision there. The solution has been for people to "editorialize" within their personal namespace. But I agree as above with Zhaladshar that it could be abused. That might not be a technical problem since we are not paper, but something (I'm not sure exactly what?) seems wrong with letting people unload 1000 page novels to their userspace. Any other ideas or opinions from others? Dovi 21:24, 28 January 2006 (UTC)

Perhaps we can put a size limit on User space. --BirgitteSB 21:43, 28 January 2006 (UTC)

If that's possible, I'd rather steer away from that. Some users might have (or want to have) large amounts of material on their namespaces (subpages for their own projects/to-do's/etc.) for valid reasons; that shouldn't be discouraged.

Dovi, could the problem that you might not be sure exactly what it is concern self-publication? Publishing a small amount seems fine, but if they take it to uploading entire novels, this is essentially a self-pub, just not where (probably most) people will ever find it. I think that User spaces are a prime place for a person to "editorialize," and I don't want to limit anything like that, so we should come up with a guideline concerning what is and is not acceptable to put in the user namespace (in terms of adding texts--this would circumvent the whole 1000-page novel problem).—Zhaladshar(Talk) 21:49, 28 January 2006 (UTC)

I have added the text of some old (1800-1950s) letters and papers to WikiSource in the past. These letters have not been published (except on my website...[2]). However, I feel that at least some are of general interest. Do they belong here, somewhere else, or nowhere? I published one here today as supporting information for the page on w:Carl Friedrich Gauss at Wikipedia. Among other things, I have a journal that my great great grandfather kept from the early 1800s until his death in 1899.

Yes, we can host letters, provided they are of historical significance, as yours appear to be. We list them at Wikisource:Letters by sender. I believe we should also be able to accept journals/diaries/etc though offhand I'm not sure where to put them, perhaps Portal:Non-Fiction, Portal:Biography or possibly a new section such as autobiographical material or diaries. AllanHainey 14:36, 17 March 2006 (UTC)

I do think a new rule or exception should be made regarding this. If original media such as letters, diaries, maps are previously unpublished, but have historical significance and they belong on wikisource, then this should be stated in some way on the project page. At the moment it leaves this area in a bit of confusion by saying that all items must be previously published in some kind of area that would invite peer review. However for these sorts of items, it would appear that wikisource is the publisher of first venue. This needs more clarification on the project page. Wjhonson 16:12, 9 July 2006 (UTC)

I have made a slight clarification in the Original Contributions section to ensure that it reads as discussing *your OWN* original contributions. This would then allow an old letter, written by someone else (obviously) to be part of wikisource. I hope everyone agrees with this interpretation, or maybe can clarify it further. Wjhonson 16:18, 9 July 2006 (UTC)

Zhaladshar deleted my changes, which were based on discussion in this section. The policy of unpublished letters, diaries, other manuscripts of historical interest should be explicitly stated on the policy page to make it clear. That's what I did. I see no reason to revert. Wjhonson 20:56, 14 July 2006 (UTC)

In my view we should be including letters as well as other kinds of sources. There is an enormous amount of similar stuff like diaries available which never gets published in regular historical journals, but has value. There are two kinds, letters which are already in print, and unpublished ones. The latter might fall foul of our rules for acceptability, so perhaps these will need tweaking, as we have recently done for original translations. The texts cited by the previous writer are both genealogical, so maybe we ought to have a category for these as well. Apwoolrich 18:51, 17 March 2006 (UTC)

Family trees (or pages which are only family trees) seem to be outside the range of WS's purview. We should not accept genealogical information if it is all by itself. Genealogies have no real sources and are a lot of user-contributed research. That seems like something which should be incorporated into Wikipedia articles.—Zhaladshar(Talk) 19:44, 17 March 2006 (UTC)

Well, I am confused. I gather that 'categories' are more than just something like 'Letter', 'Author', or something like that. After doing some reading in Wikimedia, it looks like The Charles Henry Gauss Family Papers could be a category. If so, and if it would be relevant, how do I do it. Also the letters I have submitted seem to have mysteriously been put on the page, Portal:Letters. How did this happen? Was it somehow automatic, or did one of the roaming editors do it? Also, I think it would be nifty to have some sort of style guide for letters, e.g., how should they be titled, etc. Mathsinger 03:25, 18 March 2006 (UTC)

I know this discussion has been dormant for a long while, but I'd just like to clarify whether transcripts of birth, marriage and death certificates are wanted on Wikisource. It seems to me that, although not really 'published', these are very valuable source documents and would be good to have here. Any thoughts, or pointers to where this has already been discussed? Thanks. —Sam Wilsoncontrib's | talk 00:25, 29 April 2008 (UTC)

These could be considered acceptable under "documentary sources". The difficulty is determining whose transcripts we want, as I don't want to be accepting these documents about anyone. John Vandenberg(chat) 02:24, 29 April 2008 (UTC)

My feeling is that "standard" family documents (birth and marriage certificates, as well as death certificates where the cause of death is described in just two or three words) don't belong in Wikisource. But I understand your point: we want some of these for added information, such as when there is a clear reference to an author or a biography character. The question of reference data springs up time and again. I believe the problem is that there is no single sister project where all these reference data can be dumped into. Compare the situation with author or character quotes which have been made off-the-cuff, i.e. not as part of a publication. We want those, too, for added information, but we'd never include them in Wikisource. Instead we link to Wikiquote through the {{author}} header or in the notes section of a text. There may be isolated solutions for certain types of references (I've done some searching, and w:Wikipedia:Persondata might be partially adequate for family data, and there are other unimplemented ideas such as m:GlobalFamilyTree) but it appears to be about time this WikiData thing should be gaining some momentum.

I've got an idea about creating a collection of transcripts. It's more than that, but that's the simple way to put it.

WikiSource seems to exclude random transcripts that JoeUser creates when he hears someone say something he thinks is interesting, historic or newsworthy.

Please check out Transcript project goals at wikinews to see the discussion. The upshot is that the wikinews guys are arguing that transcripts belong on WikiSource, but what I read on this page makes me think that Wikisource wouldn't want transcript material. Mattks 09:27, 1 April 2006 (UTC)

We certainly do host political speeches. However it seems to me you are talking about more than just public speeches (interviews, TV appearances, etc.) and the other stuff will be a copyright problem. The Broadcaster retains the copyright to such things in most cases. The other issue is you seem to want to focus on one individual's comments. Transcripts of things where there are several participants would have to be complete and inclusive of everything said. Editing of the material is frowned upon, because it can introduce bias. You will probably have to give us some specific examples for a definitive answer, but I suspect there will be a problem with some of the material you want to add. --BirgitteSB 12:06, 1 April 2006 (UTC)

Thanks for the prompt reply BirgitteSB. I imagine 4 ways of getting material:

Direct copy of material that is freely distributable

Obtaining permission of stuff that is not freely distributable.

Creating original work where the wikisourcian (wiksourcerer?) is a first person witness to the spoken words and the words are not otherwise copyrighted by the speaker. This might be a big category; Matts law: "For every person or subject, one out of N citizens will schlep over hill and dale to hear the words themselves rather than trust the mainstream media to give a faithful report".

Copy a link to the material and provide an NPOV abstract.

Editing of the material would indeed be a bad thing; the goal is to provide a body of work that is as close to dispute free as possible so that citizens and scholars alike can have a trusted place to turn to for the truth. Zero dispute is the goal and the only way to get close to that would be to leave out all interpretation and include only the context that is dispute free. For example, date and time of the spoken word, persons present, reporters name and time of transcription, method of transcription (from notes, memory, personal recording etc).

Yes, all the speakers present would have to be included in the transcript to make the transcript meaningful.

I don't have any examples yet; when I think I have the right idea, I'll just start doing it and let the wiki community make of it what they will. -Mattks 22:22, 1 April 2006 (UTC)

My opinion on your examples is that of course #1 is fine and acceptable. #2 is also acceptable, but it is unlikely we would be given permission for all the things you would want. #3 is questionable as there is a problem of verifiability; it would be best in such case if recordings could be uploaded along with the transcripts. However the copyright on recordings may be even more stringent, I am not sure. #4 would not be acceptable. Wikisource is not a collection of links and we would not accept contributor-written abstracts. I think you will be able to put some of the material you want on Wikisource. If your only goal is to compile the complete remarks of person X, I do not believe we will be able to reach that goal. --BirgitteSB 00:38, 2 April 2006 (UTC)

My goal at the moment is to start doing something besides nothing. :) A limited version anywhere would be useful if the data could be easily copied when a more suitable environment pops up. A table of public domain speeches would be a good start. Maybe Wikipedia is a better place for that, with links that point to the actual text here on wikisource? For that matter, maybe wikinews would be the place for news hounds to document the fact that some words have been spoken, those events tabulated on Wikipedia, followed by other researchers who obtain legal (permission to) access the text and copy it to wikisource and/or Commons (Amgine on wikinews said something about 'Commons'). Hmmm, maybe a combined effort across wiki* is the right way to do the complete project?--Mattks 18:04, 2 April 2006 (UTC)

I'm mulling the issue of completeness, it has dimensions (in the sense of categories to be populated that are independent of other categories):

a table of all the times a person spoke within earshot of witnesses is one useful dimension. (WikiNews--->WikiPedia?)

a table of all the people that have spoken on a given subject is another.(WikiPedia?)

a complete transcript of a given spoken-word-event(WikiSource, assuming copyrights and verifiability satisfied)

an abstract of what each spoken-word-event was about(NOT WikiSource. Maybe WikiPedia?)

a histogram of words from each spoken-word-event. (I include this because it's an entirely objective way to summarize the content of the text without risk of bias or violating copyrights. Probably too weird for Wikipedia)
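The word histogram in the last item is simple to produce mechanically. A minimal sketch follows, assuming that "word" means a lowercase run of letters and apostrophes; that tokenisation rule, the function name, and the sample sentence are illustrative choices, not anything agreed in this discussion.

```python
import re
from collections import Counter


def word_histogram(text: str) -> Counter:
    """Count how often each word occurs in a transcript.

    Words are lowercased runs of letters and apostrophes; punctuation
    and digits are ignored. This is one simple convention among many.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Made-up sample text, standing in for a full transcript.
sample = "The truth, the whole truth, and nothing but the truth."
for word, count in word_histogram(sample).most_common(3):
    print(word, count)
```

Because the counts are derived purely from the text, two transcribers working from the same transcript would get the same histogram, which is the objectivity the item is after.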

And what about transcripts of historical documents? I recall a discussion about this somewhere on WS where it was felt that this was OK, providing images of the original MS were added. This will cause mega-problems for most record offices I know are very picky about allowing images of their documents to appear on the web, because of reproduction-fee loss. I personally feel that WS ought to be accepting transcripts of this sort, as a service to Scholarship, but there is no way I can see of ensuring that the text placed is not corrupt or without page images as well. Apwoolrich 15:24, 1 April 2006 (UTC)

I don't see any problem hosting complete transcripts that are old enough to be beyond copyright restrictions. Mattks is wanting to be able to hold current public figures accountable for what they say by keeping a public record on Wikisource. I don't know exactly what kinds of transcripts you are talking about when you say "historical", but I don't know why we would refuse them simply because they are not widely available. They are still verifiable even if it isn't easy to do so. If something seems unlikely and is not substantiated by anything else we can remove that on a case-by-case basis.--BirgitteSB 16:06, 1 April 2006 (UTC)

WikiSource seems to exclude random transcripts that JoeUser creates when he hears someone say something he thinks is interesting, historic or newsworthy. This sounds to me like the transcript project is going to become nothing but a compilation of excerpts by public officials (I don't even know on whose sayings this project will focus—is it politicians or any kind of public person?) that will be archived on some wiki. If I'm misinterpreting this, please tell me, because compiled works are expressly excluded from WS.

Personally, I'm most interested in politicians, but public officials often have something to say that serves to document the state of the world at a given time.

I'm not talking about excerpts, I'm talking about complete transcripts. One person's "nothing but a collection of transcripts" is another person's invaluable reference material. I don't know how to differentiate between the two; do either belong on WikiSource? --Mattks 22:55, 1 April 2006 (UTC)

Like Birgitte, I don't have any problem hosting complete transcripts that are out of copyright protection. It comes down to the more current day transcripts that concern me. And this is where a number of questions must be asked:

Upon what people will this project focus? Politicians? celebrities? athletes? all of them?

Is the transcript complete? We will not take transcripts that have omissions in them. This is because WS is an archive of source texts. We do not push any kind of agenda, and any omissions of transcripts that involve any of the groups of people from question 1 could very easily become POVed.

How verifiable is this transcript? There might be problems later on down the road if there is just no way to verify that the transcript is accurate (especially if the content of the transcript in question is not consistent with other transcripts). This might result in the transcript being deleted, which would be a phenomenal waste of time for the transcriber, since transcription takes so much time.

I would like to ask that maybe one or two examples be presented (or just give a detailed explanation of what you are aiming at with this project). As has been said before, this project seems like it would be beneficial, but some considerations must be addressed first.—Zhaladshar(Talk) 16:58, 1 April 2006 (UTC)

Answers:

I'm interested in whoever says something that documents their view of the world at a given moment.

Yes, the transcript should be complete to be useful.

This could be an issue for WikiSource: a transcript standing alone is precisely biased towards the speaker's view, and of course the particular transcript I choose to put on wikisource will reveal my bias; I can't instantly upload everything a pol has said so I'm going to pick the things that seem most important to me. The only thing I can promise to do is to faithfully transcribe a particular contiguous body of work with complete context. Hopefully, the rest of the community is watching and will point out errors. For a given politician, I'd hope that some other partisan would provide other transcripts that will serve to broaden the picture of the kinds of things the target politician is saying. Between all partisans, hopefully all issues, viewpoints and people will get covered.

I think all transcripts should be accompanied by meta data that provides bona fide as well as context. Just what that means will evolve over time to be the most acceptable to the most people.--Mattks 22:55, 1 April 2006 (UTC)

My question number 2 is more concerned with the omissions being deliberate on the part of the editor. Clearly, the whole transcript will be biased in favor of the speaker. This is just something everyone has to accept. What I'm concerned with is when the editor adds the transcription, he/she leaves out "a sentence here" and "a phrase there" that make the speech say something that it would not if it were complete. This is why verifiability is of such a major concern for me, because I want to try to keep from introducing any kind of contributor-originating bias.

Definitely, though, there should be meta data to provide context to the transcription.—Zhaladshar(Talk) 23:10, 1 April 2006 (UTC)

I wanted to ask one question before I finished with our inclusion policy. Did our recent vote also choose to exclude source code and cryptographic material as well as mathematical data?—Zhaladshar(Talk) 20:38, 29 April 2006 (UTC)

Yes, I think so. Source code in particular was mentioned several times throughout the discussion, and cryptography got a few mentions as well. // [admin] Pathoschild (talk/map) 09:06, 30 April 2006 (UTC)

Alright then, I'll update the page. It should be done by today for voting. Hopefully...—Zhaladshar(Talk) 18:16, 30 April 2006 (UTC)

Even if Wikisource does not contain source code, the code currently there should instead be moved to programming Wikibooks, where examples will be useful. Pcu123456789 23:17, 23 May 2006 (UTC)

Then by all means move it. I've done some browsing on WP and WB, and much of the code we have here is duplicated on at least one of those two. If you feel that it should all be moved somewhere else, have at it.—Zhaladshar(Talk) 01:17, 24 May 2006 (UTC)

This might be a bit nitpicky, but should audio recordings really be uploaded to Wikisource? I've been uploading them to Commons, as I found Alice's Adventures in Wonderland and Hunting of the Snark already there. I've so far uploaded Northanger Abbey and am currently doing Pride and Prejudice. Now I'm in a bit of a quandary; if those files should be here, is it possible to move them to this project, or will I have to re-upload everything here? Also, I found a site that has recordings of President Bush's actual speeches (of him giving them). Such a file seems to be more apt to put on Commons than here, since WP might also want to link to those files.—Zhaladshar(Talk) 19:55, 30 April 2006 (UTC)

Other projects should be linking to Wikisource for the works themselves, so the audio files would be best on Wikisource. An example is w:Alice's Adventures in Wonderland, which would link to Wikisource for both the text and the audio recording. There are probably exceptions to this, so it'd be up to the individual users' judgement. // [admin] Pathoschild (talk/map)

Aw, crap. This will be a nightmare to correct...—Zhaladshar(Talk) 21:46, 30 April 2006 (UTC)

I have edited several articles for the EB1911 project, mostly scientific or mathematical. The original EB1911 has lots of inline equations, and these fit neatly in with the text, due to using the same font and size. However, in WikiSource, we have to make do with the PNG rendering of TeX for complex equations (i.e. involving fractions, symbols like integral and sum). This does not fit in and makes the page very hard to read in places. The HTML option does not work for complex equations, and even when it does it's too small, making the numbers and letters run together.

Wikia, formerly WikiCities, has a version of the TeX renderer that makes the PNG smaller and more compact - much more like the EB1911, in fact. I have added my name to a proposal at Bugzilla (Bugzilla Bug 4915) to have this version of TeX added (to supplement, not replace the current one), perhaps using a tag like <maths>NEW STYLE EQUATION</maths>. Could any interested parties either add their comments, vote for this, or both?

What about current US Government publications, such as ADA Guide for Small Businesses? This is the text of a current brochure from the Govt. Printing Office. Although this is in the PD, it seems a bit out of the scope of wikisource. It's not really a historical document.

I've noticed Wikisource does host a very few press releases, I'm wondering if that's because there are rules on what kind of press releases can go here, or just that no one's really felt like adding any. I have a couple of press releases I'd like to add, mostly for old, or cancelled games and tv shows, because of their present-day rarity and lack of official hosting. -- Quoth 16:30, 1 June 2006 (UTC)

OK, I'm going to go ahead and add the couple I have, they can always be deleted if they're deemed inappropriate. -- Quoth 14:41, 4 June 2006 (UTC)

Press releases are fine. Just make sure that copyright rules are respected. WS doesn't have any simply because no one here has uploaded any. People, of course, are always welcome to add them, though.—Zhaladshar(Talk) 21:35, 4 June 2006 (UTC)

Should this information be included in the "examples of source texts" then? I'm not really comfortable with editing policy just yet... – Quoth 04:34, 8 June 2006 (UTC)

I don't think we need to make a new addition. After all, that list is just examples--it's not exhaustive (and for good reason). Just because something doesn't show up on that list doesn't mean we won't take it--we just wanted to give some ideas. Maybe we should make note, however, that the list is a non-exhaustive list.—Zhaladshar(Talk) 14:17, 8 June 2006 (UTC)

If we generalise, we can cover most cases without making the list too long: "Media releases and speeches (such as speeches, radio addresses, and press releases)". // [admin] Pathoschild (talk/map) 15:22, 8 June 2006 (UTC)

Sounds good to me. I'm still concerned about trying to pigeon-hole everything (that's always a pain), and we're still giving examples, so I would still expect questions on "Is this document permitted? It's not an example on your policy page."—Zhaladshar(Talk) 20:04, 8 June 2006 (UTC)

Again this goes to the heart of my latest blurb above, under Letters. Does certain media of historical significance, but yet unpublished, belong on wikisource? Perhaps the exception could state that *you* the transcriber cannot also be the *author* of the media. Personally I'm not really sure how to address this, but at the moment it seems a bit murky. Wjhonson 16:15, 9 July 2006 (UTC)

I propose that this get moved to the Scriptorium, where there's a better chance that people will notice it, especially with RC being flooded by page moves and adding {{header}}. More people are likely to discuss it that way.—Zhaladshar(Talk) 19:59, 9 July 2006 (UTC)

I propose expanding section 2.4 to include its exact opposite: works that cannot be changed. This is a policy that the WMF has and which WS has implicitly held, as well. This would be no change in policy, just making something explicit which we already do implicitly.—Zhaladshar(Talk) 17:06, 8 July 2006 (UTC)

I would like to know if "hand-written" annotations by a known person, in the printed text of a previously published book are considered part of the "Annotations" that are allowed. I have several books where the annotations are known to have been made by a certain person, themselves perhaps not notable, but long dead, and it may be interesting to include this sort of material. Wjhonson 19:59, 11 July 2006 (UTC)

I would say no as the intention is to reproduce the printed source itself rather than annotations by specific persons. Possibly if the annotator was particularly notable (eg Einstein writing all over a work by Newton) there would be a case to include those specific annotations but I think that set of circumstances would be few and far between. AllanHainey 11:22, 12 July 2006 (UTC)

I think it would be interesting to have such editions. Of course the text must also be available without annotations as well. Also be sure that "long-dead" is long enough to clear up any copyright issues.--BirgitteSB 21:22, 13 July 2006 (UTC)

The requirement of previous publication has (at least) two reasons, verifiability and notability. Handwritten annotations by Einstein or Newton are notable. But it might be difficult to verify them if they are not published in some way. If such annotations can be found from some reliable source, I think a title that names the famous person is better (maybe X, with annotations by Newton or something similar). I think the title The annotated X is good for works annotated by Wikisource contributors, but annotations by a famous person are not really the same thing. /82.212.68.183 23:05, 13 July 2006 (UTC)

As has been discussed here and in the Scriptorium, manuscripts of historical note, should be excepted from the *prior publication* rule. As a general rule, the vast majority of such things are not published in their entirety, but may have various bits and pieces published.

However if the persons or events recorded in these manuscripts are of note, they should be included on wikisource.

I suggest we have a new Heading for Manuscripts, since they could be, but are not necessarily "added value". And within that section should be wording similar to "Transcriptions of hand-written manuscripts of historical note such as letters, diaries, transcripts, or notes — form a special exception to the requirement for prior publication and may be included on wikisource."

I don't really think this is necessary. If we add anything, it should just be another entry under §1.1. Just add a line about manuscripts being source texts. However, I think the problem here is due to a slight misunderstanding of the prior publication rule and what we consider publication (which, I admit, is not the easiest thing to understand). As I said, we host letters. Many of these letters are not "published" in the sense that they are in some book somewhere. They are "published" in the sense that they have actually been created in a hard copy form and that (ideally) there is the actual document floating around out there which can be compared with the electronic edition that we currently host. Same with speeches; many of them were delivered orally and that really (aside from possibly the speech written out so a person could read it) is all the publication it ever got; it, too, never ended up in a book.

The biggest reason to having the prior publication rule is to ensure that we have something to check against the electronic version we host. If you have scans or the actual manuscript in front of you, then there really is no problem with what you are adding being excluded according to our policy.—Zhaladshar(Talk) 14:41, 15 July 2006 (UTC)

I changed it to remove "Historically significant works, such as national anthems, constitutions, legal documents;" and replaced it with Primary sources followed by a non-exclusive list of works which are primary sources. I do not want to use the term manuscript, as that only signifies that the work was written by hand instead of with a typewriter or computer. Feel free to alter or revert if anyone disagrees. --BirgitteSB 16:17, 15 July 2006 (UTC)

I'm not really sure about something typewritten. I just felt it would be harder to verify the source since a typewritten document can't be validated as to who wrote it. I'm ambivalent about it, I just wanted to point that out. And I'm not suggesting that everyone who posts their great-grandfather's letters validates his handwriting, I'm just saying it would be possible. Whereas with a typewritten document, it wouldn't. Wjhonson 16:38, 15 July 2006 (UTC)

What about when a wikisource editor wrote a peer reviewed dissertation...

Trying to follow the history of this discussion, it seems the primary intent of preventing wikisource editors from including their own writing is to prevent loads of "self-published" or "bad fiction" works. However, what about those of us who wrote a peer-reviewed dissertation that has been published elsewhere (UMI)? Your guidelines are unclear but as they are currently phrased, tend towards not including such works.

I think the guidelines are pretty clear that such a publication (peer-reviewed) would be acceptable. However there is also the issue of copyright to consider. --BirgitteSB 21:40, 3 September 2006 (UTC)

I know I'm probably getting annoying, speaking of the need for a firm re-definition of our inclusion policy, but I think we're unfortunately building on some greatly hypocritical ground where we allege that Author:Jon_Stewart would allow his works that we host to be modified, published and sold in a book, turned into a stage play, set to music, redistributed and used commercially...yet Zodiac Killer letters or John F Kennedy death threat are somehow not respecting the author's completely absurd (yet possibly legislated) inherent copyright. I can go sell a book entitled "The Zodiac Killer letters" or "The death threat against Kennedy" without any possible legal ramifications...even a book containing famous suicide notes (all three books of which have in fact, been published before) - but I'd be a lot more wary about claiming that I can commercially publish and/or modify Jon Stewart's work. (and no, I'm not claiming we should go delete the thousands of speeches on WS on a whim)

The trouble arises from the definition of "Free" works, and no, Free/Gratis/Beer comments really don't constitute a consistent and defined inclusion criteria.

As I see it, we have two distinct factions on WS when it comes to deciding whether something is valid for the project or not. One group says "My gut instinct says it's probably alright, assuming you've actually looked into any potential copyright as exhaustively as possible, common sense (and incidentally common law, largely) would agree it can be here" while the other camp says "The copyright policy says we don't publish anything unless it was published by the US Fed'l government, written pre-1923 or the author died pre-1936, those are the rules". And unfortunately due to our human limitations as laypersons, and of course the even more glaring human limitations of lawyers paranoid about Deep Throat suddenly claiming he owns the publication rights to the Pentagon Papers, both groups are prone to armchair legal hypothesizing.

Please stop characterizing the public domain as containing only "Poe, Swift and Twain" or "US legislation and Victorian authors" or so forth, which is untrue. Considering that copyright generally extends only 70 years into the past, there is a vast repertoire of works above and beyond that. For details on what is public domain, see User:Pathoschild/Help:Public domain.

We are not an indiscriminate text dump. This is what distinguishes us from other online libraries, not comprehensiveness (where we are still far outclassed by Project Gutenberg, for example). Any user can freely read and download Wikisource texts knowing exactly whether it is legal in their jurisdiction or not based on the license data provided (unlike other libraries), and distributors can distribute and market Wikisource texts in bulk knowing exactly whether it's legal in their jurisdiction or not. This ability to freely (in every sense of the word) read, use, and modify Wikisource texts makes it extremely useful, far more so than an indiscriminate, gray-area text dump.

As I have said before, we barely include an insignificant portion of what is available. Why turn to unfree works, which will reduce our usefulness? This is particularly relevant since we are part of the Wikimedia Foundation, whose stated goal is "the growth, development and distribution of free, multilingual content, and to providing the full content of these wiki-based projects to the public free of charge" (from Foundation wiki, links original). I'm willing to bet orphaned works will never be listed on w:Free content. :) —{admin} Pathoschild 18:07, 26 November 2006 (UTC)

I've seen, and complimented User:Pathoschild/Help:Public domain - it is a useful resource - it is not however clear on many matters, especially on issues that involve a "grey area". [Nixon Tapes] were not made by the President in official commission of his duties, Nixon died only 16 years ago, the tapes were published only 33 years ago, the subject of the work fought ferociously to prevent them being made public (or even turned over to the courts), while the creator status of the work is disputable at best...Nixon was responsible for creating the tapes himself. Of course, however, discussions cannot be copyrighted in and of themselves, and does a security recording count as having made the tapes under the known pretext of their being fixed in a tangible form by automated security systems? Does the fact the Berne Convention excludes audio recordings matter? Obviously it's a complicated issue that has no legal precedent.

Your explanation isn't entirely accurate: we don't say whether a text "is legal in (the website surfer's) jurisdiction" - we say only whether it is legal in the United States (as per the location of our servers). And I doubt any book publisher would say "Hey look, WS has a copy of a comedic sketch/speech done by w:Jon Stewart, I bet I can republish it as my own book, or give the same speech myself in five years pretending it's my own, freely!". We don't even offer any explanation why such a document (of which WS has thousands, if not tens of thousands of speeches) would be considered to be public domain, other than "Well, Jimbo said it was a-okay, so let's not emasculate one of our key genres" - which is fine, but can't we find a better way of stating it? In fact, the legal precedent of the I Have a Dream speech would seem to contradict our blanket claim...so perhaps our policies should explain the difference between it, and the speeches we *do* allow. You can't just say "This 1963 speech is allowed to be hosted here, but this one is not" without having a policy which explains why we consider the law to view them differently. If Jon Stewart went to court like King's estate did, wouldn't they give an identical ruling about the copyright status of his speech? In fact, given the judicial reasoning, isn't it more likely to be copyrighted? Since they refused to allow an open, public address to be "free", why on earth would the courts consider a private address to a controlled audience to be free?

Take another issue we don't address, what is the difference between an w:editorial and an w:open letter? If I list Lloyd Axworthy open letter to Condoleezza Rice as an open letter, few people would dispute that it belongs on Wikisource. If I list it as an opinion that Axworthy had a Canadian newspaper publish...there would be more dispute. Other than age, is it any different than I accuse? Does it make some kind of difference that both of them begin with an opening address to a single person they'd like to offer opinions upon, instead of acknowledging they're writing it for a public audience? I'm not asking w:rhetorical questions here, I'm actually pointing out real questions that exist, and we don't address in our policies.

Again, I don't want to see speeches or letters deleted, when it comes down to it, I just want to have a workable policy.

What is free content, is what we have to decide here. That's the whole reason we need to actually bring logic to bear, and hack ourselves some more clearly-explicable policies. Take a look at w:Desiderata#Copyright_status where it talks about how a copyrighted 1927 poem, not published until 1948, by an author who died in 1945, was ruled by the US Copyright Office to have its copyright status forfeited in the 1970s, since so many people believed it was public domain and centuries old, and the author had written his war-buddy a letter telling him to "share" the poem with others. Note also, that other US courts have ruled it is not public domain...so who does WS default to in the case of a dispute among judges, the rulings of state courts? federal courts? congress? copyright office?

Again, while we can both offer pithy statements like "shouldn't just be Poe, Swift and Twain", or "With such a large wealth of free texts, we shouldn't accept non-free" - they are basically rhetoric for both of us. The fact is that "free" is a largely subjective term - w:Wikipedia:Public_domain even makes repeated references to national governments "guessing" at the status of works. So let's move forward, rather than argue "Yes we need an overhaul" or "No we don't", let's focus on the actual points brought up. How do *we* distinguish between MLK's "I have a dream" speech, a British politician's monologue to the House of Lords, and comedian Jon Stewart's address to university students? Sherurcij(talk) (λεμα σαβαχθανει) 04:55, 27 November 2006 (UTC)

Orphan works can be free content . . . 120 years after they are written. Orphan works are a category of works that contains both public domain and copyrighted works; let's not be imprecise, it only confuses the issue. I don't believe the people here fit so neatly into two categories. There is a large variety of opinion on the details. However, we cannot host works which are clearly copyrighted, as the gov't is considering a provision for such works under which the author can in the future come back and collect licensing fees on his copyrights, but with a limit on the amount he can collect, to protect people who tried and failed to locate him before publishing. BTW feel free to comment on Wikisource:What Wikisource includes/new draft although it is not attempting to change things.--BirgitteSB 20:13, 26 November 2006 (UTC)

Sherurcij, if you seriously want us to come up with a definitive test for determining if something is "free" or not, I do not think you will have many takers. There is no such thing. However, the argument that we are in grey areas for something, so we might as well allow all orphan works, is not going to work either. Orphan works which are not PD are clearly handled as copyrighted works. They are definitively not free; the only problem is no one is sure who they need to pay licensing fees to. I really do not know what you are trying to accomplish with this. Really none of us are lawyers; we can do our best to account for things but there is always uncertainty. I don't know why you insist on answers that no one has. The only solution is that we will examine individual situations and come to consensus as to whether each is acceptable. That is it. We will do our best and when we find out our best was incorrect we will reverse course. What else can we do?--BirgitteSB 20:01, 27 November 2006 (UTC)

I know there is no such thing as a definitive test for if something is "free", in fact I made the exact same point earlier when Pathos responded "There's no point adding unfree texts when there are so many free texts" - copyright law isn't like a speeding ticket, where it's a simple matter of "Was he, or wasn't he, going above 100km/hour?" - even ignoring international rights, just in a single country, there are overlaps, loopholes, exceptions, and most of all...areas with no legal precedent. I'm not arguing specifically for Orphan Works, in fact I've mentioned a great many other types of works - but my point is firmly rooted in the belief that where there is no clear legal precedent in a case...WikiSource should have its own defined policy on matters. From a legal standpoint, the thousands of speeches we host are every bit as copyrighted as the anonymous death threats - more so in fact, since they have successfully had courts state explicitly that speeches are copyrighted, no matter if they were addressed to the public (see w:Estate of Martin Luther King, Jr., Inc. v. CBS, Inc.). Now, Wikisource chooses to prominently host these copyrighted works...shouldn't we include our reasons for that?

Whether it's Copyright Policy, Inclusion Policy, or What Wikisource Includes that needs to be updated, is a matter of debate - but somewhere we should actually have a policy that states where the line is drawn on say, Stephen Colbert and Jon Stewart, both are the scripted works of popular current-day entertainers...does the fact they were delivered to invited audience members only somehow make them legally unique from any other stand-up comedy skit? One was televised, does that matter? Are both of them within our inclusion policy? Neither of them? One of them? How is this any legally different from hosting a w:Chris Rock sketch? Or a w:Monty Python skit? Again, I'm not demanding these works be deleted, I'm insisting there should be an internal policy in Wikisource that defines what we do/don't include. One that doesn't just glibly say "free stuff". Sherurcij(talk) (λεμα σαβαχθανει) 01:57, 28 November 2006 (UTC)

I'll try to express my position as unambiguously as I can, and avoid rhetoric if possible. We should only host texts that we know are libre for whatever reason. By libre I mean that the work may be distributed, used, or exploited in any way and for any purpose (commercial or noncommercial) without any restrictions whatsoever above and beyond those codified by the GNU Free Documentation License and applicable laws (for example, you cannot modify even a public domain text to slander or misrepresent the author). This includes the vast majority of literature, and is the category covered by User:Pathoschild/Help:Public domain.

There are some isolated exceptions, as established by full consensus, in which we allow a slight degree of subjectivity. These exceptions are referred to as 'presumed public domain'. Speeches are one type of work that consensus has allowed, despite the possibility of copyright in some cases. In such cases, we presume that the works are free without requiring quite the same evidence we normally do (although they can be deleted if challenged).

Both of these cases (known public domain or compatible license, and presumed public domain) are covered by current policies. These are applicable to all the cases you've brought up; we can look at each work and ask, "Do we know that this work is compatible? If not, is it one of the agreed-upon exceptions?" The changes you suggest belong to completely different categories, so the question of whether we need to overhaul depends on whether we want to include those.

The works to which I am most opposed are unfree but exploitable works. By unfree I mean that we do not have the legal or moral right to distribute, use, or exploit them in every way and for every purpose (commercial or noncommercial) without the accepted restrictions I mentioned above. This includes cases where we do not have said right, but may be able to do so anyway by exploiting loopholes, gray areas, or legislation that does not change the status of the work (such as orphaned works legislation).

The other type of work I oppose are those that might be free, but for which no evidence or legal precedent is available to confirm that the work is free. Wikisource is not a legal testing ground, and we should not be exploring new legal horizons. Even ignoring the fact that this is a blatant violation of the Foundation's mission, our team of lawyers consists of one person and we cannot afford to defend ourselves against serious litigation. Further, these works are already prohibited by the Copyright policy, which requires evidence confirming that the works are compatible. Note that legally unprecedented assumptions are not evidence.

If the lines in our exceptions are as blurred as you suggest, that is a good argument to tighten the individual exceptions. However, this does not apply to new categories you've uploaded, such as orphaned works, which are not known to be compatible and are not an agreed-upon exception. —{admin} Pathoschild 03:54, 28 November 2006 (UTC)

But you see the problem with "We should only host works we know to be libre...except when consensus says to ignore it and host them anyways", right? There's no formal policy saying why we host Speeches, only that we do. They certainly can't be legally presumed to be PD, in fact the courts have ruled the complete opposite. There's not a "possibility of copyright in some cases" with speeches, in fact the court ruling is very broad and clear that all similar works are copyrighted. You're taking a very strong stand against "might be free" works, while allowing "definitely not free" works.

Are you claiming that I can publish Author:Jon Stewart's comedic sketches and sell them for corporate profit, and there would be nothing Stewart could do about it?

Your claim that "we cannot afford to defend ourselves against serious litigation" would tend to support my claims that it's better to host something written by an anonymous terrorist who will never reveal himself, and even if he did, likely couldn't take us to court...than to host something written by a current-day public figure, entertainer or comedian who wrote down a scripted address to be delivered. Just because they spoke the words out loud to the public, doesn't mean it's not copyrighted...otherwise we could just include transcripts of every television interview ever conducted.

Again, I'm not crusading to delete speeches - I'm pointing out that if we're going to host speeches, we should acknowledge that they are much more "high risk" than say, hosting a ransom note or death threat written by an unknown individual...so if WS is willing to "allow its inclusion policy to allow some very distinctly copyrighted works" (including making them featured articles) - then we shouldn't balk and emasculate the project, we should include texts such as the Anthrax letters or JFK death threat, which are rationally and litigationally much less of a risk than what we already host thousands of. Sherurcij(talk) (λεμα σαβαχθανει) 04:46, 28 November 2006 (UTC)

I'm not defending speeches, just pointing out that they are an exception per community consensus. If you feel they are a violation of the copyright policy, I invite you to discuss the deletion of speeches that are not otherwise compatible on the Scriptorium.

However, I disagree with the application of the slippery slope argument. The fact that we allow one exception does not mean that we are willing to allow any exception. Speeches are hosted because we presume that they are in the public domain; this may be an incorrect presumption, and perhaps in the future we'll delete copyrighted speeches and solve that problem. However, orphaned works and exceptions for enemy organizations or states are not libre. They are, at best, exploitable without probable legal consequence in the United States in the short term. Wikisource, as a project of the Wikimedia Foundation, should conform to its overarching principle of "encouraging the growth, development and distribution of free, multilingual content" (see my definition of free or libre above). Hosting any work we can get away with, be it orphaned works or fair use or exceptions under the International Emergency Economic Powers Act, is not libre. —{admin} Pathoschild 19:26, 28 November 2006 (UTC)

We don't really have any basis whatsoever to presume speeches are PD, in fact quite the opposite. You might as well say "We host Michael Crichton books on the presumption they're PD". Speeches have been ruled by the courts to be automatically copyright-protected. But I don't disagree with hosting them here, because there is an overwhelming obligation/desire to share this information freely with the world, and other than the King estate (and the recent additions of Jon Stewart and Stephen Colbert "speeches") - there is little "probable legal action" that will arise. By the same token then, I feel we should write policies that explain this reasoning - and be open to the fact that perhaps the understanding of "free library" or "library of free texts" can be re-interpreted from the very narrow, yet very vague, definition you're using now. Sherurcij(talk) (λεμα σαβαχθανει) 19:51, 28 November 2006 (UTC)

"Speeches have been ruled by the courts to be automatically copyright-protected" Please reference this statement. If you are referring to the King speech you are quite mistaken. You might wish to read the actual files. The reason it was originally ruled as copyrighted was a printed copy distributed to reporters before the oration. The appeals court said this did not count as copyright (at least at that time period). The court never established copyright. It was settled out-of-court. Settled out-of-court means the court did not settle the issue. Please show me where a court has ruled a speech copyright protected.--BirgitteSB 20:05, 28 November 2006 (UTC)

It's possible I am misinterpreting the King ruling, although the Washington Post says that all of King's speeches are fiercely protected and under copyright[3] meaning that it's not solely based on the disseminated copies of I Have a Dream.

"Between 1955 and January 1959, Vice-Admiral Hyman G. Rickover delivered speeches before a variety of organizations...mimeographed copies, which only bore notice of copyright after December 1 1958...were distributed to interested persons...A publishing firm sued for a judgment declaring that the speeches could be printed and sold freely, since (Rickover had used government equipment)" "A Government Official May Copyright Speeches Prepared and Delivered outside the Scope of His Official Duties" certainly implies that speeches can be copyrighted, even if only mimeographed and distributed after the fact.

On the whole, there's a lot more evidence for speeches being "copyrighted, just like ransom notes and everything else", than for it being otherwise. But maybe at the end of the day it simply comes down to whether we want to err on the side of "not including copyrighted works" or "only including works released to the public domain" - both of which make assumptions about that large grey area in the middle.

Of course, the question of "What is the WS project?" can probably be answered by each member's response to the hypothetical question of "2007 sees the executive branch of the US government pass increasingly ignorant copyright legislation known as The Sonny Bono Act: Part II - retroactively declaring PD to be unconstitutional since it abridges individual rights...does Wikisource simply shut down and declare the project over, or are we working towards a purpose that is greater than any poorly-devised legal blanket-terminology that would likely not stand up in court?"

Heck, on a similar note...we are non-commercial...just our viewers aren't all non-commercial. We could host thousands more documents and texts, if we just had a template ((non-commercial)) that inserted a warning about commercial usage onto the text's articlespace. Anyways, whatever, it's past midnight and I imagine I'm less than entirely coherent at the moment. Sherurcij(talk) (λεμα σαβαχθανει) 05:24, 29 November 2006 (UTC)

Speaking of hypothetical futures, which news subtitle would you prefer? [a] "Under increasing pressure from flagging sales, a coalition of book authors has agreed to release dozens of old books into the growing online 'libre' literature movement to encourage sales of their future works." (see similar, real headline); or [b] "Under increasing pressure from flagging sales, a coalition of book authors have petitioned their publishers to implement tighter DRM technology in their downloadable ebook previews." (see similar, real headline). Freedom encourages freedom, much as restrictions encourage restrictions.

Wikisource is not unique because it can be comprehensive; Project Gutenberg's wiki had a growth rate of 30 new texts this week alone. However, those texts are difficult to download in bulk, are not in a browser-friendly format, and are not all free. In many jurisdictions, a user would be afraid to download a text from Project Gutenberg to his PDA for fear of violating copyright law, because there's no license information (just a vague assurance that it's legal in the US). Wikisource is unique in that it is a library of free texts, and this fits very well with the Foundation's mission that I've so often quoted: "the growth, development and distribution of free, multilingual content". —{admin} Pathoschild 07:06, 29 November 2006 (UTC)

I still don't understand why you think what's hosted on Wikisource is free everywhere in the world. Like Project Gutenberg, we only assure readers that our texts are free in the United States. But where I see us differing from Project Gutenberg, is that PG doesn't host After Action Reports, autopsies, death threats, ransom notes or similar non-literary works. Sherurcij(talk) (λεμα σαβαχθανει) 07:55, 29 November 2006 (UTC)

I'm satisfied with works being free in the United States, as long as they are properly categorized. Unlike Project Gutenberg, we explain why a particular work is free so that users can decide whether it is usable in their jurisdiction. However, I'm entirely against works that are not free in the US or elsewhere, such as orphaned works.

If you'd like to host such works, note that Wikilivres (based in Canada) has already expressed an interest in taking them from us. This will make them equally accessible to the world (which will find them as easily there as here through Google), while not posing legal problems for Wikisource or turning away from the Foundation's overarching objective. —{admin} Pathoschild 17:36, 29 November 2006 (UTC)

Based on some discussion on WP, I now think the terms primary source, secondary source, etc. are a very bad idea. Ignore this for now . . .--BirgitteSB 17:46, 18 October 2006 (UTC)

I am much happier with the draft after some updates--BirgitteSB 06:09, 21 October 2006 (UTC)

Only part that I'm not sure is a good idea, is the concept of "Precedent deletions", if something is not valid at Wikisource, that should be obvious in and of itself - without having to say "But we deleted something similar in the past!" - I believe that would work against allowing Wikisource to evolve and adapt itself as needed, without offering any clear benefits. Sherurcij(talk) (λεμα σαβαχθανει) 07:51, 11 December 2006 (UTC)

New Question: Is a dissertation/Ph.D. thesis that has been accepted and published by a university included in the scope of Wikisource? —unsigned comment by 194.94.56.12 (talk) 10:00, 4 May 2007.

Depends on the university, I think. Accredited universities have new Ph.D. theses scrutinised by at least two, sometimes more, qualified examiners, so Wikisource should treat them like ordinary publishers. Diploma mills, on the other hand, are similar to vanity publishers and theses from them should not be accepted.

Of course, to publish your Ph.D. thesis here, you must have the legal ability to grant Wikisource an appropriate free licence. So if the university nicked your copyright, Wikisource regrettably can't accept your work.--GrafZahl (talk) 15:01, 4 May 2007 (UTC)

I think this subject deserves its own section on the main page. I agree with GrafZahl, and I'll initiate such a section based on some of its key points. Still, everyone's welcome to suggest changes or additional notes. Mikael Häggström (talk) 06:51, 6 December 2011 (UTC)

Is there another problem here? If part of a PhD thesis (including, say, a diagram) is, at a later date, published in a journal and copyright signed over to a company (say, Venal Publishing Inc.) then the author no longer holds the entire copyright on the thesis, because the diagram now belongs to Venal Publishing Inc. Is this correct? If so it might be difficult for even the author of the PhD to know where the copyright resides (Venal Publishing might be taken over by Vampire Publishing and the copyright passes to them). --Logicalgregory (talk) 10:03, 2 September 2010 (UTC)

In the above case, as far as I know, the permissive license in Wikisource is perpetual and does not change, regardless of licensing used in other publications (see WP:No revoke). In this case it would mean that you wouldn't have the permission to just copy the diagram from, say, Venal Publishing, but you could still copy it from Wikisource as much as you want. Mikael Häggström (talk) 18:04, 3 December 2011 (UTC)

I'm new to this project, I usually edit Wikipedia (see w:User:Sbrools), but I had a question. As a musician, I'm interested in works of music, and I noticed that Wikisource doesn't really have any policy that I can find related to that. I mean, I saw the definition on this project page, which states "Wikisource, as The Free Library, exists to archive the free artistic and intellectual works created throughout history...", and to me, that seems like it would include works of music that are now in the public domain. There are many pieces that are free due to their age, such as the works of Mozart, Beethoven, etc. And while I can see some people being opposed to the inclusion of mathematic proofs as not being artistic (which I don't wholly agree with, I happen to think some math is very artistic (see w:Euler's Identity)), I don't think anyone can argue that music is not artistic. If there is a policy against it, can someone please point it out to me? Otherwise, I think that we could and should include pieces of music. After all, Wikisource is the "Free Library", and my local library has a whole section of sheet music. The logistics of it would take a little bit of work; while the actual music can be in the public domain, most printings are copyrighted, but we could always use an open-source music creator like w:Lilypond. Something similar to the Mutopia Project. What does everyone think? I won't be offended if you think I'm crazy, after all, I'm an outsider here... Sbrools 04:09, 18 May 2007 (UTC)

At first I thought you were referring to Category:Song lyrics, but I see now you mean Portal:Sheet music. The consensus seems to be that it is a great addition to the site, just we lack people motivated enough to add much of it :) As long as it fits the standard w:public domain criteria, it would be an excellent addition. I think you may be interested in WikiTex markup, but am not actually sure how it works...or if that's what you were already referring to...I...am not musically inclined :) SherurcijCOTW:Harriet Beecher Stowe 07:04, 18 May 2007 (UTC)

Ok, thank you. I had no clue such a page existed. Thank you very much. Sbrools 19:05, 18 May 2007 (UTC)

Both music and mathematics are perfectly acceptable as long as they are published (see the inclusion policy). :) —{admin} Pathoschild 20:28:24, 18 May 2007 (UTC)

Pathoschild, I'm not sure I understand your revert. Your edit summary says "rv (contradicts section, which is not only about historical events)". First of all, my edit doesn't concern "historical events" (i.e. in the past), it concerns "historic events" (i.e. notable). Secondly, how does it contradict the section? Kaldari 01:04, 25 September 2007 (UTC)

Your edit limited documentary works to those with historical importance, a requirement which has been opposed above. The section then goes on to list works such as military transcripts, personal correspondence, and diaries, which are typically not "historic" in the sense you mean (even if a small subset of that collection may be). The requirement in force for such works is verifiability, not notability. —{admin} Pathoschild 04:15:13, 25 September 2007 (UTC)

In that case, what's to prevent me from uploading the entire contents of the MediaWiki development listserv archive? It was created "in the course of events" (developing MediaWiki), and it's verifiable. Surely, we want to require some degree of notability, don't we? Kaldari 21:29, 25 September 2007 (UTC)

It was never legally published, though. That's the overarching requirement we've been discussing. If they publish it in book form or some such, feel free to start uploading it to Wikisource as soon as the copyright situation allows that. —{admin} Pathoschild 22:21:36, 25 September 2007 (UTC)

The MediaWiki development listserv archive is most certainly legally published (though not as a book). Regardless, the current guidelines on documentary sources says nothing about publication (in book form or otherwise). What would you suggest adding to address this? Kaldari 15:16, 26 September 2007 (UTC)

Feel free to suggest something. I'm drafting a complete rewrite of the policy based on discussion here and elsewhere. —{admin} Pathoschild 15:31:42, 26 September 2007 (UTC)

I've proposed a one-sentence addition to the policy here, although it doesn't specifically deal with the issue we are discussing now (re documentary sources). Feel free to propose something more radical if you have some ideas. Kaldari 15:35, 26 September 2007 (UTC)

Greetings. I specialize (in a layman kind of way) on copyright policies on Wikimedia projects, and I noticed a few problems with this page. For instance, it says that pre-1923 works are acceptable, but it doesn't say why. I have proposed some changes at Wikisource:What Wikisource includes/rewrite. Most of it is the same, but the copyright information is explicit, and the organization is a little different. I welcome any comments. Quadell 22:04, 10 January 2008 (UTC)

Hi, IMO this "pre-1923" aspect of our inclusion policy is merely a compromise - it is only related to copyright because we have chosen it as a line in the sand that correlates with the PD-1923 line in the sand. It does not make sense, as PD-1923 is only relevant to a subset of our works - why should it have anything to do with whether a work is acceptable for inclusion on Wikisource? See WS:S(2007-09)#Change inclusion policy for the discussion around the change, and WS:S(2007-10)#Upshot of Change inclusion policy ? for post-policy-change discussion.

It is like the current change to the Author: page inclusion policy, where we have decided that "dead people" is the new line in the sand.

These compromises are not supposed to make any sense, except to keep using very clear lines in the sand so that we don't need to have complex notability discussions all the time - that path leads to madness, and none of us really want to spend time debating when we have such an enormous breadth of readily available material that can be uploaded without dispute.

I just think it makes more sense for our "line in the sand" to be "public domain in the United States". There's actually a legal basis for this, and it's how the English Wikipedia is run (basically). Quadell 04:34, 11 January 2008 (UTC)

You will need to go read the discussions I pointed to. John Vandenberg 04:45, 11 January 2008 (UTC)

I'll go read them. But for now, I'll just highlight that there are many, many PD works written after 1923, which are included on Wikisource, and which are (technically) not allowed according to the letter of this page -- U.S. government works, Ethiopian works, etc. Quadell 11:58, 11 January 2008 (UTC)

The "1923" provision was never intended to prevent post-1923 works - it is there to prevent post-1923 self-published works. I've updated the page to indicate that there are four different classes; we accept works that fit within the definition of any of these classes. John Vandenberg 13:50, 11 January 2008 (UTC)

Okay, that's fine. But no policy -- on this page or anywhere else -- says whether "public domain" refers to U.S. copyright law, the laws of the originating country, or what. Commons and en:Wikipedia do this differently, and it isn't clear how Wikisource deals with this. That's the purpose of Wikisource:What Wikisource includes/rewrite, which I believe is clearer as well. Quadell 14:22, 11 January 2008 (UTC)

I haven't read through the rewrite or even the current version of this policy closely. But I have some general ideas about topics which have never been addressed in this policy (of which, in full disclosure, I was one of the primary authors). Three of these topics revolve around whether we want this policy to apply only to the Main: namespace and, if so, whether we want separate policies for other namespaces. Personally I would like to see this only be about the main namespace, but it would be good if we said this explicitly and wrote a supplemental policy for the supplemental namespaces. I will try and hunt down these old discussions and find links over the weekend.

Language: We have works in old English, Middle English, and Scots dialect. There was a discussion about codifying acceptance of w:Anglic languages that did not reach consensus at the time.

Author pages: Despite the current discussion about dead authors with no free works, we have never had any inclusion criteria for author pages written here.

Index pages: Inclusion of index pages is generally hashed out piecemeal at WS:DEL when someone sees something they don't like.

Categories: There has been big picture discussion about these in the past, and currently there are piecemeal discussions on WS:DEL. The big questions are: Should we use "subject" categories and if so where do they end? and How far away from their works do we go categorizing authors?

The rewrite does not address self-published artistic works prior to 1923, which is one of the outcomes of the discussions I directed you towards, and the rewrite uses the word "notability", which is an inclusion criterion that doesn't have any community approval as far as I am aware. This page is not intended to define public domain, as that is a whole other kettle of fish. John Vandenberg 14:55, 11 January 2008 (UTC)

The word "notable" is not generally used on Wikisource in the same way it is used on other English speaking Projects. Instead, Wikisource spells out the inclusion criteria related to publication with a peer review component for artistic works. For that reason I think we should avoid using it in this policy. FloNight 20:57, 11 January 2008 (UTC)

I agree that we want to avoid the word notable like the plague. If someone didn't think the work were notable they wouldn't think of adding it, so that criterion is unhelpful. I would like to stick to objective criteria only.--BirgitteSB 23:20, 11 January 2008 (UTC)

Am I to understand from reading this page that there are no "notability" requirements for inclusion of a text beyond that they've been published? I have some texts of old (i.e. pre-1923) political speeches that aren't hugely notable, but which I suppose might be of interest to some people. Many of them were only (so far as I can tell) published in Hansard - is that sufficient publication, or is it too indiscriminate a publisher? Sorry for the ignorant questions, but I'm new at this. Sarcasticidealist (talk) 21:27, 12 September 2008 (UTC)

Indeed, our only "notability" threshold is that something was published - preferably in "hard" media (not a blog, etc) -- so a (non-notable) politician's words from Hansard would definitely be worth adding. If you're using the British Hansard, you'd be pretty much limited to pre-1923 I think, while Canadian Hansard would grant you up until 1945 (since CC would have expired in 1995, and thus even the United States would recognise them as Public Domain). Sounds like a good project :) Sherurcij 21:39, 12 September 2008 (UTC)

I concur with Sherurcij: if it's published, and public domain, we're interested in it. Unlike Wikipedia, we love all the little lost historical details that make up the world. Jude(talk) 21:53, 12 September 2008 (UTC)

Thanks kindly - I'll try to get to adding some of it this weekend. Sarcasticidealist (talk) 21:59, 12 September 2008 (UTC)

What about something which was created both after 1922 and before 1923?

It lists information about Works created before 1923 and Works created after 1922. What if it happened between the two? Everything that happened after January 1, 1922 12:00am, happened after 1922, and that's a whole year of things that happened before 1923. Shouldn't it be changed to be the same year listed? Dream Focus (talk) 09:43, 16 October 2009 (UTC)

After 1922 means (obviously) after December 31st, 1922. Published in 1922 is not after 1922. Yann (talk) 09:45, 16 October 2009 (UTC)

When there is a requirement for school or retirement that says you must be born after a certain year, that always includes those born in that year. Needs to be made specific. I'll go change that now. Dream Focus (talk) 09:58, 16 October 2009 (UTC)

Keep the current system for anything of historical or educational value. For fictional works whose only purpose is entertainment, I suggest we make a certain rule.

If the entire story is nothing more than pornography, and it glorifies kidnapping, torture, brainwashing, and rape of women, encouraging all the young perverts out there to do so themselves, telling them its alright because she'll enjoy it and thank you later, and keep coming back for more, then it shall not be allowed.

I do not believe things such as The Way of a Man with a Maid have any reason to remain here. Regular pornography I have no problem with, as long as it does not glorify and thus encourage rape. Dream Focus (talk) 09:56, 16 October 2009 (UTC)

If you have an issue with a specific text, the best place to bring it up would be through our deletion process (WS:PD) rather than attempting to create a new guideline specifically to exclude it (which will likely be rejected, as we haven't yet written guidelines based on content, nor do I think it's likely that we will do so in the future). Jude(talk) 11:26, 16 October 2009 (UTC)

Whether it gets rejected or approved depends on whoever is around at the time to comment on it, and decides to do so. Please post either Support or Oppose. Dream Focus (talk) 18:14, 16 October 2009 (UTC)

We're not really a democracy in that fashion; straw polls won't resolve much. Jude is explaining to you that if you want this specific work deleted, PD is the way to go. If you want to change WS policy on censorship, you probably have five years of committed editing on the project ahead of you and a very impassioned reasoning. Sherurcij 18:50, 16 October 2009 (UTC)

Further to what Sherurcij said, we're also not Wikipedia. We do not work like they do, and for the most part, we don't rely on polls or votes (except at WS:PD and WS:COPYVIO, and even then, each vote is taken into careful consideration by the closing admin) to decide "consensus". We rely on sane discussion instead. Jude(talk)

There is no such thing as a hundred-year old work whose only purpose is entertainment. They're all part of history, and an insight into older days. I don't think that anyone is coming to Wikisource for dirty texts; there's a lot of porn out there on the net, and perverts don't have to wade through the decent stuff to find it. Your rule is narrow, convoluted, and purely subjective. It's a library, it's not censored, and there's no reason for it to be.--Prosfilaes (talk) 22:35, 16 October 2009 (UTC)

Seems to me that there is very much an expression of POV. The book is of its time, and of its place. Whether one likes it or not, for us to start making value judgements is not what we are about. -- billinghurst (talk) 08:51, 17 October 2009 (UTC)

In W:Melvin T. Brunetti, I am using as a source judge Brunetti's obituary as published by the United States Court of Appeals for the Ninth Circuit. The obituary is a PDF file hosted by the Ninth Circuit web site at [4]. Unfortunately, the URL is obscured by the Javascript mechanism used by the site, so I cannot provide a link to it.

The obituary is a news release of the Public Information Office of the Ninth Circuit, and as such is a work of the U.S. government, not covered by copyright under 17 U.S.C. 105.

I'm fairly active on Wikipedia, not so much here. Is this an appropriate PDF file to upload here? TJRC (talk) 02:42, 4 November 2009 (UTC)

Rather than a PDF file, can I ask that you look at Help:DjVu files and then use the Any2DJVU site to convert the PDF to DJVU. You can then upload the file to Commons, and we can set up an Index page as per Proofreading. Sounds like a nice file, and do feel that you can come either to Scriptorium or to my talk page for any assistance to get this online. -- billinghurst (talk) 03:17, 4 November 2009 (UTC)

I think I don’t understand the Works created after 1922 section. It sounds like it’s saying some works created after 1922 are allowed to be included. It sounds like it’s saying Documentary sources such as diaries are not copyrighted, but I doubt that’s true. It also sounds like it’s saying Analytical and artistic works made after 1922 are not copyrighted, but I doubt that’s true also. So maybe “Documentary sources” and “Analytical and artistic works” are supposed to be sections rather than subsubsections. Maybe the “Works created after 1922” section is just saying that 1923 is not prior to 1923. --Chucky 01:58, 11 December 2009 (UTC)

It's not about the copyright status. Anything of a sufficient age that someone types up we'll accept; 1923 is an arbitrary line. But while we want to include good modern material when the law lets us, we don't particularly want to open the doors to just whatever. Personal diaries, all sorts of unnotable ranting and raving, etc., unpublished (and unpublishable) fiction, we don't want. So this is our attempt to define exactly what we do want of a relatively recent nature.--Prosfilaes (talk) 02:46, 11 December 2009 (UTC)

Would it be okay to clarify the language about pre-1923 works? It's my understanding that all works published prior to 1923 are in the public domain, but some works still unpublished but created up to 120 years ago are still copyrighted; this is due to the U.S.'s 120-year rule (thank you, Sonny Bono) for unpublished works where the author's date of death does not apply. I would suggest striking the phrase "or created but never published" and adding a sentence such as "Unpublished works from this time are also acceptable, though they must be verified to be in the public domain—some remain under U.S. copyright and cannot be included." —LarryGilbert (talk) 15:43, 6 January 2010 (UTC)

[5] looks to be only unknown authorship, or anonymous, unpublished works that have 120 years. Known author, if unpublished the date is 1939 (70 year rule). billinghurst (talk) 16:42, 6 January 2010 (UTC)

University Working papers/discussion papers are usually reviewed by somebody before being published in very limited numbers by a University or more often a University Department. They usually remain the copyright of the author. Later, if the paper is any good, it will evolve and be published in an Academic Journal; at this point the copyright is usually hi-jacked by the Journal's publisher. The working paper will normally contain most of the work that is published in the Journal. Are working papers (that are not University copyrighted) acceptable as a Wikisource? Logicalgregory (talk) 11:39, 1 September 2010 (UTC)

We would only look to the papers following peer review. FWIW the works should remain the intellectual property of the author, and not be subsumed by a university or publisher, unless there is a contractual obligation that applies the intellectual property differently, eg. contractual agreement to monopolise the intellectual property of the work for a period of time. In the absence of that, or prior to publication, the author would be assumed to have standard rights as per the respective copyright act(s) of the country of origin.

Some universities and many journals do make a contractual claim on the copyright of the works. A university can even do that prior to publication.--Prosfilaes (talk) 16:57, 1 September 2010 (UTC)

There are legal claims, and other claims which may or may not be supported by law; that is not something on which we give advice. Otherwise, isn't that what I said? — billinghurstsDrewth 02:21, 2 September 2010 (UTC)

The primary sources for hardware and software dates are the announcement letters published by the vendors. The recent ones are typically available on the vendors' web pages, but older ones may not be readily accessible. Does Wikisource policy permit storing scanned product announcement letters? If so, what is the cutoff date? Chatul (talk) 09:47, 28 September 2010 (UTC)

Not typically what we are hosting. Generally our work passes a peer review test, i.e. through publishing. It does not sound as though the subject matter that you raise would be within our scope. — billinghurstsDrewth 11:58, 28 September 2010 (UTC)

I might be interested in making a case for them, except for the fact that copyright law protects them for at least 70 years.--Prosfilaes (talk) 18:16, 28 September 2010 (UTC)

This page does not seem to mention open-access materials but I saw that a few open-access articles are already present here (e.g. those currently listed in the category PLoS ONE, which are all licensed CC-BY). Has there been a discussion on whether and how to import such reusably licensed materials more systematically? I would like to work on that as part of a Wikimedian in Residence on Open Science project. Thanks for any pointers. -- Daniel Mietchen - WiR/OS (talk) 15:16, 9 August 2011 (UTC)

Let's say that I perform a clinical study, submit it to an independent peer reviewing entity (such as journalprep.com) that rates its quality for a fee - can this study be included in Wikisource, provided that the independent quality rating can be verified by a link or scanned certification? Mikael Häggström (talk) 18:12, 29 November 2011 (UTC)

In my opinion peer review would make it akin to a publication, and that is pretty much the standard that we are looking to achieve. After that, I would encourage that we use the OTRS system (as a guide see Commons:Commons:OTRS) to park the permission to publish, and you could include a certification there if desired. — billinghurstsDrewth 20:40, 29 November 2011 (UTC)

I changed the introductory note in this article from "the free library" to "the free library that anyone can improve", because there are many libraries denoted "free library" so the "the" was inappropriate for denoting Wikisource. Mikael Häggström (talk) 17:45, 3 December 2011 (UTC)

I agree with the change. My first thought was that there are other free libraries that anyone can edit, so even that doesn't uniquely identify us, e.g. Project Gutenberg also has a wiki. John Vandenberg(chat) 22:08, 3 December 2011 (UTC)

I've noticed, though, that The Free Library seems to be the general slogan of Wikisource (such as in the infobox at http://en.wikipedia.org/wiki/Wikisource), so it may be a more established phrase than I originally thought. I think I can accept either version of the phrase in this article. Mikael Häggström (talk) 06:05, 4 December 2011 (UTC)

Under what conditions could the contents of one of the older(?) versions of the Latter Day Saint Endowment Ceremony be included in Wikisource? I'm not sure what other documents created more than 100 years ago would fall into a similar situation where an existing non-government entity would object to the publication; maybe the Masonic initiation ceremonies? Yes, I know that Wikisource is not a stand-in for Wikileaks. Naraht (talk) 16:28, 1 March 2012 (UTC)

GO3 brought up a point at Wikisource:Administrators'_noticeboard#Nominating_to_get_OTRS_feed where he felt that WS was at risk from the "wingnut brigade" if we don’t limit to traditionally published sources. It is clear that WS has been targeted for some self promotion (example), and potentially will be again. There is also a strong trend towards self publication in electronic format only, which will potentially continue to climb. This seems like a good time to begin defining guidelines for publication for inclusion in WS. While this is clearly a topic for general discussion at Scriptorium, I thought a first pass through a smaller group might help set some parameters and define some beginning ideas for the conversation there. JeepdaySock (talk) 10:59, 24 August 2012 (UTC)

I would treat electronic publication a lot like print publication. That is, normally exclude self publication in vanity press and personal blogs/websites while potentially including anything published via a reputable third party (so, academic papers such as PLoS, online articles from otherwise established publishers, etc). That leaves a grey area in the middle and requires personal judgements of "reputable." Maybe we can steal some of Wikipedia's "reliable source" policy? That would, however, risk going into 'pedia style conflicts of opinion. - AdamBMorgan (talk) 11:57, 24 August 2012 (UTC)

Without regard to peer review, previously unpublished scientific research is acceptable to include in Wikisource if an author meets Wikipedia:Notability (regardless of the actual presence of a Wikipedia article on the author) and the work is released under a Wikisource compatible license.

I am not always the best wordsmith, so would welcome feedback or changes. JeepdaySock(AKA, Jeepday) 11:35, 9 November 2012 (UTC)

These as well as any artistic works must have been published in a medium that includes peer review or editorial controls; this excludes self-publication. or have been copyrighted and the copyright has expired

Support I agree on the rationale. This was never meant to limit our ability to host works on a common sense basis; it was intended to prevent Wikisource from being used for modern vanity publishing ("original research" in 'pedia parlance). My only quibble is that the new wording still technically excludes some works: early science fiction fanzines, for example, have historical worth as old vanity press pieces and were not copyrighted at the time. That isn't really important right now, of course; it's just something I had thought about for the future. Assuming Wikipedia's "Ignore All Rules" rule applies, it could solve the problem if it ever comes up. - AdamBMorgan (talk) 11:49, 9 May 2013 (UTC)

In most modern legal frameworks, copyright is automatic, so the proposed clause would be largely void of meaningful content. If we wanted to bring Leaves of Grass into the fold, we could simply change the wording to "this excludes works that have only ever been self-published." Hesperian 11:56, 9 May 2013 (UTC)

Alternatively, add an exemption for sufficiently old vanity press (because, with time, everything eventually becomes acceptably historic). Not actually specifying how old would give us more flexibility in applying the rule either way. If it does need a number, 1964 is a useful benchmark (the year US copyrights became automatic) or something along the lines of a 30-year rule. - AdamBMorgan (talk) 12:08, 9 May 2013 (UTC)

As everything now is automatically copyrighted, an age limit would automatically be applied until the copyright expires. Not seeing the rationale for the 30 year rule, as everything after 1964 is auto copyright and everything before is 50 years old. Jeepday(talk) 23:28, 25 May 2013 (UTC)

We have some self-published post-1922 works too, such as Gadsby, so it would be nice to clarify this and explicitly give some leniency. Wikipedia gives some examples(1)(2) of highly notable authors who went the self-publishing route. It seems like "vanity press" is trying to imply some other reason for exclusion of artistic works than the face value fact that the author oversaw production of the book, but I'm not sure what other than a measure of notability however subjective that may be. I have a 1962 (1973 reprint) no-copyright-notice political play by a notable author (Fredy Perlman) who was a participant in the print shop that produced the reprint. I also have a 1975-ish (cutting it close) no-notice essentially-auto-biography that accounts for eighty years of activism by a less-notable author. (Yes, I obviously tend towards political history...) What measure of notability of work, notability of author, or inherent notability through age, or lack of any/all of those, might we assign literature like that? Should we have a hard line at which we say, "Sorry, move it to Project Gutenberg"? djr13 (talk) 16:02, 15 January 2014 (UTC)

Hi. I would like to publish my (M.Sc) thesis (which is licensed with a CC-BY-SA license) on Wikisource. It is from University of Milan Bicocca. I would like to know whether an M.Sc thesis falls under "a thesis that has been scrutinized and accepted by a thesis committee of an accredited university." (I imagine that for a Ph.D there is no doubt, but I am unsure about a Master's thesis). -- CristianCantoro (talk) 10:04, 19 October 2013 (UTC)

Predominantly it is for doctorates, though what we are wanting is an indication that a peer review took place and the requisite changes were made, rather than an unreviewed and not updated work. — billinghurstsDrewth 10:53, 19 October 2013 (UTC)

I don't necessarily see that the classification is the pertinent criterion; it is whether a work is in the public domain. We are not Wikileaks, and the discussion about their document collection has been held previously. Search for Wikileaks in WS:S. — billinghurstsDrewth 01:50, 3 October 2014 (UTC)

Well, "Criteria on this page are in addition to copyright criteria, which are described at Wikisource:copyright policy" is the first sentence of the page, so "whether a work is in the public domain", however phrased or defined, repeated throughout the text, is redundant, though that's not always a bad thing. The consensus at [7] is mostly that it's a matter of opinion - mostly in favor of but also several opposed to wikileaks stuff that's PD, e.g. {{PD-USGov}} - certainly no consensus that it's a bad idea or legal no-no now, years later (though there was consensus that it was not a good idea while it was hot). Wikimedia already hosts lots of still-classified PD stuff. None of the sources I mentioned are wikileaks stuff anyway; they're all {{PD-USGov}}.