Hi Eroica! Welcome to your first Wikisource project! I noticed your project on The Annotated "Ulysses". Am I correct in saying this is intended to be a Wikisource-annotated edition of the first edition?

Yes. I am trying to reproduce a clean text with the pagination and layout of the 1922 edition, with annotations confined to the bottom of the page.

There are two main things that would help here. First, we can host books and scans side by side so we can cross-check the text. Do you have access to a scan of the book, or to the book and a scanner? As far as I can tell, the 1922 edition is rare, and I haven't found any scans of it. If you do have a scan, it would be a valuable addition to the project: you could upload it to Wikimedia Commons, or point me to it and I'll upload it for you. If it is a series of JPGs, I can combine them into a single DjVu file for you.

I have a modern Dover reproduction of the 1922 text, so I guess I could scan it page-by-page. But I will need some advice on what to do with the 734 images that this would generate!

I hope that your first major project here comes to fruition. If you need any help, be sure to ask here, on my talk page, or in the IRC channel. All the best, Inductiveload—talk/contribs 17:42, 13 June 2010 (UTC)

Thank you for the advice and kind words. Best wishes, Eroica (talk) 18:10, 13 June 2010 (UTC)

I'll reply here to keep everything together.

Just a word about our "normal" procedure: we don't normally strive to retain things like pagination, as this generally, but not always, is an artifact of the printing process, not the work itself. There are always exceptions and if there is a lot of value in doing it, I'm sure no-one will mind. What we usually end up with for a chapter/section/poem is a single mainspace page, such as this one, composed of several transcluded Page: namespace pages, with the references at the bottom and page number markers on the left. The original pages (complete with footnotes for that page) can be accessed one by one by following that link. Would that be what you are looking for?

If you do decide to scan, depending on what scanning software you have you can either produce a DJVU file directly, or you can scan to JPG, zip them up and upload them to a hosting service like Dropbox or Rapidshare, from where I can download, collate and upload to Commons for you. (I have a script that combines pages, and I can also split pages in half if you scan one spread at a time). The ideal situation for Wikisource is to have the text backed by scans, so others can verify the work. Of course, if you don't feel like scanning 734 or even 367 spreads, that is perfectly understandable, but it will limit the help anyone can give you. Cheers, Inductiveload—talk/contribs 18:42, 13 June 2010 (UTC)
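For anyone curious what that collation step looks like in practice, here is a rough sketch of the command lines involved, assuming the DjVuLibre tools (c44 and djvm) are available; the file names are made up, and the real script presumably handles more options:

```python
from pathlib import Path

def djvu_commands(jpg_paths, output="book.djvu"):
    """Build the DjVuLibre command lines that would convert a list of
    scanned JPG pages into a single bundled DjVu file."""
    commands = []
    page_files = []
    for jpg in jpg_paths:
        page = str(Path(jpg).with_suffix(".djvu"))
        # c44 wavelet-encodes one scanned page into a single-page DjVu
        commands.append(["c44", jpg, page])
        page_files.append(page)
    # djvm -c bundles the single-page files into one multi-page document
    commands.append(["djvm", "-c", output] + page_files)
    return commands
```

Each command list could then be run with subprocess.run; the sketch only builds the command lines so the workflow is visible at a glance.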

Hi Inductiveload. I am starting to get organized to replace the Commons images with better quality versions, to correct and expand the file names according to the Commons naming rules, and add the proper categories.

In all of this, I need your input and kind assistance with the Commons environment, such as the ability to move/rename the existing images to meet the new criteria and to upload replacements. I haven't been able to replace files, renaming seems to be a new option, and Commons is such a huge website that it takes several days for the checkers to act on requests.

My strategy would be: first, rename/move the existing images, categorize them, and apply the changes in the PSM pages on Wikisource. This would buy me time to get up to speed with GIMP for cleaning up the images. I noticed that you use professional software, and IrfanView is just not up to such fine-quality work (my opinion). Then, I would start with the simple images (drawings). For the high-quality images I downloaded the .jp2 zip files.

For the new file names, I would like to use the following naming convention. E.g., THIS FILE would be placed in Volume 17, DjVu page 8, and renamed to any of the following possibilities, according to preference, with the volume and DjVu page numbers zero-padded:

PSM V17 D008 James Clerk Maxwell.jpg or

PSMV17D008 James Clerk Maxwell.jpg or

P17 D008 James Clerk Maxwell.jpg or

P17D008 James Clerk Maxwell.jpg or

(Volume 1 to 9 would be 01 to 09)

The characters at the beginning, indicating the volume and DjVu page, are crucial to identify the image location and to place the images in their order of appearance. The DjVu page numbers are required because these numbers run continuously from the beginning to the end of each volume. Not every printed page is numbered, and the IA numbers use another scheme altogether, making placement very difficult and error-prone.

Sorry for the long post, and your comments are most welcome. - Ineuw (talk) 19:20, 14 June 2010 (UTC)

I know that you're very busy and I went ahead with the naming scheme in the style of PSM V17 D008 James Clerk Maxwell.jpg. Since we have to provide a clear name, my coding should follow the same convention. Have a nice day/evening. - Ineuw (talk) 01:17, 18 June 2010 (UTC)
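The chosen convention is mechanical enough to generate by script; a minimal sketch (the function name is mine, and it only covers the agreed "PSM V## D### Description.jpg" pattern):

```python
def psm_filename(volume, djvu_page, description, ext="jpg"):
    """Build a Commons file name in the agreed PSM convention:
    volume zero-padded to two digits, DjVu page to three."""
    return f"PSM V{volume:02d} D{djvu_page:03d} {description}.{ext}"
```

For example, volume 17, DjVu page 8 yields "PSM V17 D008 James Clerk Maxwell.jpg", and volumes 1 to 9 come out padded as V01 to V09 automatically.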

Hi. I am still very gung-ho about replacing the {{gap}} in the PSM project. I consider the mystery of the unintended indents temporary, and I am confident it will be resolved. If you feel the same way, I would like to request a run of User:InductiveBot. If you respond in the affirmative, and since this is my first time, do I post a request on the bot request page?

As for the graphic rule, I redirected the {{gr}} redirect template to your design, since aside from the PSM, the {{graphic rule}} template is referenced only twice, in a dormant project that hasn't been edited since 2007-2008(?), where it was intended to be a graphic rule but was not. - Ineuw (talk) 17:30, 21 June 2010 (UTC)

Why is it incorrect? I saw this with other editors. I try to portray original text. What is wrong? --Tommy Jantarek (talk) 19:19, 24 June 2010 (UTC)

It's an ugly template that doesn't work well (the spaces it uses are all different sizes for a start: some are "normal" spaces, some are em-spaces, etc). It was listed at Proposed deletions for removal. A nicer way to do it is to use {{gap}}, with a parameter like "2em" to produce a gap that is two em-units long. This is semantically more correct: you are inserting a "gap", rather than explicit spaces. At the start of paragraphs, it is usual to have no indentation, and to apply indentation to the whole text using a div style instead. Inductiveload—talk/contribs 21:19, 24 June 2010 (UTC)

I'm afraid I don't know. I think there isn't one. I don't contribute much to the EB1911, so you'd get a better answer from someone at the EB1911 page. Inductiveload—talk/contribs 18:41, 30 June 2010 (UTC)

No, though as they are generic in their effect, there is absolutely no reason why it couldn't be used, as it gives the same result. They just sprang up through the DNB as part of that project. I will take that conversation to Wikisource talk:WikiProject DNB. — billinghurstsDrewth 23:50, 30 June 2010 (UTC)

We chatted ages ago about the problems I had been having with the sidenotes family, and I fiddled unsuccessfully with the overfloat family. Well, I went back to the drawing board and have come up with yet another set of templates (joy-o-joy), though I think that they are going to be manageable. Still in the final stages; however, have a look-see at Copyright Act, 1956 (United Kingdom)/Part 1 and the use of {{outside}}, {{outside L}} & {{outside RL}}. With this set, I am only planning on bringing the sidenotes to the left. The parameters used are those that you have for overfloat, and the beauty of how I have configured it is that the depth you set is the margin-left factor in the div wrapper. Feel free to pick away and see what is wrong with it. — billinghurstsDrewth 17:02, 17 July 2010 (UTC)

I finished what I can do on the OCR of EB1911: a quick OCR quality report. I can't do OCR for volume 29; the results are too poor, and I didn't find an OCR layer available on the net, though I didn't search a lot. Volume 11 is special: all attempts to download the djvu with an OCR layer failed; see the file history on Commons. All the files without a thumbnail for this volume on Commons are valid djvus and contain the text layer, so downloading one from the history and uploading the text layer with the pywikipedia bot will work, but I have no bot flag (and ThomasV said that the match & split bot doesn't work in this case). Another solution would be to replace this djvu file with another (and hope it'll work...).

Plates and figures: I tried to automatically get the page number and bounding box of each figure in order to extract them, but it's not reliable on monochrome bitmaps. I'll try later with another source for EB1911, but not now; I'm taking a break from this series. Ping me in a few months if I forget to do this :) Phe (talk) 09:22, 12 August 2010 (UTC)

I was wondering if you could look at my monobook and tell me why "indented-page" no longer looks like the prose class (with the 35em column that's centered down the page). It worked last night, but this morning it is broken. I can't explain it, and it looks like you know about CSS so you might be able to tell me what I'm doing wrong. Thanks!—Zhaladshar(Talk) 14:08, 13 August 2010 (UTC)

P.S. The prose class still appears like it should, but something is hosed up with the indented-page class.

Well, this is embarrassing. I was using that earlier this morning and it wasn't working. So I fiddled with it and it was even worse. Now you give me the same thing and it's working! I have no idea what happened, but thanks for your effort!—Zhaladshar(Talk) 14:21, 13 August 2010 (UTC)

Are you sure this is a good idea [1], and the previous diff too? The previous change is a bit troublesome because some people were already complaining that the width was too small, and you decreased it by 3em; what about switching to 40em for max-width (or at least 38 to compensate for the 3em used by the margin)? For the justify removal I don't understand your comment; yes, we are on the web, but the rendering is too ugly without justification: a test page. Phe (talk) 19:36, 13 August 2010 (UTC)

Width-wise, I'm not really bothered. Feel free to change it. The 3em padding is to prevent the numbers overlapping the text on narrow screens. The actual width there isn't what I was worried about, and I do think it can be made a little wider, so feel free to change away. The primary aim of this edit was the change of "width" to "max-width", to prevent horizontal scrolling on narrow displays, plus the left padding.

As for the justification, there is fragmented discussion all over the show about it, and I don't have strong feelings either way (but yes, your test edit is not great). Normally justification is used to reduce the amount of page space used by text, and it needs hyphenation to break words so you don't get long lines with big gaps in them; in a narrow column like this, even the best justified text gets "rivers" of whitespace (and auto-justified text is not the best). If you really dislike it, you can add a line to your monobook/vector.css and have justification back for yourself, or you can revert the relevant change to the common.css (I deliberately made the justification edit a separate edit so it can be reverted if needed). There is a discussion about it on the talk page. If you revert, I won't really be bothered. Inductiveload—talk/contribs 03:21, 14 August 2010 (UTC)

Hi, do you remember this match & split? Were they split from the text version in one volume, so it's normal that no pages are transcluded? If yes, can you mark the Indexes as ready for proofreading? Phe (talk) 16:28, 16 August 2010 (UTC)

Yeah, the text version was just a huge lump of unformatted text. Status has been changed. Inductiveload—talk/contribs 10:23, 17 August 2010 (UTC)

I noted you improving on my cruder attempt with an image, which seems to be getting worse, and your script for background extraction. I'm taking a risk with my old box and fading screen, but this convinced me to get GIMP and start aiming for something like your restoration. Was there anything else you did to get a sharper effect? Also, had you considered using a transparent background? cygnis insignis 17:00, 23 September 2010 (UTC)

Hello Cyg! Sorry for the delay. I used the script here: User:Inductiveload/Remove-background-colour.scm, which is a Scheme plug-in for the GIMP. If you look, there is a commented-out line for an unsharp mask, which can sharpen everything up a bit. However, if the image is noisy, it will make the noise worse, so use with care. I had considered using a transparent background, but that means using PNG, and the file size tends to get big, which is pointless, since I was grabbing the images, which already have JPG compression, off the IA. Hope this helps. Inductiveload—talk/contribs 16:12, 13 October 2010 (UTC)

Thanks, I had some success using the techniques. I couldn't work out how to get the script loaded, but that is probably due to a limitation of the build or, more likely, the user. Converting to PNG can inflate the file size, greatly when there is fine detail; GIMP seems to iron out the JPEG wrinkles that are sometimes fixed by conversion, so maybe it is pointless. cygnis insignis 17:37, 13 October 2010 (UTC)

In theory, you just save the script (as an .scm file) to your ~/.gimp-2.6/scripts directory on Linux, or the equivalent place on Windows (I can't remember where, but it's there somewhere). Then you restart GIMP or click Filters->Script-Fu->Refresh Scripts, and it should appear under Filters->Enhance. Inductiveload—talk/contribs 08:39, 14 October 2010 (UTC)
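The background-removal idea itself is simple, whatever tool implements it. Here is a minimal pure-Python illustration of the concept (not the actual Script-Fu logic, just the idea): pixels light enough to be paper are pushed to pure white, and darker ink pixels are left alone:

```python
def remove_background(pixels, threshold=200):
    """Push near-white (paper-coloured) RGB pixels to pure white,
    leaving darker ink pixels untouched."""
    cleaned = []
    for r, g, b in pixels:
        if min(r, g, b) >= threshold:  # light enough to be paper
            cleaned.append((255, 255, 255))
        else:
            cleaned.append((r, g, b))
    return cleaned
```

A real image tool would do this with a levels or curves adjustment over the whole image rather than a hard threshold, which avoids jagged edges around the ink.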

I calculated the approximate number of paragraphs in the first 92 volumes of the PSM project, and indenting them would require over 100,000 transclusions of {{gap}}. Then I came across a template on Wikipedia which carried a cautionary warning that it's best to avoid this number of transclusions.

Based on this, I asked Hesperian for a list of all PSM pages which contained the "gap". I checked each page, and removed some 300 pages where the template was used in contexts other than indenting paragraphs.

If you are able and willing, could you activate your bot to remove the remaining gaps from THIS CLEANED LIST, which contains 549 pages?

I temporarily changed the indent to "0" in the template to better see what I was doing, but this can be changed back at any time. Personally, I would leave out indents altogether until such time as an inline template resolves the administrators' concerns about spanning over pages (as I understood the problem).

Otherwise, regarding the use of {{nodent}}, I don't know what was decided upon, but I will implement any procedure or template that has been generally agreed upon and recommended by the powers that be. Take care, and sleep well. - Ineuw (talk) 04:27, 2 October 2010 (UTC)

Hi Ineuw, sorry for the delay! I'll put it on my list. Just to be clear: the aim of this task is to totally remove all {{gap}}s from these 549 pages? I agree with removing the indent from the template for now since it seems to disagree with the {{dropinitial}} template. Inductiveload—talk/contribs 16:16, 13 October 2010 (UTC)

Hi, my sixth sense told me to check this post. Honest! :-) About the gaps: yes, please remove the gaps from the list of pages. The list contains no pages where I used the gap in other contexts. Thanks again. - Ineuw (talk) 21:09, 13 October 2010 (UTC)

Hi. Many thanks for the help. When I have proofread all the article title pages up to and including Volume 25 (currently I am at Volume 18), I will request a new gap list from Hesperian, and will re-check. - Ineuw (talk) 20:45, 23 October 2010 (UTC)

Hey, I noticed an issue with Template:US CoA case info that I was hoping you could help me with. It works great for cases in the 2nd and 3rd series of the Federal Reporter, but gives the wrong citation for cases in the First Series. Here's an example. It should be 293 F. 1013, not 293 F.1d 1013. So what we need is for that series field and the letter d to be omitted when the series is 1. I'm not sure how to do this, as my wiki knowledge is really limited to basic formatting. LegalSkeptic (talk) 00:14, 2 November 2010 (UTC)

Hi! I've made the change you requested; can you check if I got it right? I also formatted and commented the template code, so hopefully it is a little easier to read and modify now! Cheers, Inductiveload—talk/contribs 16:58, 5 November 2010 (UTC)
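The conditional being added to the template is simple enough to express in ordinary code. A sketch of the same logic in Python (the function name is mine; the template presumably uses a parser-function #if instead):

```python
def federal_reporter_cite(volume, series, page):
    """Format a Federal Reporter citation. The first series has no
    ordinal suffix ("F."); later series get one ("F.2d", "F.3d")."""
    if series == 1:
        reporter = "F."
    else:
        reporter = f"F.{series}d"
    return f"{volume} {reporter} {page}"
```

So series 1 produces "293 F. 1013" rather than the erroneous "293 F.1d 1013" described above.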

Yes, it looks like they can all be made into {{***}} or {{***|char=.}}. I'll get on changing them over tomorrow. Other than that, should we rename {{asterism}} to {{***}}? Inductiveload—talk/contribs 04:20, 7 December 2010 (UTC)

Personally, I would think that {{asterism}} should give ⁂ which may mean a trawl through the pages. Not something for which I am volunteering at this point. — billinghurstsDrewth 05:32, 7 December 2010 (UTC)

What do you think of this Three Books of Occult Philosophy/Book 1? Do you think we should transclude this TOC on the main page or reduce the current main page TOC to the preface material and links to Book I, Book II, and Book III, with full TOCs there, transcluded from the index?--Doug.(talk•contribs) 05:47, 16 December 2010 (UTC) P.S. I tend to think the latter, full transclusion only on each Book's page.--Doug.(talk•contribs) 05:48, 16 December 2010 (UTC)

Yeah, if the original work has an index system, it is better to use that, so let's move the detailed TOC to the /Book XX subpages. Inductiveload—talk/contribs 20:12, 16 December 2010 (UTC)

Hello, Inductiveload. You have new messages at User talk:Ineuw. You can remove this notice at any time by removing the {{Talkback}} or {{Tb}} template. 02:39, 18 December 2010 (UTC)

In my mind it's simpler than trying to format several thousand quotes in a standard way. This way, you don't need to italicize the work, small-cap and indent the author, make the number smaller and indented, etc., etc. You can just paste the blank template from {{Hoyt quote}}'s doc page and fill it in, and the formatting is done automagically. Otherwise, contributors will be so sick of typing formatting by page 100 that they will stop formatting in a standard way and the work's consistency will suffer. With a template, you can just paste in the blank one, fill in the fields without formatting markup (except for the occasional special case), and the template will do the rest. Moreover, if the formatting needs to be changed, the template can be altered and all quotes will "jump to" and update, which is better than going through 10,000 quotes by hand and changing ":" to "{{gap|2em}}" or whatever. This is almost exactly the situation that the template system was designed for: standard formatting of many identical chunks.

If you would like a button on your toolbar to insert a blank template, I can make one for you. Inductiveload—talk/contribs 17:03, 18 December 2010 (UTC)

I'll give it a try, but I think adding all this formatting will make it more difficult to copy these to Wikiquote. However, I understand that that is of secondary interest to this project. Thanks for the Javascript. BD2412T 20:33, 18 December 2010 (UTC)

Using a template can also help you automate the transfer to Wikiquote, as the data is more structured. When the time comes, I'd be happy to help you by writing a script to transfer quotes across. I already have bits of code suitable for parsing multi-parameter templates. If each quote is in its own template, the script will be able to locate and parse the relevant quote more efficiently, and could then (for example) output ready-formatted quotes in the Wikiquote style (whatever that is, I've never been there). If you think that could be helpful, please get in touch and I'll do my best to help! Inductiveload—talk/contribs 14:09, 19 December 2010 (UTC)

I have just realized what a genius you are. With this template, we can bot-parse all the quotes by theme and by author. Can I ask that an invisible parameter be added for the Hoyt's theme (since all quotes are grouped under a particular theme), and another for the page the topic appears on in Hoyt's? BD2412T 15:58, 19 December 2010 (UTC)

(blush)

If you don't mind typing the topic into every quote's template, you can just type the parameter in:

Because the parameters are named, but there is no code in the template that uses those names, nothing will render onto the page. However, the metadata will be preserved in the Wiki markup of the page, which is what a bot will see when parsing it. Inductiveload—talk/contribs 16:45, 19 December 2010 (UTC)
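The kind of bot parsing described here can be sketched in a few lines of Python. This is a naive illustration (the parameter names author/topic/page are placeholders, and it assumes no nested templates inside the parameters):

```python
import re

def parse_template_params(wikitext, template="Hoyt quote"):
    """Extract named parameters from a single {{Hoyt quote|...}} call.
    Naive: assumes no nested templates inside the parameter values."""
    pattern = r"\{\{" + re.escape(template) + r"\|(.*?)\}\}"
    match = re.search(pattern, wikitext, re.DOTALL)
    if not match:
        return {}
    params = {}
    for part in match.group(1).split("|"):
        if "=" in part:
            key, _, value = part.partition("=")
            params[key.strip()] = value.strip()
    return params
```

A production bot would use a real wikitext parser to cope with nesting, but the point stands: named parameters make each quote machine-readable even when the template renders nothing from them.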

Cool. We can parse things out in multiple passes. BD2412T 00:09, 20 December 2010 (UTC)

I notice you adopted the strike-through solution for this; it doesn't work with gap, at least on the platform I'm currently limping around on. It does work with non-breaking spaces, but you would need a bunch of them. Hesperian and I discovered that a series of dashes won't always work either; I have used a box-drawing character, if that helps. Note also that the position is higher or lower with all these solutions, so it looks a bit untidy when there are dashes in the line. I usually just put a couple of em dashes and try to put it out of mind, but I hope you lick the problem. cygnis insignis 18:37, 26 December 2010 (UTC)

How do you mean, it doesn't work with {{gap}}? It works for me: —————— Could you be a little more specific? Inductiveload—talk/contribs 19:08, 26 December 2010 (UTC)

Only the space is rendered on the screen, no line appears. Would you like me to test other browsers? cygnis insignis 19:16, 26 December 2010 (UTC)

How about with a leading and trailing &nbsp;? Like this, which is &nbsp;{{gap}}&nbsp;; that would allow the lengthening, and the gap could start at zero and work its way up in length. — billinghurstsDrewth 04:21, 31 December 2010 (UTC)

Clever, that seems to invoke the line and allow control, at least it does when I tested here. Unfortunately these solutions don't copy/paste, so I will probably continue to use one or two em dashes. cygnis insignis 04:50, 31 December 2010 (UTC)

We do things like that with {{redact}}, so maybe it is a tweak we can make there as an option, to easily allow a strikethrough of a specific length. I have no issues with the double mdash, though somewhere on WS:S there was a one-time comment that a person was seeing gaps between mdashes, which is why the strikethrough typography was presented as a solution. — billinghurstsDrewth 06:06, 31 December 2010 (UTC)

I see a broken line too. It may be the font, but unlikely to be the browser, as Firefox is common, though my setup is not unusual. This is why I tried to find another solution, which is still not ideal. Inductiveload—talk/contribs 06:09, 31 December 2010 (UTC)

The earlier discussion was at User_talk:Hesperian/Archive_4#right. Two em dashes will copy-paste, convey the same information, and are convenient and simple. The risk of a broken line is better than no line at all. In my experience the length of the dash is a product of proper and carefully crafted text justification (in print); I regard this as unachievable and undesirable, so I ignore the problem. cygnis insignis 07:28, 31 December 2010 (UTC)

I thought a little more deeply about Ineuw's suggestion, and perhaps I have arrived at a general, useful statement.

When an HTML block element (table, div, but p and list too!) is split across two or more pages, simply open it explicitly (try it with the p tag!) in the text of the first page, and then close it in the footer section of the same page; then open it again in the header section of the following page, and close it explicitly in its text section. If the block spans more than two pages, open and close it in the header and footer of every page but the last, where you'll close it in the text section.

The transclusion engine will transclude only the first, opening code and the last, closing code; so the block will be merged into one, and any attribute of the opening tag will be applied to the whole block.

There's nothing new about the table and div tags, but perhaps few have discovered that the same trick works with p and list elements too. That's all! --Alex brollo (talk) 08:39, 30 December 2010 (UTC)
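A sketch of the trick for a paragraph split across two hypothetical Page: namespace pages (the page names and styling are invented for illustration). Because the header and footer sections are not transcluded into the mainspace, only the opening tag from the first page's body and the closing tag from the second page's body survive, merging the two halves into one styled paragraph:

```
Page:Example.djvu/10
  text:   <p style="text-indent:2em">First half of the split paragraph ...
  footer: </p>

Page:Example.djvu/11
  header: <p style="text-indent:2em">
  text:   ... second half of the paragraph.</p>
```

Viewed individually, each page is still well-formed, because the footer and header supply the missing close and open tags.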

Thanks for the note! We have some split templates such as {{block center/s}} and {{block center/e}}, but I hadn't considered using it for <p>, I shall certainly keep it in mind when I next run into that kind of problem. Inductiveload—talk/contribs 01:16, 31 December 2010 (UTC)

I don't know if you too use "the theatre convention" used on fr.source and it.source, using ; and : tags for dialogue; if you do, take a look at the code of it:Pagina:Dialogo della salute.pdf/20 to see an example of how good, simple and effective the result of our "discovery" is when applied to the HTML list elements <dl><dd>. A really fresh use: Xavier121 applied it yesterday, when our "discovery" dates from only two days ago... The it.source community is small but fast! :-) Then a suggestion for the new year: if you like, pay attention to what's happening in a new, small, but very interesting source project: vec.source. Candalua is something of a genius IMHO! --Alex brollo (talk) 07:40, 31 December 2010 (UTC)

Your user page is something like a candy store! I'll have to study and test all of your tricks and scripts! --Alex brollo (talk) 08:25, 30 December 2010 (UTC)

:-) I'm glad you think so. There are a few more scripts that are still in the development stage, and they'll certainly get added when they are ready, so keep an eye open for new stuff! Inductiveload—talk/contribs 01:20, 31 December 2010 (UTC)

I have already shared my "discovery" on the it.source village pump, and I hope you don't mind having many pairs of eyes pointed at it. Happy new year! --Alex brollo (talk) 07:28, 31 December 2010 (UTC)

Not at all, thank you for your interest. It is always good to have people seeing or using things I've made. Remember, if you have any suggestions, problems or requests, please feel free to ask me, and I'll see what I can do! Inductiveload—talk/contribs 16:36, 31 December 2010 (UTC)

While studying the template code, I copied it into a personal sandbox, and I added some comments on its talk page. I'm a little worried... a new "header reform" would have to be done on it.source, and I know from experience how hard this task is. Consider that the previous policy was to build a lot of different header templates, one for each project (Literature, Theatre, Scientific works... and many others), and converting them into a single, customizable header template was really hard. The result is terribly complex header code; I hate to review it! :-( --Alex brollo (talk) 08:49, 3 January 2011 (UTC)

Please take a look at the "alternative usage" of {{Left margin}}. I'm implementing the trick in as many it.source formatting templates as I can, and users are using it without great trouble. The big advantage is that such templates are "well formed", since no text content is mixed with the template parameters: the whole text content is "outside" the template code, while backward compatibility is preserved. --Alex brollo (talk) 13:13, 3 January 2011 (UTC)

I just discovered your table on templates. The candy shop has many rooms in it. :-)

A question: in your opinion, how many calls justify the existence of a template? Just some? Dozens? Hundreds? More? --Alex brollo (talk) 16:26, 4 January 2011 (UTC)

I've never used "left margin" before, so I'm probably not the best person to ask about it. Also take a look at {{hii}}, it's quite similar.

As for the number of calls, it very much depends on the function. If it is a very complex object (say, a strange arrangement of stuff particular to one work) that must be kept consistent, then any number over 1 could happily have a template. Anything that may need changing in future will be easier to deal with after templating, as will anything which needs to be computer-parseable from the wikitext (such as headers). A very simple template such as {{ae}} doesn't need a template for any number of uses (we keep it because some people like it, but it gets subst'ed out occasionally). Anything in between is up to you. There's probably no harm in overtemplating, as templates can usually be easily replaced or subst'ed later on, but undertemplating is a recipe for misery if you make a mistake. Inductiveload—talk/contribs 19:51, 8 January 2011 (UTC)

Thanks for the comments about templates in general. About {{Left margin}}: the interest of the edit was not, IMHO, in the effect of that particular template, but as a possibly general trick to convert "opening and closing" templates into "open only/open and close" templates. But probably this trick could confuse users not so familiar with HTML. --Alex brollo (talk) 00:18, 10 January 2011 (UTC)

If you like, take a look at this page: it:Pagina:Hypnerotomachia Poliphili.djvu/135. The alternate text is still to be written (I'm looking for an excellent polyglot...), but perhaps there's something fresh, since it merges the ImageMap extension and it:Template:§, which I imported here as Template:Anchor2. My suggestion, if you like the Anchor2 trick, is to add this code to Common.css here:

That's nice! Shame about the non-compliance, but that's OK for non-critical eye-candy. Would this be appropriate to backport into {{anchor}} or {{anchor+}}? If that can easily be done, I can add the CSS to the Common.css so it will work for anyone with a compliant browser. If possible, I'd rather not have 3 different {{anchor}} templates.

As for the ImageMap thing, that looks cool, how do you set the boundaries? Guesswork and trial and error? Or is there an easier method? Inductiveload—talk/contribs 01:43, 12 January 2011 (UTC)

I have integrated some of the functionality (but not the whole paragraph anchoring, as missing second parameter gives anchor text=visible text) into {{anchor+}}. I have also added the class to Common.css as "HighlightedAnchor". See what you think. Inductiveload—talk/contribs 02:09, 12 January 2011 (UTC)
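For context, the sort of rule this involves is a short CSS fragment along these lines. This is a sketch only: the class name comes from the message above, but the exact property and colour are my assumptions, not the actual Common.css content:

```css
/* Highlight an anchored span when its fragment is the URL target */
span.HighlightedAnchor:target {
    background-color: #ffff80;
}
```

The :target pseudo-class matches the element whose id equals the current URL fragment, which is what makes the highlight appear only after following an anchor link.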

:-) You make me happy!

About the anchor highlighting trick: feel free to test anything. Then I'll come here again and study and import your advancements! Please take into account only that {{Anchor2}} is used in Horses and roads, so leave its parameters and output unchanged (but feel free to edit anything inside, as long as you ensure backward compatibility).

About ImageMap boundaries: there are some easy online tools. I used this one (its URL is wrapped in the code, so your attention may have been drawn away from it): http://www.maschek.hu/imagemap/imgmap; some other tools to explore are listed in the extension documentation. There's some need to review the wiki code produced by those tools, anyway. An important point is to pass any of those tools the link to the full-size image, or a copy of the full-size image. I have already tried ImageMaps on fr.source, where there were some excellent old engravings with lots of explanatory text inside; scientific works have lots of these. Nevertheless, merging ImageMap and highlighting anchors is new.

I downloaded and ran AutoHotKey following your suggestion. Yes, you are right, it's a necessary tool. I'm browsing your scripts too; I'm interested in the OCR layer text. I'll dig a little more inside, then I'll ask for your help for sure. Do you have a link to a good description of the djvu text layer? I can guess some points from "djvu-wise text conversion" and from the djvu.xml files of IA (I use them to extract and upload text into pages with horrible DIY Python scripts), but I'd like a full, formal description of it. --Alex brollo (talk) 08:43, 12 January 2011 (UTC)

Yes, I have never seen ImageMap and highlighting paired up before. Very Web 2.0!

I'm afraid I don't have, and don't know where to find, a full technical description of the DjVu file format. I just use the available tools (DjVuLibre) to handle them. Inductiveload—talk/contribs 00:11, 15 January 2011 (UTC)

Good! I uploaded into {{Anchor+/Sandbox}} my version, recovering the external span and whole-paragraph options. It should be fixed so that there's no need for parameter 2 when the span or p options are used. IMHO it's very important to avoid passing text as a parameter when possible. When parsing wikitext with a bot, or browsing text to fix anything, it's really important to have as much text as you can outside templates! The usual general template syntax mixes attributes and contents... a clear example of "NOT well-formed code". --Alex brollo (talk) 11:31, 12 January 2011 (UTC)

I have been looking through the Annual Reports of the Interior, which are from the US in the late 1800s. There is only one volume on here, and I would like to bring them all on... but the files are only one page each (you can't download the whole report at one time). I was wondering if your bot could help me out with putting all of the separate files together into an index file?

Certainly, page-wise scraping is one of the things I have recently been doing with the bot, and I'd be happy to help. Could you please furnish me with a link and as much relevant detail (e.g. number of volumes, length of volumes, etc.) as you can find? If you want real-time discussion, feel free to drop in on IRC where I am often around. Cheers --Inductiveload—talk/contribs 15:14, 14 January 2011 (UTC)

The problem above (see my talk page for more info and links to the catalogs in question) is an old one (Scriptorium archived by now) and concerns the Hathi Trust archives. Even when I was able to download an entire volume of one thing or another, the PDF would be so bloated that any attempted online djvu conversion would eventually hang. It would fall apart before finishing locally as well.

Within the past few weeks, the ability to "trick" the Trust and download entire volumes as 1 PDF was blocked. No solution has been found AFAIK. I was an idiot and didn't download much of what could now be easily matched & split/converted to pages on the Executive Order project (see Juris database & HathiTrust sections) too. -- George Orwell III (talk) 01:25, 15 January 2011 (UTC)

Hmm, this looks like I can scrape using a page-by-page method. I'll look into it, thanks for the info. Inductiveload—talk/contribs 01:15, 15 January 2011 (UTC)
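The page-by-page method amounts to a loop over sequence numbers with a polite delay between requests. A sketch under stated assumptions: the URL pattern below is a placeholder of my own, not the real HathiTrust one, and the delay value is arbitrary:

```python
import time
import urllib.request

BASE = "https://example.org/scan"  # placeholder; the real site's URLs differ

def page_url(volume_id, seq):
    """Build the URL for one page image (this URL pattern is hypothetical)."""
    return "%s/%s?seq=%d" % (BASE, volume_id, seq)

def grab_pages(volume_id, n_pages, delay=2.0):
    """Fetch pages one at a time, sleeping between requests so as not to
    trip per-minute rate limits."""
    for seq in range(1, n_pages + 1):
        with urllib.request.urlopen(page_url(volume_id, seq)) as resp:
            with open("page_%04d.jpg" % seq, "wb") as f:
                f.write(resp.read())
        time.sleep(delay)
```

The per-page images can then be bundled into a DjVu for upload.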

You didn't have to put the OCR's in. - Tannertsf (talk) 18:16, 16 January 2011 (UTC)

OCR'd text allows for matching and splitting. If you have the text for each page (separately), then we can skip OCR, and paste the text in by bot. If all you have is the text as one big block, then you need an OCR layer for the match and split process to work. Inductiveload—talk/contribs 19:23, 16 January 2011 (UTC)

I saw Hathi has the OCR already done (and better than mine), so the script now inhales that text and adds it to the DjVu. This will make it easier to edit, as it will pop up when you create the page. Inductiveload—talk/contribs 21:27, 16 January 2011 (UTC)
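For the curious: the text that djvused accepts for a page is an s-expression of nested zones, and the simplest valid form is a single whole-page zone holding all the text. A hedged sketch (the helper is mine, and it emits only this minimal one-zone form, not the per-word boxes a real OCR layer would have):

```python
def page_sexpr(text, width, height):
    """Build a minimal hidden-text s-expression for one page: a single
    zone covering the whole page (DjVu coordinates, bottom-left origin).
    Backslashes and quotes must be escaped inside the quoted string."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '(page 0 0 %d %d "%s")' % (width, height, escaped)
```

The result, saved to a file, can be loaded with something like `djvused book.djvu -e 'select 5; set-txt page5.sexp; save'`.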

Not sure whether and/or exactly how useful the template is, however, it is worth a look for a quick generation of subpages. It also might be something that we can subst: and apply some smarts for quick generation of a reasonable subpage listing.<shrug> Anyway, I am off to import the pages and tidy to style. — billinghurstsDrewth 22:30, 14 January 2011 (UTC)

Hi. To begin with, thank you for the wonderful shortcut system template {{ts}}, of which I’ve been making extensive use. I added definitions on the basis of frequency of need, and yesterday I added two more definitions that don’t seem to work, although I am quite sure that I did it correctly. They are: "fwn" for font-weight:500; (normal) and "fwb" for font-weight:600; (bold), both with the words and the numeric values, and they are not working, and I don’t know what I am doing wrong. Could you kindly enlighten me at your leisure? There is a sample table here to try. Many thanks in advance. — Ineuw talk 22:18, 15 January 2011 (UTC)

fwiw - I have always been under the impression 400=normal & 700=bold. After changing the parser to these values I'm seeing your table "bolder" under IE 7 & 8 (of course no change under IE 6). To my crusty old eyes, however, there is little to differentiate the "boldness" between the two values when rendering your table under IE7/8. -- George Orwell III (talk) 23:05, 15 January 2011 (UTC)

Hi GO III. I don’t think it’s a browser issue (am using Firefox 3.6.13 and just now I tried IE 6 and there is also no difference). However, I just altered one of the shortcuts to {{ts|font-weight:bold}} in the above mentioned sample and that works. The problem seems to be elsewhere. — Ineuw talk 01:47, 16 January 2011 (UTC)

Fixed it. It was a typo on the /parser page that I missed at first-look too. We had fwn right but had fsb as the parameter instead of fwb. -- 02:06, 16 January 2011 (UTC)

Gracias. . . . and I looked and looked (and looked) for a typo last night, and didn’t see anything wrong. :-) Usually, that’s the first thing I look for, because I am prone to such errors. :-) Again many thanks. — Ineuw talk 02:18, 16 January 2011 (UTC)

After some reflection, I'm thinking returning to the text parameter values (normal & bold) over the current numbered ones might be appropriate and in line with the rest of the parameter naming scheme being used. If we want font-weight to equal '600' rather than 'bold', then a new parameter named fw600 should be added, in my view. What do you think? Should I revert the numbers for text? -- George Orwell III (talk) 02:22, 16 January 2011 (UTC)

Not at all - my original preference was the words. I ended up using numbers in desperation (while still not seeing my error), and the values of 500 and 600 I looked up somewhere on one of the wikis, but if one checks the w3schools info, the number range is quite wide. — Ineuw talk 02:34, 16 January 2011 (UTC) - P.S.: In fact, words may be better for different browsers. — Ineuw talk 02:35, 16 January 2011 (UTC)

Currently, if you create a new section on a talk page in mainspace and there is no {{textinfo}} on the page, it will preload one. This doesn't seem wanted: it should only preload if the page is blank, as we don't want {{textinfo}} ending up way down the page and, more importantly, it adds two keystrokes to get rid of it. For an example, click "Add topic" on Talk:The Tragedy of Hamlet, Prince of Denmark. sDrewth said you'd be the best to fix it and Jay said "simple solution is to check whether url.href includes 'section=new'" in the preloader gadget.js but didn't volunteer to make the change. ;-) —unsigned comment byDoug (talk) .

Pardon (not sure if this is similar or not) - is there any way to stop offering up header2 instead of our normal header template for new pages created without any header template - I vaguely recall continued use of the h2 redirect was "pointless", to put it mildly.... or is h2 being used to track newbies maybe? -- George Orwell III (talk) 00:33, 17 January 2011 (UTC)

I don't get header2 in my preloaded pages. Is it lingering in some JS of yours? Inductiveload—talk/contribs 02:52, 17 January 2011 (UTC)

That's what I meant about not being sure - I don't "see it" here either, no matter what I try. Still, I keep seeing header2 being applied upon first creation, and it is noted as such in the edit summaries under the history tab too. I stopped looking for why & just thought a handful of infrequent contributors were still manually applying/copying/pasting it somehow in error. Seems like all this does now is artificially bloat the number of links/transclusions back to the main header template, so I figured I should mention it after reading the above. Is it possible they are merely copying it from the ambox bang-message for lack of a header template or something? -- George Orwell III (talk) 03:28, 17 January 2011 (UTC)

I have been through every blinking feed-in/base page at WS looking for any remnants of "header2" and believe that I got them all over a year ago. So it is either off-site information, or copying of other templates. So short of asking someone where they found it, we are going to continue to grope around looking for the outcome. I thought of writing a filter to look for/identify/manage, however, it never got further than a "wannabe". — billinghurstsDrewth 05:11, 17 January 2011 (UTC)

Irksome, isn't it? What is it up to now? ~8,000 or so redirects? There is a case here to temporarily split it off and make it a redundant stand-alone to header until a bot can make a dent in converting a slice of pages from header2 to header. -- George Orwell III (talk) 05:23, 17 January 2011 (UTC)

I ran the backlinks script more or less successfully. It only spits out about a dozen pages, is that right? Maybe you've taken care of all others by now. Of the ones it spits out, some actually have a backlink, so I don't follow. The output I got from my first iteration is here, my second iteration is here.--Doug.(talk•contribs) 11:11, 16 January 2011 (UTC)

OK, as we saw, those outputs were failing after only a few pages due to a page that didn't exist. The revised script is running and I'll place the output here when it finishes.--Doug.(talk•contribs) 19:11, 16 January 2011 (UTC)

Amazingly, we don't appear to have page scans here or even on Commons for Hamlet! There is one DjVu on Commons, but it's in Catalan! Additionally, our text may have issues, judging from the comments on the talk page. I think we should prioritize getting the copy from SCETI; it's in the table I've been working on in your userspace, it's from the 17th century, and it appears reasonably good: where it leaves out dialogue for the stage, it still notes the original in parentheses. Should we add a parameter to the table for priority? I'm just putting them in as I work down the list.--Doug.(talk•contribs) 11:28, 16 January 2011 (UTC)

Encouraged by your links, I'm going deeper and deeper into the DjVu text layer as produced by djvused's output-txt option, with interesting, promising results. Is there a page where we can share this kind of discussion? --Alex brollo (talk) 00:43, 17 January 2011 (UTC)

I met User:Hesperian, another fellow interested in this kind of stuff. VERY interesting. I'm writing up my layman's experience here: User:Alex brolloBot/WYSIWYG djvu project, just a sandbox that will be moved if something interesting emerges. The most interesting and new idea could be to add a metadata layer to DjVu files (metadata fields are free-form; data from the header and information templates would obviously be interesting, but I presume that the whole wikitext of a validated page could be uploaded into a metadata field!). Another interesting thing IMHO is that the reCAPTCHA stuff is now clear: it's only a matter of coupling a mapped image of words with the underlying text. Something that could be implemented for internal Wikisource use. --Alex brollo (talk) 09:26, 25 January 2011 (UTC)

I'm uploading them one volume at a time, as and when I have time to grab and convert the images. Seven volumes are ready for editing, which should be sufficient for now, until I can get them all uploaded. Inductiveload—talk/contribs 02:05, 21 January 2011 (UTC)

Pardon - I need to get in on some of this scraping action when you get the time. The largest & most pressing publications are listed with their Hathi Trust links over in my Needed section. -- George Orwell III (talk) 05:53, 21 January 2011 (UTC)

Sorry for being so short with you yesterday - didn't have the time or energy to get more into it than that. :-(

The thing is not so much to do with using the template - it has to do more with organizing and collecting the relevant citation info (which is just as freely lacking on the internet as the actual content is). Once that info is collected, the normal header could be "swapped" in if need be.

The real part that I could use some input/help with is creating the date ranges of the 44 Presidential Administrations that this and other templates can query for an array(?) of defined values, returned for use as parameter values in "sister" templates or outputs of a repetitive nature found in government serials/collections. We've done this with the signature dates of Executive Orders and the authoring Presidents to automate link creation, associate signature image files, create accurate lists, etc. This was a less-than-elegant attempt by me to utilize a slice of the existing WS content that I found when I first came across WS and the literal mess that was the EO category at that point in time (look at it now and you can see works of year xxxx and authorship of the work are strictly tracked). The most efficient way to generate similar template-generated results would have been to query Administrations (only 44 presidents) rather than check whether an EO number is less than or equal to a range of EOs per a certain President per particular year in office (over 13,500 orders to date). Setting up 44 separate "databases" (arrays?) based on start and end dates of Administrations is what I was driving towards in theory.

Let's take President Lincoln's Administration (potus_16, i.e. the 16th President) for example. The start and end dates of his Administration's term in office were March 4, 1861 to April 15, 1865. If this date range existed in some form for any template to determine whether an input date of some government work falls within that range, a whole host of authorships could be attributed/output/catalogued more easily for various WS-hosted works pertaining to Lincoln and that era: Acts of Congress, Proclamations, etc. that he signed during that date range, works by members of his Cabinet/Administration made during that date range, biographical, historical, etc. works covering that date range, and so on... I'm sure you get the premise.

Any guidance/examples on even just getting a specific mm-dd-yyyy input to return the basic WS link of the Author: page of the President whose term-in-office date range that input date happens to fall within would be all that I need to get started, and then I could work backwards from there, hopefully. TIA. — George Orwell III (talk) 02:28, 23 January 2011 (UTC)
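Outside of template syntax, the lookup logic described above can be sketched in a few lines. Only Lincoln's dates below come from this discussion; the rest of the 44-entry table would need to be filled in from the official lists:

```python
from datetime import date

# (term start, term end, Author: page name); Lincoln's dates are from the
# discussion above -- the other 43 administrations are left to fill in.
ADMINISTRATIONS = [
    (date(1861, 3, 4), date(1865, 4, 15), "Abraham Lincoln"),
]

def author_page_for(d):
    """Return the Author: page of the president whose term-in-office
    covers date d, or None if the date falls outside the table."""
    for start, end, name in ADMINISTRATIONS:
        if start <= d <= end:  # the two-way comparison discussed below
            return "Author:" + name
    return None

author_page_for(date(1862, 9, 22))  # -> "Author:Abraham Lincoln"
```

This is the same two-way (start AND end) comparison the #ifexpr version needs, just without the #time quirks around pre-1970 dates.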

That's OK, you looked busy at the time ;-). I'll give this some thought. The "best" (IMO) way would be to use {{#time:U|1 May 1980}} to generate a Unix time stamp (seconds since 1 Jan 1970), and use the output to govern the selection of a president. However, the MediaWiki extension required falls down for negative dates. For example, {{#time:U|1960}} gives about -310,000,000 (correct) but {{#time:U|1950}} gives the time right now. I'll ask in the MW IRC channel and see if this is a known defect. Inductiveload—talk/contribs 21:41, 23 January 2011 (UTC)

Actually, it seems to work, it was parsing the year as a time. So, where do you want me to put the template? I don't have the year data to hand, but I can do a couple of them. Inductiveload—talk/contribs 21:54, 23 January 2011 (UTC)

I'm more than a little confused - if we can get away with throwing away months and days altogether for template queries, and using just the {{{year}}} for all the years where no transition from one president to the next took place (i.e. the day the president took the oath of office), wouldn't that make life a whole lot easier for starters?

Also, don't both "greater than start date" AND "less than end date" have to be true at the same time to result in the default assigned value? I know I'm not saying it right, but I hope you can glean what I am really asking here while you laugh. — George Orwell III (talk) 02:12, 24 January 2011 (UTC)

Well, yes, but you'd still have to handle the cases when a transition did happen. And yes, you do need to do a two-way comparison, as you can see I've attempted at the page I linked. However, looking at the potus-eo/data template, it is basically the same thing, except I didn't realise you could use the "and" operator in an "#ifexpr", so yours is neater. So, yeah, all it needs is conversion to run using #time rather than EO numbers, and that can be our template! Inductiveload—talk/contribs 03:35, 24 January 2011 (UTC)

I've used the "and" syntax at Template:Potus-eo/sandbox/data now, you can take a look (but I think a date is wrong somewhere). However, I am concerned that there are very many #ifexpr statements and that it is a bit clunky. At least, we probably want a wrapper template to handle the calling of #time so we only do it once. I'm off for now, Inductiveload—talk/contribs 03:44, 24 January 2011 (UTC)

Thanks for that. I'm fairly sure that something is being overlooked & needs more tweaking; a larger negative number translates to going further back in time, and at this hour I'm not much good at keeping my math straight. I'll come back to it later too. — George Orwell III (talk) 06:29, 24 January 2011 (UTC)

←Getting back to Potus-eo & standard header conversion, I've tested dozens and dozens of possible settings, and the one that "most-closest-maybe-almost" contributes to the possibility of conversion at some point is up in your sandbox3 (it points back to one of my sandboxes). The problem is still the same as before & lies with the two-table approach, basically. I believe the current CSS is still based on some old scheme that didn't have any "floating" divs like 'shortcuts' or 'plain sister' in play. I think this would be fine if it was all under one div in one table, but with two tables under the same div, nothing lines up under the current plain old CSS settings, never mind dynamic layouts.

Still not ready for primetime until some of that is addressed :( — George Orwell III (talk) 13:08, 28 January 2011 (UTC)

Wondering if it would be too much to ask... If I [ever] get around to finishing scans of text from Mine and Thine (1904), would you be able to set them up here on WS? (Remember, only the 1905 reprint version is available online) I think I know how to get them into djvu format via the online converter, but how can I get the file to you once I'm done (and if you're willing)? The new Index could have the name, Mine and Thine, Coates, 1904, and once set up, I'll just copy and paste the info from Index:Florence Earle Coates Mine and Thine (1904) to the new index. Thanks again, Londonjackbooks (talk) 12:41, 25 January 2011 (UTC)

I'm sorry, I don't quite understand what it is you would like me to do. Do you want me to add OCR text? Or merge the images into a DjVu? Inductiveload—talk/contribs 13:46, 26 January 2011 (UTC)

The apology is mine for not explaining myself better... There is no 1904 first edition of Mine and Thine (Coates) available online, so I have to scan pages from my book. I'm trying to figure out the easiest, most efficient way to get the JPEG images (once scanning is complete) converted into book format (as DjVu) instead of uploading each image one by one to Wikimedia like I did with Index:Florence Earle Coates Poems (1898), which was very tedious. I have used the online converter (JPEG to DjVu) before, and am able to convert to DjVu, but since my computer can't open DjVu files, I have no way of knowing whether the conversion was successful or not... Hoping you can give me some advice? Thank you! Londonjackbooks (talk) 15:55, 26 January 2011 (UTC)
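For what it's worth, the command-line route with DjVuLibre is just two tools: c44 to encode each JPEG as a single-page DjVu, and djvm to bundle the pages into one document. A hedged sketch in Python (assumes both tools are installed and on PATH; the helper names are mine):

```python
import subprocess

def djvm_command(out_path, pages):
    """Command line to bundle single-page DjVu files into one document."""
    return ["djvm", "-c", out_path] + list(pages)

def jpegs_to_djvu(jpegs, out_path):
    """Encode each JPEG scan as a single-page DjVu with c44 (IW44 wavelet
    encoder), then bundle all the pages with djvm."""
    pages = []
    for i, jpg in enumerate(jpegs, start=1):
        page = "page_%04d.djvu" % i
        subprocess.run(["c44", jpg, page], check=True)
        pages.append(page)
    subprocess.run(djvm_command(out_path, pages), check=True)
```

The resulting bundled file can be opened with DjVuLibre's djview to check the conversion before uploading.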

Decided to continue on the tedious route (why stop now?)... Just waiting for the images to render on their pages... Don't think I did anything differently than before...? Londonjackbooks (talk) 16:14, 17 February 2011 (UTC)

Not a problem! I took a look at your conversion script, and it is over my head, I'm afraid (as are most technical things here on WS, etc.)! I could upload photos more quickly than I could figure out what to do with the script! Perhaps in the future... But I appreciate the direction! Hoping to have the complete works of Mrs. Coates completed here within a couple months... barring distraction! Thanks again, Londonjackbooks (talk) 14:14, 2 March 2011 (UTC)

Thought I would pose this question to you since you have created sub-category disambiguation pages, et al. ... I have started creating {{Versions}} pages for the poetry of Florence Earle Coates (example). When I am finished, I will have near 300 or more versions pages of her poetry, since her Poems Vol I & II (1916) contain poems from past editions which are also listed here at WS. I notice that {{Versions}} and {{Similar}} pages are all "lumped" together into Category:Disambiguation pages, and was wondering if separate SUB-CATEGORIES could be created to make a distinction between the two.

Okay, I think I see... {{Versions}} pages render under Cat:Disambiguation pages, and {{disambig}} pages render under Cat:Mainspace disambiguation pages... {{Similar}} just links articles to a disambiguation page, whether a "version" OR a "disambig"...? Yes? Londonjackbooks (talk) 03:47, 31 January 2011 (UTC)

I have created the sub-category "Category:Florence Earle Coates (versions)", but in hindsight, it doesn't seem like a method you all on WS would think appropriate... Any thoughts? If you approve of the subcategory, should it be placed under Cat:Disambiguation pages or Cat:Author disambiguation pages or other? Thanks, Londonjackbooks (talk) 03:47, 31 January 2011 (UTC)

I don't see the value in that sort of manual sub-category as a means to capture disambiguation pages. What information are you trying to capture, or what information are you trying to extrapolate? That is, what benefits are the outcome of the process? — billinghurstsDrewth 11:11, 31 January 2011 (UTC)

Hmm..It is mostly for my benefit at the moment as I navigate back and forth and make corrections/revisions to Mrs. Coates' {{Versions}} pages; but since I realized that Versions pages do not render under the same Sub-Category as the sea of Similar pages, it shouldn't be as difficult to navigate as I thought... Thank you, Londonjackbooks (talk) 15:18, 31 January 2011 (UTC)

We should be using http://toolserver.org/~magnus/catscan_rewrite.php and embedding a "search" if we need to drill down further, rather than overly categorising. When this bloke is back from whatever sojourn he should be able to help develop a suitable local templated form, it is something that we do need at WS. — billinghurstsDrewth 15:26, 31 January 2011 (UTC)

While exploring tl|Populate, I found that en.source does not (as far as I can see, but perhaps I'm wrong!) categorize works by author. This is one of many differences between it.source's tl|Intestazione and en.source's tl|header; the it.source template does it automatically. Why? Did you discuss this issue? I'm very interested in it, since we perhaps are over-categorizing. Or am I simply missing something? --Alex brollo (talk) 10:30, 4 February 2011 (UTC)

I don't know why we don't categorise by author but I don't think it would be a very good idea, as the same author can have a lot of different names, and we usually handle that by redirecting to a single author page. If the author field was also used to categorise the works, we would have to have a lot of category redirects, which are a strange beast, and it would add another housekeeping task to the (long) list that we have already.

The way we "categorise" by author is to list the work on the author's page, which is, admittedly, prone to mistakes and omissions. But if we had a category for the author as well as the author page, both would contain near-duplicate information - a list of works by the author.

One of the items on my long-term list for bot-checking is to look for any mainspace works which do not have incoming links from author pages. However, I do not often find that a work has not been listed on an author's page; more frequently, I find that the author page doesn't exist, which would not be solved by categorising into a non-existent category.

Also, as far as I can tell, categories aren't thought to be as useful here as they are at Commons (especially) and enWP, and we have an alternative system for "categorising" works - the Portal: namespace, which is a more flexible solution than categories, and is based on (and extended from) a published system (the Library of Congress Classification) rather than an ad-hoc category tree. Inductiveload—talk/contribs 06:23, 14 February 2011 (UTC)

Hi. Finally wrote my first hotkey (phew as I wipe the sweat off my brow), and I would like to ask if you have any Autohotkey scripts used in Wikisource proofreading, and which you don’t mind sharing. It will take me a good while to get accustomed to the AHK help, and relevant scripts would speed up my understanding.— Ineuw talk 00:58, 13 February 2011 (UTC)

Hi Ineuw. I no longer use Windows, so the hotkeys I had before are now implemented using AutoKey (a Python program), but I can tell you some of the ones I used before. Most of my hotkeys were actually hotstrings. The Wiki-relevant ones are here. I also had a full set of Greek characters, though I think the newer versions of AHK have native unicode support, so you might need to redo this one. If not, here is the code I had for Unicode insertion. This one is useful to enter a new hotstring by just pressing Win+H. Inductiveload—talk/contribs 06:10, 14 February 2011 (UTC)

You may be surprised but what you placed in the pastebin is exactly what I needed - scripts in context that I can understand. Thank you very much.— Ineuw talk 09:02, 14 February 2011 (UTC)

Not surprised, actually. I have learned enough new computer things from scratch to know that the first step is the hardest, and often the one with least documentation, as people who know assume it is easy to pick up. Cheers, Inductiveload—talk/contribs 10:11, 15 February 2011 (UTC)

I simply robbed your user vector.js page :-) I like a lot of the Regex menu framework, and I'm using {{Rh}} and Clean up on any page I proofread. A question: I see that both need the Regex menu framework; I think the function rmflinks() comes from that js, but I can't find where the Regex menu framework is called/loaded. Can you help me? And a second question: Custom toolbar buttons.js doesn't run for me under vector. Am I making some mistake? --Alex brollo (talk) 10:08, 17 February 2011 (UTC)

Alex, are you referring to the fact that although the rmf can be imported, more commonly it's a gadget enabled in Special:Preferences#preftab-8 (the documentation links to Pathoschild's user space on meta but even that has never been updated to discuss this option).--Doug.(talk•contribs) 21:23, 18 February 2011 (UTC)

No matter; in the meantime I learned something about gadgets (I was absolutely ignorant about them), and I loaded Pathoschild's scripts into it.source as a new gadget, linked to an edited version of your Clean up and {{rh}} scripts.

I added to the {{rh}} script an IMHO useful series of ifs:

if there's already a {{rh}} in the header, do nothing;

if there's a {{rh}} in the body of the page, move it to the header;

if there isn't any {{rh}}, add an empty one in the header. --Alex brollo (talk) 01:20, 20 February 2011 (UTC)

Hello! Sorry for the delay. Did you find the rmflinks() stuff? If you like the rh script, you'd like the script at User:Inductiveload/Running header.js (it preloads based on two pages ago - very useful). It was originally User:Phe's, so thank him ;-)

My custom toolbars are made for monobook, which I still use after a brief foray into vector. However, as my toolbar is now vector anyway after the MediaWiki update, I'd appreciate it if you could tell me if you find out how to do buttons in vector. Inductiveload—talk/contribs 18:16, 22 February 2011 (UTC)

Yes, I built some buttons under vector; nevertheless, I removed them from my vector.js, since I'm going to develop on RegexMenuFramework. The idea is to put only variables into my vector.js, and to put into RegexMenuFramework the general routines which use those variables. I.e., now, in my it.source vector.js, there are two variables which contain lists with the data needed to build a perfect RunningHeader for the current page of the current work: the general Regex routine works only if it finds those variables, and if the current page is one of those listed.

The code I used to add a button comes from old vector documentation, as soon as vector was presented as an option:

This adds a ref tag button to the main row of buttons, just after the bold and italic buttons. I can't tell you where the doc is, but I found it somewhere in a tech page presenting vector. I apologize for all the wasted space on your talk page, but I feel uncomfortable when I edit copied-and-pasted code: I guess that you can simplify it a lot. --Alex brollo (talk) 16:00, 25 February 2011 (UTC)

While I work on getting set up to do this stuff myself, if you could take care of this one for me: utopia when you get a chance that would be great. Thanks.--Doug.(talk•contribs) 20:09, 18 February 2011 (UTC)

Uggh, all that work you did, then you split them as requested and now the OCR is gone; which I guess was to be expected but makes match and split to the current text kind of hard. :( --Doug.(talk•contribs) 18:26, 20 February 2011 (UTC)

No need to pursue this anymore. I didn't voice my opposition clearly enough at the outset, it was implemented with the assumption there was a demand for it, and I clarified my POV so there would be no future misunderstanding as to my belief that this was not really needed, as well as not being the template or implementation I was originally thinking of supporting. 'nuff of that.

At the very least, let's just drop it on the header talk page 'cause it's not directly related to the header template anymore anyway. — George Orwell III (talk) 18:00, 20 February 2011 (UTC)

Sure thing. Sorry if I sound edgy, I've been running around all day, cleaning floors :-s. If you want to talk about alternative implementations, I'm happy to hear them, as I still feel we need to work on the header area, and more input is always good. Actually, I think one area that we need to work on is the presentation of meta-data, and separation and location of meta-data, internal and external links is part of that. So you may yet see this out of plain-sister. Off to bed, have a good one. Inductiveload—talk/contribs 18:17, 20 February 2011 (UTC)

I made my position known as far as moving forward goes - if the notes field is going to be used for one thing or another at the end of the day anyway, split it off from the green navigation portion and call Notes from header, with everybody's optional parameters hosted within that template instead. This would better align our header with the basic header ThomasV & Co. use successfully with dynamic layouts, etc. over on the French Wikisource site, while still catering to the various specialized parameters (past, present & future). The way things are now is backwards, imo - there is a greater number of pages that will never apply such specialized parameters, let alone have any header notes whatsoever, than there are pages that will apply such parameters. Following the hoped-for principle of separating content from layout, so should form (the green, easily preload-able basic navigation portion) be separated from function (all this newly added stuff being standardized into what amounts to, correctly or incorrectly, the light-blue Notes field). — George Orwell III (talk) 19:10, 20 February 2011 (UTC)

That is very possibly the way to go. I also support the separation of form and function (though my primary focus is on getting the data into a unique place where it can be found easily). Exactly how we would do it is a tricky issue (where to put it, what to include, etc etc), and as I said, I'm open to suggestions, ideas, thoughts, tips, tricks, etc. I don't have plans to start work on it yet, but it is on my long-term wishlist. In the meantime, parameters in the header will serve to collect the data in a "standard" fashion, which will make any modifications to our normal method easier to perform down the road. Inductiveload—talk/contribs 17:47, 22 February 2011 (UTC)

Well, I think you've stumbled over a primary point with the multi-column thing - "traditional" tables are not as flexible as "div" tables, and that is most likely why the French WS doesn't have a header based on traditional tables but one based on divs with the occasional "table-row" when needed. It displays 1000x better under different browsers, fonts, settings, and dynamic layouts, and looks a lot easier to work with IMHO (I don't know how useful it is when it comes to handling/holding data, though). — George Orwell III (talk) 18:42, 22 February 2011 (UTC)

I'm off the Hathi stuff for a while, but I can get back on it in a week or two, when my bandwidth will be available again (away from home now). Inductiveload—talk/contribs 18:17, 20 February 2011 (UTC)

The Hathi thing is better left alone - they've started to put a limit on the number of URLs you can visit within a minute, minute and a half; I'm fairly sure my IP is even being blocked (no access for more than a day now).

Cool, I'll take a look. I'll be back at real broadband next weekend, and then I'll proxy up to the US and see if I can get some for you. Inductiveload—talk/contribs 03:54, 26 February 2011 (UTC)

Update: my script is now proxified (with a variable delay), so I can get on it when I get back to my good connection. Inductiveload—talk/contribs 10:28, 26 February 2011 (UTC)

Alright! Things might actually come together on this. Thanks so much.

Just a quick question - I did manage to download probably the largest volume as a single PDF before they got all bent out of shape about one thing or the other over there. Now that I understand a little more about how time-consuming this voodoo that you all do is - should I bother getting that back? Would the quality be better or worse using the full PDF? I tried that online DjVu conversion thingy (I told my cousin not to touch that laptop until it was done; that was back in November) and I have been feeding the cat anti-freeze and leaving surprises in the mailbox ever since. — George Orwell III (talk) 10:56, 26 February 2011 (UTC)

The process is pretty slow over a proxy, but being a computer, it can just churn quietly in the background. If you think the PDF quality is OK, then by all means convert it one way or another, or put it on Hotfile or something so I can grab it and convert it to DjVu. If not, then I'll grab the images one at a time as I am doing now, and convert directly to DjVu. This is no extra hassle to me, as it just runs in the background (or it will do when I have the exceptions caused by proxying ironed out). Inductiveload—talk/contribs 14:27, 26 February 2011 (UTC)

... is User:CharlesSpencer's "pet" project (his grandfather helped publish it or something). Get him to sign off on whatever it is you're proposing and I'll shut up.

Sure. Having two blocks of text side-by-side like this, rather than a single block in two columns, is a real pig to typeset in web format while retaining the page structure. If I fix it (big if, I'm not even sure there is a way to do it), I'll set one treaty up and see what he thinks. It might need some relatively complex section transclusion to get it to flow well, which he may or may not like enough to use.

In fact, the whole sidenotes system needs a good hard look, so we'll see how it goes. Inductiveload—talk/contribs 14:19, 22 February 2011 (UTC)

p.s. - you do realize the whole point of proposing something is not to justify changes you've already made and let everybody in on it after the fact, but to let the community approve the proposed changes moving forward, right? (Not meant to offend - it just seems you are moving too fast without regard for the process - sorry if I did.) — George Orwell III (talk) 09:59, 22 February 2011 (UTC)

The proposal there is for the deletions, not the changes. If I thought the changes were big enough to warrant consensus (i.e. if the templates were actually implementing different things, or the appearance was going to be altered), I'd have started a discussion. The templates are not eligible for speedy-ing, so they get a discussion, just like the other templates I've pruned in the last year.

I didn't exactly move mountains. In the case of col-begin I just subbed one template out for one that does the same thing, in the same way (tables), just with documentation. If multicol was as critically under-documented and col-begin was most prevalent, it would have been the other way around.

In the case of column-list, I subbed it out for a template which is better documented, more robust, and usable across page boundaries, and which uses the exact same CSS properties to do it. At no point did I find the older templates using parameters not present (and documented) in the newer one, and in some cases I added the parameters from multicol where no equivalent existed before, so some of the works are actually closer to the source than they were before.

These templates were used almost as often for formatting tabular data with columns and <br>s as they were for actual text columns, so those uses were removed and replaced with the proper markup, which is just regular housekeeping.

Whatever you think is best is probably best at the end of the day; however, when individuals start throwing down and arbitrarily determining what is being "phased out" or deemed "deprecated" without any evidence of anyone else's input, I do tend to get worried & will more likely than not speak up.

Why the bunch who imported those templates from WP didn't do the documentation etc. I can't really say, because it's been quite some time since everybody gave up and shelved the treaty-fix ideas. — George Orwell III (talk) 18:27, 22 February 2011 (UTC)

A click creates a filled, or partially filled, running-header template (selecting data for the running page) in the edit box; then a click on the former function moves it into the header. --Alex brollo (talk) 00:09, 4 March 2011 (UTC)

When you get a chance, I'd like some help generating a couple of reports:

1) mainspace pages that contain {{TextQuality}} and either <pages/> or {{page|}}. It appears that the {{TextQuality}} tags can be removed from these.

2) mainspace pages that contain a <div> wrapper and possibly those pages that contain one other than the standard one. The former have a problem displaying the page numbers if they use <pages/>, the latter will result in a double set of div wrappers, at least if the page is opened for editing - now that we have the new system. Also ThomasV says those styles should be applied through the css line on the index page now.

Thanks, I'm a bit busy right now, so it'll take some time to get round to it. {{TextQuality}} is called about 19300 times, which makes botting over it pretty tricky, as I'd have to download each page's wikitext and check it for the page tags. However, it is doable.
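For what it's worth, once a page's wikitext is in hand, the per-page check itself is trivial; a rough sketch (the function name and regexes are my own invention, not an existing bot, and fetching the wikitext via the MediaWiki API is left out):

```python
import re

def is_redundant_textquality(wikitext):
    """Flag a mainspace page that still carries {{TextQuality}} while
    already transcluding scan pages via <pages/> or {{page|}}."""
    has_tq = re.search(r"\{\{\s*TextQuality\s*[|}]", wikitext, re.I) is not None
    has_transclusion = (
        re.search(r"<pages\b[^>]*>", wikitext, re.I) is not None
        or re.search(r"\{\{\s*page\s*\|", wikitext, re.I) is not None
    )
    return has_tq and has_transclusion

# A page with both markers would be flagged for cleanup.
sample = '{{TextQuality|75%}}\n<pages index="Foo.djvu" from=1 to=5 />'
print(is_redundant_textquality(sample))  # True
```

The slow part, as noted above, is downloading ~19,300 pages of wikitext to feed through a check like this, not the check itself.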

If I have to check for all pages with a div wrapper, that's going to be hard, as I'd have to check almost every page in the wiki. I'm not too sure what to do about that. Inductiveload—talk/contribs 03:20, 13 March 2011 (UTC)

No problem, whenever you have time. I need to clarify some of the above before we do it. I am more than happy to run any report but will need help putting together the script. I'm waiting for a reply on ThomasV's (fr) talk page before we move forward with this. I want to make sure I understand the problems.--Doug.(talk•contribs) 11:23, 13 March 2011 (UTC)

Looking again at what I posted before: yes, number 1 is valid, as I think a bot job could safely remove the tag from those pages where both exist; they are more or less redundant and the former is deprecated for that reason, I believe.

The second question above is something I need to clarify with ThomasV, as the divs seem to break things in certain cases.--Doug.(talk•contribs) 05:54, 15 March 2011 (UTC)

Pardon - couldn't stop myself from peeking in... don't know if this will be related, or even helpful, but I figure I should bring it up in case there is already something in place concerning metadata. I believe there are already 2 classes mentioned (but unused?) in the typical Main.css file related to document (meta?)data -- .documentDescription and .documentByLine. I figure somebody put them in there for a reason, and that might lead you to different ideas/clues. Sorry again for the intrusion, at any rate. — George Orwell III (talk) 04:51, 13 March 2011 (UTC)

GO3, you're over my head I think but more than welcome to join the discussion, may want to address this at /metadata to keep that discussion together.--Doug.(talk•contribs) 05:54, 15 March 2011 (UTC)

All I was trying to say was that it seems someone "had" started to do something about collecting page data because in...

...to show something at some point. I thought it might give a clue to existing work/findings was all. Disregard if it's not relevant (metadata box?) I guess. — George Orwell III (talk) 15:35, 15 March 2011 (UTC)

I just noticed your thought on 'headers' in the page namespace, but I'm not sure what you are referring to. I can think of about 3 types of page headers, appearing on the verso and recto, in combinations of: the repetition of the title of the work, the title of the chapter or section, and the author of that section. Occasionally the description is of the content of that page only, a running annotation within a section, e.g. p. 45 "early life", p. 47 "European tour", p. 49 "marries Doris" and so on.

Sorry if this is confusing, or if I misread something; do you have an example of a header which should be transcribed (and transcluded)? CYGNIS INSIGNIS 07:29, 23 March 2011 (UTC)

Ah, I strongly agree. An earlier practice was to throw away the title page, and section titles, and replace them with the {{header}}. I'm still puzzled about what to do with the last example, a similar situation to what I was blathering about at User_talk:Spangineer/archive02#infra_section_links and a model I made at this chapter with anchors. When this sort of thing appears in the page header, essentially the same function as some "side notes", I have thought about adding it to the notes of our header for deeper navigation. CYGNIS INSIGNIS 08:20, 23 March 2011 (UTC)

Regarding the handling of metadata, storing and presenting, I think we need a way to make use of what is already encoded into the djvu: page numbers, author, and bibliographic detail. When I view a scan at IA.org, the 'real' page numbers are presented for navigation. Their top-level page also contains the author and so on. I wish this would 1) autofill the information template at Commons, 2) allow the data to be corrected (it is sometimes wrong), and then 3) autofill our Index form and so on. Copy-pasting the information to both these templates, and the headers, seems silly when the scanner has added it already. CYGNIS INSIGNIS 08:20, 23 March 2011 (UTC)

I could imagine a nifty JS, run from an index page, which goes and sniffs metadata out at the IA (or WorldCat or whatever) and adds it to the page. However, we would need a more comprehensive range of index page fields for the new metadata. This would presumably entail 1) being very nice to ThomasV and 2) some consensus understanding that Index is a de-facto metadata storage area. See also User:Inductiveload/metadata. Inductiveload—talk/contribs 16:38, 24 March 2011 (UTC)
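As a rough, offline illustration of what such a sniffer would do once it has an IA record: the IA's /metadata/&lt;identifier&gt; endpoint returns JSON whose "metadata" object carries keys like title, creator, date and publisher, but the index-field names on the right-hand side below are purely illustrative, not real ProofreadPage parameters.

```python
# Hypothetical mapping from typical IA metadata keys to index-page
# field names. The right-hand names are invented for this sketch.
IA_TO_INDEX = {
    "title": "Title",
    "creator": "Author",
    "date": "Year",
    "publisher": "Publisher",
}

def index_fields_from_ia(record):
    """Pull the fields we care about out of an IA metadata record."""
    meta = record.get("metadata", {})
    out = {}
    for ia_key, index_key in IA_TO_INDEX.items():
        if ia_key in meta:
            value = meta[ia_key]
            # The IA sometimes returns a list for repeated fields.
            out[index_key] = value[0] if isinstance(value, list) else value
    return out

sample = {"metadata": {"title": "A Mission to Gelele",
                       "creator": ["Burton, Richard Francis"],
                       "date": "1864"}}
print(index_fields_from_ia(sample))
```

The JS version would do the same mapping client-side and then prefill the index-page edit form.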

... and have come to the conclusion that border:none & border-[1 of 4 quadrants]:none, applied by itself, is an invalid CSS style parameter. It does not work across the most basic of browser versions, because they interpret it only as no color (also wrong). 07:03, 24 March 2011 (UTC)

Sigh. Removed the options. However, transparent is not a perfect fix, as it leaves some gaps in the surrounding borders, so I changed btt and friends to use a 0px border, which solves the problem. Could you quickly check whether this works in the places you've used it? We may also want to think about removing "bn" and replacing it with "bt". Example here. Thanks for the heads-up. Inductiveload—talk/contribs 16:33, 24 March 2011 (UTC)

Well, "none" worked for me personally all along, but I'm stuck using IE 6 for the most part, so I'm not the best example to go by either. In a nutshell, what came out of all of this the last time this kind of compatibility thing came up was not to "cheat" when it comes to CSS-like parameters - by "cheat" I mean, for example, that we should use margin: 0px 0px 0px 0px instead of the simple margin: 0px some folks still seem to apply (versions of IE need at least 2 values - not just the one - to apply a margin: the first for top & bottom and the second for left & right, for example). I'm sure the same would apply for the equivalent of border:none if all 4 quadrants used all three border values (1px solid none, etc.)

I looked at the linked page and, like I thought, the previous revision and the current look about the same to me. — George Orwell III (talk) 16:52, 24 March 2011 (UTC)

ps - the next time you see that gap issue try "border-collapse:collapse;" as well and it should correct itself then. — George Orwell III (talk) 16:53, 24 March 2011 (UTC)
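To summarize the conventions being argued for in this exchange as a fragment (note that whether a bare border:none is truly invalid is IE-specific lore from this thread, not a claim about the CSS spec; the values shown are illustrative):

```css
/* Prefer fully expanded values over terse shorthands for old IE: */
margin: 0px 0px 0px 0px;        /* top right bottom left, all explicit  */
border: 1px solid transparent;  /* width style color, not a bare "none" */
border-collapse: collapse;      /* closes gaps between adjacent borders */
```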

The Skull of Burton thanks you for your help and improvements! Please continue! :) TheSkullOfRFBurton (talk) 21:24, 24 March 2011 (UTC)

Will your changes fix my problem with {{subpage header}} so I can just throw it at the top of Book II, Chapter 3 and have it automatically populate the "Next" and "Previous" chapters and link the book title and everything? If not, would you be able to figure out what I'm doing wrong? TheSkullOfRFBurton (talk) 23:29, 24 March 2011 (UTC)

It looks like it's working now. Probably you needed to purge the page to allow the changes to other pages to make their way through. I always find it easier to just use a normal header, which is prefilled for you anyway with the title and author, and even the next/previous links if it can find them in the parent page. If not, it's an easy thing to add. The subpage header template is not used often, in general (I'm not sure why it's still around and undocumented). Inductiveload—talk/contribs 23:59, 24 March 2011 (UTC)

The approach of importing a work like Mission of Gelele, which is made up of raw OCR'd text (you can tell because of the character-recognition errors and running headers interposed in the text), probably won't go down well with some users on Wikisource, who like works to be substantially complete before they are even displayed in the main namespace. Personally, I don't mind too much, though I would very much prefer works to be complete! Others subscribe to the broken-windows theory - having incomplete and abandoned text just encourages others to do the same.

The purpose of the DjVu is to provide a reliable reference source for our texts. If you copy and paste only:

There is no good way to ensure the person you transcribed from didn't make a mistake or omission (this is actually not rare at all - it is routine to find that an online source doesn't match the book it is supposed to have come from)

There is no good way to demonstrate (stably and in-wiki) that we didn't make a mistake.

There is no requirement to use DjVus (or PDFs or images) to back up your text like there is at de.Wikisource, but it is generally accepted that, where possible, works should be scan-backed. Yes, it is significantly more work, because checking that the work is correct takes a long time. In fact, it takes longer to properly proofread a text without side-by-side scans, as you have to keep flipping back and forth between WS and the scan. Checking without scans at all only seems easier because you just have to make sure the text makes sense, not that it matches the original to the letter.

Also, it is impossible for an interested third party to help out if the scans are not both linked to and publicly available in an accessible format. And co-operation is the name of the game. At the very least, upload the DjVu, make the index page, and then link it on the front page of the work, so we know where the source is.

I hope I don't sound negative, this is just my viewpoint as someone who has cleaned up/is cleaning up, slowly, abandoned OCR imports. Inductiveload—talk/contribs 00:48, 27 March 2011 (UTC)

┌────────────────┘Here's a stupid question, I suppose... Why has the focus to date been to upload a DJVU "as is" or "as converted" first and then go about scripting/REGEXing/BOTing/editing it to death - surely knowing it holds an obviously flawed OCR text layer to begin with - rather than extracting, fixing and re-inserting the OCR text layer and then uploading a much-improved DJVU afterwards instead? This seems backwards to me personally. — George Orwell III (talk) 01:55, 27 March 2011 (UTC)

Well it's easy if you have the good text: you can match and split the good text into the Page namespace, assuming the DjVu has sufficiently good OCR to allow the text to be matched (quite poor OCR still M&S's nicely, so this isn't usually hard). If all you have is the flawed OCR, or text from a different version, then you are stuck with it. Inductiveload—talk/contribs 02:03, 27 March 2011 (UTC)

That is not what I meant. I meant extracting the layer with a 3rd-party program locally, correcting the obvious spelling, punctuation, etc. errors, adjusting the "slices" (info, ppm, fg2k, bg44, fg44, etc.) or whatever you call the internal formatting found in .DJVU files to something the Page: namespace can better recognize, reinserting the layer, and then uploading it, so that most of the "work" here is swapping the scanned headers & footers for noincluded ones and so on.
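This extract-fix-reinsert round trip can in fact be driven by djvulibre's djvused tool, which supports replacing a page's hidden text layer (its select, set-txt and save commands are real). A minimal sketch that only generates the djvused command script; all file names here are hypothetical:

```python
def djvused_script(corrected_pages):
    """Build a djvused command script that replaces the hidden text
    layer of each listed page with a corrected sexpr text file.
    corrected_pages maps page number -> djvused text file name."""
    lines = []
    for page_number, sexpr_file in sorted(corrected_pages.items()):
        lines.append("select %d" % page_number)
        lines.append("set-txt %s" % sexpr_file)
    lines.append("save")
    return "\n".join(lines)

# Usage: write the result to fix.dsed, then run
#   djvused book.djvu -f fix.dsed
script = djvused_script({1: "page0001.sexpr", 2: "page0002.sexpr"})
print(script)
```

The corrected per-page text files themselves would come from a print-txt pass, a spell-check, and whatever manual fixes were made locally.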

Every newbie I try to drag over here says pretty much the same as the above: editing here is too technical and too labor-intensive. The problem is not so much with the premise and rationale you've laid out but with its application and usability for the common netizen (did we break ~300 or so regular editors per month yet? ever?) — George Orwell III (talk) 02:23, 27 March 2011 (UTC)

You hate DJVUs as well? I have a new best friend...move aside Inductive, I am a person of mercurial tastes! :P TheSkullOfRFBurton (talk) 18:40, 28 March 2011 (UTC)

:-p I do agree that DjVu editing is a pain; however, I wouldn't say it is significantly more of a pain than any other proofreading. The plain text you get from the IA is the exact same text they shove into the DjVus, just in one big stream. So the proofreading effort (assuming you will achieve the same standard, which is doubtful if you don't have the original text on hand as you do with DjVu editing) is more or less the same. If you have an IA DjVu and a better transcription (Project Gutenberg, or some other ebook, whatever) then you can match and split the good text into the DjVu pages with almost no effort, assuming the DjVu text has some semblance of OCR (which they do if they are from the IA; otherwise someone like me might be able to OCR one in).

The extra effort associated with DjVus is, IMO, mainly due to the fact that they:

Are longer, including all the front matter, contents, text ornaments, whereas a "plain text" work usually doesn't have any of this

Are split into pages, so you have to load more pages (negated, I think, by the saving in having the original text right there for reference)

Include the overhead of formatting the index pages, which is actually just adding metadata for the book, which is a valuable part of the work usually lacking in "plain text" entries. This will hopefully be more important in the future when we introduce a proper metadata scheme.

Include the apparent overhead of creating the mainspace pages which transclude the DjVu pages, although this is done for plain text works as well, but there it is swamped by the following full page of text.

This is all nominal, and I agree it's not all that big of a deal when it comes to true proofreading (font style(s), spelling, paragraph/line breaks), but you give far too much credit to IA and their extracted (not inserted?) OCR content layer, in my opinion. Formatting aside, if their process was indeed to insert/apply the hidden text layer into a djvu, you would think a simple spell check would be done prior to that insertion (cutting down on the OCR errors from the jump when extracted here). Transclusion is a bit rough for any newbie, but not all that hard to overcome given time. It is the effort in "electronic book-binding" that seems to be the biggest turn-off, even if one manages to reach the point where a match & split makes sense. — George Orwell III (talk) 00:21, 29 March 2011 (UTC)

You would think that they did that, but experimentally speaking, they don't. You can ask them why that is, and petition them to change the process, but it isn't something I am prepared to make a fuss about. I'm just grateful that they do extract (get OCR text from the scan images), convert to DjVu, and insert (add it to the DjVu, page by page) the OCR. If they didn't do that, we'd have to do it ourselves, and that would be a huge block to importing texts. Tesseract is a pain to install, you also need djvulibre to do the DjVu conversion and OCR-inserting step, all of which takes a lot of explaining to new users. Add to that the fact that the IA uses ABBYY to do the OCR, which is generally superior to Tesseract. At least, I rarely get a result from hand-tweaked Tess that is as good as the automatic extraction at the IA. On the other hand, the IA OCR generally is good enough to apply match and split to, even for low-quality texts. Inductiveload—talk/contribs 02:14, 29 March 2011 (UTC)

Correct me if I'm wrong, but GO3's problem was that we don't have an automated way to bulk-process simple jobs across a whole text, prior to any human proofreading.

Well you're "wrong" in the sense that the entire approach to the Index/Page namespace will always be flawed, in my opinion. We've taken what amounts to an indirect .djvu formatting scheme (1 "index" plus any number of associated single pages outlined within that index) and tried to force it upon bundled .djvu files (index incorporated and shared amongst associated single pages, all in 1 file). The problem with this, which I feel will become more & more apparent as the popularity and refinement of the djvu standard goes forward, is that we've managed to mash up what should be 'file-name to file-number' and 'page-name to page-number' associations along with a simple page-labeling aspect, without actually having 1 (sub)directory with x# of physical files under it being virtually ordered to behave as one work, while retaining the ability to virtually label/physically name the one page as a stand-alone on the fly. The indirect djvu model keeps indexed bookmark/outline ordering constant - you can rename a single file (Page:.../## here) and it will appear where the list says it should. The virtual labeling keeps the displayed "page no." regardless of the order (page-name to page-number) or its physical designation (file-name to file-number). Here, you can't insert/swap single pages simply by renaming (moving) the page to match the listed ordering, or edit the listed order to account for the out-of-order file name, AND still have all the associated thumbnails match the actual content at the same time.
Here, if Page:BigLongRestrictingDirectoryName+page/18 is really supposed to be 19th in displayed-sequence ordering but also be XXII in assigned scanned-page numbering (the label), you can't just delete, rename or swap with the existing redundant 19th physical file, or simply re-label the two and force the indexed outline to reflect the updated changes - mostly because we've bastardized a single bundled djvu to "look" like a collection of associated files under an indirect djvu scheme, yet cannot make it behave like a subdirectory-1index-bunch-of-pages as the indirect djvu protocol allows.

Sure, this approach is just fine for well-designed djvus of single works, but it makes for too many headaches when the work is a not-so-well-designed djvu to begin with. That seems to turn off many possible new contributors from taking up the labor currently needed to be a positive WS member and meet our desired guidelines at the same time. Not your fault, nothing we can do about it now, and sorry for the rant, but I believe we can JavaScript ourselves to death with patches/workarounds and still not be viewed as the memory-hole most folks may have heard of but did not find positive/useful enough to actually come here and deal with some of this hoop-jumping - especially if one has little technical knowledge of the inter-web and its tubes. — George Orwell III (talk) 00:21, 29 March 2011 (UTC)

Let's continue that discussion in a new section, shall we? Do you have a workflow/structure that you think we should be using? I'm not a dev, so you'd probably be better off talking to ThomasV or someone about this if you perceive a usability problem with the ProofreadPage extension. Inductiveload—talk/contribs 02:21, 29 March 2011 (UTC)

I don't see the point of raising it - it seems nobody discusses anything worthwhile here in WS space that could better facilitate a more open & well-informed decision matrix on things like how to proceed or what to implement next, etc. — George Orwell III (talk) 04:41, 29 March 2011 (UTC)

This would apply equally to DjVu, and to huge tracts of plain, unreferenced text. The problem here is identifying "common" operations to perform (maybe certain spelling corrections, title formatting, etc.), and then writing a program to perform those operations on the DjVu or text file. I welcome a list of operations people would like to see, and I will eventually write them up into a program.
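A minimal sketch of what one such bulk pass might look like. The substitution list below is invented purely for illustration (common OCR confusions, de-hyphenation, trailing whitespace); any real list would need curation and review before being run over a text, since blind substitutions can also damage correct words.

```python
import re

# Illustrative only: a handful of common OCR confusions. A real list
# would be reviewed first (e.g. "arid" is also a legitimate word).
OCR_FIXES = [
    (r"\btlie\b", "the"),      # classic t-l-i-e misread of "the"
    (r"\barid\b", "and"),      # frequent "n" -> "ri" confusion
    (r"(\w)- (\w)", r"\1\2"),  # rejoin words hyphenated across lines
    (r"[ \t]+(\n)", r"\1"),    # strip trailing whitespace
]

def clean_ocr(text):
    """Apply each substitution in order over the whole text."""
    for pattern, replacement in OCR_FIXES:
        text = re.sub(pattern, replacement, text)
    return text

print(clean_ocr("tlie king arid his court, beauti- ful \n"))
# -> "the king and his court, beautiful\n"
```

The same pass could be pointed either at plain-text dumps or at a DjVu's extracted text layer before reinsertion.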

I'd like to see some of the more current operations completed to their logical conclusion for full usability first.

Dynamic Layouts have incredible potential for both reader & editor -- too bad the stupid proofreading color-bar still takes up the div wrapper that the green navigation header should be using instead for works that are transcluded. If "format should be separate from content" is really to be our motto and the desired goal around here, navigation should be in its own isolated contentSub atop a contentText hosting a 2-column approach for displaying transcluded works only; one column behaving as a narrow left (or variable right?) margin, hosting the links back to the Page: namespace, and the other column subject to user manipulation via layout settings/forms re: Dynamic Layouts.

The addition of the user-friendly assignable header & footer fields on an Index: page certainly was a step in the right direction -- too bad it doesn't go far enough, with additional fields to accommodate both odd- and even-page header/footer instances; it still requires the user to overuse the running-header template and/or run some script to account for changes from one page title to the other, as well as left/right page-number display.

Again, that is something to ask ThomasV, the rest of us have little control over or knowledge of the internals of the PP extension. I just work with what we have. I am (very slowly) working on extracting metadata from index pages, feel free to join in what little there is at User:Inductiveload/metadata. Internal or extension code is not something I can or want to change. Inductiveload—talk/contribs 02:21, 29 March 2011 (UTC)

Frankly, I don't see the point. The djvu's metadata itself will remain as it was when first uploaded, just as the OCR text does in spite of the hundreds and hundreds of correcting edits made to it. So too, the way I see it, with any beneficial research produced and then applied to a work by the editors on WS. I know it's not like most djvus out there actually come with this data intact & ready to be shared with each page, but I believe it's all there for the taking at will (if you know how, & I don't) via the djvu's annotations. It's the whole point of bundled djvus sharing the document metadata in the first place - so pages or groups of pages can call upon it only if needed (i.e. extraction/conversion). I do like to read along whenever I can, but I don't even have a good example of just one well-designed, properly formatted, fully completed base djvu of a work, with all the bells & whistles already in place, to even see what is possible, what is worthwhile and what is just not doable here on WS. Without one, I'd be foolish (like I am most of the time when I speak up) to attempt any such participation. — George Orwell III (talk) 04:41, 29 March 2011 (UTC)

Unfortunately, proofreading a whole book from scratch, whatever the source and method, will always be a hard job, due to the sheer amount of reading and checking involved. I believe this is the barrier to entry (along with a relatively complex system of templates which must be learned all at once - documentation is still lacking) for people more used to a quick edit at Wikipedia to add an actress's maiden name. The way I see it, proofreading is sadly a fundamentally more taxing and protracted task, and one that is not suited to your average surfer. This isn't to say that we shouldn't make more of an effort to make WS easier to contribute to—we should—but we shouldn't expect Joe Q. Public to wander in from Google and begin effortlessly proofreading before he has even finished his cup of tea. Inductiveload—talk/contribs 21:38, 28 March 2011 (UTC)

Perhaps, but Mission to Gelele, King of Dahome/Chapter I and the others aren't proofread, it's just a copy-paste of some automatically generated OCR. Of course that is easier - it's the same as manually marking all the pages in the pagespace as "not proofread" and saving without actually editing the text. In my view, if that's the only effort you're going to make to the text, you may as well just link to the place you got the OCR from ([[2]]) from Burton's author page using {{ext scan link}}. We generally only import OCR text like that if you actually intend to proofread it. It contains OCR errors (know not what ; $ela porte a Vamour,), still has the running headers interposed, and headers aren't formatted or marked.

It represents a huge effort for another editor to bring this to any sort of proofread status, and the first part of that effort would be to go and get the DjVu from the IA and start doing it "properly", so we can track the proofreading status. Otherwise, we have only a horribly indirect way to check the text we have is correct (in your case, no way, as there is no link to the text source). If we just wanted "readable", we could just automatically import every ream of OCR we can lay our hands on. We don't because that is what sites like the IA are for. Inductiveload—talk/contribs 02:01, 29 March 2011 (UTC)

I was always under the impression that proofreading another's work, especially something already paginated & published as a book, meant minimal formatting changes, if any, would be required of the editor. This is not the case here, because visitors are expected to (re)format these works to behave as displayed or viewed, but under Wiki rules rather than standard practices - be they HTML & a computer or an old typewriter & paper. For the unfamiliar possible contributors out there, doing both the WS-specific formatting and the intensive proofreading requirements makes for too much work on something easily found elsewhere and rectified with a good spell-checker for the most part. The one advantage of WS, the intra- & inter-linking across all the wikispaces, just isn't all that great when you contrast it with the amount of time & work involved in producing a work. I truly sympathize with TheSkull's POV here because it's just about the same as what my little inner circle of keyboard commandos/nerds have been saying to me personally for more than a few weeks now. — George Orwell III (talk) 04:41, 29 March 2011 (UTC)

I'm unsure as to what your point is. My point here is that this work is demonstrably not proofread, and so you can't compare the time taken to put it up at WS to fully proofreading a DjVu. If you were to fully proofread the text, it would probably take a roughly equal amount of time (you are helped with DjVus by having text right there all the time, but hindered by the extra transclusion stage).

That's all just fine and makes complete sense - I agree. The "parts" that don't make sense to me at times are: first, creating pages for proofing on something that wasn't even checked for basic completeness (are all the pages really there? No duplicates? In correct sequence? Need place-holders for missing pages?) at an index level; second, having to make corrections a basic spell-checker might have eliminated for us if it had been run before the text was inserted into the djvu at the time of its creation; and third, after overcoming any & all roadblocks along the way and actually producing a finished product, nobody thinks to re-insert just the plain old corrected text back into the uploaded djvu "for the next guy", just as we've done to IA. For the most part, these are just opinions of course, but I cannot easily ignore them at will either (even when things are not under 'our' control, unfortunately).

If you don't like DjVu, what exactly are you proposing to do for having proper sources for scanned books instead? I'm confused about what your "ideal" workflow is.

I like it just fine for where its development currently stands. It's nowhere near as capable as some of the other available document formats out there, though (but that is more about licenses and junk than actual usability etc.). I just think sequencing and the basic repetitive layouts of individual pages might have been better served as indirect djvus, is all. Once that is mostly out of the way, converting the indirect djvu to a bundled one makes for a streamlined task of just proofreading & validation. We get around this now by manipulating the pagelist instead to include one thing or exclude the other - but doing stuff like that puts the whole validity of the work into question, even with the benefits of transclusion, if you take a step back and think about it.

It is regrettable that "proofreading" (i.e. proofreading and formatting to a semblance of similarity to original) works at WS is so involved, time-consuming and hard for newbies to "drop in to". If you have any ideas to make it easier, tell me and maybe we can do something about it (unless it involves the PP extension, then you need to talk to ThomasV).

I can't think of anything that could make a more direct impact than improving the match and split utility (I can hear the collective groans coming from some others already) to be more "newbie friendly", as well as making it administratively undo-able, as most certainly becomes the case while newbies are still learning the ropes. As was the case with TheSkull's exploits you rehashed earlier - whether the Page namespace receives text content via a direct OCR layer automagically, or through the manual import of text matched and split as a layer afterwards, should not make much of a difference to the actual proofreading process if nobody is actually re-inserting any corrected text back into the djvu after validating it anyway.

As for things easily found elsewhere, that is why I like the books which, as far as I can tell, do not exist in text form on the Internet already. Some people like adding popular/common/important books that we lack, others like the esoteric that aren't widely available. Inductiveload—talk/contribs 08:39, 29 March 2011 (UTC)

I'm generally of the same mind (why else would anyone come here?) but just don't have the will to struggle through a subject that does not interest me enough to do it justice. I know myself well enough that if I'm genuinely not interested in the work in question, I wouldn't be much of a proofreader of it - never mind a good one. — George Orwell III (talk) 09:50, 29 March 2011 (UTC)

I found that Help:Sheet music used the redundant {{indexes}}, and when I converted it to the portal parameter I discovered that {{process header}} didn't have plain sister. Would you please consider adapting it for that. Thanks. Billinghurst (talk) 15:07, 30 March 2011 (UTC)

{{process header}} now has the full set of plain sister parameters (though I don't think they would all be useful in that namespace, it is better in my mind to keep everything consistent). Inductiveload—talk/contribs 15:44, 30 March 2011 (UTC)

Great. And to keep stirring the pot, have a gander at Biographia Hibernica and revel in its header ugliness. Not sure on the readiness to further discuss the plain sister bits and the other inserts into the Notes section. — billinghurstsDrewth 00:47, 5 April 2011 (UTC)

Getting back to the whole point of somehow making it easier to proofread .djvu-based uploads, I've put an example of pre- and post-spell-checked OCR-generated text layers of the same scanned page in Index:Sandbox.djvu. If you go to edit mode for pages 2 (no corrections) & 3 (corrections), you can easily see the difference in the quality of the content. There must be a way to automate this - any ideas? I don't care if it's even a local script! — George Orwell III (talk) 01:39, 8 April 2011 (UTC)

Can I ask how you made those changes? Do you have a set of regexes and things that you applied? Or do you want a heuristic text-correction algorithm that can make good guesses at what the mistakes are and correct them? The first is simply done with Javascript, but the second is an involved exercise in artificial intelligence, and beyond my meagre ability in programming. It may be possible to do simple spell-checking using, for example, the Python enchant module, but even that will need some nifty heuristics to work out the "best" alternative spelling. Unfortunately, I don't have much experience in this field (Hamming distances, etc.). If you can think of anything to improve text programmatically, please give as many details as possible!
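For what it's worth, a rough sketch of the "best alternative spelling" idea, using the stdlib difflib module as a stand-in for a real dictionary backend such as the enchant module mentioned above (the tiny word list here is purely illustrative; a real run would load a full wordlist):

```python
import difflib

# Toy dictionary; a real implementation would use a full wordlist
# or a spell-checking backend like pyenchant.
DICTIONARY = ["necessity", "their", "the", "that", "this"]

def best_guess(word, cutoff=0.6):
    """Return the closest dictionary word, or the word itself if nothing is close.

    difflib's ratio is a rough proxy for edit distance: common OCR slips
    (long-s read as 'f', rn read as 'm') still score highly against the
    intended word.
    """
    matches = difflib.get_close_matches(word.lower(), DICTIONARY, n=1, cutoff=cutoff)
    return matches[0] if matches else word

print(best_guess("necefsity"))  # → necessity
```

The hard part, as noted, is the heuristics: picking the *right* close match when several candidates score similarly, which is where the artificial-intelligence rabbit hole begins.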

DJVULibre freeware --> extract hidden text originally generated through some OCR routine --> copy-paste into some online spell-checker --> correct the checker's most obvious mistakes --> reinsert hidden text into the original djvu (a single page in this case).

OCR is just the means by which the text content is generated. It is inserted as hidden text, no different from PDF's hidden text, Microsoft Office's HTML/XML hidden text and similar. The djvu specification accepts any and all of these as long as they are properly formatted in the manner that it recognizes. The fact that OCR-generated text frequently makes the same mistake over and over again has nothing to do with the djvu file or the djvu specification -- djvu accepts text, clean or flawed, as long as it is inserted/extracted in the format laid out in the specification. DJVULibre handles this formatting automatically via the command line and simple .txt files.

Extremely straightforward, and I used plain old MS Notepad to make the edits.
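A minimal sketch of that extract-and-reinsert loop, scripted from Python rather than done by hand (an illustration only: it assumes DjVuLibre's djvused is on the PATH, and "book.djvu" and the page numbers are placeholders):

```python
import subprocess

def djvused_cmd(djvu_path, script):
    """Build a djvused invocation that runs the given editing script."""
    return ["djvused", djvu_path, "-e", script]

def extract_page_text(djvu_path, page):
    """Pull one page's hidden text layer out as plain text."""
    cmd = djvused_cmd(djvu_path, f"select {page}; print-pure-txt")
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def reinsert_page_text(djvu_path, page, sexpr_file):
    """Replace the page's text layer from a corrected sexpr file, saving in place."""
    cmd = djvused_cmd(djvu_path, f"select {page}; set-txt {sexpr_file}; save")
    subprocess.run(cmd, check=True)
```

Looping extract_page_text over every page of a bundled djvu, running the dump through a spell-checker, and feeding it back via reinsert_page_text is essentially the "streamlined for newbies" version of the Notepad workflow above.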

My problem is not the insertion or extraction part but how to copy and paste that special format, for an entire bundled djvu file, into some spell-checker that won't disturb the format or encoding (which seems to be a bastardized UTF-8, or maybe it's just plain old ASCII, but I haven't tested enough characters yet to be sure).

Not to belittle it, but your script just seems to "act" like a spell-checker does; not to mention it is after-the-fact, is limited to correcting specific conditions only, and does not fix the hidden-text layer in the original djvu file either way.

If an idiot like me can do it after less than a week of investigation and trial and error, there must be a way to streamline it so newbies can do it too (preferably here on WS, but certainly locally on their desktops if need be). 04:17, 8 April 2011 (UTC)

This is a very clever approach, grabbing the previous verso or recto: kudos to you and Phe. It may be easier for novices (like me) to use, in that manipulation can be done without diving into one's monobook/vector.js, e.g. adding the special format, changing the section title on the verso, and so on. I'll probably stick with what I've got, 'six of one ...', because I think there is a slight advantage to working with journals. A couple of points that may lead to refinements:

Hesperian's script will insert an 'unfilled' rh template if the index hasn't been defined (the djvu Page number nearly always accords with the actual pagination, even or odd to verso or recto). This one could do the same thing if the earlier page was blank, no rh, or yet to be created.

Roman numerals have to be specified, "+roman(pagenum-4)+", but the snippet that generates them is a little shorter and worth a look.

The only pitfall (a minor one) I have anticipated with this new script is if the work contains skips in the pagination, e.g. 33, plate, blank, 34. Perhaps an instruction to seek a running header at '-4' if '-2's header is empty might work? Regards, CYGNIS INSIGNIS 03:14, 14 April 2011 (UTC)
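For illustration, a roman() helper like the one referenced in that snippet might look like this (a Python sketch; the actual user-script code is JavaScript, and lowercase output is assumed here because front matter is usually numbered i, ii, iii, ...):

```python
def roman(n):
    """Convert a positive integer to a lowercase Roman numeral,
    greedily consuming the largest value symbols first."""
    vals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
            (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
            (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in vals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)

print(roman(14))  # → xiv
```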

I would like to see this script have a tighter alignment with the header and footer fields on an index page, where the page number is already defined and able to be allocated to a page; we just need to get it to flip the numbering left and right, and adapt the text, maybe by one of the means that CI has described elsewhere, for the common three or four styles of headers used in works of the period. We probably have enough body of knowledge to cover that already. — billinghurstsDrewth 11:53, 14 April 2011 (UTC)

I'd much rather see the complete elimination of the need for such scripts for new Indexes, and have the current header and footer fields on the Index page expanded by one field each to handle not only the current {{{page}}} [left] but also a {{{page+1}}} [right] as well. I know this is more of a developer issue than a local one, but if you stop to think about it, it is the most efficient solution (do we really want to overuse the running-header template with thousands of unnecessary and avoidable calls to it?). Of course, if the scan is less than perfect to begin with (i.e. duplicates, skips, etc.), then it makes sense to scurry about applying manual patches like this instead of using the Index fields. — George Orwell III (talk) 14:49, 14 April 2011 (UTC)

These are good ideas, but much easier said than done. What happens to the running headers which have the chapter title in them? And then what happens when the chapter title in the running header doesn't match the one in the index, which doesn't match the one in the chapter heading? At least once, I have seen a running header in the scan with the wrong chapter title, so even a perfect index-parsing script would have missed that. And you'd have to have a suppression mechanism for pages without numbers. I'm afraid I just don't have the inclination to attempt this - the thing is that with our PP extension and editors, we can often do better proofreading than the original publishers! I find the system using this script works fine. It's just one click on a button, and if you run any other scripts, your mouse is there anyway. It also requires just one manual change per chapter for works with chapter headings.

I would like to see a "-4" capability, as I do find it annoying when a one-page interruption ruins the flow; I may try to work on this soon. Inductiveload—talk/contribs 15:05, 14 April 2011 (UTC)

Hi. I am using your clean-up script. Wouldn't it be possible to handle a missing ' in cases like, e.g., text s by changing it to text's? The chance of finding a correct stand-alone s is quite low, I guess. Thanks --Mpaa (talk) 20:58, 22 April 2011 (UTC)

Sure, all you have to do is add a line like:

editbox.value = editbox.value.replace(/ s /g, "'s ");

to the script. If you are not familiar with regular expressions (the language in which the text replacement is performed), the Wikipedia article is a good start, and RegexPal is a good place to experiment.
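For experimenting with such replacements outside the browser, the same rule can be tried with Python's stdlib re module (a toy equivalent of the JavaScript one-liner above; the sample text is made up):

```python
import re

def fix_standalone_s(text):
    """Join a stray ' s ' back onto the preceding word,
    e.g. the OCR output "book s cover" becomes "book's cover"."""
    return re.sub(r" s ", "'s ", text)

print(fix_standalone_s("the book s cover"))  # → the book's cover
```

As with the JavaScript version, this is a blunt rule: it would also "fix" a rare legitimate stand-alone s, which is why such replacements are best kept per-work rather than global.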

Also note that different works have different OCR quirks based on the font, size, camera, OCR software and settings, etc. So be prepared to use temporary fixes that only apply to certain books. Please ask here if you need anything! Cheers, and a belated welcome! Inductiveload—talk/contribs 02:16, 24 April 2011 (UTC)

So I just figured out that most of these that you uploaded for me have up to 5 volumes ... do you think you could complete this by uploading those volumes for me? - Tannertsf (talk) 05:18, 28 April 2011 (UTC)

Sure. All I need is a list of the Hathi Trust book IDs. They look like mdp01232121221 (mdp can change, but it was constant for these works). Cheers, Inductiveload—talk/contribs 22:28, 28 April 2011 (UTC)

I don't have them and don't know how to get them ... was hoping you could do that? - Tannertsf (talk) 23:30, 28 April 2011 (UTC)

I can try identifying them, but I believe you mean "volumes" in the sense that each volume is the annual report released every year. If so, what year(s) are missing/needed? — George Orwell III (talk) 23:36, 28 April 2011 (UTC)

Yeah, I have noticed this, but it isn't on all the versions of the same year (I don't have an example to hand, sorry). I've never seen any sign of the additional volumes (Hathi makes no note of it at the collection page). So, I don't really know where to go to find them and I'm not sure they even exist. For example, the last page (87) says "In closing this report...." and signs off. Inductiveload—talk/contribs 23:56, 28 April 2011 (UTC)

Ohhh... O.K. I think I see what is going on here. For certain fiscal years, Congress legislated additional reports be produced by certain sub-divisions within the Department of the Interior, such as the Commissioner of Indian Affairs (or the Board of Indian Commissioners, for example), on specific topics or subjects that were relevant to some issue of the day. In other words, these reports are not annual like the operations report but were produced every so often, or as needed. These could be what those subsequent volumes are in certain years, in addition to the typical annual operations report mandated from the Secretary of the Interior by Congress every year through some long-ago law.

So could you add them on here? - Tannertsf (talk) 01:19, 29 April 2011 (UTC)

I would gladly "add" them if I knew what (which sub-department issued it) and when (what year) these are exactly. The archive where the annual "operations" reports are being lifted from does not show these other sub-volumes along with the main annual report. They might not even exist on that archive for all we know. Without having some sort of official title or similar to cross reference a particular year with, there is no easy way to look for and identify these - both at the Hathi archive or on Google. — George Orwell III (talk) 01:31, 29 April 2011 (UTC)

Q: Does your method allow for "stopping" and adjusting mid-way through? To clarify, we need a 1957 ~215-page U.S. government committee report ( HERE ) in which pages 1 to 43 are normal paragraph-like content scans, but from then on to the end they are table-like, with text offset by 90 degrees compared to the first 43 pages. Is there any way to import the whole thing with the tables also in alignment with the normal report text, so that any OCR routine run on it afterwards could possibly create a workable text layer as well? Maybe split it into 2 "packages" that I can work to resolve independently, then merge them together afterwards? — George Orwell III (talk) 23:36, 28 April 2011 (UTC)

Yes, I can edit the images before the DJVU conversion step, and rerun the OCR on a subset of the pages (i.e. the ones which were rotated). Would you like me to grab it now? It will take some time. Inductiveload—talk/contribs 23:59, 28 April 2011 (UTC)

Yes, if you can start "grabbing it" now, that would be great. If this could be 2 separate djvus, that would be optimal, I guess. I will merge them back together before creating the index to proofread from, after making some internal edits to reduce the amount of corrections needed. — George Orwell III (talk)

Would you rather have a directory of images and text files holding the OCR? This is how it is before it is collated to DJVU. It is 26 MB, so it will happily go up on a file-hosting site. Then you can collate to DJVU yourself when you are done, or send it back to me. I would expect text files to be a lot easier to deal with than DJVU text layers. Inductiveload—talk/contribs 01:41, 29 April 2011 (UTC)

Not sure I can "handle" what you're sending but let's try it anyway for starters and I'll ask for advice/help if I can't manage the raw files after all. — George Orwell III (talk) 02:12, 29 April 2011 (UTC)

Hi and greetings. My belated good wishes for the marriage of the young couple, :D . . . and I do agree with the Mail Online that the sister-in-law (Pippa) is a real cute PofA. :">

Now for matters more relevant to Wikisource. . . . I am in an ongoing discussion on the Commons about my image uploads and regarding the IA file offerings of .djvu, .JP2 etc. I assume that the top image production is the .JP2 files, from which the .djvu is derived. Am I correct? — Ineuw talk 08:08, 30 April 2011 (UTC)

I wouldn't know, she is a long way out my league! There are two common types of work at the IA:

Works scanned by the IA (these generally have a yellowish background from the paper): The JP2s are the "raw" images, from which all others are derived. In fact, there are two sets of JP2s for many IA books: the "orig_jp2.tar", which holds the raw JP2s as they come out of the book scanner, and "jp2.zip", which is cropped to the page and tidied up a bit. The DJVU and PDFs are then derived from this.

Works scanned by Google (Low quality bitonal black and white): I'm not sure which comes first here. Possibly it is the PDF, as that is what you most commonly download from Google Books. The JP2s or TIFFs may be derived from that, or it may be the other way around, but they are both about the same quality.

Generally, I doubt there is any benefit to uploading single pages of the Google books: they are such low quality that a conversion to DjVu won't harm them. For a high-quality colour scan, the conversion to DjVu can cause problems, especially if the length of the work means the compression needs to be stronger. Inductiveload—talk/contribs 15:34, 30 April 2011 (UTC)

Thanks. Someone at the commons assumed the reverse, which to me didn’t make sense and before replying I wanted to make sure.— Ineuw talk 17:46, 30 April 2011 (UTC)

Hi. What is the best way to get an image from a scan? What I did so far was to take a screenshot from the PDF file and clean the background with ImageMagick. Is there any advantage in working directly on JP2? If so, which tools do you suggest? I can convert to JPG with ImageMagick, but how do I preview and crop? Thanks --Mpaa (talk) 09:49, 7 May 2011 (UTC)

If we are talking about books from the Internet Archive (IA), the order of quality goes: JP2 files (you find these in an archive file on the HTTP page), the "Read online" book, which presents you with normal JPEGs which you can right-click-save, and then the PDF/DJVU.

JP2 are the highest quality available, and have had the least lossy compression (which damages the images). However, the archive files are very large, and JP2s can be difficult to process, since you might need special codecs for your graphics software (depends what you have).

"Read online" JPEGs are generally good enough (it's what I use if I just need one or two images). Sometimes you will find compression artifacts in the images, which can look like regular grid patterns. You need to make sure you zoom to 100% (you'll see scale=1 in the image URL), otherwise, you are missing out on the details, and introducing even more compression noise.

DjVus and PDFs are heavily compressed to reduce the filesize (compare the JP2s, which are already a high-compression format, with the size of the DjVu or PDF). This is fine for text, which is composed of repeating units (letters), but atrocious for images. The images are seriously and irretrievably damaged by this compression (the information is simply lost in the conversion process; there is no way back). Generally speaking, you should screenshot out of PDFs and DjVus only if better formats like JPEGs or JP2s are not available.

If we are talking about Google Books, their scan quality is awful (missing, folded or damaged pages, hands in the scan, blurred text, the lot) and the compression is shockingly heavy, so the best bet is to check at the IA for a better scan. If not, your only option is PDF screenshots. You might want to see if ImageMagick can convert PDF pages to PNG (lossless, to reduce conversion noise) format for you, to save messing around.

In general, you want to find the "most original" image, from as soon after the book was scanned as possible. Every lossy conversion step (i.e. all of them) introduces noise and loses detail.
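If ImageMagick is installed, the PDF-to-PNG step mentioned above might be scripted along these lines (a sketch only: convert's pdf[n] page syntax is zero-indexed, and the filenames and density are placeholder assumptions):

```python
import subprocess

def pdf_page_to_png_cmd(pdf_path, page, out_path, dpi=300):
    """Build an ImageMagick command that rasterizes one PDF page to PNG.

    PNG is lossless, so no further compression noise is added after the
    (already lossy) PDF stage. `page` is zero-indexed per ImageMagick's
    pdf[page] selection syntax.
    """
    return ["convert", "-density", str(dpi), f"{pdf_path}[{page}]", out_path]

cmd = pdf_page_to_png_cmd("scan.pdf", 0, "page-0.png")
# subprocess.run(cmd, check=True)  # uncomment to actually run the conversion
```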

Question. We have used {{dropcap}} on the above. Trying to comprehend why we went that way, rather than using image positioning and set left. What am I missing? Just tidying up to {{dropinitial}} — billinghurstsDrewth 01:04, 7 May 2011 (UTC)

Probably you are missing nothing. I suspect at that time, I hadn't noticed dropinitial and thought dropcap was the right (or only) template to use. Inductiveload—talk/contribs 02:19, 7 May 2011 (UTC)

Oh my lack of clarity. I just am not seeing the value of its use against traditional image placement, so was wondering what I was missing. In the end, it doesn't matter, and I will move on. — billinghurstsDrewth 13:22, 7 May 2011 (UTC)

Oh, you mean why did I use a {{drop*}} template instead of a [[File:Image.jpg]]? That has been the way I have always done it, as it allows a bit more freedom with margins if required, and adds the semantic hint that this image is a drop initial. I have seen dropinitial used for several works with image-based initials done by others (that's how I learned to do it), so I suppose it is a relatively common method. I'm not too bothered about it; it's just a repeatable formula that works for me in most cases. Cheers, Inductiveload—talk/contribs 17:10, 7 May 2011 (UTC)

why would you delete a work that is 1800 years old?! you can look it up on google it is public domain it is not copyright and you say you will delete it?! —unsigned comment bySonofcaleb (talk) 19:03, 27 May 2011.

Which work are we talking about? Is this about Letter to Chrestus of Syracuse? If so, you have not provided source information. Where is this transcription from? When was it translated? By whom? This is information we need in order to host this work here.

I tried to use {{article link}} to see if I could construct suitable (work) lkpl templates, and was unable to do so. So to look to standardise how we can create internal plain links, I have created {{authority/base}} which has some flexibility for titles and subparts, though I would hazard a guess that we will need extra bits over time. You can see an implementation at {{IrishBio lkpl}} and I will look to convert the other renditions of lkpl to utilise the new underlying template. — billinghurstsDrewth 13:52, 3 June 2011 (UTC)

I would like to take you up on your offer to move these five images to the Commons. I found the proper Commons category, commons:Category:Help for Wikisource newbies, and have already inserted it into the images. From there on, I can make all the necessary changes. My efforts in the past to move images from one wiki to another have never been successful. (I have the USC token & setup, but the tool never worked for me.) If this is a problem, then please delete them and I will upload them directly from my computer. Love the new Commons upload system. :-) Thanks. — Ineuw talk 04:44, 7 June 2011 (UTC)

Hi. I am working on Index:John Masefield.djvu. The next step should be to add images to Commons, but I am unsure which copyright tag I should use. The djvu was uploaded with PD-1923, but that cannot be right, because on Page:John Masefield.djvu/39 the author mentions works from 1926. On the other hand, the Internet Archive states that this is not copyrighted. Can you advise? I do not like to do things when the copyright status is unclear. Thanks --Mpaa (talk) 21:59, 11 June 2011 (UTC)

I'd say {{PD-US-no notice}}. I checked, that is a Commons template too. The book is US-published, so US copyright is the only one to worry about, even at Commons. Inductiveload—talk/contribs 23:33, 14 June 2011 (UTC)

Ok. Must be. Can I use this book with my class? We would have to put a disclaimer up telling people that it's being done a different way. - Tannertsf (talk) 23:51, 14 June 2011 (UTC)

You'll have to ask at the Scriptorium, especially as you have one work sectioned off already. Why do you need a special process anyway? From looking at Index:Outlines of European History.djvu, nothing unusual is happening there edits-wise, I don't see why your class couldn't edit as part of the usual workflow (the work isn't busy anyway). It also doesn't look like anyone has been editing other than you. Or am I missing something about what the aim of the project is? Inductiveload—talk/contribs 00:07, 15 June 2011 (UTC)

Ok. I'll be fine as long as Roman History doesn't get taken up too fast. I need this process because, when I inherited this class from the former teacher, that's the way he did the assignment books. I just can't afford to have someone go on a "proofread" streak. I can do Roman History myself, but it would just be nice if I could keep other people from working on it too much. - Tannertsf (talk) 13:46, 15 June 2011 (UTC)

OK, it's definitely OK by me to request that others don't interfere too much with the work, but you should still ask at the Scriptorium. There is plenty of other stuff for other users to get on with. I'd say that if you could do just one volume at a time, that would be better, as I don't like the idea of reserving 9 volumes in one go (unless your class really are capable of completing a nine-volume dual-language work that fast, in which case, we need them to be chained to desks and made to work here for 20 hours a day!). Additionally, if you have more than one work like this, could you make a note on your user page saying which they are, and when they were "reserved". That way, curious users can quickly see what the reason is behind the reservation. I have no problem with reserved books as long as they are active (or imminently active). Inductiveload—talk/contribs 17:02, 15 June 2011 (UTC)

I will request there, but I really don't want people to do an edit streak on other volumes as well. My class works only a little bit some days, and megatons of work on other days. I can pull out the "whip" to get them working harder, but it would still take some time. - Tannertsf (talk) 17:12, 15 June 2011 (UTC)

If you're still working on this, do you think you could put headings in the table on the Works to prepare page like the headings under the list? I'll put the works under the correct headings and add the rest to the table. Digipoke (talk) 13:32, 27 June 2011 (UTC)

If you are unaware of how excerpting, and the casual creation of virtually empty pages, has caused disruption and wasted the time of the well-intentioned, here and at en.wp, I can provide some 'WP:Beans' that illustrate that. However, it doesn't take much imagination to envisage how the trolling and tendentious can exploit a trend toward keeping material on the basis that it could be transcribed. The potential to back stuff with scans is already possible, but there is a huge gulf between setting up an index and transcluding the bit some user wanted, and placing it in its full context. There is quite a different motivation in contributing a cool transcription (as you are inclined to do:) and proving something in a hot discussion elsewhere. CYGNIS INSIGNIS 23:07, 14 July 2011 (UTC)

Hi Cygnis, I'm not quite sure what it is I have done to violate NPOV here, or anywhere else, or even what the "hot discussion" is. I realise that you feel that only complete works should be allowed on this site, but, at least in this case, I don't see why the work is objectionable, not least because it is pretty short, and mostly made up of images. As far as I am concerned, I have removed the reasons for the PD request by providing the snippet's full context along with a source, courtesy of Hesperian. Thus the only objection can now be failure to meet scope, which it pretty manifestly does. If the objection is based on "no-one will transcribe this work in the near future", I have more than one counter to that:

This work is interesting enough that I will probably do some of it myself.

Creating an index page and any number of pagespace pages has no effect on the readability or usefulness of the mainspace. If you think transcluding only some of an index is a no-no, then I can see your point. However, if a Page is loitering in pagespace awaiting wikilove, all it can do is light up Wikisource in Google results on that topic, which is no bad thing. So what if someone lands on a single non-transcluded page when searching for a phrase from that work? If it is the only result, as it would be for this work, then we landed another visitor, and at least they can see the scan if the work is incomplete. They wouldn't get that hit from a Commons work unless they searched on the title given at Commons.

Note, we do have the (stagnant) Wikisource:Workspace for incomplete mainspace works. If you really feel that strongly about incomplete works, why don't you draft up some guidelines and make that a useful area for work development? A prime example of a problem to solve there is how to prevent absolute internal links from being incorrect when used in the Workspace. I feel that {{incomplete}} does an adequate job of showing that the work is not considered complete by WS standards, and prevents the uninitiated seeing WS as chronically incomplete, but rather as "in progress" (which it will always be).

No-one is advocating using raw OCR (your concern from WS:PD). If a page is not proofread, it is not created, and has zero effect on anything except the number of red links on the index page. Most cases of raw OCR are misunderstandings, or grandfathered PG texts that have not yet found a scan-based home.

I'm sorry you feel that I am contributing to Wikisource's "incompleteness"; it must be very frustrating to see so many works fall short of your impeccably high standards. One of my personal objectives is to provide scans for inactive, incomplete and generally problematic works in order to allow them to be at least read in the scan format, and hopefully eventually become part of the WS corpus proper. I usually like to do "hard" scan retrieval that needs to be scraped, compiled, converted, OCRd, batch uploaded or generally fannied about with, as that is not something that can be easily done without custom code compilation, helper scripts and a certain degree of know-how. Index:Battle Damage Assesment - 1991-06-18.djvu is an example of that kind of work. I do not (knowingly) contribute to NPOV issues here or abroad, and I do not see how uploading in-scope material can be seen as that. Inductiveload—talk/contribs 03:19, 16 July 2011 (UTC)

I tried to explain my position here, on a general principle, because I like your contributions and respect your opinion - avoiding you thinking I felt otherwise was part of my motivation for that. CYGNIS INSIGNIS 06:13, 16 July 2011 (UTC)

First of all, thanks for setting up the index with the page and moving it all. I am not sure how to do that, rather, if there is a technical or proper way, or if it is just manually moving pages. Also, do you have any preference for the image links in the mainspace? I want to link it like I've done here, which is done using an {{anchor}}. - Theornamentalist (talk) 02:18, 25 July 2011 (UTC)

No problem! Well done on finding a source. The page moving is done by a custom script based on Pywikipediabot, which is just automating the same job which you or I could do by hand. Links in the mainspace are up to you, anchors are how I'd do it too, so your way looks great! Inductiveload—talk/contribs 02:36, 25 July 2011 (UTC)

Actually the issues appear to be with {{Long s}} and not with your script.

Phe found that in FF3.6 a line of text that has the word neceſſity transcluded from a page where it used {{ls}} copies to the clipboard as "neceſsſsity", in other words it copied both the long-s and the short-s; the problem does not seem to occur in FF5 or Chrome.

I believe that this is now complete. I have stepped through the prefix Wikisource: and believe that I have got everything that needs to have been moved or culled. You may want to think of some weird and whacky way to check. Thanks. — billinghurstsDrewth 07:42, 28 July 2011 (UTC)

I didn't do all the magic, but it is done with the "header" and "footer" fields of the Index: page, where you can put the text to appear on each Page: page. Specifically, {{{pagenumber}}} gives the page number according to the page-list in the Index page, so you'll need to set up page numbering before that will work.

In a work with an alternating left and right {{RunningHeader}}, you can use this script to fetch and increment the last-but-one page's running header. It is still not possible to have this done automatically on page load. It also doesn't work for this work as it uses {{center}} to provide the header.
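For illustration, the "fetch and increment" idea can be modelled in a few lines (a toy Python model of what that gadget does; the real script is JavaScript, and this sketch only handles Arabic page numbers, not the chapter-title or Roman-numeral cases):

```python
import re

def next_running_header(prev, step=2):
    """Given the last-but-one page's {{RunningHeader}} wikitext, bump its
    first number by `step` (facing pages alternate, so the same side
    recurs every two pages) to guess the current page's header."""
    def bump(match):
        return str(int(match.group(0)) + step)
    # count=1 so only the page number changes, not e.g. a year in the title
    return re.sub(r"\d+", bump, prev, count=1)

print(next_running_header("{{RunningHeader|120|CHAPTER V|}}"))
# → {{RunningHeader|122|CHAPTER V|}}
```

The '-4' fallback discussed above would just mean calling this with step=4 when the page two back has no header to copy.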

Thanks for the info. But I'm not a technical type and wouldn't dare run a script. Not even sure if I can set up page numbering. All you guys are so much better than me. Best wishes, Mattisse (talk) 21:41, 2 August 2011 (UTC)

No problem! As for the script, it's not a script that can cause damage, it only acts on the page you are editing at the time. Just add those two lines that I linked to to User:Mattisse/vector.js (create that page if needed), hit Ctrl-F5 (or restart the browser) and you should get a button to automatically fill in the header based on the last-but-one page (assuming there is a running header there, nothing happens otherwise). There's no technical ability needed beyond editing a wiki page! And it can save you a lot of time and tears of boredom! Inductiveload—talk/contribs

Thanks. I added it to monobook.js and I'll look for the button! Mattisse (talk) 00:20, 3 August 2011 (UTC)

Where is the button? What does it look like? Is it in the tool bar? (I don't see it.) Mattisse (talk) 00:24, 3 August 2011 (UTC)

Should be in your edit toolbar: it looks like a red running man and the text "rh". Did you flush your cache with "Ctrl-F5" (firefox)? Do you have the "enhanced" (broken) toolbar on, or are you using the "old" (working) toolbar? Inductiveload—talk/contribs 00:39, 3 August 2011 (UTC)

Ok. I removed the "enhanced" (broken) toolbar. Maybe that will do it. Thanks, Mattisse (talk) 16:29, 3 August 2011 (UTC)

I got djvulibre to function with c44, cjb2, and djvm -c after installing with MacPorts. I had to use IM to convert an image to bitonal; I was later able to do that with GIMP as well, though not from the command line. I think wx was an issue before; I'll wait to catch you on IRC to figure out which version to get and see if we can get past that. I tried to use pygrabber but had issues I hadn't run into before. I know you said it was broken, but I didn't upgrade, so I must have an unrelated issue, though it did look different, so maybe something is up with that. I think we are actually starting to get somewhere! --Doug.(talk•contribs) 22:32, 13 August 2011 (UTC)
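For reference, the pipeline described above can be sketched as shell commands like these (file names and the 50% threshold are placeholders, not taken from Doug's actual session; this assumes ImageMagick and DjVuLibre are on the PATH):

```shell
# Make a bitonal (1-bit) version of a scanned page with ImageMagick
convert page001.png -threshold 50% -monochrome page001.pbm

# Encode the bitonal page with cjb2; a greyscale/colour page goes through c44
cjb2 -dpi 300 page001.pbm page001.djvu
c44 -dpi 300 page002.ppm page002.djvu

# Bundle the single-page DjVu files into one document with djvm -c
djvm -c book.djvu page001.djvu page002.djvu
```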

The latest error. I managed to reinstall all the dependencies and upgrade IM. I can now accomplish the cleaning tasks from the command line AND from within pygrabber, but it fails to get any OCR. From the command line if I convert the same file to a .tif and then tess it, I get a good OCR.--Doug.(talk•contribs) 10:18, 28 August 2011 (UTC)

Hi, was referred to you as knowing how to split up double page scanned pdfs. How do I go about turning this into single pages? Thanx Misarxist (talk) 10:11, 19 August 2011 (UTC)

If you want to do it yourself, you can use ImageMagick to convert the PDF pages to PNG and then process them like normal images; ImageMagick can also do the splitting. You can then recombine the single-page images into a DjVu file and upload it to Commons. I don't have the exact commands to hand today, and you would be better off with a script, which I also have, but not with me. I can also do it myself in the next few days if you want. Inductiveload—talk/contribs 00:19, 24 August 2011 (UTC)
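A rough sketch of those steps with ImageMagick (the density and file names are assumptions; this is not Inductiveload's actual script):

```shell
# Burst the PDF into one PNG per page at 300 dpi
convert -density 300 scan.pdf page_%03d.png

# Split each double-page image into left and right halves
# (-crop 50%x100% emits two tiles, each half the width and full height)
for f in page_*.png; do
  convert "$f" -crop 50%x100% +repage "${f%.png}_%d.png"
done
```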

Thank you very much for that. (If only they'd flattened the pages when they scanned it :) Misarxist (talk) 10:17, 27 August 2011 (UTC)

You're quite welcome. The low resolution was the main problem for the OCR, but it seems to be tolerable in most cases. Scripts for "bursting" a PDF to images and splitting images into two are on my user page now. Is this the first online textual reproduction of the book? Inductiveload—talk/contribs 15:08, 27 August 2011 (UTC)

As far as I can tell it hasn't been transcribed before, and I couldn't find another scan, so I suppose it's worth the effort. The djvu has been saved with the original number of pages, so it is only half the book. Is this just a minor problem with the script? I've no idea how to get a script like that working under Windows; I'm having enough trouble trying to get DjVuLibre to remove repeated pages from files. Misarxist (talk) 10:47, 28 August 2011 (UTC)

It's great to have works that have not been transcribed before! Sorry about the pages; I didn't refresh the page range after splitting. The file has been updated. As for removing pages, the command is easy: to remove page 30, you do:
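The command itself appears to have been lost from this reply; with DjVuLibre's djvm it would presumably be (the file name is a placeholder):

```shell
djvm -d book.djvu 30
```

Here -d deletes the given page number from the bundled document in place.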

It was deleted; we need to get it moved to here. Apparently Commons now thinks that it must be PD in all countries in which it was simultaneously first published or, based on the response over there, in any possible place of first publication. Bottom line: we need it restored and put here.--Doug.(talk•contribs) 18:58, 27 August 2011 (UTC)

The book was uploaded from here. Copyright laws make me crazy. That is why I prefer to work on something already available on WS. But now even this is not safe any longer :-(

The strangest thing in this discussion is that it needs to be PD in all possible places where the book might have been published. Who will ever be able to demonstrate such a thing for a work done 100 years ago ... --Mpaa (talk) 19:22, 27 August 2011 (UTC)

Inductiveload may still have it too, but it would be nice to have it exactly as uploaded and with the book template and all so we don't have to do that all over.--Doug.(talk•contribs) 19:42, 27 August 2011 (UTC)

Hi. I noticed that after reloading the Index, in the mainspace some chapters show the source tab and Page: numbers on the left. Others do not, though the transclusion is correct. I tried to tweak things a little by modifying some pages and saving, both in the Page: namespace and the mainspace, but it does not help. --Mpaa (talk) 22:07, 31 August 2011 (UTC)

Billinghurst discovered that it could have been caused by the block-centre formatting. Also note that the page numbers don't appear in preview, only when you have saved the page. You don't actually need to use that formatting, because we have dynamic layouts to do that on user demand, and a narrow fixed-width column is one of the options (Layout 2). Supplementary formatting actually prevents them from working properly. Inductiveload—talk/contribs 23:04, 31 August 2011 (UTC)

New things discovered every day :-) Is there a way to set a particular layout as the default for a page? E.g. so that all users who access this book will find the page displayed with layout 'x' from the start? This is to suggest to the reader the best view for rendering the work. --Mpaa (talk) 10:41, 1 September 2011 (UTC)

Not currently, but a lot of people ask this question. The question is whether the JS behind the DL formatting can be made to take notice of a marker on the page (or whether user JS can override it by default based on the same marker). Inductiveload—talk/contribs 23:07, 1 September 2011 (UTC)

Update: There is now a solution. It currently requires some custom Javascript tweaks, but if it is made into a gadget, it can be enabled for all users. You can see how to do it and comment at the Scriptorium discussion. Inductiveload—talk/contribs 01:35, 2 September 2011 (UTC)

I have the missing pages for this File:Betty Crocker's Cook Book for Boys and Girls.djvu, but they are a duplex (side-by-side and rotated 90 degrees counter-clockwise) PDF. What is the best way to fix and insert them? I can use Preview to rotate and convert them to JPG; and I can probably split them with GIMP or another tool, then convert them to DjVu with DjVuLibre and insert them with the same. Is this a reasonable plan, or is there a "better" (more efficient and/or less lossy) way to do it? There's not much to bother with for OCR.--Doug.(talk•contribs) 10:25, 28 August 2011 (UTC)
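As a sketch, the rotate-and-split step could also be done in one pass with ImageMagick, avoiding a separate GIMP step (file names and formats here are assumptions, not taken from the actual file):

```shell
# Rotate each duplex scan 90 degrees clockwise to undo the
# counter-clockwise rotation, then split it into two pages
for f in spread_*.jpg; do
  convert "$f" -rotate 90 -crop 50%x100% +repage "${f%.jpg}_page_%d.jpg"
done
```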

I have looked at the script that comes with PWB and it appears to be just a framework to model on. I have imported nearly 200 pages manually from fr to la but there are many more to import where latin pages have been transcribed at other projects. Also, because of the problems with {{iwpages}}, as I've discussed elsewhere I'm inclined to transcribe multilingual works at both subdomains. The easiest way is to transcribe each at the relevant project and then import. This requires a bot. I think we are fine to test it through our admin accounts so long as we keep the testing on userspace and above 60sec/edit to begin with and leave an edit summary that discloses that we are testing a bot script that requires advanced privileges. We can post a notice on the relevant noticeboards of each project for good measure. My plan would then be to request transwiki permissions for my bot, at least on la and maybe here. Bot needs to be able to import from any language including old.ws. What else do you need to know?--Doug.(talk•contribs) 10:34, 28 August 2011 (UTC)

Well, I tried everything I could think of but still no joy. Not even a noticeable change in behavior. I do wholeheartedly appreciate the effort at any rate.

To recap: once I set self.proofreadpage_numbers_inline to "=true;" in my .js file, whatever it is in PageNumber.js related to the Display menu's hide/show page links toggle works exactly as advertised in all 3 Layouts, and even works in a fourth test layout I have set up here. The "leftover" highlight "bar", for lack of a better term, parks itself way down at the bottom of the page regardless of also being enabled or set to false. All the crazy offsets associated with the mouseover highlighting thingy, which I assume somebody must find useful, never mind accurate, go away in all 4 Layouts as well. I still don't recover the 3 ems set aside as the pagelinks' container, but I can live with that, since "we" are forcing sidenotes to the right-side, ~12em margin for the most part anyway. I can even live with not being able to toggle the pagelinks from inline to dynamic in the display menu. I just can't get anything like that to work when self.proofreadpage_numbers_inline is set to "=false;".

It dawned on me that the whole approach so far may have been aimed at the wrong suspects: the toggling between a never-existent inline-to-dynamically-contained pagelinks switch, and the always-present but never-working show/hide pagelink toggle in the display menu (regardless of whether the former is set to inline or contained first). I'm wondering if it's that highlight thingy that is interfering with the normal switching between the basics, and it only reveals itself by happenstance and circumstance, being pointless and "parked" when pagelinks are set to inline at that particular moment anyway.

Anywho, thanks again for making the effort on this. The other thing, setting a specific Layout to come up on designated pages, is rockin' away perfectly though - I'd be happy with just that becoming a permanent option somehow. -- George Orwell III (talk) 15:02, 4 September 2011 (UTC)

Sorry for editing your JS by accident, I was copying it to my JS and got the wrong tab. Try again? Inductiveload—talk/contribs 17:37, 4 September 2011 (UTC)

tsk... I wish you'd set me up like a real playa. I wasn't concerned.

This code actually has nothing to do with proofreadpage_numbers_inline, but it probably should. However, the underlying code does not support dynamically removing the page numbers, so we have to do it ourselves. Inductiveload—talk/contribs 17:41, 4 September 2011 (UTC)

Huh? I'm under the impression that the blank span holding the pagelist-accurate scanned page No. (id="..."), the interlink (title="...."), and a class equaling pagenum will always be present:

Set to inline with show page links enabled (the Display menu shows the [hide page links] text), the pagelinks are bold, deep blue[ish] text, appearing exactly at the point where the prior associated Page:'s content ended and the next Page:'s content really begins. No hover or mouseover highlighting takes place.

Set to inline with hide page links enabled (the Display menu shows the [show page links] text), exactly the same thing as the above "occurs", except those spans with a pagenum class all get what amounts to setting the span style to display:none, and all that x- and y-axis parent-to-child-to-grandparent "offsetting", padding, tweaking, and negative-absolute yada yada disappears for the most part.

All I was hoping for is the same possibility when inline is false, but targeting the spans ....

.... and all that skewed highlighting of most anything and everything BUT a point from one inline pagelink measured to the next inline pagelink (or capped by the previous one).

I don't care where the pagelinks go, or wrap, or cloak, or whatever. I just need to be able to toggle them on or off, no matter if they are absolute left, outta-sight right, or on top of each other. All of that is just not useful for legislative-based stuff.... especially when targeting [side]notes to their subject matter is more of a factor.

Nothing changed in that last run, btw. Don't go wasting any more time on this - like I said earlier, locking in even Layout 2 provides enough progress well into the future for a sliver of areas. If and when there are scans of those Editions or Compilations where the "old" page Nos. appear inline in print, I can come back and bug you with this -- George Orwell III (talk) 19:12, 4 September 2011 (UTC)

The problem I have here is that I am manipulating the spans and divs as I am handed them by the underlying code (which is the bit that actually listens to self.proofreadpage_numbers_inline=true), but I can't change how that code produces them in the first place, hence the clunky duality of "page numbers on/off" and "inline page numbers on/off". I think the "best" way would be to hack the actual page number code and make it a single, cyclical, three-choice option: "page numbers on/inline/off". This would be a more advanced route; it would produce a "neater" interface in the sidebar (one button to cycle through the page number-related options), and would reduce the points of failure by not having local custom code running over the oldWS code, which could change in future. However, it needs hackery within the code at oldWS, which is tricky to test robustly, because to test it I have to block the real JS and load a version from my userspace (or from my machine via an HTTP server), which only works in Firefox+AdBlock. Inductiveload—talk/contribs 09:32, 11 September 2011 (UTC)