I thought you might like to know that the Phil. Trans. is also available at http://rstl.royalsocietypublishing.org/content/by/year. The banner at the bottom and a notice at the top are present, but since you have a script to cut that out, this could be a useful resource as it doesn't need a login like JSTOR (and maybe wouldn't be geared to detect scraping ;-)). Inductiveload (talk) 04:48, 10 February 2010 (UTC)

Useful; sometimes a few pages are missing on Commons. Unfortunately the script I use can only remove the bottom banner, but there is probably some way to work around that. Phe (talk) 07:42, 10 February 2010 (UTC)

A couple of weeks ago I made a cryptic comment on references to figures in the Geological Transactions. I had arrived at this page on which there is a reference to "(Pl. 31. fig. 1, 2.—Pl. 31*, fig. 3.)". That is how it stands in the original text where the implied instruction is "go to Plate 31 and there look at fig.1 etc". But now the figures have been moved into the main body of the text, so perhaps in the edited version we should change the reference to something like "(figures 1 and 2 above and figure 3 below)". I'm not sure if the original plate pages will appear in the final mainspace form but, if they do, there would presumably have to be a link to them as well. The reference would then be something like "(figures 1 and 2 above and figure 3 below and also on Plate 31)". That would be clumsy. I bring up this topic because it will recur throughout the volumes and an agreed policy would be useful. Peter Mercator (talk) 21:16, 28 March 2010 (UTC)

I modified page 448 to try two ways of providing more information on the plate number: the first uses a tool tip on the plate to give the plate/figure number; the second is more intrusive and uses thumb. I prefer the first, but perhaps the more explicit way is better?

Thanks for your comments. My gut feeling is to add figure captions which define the plate and in the text link to the plate captions (as you have illustrated). (I'm not particularly fond of tool tips.) Unless you strenuously object I'll go ahead with this method (in this article and the rest of the volume).

There is something very odd with the figures in this section. Plate32-fig1 is not explicitly mentioned anywhere in the text and it is clearly in an inappropriate place. Since it was also grouped with Plate31-fig2 which is used in the (same) author's next article (section on Assynt) it may belong anywhere between its current position and the position of Plate31-fig2. Perhaps it will become obvious on a closer reading of the text of the two articles. If I can find a suitable spot I'll move it. If not I will simply remove it from the page namespace.

I would have liked to search the whole of the text of this article with a more powerful editor (emacs). Is it possible to do this? Presumably the pages get assembled in transclusion.

On a more general level, as a newbie to WS I'm not sure where to direct my queries. They landed in your space because I presume that you put up these pages and did the first proofread. Thanks for your patience. Peter Mercator (talk) 22:44, 29 March 2010 (UTC)

First, there was a mistake in the example I gave: the anchor name for the description was the same as the anchor for the plate itself, hence this change. We now have the plate description anchor as "#Plate descr plate_nr" and the plate anchor as "#plate plate_nr".

I've nothing against a caption added with the thumb parameter, but in some cases you'll need to decrease the image width a bit: in a few cases, with 440 pixels and the thumb parameter passed to [[Image, the image will not fit in the column width when transcluded.

I also noted the problem with Plate 32, figure 1, but the order of the plates makes sense: plates 31, 31* and the first part of plate 32 for this article, the second part of plate 32 for the other article. The description of plate 32 figure 1 is (Descr): "Plate 32. Fig. 1. Contortion of mica slate at Loch Lomond, p. 438." Phe (talk) 09:25, 30 March 2010 (UTC)

(update) I found it in the Errata page (which was not visible in the Index:), Errata, so pl. 32 fig. 1 must really go after figure 3, perhaps just after and not a bit below as I did. Phe (talk) 09:33, 30 March 2010 (UTC)

Good. So the order of plate/figs seems to be p31f1,2, p31*f3(only), p32f1 and finally p31*(complete). Shall adjust captions and text as per errata (unless you have already done so). I'll do all the errata if you wish. To prevent errata edits being undone I suggest a comment in the edited text. Peter Mercator (talk) 10:21, 30 March 2010 (UTC)

I didn't apply the Errata (or rather I don't remember applying it for any volume of the GSL); a comment in the text for errata will be fine. Phe (talk) 10:28, 30 March 2010 (UTC)

Wikisource:Scriptorium is a good place for general-purpose and technical questions. Perhaps someone will know how to get the text for a set of pages. It's probably not difficult to do with pywikipedia's scripts. Phe (talk) 09:33, 30 March 2010 (UTC)

Sorry. Just hacking away at the moment. Give me a chance to get on an even keel. Peter Mercator (talk) 15:20, 30 March 2010 (UTC)

Oops, I was confused by the time of your edit: I thought it was two hours ago, but that's the offset from my local time to UTC... Phe (talk) 15:22, 30 March 2010 (UTC)

OK, I give in. The link on page 448 now goes to the appropriate description section but within the section the three links to p31, p31*, p32 move down the page to plates 31, 31, 31*. What's going on? The anchors on the figures seem ok. Is there any significance that the page links adjacent to many plates have two numbers superimposed? (BTW, I'm in Edinburgh UK and now on UTC.) Peter Mercator (talk) 15:59, 30 March 2010 (UTC)

I don't see this behaviour: from the description, the three links go to separate plates. The duplicate page numbers shouldn't change anything, since from the description we don't link to a page number but to #Plate xxx. Perhaps you are trying the link from On the Geology of various parts of Scotland in main:, but after you modified page 448 you didn't reload the page, so you are still using the version with the uncorrected link. Phe (talk) 16:12, 30 March 2010 (UTC)

Well that's my first WS article done and dusted. I now understand a little about WS but I feel that I've just scratched the surface. I haven't validated the four pages starting at 448. Perhaps you can check that the figs and captions all work together. (I did check the text). Peter Mercator (talk) 22:19, 30 March 2010 (UTC)

I made two changes ([2]). First, the added comment is not useful once the page is transcluded: the figures referred to are shown just above and the meaning of "in some of the figures" is evident enough (well, that sort of thing is also a matter of taste, but I preferred to keep the text as near as possible to the original). The second change follows the same rationale: I applied the errata as is, without any change (except pl. 31 --> pl. 31*), even if your wording was better. I'll let you validate these four pages. (I could advance the proofread state, but the intent of validation is that only a different editor can advance the state a second time, so that a page becomes "validated" only after it has been checked by two different people; in some cases the software enforces this policy, but not in this case.) Phe (talk) 06:58, 31 March 2010 (UTC)

I presume that the readers of the future will probably arrive via the page for Geological Society of London and will then head off to say volume 2. They may be armed with an author name, say John MacCulloch, and a (true) title, say "XVIII. Miscellaneous Remark accompanying a Catalogue of Specimens transmitted to the Geological Society." Well, where are they? Your short title for this article "On the Geology of various parts of Scotland" is not an obvious choice and there is no sign of names. Of course one can find the article by going into the actual contents page and then jumping to the text. However I find the clash between full titles and your short titles a little confusing, particularly when both are visible at the same time as here. Moreover, on the author page for MacCulloch we find the invented short titles: surely one must have the actual titles listed here. Sorry to stir up problems but I was very puzzled when I started editing MacCulloch's paper. How puzzled will the default reader be in years to come? Perhaps there should be no use of short titles? Peter Mercator (talk) 11:57, 31 March 2010 (UTC)

I hesitated a lot over that when I started to work on GSL publications. But first a remark: I didn't invent these titles. The short titles come from the running header at the top of each page, and it's common practice to use the short title variant to refer to such a paper. Compare a search on short title vs. long title: both are used, and when the title is very long, the short title is always used more often than the long one.

Besides that, I agree it's problematic to show only the short title everywhere, especially in the case you pinpoint where the short and long titles are completely different. Perhaps we should transclude the three Contents pages instead of using a short title summary in Transactions of the Geological Society, 1st series, vol. 2, but 1) we will lose the link to Plate and Maps, and 2) we will get exactly the same problem you describe: how will people figure out that "On the Geology of various parts of Scotland" is identical to "Miscellaneous Remark accompanying ..."? For the author pages, I don't know whether it's better to use the short or long title. For the article title itself and the prev/next article links in each article, the short title is the only viable option, as article title length is limited to 256 bytes. It would be interesting to check which title variant is used inside these books themselves: the short one, the long one, or both? Phe (talk) 12:49, 31 March 2010 (UTC)

This game gets more interesting by the day. Sorry to accuse you of invention; I should have spotted that you were using the short titles. As you point out both long and short titles are used in the literature. So may I just add one or two tentative suggestions for I know that you have already invested a great deal of thought and effort on GSL. Since a reader might enter searching for long title/ short title/ author, perhaps the main entry for Transactions of the Geological Society, 1st series, vol. 2 could be structured as follows:-

I don't know how your data is structured so I can't assess whether such a page could be easily constructed. Other points. Does the prev/next banner have to use any titles at all? Simple prev/next 'buttons' could possibly suffice? Possibly the index page could have the same four column table instead of the present list of short titles? (But does it need titles at all?) Finally, I suppose that the authors page should probably include both long and short titles! Peter Mercator (talk) 14:48, 31 March 2010 (UTC)

I've started a test page, but I don't have a lot of time to enhance it (the first two tests are constrained to a short column but I don't know if that's a good idea). Don't worry about the red links; they are blue when the transclusion is done in the right page. It already shows a few problems: transclusion of the contents is easy and reuses existing pages, but do we need the long author description? "By J. Mac Culloch, M.D. F.L.S. Chemist to the Ordnance, and Lecturer on Chemistry at the Royal Military Academy at Woolwich, and Vice-President of the Geological Society". At the bottom of the test page I've added the first row of a table showing all the information we need; perhaps we need only this table (besides, an advantage of a table is that we can make it sortable by column (example)). For the index I think we need to keep a summary, as is the habit on most Index: pages, but we can't use the four or five column table of contents as in the test page; it's probably too wide... As for the data structure we have, it actually consists only of the Page:* pages. We could also add some sort of markup tag in Page:* to be able to transclude a specific portion of a page, but the Page:* code would quickly become clumsy. Conversely, we could transclude a portion of the Main: page in the index to avoid code duplication, but that is tedious to do if we go for the table solution. If you have ideas, feel free to test them in the test page by adding a new section or modifying the existing one. Phe (talk) 16:32, 31 March 2010 (UTC)

Perhaps what we have to do is to add a rubric to the main page for volume 2. The structure of the page would then have five elements:

A left bold heading saying simply End matter.

Four links in column for prelims/donation/index/plates. (Prelims links to pp i-ix)

A left bold heading saying simply Articles.

A rubric along the following lines: "The following list of authors and titles is supplemented by a list of the short titles used in the running heads. The short titles are frequently used in references to this volume from later volumes in the series." (And from other publications?)

The actual table. After the above rubric it is ok to use "Short title" in the header line. Is issue required? (Perhaps later volumes do have divisions into issues?)

Once the above distinction has been made the prev/next structure is ok. Perhaps just leave the index page as it is. There is no need for "long authors" on the main page: their qualifications and affiliations appear elsewhere. Peter Mercator (talk) 11:13, 1 April 2010 (UTC) (Update) I have added an example of the above layout to your test page. Peter Mercator (talk) 14:04, 2 April 2010 (UTC)

I've added one more section to the test page, omitting the short title when it's identical to the start of the title. I think it's obvious enough that On certain Products obtained in the Distillation of Wood refers to the same article as On certain Products obtained in the Distillation of Wood, with some account of Bituminous Substances, and Remarks on Coal. Phe (talk) 15:09, 3 April 2010 (UTC)

One or two final mods to the rubric and layout. I'm perfectly happy to go ahead with this format (perhaps with the table centred and perhaps no italics). I'm prepared to help you on these pages for the four volumes. I would have to hack away with emacs macros but you may be able to do better with scripts. If the prelims page I constructed is ok then it could be copied over to with the appropriate name and then we need three more. Cheers, Peter Mercator (talk) 21:56, 3 April 2010 (UTC)

Table centered, I increased the font size to 90%, either we need a 90% font size or remove the italics for short title. I can't devote time to do the real work at least for a few days. Phe (talk) 08:04, 4 April 2010 (UTC)

Further mods. Narrowed text (text lines were too long). Moved author to right of title as in actual contents. Played with the markup as a learning exercise. The apology: when you originally had a column headed 'issue' I thought you implied the volume had been published in issues and then bound. I don't think this is the case. The numbers are simply serial numbers within the volume and they should be present but I don't think the column needs a title. Let's defer major action for a few days until this page is stabilised. Peter Mercator (talk) 21:47, 4 April 2010 (UTC)

I've duplicated the last line to try to put the short title on the same line but right-aligned; it doesn't work... Do we really need a line feed between the long and short titles? Phe (talk) 09:33, 5 April 2010 (UTC)

There is only one instance of a different short title in this journal so I think we can afford a linebreak so that it stands out clearly. I don't know if similar problems arise in other volumes. The link to title etc is now to a new page for all the prelims (here). I have added appropriate prev/next pages in the prelims and the first article. I haven't deleted the link to 'Contents' on the main page but it should probably now go. I'm ready to edit the other titles on to the trial page and I'll go ahead now unless you object. Peter Mercator (talk) 20:03, 5 April 2010 (UTC) Update: decided to have a go anyway. What do you think? Peter Mercator (talk) 22:43, 5 April 2010 (UTC)

Yes, time to go ahead. I've added a final test version with sortable columns on page number and Author, but I used span style="display:none" directly to specify the sort key; perhaps we should import the relevant template from w:en:Category:Sorting templates, unless it already exists in ws: Phe (talk) 09:58, 6 April 2010 (UTC)

The sorting is a neat trick. What now? Will you put up the page or should I? As for the future I'm happy to tackle the contents of the other volumes, but not in a rush. It's time to proofread a little more. (Have just replied re 'always'). Peter Mercator (talk) 21:33, 6 April 2010 (UTC)

I must admit I prefer to proofread too. Phe (talk) 07:18, 7 April 2010 (UTC)

I have made some minor changes to the index page for vol2, here. I have added serial numbers to the first two articles in the contents list and indicated the pages on which these articles start. My motivation is that the index page doesn't indicate the pages where the article start. The changes help editors (I hope). Is there a better way? Shall complete these mods if you agree. Once again I'm thinking of how best to set up pages for the future volumes. Peter Mercator (talk) 21:13, 10 April 2010 (UTC)

Fine for me; I made a minor change. Besides this change, I think the /Prelims page shouldn't use an abbreviation but should be moved to /Preliminary pages. Phe (talk) 09:28, 11 April 2010 (UTC)

Shall make further mods along lines of above. In retrospect I think it would have been better to have separate pages for (a) Preliminary pages and (b) Contents pages. Peter Mercator (talk) 14:29, 11 April 2010 (UTC)

Please look at this page. The page name is incorrect: it should be "Transactions of the Geological Society, 1st series, vol. 1/On the Geology of some parts of Hampshire and Dorsetshire". Do you have admin rights to move the page? You will also see that the title of this page is also incorrect: it has been added as "On the Wrekin . . ." (Hence prev/next header bar is nonsense). I'm just assuming that I can't move the page without admin rights. I could of course create a new page with the correct name but that doesn't seem the right way. (You must have been having a bad day. Look at the first word of the article!) Peter Mercator (talk) 21:38, 10 April 2010 (UTC)

A gross error, good catch. There is no need for admin rights to move a page, but fixing the links can be a bit tricky: many of them come from Page:*, but "what links here" doesn't show them because of the convoluted template used to create these links. I found another error: On the physical Structure of Devonshire and Cornwell --> Cornwall; I'm fixing it too. Phe (talk) 09:05, 11 April 2010 (UTC)

Now moving on to validate Phil Tran volume 4. Perhaps you might like to do the same for numbers 1 and 2. I shall leave number 3 for someone else to validate since I have no wish to edit a single page which uses 'long s'. I didn't use your first line indent style because it screws up the positioning of the drop cap (after transclusion only). Is there a way to have a dropcap for the first para (of a section) and a first line indent for all subsequent paras? In general I have always used no first line indent for the first para of a section and indents for subsequent paras. Peter Mercator (talk) 20:00, 13 April 2010 (UTC)

There are two problems with dropcap and indent through CSS. The first letter (the figure) is shifted to the right. The second letter is shifted too, creating a big gap between the first letter and the second. I only know how to fix the second problem, by using a double line feed before the dropcap; see page 57. Phe (talk) 08:54, 14 April 2010 (UTC)

(update) Do you think we should stop using indent through CSS and use {{gap}} all over the page instead? Phe (talk) 16:47, 14 April 2010 (UTC)

Priorities? It is clear that the drop cap must look ok on the final transcluded version. If the only way of achieving this along with subsequent indents is to use gap commands then perhaps we should do this. This is a small pain at the moment but once again it is probably good to agree a solution before launching into the next hundred volumes. Peter Mercator (talk) 21:23, 15 April 2010 (UTC)

I changed the dropcap template to never do any indent, so it's usable now, except it must be preceded by a double line feed to ensure the text following the dropcap is not indented. Phe (talk) 06:48, 25 April 2010 (UTC)

Phe. I've been doing a fair amount of editing as you will realise. Most should be unexceptionable apart from this page. There were a few errors in the table which I have corrected but I moved the status back to proofread because I think the table should be rechecked. Is moving status back an acceptable action? The other edit may upset you: removing the latex fractions and using 'standard' fractions. My reasons are entirely aesthetic for the latex fractions are just too ugly. OK, they mimic the upright fractions of the original but they are much too large---and fuzzy into the bargain. If you object strongly then I'll undo these edits. If you accept them then the rest of the table needs attention. (Of course this method breaks down for fractions not in the 'standard' symbols.)Peter Mercator (talk) 17:00, 25 April 2010 (UTC)

If you found real errors (I mean things other than obvious misspellings where the reader can have little doubt about the correct word), moving the status back is right for me; moving back is also fine for a difficult-to-read table or text. I started out using latex everywhere but I tend to use the unicode fractions nowadays, so these changes are fine for me. By the way, I commented in the above section about the dropcap template. Phe (talk) 18:17, 25 April 2010 (UTC)

Thanks for that & for cleaning up after me at Commons. How did you do that? And also do I need to confirm the copyright thing beyond the National Library saying it's out of copyright in Australia? Misarxist (talk) 12:04, 15 August 2010 (UTC)

For the copyright I think the claim of the National Library is sufficient, but I'm pretty weak on copyright issues. For the djvu I used tesseract to do the OCR, but it is tedious: I needed to treat each image to strengthen the character strokes. I have no easy how-to for it; it involved using djvulibre tools to extract the images, ImageMagick tools to treat each image, tesseract to do the OCR and an additional pass through a sort of spell checker to clean it up a bit, and even with that the text layer is full of errors. Phe (talk) 12:42, 15 August 2010 (UTC)
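The pipeline described above could be sketched roughly as follows. This is only an illustration under assumptions: the intermediate file names, the grayscale/threshold cleanup and the 50% value are hypothetical placeholders, since the actual image treatment is not described here, and the spell-check pass and the step that puts the text back into the djvu are left out. The sketch only builds the command lines for ddjvu (DjVuLibre), convert (ImageMagick) and tesseract; it does not run them.

```python
from typing import List


def ocr_commands(djvu: str, page: int, lang: str = "eng") -> List[List[str]]:
    """Build the three command lines for one page (nothing is executed)."""
    raw = f"page{page:04d}.tif"          # hypothetical intermediate file names
    clean = f"page{page:04d}-clean.tif"
    out_base = f"page{page:04d}"         # tesseract appends .txt itself
    return [
        # 1. DjVuLibre: extract one page of the djvu as a TIFF image
        ["ddjvu", "-format=tiff", f"-page={page}", djvu, raw],
        # 2. ImageMagick: placeholder cleanup step (values are assumptions)
        ["convert", raw, "-colorspace", "Gray", "-threshold", "50%", clean],
        # 3. tesseract: OCR the cleaned image
        ["tesseract", clean, out_base, "-l", lang],
    ]


if __name__ == "__main__":
    for cmd in ocr_commands("book.djvu", 1):
        print(" ".join(cmd))
```

Each returned list could be handed to subprocess.run() once the cleanup step has been tuned for the scan in question.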

Can you think about some potential magic? I am wanting a lazy-boy approach: a search link (or links), when operating in the Author: namespace, that enables me to:

perform an enWS search (prime focus, and primarily main namespace), and here looking to identify WORKS ABOUT AUTHOR

potentially expand to an enWP search and Commons search, primarily for existing pages where we are looking for related articles or possible images

and more potentially a search into Author namespaces of other sister wikisources, example if German author, I might want to see if they have a German page then I can interwiki both ways at that time.

I see the tool more relevant for those of us who construct author pages, and do the more gnomic work. Not sure whether it is a gadget or just something that I would insert into local user files. Thanks for your thought power here. — billinghurstsDrewth 02:08, 25 August 2010 (UTC)

The first two are difficult to do in a reliable way; lookup in plain text (as opposed to lookup in metadata) is difficult. I didn't find any library catalogue with the necessary API to allow some sort of lookup to get an author's works or works about the author. I'll look at the third later. Phe (talk) 14:27, 25 August 2010 (UTC)

If it came to a plain text, that is better than nothing. For WS/WP/Commons, even some sort of intitle:searching may be better than nothing for existing articles, even the ability to undertake the search that you take for new author pages would be useful, seeing that it is a sort of check that can be useful to run against pre-existing author pages where they may not have changed for a number of years. I wasn't thinking of a perfect tool, I am thinking of a tool that provides some ease to those who are doing the repetitive maintenance gnoming tasks through the site. — billinghurstsDrewth 02:19, 26 August 2010 (UTC)

I've done the third, but it's limited and half broken. Add importScript('User:Phe/Interwiki.js'); to your monobook and regexTool('Adding interwiki', 'add_interwiki()'); to your rmflinks() function. It checks only for articles in it:, pt: and fr:; see the comments at the beginning of User:Phe/Interwiki.js for why it is done this way. Besides that, it adds iws blindly, so if iws already exist some may be duplicated. The script doesn't sort the added iws. It's also of extremely limited use, as there are not a lot of pages with an identical author name. You can test it on Author:William Stanley Jevons.

For your first two requests, are you looking for something like Template:Search author but with more links and added somewhere automatically? I'm unsure how and where to add it, and how to hide it for people who don't want it or for IPs. Phe (talk) 07:50, 26 August 2010 (UTC)

Ermm, the first can be done partially, and I have implemented it: add importScript('User:Phe/Works about.js'); to your javascript and regexTool('Works about', 'works_about()'); to your rmflinks() function. This script works only if you use the preloading header gadget. Try it on Author:Johann Joachim Becher. Caveats: as you can see, the section is added at the start of the Author: page; adding it at the end is error-prone because in that case the script acts as if nothing happened. The script doesn't try to handle an already existing works-about entry; it's up to you to remove/move the code. The script handles only a few cases; see the beginning of the script, or tell me which links you want the script to check. Useful information is a link to an existing article and the way to create a link to such works (template or direct link). Phe (talk) 13:00, 27 August 2010 (UTC)

A buglet that I have found with the new creation tool is when one is creating a page name where the last compound word is not a word, e.g. Author:Andrew Balfour (1630-1694). It takes the parenthetic word as the last name. Could we make it ignore parenthetic words and take the previous last word? If that is a little tricky, then not a bother, as I can amend manually. — billinghurstsDrewth 04:16, 25 August 2010 (UTC)

Done. I also handle a few special cases (von, van, de and le), but Van and Von remain unhandled. Phe (talk) 13:04, 25 August 2010 (UTC)
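For illustration only, the behaviour discussed in this thread could look something like the sketch below. It is not the code of the actual creation tool: the function name is hypothetical, and treating a lower-case particle as part of the last name is an assumption about what "handling" von, van, de and le means.

```python
import re

# Lower-case particles mentioned above; capitalised "Van"/"Von" are
# deliberately left unhandled, as in the description.
PARTICLES = {"von", "van", "de", "le"}


def last_name(page_title: str) -> str:
    """Guess the last name from an Author: page title (illustrative only)."""
    # Drop the namespace prefix if there is one
    name = page_title.split(":", 1)[-1]
    # Ignore parenthetic qualifiers such as "(1630-1694)"
    name = re.sub(r"\s*\([^)]*\)", "", name).strip()
    words = name.split()
    if not words:
        return ""
    start = len(words) - 1
    # Assumption: particles directly before the last word belong to the name
    while start > 0 and words[start - 1] in PARTICLES:
        start -= 1
    return " ".join(words[start:])
```

With this sketch, Author:Andrew Balfour (1630-1694) yields "Balfour" rather than the parenthetic dates.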

What happens is that when I open Extract from Captain Stormfield's Visit to Heaven/Chapter II and have a look at the page links on the left of the text, when I reach page 64 the links are overlaid one on top of the other, then from page 67 on they appear normal again. I don't know whether it's only me who can see that. --Paolo81 (talk) 20:03, 29 September 2011 (UTC)

Tried emptying the cache but no luck. Finally opened it with Firefox instead of Safari and it shows ok. Mah! --Paolo81 (talk) 20:06, 29 September 2011 (UTC)

ok, I see the trouble with chrome too, a bug somewhere ;( — Phe 12:56, 1 October 2011 (UTC)

Hi Phe, You did text layers for the EB1911 DjVu files on commons, and the results are very nice. Would you be able to do the same for the three EB1922 volumes (30, 31, 32)? Thanks, Htonl (talk) 11:57, 15 December 2010 (UTC)

Actually, never mind, as I see the files do have text layers already. Now I need to figure out why the text isn't automatically being put in the edit box for the Page:'s. - Htonl (talk) 12:03, 15 December 2010 (UTC)

It happens from time to time: uploading the file succeeds but the text layer is not properly handled. You need to purge the File:; there is a gadget on Commons to add a « purge » link to the left menu (but I guess you already found this trick :). — Phe (talk) 19:44, 18 December 2010 (UTC)

Phe, When you have a spare moment or three, I would appreciate it if you would be able to run your special index script over The Art of Bookbinding/Index to pair the page numbers on the index pages to the chapters. Thanks. Billinghurst (talk) 11:09, 11 March 2011 (UTC)

If the transcript is good, say a PG text, there is nothing to be done. Moving it to a scan is pointless.

If the transcript is different, another or multiple editions, then it is very misleading to place it against a scanned edition. Detecting any differences, if attempted, is very difficult and time-consuming, much more bother than correcting ocr.

I don't think this bot should be automatically invoked, especially by users who are unaware of these considerations. CYGNIS INSIGNIS 00:22, 16 August 2011 (UTC)

This user added these texts; it looks like you know better than him which edition these texts come from... — Phe 00:38, 16 August 2011 (UTC)

A general comment on how this is being used, but in the recent example I don't know anything, because PG doesn't specify which edition it is. Even investigating that takes more time, and I know from experience that PG texts do not "match". CYGNIS INSIGNIS 01:20, 16 August 2011 (UTC)

Hi. I saw your (I guess) statistics page here. My goal is to create statistics for PSM project. Before I saw your page I did something with the help of Hesperian, based on pywikipediabot. You can see some results here, User:Mpaa/Sandbox1, User:Mpaa/Sandbox2, User:Mpaa/Sandbox3. I have 2 questions:
1. could your tool and graphs be customised/used to look only at PSM project pages so that we could use that instead? That would be much better than my newbie attempt.
2. if not, any suggestion on how I can extract not only current status but also create deltas in a smart way, given the API I am using? E.g. using timestamps maybe?
Thanks. --Mpaa (talk) 19:24, 24 October 2011 (UTC)

1. not easily.

2. I don't think there is any way to get past statistics; the only way is to gather statistics each day, save them somewhere and allow diffing with previous days. I don't know which API you are actually using, but an efficient way is through api.php [3] (note how I set cllimit to twice gaplimit to simplify iteration, as it allows all the needed information to be fetched with one query per volume). I don't think the needed code actually exists in pywikipedia. — Phe 15:22, 25 October 2011 (UTC)
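A minimal sketch of both halves of this suggestion, under stated assumptions: the query below uses the standard MediaWiki parameters generator=allpages and prop=categories, but it is not the query behind the link above; the namespace number 104 for Page: is assumed, and the function names and status labels are made up for the example. The second function is the "save each day and diff" idea applied to two saved snapshots of per-status counts.

```python
from typing import Dict


def build_query(prefix: str, gaplimit: int = 250) -> Dict[str, str]:
    """Parameters for one api.php request listing pages and their categories."""
    return {
        "action": "query",
        "format": "json",
        "generator": "allpages",
        "gapnamespace": "104",          # assumed number of the Page: namespace
        "gapprefix": prefix,            # e.g. the djvu file name of one volume
        "gaplimit": str(gaplimit),
        "prop": "categories",           # the proofread status shows up as a category
        "cllimit": str(2 * gaplimit),   # twice gaplimit, as noted above
    }


def delta(previous: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Difference between two daily snapshots of per-status page counts."""
    keys = set(previous) | set(current)
    return {k: current.get(k, 0) - previous.get(k, 0) for k in sorted(keys)}
```

The dictionary from build_query() can be passed as the query string of a GET request to api.php; storing one snapshot per day then makes delta() enough to produce the day-to-day changes.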

The Websockets stuff doesn't seem to work for FF7, well, this throws the error your browser does not have websocket support; try Google Chrome or Firefox 4.

I cannot remember the syntax for http://toolserver.org/~phe/stats_diff.txt to get it display a longer period. I am wanting to check the Page: ns traffic for all wikis back to when the LF error was introduced. Trying to work out how much I would be letting myself in for if I bot fixed it. Thanks. — billinghurstsDrewth 03:55, 29 October 2011 (UTC)

We want to bring 100-150 people together, including lots of people who have not attended such events before. User scripts, gadgets, API use, Toolserver, Wikimedia Labs, mobile, structured data, templates -- if you are into any of these things, we want you to come!

I also thought you might want to know about other upcoming events where you can learn more about MediaWiki customization and development, how to best use the web API for bots, and various upcoming features and changes. We'd love to have power users, bot maintainers and writers, and template makers at these events so we can all learn from each other and chat about what needs doing.

I'm new to the use of Phe-bot but ran the Match (and Split) routine on the page The Book of the Thousand Nights and a Night/Volume 3 and it has put the first word of every page onto the previous page. Is this a known problem? If so, is it being fixed? Is this the right place to report errors? Chris55 (talk) 23:42, 13 July 2012 (UTC)

Yes, it's a known problem. I had another look at it and there is little the bot can do better. The trouble comes partly from the running header in the OCR, the notes at the bottom of the page (on some pages) and the OCR errors (not a lot in this work). The text never matches exactly; have a look at the proposed text at the boundary of pages 17-18:

Then she wept with sore weeping and waxed wroth and shuddered in my face with skin bristling [FN#1] and looked at me with "
and the ocr
" TJaea she wept with sore weeping and waxed wroth and shuddered in
VOL. III. A
2 Alf Laylah wa Laylah.
my face with skin bristling^ and looked at me with "

the " VOL. III. A 2 Alf Laylah wa Laylah. " part in the OCR makes the matching very approximate. — Phe 11:46, 14 July 2012 (UTC)

Btw, that's the reason why match and split is done in two steps: the first step matches, then you adjust the page boundaries manually (better to upload the djvu and use a djview viewer to do that), then the second step splits the text with the bot. — Phe 11:50, 14 July 2012 (UTC)

Thanks for looking. Yes, there's a footer and header to deal with, but apart from that (which is fairly predictable) the scans are pretty good. The line starting with a number at the top is a giveaway that it's a running header and should be stored there.

Possibly it doesn't matter (after all it will produce the right output), but since it gets so close and is relatively predictable in its mistakes I wonder whether it's worth looking at the code. Is it possible to have a look at it? Chris55 (talk) 17:13, 14 July 2012 (UTC)

Remember it's a general-purpose tool working across many languages, book types and levels of OCR accuracy; it's not as predictable as it looks. The code is available at [5]; the match part is done in align.py, function do_match(). Perhaps a solution would be to systematically move the last word of a page to the next page, if this is a very common error. Note there is already an auto fixup in some cases (the part with a comment starting with # Move the end of the last page to the start of the next page ...); adding an if not match: # move the last word... after this if ... else will do the trick. — Phe 11:53, 15 July 2012 (UTC)
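A rough sketch of what such a "move the last word" fallback could look like is given below. This is not the code from align.py: the function name move_last_word_forward and the list-of-strings page representation are made up for illustration, and the real do_match() works on different data.

```python
def move_last_word_forward(pages):
    """Move the last word of every page (except the final one) to the next page.

    `pages` is a list of page texts in reading order; a new list is returned.
    """
    fixed = list(pages)
    for i in range(len(fixed) - 1):
        # Split off the last whitespace-separated word of the current page.
        parts = fixed[i].rstrip().rsplit(None, 1)
        if len(parts) < 2:
            continue  # empty or single-word page: leave it untouched
        head, last_word = parts
        fixed[i] = head
        fixed[i + 1] = last_word + " " + fixed[i + 1].lstrip()
    return fixed


pages = ["shuddered in my", "face with skin bristling and", "looked at me"]
print(move_last_word_forward(pages))
```

In the bot this would only make sense as a fallback, in the branch where no exact match is found, since a blind shift would damage pages that were split correctly.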

Thanks, I really wanted to see how feasible it might be to add other common but optional tasks, such as converting to wiki line structure and dealing with headers/footers. But I'm really learning... Chris55 (talk) 12:27, 15 July 2012 (UTC)

Gday Phe. If you are still in the business of linking index pages, would you be so kind as to do your magic with the linked work? At the moment there are just a few pages left to be validated, so it should be ready to go shortly. Thanks if you can. — billinghurstsDrewth 14:52, 2 December 2012 (UTC)

I don't know if there is any piece of code related to EPUB export of frWS works that use the DL nomenclature. You should ask Phe (talk • contribs), who manages Dictionary.js and is also involved in WSexport. Tpt (talk) 14:44, 27 May 2013 (UTC)

There is no support for exporting DL to epub; where do you see this? Besides that, I'm less and less convinced by DL, and I discourage its use on fr (lack of categories in DL articles, lack of automated export of microformats. It looks like you have something like that in this example). Currently I prefer dictionary article creation by a bot using the section tag, à la ## "Abegg" ##, to get the article name and the page number. — Phe 22:09, 29 May 2013 (UTC)

{{ALL TEXTS}} is updated by a bot; can the same bot be used for other Wikisources like ours (bn.wikisource.org)? The statistic 1,039,505 is not the actual number of texts across all the Wikisource wikis. Please help. Jayantanth (talk) 18:02, 18 November 2013 (UTC)

Heyho. Commons deletionists are now identifying that some Commons djvu/pdf files may have the Google-added lead page, which they see as copyrighted and thus 'poisoning' the whole djvu/pdf file, so they delete it all. Is there some form of cropbot-like tool that grabs the djvu and/or pdf file, pulls it to toollabs, removes and replaces the lead page with a blank page, and then puts the file back as an overwrite? Thanks. — billinghurstsDrewth 23:56, 21 September 2014 (UTC)

hmm, unlikely to be done, 'cause accepting such removal is not compatible with CC-BY/CC-BY-SA. Some commonists should read the licence they accept: "attribution – You must attribute the work in the manner specified by the author or licensor" ([7]). Emphasis mine. I think the wording is simple enough; perhaps if they read it a dozen times they'll start to understand it. — Phe 18:22, 23 September 2014 (UTC)

Besides that, they won't be happy with just that; what about the watermark on each page? — Phe 18:43, 23 September 2014 (UTC)

Stamping a work that is not yours with CC-BY/CC-BY-SA doesn't make it true. The watermark is a watermark, if they can be stripped, then I am all for that too. — billinghurstsDrewth 00:12, 24 September 2014 (UTC)

Ok on that point, but it's also a simple matter of politeness: if the watermark is not hiding some part of the image, I'm not at all for removing it. — Phe 00:54, 24 September 2014 (UTC)

My recent efforts should have eliminated a few entries. Any chance of re-generating the list, or making it a dynamic labs query against the Wikisource database? ShakespeareFan00 (talk) 22:57, 24 September 2014 (UTC)

Hi Phe. I am looking at a report covering 1 Nov to today (at time of creation), and it shows me these results:

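Difference between Fri Oct 31 2014 and Thu Nov 27 2014

| language | Page: all pages | Page: not proof. | Page: problem. | Page: w/o text | Page: proofread | Page: validated | Main: all pages | Main: with scans | Main: w/o scans | Main: disamb | Main: percent |
|---|---|---|---|---|---|---|---|---|---|---|---|
| fr | 12126 | 3095 | -7 | 1148 | 7890 | 2018 | 1124 | 2743 | -1669 | 50 | 1.07 |
| en | 10850 | 3916 | 218 | 677 | 6039 | 5680 | 1205 | 1347 | -169 | 27 | 0.31 |
| … | | | | | | | | | | | |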

If I interpret that report correctly, it means that enWS has only moved 359 pages to proofread-only status this month. If that is the case I find it hard to believe, as I have done 100+ on one book alone (none validated) and Ineuw has been working on his PSM. Could you please explain this to me? — billinghurstsDrewth 01:00, 28 November 2014 (UTC)

Hi, you get 359 with 6039-5680, but this has no meaning. What you have here is that during this period you got +6039 yellow pages and +5680 green pages; every number for a given state is the number of pages at the end of the period minus the number of pages at the beginning of the period. — Phe 23:36, 29 November 2014 (UTC)

A note on the -7 problematic pages: it doesn't mean that 7 problematic pages have been fixed, but rather that if during this period 10 pages were marked as problematic then 17 problematic pages were fixed, so overall there are -7 problematic pages. The same applies to the other fields: +5680 can mean that 5780 pages have been validated but 100 have been downgraded or deleted. This is important for understanding the variation of yellow pages: if 1000 pages pass from yellow to green and zero pages move from red to yellow, then you'll get -1000 yellow and +1000 green. — Phe 23:43, 29 November 2014 (UTC)
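The arithmetic above can be spelled out with a small sketch. The starting counts (120 and 9000) are invented for the example; only the movements (+10/-17 and +5780/-100) come from the explanation above, and diff_snapshots is a hypothetical helper, not part of the stats tool.

```python
def diff_snapshots(start, end):
    """Return end minus start for every state present in either snapshot."""
    states = sorted(set(start) | set(end))
    return {state: end.get(state, 0) - start.get(state, 0) for state in states}


# Counts at the beginning of the period (made-up values).
start = {"problematic": 120, "validated": 9000}

# Counts at the end of the period: only the net effect is stored.
end = {
    "problematic": 120 + 10 - 17,     # 10 newly marked, 17 fixed
    "validated": 9000 + 5780 - 100,   # 5780 validated, 100 downgraded or deleted
}

delta = diff_snapshots(start, end)
print(delta)  # {'problematic': -7, 'validated': 5680}
```

The report only ever sees the two snapshots, so the individual movements inside the period cannot be recovered from it.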

Then on the page the wording that says "The "proofread" column counts all the pages that have been proofread : [category q3] + [category q4]" seems incorrect, as proofread by your logic is just [category q3] — billinghurstsDrewth 00:28, 30 November 2014 (UTC)

Ok, I see the trouble now. I'll need to take a snapshot of the state of all pages, with their titles, at 24-hour intervals, and compare the changes to the en: recent changes. — Phe 00:59, 30 November 2014 (UTC)

After thinking about it, the way you get 359 is not correct: 359 is the number of pages proofread during this period AND not validated during the same period. Let's say today only one page was proofread then validated; tomorrow in the stats the validated field will be increased by +1 and proofread by +1 too. That proofread is q3+q4 doesn't mean proofread will be increased by +2. The state changed twice in a row, but what is visible from the statistics is only one change from red --> green, because the statistics provide counts at fixed points in time, not the continuous evolution of page state. So if you count proofread pages by (validated - proofread) fields you'll get zero, not the correct result. I remember Ankry had a question about that one or two years ago, and it looks like many people are confused about the statistics; even I have trouble remembering exactly how it works each time someone asks for clarification on the statistics... — Phe 22:20, 30 November 2014 (UTC)
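The one-page example can be replayed in a few lines. This is only a sketch: the page titles are invented, the columns() helper is hypothetical, and q1 is assumed to be the "not proofread" (red) level, with q3 = proofread (yellow) and q4 = validated (green) as on the stats page.

```python
from collections import Counter


def columns(page_states):
    """Compute the two report columns from a {title: quality level} snapshot."""
    counts = Counter(page_states.values())
    return {
        "proofread": counts["q3"] + counts["q4"],  # column defined as q3 + q4
        "validated": counts["q4"],
    }


# Snapshot taken yesterday: two red pages.
before = {"Page:A": "q1", "Page:B": "q1"}

# During the day Page:A is proofread, then validated.
after = dict(before)
after["Page:A"] = "q3"
after["Page:A"] = "q4"  # only this final state is seen by today's snapshot

old, new = columns(before), columns(after)
delta = {name: new[name] - old[name] for name in old}
print(delta)                                   # {'proofread': 1, 'validated': 1}
print(delta["proofread"] - delta["validated"])  # 0
```

Both columns move by +1 and their difference is zero, although one page really was proofread during the day.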

Yes, I understand that, though I still don't believe it. As I said, I had done 100+ pages of a work which would equate to over 1/4 of that change (none are validated). I don't find it credible that the dynamic of enWS changes that amount from a PotM especially when historically we proofread ~250 pp a day, and validate approximately half of that. That is to say that I don't think that we drop from 30x 250 to 30x 12. I can run those basic statistics in my head, and analysis of data is part of my work, and to me even with a concerted effort in validation there are plenty who don't participate in PotM and continue on with their projects and that is predominantly proofreading. Just looking at Special:Contributions/Ineuw, Index:Popular Science Monthly Volume 26.djvu and Special:RecentChangesLinked/Index:Popular Science Monthly Volume_26.djvu and my additions to my work would seem to threaten the proofread additions alone theorem. — billinghurstsDrewth 09:55, 2 December 2014 (UTC)

It's not the first time that this sort of thing has occurred: see [8], and look at [9] for example, a lot of validated pages. Anyway, I'm downloading a dump of en.ws and I'll check whether the validated tag in the text is consistent with the contents of the validated category. — Phe 12:59, 2 December 2014 (UTC)

Here is the table:

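Difference between Sun Nov 2 2014 and Tue Dec 2 2014

| language | Page: all pages | Page: not proof. | Page: problem. | Page: w/o text | Page: proofread | Page: validated | Main: all pages | Main: with scans | Main: w/o scans | Main: disamb | Main: percent |
|---|---|---|---|---|---|---|---|---|---|---|---|
| fr | 15366 | 5049 | 0 | 1194 | 9123 | 2106 | 1185 | 2969 | -1839 | 55 | 1.17 |
| en | 12291 | 4210 | 342 | 665 | 7074 | 5943 | 1277 | 1615 | -366 | 28 | 0.39 |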

A higher number than 359, around 1100, but the same pattern persists: an unusually low number compared to other 30-day periods. Here are Validated page and Proofread page, built from all the recent changes of the last 30 days: from them I get 1300 pages proofread, and from the stats 1130 pages proofread; the 170 difference is today's changes (stats taken around 4:40). So yes, the result you showed looks unusual, but it's not conclusive. The rc table covers only the last 30 days, so I can't completely check whether the trouble is real or not. The result from the last dump, taken on 26 November: q3 = 273636, q4 = 186234, q3+q4 = 459870 (q3+q4 today from stats: 461347); this doesn't look buggy either. I guess I'll need to take a daily snapshot of the whole content of the validated/proofread/problematic/empty text categories and, if we get an unusual pattern again, check for Page: entries disappearing from the categories (from the last day and the last week there is nothing unusual afaics). — Phe 16:57, 2 December 2014 (UTC)

The proposal entails the replacement of the current Header template familiar to most with a structurally redesigned new Header template. Replacement is a needed first step in a series of steps to properly address the long-standing deficiencies behind several issues, as well as to enhance our mobile device presence.

There should be no significant operational or visual differences between the existing and proposed Header templates under normal usage (i.e. Desktop view). The change is entirely structural -- moving away from the existing HTML all Table make-up to an all Div[ision] based one.

Please examine the testcases where the current template is compared to the proposed replacement. Don't forget to also check Mobile Mode from the testcases page -- which is where the differences between current header template & proposed header template will be hard to miss.

For those who are concerned over the possible impact replacement might have on specific works, you can test the replacement on your own by entering edit mode, substituting the header tag {{header with {{header/sandbox and then previewing the work with the change in place. Saving the page with the change in place should not be needed, but if you opt to save the page instead of just previewing it, please remember to revert the change soon after you're done inspecting the results.

Your questions or comments are welcomed. At the same time I personally urge participants to support this proposed change. -- George Orwell III (talk) 02:04, 13 January 2015 (UTC)