This page is part of the documentation for the IEG project "Grants:IEG/Growing Kannada-language Wikimedia projects with a digital library" on Meta-Wiki.

Introduction

As part of our project at Pustaka Sanchaya, we began identifying the out-of-copyright books in the Digital Library of India (DLI) and the Osmania University Digital Library (OUDL). We wanted these books to be available to everyone through a Kannada index, which we built at http://pustaka.sanchaya.net. However, because we did not host the books ourselves, readers still depended on the government websites for access to the actual books.

Technical Barrier:

As many of you may know, government websites tend to keep the same hours as government offices.

OUDL used to be available only between 10 AM and 5 or 6 PM IST.

The DLI website used to provide books only in TIFF and DjVu formats, and this was the main reason why people who found books through Pustaka Sanchaya struggled to use DLI. Windows users had no trouble, because a browser plugin was available for reading DjVu and other formats, but for everyone else access to the books remained limited.

Update: We later observed that the DLI team had started uploading books in PDF, the most preferred format. However, it was only in April 2016 that we found Kannada books available as PDFs on the DLI website.

Solution:

The solution to this problem is to mirror the books on a third-party Internet portal that provides reliable 24x7 access to these resources.

Our Preference:

We chose the Internet Archive and Wikimedia Commons as the platforms on which to keep these resources.

Copyright issues:

Although we had found suitable platforms, we knew that not all Kannada books could be hosted on the Internet Archive or Commons because of copyright restrictions. We therefore decided to select only the out-of-copyright books and upload those.

However, the OUDL books (this still needs to be checked for DLI) were uploaded as image-container PDFs rather than text PDFs. A text PDF is required for the book preview on archive.org, and IA-upload (https://tools.wmflabs.org/ia-upload) recognizes only text PDF and DjVu files.

To work around this, a DjVu file can be uploaded to the Internet Archive along with the PDF, so that the DjVu file can be used on Wikisource later. Below is the upload spreadsheet (books.csv), modified to include the DjVu file:

identifier,file,mediatype,collection,title,creator,language,description,contributor,date,subject[0],subject[1],subject[2],licenseurl
Kirluuskara_Lakshhmanaraayaru_1921,Kirluuskara_Lakshhmanaraayaru.pdf,texts,opensource,Kirluuskara Lakshhmanaraayaru,SV Kirloskar,kan,ಕಿರ್ಲೋಸ್ಕರ ಲಕ್ಷ್ಮಣರಾಯರು -- ಶ್ರೀ ಶಂಕರರಾವ್ ವಾ. ಕಿರ್ಲೋಸ್ಕರ್,OUDL,1921,Kannada,Old Kannada Books,Scanned Kannada Books from OUDL,http://creativecommons.org/publicdomain/mark/1.0/
,Kirluuskara_Lakshhmanaraayaru.djvu,,,,,,,,,,,,

In the last row the identifier is left empty, so ia adds the DjVu file to the same item as the row above it.
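When many books are involved, a spreadsheet like this can be generated rather than typed. The Python sketch below uses only the standard csv module. The `book` dictionary keys (such as the hypothetical `stem`), the `<stem>_<year>` identifier scheme and the fixed subject values are assumptions modelled on the single example row above, not a format that ia requires.

```python
import csv
import io

# Column order of the spreadsheet above.
FIELDS = [
    "identifier", "file", "mediatype", "collection", "title", "creator",
    "language", "description", "contributor", "date",
    "subject[0]", "subject[1]", "subject[2]", "licenseurl",
]
PD_MARK = "http://creativecommons.org/publicdomain/mark/1.0/"


def book_rows(book):
    """Return the spreadsheet rows for one book.

    The first row carries the PDF and all item metadata; the second row
    leaves the identifier blank so that `ia upload` adds the DjVu file
    to the same item.
    """
    stem = book["stem"]  # hypothetical key: base name shared by the .pdf and .djvu
    pdf_row = {
        "identifier": f"{stem}_{book['date']}",  # assumed <stem>_<year> scheme
        "file": f"{stem}.pdf",
        "mediatype": "texts",
        "collection": "opensource",
        "title": book["title"],
        "creator": book["creator"],
        "language": "kan",
        "description": book["description"],
        "contributor": book["contributor"],
        "date": book["date"],
        "subject[0]": "Kannada",
        "subject[1]": "Old Kannada Books",
        "subject[2]": f"Scanned Kannada Books from {book['contributor']}",
        "licenseurl": PD_MARK,
    }
    djvu_row = {"identifier": "", "file": f"{stem}.djvu"}
    return [pdf_row, djvu_row]


def write_spreadsheet(books, out):
    """Write the header plus two rows per book as CSV to the file object `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="")  # missing cells become ""
    writer.writeheader()
    for book in books:
        writer.writerows(book_rows(book))


def spreadsheet_text(books):
    """Return the spreadsheet as a string (handy for checking it before upload)."""
    buf = io.StringIO()
    write_spreadsheet(books, buf)
    return buf.getvalue()


if __name__ == "__main__":
    example = {
        "stem": "Kirluuskara_Lakshhmanaraayaru",
        "title": "Kirluuskara Lakshhmanaraayaru",
        "creator": "SV Kirloskar",
        "description": "ಕಿರ್ಲೋಸ್ಕರ ಲಕ್ಷ್ಮಣರಾಯರು -- ಶ್ರೀ ಶಂಕರರಾವ್ ವಾ. ಕಿರ್ಲೋಸ್ಕರ್",
        "contributor": "OUDL",
        "date": "1921",
    }
    # newline="" lets the csv module control line endings; UTF-8 keeps the Kannada text intact.
    with open("books.csv", "w", newline="", encoding="utf-8") as f:
        write_spreadsheet([example], f)
```

Writing the file as UTF-8 matters here because the description column holds Kannada script.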

Now the books can be uploaded with the ia command-line tool, which comes with the internetarchive Python library:

$ ia upload --spreadsheet=books.csv
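The same spreadsheet can also be driven from Python with the internetarchive library instead of the ia command. The sketch below assumes, based on how the ia tool reads spreadsheets, that a blank identifier continues the previous item and that empty cells are ignored. The helpers `group_spreadsheet`, `collapse_indexed` and `upload_spreadsheet` are hypothetical names; indexed columns such as subject[0] are turned into a list before upload.

```python
import csv
import re

# Matches indexed column names such as "subject[0]".
INDEXED_KEY = re.compile(r"^(?P<key>[^\[\]]+)\[(?P<idx>\d+)\]$")


def group_spreadsheet(rows):
    """Group spreadsheet rows into {identifier: (files, metadata)}.

    A blank identifier means "same item as the previous row", and empty
    cells are skipped, mirroring the `ia upload --spreadsheet` convention.
    """
    items = {}
    previous = None
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not modified
        identifier = row.pop("identifier", "") or previous
        if identifier is None:
            raise ValueError("the first row must have an identifier")
        files, metadata = items.setdefault(identifier, ([], {}))
        files.append(row.pop("file"))
        metadata.update({k: v for k, v in row.items() if v})
        previous = identifier
    return items


def collapse_indexed(metadata):
    """Turn {'subject[0]': 'a', 'subject[1]': 'b'} into {'subject': ['a', 'b']}."""
    result, indexed = {}, {}
    for key, value in metadata.items():
        match = INDEXED_KEY.match(key)
        if match:
            indexed.setdefault(match["key"], {})[int(match["idx"])] = value
        else:
            result[key] = value
    for key, by_index in indexed.items():
        result[key] = [by_index[i] for i in sorted(by_index)]
    return result


def upload_spreadsheet(path):
    """Upload every item in the spreadsheet with the internetarchive library."""
    import internetarchive  # lazy import: needs the package and `ia configure` credentials

    with open(path, newline="", encoding="utf-8") as f:
        items = group_spreadsheet(csv.DictReader(f))
    for identifier, (files, metadata) in items.items():
        responses = internetarchive.upload(
            identifier, files=files, metadata=collapse_indexed(metadata)
        )
        for response in responses:
            response.raise_for_status()  # stop on the first failed file
```

Uploading all files of an item in one `internetarchive.upload` call keeps the PDF and the DjVu together under the same identifier.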

Uploading to Wikimedia Commons

Once the books are on the Internet Archive, they can be uploaded to Wikimedia Commons with the IA-upload tool, which needs only the book's Internet Archive identifier. Given the identifier (Kirluuskara_Lakshhmanaraayaru_1921 in the example above), IA-upload pulls all the metadata from archive.org and pre-fills the book template for review. If everything looks good, the book can be uploaded by clicking Upload. In a few rare cases where the book is larger than 60 MB, IA-upload may fail to upload the book to Commons but still generates the book template. In that case, the same template can be used with the url2commons tool (https://tools.wmflabs.org/url2commons/index.html) to do the upload. Unlike IA-upload, url2commons takes the URL of the DjVu file instead of the identifier (for example, https://archive.org/stream/Kirluuskara_Lakshhmanaraayaru_1921/Kirluuskara_Lakshhmanaraayaru.djvu).

1. Enter the book's archive.org identifier in the IA-upload tool to upload it to Commons.

2. If that step fails, copy the generated book template and use it in the URL2Commons tool, together with the direct link to the DjVu file, to upload the book to Commons.

3. Go to the uploaded file on Commons and click the Wikisource link to fill in the book details.

4. Fill in the book details on Wikisource and save the page to start proofreading.
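The choice between IA-upload and url2commons described above can be sketched in Python as follows. This is a hedged illustration, not part of either tool: it assumes archive.org labels files with the formats "DjVu", "Text PDF" and "Image Container PDF", uses the 60 MB threshold from the text (the conclusion mentions 50 MB, so `limit_mb` may need lowering), and builds the link with the /stream/ path from the example URL above (archive.org also serves raw files under /download/). `pick_commons_source` and `fetch_files` are hypothetical helpers.

```python
from urllib.parse import quote

# Assumed archive.org format labels, in order of preference.
# "Image Container PDF" is deliberately absent: IA-upload does not accept it.
USABLE_FORMATS = ("DjVu", "Text PDF")


def pick_commons_source(identifier, files, limit_mb=60, base="https://archive.org/stream"):
    """Choose the file and tool for moving one Internet Archive item to Commons.

    `files` is a list of dicts with "name", "format" and "size" (bytes, as a
    string), the shape archive.org uses for an item's file list.
    Returns (tool, file_name, url), or None if no usable file exists.
    """
    candidates = [f for f in files if f.get("format") in USABLE_FORMATS]
    if not candidates:
        return None  # e.g. only an image-container PDF: upload a DjVu first
    # Prefer DjVu, since that is the file intended for Wikisource.
    chosen = min(candidates, key=lambda f: USABLE_FORMATS.index(f["format"]))
    size_mb = int(chosen.get("size", 0)) / (1024 * 1024)
    url = f"{base}/{identifier}/{quote(chosen['name'])}"
    tool = "IA-upload" if size_mb <= limit_mb else "url2commons"
    return tool, chosen["name"], url


def fetch_files(identifier):
    """Fetch an item's file list from archive.org (requires the internetarchive package)."""
    import internetarchive  # lazy import so pick_commons_source works offline

    return internetarchive.get_item(identifier).files
```

Usage would be `pick_commons_source(identifier, fetch_files(identifier))`; when the result names url2commons, the returned URL is the one to paste into that tool together with the book template.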

Current Status

As of now, we have uploaded 1,006 Kannada books to the Internet Archive: 215 from OUDL and 791 from DLI.

Conclusion

While this sounds like an easy process, it poses its own difficulties for the people involved in it:

1. The initial metadata dump we used from DLI seems to have contained a very large number of wrong author and publisher entries, and that metadata has since been changed again.

2. The transliteration and review project conducted through the Samooha Sanchaya effort will need a further review to fix the issues described in #1.

3. Uploading a book from the Internet Archive to Wikimedia Commons fails if the book is larger than 50 MB; in that case, URL2Commons has to be used instead.

4. Books uploaded to the Internet Archive are available as image PDFs by default, which the ia-upload tool does not accept. Either the metadata has to be updated to mark the file as a text PDF, or the same book has to be uploaded in DjVu format as well. See here for details on this issue.

5. The Internet Archive uses three-letter ISO 639 codes for languages in its metadata, whereas the ia-upload tool expects two-letter codes, which causes problems when the Internet Archive metadata is used in ia-upload. For example, Kannada is kan on the Internet Archive, but ia-upload took this as ka (Georgian); the two-letter code for Kannada is kn. This has to be corrected before uploading to Wikimedia Commons. Please refer to the detailed steps here.

6. DLI reuses the same file names (for example, Hariharana-Ragalegalu.pdf is used for 4 different files), which creates file name conflicts when the internetarchive Python library is used to upload the books.
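The last two issues, the language codes and the repeated DLI file names, can be caught before upload. The Python sketch below shows one way to do it. The mapping table is a small hand-picked subset of ISO 639-2 to ISO 639-1 codes, and renaming duplicates with a numeric suffix (the hypothetical `dedupe_names` helper) is an illustrative convention, not something this project prescribes.

```python
from collections import Counter
from pathlib import PurePosixPath

# ISO 639-2 (three-letter) -> ISO 639-1 (two-letter); a small hand-picked subset.
# Naively cutting "kan" to two letters gives "ka", which is Georgian.
ISO639_2_TO_1 = {
    "kan": "kn", "eng": "en", "san": "sa", "tel": "te",
    "tam": "ta", "hin": "hi", "mar": "mr",
}


def to_two_letter(code):
    """Return the two-letter code for `code`, accepting two- or three-letter input."""
    code = code.strip().lower()
    if len(code) == 2:
        return code
    try:
        return ISO639_2_TO_1[code]
    except KeyError:
        raise ValueError(f"no two-letter mapping for {code!r}") from None


def dedupe_names(paths):
    """Map each path to an upload name; repeated base names get _1, _2, ... suffixes."""
    counts = Counter(PurePosixPath(p).name for p in paths)
    seen = Counter()
    result = {}
    for p in paths:
        path = PurePosixPath(p)
        if counts[path.name] == 1:
            result[p] = path.name  # unique already: keep the original name
        else:
            seen[path.name] += 1
            result[p] = f"{path.stem}_{seen[path.name]}{path.suffix}"
    return result
```

Running both checks over the metadata and file list before building the spreadsheet means conflicts surface locally rather than as failed uploads.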