Re: Downloadable Arch Wiki

What about a script to convert all the pages to PDF (if possible, in a way that preserves relative links, like OpenOffice (... I mean LibreOffice...) can do)? PDF'ing and packaging the whole wiki would be a nice option. I'm not sure about the size, but compact and portable would be covered, as well as easy access. I know I've seen a script that can take multiple PDFs and turn them into a single, page-separated PDF, so I think it's a good option to discuss. Personally, I think keeping the PDFs separate, but in the same location, would be best, as it would allow multi-part packaging of the whole wiki and multi-threaded downloading of the files in bulk.

Re: Downloadable Arch Wiki

CPUnltd wrote:

What about a script to convert all the pages to PDF (if possible, in a way that preserves relative links, like OpenOffice (... I mean LibreOffice...) can do)? PDF'ing and packaging the whole wiki would be a nice option. I'm not sure about the size, but compact and portable would be covered, as well as easy access. I know I've seen a script that can take multiple PDFs and turn them into a single, page-separated PDF, so I think it's a good option to discuss. Personally, I think keeping the PDFs separate, but in the same location, would be best, as it would allow multi-part packaging of the whole wiki and multi-threaded downloading of the files in bulk.

PDF format would surely add to the size. The current wiki docs package is only 7.5 MB, so no downloading tricks are necessary. I was talking about searching the local wiki; how do you do that?

Personally, I think it would be GREAT to allow users to mirror the Arch Linux MediaWiki. We would then never suffer the fate of Gentoo's wiki, and when the official wiki is offline, people would still have the information available.

Setting up a MediaWiki instance is trivial; all we really need is the database.

Here is how I think it could be done:

mysql> create database archwikibackup;
mysqldump -u root -p archwiki > archwiki.sql
mysql -u root -p archwikibackup < archwiki.sql
mysql> use archwikibackup;
## Now we are going to remove any information that people should NOT have:
## Just substitute the word "Replaced" where passwords are etc....
UPDATE user SET user_email = 'Replaced' WHERE user_email LIKE '%';
UPDATE user SET user_password = 'Replaced' WHERE user_password LIKE '%';
UPDATE user SET user_newpassword = 'Replaced' WHERE user_newpassword LIKE '%';
UPDATE user SET user_options = 'Replaced' WHERE user_options LIKE '%';
UPDATE user SET user_touched = 'Replaced' WHERE user_touched LIKE '%';
UPDATE user SET user_token = 'Replaced' WHERE user_token LIKE '%';
UPDATE user SET user_email_authenticated = 'Replaced' WHERE user_email_authenticated LIKE '%';
UPDATE user SET user_email_token = 'Replaced' WHERE user_email_token LIKE '%';
UPDATE user SET user_email_token_expires = 'Replaced' WHERE user_email_token_expires LIKE '%';
## Dump out the modified database
mysqldump -u root -p archwikibackup > archwikibackup.sql
## gzip the database to reduce file size
## This can reduce the size of the file up to 80%
gzip -9 archwikibackup.sql
## NOW..... anyone could have the wiki database, and could set up a wiki:
## Download the MediaWiki software from http://www.mediawiki.org/wiki/Download
## Download the archwikibackup.sql.gz file
gunzip archwikibackup.sql.gz
mysql -u root -p YourDataBase < archwikibackup.sql

GZIP the sql data .... and that file should be fairly small.

I would be happy to test this at archlinux.us or archlinux.me if you would want to do this. All the above could of course be scripted into a nightly dump, then anyone that wanted to mirror the wiki could. Perhaps a day behind, but they could. I could mirror the SQL file so others could download from my servers as well, if bandwidth was an issue.
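As a rough sketch of how the steps above might be scripted for a nightly run: the database names are the ones used in this post, while DUMP_DIR, the RUN_DUMP switch and the use of ~/.my.cnf for credentials (so cron is not stopped by a password prompt) are assumptions made for illustration. Unlike the UPDATE lines above, this version has no WHERE clause, so it overwrites every row including NULL values.

```shell
#!/bin/sh
# Sketch of a nightly sanitized dump. SRC_DB and TMP_DB follow the post above;
# DUMP_DIR is a hypothetical location. Credentials are assumed to be in ~/.my.cnf.
SRC_DB="archwiki"
TMP_DB="archwikibackup"
DUMP_DIR="${DUMP_DIR:-/srv/wikidump}"

# Columns of the "user" table that the post above overwrites.
PRIVATE_COLUMNS="user_email user_password user_newpassword user_options
user_touched user_token user_email_authenticated user_email_token
user_email_token_expires"

# Print one UPDATE statement per private column (no WHERE: every row is overwritten).
sanitize_sql() {
    for col in $PRIVATE_COLUMNS; do
        printf "UPDATE user SET %s = 'Replaced';\n" "$col"
    done
}

# Copy the wiki into a scratch database, blank the private columns there,
# then dump and compress the scratch copy. Each step runs only if the previous one succeeded.
nightly_dump() {
    mkdir -p "$DUMP_DIR" &&
    mysqldump "$SRC_DB" > "$DUMP_DIR/raw.sql" &&
    mysql -e "DROP DATABASE IF EXISTS $TMP_DB; CREATE DATABASE $TMP_DB" &&
    mysql "$TMP_DB" < "$DUMP_DIR/raw.sql" &&
    rm -f "$DUMP_DIR/raw.sql" &&
    sanitize_sql | mysql "$TMP_DB" &&
    mysqldump "$TMP_DB" > "$DUMP_DIR/$TMP_DB.sql" &&
    gzip -9 -f "$DUMP_DIR/$TMP_DB.sql"
}

# Nothing touches the database unless RUN_DUMP=1 is set (hypothetical switch).
if [ "${RUN_DUMP:-0}" = "1" ]; then
    nightly_dump
fi
```

A cron entry would then call the script with RUN_DUMP=1 once a night. Running it without that variable does nothing, and sanitize_sql can be called on its own to review the statements first.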

Re: Downloadable Arch Wiki

that would be cool, but not sure how much work would be required for that...

Is there a way to bundle the wiki itself with a very lightweight browser like dillo? Sort of an all-in-one package complete with its own browser?

Also (this would be A LOT OF WORK, but VERY cool and EXTREMELY useful), what about creating an ncurses copy of the wiki that could be accessed by something like lynx or another text browser (or just be run from the CLI on its own)? Again, I know it would be extensive work to remake the whole wiki in such a way, but it would also give those with a fresh (no-GUI) install access to the wiki itself. Having a package would make it accessible offline, especially if the person could just grab the pkg.tar.xz and install it with whatever deps are necessary, giving similar functionality to the standard wiki (at least navigation, if not search as well) in a CLI-friendly form. What do you guys think?

Re: Downloadable Arch Wiki

I'd rather have a database dump, done as I stated above. Not much work to do that really, it could be automated, then set as a cron job on the server. This way there could be several other sources for the wiki if the official version went offline for any reason, and none of them would have user password information or any other data that shouldn't be let out..... that's what those UPDATE lines were for in my above post.

I'm not sure any of this is at the top of the wiki maintainers' list of things to do though... and that's fine. I just wanted to point out that simple solutions exist if they do want to make it available, and that I am more than willing to host the dump if bandwidth is an issue.

If nothing else, I would definitely like to mirror this on archlinux.us/wiki for the times when the official wiki is down.

Everything I've described assumes they have put the wiki in its own database, which I am assuming they did.

Re: Downloadable Arch Wiki

A complete compressed sql dump is 570MB atm with caches and temp data removed. It's also not as easy as you think. The DB is rather big and your script will not only block the db for some time but also fill up the table space quite soon (hint: creating and dropping a table does not free the space used)

Re: Downloadable Arch Wiki

Hmmm, well, I don't see why the database should need to be 570MB compressed; that would be over a gig uncompressed, I'm guessing..... there must be a lot of images in it..... I'd have to do some research, but it should be possible to dump everything except the images etc. that are not a primary concern (i.e. your method doesn't download those either, as far as I can tell); the size would then go down considerably.

I'm not quite sure what you're referring to when you say "creating and dropping a table does not free the space used" ... not that it matters. In the solution I proposed above, I suggested creating a "temp" database, modifying it, then dumping and compressing it. You would dump the original database, load it into the new database, modify, then dump and compress ........ all changes are made on the newly created database, not the original.

I'm guessing the database dump being rather large is IO bound and that's why it's slow... it could be dumped to a ramdisk if you have the ram available to do that.... which would limit the downtime of the database while it dumps.

If you are dumping InnoDB tables (I'm guessing you are) you can use --single-transaction, and your database isn't locked at all. The mysqldump documentation for --lock-tables says:

Lock all tables before dumping them. The tables are locked with READ LOCAL to allow concurrent inserts in the case of MyISAM tables. For transactional tables such as InnoDB and BDB, --single-transaction is a much better option, because it does not need to lock the tables at all.

If you're NOT using InnoDB tables... you can use "--lock-tables=false".
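To make the two cases concrete, here is a small sketch that picks the locking option by storage engine. The dump_flag helper is made up for illustration; the two options are the ones named above, and the database name archwiki is the one assumed earlier in this thread.

```shell
# Pick the mysqldump locking option for a given storage engine,
# following the two cases described above.
dump_flag() {
    case "$1" in
        InnoDB|innodb) echo "--single-transaction" ;;
        *)             echo "--lock-tables=false" ;;
    esac
}

# Example (not run here): check which engines the tables use, then dump.
#   mysql -u root -p -e "SELECT table_name, engine FROM information_schema.tables WHERE table_schema = 'archwiki'"
#   mysqldump $(dump_flag InnoDB) -u root -p archwiki > archwiki.sql
```

With --lock-tables=false a MyISAM dump is not guaranteed to be consistent across tables, which is the trade-off accepted in the next paragraph.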

The dump wouldn't have to be "perfect" ... IE, someone is in the process of changing something .... worst that would happen is your backup wiki db would have a corrupt entry... pretty unlikely actually though.

My method is "easy" ..... I've done it. Other than the database size, I really see no issues. My 384MB database takes about 4 minutes to run the dump/compress. Again, I'd have to dig into the wiki db and see where they store the uploaded images/etc ... but that table could be excluded making the database MUCH MUCH smaller as well.
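Leaving a table out of the dump could look something like the sketch below. mysqldump accepts --ignore-table=db.table once per table to skip. The ignore_opts helper is hypothetical, and the table names are placeholders until someone checks which tables actually hold the bulk of the data.

```shell
# Build one --ignore-table option per table name given after the database name.
ignore_opts() {
    db="$1"; shift
    opts=""
    for tbl in "$@"; do
        opts="$opts --ignore-table=$db.$tbl"
    done
    # Strip the leading space left by the loop.
    echo "${opts# }"
}

# Example (not run here; "image" is a placeholder table name):
#   mysqldump -u root -p $(ignore_opts archwiki image) archwiki > archwiki.sql
```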

I did look at your proposed "solution" ....... that would be much harder to implement than my method. But of course that all depends on your perspective, doesn't it?

EDIT:

After getting home, I took the time to look through the wiki for "files" ... there are fewer than 50 files that I could see, the largest of which is only 281KB. This would mean that the wiki database is 99.999% text. Compression of text would be approx 75%, which would mean the wiki database dump is roughly 2 GB of text??????

It appears that there are roughly 4,500 wiki entries. If compressed is 570MB... and uncompressed is close to 2 gigs... that would mean the average wiki entry has about half a million characters... which would of course include every revision of the document. I'm not sure how MediaWiki stores each entry.
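The back-of-the-envelope numbers can be checked with a few lines of shell arithmetic. The 75% compression ratio is my own guess from above, and 1 MB is taken as 1,000,000 bytes.

```shell
compressed_mb=570      # compressed dump size quoted earlier in the thread
ratio_percent=75       # assumed compression ratio for plain text
entries=4500           # rough count of wiki entries

# After compression, (100 - ratio)% of the original size remains.
uncompressed_mb=$(( compressed_mb * 100 / (100 - ratio_percent) ))
# Average bytes per entry, taking 1 MB as 1,000,000 bytes.
bytes_per_entry=$(( uncompressed_mb * 1000000 / entries ))

echo "uncompressed: ${uncompressed_mb} MB"
echo "per entry:    ${bytes_per_entry} bytes"
```

That works out to 2280 MB uncompressed and about 506,666 bytes per entry, so "about half a million characters" holds if the 75% guess is anywhere near right.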

Re: Downloadable Arch Wiki

Wow, I never knew about that.... I actually REALLY like that idea! I plan to look through it to see how to make such a file, and will look around the AUR to see if there's already a package for the reader (and hopefully the tools to make ZIM files).

Well, that was quick... Kiwix (the ZIM file reader) is already in the AUR and zim (the editor) is in community... so now, the question is: who is willing/able to take up the task of making the wiki available offline via openZIM?

Re: Downloadable Arch Wiki

Installed this with "pacaur" and it worked right off. Thank you Mr Keenerd for the wiki!

Re: Downloadable Arch Wiki

Hi. I run Debian and wanted to thank you all. Arch's wiki is a very valuable resource for the whole Linux community, no matter what distro. irtigor proposed converting it to a ZIM file; if anyone is willing to work on this I would be really grateful, as I use Kiwix (http://kiwix.org) to read wiki dumps offline (mostly Wikipedia, in several languages), and a full offline copy of Arch's wiki is something I've been looking for for a long time.

Re: Downloadable Arch Wiki

This thread has become rather long in the tooth. I am going to close it as much has changed in the 4+ years since it started. If there is a need for further discussion, please start a new thread.

Thanks.
