a) find better titles before the import or
b) replace '/' by sth like '__OR__' and fix the whole title later?

I tend to b). Other suggestions?

BTW: For the import I will now use WikipediaFS. A great little
filesystem that lets you treat mediawiki articles like real files.
Simply edit with vim, :wq, done. Or for the bulkimport: copy/write
prepared files to the fs.

Sebastian.

fREW

... That WikipediaFS is pretty gnarly. Thanks for the tip ;-) -- -fREW

> Hi
>
> I have prepared a list with problematic page titles. Especially titles
> with chars like [/#{}[]*] and the like are problematic since mediawiki
> doesn't allow them (even if one URL-encodes them).
>
> Find the list (95 entries) here:
>
> http://scratchpad.wikia.com/wiki/Errornames
>
> I'm not sure how to proceed here. Should we
>
> a) find better titles before the import or
> b) replace '/' by sth like '__OR__' and fix the whole title later?
>
> I tend to b). Other suggestions?
>
> BTW: For the import I will now use WikipediaFS. A great little
> filesystem that lets you treat mediawiki articles like real files.
> Simply edit with vim, :wq, done. Or for the bulkimport: copy/write
> prepared files to the fs.
>
> Sebastian.
>
>

> Hi
>
> I have prepared a list with problematic page titles. Especially titles
> with chars like [/#{}[]*] and the like are problematic since mediawiki
> doesn't allow them

Strange. Are you sure? I just created a page on wikia.com titled
'/hello/' and it was created without problems, with slashes in the
title, in the summary and in the category. You can see it here:
http://scratchpad.wikia.com/wiki//hello/

Maybe &...; encoding is needed? You could create such a page manually,
intercept what the browser sends, and then mimic that in the script,
because with manual submission wikia does seem to allow slashes in the
title. If you upload the pages using a script, did you try the &...;
encoding?

Yakov

John Beckett

... Thanks for the good start. FYI there are a couple of lines with broken links: 157 160: 171: Do you know the g/ and g? commands? Above gives: Vim Online

Yes! In option (b), you have to change every '/' to '__OR__', so
you may as well change the titles to something good now.

Can you readily do something like this: Put each tip in a
separate file on your disk. Name them tip0001, tip0002, etc.

Put the list of 1500 tip titles in one file, one title per
line. Then edit that file to clean up the titles. Then run a
script to rename each tip to match the cleaned-up title.
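
A minimal sketch of that rename step in Python, assuming line N of the
title list belongs to tipNNNN and that the titles have already been
cleaned so they are valid file names. The function name rename_tips and
the directory layout are illustrative only; they are not taken from any
script mentioned in this thread.

```python
from pathlib import Path


def rename_tips(tip_dir, titles_file):
    """Rename tip0001, tip0002, ... to the title on the matching line.

    Assumption: line N of titles_file holds the cleaned title for tip N.
    Blank lines and missing tip files are skipped, so gaps in the tip
    numbering do not shift later titles.
    """
    titles = Path(titles_file).read_text(encoding="utf-8").splitlines()
    renamed = []
    for number, title in enumerate(titles, start=1):
        source = Path(tip_dir) / f"tip{number:04d}"
        title = title.strip()
        if not title or not source.exists():
            continue
        # with_name() raises ValueError if the title still contains '/'.
        target = source.with_name(title)
        if target.exists():
            raise FileExistsError(f"duplicate title: {title!r}")
        source.rename(target)
        renamed.append((source.name, target.name))
    return renamed
```

Calling rename_tips("tips", "titles.txt") returns the list of
(old name, new name) pairs, which can be kept as a log of what was done.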

> or
> b) replace '/' by sth like '__OR__' and fix the whole
> title later?

Whatever works, but wouldn't this create a whole bunch of
problems? I don't understand the internals of wikis but I think
your suggestion would create 95 tips with URLs that will later
need to be manually edited. Not so easy, and probably involves
copying the content from the wiki to a new page, then deleting
the old page (I guess).

> BTW: For the import I will now use WikipediaFS.

Wow - amazing.

How do you get the wiki format files from the VimTips web site?
If you're going to do the work, I don't need this answered, but
I'm thinking that you're going to need one of the scripts to
convert the existing html to wiki format.

I noticed that Charles Campbell's script does appropriate things
with common html codes like nonbreaking space. Probably all that
processing should be done before the files are uploaded?

With your scheme, you're going to get 1500 tip files on your
disk. It would be great if you could clean them as much as
possible before uploading. It would be pretty easy to find all
html markups and '&' codes when the files are still on your
disk.

Easy, but time consuming. Let us know if you want some help.
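
As a rough illustration of how leftover markup could be found while the
files are still on disk, here is a small Python sketch. It assumes one
plain-text file per tip in a single directory; the names
find_html_residue and scan_directory are made up for this example, and
the two regexes are deliberately simple rather than a full HTML parser.

```python
import re
from pathlib import Path

# Something that looks like an HTML tag: '<' or '</' followed by a letter.
TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")
# Named, decimal or hexadecimal character references such as &nbsp; or &#160;
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def find_html_residue(text):
    """Return tag-like strings first, then '&' codes, found in text."""
    return TAG_RE.findall(text) + ENTITY_RE.findall(text)


def scan_directory(tip_dir):
    """Map each file name in tip_dir to the residue found in that file."""
    report = {}
    for path in sorted(Path(tip_dir).iterdir()):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            hits = find_html_residue(text)
            if hits:
                report[path.name] = hits
    return report
```

Note that Vim key notation such as <CR> or <Esc> also matches the tag
pattern, so the report needs a human look before anything is stripped.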

John

Sebastian Menge

That is fixed now. Was a problem with the script that produced the page.

> Yes! In option (b), you have to change every '/' to '__OR__', so
> you may as well change the titles to something good now.

That could be done by a regex.

> Can you readily do something like this: Put each tip in a
> separate file on your disk. Name them tip0001, tip0002, etc.
>
> Put the list of 1500 tip titles in one file, one title per
> line. Then edit that file to clean up the titles. Then run a
> script to rename each tip to match the cleaned-up title.

One idea was that the editing can be done on the wiki. Just edit the
Errornames page :-)

> > or
> > b) replace '/' by sth like '__OR__' and fix the whole
> > title later?
>
> Whatever works, but wouldn't this create a whole bunch of
> problems? I don't understand the internals of wikis but I think
> your suggestion would create 95 tips with URLs that will later
> need to be manually edited. Not so easy, and probably involves
> copying the content from the wiki to a new page, then deleting
> the old page (I guess).

Moving/Renaming is easy in wikis.

> > BTW: For the import I will now use WikipediaFS.
>
> Wow - amazing.
>
> How do you get the wiki format files from the VimTips web site?
> If you're going to do the work, I don't need this answered, but
> I'm thinking that you're going to need one of the scripts to
> convert the existing html to wiki format.
>
> I noticed that Charles Campbell's script does appropriate things
> with common html codes like nonbreaking space. Probably all that
> processing should be done before the files are uploaded?
>
> With your scheme, you're going to get 1500 tip files on your
> disk. It would be great if you could clean them as much as
> possible before uploading. It would be pretty easy to find all
> html markups and '&' codes when the files are still on your
> disk.

The cool thing with WikipediaFS is that I can do scripting on the
pages _as if_ they were on my disk!

My general approach:

First I downloaded all tips to my machine. I started with the script
vimtips.py and hacked it a lot. The script uses a Perl module that
converts HTML to wiki markup. But I still have problems with some
regexes: stable regexes for 1500 pages are not easy to do. I have to
clean up the script and will post it here, because I guess there are
some regex gurus around ...

Thanks, Sebastian.

A.J.Mechelynck

Message 7 of 11, May 28, 2007

John Beckett wrote:

> Sebastian Menge wrote:
>> Find the list (95 entries) here:
>> http://scratchpad.wikia.com/wiki/Errornames
>
> Thanks for the good start.
> FYI there are a couple of lines with broken links:
> 157 160: 171: Do you know the "g/" and "g?" commands?
>
> Above gives:
> Vim Online Error
> Couldn't find tip 160. Are you sure it exists?
>
>> I'm not sure how to proceed here. Should we
>> a) find better titles before the import
>
> Yes! In option (b), you have to change every '/' to '__OR__', so
> you may as well change the titles to something good now.
>
> Can you readily do something like this: Put each tip in a
> separate file on your disk. Name them tip0001, tip0002, etc.
>
> Put the list of 1500 tip titles in one file, one title per
> line. Then edit that file to clean up the titles. Then run a
> script to rename each tip to match the cleaned-up title.
>
>> or
>> b) replace '/' by sth like '__OR__' and fix the whole
>> title later?
>
> Whatever works, but wouldn't this create a whole bunch of
> problems? I don't understand the internals of wikis but I think
> your suggestion would create 95 tips with URLs that will later
> need to be manually edited. Not so easy, and probably involves
> copying the content from the wiki to a new page, then deleting
> the old page (I guess).

[...]

This is where my redirect suggestion comes into play (assuming wiki software
compatible to that used at ??.wikipedia.org):

First pass: migration proper.
For each tip, create *two* wiki pages:
- one page with the tip text and a "real" title, possibly "doctored" as shown
above
- one page titled "Vim tip 1" "Vim tip 2" etc. (url ending in .../Vim_tip_1
etc.) with only a redirect, as follows:

(url=something/Vim_tip_1)
#REDIRECT [[The super star]]

During this first pass, any link vimtip#3456 gets translated to [[Vim tip
3456]] pointing to the redirect page for the link pointed to. At this time the
"actual" name of the link pointed to doesn't have to be known, and in the case
of forward comments it _won't_ yet be known.

Second pass (after all tips have been migrated and the wiki software has had
the time to cycle and reconstruct its indexes)

For each redirect page: open it with ?noredirect and get the corresponding
"What points here" page, as delivered by the wiki software. Change links
pointing to the redirect into links pointing to the (now known) "actual" page
title. IIUC this can be done by a "robot" (i.e., a script, not a human).
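
The text-processing side of the two passes could be sketched in Python
roughly as follows. This is only an illustration under assumptions: the
function names are invented here, the link pattern "vimtip#NNNN" is taken
from the description above, and the second pass is simplified to a
lookup in a number-to-title mapping instead of querying the wiki for the
pages that point to each redirect.

```python
import re

# Links such as "vimtip#3456" (assumed form; case and spacing are tolerated).
VIMTIP_LINK_RE = re.compile(r"vimtip\s*#\s*(\d+)", re.IGNORECASE)
REDIRECT_LINK_RE = re.compile(r"\[\[Vim tip (\d+)\]\]")


def redirect_page(tip_number, real_title):
    """First pass: title and body of the redirect page for one tip."""
    return f"Vim tip {tip_number}", f"#REDIRECT [[{real_title}]]"


def translate_tip_links(text):
    """First pass: point every vimtip#N link at the redirect page."""
    return VIMTIP_LINK_RE.sub(
        lambda m: f"[[Vim tip {int(m.group(1))}]]", text
    )


def resolve_links(text, titles_by_number):
    """Second pass: replace redirect links by the now known real titles.

    Links whose tip number is not in titles_by_number are left alone.
    """
    def replace(match):
        title = titles_by_number.get(int(match.group(1)))
        return f"[[{title}]]" if title else match.group(0)

    return REDIRECT_LINK_RE.sub(replace, text)
```

For example, redirect_page(1, "The super star") gives the page title
"Vim tip 1" with the body "#REDIRECT [[The super star]]".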

Best regards,
Tony.
--
If you're not very clever you should be conciliatory.
-- Benjamin Disraeli

Sebastian Menge

Message 8 of 11, May 28, 2007

On Monday, 28.05.2007, at 16:18 +0200, A.J.Mechelynck wrote:

> This is where my redirect suggestion comes into play (assuming wiki software

I took your advice silently. Your suggestion is already in the script.

Sebastian.

A.J.Mechelynck

Well, sorry, I didn't look at the script, but it seemed to solve John's objection.

Best regards,
Tony.
--
Interpreter, n.:
One who enables two persons of different languages to
understand each other by repeating to each what it would have been to
the interpreter's advantage for the other to have said.
-- Ambrose Bierce, "The Devil's Dictionary"

John Beckett

Message 10 of 11, May 29, 2007

Sebastian Menge wrote:

>> Put the list of 1500 tip titles in one file, one title per
>> line. Then edit that file to clean up the titles. Then run a
>> script to rename each tip to match the cleaned-up title.
>
> One idea was that the editing can be done on the wiki. Just
> edit the Errornames page :-)

Neat, but please give explicit directions if that's what you
want. There's not much point in my editing the titles if you
meanwhile are planning to use some other scheme.

Also, we (actually, you, because it looks like you're doing all
the work:), need to resolve the issue of exactly what is allowed
in a title, and we should agree on some general guidelines.

I think the Wikipedia style of prominently saying something like
"this page should be titled xxx but due to technical
restrictions we can't do that" is too ponderous (although
reasonable in their context).

Maybe we could have something more informal (if scriptable).
For example: tip 249 in your errornames might be:

Title = C - Quickly insert precompiler directives
[I'm not very happy with this wording]
But first line of the tip might say:
C/C++: Quickly insert #if 0 - #endif around block of code

> stable regexes for 1500 pages are not easy to do

I'm glad it's you and not me! It's hardly reasonable to come up
with one script that correctly formats all of the existing
pages. I imagine a fair bit of manual tweaking will be needed.

If you gave me a couple of days over a weekend when it's quiet
here, I might be able to do a fair bit (I sent over 260 typos
to Bram which he incorporated in the 7.1 release, so I can
occasionally cope with tediousness).

OTOH we can all do that after the initial import. I can download
all 1500 tips from wikia, and determine if any still have html
(what will wikia do to html tags??), and fix them then.

John

Sebastian Menge

Message 11 of 11, May 29, 2007

On Tuesday, 29.05.2007, at 21:03 +1000, John Beckett wrote:

> Sebastian Menge wrote:
> >> Put the list of 1500 tip titles in one file, one title per
> >> line. Then edit that file to clean up the titles. Then run a
> >> script to rename each tip to match the cleaned-up title.
> >
> > One idea was that the editing can be done on the wiki. Just
> > edit the Errornames page :-)
>
> Neat, but please give explicit directions if that's what you
> want. There's not much point in my editing the titles if you
> meanwhile are planning to use some other scheme.

Forget that, most problems came from slashes which could not be handled
by WikipediaFS. I fixed that.

Other special chars get replaced by "__HASH__" or "__BRACKET__" and the
like. Ugly, I know.
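
A tiny Python sketch of such a placeholder scheme, for illustration
only. The tokens "__HASH__" and "__BRACKET__" (and "__OR__" for '/')
appear in this thread, but which character maps to which token is a
guess here, and the name sanitize_title is invented for the example.

```python
# Assumed mapping: the thread names the tokens, not the exact characters.
TOKENS = {"#": "__HASH__", "[": "__BRACKET__", "]": "__BRACKET__"}


def sanitize_title(title, tokens=TOKENS):
    """Replace each character listed in tokens by its placeholder."""
    return "".join(tokens.get(ch, ch) for ch in title)


# Option (b) from earlier in the thread would add one more entry:
WITH_SLASH = {**TOKENS, "/": "__OR__"}
```

With this mapping, sanitize_title("C/C++", WITH_SLASH) gives
"C__OR__C++". The mapping is not reversible as written, because both
brackets share one token.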

> Also, we (actually, you, because it looks like you're doing all
> the work:), need to resolve the issue of exactly what is allowed
> in a title, and we should agree on some general guidelines.
>
> I think the Wikipedia style of prominently saying something like
> "this page should be titled xxx but due to technical
> restrictions we can't do that" is too ponderous (although
> reasonable in their context).
>
> Maybe we could have something more informal (if scriptable).
> For example: tip 249 in your errornames might be:
>
> Title = C - Quickly insert precompiler directives
> [I'm not very happy with this wording]
> But first line of the tip might say:
> C/C++: Quickly insert #if 0 - #endif around block of code
>

I decided for myself that I don't want to do editorial work on the tips
or comments. So some pages will look ugly and have to be repaired
manually later. But it's a wiki: I hope that will evolve naturally.

> > stable regexes for 1500 pages are not easy to do
>
> I'm glad it's you and not me! It's hardly reasonable to come up

As you perhaps guessed, it's a bit of fun for me :-) I'm learning Python,
I deepen my regex understanding etc. It's a nice study :-)

See my next post for details ...

S.
