I've recently become interested in writing converters for YaBB again. However, Matt's converter wasn't ready. I talked to him, and he said it was OK for me to take over the Universal Converter project and release my own.

I've mostly followed his spec, only changing a few things (mainly adding to it).

So, I'm going to post the updated spec attached to this post, and the converter in a later post.

Make sure to use this version of the converter! Beta 4: Attached to this post.

Looks good. Can you upload the HOW-TO for the 'from' converter? I'm nearly there with my Webboard8 one and it would be very nice to see what I need to change to make it work with this, and whether there are any bits that need some more work.

NB: not really looked too close (yet) - how does it handle higher volumes, where setup.pl does reloads to deal with > 4000 members/messages? I've worked out an easy way for database-driven forums to deal with this aspect.

Ok, I'll answer the easy question first. The problem with having too much data doesn't really happen here. The user of the converter is allowed to set a maximum amount of time for each step to run. If it exceeds that time, the converter jumps out of the conversion loop and dumps anything that needs to be saved to a Perl file. When the user clicks "next", it's loaded into memory again. It doesn't auto-reload, but it should be perfect at saving and resuming, as long as it's within the same time period.
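A minimal sketch of that time-limited loop, for anyone who wants to picture it. The names run_steps and $convert_step are made up for illustration, and the loop bounds are an assumption; the real converter's internals may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not the converter's actual code.
# Calls $convert_step for steps $start .. $maxsteps - 1 and stops early
# once $max_seconds have elapsed. Returns the next step to run, or undef
# when every step has finished.
sub run_steps {
    my ($start, $maxsteps, $max_seconds, $convert_step) = @_;
    my $deadline = time() + $max_seconds;

    for (my $step = $start; $step < $maxsteps; $step++) {
        # Out of time: report where to resume so the caller can save state.
        return $step if time() >= $deadline;
        $convert_step->($step);
    }
    return;    # all steps done
}
```

When run_steps returns a step number, that number is what would be saved and handed back in as $start after the user clicks "next".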

Now, for the howto. Your existing code will need to be separated and transferred into 3 different types of routines for this to work. I'll explain the basic format of the indriver here, as well as explain the Init routine.

This might seem really complicated. It probably is, compared to most of the programming I do. But it's fairly easy to understand if you understand the following points:

- There are 3 main parts of the file: the setup part, which determines what parts of the conversion you can do and what code is executed when; the initial/finish routines, which are generally simple and similar to other routines of the same type; and the conversion routines.
- The save/resume of work in progress is very generic. For saving, you just need a list of what variables to store and what they're named. For resuming, you just call the resume routine and it does the rest.
- Use localized variables and filehandles all the time. This works wonders to prevent unexpected results from variables that weren't intended to be global.

I recommend taking my Eblah input driver (or if you want a fictional board format, I can provide that input driver) to start with and using the basic framework, especially the Init/Finish routines. If you know what you are doing, you can write a second converter in less than 10 hours, as I did with the Eblah one.

The %CONFIG hash does nothing at the moment, but it's good to have in there for when it might be used.

Now, the @CONVERT array is important. It specifies what items the converter can convert. So far, there are four that are supported by YaBB's output driver: Boards, Categories, Messages, and Members. The converter uses the @CONVERT arrays of the input and output driver to find out what to convert.

Each item in the @CONVERT array is matched up to a subroutine using the %SUBROUTINES, %INITSUBROUTINES, and %FINISHSUBROUTINES hashes. You can name the matching subroutines anything you want.
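To make the matching concrete, here is a rough sketch. It assumes the hash values are subroutine names stored as strings; the item lists, the subroutine names, and the helpers common_items and run_item_step are all invented for illustration.

```perl
use strict;
use warnings;

# Assumed driver tables (illustrative values only).
my @in_convert  = ('Categories', 'Boards', 'Members', 'Polls');
my @out_convert = ('Boards', 'Categories', 'Messages', 'Members');
my %SUBROUTINES = (
    Categories => 'ConvertCategory',
    Boards     => 'ConvertBoard',
    Members    => 'ConvertMember',
);

# Placeholder conversion routines.
sub ConvertCategory { return "category $_[0]" }
sub ConvertBoard    { return "board $_[0]" }
sub ConvertMember   { return "member $_[0]" }

# Items present in both lists, kept in the input driver's order.
sub common_items {
    my ($in, $out) = @_;
    my %supported = map { $_ => 1 } @$out;
    return grep { $supported{$_} } @$in;
}

# Look up the subroutine registered for an item and call it for one step.
sub run_item_step {
    my ($item, $step) = @_;
    my $name = $SUBROUTINES{$item} or die "No subroutine registered for $item\n";
    my $code = main->can($name)    or die "Subroutine $name is not defined\n";
    return $code->($step);
}
```

Here "Polls" and "Messages" drop out because each appears in only one of the two lists.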

We'll start with the easy subroutines, which are the init/finish ones.

The input driver's init routine for each step has a very important step: telling how many steps to run the loop for. It also sets up package globals needed throughout the conversion process, like matching what board goes to what category, or what topics are sticky.

The first interesting thing is the $fh variable. It's used exactly the same as a filehandle. I consistently use $fh when I need an anonymous filehandle.

Second, I use "die". Die outputs HTML, but you don't need to worry about the templating area, which is handled in the converter. Same goes for "warn", which is also considered a fatal error.

I open up the board index file and count up the boards present. Since there's one board per line, I capture the whole list in the global @boardlist. I use it in forced scalar context to count up the maxsteps.

Something important is that maxsteps is actually the number of steps, plus one! It works this way due to how I made the for loop. It usually works out perfectly since you often count the lines in the file.

A change from the Beta 1.0 and Beta 1.1 versions is that you should return the number of steps. All you have to do is return it instead of storing it in $maxsteps.
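Putting the last few points together, an Init routine along these lines is one way it could look. This is a sketch, not the actual Eblah driver: the routine name InitBoards and the $index_path argument are assumptions, since the real file names aren't given here.

```perl
use strict;
use warnings;

# Package global shared with the later conversion routine.
our @boardlist;

# Hypothetical Init routine for the Boards item.
sub InitBoards {
    my ($index_path) = @_;

    # Lexical filehandle: stays private to this routine.
    open(my $fh, '<', $index_path)
        or die "Cannot open board index '$index_path': $!\n";
    chomp(@boardlist = <$fh>);    # one board per line
    close($fh);

    # Forced scalar context gives the number of lines read.
    return scalar(@boardlist);
}
```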

That's it. You just need to make sure you're saving work in the corresponding Finish routine. You'll notice that sometimes I don't save/resume in the input driver. That's due to the fact that I only need to open 1 or 2 files, and there's no real problem with counting it again.

I'm running out of characters in this post, so I'll continue typing in my next post.

Here's where the universal part comes into play. The convert routine translates data from the board-specific format to an in-memory version that is passed directly to the output routine. (Whatever you return in this routine is sent as the arguments to the next routine).

For the conversion routine, you get one argument: the current loop step. This is all you really should need, as you've already built the list of items to convert in the Init step.

Second, Eblah doesn't have data about the board in board-specific files. So all I need is what was loaded in memory earlier. I have to figure out the category, but other than that, a simple split and assignment to a hash is all that's needed. I return the %board hash, which is sent to the corresponding YaBB routine.
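As a rough illustration of that split-and-assign step: the field layout ("id|name|description"), the %category_of lookup, and the hash keys below are all made up, so check the spec for the keys the output driver really expects.

```perl
use strict;
use warnings;

# Assumed data loaded earlier by the Init routine (illustrative only).
our @boardlist   = ('general|General Chat|Talk about anything', 'help|Help|Ask here');
our %category_of = (general => 'main', help => 'support');

# Receives only the current loop step and returns the in-memory board
# record that would be handed to the output driver.
sub ConvertBoard {
    my ($step) = @_;
    my ($id, $name, $description) = split /\|/, $boardlist[$step];

    my %board = (
        id          => $id,
        name        => $name,
        description => $description,
        category    => $category_of{$id},
    );
    return %board;
}
```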

Easy enough, right? Well, it is for the boards, categories, and members. Don't even think of converting messages or PMs yet; they require a multi-dimensional hash. Ugly, but it works fairly easily. Just follow the spec to find out what all you need to get from the input files.

The Finish routine

The finish routine is optional for the input driver. All you'll need to do is send data to the SaveWorkInProgress routine if the argument says to.

It's fairly straightforward, I think. The SaveWorkInProgress subroutine takes an argument list of the name of the variable (without the symbol in front), and a reference to it (\@threadlist for the @threadlist array, for instance). You can send any data format this way, as long as it doesn't include references (so no hashes of hashes). If you need to send references, I can figure it out, but only if you need it.
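For a feel of the name/reference convention, here is a stand-in built on Data::Dumper. It is not the converter's own code: save_work and resume_work are hypothetical names, and the explicit $file argument and the file layout are assumptions (the real routines pick the file themselves).

```perl
use strict;
use warnings;
use Data::Dumper;

# Stand-in for SaveWorkInProgress: takes name/reference pairs and
# writes them to $file as a Perl data structure.
sub save_work {
    my ($file, %vars) = @_;
    local $Data::Dumper::Indent   = 1;
    local $Data::Dumper::Sortkeys = 1;
    local $Data::Dumper::Terse    = 1;    # emit a bare hash, no "$VAR1 ="
    open(my $fh, '>', $file) or die "Cannot write '$file': $!\n";
    print {$fh} Dumper(\%vars);
    close($fh) or die "Cannot close '$file': $!\n";
    return;
}

# Stand-in for ResumeWorkInProgress: evaluates the file and returns the
# saved hash of name => reference.
sub resume_work {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot read '$file': $!\n";
    my $code = do { local $/; <$fh> };    # slurp the whole file
    close($fh);
    # The leading "+" makes Perl parse "{ ... }" as a hash, not a block.
    my $saved = eval "+$code";
    die "Cannot resume from '$file': $@\n" if $@;
    return $saved;
}
```

A Finish routine would then call something like save_work($file, threadlist => \@threadlist) when its argument says to save.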

Cleanup

Last but not least, there's a cleanup entry. If the variable $FINALROUTINE is set, the subroutine named $FINALROUTINE will be executed. The indriver doesn't need this much here, but it's available if you want it. No data is passed, but you can always put a "Thanks for converting" note in there using $HTMLOUTPUT (see below).
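A small sketch of the cleanup hook, assuming $FINALROUTINE holds the subroutine's name as a string. ThanksNote and run_final_routine are invented names, and how the converter really invokes the routine isn't shown here.

```perl
use strict;
use warnings;

our $HTMLOUTPUT   = '';
our $FINALROUTINE = 'ThanksNote';    # name of the cleanup subroutine

# Hypothetical cleanup routine: receives no data, only appends a note.
sub ThanksNote {
    $HTMLOUTPUT .= "<p>Thanks for converting!</p>\n";
    return;
}

# One way a converter might call the routine by name.
sub run_final_routine {
    return unless defined $FINALROUTINE && length $FINALROUTINE;
    my $code = main->can($FINALROUTINE)
        or die "Final routine '$FINALROUTINE' is not defined\n";
    $code->();
    return;
}
```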

Summary of the Variables / Subroutines that have specific meaning

I'll review some of the variables and subroutines here, as well as introduce a few new ones.

Variable name: $maxsteps
Purpose: This is a global set by the Init routine and it tells how many steps are needed to convert a particular item (boards, categories, etc).

Variable name: $HTMLOUTPUT
Purpose: A global variable. The content of this is added to the main output if it is set.

Variable name: @CONVERT
Purpose: A global variable that is compared to the corresponding array in the outdriver. We'll only try to convert items that appear in both, even if one driver can convert more.

Subroutine name: &SaveWorkInProgress
Purpose: It's a global subroutine, defined in the converter.pl file. It takes a list of the variable name (without the symbol) followed by the variable's reference, and stores it to a file.

Subroutine name: &ResumeWorkInProgress
Purpose: A global subroutine defined like the one above. It takes no arguments and automatically figures out what workinprogress file to load.

Yes - can see what you're doing now. Eblah looks safely the easiest to pick as the baseline.

Any thoughts on where the

use DBI;

is going to be best placed for the db driven ones?

I've had to add an extra screen in my code, to allow for confirming that Perl can see and access the input database, since you're kind of stuck otherwise, and then (this is Webboard for you) confirming which board to do the convert on.

How about having a sub to take in any 'input' date format, convert it to a standard that can then be processed by the 'output' to whatever format needed? Or would that be more work than directly converting ?

I'll see if I can add options to the screen right before it converts, where you input the maximum time and see what can be converted. (Expect this in a week or so; school's being evil to me now.)

The "use DBI;" should be placed right by the "use warnings", I think. It'll error out before any conversion is attempted this way.
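If a friendlier check is wanted, for instance on a screen that confirms Perl can reach the database, the module can also be probed at run time with require inside an eval. A minimal sketch, where module_available is a hypothetical helper name:

```perl
use strict;
use warnings;

# Returns 1 if the named module can be loaded, 0 otherwise.
# Unlike "use", this runs at run time and does not abort the script.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}
```

A driver could call module_available('DBI') and print its own message before attempting any conversion.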

As for the generic time conversion routine, there's a slight problem. The converter doesn't know what type of time to expect. You can do like I did with the Eblah one if you have to, which is adding a string-to-time routine. I'd prefer the input and output drivers to be fairly self-contained. You'll need to figure out the best way to convert into the Unix time() format.
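As an example of such a string-to-time routine, here is a sketch for an SQL-style "YYYY-MM-DD HH:MM:SS" value using the core Time::Local module. This is not the Eblah routine; the sub name is made up, and treating the value as UTC is an assumption (a board that stores local time would want timelocal instead).

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Converts "YYYY-MM-DD HH:MM:SS" to Unix time, treating it as UTC.
# Returns undef if the string does not match that layout.
sub sql_datetime_to_epoch {
    my ($string) = @_;
    my ($y, $mo, $d, $h, $mi, $s) =
        $string =~ /^(\d{4})-(\d{2})-(\d{2})[ T](\d{2}):(\d{2}):(\d{2})$/
        or return;
    # timegm expects a zero-based month.
    return timegm($s, $mi, $h, $d, $mo - 1, $y);
}
```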

Quote:

The "use DBI;" should be placed right by the "use warnings", I think. It'll error out before any conversion is attempted this way.

That's me misunderstanding the way 'use' works! I'd assumed it would add load to the code - like 'require'.

Quote:

As for the generic time conversion routine, there's a slight problem. The converter doesn't know what type of time to expect. You can do like I did with the Eblah one if you have to, which is adding a string-to-time routine.

Sorry - that was badly described. What I meant was, I've written a sub:

(Ignore the $yytrace - that's my debugging string!) It takes the time from the input, in whatever format it's done (SQL datetime in the above case), with the $conv_from variable switching the exact algorithm in. This returns the date/time in 'yabb compliant' form here, but could just as easily switch it to a standard that can be fed to a second sub to switch it out to the output format. Similar to the format_timestring sub in setup.pl.