Need a good text-to-HTML tool

OK guys,

Pewter asked me today to try and track down his blueberry recipe in the old forum files. I went in and did a 'word in file' search and got about a gazillion results. I'm slogging through them.

But since Pewter is one of many who have asked me, I *really* need to take these files and get them into a readily searchable format so we *all* can access them. I think putting them back into the forum might be a bit of a pain (unless I can get them all into HTML formatting and push them directly into the database via phpMyAdmin, into an 'old forum' board, which is a definite idea), but we need them online and readable.

I've attached a couple of examples of how the actual posts are stored for y'all to peruse. If we can come up with a relatively easy way to make these text files easy to read and insertable into the new Gotmead format, I'll set up an 'old forum posts' section and get them all uploaded.

Anyone got an idea, or want to take a crack at it?

Vicky - whose brain is buzzing with all the cool stuff I'm working on for Gotmead....

Re: Need a good text-to-HTML tool

We'd need the dictionary file or the file layout for the data stored in the records as you posted them. YABB should have some reference for that somewhere. I was tinkering around with this for a bit, but then some pressing business came up and blew it into the weeds.

A file layout (fixed field width, or space-, tab-, or comma-delimited) would be great, as long as we can get it. Otherwise each record may have variable-width fields and/or other delimiters we won't know about without a reference point. I'll dig around a bit to see what I can find.
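If no layout reference turns up, one rough way to guess the delimiter is to check which candidate character splits every record into the same number of fields. A minimal Python sketch, assuming one record per line; the candidate list is a guess, not anything taken from YABB's documentation:

```python
# Candidate delimiters to try; this list is an assumption, not from any YABB spec.
CANDIDATES = ["|", "\t", ",", ";"]


def guess_delimiter(lines, candidates=CANDIDATES):
    """Return the candidate that splits every non-empty line into the same
    number of fields (more fields wins), or None if no candidate is consistent."""
    rows = [ln for ln in lines if ln.strip()]
    best, best_fields = None, 1
    for delim in candidates:
        counts = {ln.count(delim) for ln in rows}
        # Exactly one distinct count means the delimiter is used consistently.
        if len(counts) == 1:
            fields = counts.pop() + 1
            if fields > best_fields:
                best, best_fields = delim, fields
    return best
```

A comma is a poor bet here because message text is full of commas, which is exactly what the consistency check catches.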

Re: Need a good text-to-HTML tool

If you're going from one forum to another (yabb to SMF), it's feasible to export everything from the yabb database to Excel, take a sample (100 records or so) from SMF, and format the yabb spreadsheet like the SMF sample, categories and all. You may have to reformat timestamps (or just fill them all with a pre-SMF date), and you will have to pay special attention to how posts and replies are related and adjust accordingly. You will probably lose some of the columns in the process, either because they have no SMF equivalent or because converting them is just too much work, but you should be able to dump everything into SMF. It will be some work, but once it's done, it will all be there. Of course, you'll have to back up the SMF database before trying to merge anything.

Line breaks should transfer fine, and everything should already be escaped in the yabb database. I'm sure some of the HTML formatting in the posts will be borked, but that can be searched and removed (or a script can be created to remove/replace everything that is incompatible from a .csv dump of the database).
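A minimal sketch of that cleanup script, assuming the dump is a CSV with a header row and a `body` column (both assumptions). The single rule shown, stripping `<font>` tags, is only a made-up example; the real rules would come from seeing what actually breaks:

```python
import csv
import io
import re

# Hypothetical cleanup rules: (compiled pattern, replacement). Example only.
RULES = [
    (re.compile(r"</?font[^>]*>", re.IGNORECASE), ""),  # drop <font ...> and </font>
]


def clean_body(text, rules=RULES):
    """Apply every cleanup rule to one post body, in order."""
    for pattern, repl in rules:
        text = pattern.sub(repl, text)
    return text


def clean_csv(text, column="body"):
    """Return a copy of a CSV dump with one named column cleaned."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row[column] = clean_body(row[column])
        writer.writerow(row)
    return out.getvalue()
```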

An interim solution would be to create a search button that searches the yabb subject line, message, and user fields, and formats the matching posts (best effort) in HTML based on the yabb database structure, since it's already there.
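The matching part of that search button could look like this. It's a sketch only: the `Post` fields and the case-insensitive substring match are assumptions, and a real version would more likely be a query against the yabb data than a loop over a Python list:

```python
from dataclasses import dataclass


@dataclass
class Post:
    subject: str
    author: str
    body: str


def search(posts, term):
    """Return posts whose subject, body, or author contains term (case-insensitive)."""
    t = term.lower()
    return [
        p for p in posts
        if t in p.subject.lower() or t in p.body.lower() or t in p.author.lower()
    ]
```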

But if you just want to go from text to HTML, I would write a script using regular expressions to handle the text formatting in the samples you provided, and format from there. I don't know of any tools that do that, but such a script would be pretty easy to write. Using the database is a much better choice, though: either the yabb structure that is already there or a merge into SMF.
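A minimal sketch of the regular-expression approach. It assumes the posts use bracketed tags such as `[b]...[/b]` and `[url=...]...[/url]`; check that against the actual samples and extend the tag table as needed:

```python
import re

# Assumed bracket tags -> HTML tags; extend after checking real samples.
SIMPLE_TAGS = {"b": "strong", "i": "em", "u": "u"}


def bracket_to_html(text):
    """Convert simple bracket markup to HTML using regular expressions."""
    for tag, html_tag in SIMPLE_TAGS.items():
        text = re.sub(
            rf"\[{tag}\](.*?)\[/{tag}\]",
            rf"<{html_tag}>\1</{html_tag}>",
            text,
            flags=re.IGNORECASE | re.DOTALL,
        )
    # [url=http://...]label[/url] -> <a href="http://...">label</a>
    return re.sub(
        r"\[url=([^\]]+)\](.*?)\[/url\]",
        r'<a href="\1">\2</a>',
        text,
        flags=re.IGNORECASE | re.DOTALL,
    )
```

The non-greedy `.*?` keeps two separate `[b]` runs on one line from being merged into one; nested tags of the same kind are not handled.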

And who knows, if you search the SMF forums, you may find someone has already made a conversion tool.

Re: Need a good text-to-HTML tool

Vicky,

I note that the main text of the message already has html-like tags in it.

It is easy UNIX stuff to convert the text file to another text file for importing into MySQL.

What is needed is

the database fields (date, message, author, ...) and field types that SMF requires. From the SMF messages table, the fields are probably:
ID_TOPIC, ID_BOARD, posterTime, subject, posterName, posterEmail, posterIP, body
- ID_BOARD will be determined when you create a board for these YABB postings
- ID_TOPIC may need to be created as required per YABB topic
- posterName/posterEmail will come from the SMF members table

the SMF table or references to convert YABB's authors to SMF author IDs (from the SMF members table, the ID_MEMBER & memberName fields)
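Once the fields are known, one way to feed them in is a CSV file in the same column order, which phpMyAdmin's import can load. A sketch, assuming the column list above is right (it is only "probably" right, so confirm it against the real table first):

```python
import csv
import io

# Column order as guessed above; confirm against the real SMF messages table.
SMF_COLUMNS = ["ID_TOPIC", "ID_BOARD", "posterTime", "subject",
               "posterName", "posterEmail", "posterIP", "body"]


def to_import_csv(messages):
    """Write message dicts as CSV rows in SMF_COLUMNS order (no header row)."""
    out = io.StringIO()
    writer = csv.writer(out)  # quotes fields containing commas or newlines
    for msg in messages:
        writer.writerow([msg[col] for col in SMF_COLUMNS])
    return out.getvalue()
```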

Here is an example of the 10 fields in a YABB text file (the last field is blank):

Transporting
Becka@a.com
10/04/03 at 23:43:01
Guest
xx
0
148.137.240.200
Howdy. &nbsp;I am thinking about making my very first batch of mead, hopefully as a 2004 Christmas present. &nbsp;The problem I have is that since I live at college, I have to transport everything I own back and forth from home 4 times a year (I come home between every semester.) &nbsp;Obvioulsy I would have to take the mead with me. &nbsp;Its a 2 hour drive that is pretty much smooth for the most part, though there are alot of winedy roads. &nbsp;Will transporting my mead like this while it is fermenting or aging mess it up for me?

We probably just need to convert:
- Message title (field 1)
- Date/Time (field 4)
- Message author (field 2)
- Message text (field 9)
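Pulling those four fields out of a record might look like this. It assumes each record is a single line split on `|` (the pasted sample doesn't show the delimiter), that dates are month/day/year, that times are UTC, and that SMF's posterTime is a Unix timestamp; all of these need checking against the raw files and the SMF table:

```python
from datetime import datetime, timezone


def parse_record(line, delimiter="|"):
    """Split one YABB record and keep fields 1, 2, 4 and 9 (1-based)."""
    # Assumes the delimiter never appears inside a field.
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) < 9:
        raise ValueError(f"expected at least 9 fields, got {len(fields)}")
    return {
        "subject": fields[0],  # field 1
        "author": fields[1],   # field 2
        "date": fields[3],     # field 4
        "body": fields[8],     # field 9
    }


def yabb_date_to_unix(stamp):
    """'10/04/03 at 23:43:01' -> Unix seconds, assuming mm/dd/yy and UTC."""
    dt = datetime.strptime(stamp, "%m/%d/%y at %H:%M:%S")
    return int(dt.replace(tzinfo=timezone.utc).timestamp())
```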

So extracting these fields is easy-peasy. Converting the author to the equivalent SMF author ID (a reference to that author's entry in the SMF members table) requires a dump of the relevant fields from the members table, so that a manual (or automatic?) cross-reference can be done, if we can indeed work out which SMF member each YABB poster now corresponds to.
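A sketch of the automatic cross-reference, assuming the members dump comes out as CSV with `ID_MEMBER` and `memberName` columns and that names match case-insensitively. `fallback_id` is a hypothetical catch-all member for posters who can't be matched:

```python
import csv
import io


def load_member_map(csv_text):
    """Read an ID_MEMBER,memberName CSV dump into {lowercased name: member id}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["memberName"].strip().lower(): int(row["ID_MEMBER"]) for row in reader}


def map_author(yabb_name, member_map, fallback_id):
    """Return the SMF member id for a YABB poster, or fallback_id if unmatched."""
    return member_map.get(yabb_name.strip().lower(), fallback_id)
```

Anything that lands on `fallback_id` is a candidate for the manual pass.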

THEN it can be "imported" into SMF (maybe have to create some topics to match the YABB topics first).

Re: Need a good text-to-HTML tool

Hi James,

Well, I was thinking that we'd create an 'old forum' board, but I have to admit I hadn't reckoned with duplicate topics. Is there any way to append '-Yabb' or '-old forum' to the topic names so that they'll be unique?
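Tagging the subjects is a one-liner per topic. A sketch, with " -Yabb" as a hypothetical default suffix; it skips subjects that already carry the tag, so running the script twice doesn't double it up. If topics are keyed by a numeric ID (as ID_TOPIC suggests), the suffix mainly helps readers tell the old threads apart:

```python
def tag_old_subject(subject, suffix=" -Yabb"):
    """Append suffix to a topic subject unless the subject already ends with it."""
    subject = subject.rstrip()
    if subject.endswith(suffix.strip()):
        return subject
    return subject + suffix
```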

I'll send along an email with the link to the dump for the users......

Everyone else: Remember, we had some posts go *blooie*, and disappear completely. So not *all* the old stuff will come back, just what I've been able to track down and save.

Re: Need a good text-to-HTML tool

I'm thinking a top-level 'Old Forum' board with sub-forums inside. IIRC, there weren't all that many, so I don't think it would take long to just create them by hand... How would we do it with a SQL query? I'm game if you think it will work.
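The SQL side boils down to inserting the parent board, reading back its new ID, then inserting the sub-boards with that ID as their parent. This sketch uses SQLite and a hypothetical three-column `boards` table purely to show the pattern; SMF's real boards table has different columns, and in MySQL the equivalent of `lastrowid` is `LAST_INSERT_ID()`:

```python
import sqlite3


def create_board_tree(conn, parent_name, child_names):
    """Insert a top-level board plus its sub-boards; return the parent's id.
    Uses a hypothetical, simplified 'boards' table, not SMF's real schema."""
    cur = conn.execute("INSERT INTO boards (parent_id, name) VALUES (NULL, ?)", (parent_name,))
    parent_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO boards (parent_id, name) VALUES (?, ?)",
        [(parent_id, name) for name in child_names],
    )
    conn.commit()
    return parent_id


# Throwaway in-memory database with the hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE boards (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)")
```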

Yeah, I think we'll just use a 'Yabb-unknown' member for the missing IDs. Path of least resistance; we're interested in the posts more than anything else.

Re: Need a good text-to-HTML tool


Vicky,

I appreciate your efforts. I am not fluent in the sorts of things the rest of you are discussing, so I can't contribute in that way. But since the Gotmead log is the only record I had of the Blueberry Wine, I would gladly search each of the files manually if there is a way I can get access to them in text format. The same goes for my Low Alcohol Yeast Tests.

If dates would help: the messages were posted very close to the time the board was hacked, within a few weeks of it.

Re: Need a good text-to-HTML tool

With luck, James will have us up and running with what was left very soon. I can't work on it today, I'm neck-deep in page creation for Redstone Meadery's site, but I'll give it time as I can. I'll send you a mess of the .txt files, but dude, they're *very* much a pain to get through.....