Best Way to Add Posts, Topics and Users from Old Forum?

From 1998 to 2003, I ran a forum using the Network54 website. These forums were horrendous. There wasn’t any user database… people just wrote their posts then filled in their name and an optional email address. (It’s amazing I didn’t get any spam during that five-year period.)

My goal is to get all these posts into bbPress.

I was unable to spider the forum. Network54’s server kicked in and blocked all attempts using Spiderzilla.

Instead, this is what I’ve done: There are 53 pages of posts. Some of them are threaded, some are not. On each page, there’s a “View All Messages” link that will show you all of that page’s posts and threads on one single HTML page. Each thread is separated with an HR tag.

I downloaded each of these 53 “View All” pages. I just spent 20 minutes cleaning up the code from Page 1, and I’m left with a fairly lean page. Here’s what I got for page 1:

I will continue doing this for all 53 pages. Then I would like to separate out all the topics and posts, and put them into an “Archive” forum in my bbPress install. I’m trying to determine the best way of doing this…
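Since each thread on a “View All” page ends at an HR tag, the first pass at separating topics can be a simple split on that tag. Here is a rough Python sketch of the idea. The function name and the sample HTML are made up for illustration, and it assumes the cleaned pages contain HR tags only as thread separators.

```python
import re

# Matches <hr>, <HR>, <hr /> and <hr size=1> style tags.
HR_PATTERN = re.compile(r"<hr\b[^>]*>", re.IGNORECASE)


def split_threads(page_html):
    """Return the non-empty chunks of HTML found between HR tags."""
    chunks = HR_PATTERN.split(page_html)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


# Hypothetical sample standing in for one cleaned "View All" page.
sample = "<p>First thread</p><hr><p>Second thread</p><HR size=1><p>Third</p>"
threads = split_threads(sample)
```

Each chunk would still need a second pass to pull out the individual posts, names and email addresses, which depends on what the cleaned markup looks like.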

For one, a lot of the users are not registered on the current forum. Some are, but used slightly different names. Some people never entered their optional email address at the Network54 forum, so I don’t have a way to put everybody into a user account. I could match up a bunch of them, but not all of them.
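For the people who used slightly different names, fuzzy string matching might catch some of them automatically. The sketch below uses Python's standard `difflib` module. The function name, the sample names and the 0.8 cutoff are all assumptions for illustration, and any match it suggests should be checked by hand before posts are attached to a real account.

```python
import difflib


def match_author(old_name, registered_names, cutoff=0.8):
    """Return the closest registered name, or None if nothing is close enough."""
    # Compare case-insensitively but return the name as it is registered.
    lowered = {name.lower(): name for name in registered_names}
    hits = difflib.get_close_matches(
        old_name.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None


# Hypothetical names, not taken from either forum.
registered = ["JonSmith", "Alice"]
suggestion = match_author("Jon_Smith", registered)
no_match = match_author("Zed", registered)
```

Authors that come back as `None` are the ones with no obvious account, so they would need to be handled some other way.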

You want a better spider tool: one that can limit how many threads it uses, obeys robots.txt, and uses a standard user agent.
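As a rough illustration of those three points, here is a minimal single-threaded fetcher in Python using only the standard library. The user agent string, the delay and the function names are placeholders rather than anything Network54 is known to require, and this is a sketch, not a tested replacement for a real spider.

```python
import time
import urllib.request
import urllib.robotparser

# Hypothetical user agent string; substitute whatever is appropriate.
USER_AGENT = "ArchiveFetcher/0.1"


def build_robots(robots_txt):
    """Parse the text of a robots.txt file into a rule checker."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def fetch_pages(urls, robots, delay_seconds=5.0):
    """Fetch URLs one at a time, skipping anything robots.txt disallows."""
    pages = {}
    for url in urls:
        if not robots.can_fetch(USER_AGENT, url):
            continue
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(request) as response:
            pages[url] = response.read()
        time.sleep(delay_seconds)  # pause so the server is not hammered
    return pages


# Example rules only; the real file would come from the site being fetched.
robots = build_robots("User-agent: *\nDisallow: /private/\n")
```

Fetching sequentially with a pause between requests is the simplest way to keep the load on the server low.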

Good luck getting all that into bbPress though. First you’ll need to write a custom parser that can standardize the posts and topics into a clean format. Someone was working on a bbPress XML importer, but if you get the data organized enough you could parse it directly into MySQL. Not a trivial project by any means, but I guess you realize that.
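To give an idea of what a “clean format” could look like, here is a small Python sketch that holds each post in one uniform record and writes the records out as one JSON object per line. The field names are invented for this example and are not bbPress's actual table columns. Mapping them onto the real schema, or onto whatever an importer expects, is a separate step.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ArchivedPost:
    # Hypothetical fields; adjust to whatever the old pages actually provide.
    topic_id: int          # which thread the post belongs to
    position: int          # order of the post within its thread
    author: str            # name as typed on the old forum
    email: Optional[str]   # optional on the old forum, so may be missing
    body: str              # cleaned post text


def to_json_lines(posts):
    """Serialize posts as one JSON object per line."""
    return "\n".join(json.dumps(asdict(post), sort_keys=True) for post in posts)


# Made-up sample data.
posts = [
    ArchivedPost(1, 0, "Ann", None, "Hello"),
    ArchivedPost(1, 1, "Bob", "bob@example.com", "Hi Ann"),
]
output = to_json_lines(posts)
```

Once every page is in a form like this, loading it into the database is mostly a matter of looping over the lines.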

Hopefully you know PHP or some other language that can help you parse all the data?