I'm getting a runtime error with the program...
The link I provided may not be to the version of the viewer you need.
It's a case of 32-bit vs 64-bit.
Check your operating system type, and locate the viewer on Microsoft's website.
All the threads in my workload are junk!
That's going to happen occasionally. In fact, that's the point of the project: to identify the junk threads so I can eliminate them.
Could you put this into a Forum instead? Could you upload this to my forum? Can I get a copy for my forum? Can I just get a copy of everything?
That's the long-term goal: to make it so people can download all the good threads, and do whatever they want with them.
But all that handing it out right now would accomplish is handing out a massive amount of unsorted data, leading to people being frustrated and trying to do it all themselves.
I'd also like to be able to turn it all into something like the PFSRD website, or even a netbook or series of netbooks.
Original Source authors will be credited.
It can't be that bad. How big is all this?
There are 630,520 or so threads.
(Actually, 544,450 threads, and 86,070 discussion topics from the old DND-L and related mailing lists)
The total file size is 7,677,066,179 bytes. And that's after removing 40K of formatting and meta-code from each of the 2015 threads!
That's enough to fill up 2 DVDs. And it's just raw text. No colors, no graphics, nothing.
By comparison, all the Wizards of the Coast books published for third edition, without graphics, as just raw text, come out to 131,735,287 bytes.
So, as of right now, there is more to go through in those threads than the entirety of the text from Wizards of the Coast's publications of the variants of third edition. 58 times as much!
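The figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: the DVD capacity is an assumption (a single-layer disc at roughly 4.7 billion bytes), and the variable names are made up for illustration.

```python
# Figures quoted in the answer above.
forum_threads = 544_450
mailing_list_topics = 86_070
archive_bytes = 7_677_066_179
wotc_text_bytes = 131_735_287

# Assumption: a single-layer DVD holds about 4.7 billion bytes.
DVD_BYTES = 4_700_000_000

total_items = forum_threads + mailing_list_topics
ratio = archive_bytes / wotc_text_bytes
dvds_needed = -(-archive_bytes // DVD_BYTES)  # ceiling division

print(f"threads and topics: {total_items:,}")
print(f"archive is about {ratio:.1f} times the WOTC text")
print(f"DVDs needed: {dvds_needed}")
```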
How did you download all that? Did you use a website copier? I tried that, and it didn't work!
I considered using a website copier back when I first downloaded the forums (back in 2005), but discovered that downloading a web discussion forum via a website downloader wasn't practical (it was extremely slow, downloaded everything, and tended to spider off in weird directions...).
I use WINTracker and ULtrasucker to download websites I want otherwise.
No, for downloading the forums, I used my college education in programming. It was actually very simple.
The short version is: I went into each category (e.g. Character Optimization), and copied the link for the first and last page. I dropped them into Excel, and used them to generate the links to the rest of the pages.
I then downloaded those pages, and merged them into one big file via DOS/Command Prompt. I took that file into Microsoft Word, found the thread links, ran a find-replace command to mark them and separate them, ran another to put each code command in the webpages on its own line, and saved.
I then pulled the file into Access, and deleted anything without the mark I had put in. End result is a link to the first and last page of each discussion topic in that category.
I copied the thread links for anything with 2 pages or fewer into a download program, generated the links I needed for threads/topics with more than 2 pages, and dropped those into the download program as well.
I repeated for each category, and when I had all the links loaded, told the download program to run while I went to bed. By morning, I had a complete copy of the discussion forum.
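The "first page, last page, generate the rest" step could be sketched in Python instead of Excel. This is not the method actually used, just an illustration. It assumes the page number lives in a query parameter called page, which is a guess; the example.com address and the function name page_links are made up.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def page_links(first_url: str, last_url: str, param: str = "page") -> list[str]:
    """Build every page URL between a category's first and last page.

    Assumes (hypothetically) that the page number is a query parameter
    named `param`; the real forum's URL layout is not given.
    """
    first = urlsplit(first_url)
    last = urlsplit(last_url)
    start = int(parse_qs(first.query).get(param, ["1"])[0])
    end = int(parse_qs(last.query)[param][0])

    links = []
    for number in range(start, end + 1):
        query = parse_qs(first.query)
        query[param] = [str(number)]
        new_query = urlencode(query, doseq=True)
        links.append(urlunsplit(first._replace(query=new_query)))
    return links


# Hypothetical category with four pages.
links = page_links(
    "http://forum.example.com/category?id=339&page=1",
    "http://forum.example.com/category?id=339&page=4",
)
for link in links:
    print(link)
```

The resulting list can be saved to a text file and handed to whatever download program you prefer.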
Where did you learn how to do that?
College. I took a computer programming course. The languages covered were: Visual Basic, Visual C, C, C#, C++, JavaScript, HTML, COBOL, and SQL. The version of Visual Basic I learned was VB6, which is closely related to the Visual Basic for Applications (VBA) built into Microsoft Office.
I don't claim to remember the C variants, COBOL, or JavaScript. I have refreshed my knowledge of them on occasion, and it came back quickly, but then I didn't use them for two years and promptly forgot. (Shrugs).
The VB, HTML and SQL I still use on a regular basis. If I don't know how to do something in them, I google until I find it.
Where did you get those figures for the Wizards of the Coast books? It sounds made up...
I either scanned the books into my computer and turned the images into text (with OmniPage), or copied and pasted them from digital copies from the internet.
And before you ask, NO, you may not have a copy.
(At least until such a time occurs where Wizards of the Coast gives me permission to put that online. Don't hold your breath on that...)
What are you doing with those scanned WOTC books?
Without going into the background (which I placed at the bottom), I made my own database with all that information in it, including a copy of it all set up as webpages. (You search in the database, and it loads the results on screen, as well as opening a copy in your web browser.) It sees use during my group game sessions. Instead of having to look things up in the books and trying to find them, it's literally drop in what you want to find, and ta-da, there it is.
Again, you may not have a copy at this time.
So, let me get this straight: You downloaded the Wizards of the Coast forums (several times), without duplicates, and want to let everyone have a copy, but it's so big, you want help trimming it out first?
Yes, that's it, exactly.
So, why MS-Access? Why not another program?
Several reasons.
First, I am trained in MS-Access. I am familiar and comfortable with Access. I am not familiar with any other database programs to the point of setting something like this up.
In theory, I could set it up in Microsoft Excel. Unfortunately, that would present its own set of issues.
I suppose I could make a Visual Basic front end for it, but I'd have to learn how to do that.
What about just putting it all online and letting people grab what they want?
I considered that for a bit. But, there are a few problems with that.
The first being money. This is not being hosted on my home computer. I'm paying to have a server set up. And while web-hosting is not that expensive (my Transformers Combiner Wars Devastator set cost more than a year of hosting, and yes, he works wonderfully as a Colossal sized Golem or Titan...), a blanket archive just doesn't seem right.
What about setting it up as a searchable web discussion forum? Maybe with a voting system with the same results as the database? Or an online database where people don't have to download?
I thought about that. In fact, the reason this wasn't started back in November is that a friend of mine with way more skill at internet/server based applications was working on setting that up. (Unfortunately, real life intruded again...)
An online database setup would be ideal. I lack the skill and talent to set it up in a timely or effective manner. (If someone thinks they can do that, please, let me know and I'll happily give you the specifications you require.)
As for a web forum, that would actually require more oversight than I have the time/resources for.
As near as I can determine, I'd have to either dump threads online in groups, let them sit for a few weeks, and then rotate them, or dump them all, and make it so that after a week, if a thread has been voted on, it gets locked to avoid duplicating work (I know of no forum software that does that).
That would require a lot of work, and probably people helping with the maintenance.
You also run the risk of a troll coming along and figuring out how to automatically mark everything 'no content'.
The database is set up so that people look over threads at random, upload results as they do things, and once a week, I merge them all together and we update.
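To give a rough idea of that cycle (random workloads out, results in, weekly merge), here is a small Python sketch. It is not the Access code. The function names are made up, the results are simplified to a thread file name and a verdict, and the rule that the first recorded verdict wins is purely an assumption for the example.

```python
import random


def assign_workload(unassigned: list[str], size: int, rng: random.Random) -> list[str]:
    """Pick a random batch of thread files for one reviewer."""
    return rng.sample(unassigned, min(size, len(unassigned)))


def merge_results(master: dict[str, str], uploads: list[dict[str, str]]) -> dict[str, str]:
    """Fold each reviewer's upload into a copy of the master results.

    Assumption: the first verdict recorded for a thread is kept; how
    conflicting verdicts are really handled is not described here.
    """
    merged = dict(master)
    for upload in uploads:
        for thread_file, verdict in upload.items():
            merged.setdefault(thread_file, verdict)
    return merged


# Hypothetical week: one existing result, two uploads.
master = {"1001.html": "no content"}
uploads = [
    {"1002.html": "humor"},
    {"1002.html": "ideas", "1003.html": "discussion"},
]
weekly = merge_results(master, uploads)
batch = assign_workload(["1004.html", "1005.html", "1006.html"], 2, random.Random(0))
print(weekly)
print(batch)
```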
It's actually less work intensive for all involved.
And the viewer program is free.
What about an online database with a front end program?
Same problem: no idea how to do that. Anyone who knows how, let me know you want to help, and I'll send you the specs.
Is there any other way I can help besides the database program?
Depends on what you want to do.
If you are on any forums that might be interested in this, please, post about it! Discussion forums, mailing lists, your own website, Facebook groups, or even asking the publisher of a web-comic to put it up as a news item: go for it!
If people start doing that, I'll keep a list of where it's been done to avoid people bothering them again.
If you mean to review them without using the database: not at this time.
Like I said, I could, in theory, set something up in Microsoft Excel. The problem I see with that is I'd have to put every single thread link into the spreadsheet, and it would be huge, and slow, and people could use that list to just download everything and leave the rest of us hanging out to dry.
In theory, I could set something up where you don't need to download the viewer program, but I'd essentially be making a new viewer program, so that's kind of redundant.
Anything else, I lack the programming knowledge to do. If someone else wants to set something up, I'm willing to give them the basic information. Heck, I've provided it below at the bottom of this!
Now, there is a chance I'll need to periodically review and verify someone's work/choices, and you could help then.
Unfortunately, I'll be using a database for that.....
Can I host a mirror or copy of this project?
Depends on what you mean by 'mirror or copy'.
If you mean put a link to the database somewhere, feel encouraged. If you want a copy of the database on your website for people to download, just put the link in, it will save you file space.
If you mean a copy of all the threads, the answer is "no" at this time. Simply put, I'd have to reprogram the database to account for each and every mirror.
I have unlimited bandwidth on the server, as well as an offline backup, so I'm not that concerned about that. One server should do. If it turns out that's not the case, I'll look into the idea of mirrors later on.
The biggest risk is that the server crashes and takes out all the results files. That's why I built the 'Re-upload all my results' button. If the server crashes, and I lose all the results (I plan to make offline copies, but you never know), then all that would be needed to get those back up to date would be for people to just resend.
I don't like the idea of the buttons doing all the file transfers. Can I just email you my results?
I thought about that. And that would be very easy to set up. I thought people would be happier with the simpler 'press button, program does it for me.'
If enough people ask for an email option, I'll modify the database so that it would create the file, and you'd be able to send it to an email as an attachment, and download a zip of the results instead of the button doing it.
I made a better way to do this...
Really? Cool. Let me check it out, and if it's a better approach, I'll use it and give you full credit for it.
.. and I want money for it!
Not happening.
Is there any way to work on this offline?
I thought about it. But this project is huge. You'd either have to download the entire project (and then, quite frankly, you could just delete the program and keep the threads for yourself), or would have to download whatever it is you are assigned, work them, delete them, and then download more.
If enough people ask, I'm willing to implement a 'download my workload so I can work offline' feature.
I'm looking for a specific thread, can I get a copy of it from you?/While looking over my work, I saw a thread I want a copy of. Can I get it?
I'm willing to do that for people that are helping. Let me know your handle via email. If you've sent results, I'll send you the thread as a thank-you.
If I get enough requests like that, I'll add the ability for you to mark a thread for download to your computer. It would download after you send your latest results.
This doesn't look like the Wizards of the Coast board
For the most part, no, it doesn't.
I made a macro that went in, and tossed out 99.9% of the code that was in each thread. The size reduction level was absolutely insane.
The first 40,000 characters of any thread file were metadata, links to other parts of the WOTC website, or other things of that nature. There was more at the end. That's not needed, so it was tossed.
The tables the posts were in had formatting not needed for this project, so it was cleaned up.
The result is similar to how some websites have a 'printer friendly view' for threads.
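For the curious, the basic idea of trimming the junk off the front and back of a thread file looks something like the Python sketch below. This is not the actual macro. The start and end markers here are invented for the example; the real thread files would need whatever text actually surrounds the posts.

```python
def strip_page_chrome(html: str, start_marker: str, end_marker: str) -> str:
    """Keep only the text from the first start_marker to the last end_marker."""
    start = html.find(start_marker)
    end = html.rfind(end_marker)
    if start == -1 or end == -1 or end < start:
        return html  # markers missing: leave the file untouched
    return html[start:end + len(end_marker)]


# Hypothetical page: junk before and after the posts.
page = (
    "<head>site menus and metadata</head>"
    "<!--posts--><p>Hello</p><!--/posts-->"
    "<footer>more site links</footer>"
)
cleaned = strip_page_chrome(page, "<!--posts-->", "<!--/posts-->")
print(cleaned)
```

If either marker is missing, the file is returned unchanged, so a thread with an odd layout is left alone rather than mangled.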
Hey, I posted threads on the WOTC forums, and I don't want them involved in your project
Can you prove you are the original poster of those threads?
At this point, probably not, as the source of those threads, and therefore your ability to prove it, is gone.
Everything that was posted on the WOTC forums was under the D20 System License if mechanical, or the OGL if not. Everyone and anyone is free to republish them if they want.
That being said, as the original post information is being preserved, including author and date/time (at both this stage, and at what I'm hoping will become the final results stage down the line), that also falls under fair use.
I'm going to have to decline your request.
Don't worry, anything personal that is found in a thread will go under 'no content' anyway, and I didn't download user specs or information.
Hey, I used to post on the WOTC forums. Is there any way for me to work just my threads? Or threads I posted in?
Actually, yes there is.
Let me know what your handle was. I'll make a special copy of the database for you. The login will already be populated with your handle from the WOTC forums. You'll be assigned either all threads you created, or all threads you posted in, or a combination of the two.
Hey, my graphic is part of your project!
Actually, no, it's not. It just looks that way.
When I cleaned up the thread files, the code removal stopped where the posts began, and resumed where the posts ended.
As a result, any embedded links were left intact.
So, let's just say a post had a link or thumbnail of a graphic in it, that was preserved, and still works.
This does not appear to be the case with Avatar Thumbnails, just pictures put inside a post.
The first time I saw that looking over results, I considered modifying the macro I was using to clean things up to remove that, but decided against it.
I didn't want to risk removing something that was important to the post, or an error in the image link causing the macro to screw up a thread file.
You stole my idea!
Really? I'm extremely surprised by that. I've looked around for a similar project online, and I haven't found any in English. I've found lifeboats/copies of some of the threads from before the boards closed, but no large scale identification/classification/sorting/hosting projects.
I've also found nothing for threads that had been purged pre-migration back from Gleemax.
Now, if you're saying you were going to do this, but it's not online yet, great minds think alike.
If you decide to put your project online (or it already is), I'd be more than willing to share results and data with you. If it's not online yet, you're more than welcome to help with this one.
As work is done, can you make threads that were marked for content/humor/ideas/discussion available for download?
The answer is: Yes I can, and I plan to. In fact, you'll note I already put stuff online for people as proof of intent.
I just haven't put any of the original threads up for download (outside of working them) yet.
How long until you make them available?
I'm writing this on March 18, 2016. As far as I am concerned, the project went public March 17, 2016, when I first advertised its existence on GITP, Stardestroyer.net, and a few Facebook groups, and emailed the admins at EnWorld about it.
That being said, anyone with a full copy of MS-Access would be able to figure out the links to every thread already. (They just won't know which thread is which, because the titles/names are not in the database.) I just haven't put in an index file for what's been reviewed yet, or made a zip of the same.
I'm hoping to make that index file for reviewed threads by April 15, 2016, and to update it every few weeks as work is done. At that point, anyone will be able to figure out the thread links easily enough, but again, will have no idea which thread is which unless they've been reviewed.
Aren't you worried about someone figuring out the links and downloading them all? Why not anti-mass-download security?
Not really.
All the threads are stored with file names of their source thread number. To find out what a thread is about, you have to read it. This is because that's how most forums store the threads, by number. So they were downloaded and saved as numbered files. I saw no need to change that.
(That would just be unneeded work).
I, of course, have a list of all the thread numbers and their thread titles, and will be using that for making the index files that I plan to put up for reviewed threads that have content/humor/ideas or are a discussion.
Basic Specs for the project.
This is literally one of the most straightforward programs I've ever created.
There's a User Info table. It stores the user's handle, and the last results file they downloaded.
There's a Forum-Category table: Forum, Year, Category, and an identification number.
The main table is the Links table. It has the identification number for the forum, an ID for each thread, the thread file name, a field for the content choice, a field for whether the thread is assigned, and a field for when the results were uploaded to the server/when it was updated from the server.
That's it.
This is literally a massive, massive bookmark file, with a few extra columns for work efforts.
As a network project, like an SQL Server database, the main table would literally be:
LinkID, forumID (or directory path), ThreadFile, Content, Markedby
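For anyone who wants to try setting something up, here is a rough sketch of those three tables using Python's built-in sqlite3 module in place of MS-Access. Treat it as an illustration only: the column types, the extra column names (LastResultsFile, Assigned, UpdatedOn) and the sample rows are assumptions, not the project's actual layout.

```python
import sqlite3

# Assumed layout based on the description above; names and types are guesses.
SCHEMA = """
CREATE TABLE UserInfo (
    Handle          TEXT PRIMARY KEY,
    LastResultsFile TEXT
);
CREATE TABLE ForumCategory (
    ForumID  INTEGER PRIMARY KEY,
    Forum    TEXT,
    Year     INTEGER,
    Category TEXT
);
CREATE TABLE Links (
    LinkID     INTEGER PRIMARY KEY,
    ForumID    INTEGER REFERENCES ForumCategory(ForumID),
    ThreadFile TEXT NOT NULL,
    Content    TEXT,
    Assigned   INTEGER DEFAULT 0,
    MarkedBy   TEXT,
    UpdatedOn  TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical sample rows.
conn.execute(
    "INSERT INTO ForumCategory VALUES (1, 'WOTC', 2015, 'Character Optimization')"
)
conn.execute(
    "INSERT INTO Links (LinkID, ForumID, ThreadFile) VALUES (1, 1, '123456.html')"
)

# Threads nobody has reviewed yet have no content choice recorded.
unreviewed = conn.execute(
    "SELECT ThreadFile FROM Links WHERE Content IS NULL"
).fetchall()
print(unreviewed)
```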
Background
Where to begin....
I've been playing D&D since 1987 or so. I played with a friend when I went to his house (he lived out of town after previously living down the street from me), and in high school I met friends and we started playing together.
Occasionally, little arguments would come up over what version of a spell or class or monster was the right one, but nothing serious.
Then, the internet came along. Three of the members of my high school group had internet access fairly early (having been involved with the BBS scene, the format that predated the internet), and we began grabbing anything D&D related off the internet we could find.
This expanded to include the D&D Core Rules program, and all the user created files for it.
College came around, the high school group drifted apart (we all went to different schools), and I took my programming classes. For one of my classes, we had to make a database in Microsoft Access that demonstrated various abilities and functions.
So, I made one to look up D&D spells, and scanned in a bunch of books for it.
I started tossing more spells into it after the program got the highest mark in the class.
When Third edition came around, and my girlfriend (later wife) and I decided to switch, I decided to use that database to start converting spells. That was a mixed success.
However, I got the idea to expand it to include everything from third edition. Put it in, with the publication date. That way if something was printed more than once, I'd know which version was the current version.
As I got third party D20 system books, they went in as well. I also had the idea to toss in websites and discussion forums and netbooks. (I honestly didn't expect tossing the books in to take that long...)
Unfortunately, there were several technical difficulties with that. No digital copies existed that had text in them, and a lot of the digital copies I managed to find were of poor quality, so the books had to be scanned in manually, as I could afford to get them.
OCR software (that is what turns scanned text into digital text you can copy, paste and edit) was still in its infancy, so data entry was slow.
Add in the sudden death of two different computers (power surge for one, cat hair clogging up the power supply and heat fans melting the interior of the other), as well as the destruction of an external hard drive (knocked off the shelf it was on by the same cat), and it didn't really get off the ground.
Fortunately, all the source data backups survived.
Now, that project started to move along with the advent of websites like DriveThruRPG (usually good quality scans with copiable text), a better scanner and scanner software, thumb-drives, and a relaxation of security policies at work.
You see, due to a network upgrade at work (hardware, software), what used to take me all week to do, I could now do in an hour. It was that big of a difference. So, my supervisor at the time said, "You know what, I know how hard, how fast, and how well you work, and how important your work is to the organization. Get it done, let me approve it, and then look busy for the rest of the week."
I later found out that they'd considered 'outsourcing' my job, and anywhere they looked, everyone wanted three times what I made in a week to do it, and would take longer.
Anyway, so, at that point, I restarted the database from scratch. Starting with the Third edition SRD, I started entering everything into the database. Without trying to make it a generator, I might add.
I'd decided to get all the information in, THEN start trying to make a character generator out of it. That way, if I couldn't get a generator working, I at least had a one-stop reference library/help file.
I had 80% or so of everything published by Wizards of the Coast (and Paizo) up to date and entered about 2 weeks before it was announced that Wizards was pulling Paizo's license for Dragon and Dungeon magazines. I finished up Wizards of the Coast's entry, kept it up to date, and started getting third party books ready for entry.
(Specifically, the non-licensed stuff from Mongoose Publishing.) Shortly thereafter, I downloaded the Wizards of the Coast forums again. I also downloaded ENWorld before the crash/hack that happened a few years ago (at least the homebrew/conversion section of it).
Had my job continued, I'd probably have most of the third party publishers entered by now, and be well through the forums. Unfortunately, my job was surplused, along with a lot of other people's. My new job has WAY stricter IT and security policies, and while I'm one of the more productive workers, no thumb drive shall pass, and incoming emails with attachments are subject to review.
My wife and I read over Fourth Edition when it came out, and decided not to switch. One of the big selling points to us on third edition was the backward compatibility of it. We had no trouble converting characters from pre-3e to 3e. We saw no way to do that to fourth edition without losing a lot of the character.
Simply put, a bad fit for us. I remember being very venomous about that.
Instead, we switched to Pathfinder after it came out. Fully backward compatible with a little thought.
At that point, I stopped putting 3.X/D20 System stuff into the database (after finishing the publisher I was working on), and started putting Pathfinder stuff in. Occasionally, I'd download a webforum as I went along.
Now, I haven't always been full bore on the database, or the forums. Occasionally, with any project like that, you go 'okay, I'm not working on this for a while' (for whatever reason).