When it comes to data collection efforts, the spark-for-today arrived with the realization that far too many of the citations collected by Wikiquote are absolute garbage.

It seems that "Quotations are only as good as the people who share them." (... and you can quote me on that one) (lol.)

UN-censorship

To aid in the trash-identification effort, one must also understand that one man's weeds are another man's salad. For that reason alone, moving forward the intent is that quotations be classified - never deleted.

Why keep the bad ones? -Because freedom of speech means that both sage and fool must learn to live in their own droppings. (ahem - that quote is nothing compared to the horrific experience many will have reviewing those Wikiquotes!)

Strategy

Wikipedia presently has over 150,000 wanna-be pop-sayings to triage here, friends... yet wee Piranhas consume impossibly-sized meals one byte at a time ...

So yes, perhaps wee 'Quoties be few & far between... but the inspiration du jour is to allow our handful to exchange NEW quotes - as well as classification meta-data for the others - on such an impossibly-sized collection of willfully corrupted, would-be inspiration.

File Format

Today's shared data file is WikiQuote_Data.zip.

From the documentation therein:

The name of the file indicates the date that the data were collected.

File Format
===========

(1) Each line contains a single citation ("quote.")

(2) Columns in each line are TAB separated.

(3) There are at least 3 columns per line:

Field #1: Overall Classification
---------
Default is "Unknown"
Doctor Quote Uses: My_Favorite(5), Very_Good(4), Good(3), Not_Good(2), Deleted(1), Unknown(0)

Field #2: Citation Hash Code
---------
Used to uniquely identify each citation.
The default is a Winzip-style, zero-weighted CRC32.

Field #3: Citation
---------
The quote, as harvested from Wikipedia.org
Note that HTML line breaks and quotes have replaced newlines & unary-quotations

Field 4+: Page Reference(s)
---------
The location(s) where each quotation was found on Wikipedia.org
Technically unlimited, yet typically containing only one page-reference.

...
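
For the curious, here is a minimal Python 3 sketch of reading one line of that format. It is not Doctor Quote's actual code: the names `Citation`, `parse_line`, and `crc32_hex` are made up for illustration. It also assumes that Field #1 stores the rating name (e.g. "Unknown") rather than its number, and that the hash is the ordinary zip-style CRC-32 of the citation's UTF-8 bytes, written as 8 hex digits. The documentation quoted above does not spell out either detail.

```python
import zlib
from dataclasses import dataclass, field

# Rating names and values, as listed under Field #1 of the documentation.
RATINGS = {
    "My_Favorite": 5,
    "Very_Good": 4,
    "Good": 3,
    "Not_Good": 2,
    "Deleted": 1,
    "Unknown": 0,
}


@dataclass
class Citation:
    classification: str                         # Field #1
    hash_code: str                              # Field #2
    text: str                                   # Field #3
    pages: list = field(default_factory=list)   # Field 4+


def parse_line(line):
    """Split one TAB-separated line into a Citation (hypothetical helper)."""
    columns = line.rstrip("\r\n").split("\t")
    if len(columns) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(columns)}")
    classification, hash_code, text = columns[:3]
    return Citation(classification, hash_code, text, columns[3:])


def crc32_hex(text):
    """Assumption: standard CRC-32 over UTF-8 bytes, as 8 lowercase hex digits."""
    return format(zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF, "08x")
```

Something like `parse_line(line).pages` would then hand back every page reference after the third column, however many there happen to be, and `RATINGS[citation.classification]` turns a rating name into its number.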

Software Assistance

Our plan is to share a completely new "Doctor Quote" on Github as our time & resources may allow. It is designed to import, export, collect, and classify quotes, and to share the updates between several 'webless 'Quoties by way of the aforementioned WikiQuote_Data.zip file.
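
To give a feel for what "sharing the updates" could look like, here is a small Python 3 sketch that merges two sets of classifications keyed on the citation hash code. The function name `merge_classifications` and the merge policy (my own ratings win; only my "Unknown" entries get filled in from someone else's file) are assumptions for illustration, not how Doctor Quote is known to work.

```python
def merge_classifications(mine, theirs):
    """Merge two {hash_code: classification} dicts (hypothetical policy).

    Anything I have already rated stays as it is; entries that are
    missing or still "Unknown" on my side take the other 'Quotie's rating.
    """
    merged = dict(mine)
    for hash_code, rating in theirs.items():
        if merged.get(hash_code, "Unknown") == "Unknown":
            merged[hash_code] = rating
    return merged
```

Since nothing is ever deleted from the file, a quote rated "Deleted" simply travels along with that label like any other rating.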

Ode To RAD

Rather than C/C++ as used previously, the new version of Doctor Quote is being written in Java. The Java version is mostly a proof of concept, so note that future efforts will use C++17, as well as an appropriate GUI toolkit.

For the sake of completeness, we should probably also note that these data were initially collected using Python 3.

p.s.

IF you have read this far, then you might have what it takes to be one of the FEW ... the INSPIRED ... the 'Quoties?

If you think that you can endure the tedium, the deliberately offensive quotes, and the brainstorming involved while helping a group of 'litheads with such a clean-up effort, then feel free to CLICK HERE to group-up with us!