Someday, everyone in the quiz bowl community will use Linux or OS X, and all questions will be written in an XML format with XSLT stylesheets for print and web output, and we will be living in a perfect world. Seriously, it'd be awesome. Someone needs to come up with QBML.

BuzzerZen wrote:Someday, everyone in the quiz bowl community will use Linux or OS X, and all questions will be written in an XML format with XSLT stylesheets for print and web output, and we will be living in a perfect world. Seriously, it'd be awesome. Someone needs to come up with QBML.

From there it's an easy XML to LaTeX parse. This will happen as soon as everyone realizes that everything looks better in LaTeX.

Someone needs to just do this. I started on something similar once like 3 years ago but then I got busy with a project or some crap and I never finished it.

I'm a little confused. How would this be useful unless people input text in this format? By "get working on this" do you mean "get working on some Word document parser that puts it into this format"?

Well, the idea would be to require people to write questions in the hypothetical QBML. Probably easier for non-packet-submission events. What would need to be "worked on" is a specification and a schema, followed by XSLT stylesheets for turning the QBML documents into plaintext, PDF via LaTeX, XHTML, etc. Nobody needs to parse Word if people use regular text editors and UTF-8. See what standards are good for? We should get a wiki to develop this on. Or use QBWiki or something.

The sad reality of the situation is that people can hardly be bothered to format Word documents correctly. They aren't going to write anything in XML unless someone is holding a gun to their head. I'm interested in something like what Mike is talking about, a parser that will turn a Word document into something pretty; more importantly, something that can be imported directly into a database (trivial to do with a valid XML file).

A searchable database of questions would certainly be interesting. We could start to have inane arguments over a subject's "canon index", i.e. how many times results for it are found in the database. Oh and I'm sure it would also be a great tool for people (:chip:) looking to plagiarize questions from the archive.

And I'd have to agree that you would never, ever be able to get people to format their questions in XML format. But converting from richly formatted text to being able to parse out questions and answers isn't *that* difficult. I've done it on a minor scale for some editing tools I use, but it would have to be a lot more flexible for those people who love to avoid formatting conventions in their questions.

Steroid McBlooddoper wrote:But converting from richly formatted text to being able to parse out questions and answers isn't *that* difficult. I've done it on a minor scale for some editing tools I use, but it would have to be a lot more flexible for those people who love to avoid formatting conventions in their questions.

In principle, it's pretty easy if everyone sticks to the same general format (sections labeled "Tossups" and "Bonuses", questions clearly delineated and answers marked, etc.). In the past, people would send things in with all sorts of crazy crap formatting (I'm looking at you, Ray "monospace fonts" Luo), weird indents, carriage returns all over the place, and bonus parts marked with whatever enumeration method was at hand. I'm working on the new ACF formatting guide so that people will know what's ok and what's not, and hopefully we can get everyone to stick to the ONE TRUE FORMAT. It's gotten better overall recently, but it's still short of where I think we should be.

BuzzerZen wrote:If there's a big enough push, and the markup is simple enough, it could happen.

Dude, no. Take this from someone who currently does scientific programming for a living: in the real world, you're lucky if you can get a small team of actual computer programmers to all use a standard markup language. Trying to place the onus on users is a principle of design doomed to fail.

The facts that it's just easier to type questions into a WYSIWYG editor (it is easier, by the way); that the people writing questions are not by and large scientific researchers or computer scientists or anyone else likely to be familiar/competent with XML; and that, from the user's perspective, there's really no compelling reason not to just use Word (or whatever) like they do for everything else more or less ensure that this idea isn't going to work.

On the other hand, if someone could write an interface to allow Word (say) to output files in this markup so that writers can just keep doing things transparently, you'd be in business. I don't work with interfaces much myself, but would that be very difficult?

ImmaculateDeception wrote:Dude, no. Take this from someone who currently does scientific programming for a living: in the real world, you're lucky if you can get a small team of actual computer programmers to all use a standard markup language. Trying to place the onus on users is a principle of design doomed to fail.

This is the truth. I'm in the middle of debugging some code written by another physicist some time ago, and it's no picnic because everything is done in some idiosyncratic way not consistent with any standards of good programming style.

On the other hand, if someone could write an interface to allow Word (say) to output files in this markup so that writers can just keep doing things transparently, you'd be in business. I don't work with interfaces much myself, but would that be very difficult?

Word already outputs valid HTML and XML files. Unfortunately, it tends to tag every separate fragment with its own style rather than being smart about it, but PHP for example has functions which will strip all HTML/XML except for that which you specify from a string, so in that sense it's not too bad.
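The "strip everything except a few tags" approach can be sketched in a few lines. This is a minimal illustration in Python rather than PHP, using only the standard library; the function name `strip_tags`, the class name, and the default allowlist of `b`, `i`, and `u` are assumptions made for the example, not anything from an existing tool.

```python
from html import escape
from html.parser import HTMLParser


class AllowlistStripper(HTMLParser):
    """Keep text plus a small allowlist of tags; drop every other tag."""

    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self.allowed = set(allowed)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Attributes (Word's per-fragment style noise) are dropped on purpose.
        if tag in self.allowed:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape the text.
        self.out.append(escape(data, quote=False))


def strip_tags(html, allowed=("b", "i", "u")):
    """Return html with every tag removed except those named in allowed."""
    parser = AllowlistStripper(allowed)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling `strip_tags` on a Word-exported paragraph keeps the underlining on the answer and throws away the `p` and `span` wrappers. It does nothing about inconsistent packet layout, which is the harder problem discussed in this thread.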

A few years back I had the idea of creating a program that would randomly pull a question out of a database of old questions for my own use for studying. I was going to use the Stanford Packet Archive as my source of questions, but I quickly found that formats change from tournament to tournament and sometimes from packet to packet within a tournament. This makes any text-to-XML converter relevant only for a given tournament (if you are lucky) and, if you are really lucky, for tournaments from a given school.

For a while I debated doing my master's thesis on automatically tagging quizbowl packets. This is currently a somewhat active area of research, and I saw a presentation about some work being done in this area that allowed academic papers to be tagged so that there wasn't just a scanned PDF, but automatically created metadata for the PDF that could be searched.

I agree that it would be hard to get people to switch over to writing questions in some standardized QBML, but in a way it would actually make things easier on the writer. The questions could be any format (no standard font, tabbing, underlining, etc) as long as the tags are correct. It also wouldn't be that hard to make a QBML editor where the user could pick what type of question they were writing and the empty tags would just show up for them to fill in with options for power marks, alternate answers, unacceptable answers, etc.

If someone wants to make the editor, it might be easier; if not, it's really not easier to have to additionally type a bunch of XML tags, even assuming you get them all right on the first try. Sorry, dude.

ArloLyle wrote:I agree that it would be hard to get people to switch over to writing questions in some standardized QBML, but in a way it would actually make things easier on the writer. The questions could be any format (no standard font, tabbing, underlining, etc) as long as the tags are correct. It also wouldn't be that hard to make a QBML editor where the user could pick what type of question they were writing and the empty tags would just show up for them to fill in with options for power marks, alternate answers, unacceptable answers, etc.

It's much easier to make people format their packets consistently in Word and then have a script that converts to your favorite markup.

Well, if people can be made to format things correctly in Word, you could probably get them to do something similar in Notepad, with underscores instead of underlining, etc. It makes for a much easier parse. In fact, we did this for the 2007 JIAT, with no XML intermediary; just went from text files to LaTeX.

BuzzerZen wrote:Well, if people can be made to format things correctly in Word, you could probably get them to do something similar in Notepad, with underscores instead of underlining, etc. It makes for a much easier parse. In fact, we did this for the 2007 JIAT, with no XML intermediary; just went from text files to LaTeX.

Sorry, this isn't going to happen. Notepad lacks spell checking and the ability to both underline and place italics. It also makes things look really ugly when you have underscores and some other marks to represent bolding and italics all over the place, and you run into the problem of having no real standard on knowing how long your questions are. Yes, you could maybe get people to use, say, Notepad++, but this would require downloading a separate program and by then you might as well just get people to download your own editor that can do this for them.

I still think that it wouldn't be unreasonably hard to create a very flexible editor that would convert all but the most egregiously formatted documents into a quizbowl XML format. It would obviously take some work, especially on bonus parts, but it would be a much better solution than making the users do it.

I'm sorry, y'all, but this is kind of ridiculous. ACF Fall threatened to charge people if they didn't obey formatting guidelines last year and still most teams did not obey the formatting guidelines. And it's not like that's anything but par for the course. Having packets in a uniform format would be nice, but it would basically have to be done after the fact, since not even tournament editors (for one reason or another) can be totally relied on to standardize the formatting within their tournaments.

Kit Cloudkicker wrote:I'm sorry, y'all, but this is kind of ridiculous. ACF Fall threatened to charge people if they didn't obey formatting guidelines last year and still most teams did not obey the formatting guidelines. And it's not like that's anything but par for the course. Having packets in a uniform format would be nice, but it would basically have to be done after the fact, since not even tournament editors (for one reason or another) can be totally relied on to standardize the formatting within their tournaments.

Well, I envisioned this as something more to create a searchable database of tossup and bonus answers from the Stanford Archive (and to even do cool things that would require some manual input, like "give me all the science tossups at this tournament"). It would still be useful in editing a tournament given that it's pain free, but that's probably lower on the priority list.

By the way, maybe a mod should move at least some of the posts in this thread to College Discussion. Somehow AHAN Jr. has produced a serious thread.

Kit Cloudkicker wrote:I'm sorry, y'all, but this is kind of ridiculous. ACF Fall threatened to charge people if they didn't obey formatting guidelines last year and still most teams did not obey the formatting guidelines. And it's not like that's anything but par for the course. Having packets in a uniform format would be nice, but it would basically have to be done after the fact, since not even tournament editors (for one reason or another) can be totally relied on to standardize the formatting within their tournaments.

The thing is, though, if you could make this language work, it would automatically make the formatting uniform for the editor regardless of how the writer chooses to format things; that's the point of it. It's a good idea, but it's not one that would take if we're just saying "Hey, writers, start using a plaintext editor and double your typing by adding all these XML tags and be sure to get them all right." On the other hand, it could work very well if we can come up with an editor or output method for Word or whatever so that it's at least as easy to use as what people currently do. I'd be interested in seeing that happen and helping to work on that if anyone has a good idea how.

As far as I know, it did happen with JIAT 2007 (in January). Evan and I made a LaTeX class and a script to convert specially formatted plain-text (with underscores for underlining and asterisks for italics) to LaTeX source, and that is what was used for this year's JIAT.
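For anyone curious what such a conversion involves, here is a rough sketch in Python. It is not the JIAT script, which isn't shown in this thread; the function name `to_latex` and the choice of `\underline` and `\textit` as output commands are assumptions for illustration. It only handles markup that opens and closes on the same line.

```python
import re

# LaTeX characters that must be escaped before markup is converted.
# The underscore is handled separately because it doubles as markup.
_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def to_latex(text):
    """Convert _underlined_ and *italic* plain text to LaTeX source."""
    # 1. Escape special characters in one pass so nothing is escaped twice.
    text = re.sub(r"[\\&%$#{}~^]", lambda m: _SPECIALS[m.group()], text)
    # 2. Paired underscores become underlining, paired asterisks italics.
    text = re.sub(r"_([^_\n]+)_", lambda m: "\\underline{" + m.group(1) + "}", text)
    text = re.sub(r"\*([^*\n]+)\*", lambda m: "\\textit{" + m.group(1) + "}", text)
    # 3. Any underscore left over was not markup, so escape it.
    return text.replace("_", r"\_")
```

Escaping before converting matters: doing it the other way round would mangle the braces that the conversion itself inserts.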

You know, I just perpetually fail to understand the issue of formatting and why people care so much about it. I could care less what parts of a document you choose to underline or bold or indent or whatever. If you actually write a good packet, I mean really actually write a solid packet, I'll accept it in Elephant Moon Rebus format and instead of penalizing you, I'll give you a gold star and piss myself in joy.

In other news, it would be nice to have a fully searchable database of packets. But, surely there are simpler ways of going about that, no? All this talk about random question generators and identifying tags and whatever else all this is just seems like pointless chatter to me.

Well, there are those of us who would rather spend hours programming tools to automate things than simply do them over and over. And there are others of us who are nerds for beautifully-formatted documents and classy typography. When they intersect, you get this discussion.

Ryan Westbrook wrote:In other news, it would be nice to have a fully searchable database of packets. But, surely there are simpler ways of going about that, no?

You might be surprised...

Anyway, I don't really give a damn about "beautiful formatting" or whatever. I think implementing something like this is a good idea because, if it worked right, it would make editors' and writers' jobs easier, allowing them to spend much less time worrying about formatting and more time doing what's important (i.e. making good questions). The fact that it would be easier to make a searchable database, etc. with packets in this language is also an important benefit.

ImmaculateDeception wrote:If someone wants to make the editor, it might be easier; if not, it's really not easier to have to additionally type a bunch of XML tags, even assuming you get them all right on the first try. Sorry, dude.

I don't know, maybe I'm thinking about this from my point of view instead of that of other people who would be resistant to this type of thing. At this point, when I'm creating some document that needs to be formatted a certain way I'd rather use LaTeX. I'm sure I'm in a very small minority.

grapesmoker wrote:It's much easier to make people format their packets consistently in Word and then have a script that converts to your favorite markup.

Agreed.

I suppose the point I was trying to make was that anytime you go away from plain text (without that many whitespace restrictions, for that matter) the possibility for inconsistent formatting increases.

ArloLyle wrote:I don't know, maybe I'm thinking about this from my point of view instead of that of other people who would be resistant to this type of thing. At this point, when I'm creating some document that needs to be formatted a certain way I'd rather use LaTeX. I'm sure I'm in a very small minority.

It's really not a matter of, like, the uninformed being unreasonably "resistant." The fact is that it requires a lot less typing to set a properly formatted packet in a word processor than it does to do the same writing plain text for TeX (with whatever macros) or than it would to do the same in a markup language. Therefore, even if people were equally proficient in all those methods, the word processor is the reasonable choice if your objective is to get a properly formatted packet as quickly as possible (and, given how late most packets are, it had damn well better be). You can prefer LaTeX all day long; it's still a white elephant for this purpose.

Again, the point is that, if we want to try and change something, we'd better be able to make what we want to do either work with what people are already doing or be at least as easy as that. Otherwise, why would anyone use it?

OK, thanks to the power of the Internet, there's a QBML wiki now. Anyone else interested in fleshing out this idea, feel free to go muck around on it. While ACF-style is, as Jerry said, trivial, in the high school realm there's other formats that it would be nice to be able to support, mainly PACE.

One stumbling block I forgot to mention is people's use of special characters, mainly those generated by Word when it completes "smart quotes" or some such thing. I heartily encourage all Word users to turn these features off.

In Word 97-2003, it's under Tools > AutoCorrect Options. In Word 2007 it's under Options > Proofing > AutoCorrect Options. Either place, you can turn off unhelpful features like "smart quotes" and "capitalize things that you think are the beginnings of sentences even when they're not" and "make numbered lists when I don't want them" and so forth. Good stuff.

I've been using this for TRASH since I got sick of Word, around 1997. (I was sick of Word a long time before that, but I got reeeally sick of Word when I had to assemble twenty packets out of individual questions emailed to me. I'm sure I'm not alone in that pain.) It's been rewritten about three times since then. It's pretty mature code now.

The tools take text in what I call the Simple Format (documentation forthcoming, or see below) and convert them into an internal representation. From there I convert to LaTeX or HTML, or an XML dialect I wrote as an exercise.

The whole thing's written in Perl -- it was the language in which I was most fluent in 1997 -- but it's pretty well-behaved Perl, all things considered. CS types will be disappointed at the ad-hoc nature of the parser, but it's pretty clean otherwise.

I rejected XML as a question format because it is human-readable (and more importantly, human-writable) only with difficulty. Instead, I came up with Simple, based on the following principles:

1. Plain text format. Easy to sling around in mail, IRC, etc.

2. Easy to read and write. It's about as hard to write an email as it is to write a well-formatted question. Similarly, not only can you sling them around in email, you can print them out with no formatting whatever and read them normally. (I figured this would be a big benefit for someone printing questions in a hurry, without access to the conversion tools, due to some day-of-tournament crisis.)

Here are some sample questions in the format, from 2005 TRASH Regionals:

TU: Released in 2003, in this game you play a nameless New Jersey-ite who receives help from store owner Stacy Peralta, pro Chad Muska, and childhood friend Eric Sparrow. A jealous Eric cheats, framing you for crashing a tank in Moscow. It is the first game in the series to make use of extensive cut-scenes, including one in which you jump off of a building and perform a McTwist over a helicopter. For ten points, name this title in which you progress from a local amateur to a sponsored professional skateboarder.
ANS: _Tony Hawk Underground_ (prompt on "THUG")

==Oh, yeah, you do comments like this

=: Multi-line comments work like this. Handy for commenting out whole questions at a time.:=

TU: Books published with his name following his death include four titles in the Covert One series, "co-written" by both Gayle Lynds and Patrick Larkin. Born in 1927, his first calling was in theater, where he operated the Playhouse on the Mall in Paramus, New Jersey. His first book, *The Scarlatti Inheritance*, was released in 1971, while *The Osterman Weekend* was released one year later. Several of his books have been made into films, including *Osterman*, *The Apocalypse Watch* and a pair of movies starring Matt Damon as a secret agent with amnesia. FTP name this author responsible for *The Bourne Identity* and its sequels.
ANS: Robert _Ludlum_

B: Nothing on earth is more important than a clean toilet. Identify the following toilet cleaners FTPE.
PART: This cleaner with a bird-like application nozzle comes in several varieties including Liquid Rimblock and Ultra Gel and is manufactured by SC Johnson, A Family Company.
ANS: _Toilet Duck_
PART: Advertisements for this line of bathroom cleaners have featured animated versions of the product attacking toilet filth to the accompaniment of Ride of the Valkyries.
ANS: _Scrubbing Bubbles_
PART: Chappelle's Show aired a spoof commercial for Potty Fresh, a toilet cleaner featuring this rapper famed for such songs as "Rockafella" and "Smash Sumthin" riding a jet ski in a woman's toilet.
ANS: _Redman_ (also accept Reggie _Noble_)

B: When you think of country, you think of New Jersey. Name these musicians on a 15-5 basis.
PART: [15] Born while his father was temporarily working in New Jersey, he had a country #1 with his debut single in 1989.
PART: [5] In addition to 1989's "A Better Man", other #1's include 1999's "When I Said I Do", a duet with his wife Lisa Hartman.
ANS: Clint _Black_
PART: [15] Born in Princeton, New Jersey, she traveled the Washington D.C. coffeehouse circuit before releasing her first album, *Hometown Girl*.
PART: [5] This "Down at the Twist and Shout" and "Shut Up and Kiss Me" singer attended not Princeton, but Brown University.
ANS: Mary Chapin _Carpenter_

Using those questions as examples, you can pretty much write any question you care to. Incidentally, they translate to the following XML (truncated):

(The XML doesn't parse the underscores and asterisks, but in LaTeX and HTML they come out as the appropriate formatting for underlines/bold and italics, respectively.)

The only non-obvious thing here is the cycle tag. Conceptually in the code, each question is composed of one or more cycles, each of which contains one or more parts and exactly one answer. I was able to do pretty much anything I wanted to with question formatting using that abstraction.
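To make the cycle abstraction concrete, here is a small Python sketch of a data model and a line-based reader for the sample questions above. It is not the Perl code described in this post. The class names, the `parse_simple` function, and the separate `leadin` field for bonus introductions are all assumptions made for illustration; the actual tools may store a bonus lead-in differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cycle:
    parts: List[str] = field(default_factory=list)   # one or more parts
    answer: Optional[str] = None                     # exactly one answer


@dataclass
class Question:
    kind: str                                        # "tossup" or "bonus"
    leadin: str = ""                                 # assumed: bonus intro text
    cycles: List[Cycle] = field(default_factory=list)


def parse_simple(text):
    """Parse TU:/B:/PART:/ANS: lines into Question objects."""
    questions, question, cycle = [], None, None
    target = None          # what a wrapped continuation line extends
    in_comment = False     # inside a "=: ... :=" block

    for raw in text.splitlines():
        line = raw.strip()
        if in_comment or line.startswith("=:"):
            in_comment = not line.endswith(":=")
            continue
        if not line or line.startswith("=="):
            continue                                  # blank or one-line comment
        marker, _, rest = line.partition(":")
        rest = rest.strip()
        if marker == "TU":
            cycle = Cycle(parts=[rest])               # a tossup is a single cycle
            question = Question("tossup", cycles=[cycle])
            questions.append(question)
            target = "part"
        elif marker == "B":
            question, cycle = Question("bonus", leadin=rest), None
            questions.append(question)
            target = "leadin"
        elif marker == "PART" and question is not None:
            if cycle is None or cycle.answer is not None:
                cycle = Cycle()                       # previous cycle is closed
                question.cycles.append(cycle)
            cycle.parts.append(rest)
            target = "part"
        elif marker == "ANS" and cycle is not None:
            cycle.answer = rest                       # the answer closes the cycle
            target = "answer"
        elif target == "leadin":
            question.leadin += " " + line
        elif target == "part":
            cycle.parts[-1] += " " + line
        elif target == "answer":
            cycle.answer += " " + line
    return questions
```

Under this reading, a 15-5 bonus comes out as two cycles with two parts each, and an ordinary three-part bonus as three cycles with one part each, which is how a single abstraction covers both layouts.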

A few of the things I think could be improved:

1. Normalize the answers more. You can infer what the necessary part is from the underscores, but a more formal notion of a moderator's note would be nice.

2. Metadata. I wrote a basic question randomizer for one of the tournaments, and it did a wretched job -- it turns out that "sports question, sports question" turns up occasionally in a random shuffle, but no one wants that at a tournament. I was thinking of using encoded comments to store subject, estimate of difficulty level, and perhaps other information. This could be used to make a smarter shuffle that knew to separate questions of differing subjects. It could also be used to assemble packets based on difficulty level, author, date written, etc. The main reason I didn't do this was that I didn't want to complicate the format.

3. Tool support. I wrote a (very, very) basic web-based GUI for submitting questions. It has met TRASH's needs admirably, but I've seen things lying by the side of the road that were prettier. I suspect a port to Java or Python would make thick clients easier.

Edit: 4. Add power marks.
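The smarter shuffle mentioned in item 2 could look something like the following Python sketch. It assumes each question already carries a subject label, which the Simple Format does not currently store; the function name `spaced_shuffle` and the `subject_of` callback are hypothetical. The approach is a plain greedy one: always draw from the subject with the most questions left, excluding the subject that was just used.

```python
import random
from collections import defaultdict


def spaced_shuffle(questions, subject_of, rng=None):
    """Randomly order questions while avoiding back-to-back subjects."""
    rng = rng or random.Random()
    buckets = defaultdict(list)
    for q in questions:
        buckets[subject_of(q)].append(q)
    for items in buckets.values():
        rng.shuffle(items)                 # random order within each subject

    order, last = [], None
    while buckets:
        # Prefer any subject other than the one just used; fall back to
        # repeating only when nothing else is left.
        choices = [s for s in buckets if s != last] or list(buckets)
        # Among those, take a subject with the most questions remaining so
        # that no subject piles up at the end of the packet.
        most = max(len(buckets[s]) for s in choices)
        subject = rng.choice([s for s in choices if len(buckets[s]) == most])
        order.append(buckets[subject].pop())
        if not buckets[subject]:
            del buckets[subject]
        last = subject
    return order
```

If one subject makes up more than half the packet, adjacent repeats are unavoidable and the function simply places them where it must. Difficulty or author could be balanced the same way by changing what `subject_of` returns.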

The code is GPL (v2). Feel free to download and do with it what you will, subject to the terms of the license. Contact me directly if you have any questions (pgroce@gmail.com) -- I don't read this board very often.

Last edited by pgroce on Thu Jul 19, 2007 1:18 am, edited 1 time in total.

In Word 97-2003, it's under Tools > AutoCorrect Options. In Word 2007 it's under Options > Proofing > AutoCorrect Options. Either place, you can turn off unhelpful features like "smart quotes" and "capitalize things that you think are the beginnings of sentences even when they're not" and "make numbered lists when I don't want them" and so forth. Good stuff.

Hate, hate, hate smart quotes.

The code I just posted converts Smart Quotes to regular quotes. It also parses em-dashes and various punctuation and non-English characters into whatever's appropriate to the output format. (HTML entities, LaTeX commands, etc.)
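For readers who want the gist without reading the Perl, the normalization step amounts to a character translation table chosen per output format. The Python sketch below is an illustration only; the function name `normalize_punctuation` and the particular replacements are assumptions, not a description of what the posted code emits.

```python
# Curly quotes become straight quotes for every output format.
QUOTES = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}

# Dashes and ellipses depend on where the text is going.
PER_FORMAT = {
    "text": {"\u2014": "--", "\u2013": "-", "\u2026": "..."},
    "html": {"\u2014": "&mdash;", "\u2013": "&ndash;", "\u2026": "&hellip;"},
    "latex": {"\u2014": "---", "\u2013": "--", "\u2026": r"\ldots{}"},
}


def normalize_punctuation(text, target="text"):
    """Replace Word-style typographic characters for the given target."""
    table = str.maketrans({**QUOTES, **PER_FORMAT[target]})
    return text.translate(table)
```

Running this before any other parsing means the rest of the pipeline only ever sees plain ASCII punctuation, whatever the writer's AutoCorrect settings were.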