User:Phydeaux

I'd like to make a template for quartetters so that names are linked automatically and pages for the singers can be auto-populated. That way, if I look up a quartet page with a bass I particularly like, I can click on their name to get to a page that shows all the quartets that bass sings in.

I'm not sure how to build the template yet, but I have parsed out the singers as best I could, using the wiki project's download function on all pages categorized as "Quartets". If anyone is interested in the Python script or in the table below in XML format, let me know.
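The parsing step described above might look roughly like the sketch below. It is not the script mentioned here. It assumes the download is a MediaWiki-style XML export (one `<page>` per wiki page, each with a `<title>` and a `<revision><text>`), and that singers appear on lines such as `Tenor: John Smith` or `'''Bass''' - Bob Jones`. The `PART_LINE` pattern and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for lines like "* '''Lead''': Steve Brown" or "Bass - Bob Jones".
# Group 1 is the part name, group 2 is the rest of the line (the singer).
PART_LINE = re.compile(
    r"^[*#:' \t]*(tenor|lead|baritone|bari|bass)[' \t]*[:\-][ \t]*(.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def local_name(tag):
    """Drop the '{namespace}' prefix that export files put on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(xml_text):
    """Yield (title, wikitext) for each <page> element.

    Matching on local tag names avoids hard-coding one export schema version.
    If a page has several revisions, the last <text> seen wins.
    """
    root = ET.fromstring(xml_text)
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        title, text = None, ""
        for child in page.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text


def singers_by_part(wikitext):
    """Return (part, name) pairs found on 'Part: Name' style lines."""
    return [(m.group(1).capitalize(), m.group(2)) for m in PART_LINE.finditer(wikitext)]
```

Lines that don't follow a "part, separator, name" shape are simply skipped, so anything written as running prose would still need manual attention.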

Progress

Code

Current Table Notes

This isn't scrubbed beyond a couple of checks I did while scrolling through it. I am improving the algorithm as I go, and I hope to have it fairly clean by the time the champion list pages are up to date; then I can just hit the button and generate all the remaining quartet, singer, and disambiguation pages.

There may be duplicate or inaccurate entries:

Name suffixes are not distinguishable from names ("Jr." and "Sr." end up as new names if a comma precedes them). This has been fixed for Jr. and Sr.

Maiden names are not distinguishable from longer sentence fragments, so some sentences end up recorded as names if they start with a part name like "Lead" or "Tenor".

Obvious misspellings are not automatically corrected (A "Johnston" and a "Johnson" singing for the same quartet but listed on different pages show up as two unique singers)

Common spelling variations cause singers to be counted more than once ("Steve" and "Steven" on two different pages cause two singers to be recorded)
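The suffix, sentence-fragment, and nickname issues in this list could be handled by small cleanup helpers like the ones below. This is a sketch under assumptions: the suffix set, the five-word limit, the surname particles, and the `NICKNAMES` table are illustrative choices, not rules taken from the current script. Misspellings such as Johnston/Johnson are not caught here; they need a fuzzy comparison.

```python
import re

# Illustrative values only; adjust to what the real data contains.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}
PART_WORDS = {"tenor", "lead", "baritone", "bari", "bass"}
PARTICLES = {"van", "von", "de", "la", "der", "del", "nee", "née"}
# Hypothetical nickname table mapping short forms to one canonical first name.
NICKNAMES = {"steve": "steven", "bob": "robert", "bill": "william", "jim": "james"}


def split_names(text):
    """Split 'A, B, Jr., C' on commas, re-attaching a suffix to the name before it."""
    names = []
    for piece in (p.strip() for p in text.split(",")):
        if not piece:
            continue
        if names and piece.lower().rstrip(".") in SUFFIXES:
            names[-1] = f"{names[-1]}, {piece}"
        else:
            names.append(piece)
    return names


def looks_like_name(candidate, max_words=5):
    """Heuristic: reject long fragments and ones that start with a part name.

    Every word must be capitalised unless it is a common surname particle,
    which also rejects fragments containing years or ordinary lowercase words.
    """
    words = [w.strip("()\"'.,") for w in candidate.split()]
    words = [w for w in words if w]
    if not words or len(words) > max_words:
        return False
    if words[0].lower() in PART_WORDS:
        return False
    return all(w[0].isupper() or w.lower() in PARTICLES for w in words)


def canonical_key(name):
    """Grouping key: lowercase, punctuation removed, first-name nickname expanded.

    Suffixes are kept so that a father (Sr.) and son (Jr.) stay separate.
    """
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    if words:
        words[0] = NICKNAMES.get(words[0], words[0])
    return " ".join(words)
```

Names sharing a `canonical_key` are better treated as candidates to review than merged automatically, since a nickname table can join two different people.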

There may be missing entries (for example, if a quartet page was not categorized as "Quartets", its information would not show up here).

If no quartet singers were listed on the champion list page, the championship is not attributed to the quartet.

There are duplicate quartet names out there, so I didn't want to attach championships to the quartet name, but to the individual singers instead.

Many of the championship lists do not have singers; my goal is to fill those out as best I can before creating the quartet and singer stub pages.

After going through several of the lists and simply being unable to find TLBB lists, I think I will have to allow quartets to get championships, not just singers.

This will require some sort of QA; I'm not sure how that will work yet. Maybe flag a quartet that wins championships in more than one district/region, or keep a list of the quartets that have won the most competitions.
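Both QA ideas above can be expressed as simple aggregations once championships are credited to quartet names. The `Win` record and the function names below are hypothetical, a sketch of one way to produce the two review lists rather than a description of the existing table.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Win:
    """One championship credited to a quartet name (hypothetical record shape)."""
    quartet: str
    district: str  # district or region where the contest was held
    year: int
    contest: str


def flag_multi_district(wins):
    """Map each quartet name that won in 2+ districts to its sorted districts.

    Quartet names are reused, so such a name may stand for several different
    quartets and is worth a manual check.
    """
    districts = defaultdict(set)
    for w in wins:
        districts[w.quartet].add(w.district)
    return {q: sorted(ds) for q, ds in districts.items() if len(ds) > 1}


def most_decorated(wins, n=10):
    """Quartet names ranked by number of distinct wins (duplicate rows ignored)."""
    counts = Counter(w.quartet for w in set(wins))
    return counts.most_common(n)
```

Deduplicating with `set(wins)` keeps a championship that appears on two list pages from being counted twice; it relies on `Win` being frozen and therefore hashable.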

Next Steps

I plan to parse the champions lists and use the TLBB names from there and from this table to make stub pages for each champion singer, listing the years and competitions they've won and the quartets they've sung with. I added the "List of Champions" category to all the champions list pages, so I will download and parse them next and add them to the table below.
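A singer stub page could be generated from parsed champion-list rows along these lines. The record shape `(year, competition, quartet, part)`, the opening sentence, and the section layout are assumptions for illustration; the output is plain wikitext with `[[...]]` links to the quartet pages.

```python
from collections import defaultdict


def singer_stub(name, records):
    """Return wikitext for a singer stub page.

    `records` holds (year, competition, quartet, part) tuples, an assumed
    shape for rows parsed from the champion lists.
    """
    lines = [f"'''{name}''' has won the following championships.", "", "== Championships =="]
    parts_by_quartet = defaultdict(set)
    # Sorting the tuples lists championships in chronological order.
    for year, competition, quartet, part in sorted(records):
        lines.append(f"* {year}: {competition}, with [[{quartet}]] ({part})")
        parts_by_quartet[quartet].add(part)
    lines += ["", "== Quartets =="]
    for quartet in sorted(parts_by_quartet):
        parts = ", ".join(sorted(parts_by_quartet[quartet]))
        lines.append(f"* [[{quartet}]] ({parts})")
    return "\n".join(lines) + "\n"
```

Grouping parts per quartet means a singer who switched parts within one quartet gets a single entry such as `(Bass, Lead)`.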

Fill in more of the TLBB lists for district champions

Make the script indicate the level of completion for each championship list:

Total number of years listed (indicate number without TLBB, and number without Chapter)

Total span of years (indicate number of missing years)
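The completion measures listed above reduce to a few counts over one list's rows. In this sketch the `ChampionRow` fields are hypothetical; "without TLBB" is taken to mean no singer names at all, and "without Chapter" a blank chapter field.

```python
from dataclasses import dataclass, field


@dataclass
class ChampionRow:
    """One row of a championship list (hypothetical shape)."""
    year: int
    quartet: str
    singers: list = field(default_factory=list)  # TLBB names, possibly empty
    chapter: str = ""                             # home chapter, possibly blank


def completion_report(rows):
    """Summarise how complete one championship list is."""
    years = sorted({r.year for r in rows})
    if not years:
        return {"years_listed": 0, "without_tlbb": 0, "without_chapter": 0,
                "span": None, "missing_years": []}
    present = set(years)
    return {
        "years_listed": len(years),
        "without_tlbb": sum(1 for r in rows if not r.singers),
        "without_chapter": sum(1 for r in rows if not r.chapter),
        "span": (years[0], years[-1]),
        # Gaps between the first and last year that have no entry at all.
        "missing_years": [y for y in range(years[0], years[-1] + 1) if y not in present],
    }
```

Running this over every page in the "List of Champions" category would give a quick ranking of which lists most need TLBB names filled in.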

Parse directors' names and start tracking choruses

Include directors' awards in lists of most-awarded singers

Use fuzzy searches to identify when two people are likely the same person

Produce list of most likely candidates to be merged

Decide whether or not to parse and include quartets that placed 2nd or lower (TLBB names are almost nonexistent for these, and their formatting in the contest pages is less consistent)
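For the fuzzy-search and merge-candidate items, Python's standard `difflib.SequenceMatcher` is one simple option; a dedicated fuzzy-matching library could be swapped in. The 0.85 threshold and the function name are assumptions to tune against the real table.

```python
from difflib import SequenceMatcher
from itertools import combinations


def merge_candidates(names, threshold=0.85):
    """Return (score, a, b) for name pairs similar enough to review, best first.

    Compares every pair of distinct names, which is O(n^2); fine for a few
    thousand singers, but larger tables would need blocking (for example,
    only comparing singers listed under the same quartet).
    """
    unique = sorted(set(names))
    pairs = []
    for a, b in combinations(unique, 2):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            pairs.append((round(score, 3), a, b))
    pairs.sort(key=lambda p: (-p[0], p[1], p[2]))
    return pairs
```

Restricting comparisons to singers of the same quartet would target the "Johnston"/"Johnson" case directly and cut down on false matches between unrelated people with similar names.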