5
Global Origins of WorldCat Materials
Content languages: 478 (49% of WorldCat is non-English)
    Top 5 non-English: German: 12 million; French: 6.1 million; Spanish: 3.5 million; Dutch: 2.6 million; Japanese: 2.4 million
Materials with non-US origins: 57.9 million (55%)
    Top 5: Germany: 10.0 million; UK: 8.8 million; France: 4.2 million; Netherlands: 2.9 million; Canada: 2.9 million
Non-English metadata language: 28 million (66 languages)
    Top 5: German: 11 million; Dutch: 5.0 million; Swedish: 1.9 million; French: 1.8 million; Finnish: 0.7 million

10
OCLC WorldMap™: Objectives
Geographically represent WorldCat data
    Titles published in each country
    Holdings for titles published in each country
    Languages represented for titles published in each country
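
The three per-country measures listed above can be sketched as a simple aggregation. This is a minimal illustration only, not the WorldMap implementation: the record layout (`country`, `language`, `holdings`) and the sample values are hypothetical and are not drawn from WorldCat.

```python
from collections import defaultdict

def summarize_by_country(records):
    """Count titles, total holdings, and distinct languages per country of publication."""
    summary = defaultdict(lambda: {"titles": 0, "holdings": 0, "languages": set()})
    for rec in records:
        entry = summary[rec["country"]]
        entry["titles"] += 1                    # one record = one title
        entry["holdings"] += rec["holdings"]    # libraries holding that title
        entry["languages"].add(rec["language"])
    return dict(summary)

# Hypothetical sample records (illustrative values only).
sample_records = [
    {"country": "DE", "language": "ger", "holdings": 12},
    {"country": "DE", "language": "eng", "holdings": 3},
    {"country": "NL", "language": "dut", "holdings": 5},
]

country_summary = summarize_by_country(sample_records)
```

Each country's entry then carries the values a map display would need: a title count, a holdings total, and the set of languages represented.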

12
OCLC WorldMap™: Objectives
Research prototype
    Support OCLC data mining research
    Visually display data for review and analysis
Internal use
    Sales and marketing
External use
    Library collection assessment and comparison
    Data may be processed AT A GLANCE
    Complement the AAU/ARL Global Resources Network project
    Project of the Council on Library and Information Resources (CLIR)

36
Publisher Name Server: Research Objectives
Resolve, for data mining and quality of WorldCat:
    ISBN prefixes to publisher name
    Variant publisher names to a preferred form
Complement Collection Analysis Service
    Librarians
    Publishers
Capture and profile attributes of individual publishers
    Location(s)
    Language(s) of materials published
    Genre(s)/format(s)
    Dominant subject domain(s)
    Parent company and subsidiaries

37
Publisher Name Server: Methodology
Programmatically cluster publisher records using ISBN prefixes
Data clustering (The Free Dictionary)
    "The science of extracting useful information from large data sets or databases"
    Classification of similar objects into different groups
    Partitioning of a data set into subsets (clusters)
    Data in each subset (ideally) share some common trait
Hand parse the entities and resolve ISBN prefixes
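
The clustering step above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Publisher Name Server's actual method: it assumes ISBNs arrive already hyphenated (so the group and registrant elements can be read off directly), it picks the most frequent name in each cluster as the "preferred form", and the helper names, publisher names, and ISBNs are hypothetical.

```python
from collections import Counter, defaultdict

def isbn_publisher_prefix(isbn):
    """Return the publisher prefix of a hyphenated ISBN, or None if it cannot be parsed.

    Assumption: the ISBN is hyphenated into its elements. Unhyphenated ISBNs
    would need the official range tables, which this sketch does not include.
    """
    parts = isbn.strip().split("-")
    if len(parts) == 5:
        if parts[0] == "978":
            parts = parts[1:]           # 978 ISBN-13s share the ISBN-10 prefix space
        else:
            return "-".join(parts[:3])  # keep the EAN prefix for other ranges
    if len(parts) != 4:
        return None
    return f"{parts[0]}-{parts[1]}"     # group element + registrant element

def cluster_by_prefix(records):
    """Group (isbn, publisher_name) pairs into clusters keyed by ISBN prefix."""
    clusters = defaultdict(Counter)
    for isbn, name in records:
        prefix = isbn_publisher_prefix(isbn)
        if prefix is not None:
            clusters[prefix][" ".join(name.split())] += 1  # collapse stray whitespace
    return clusters

def preferred_names(clusters):
    """Choose the most frequent name variant in each cluster (a simplifying assumption)."""
    return {prefix: names.most_common(1)[0][0] for prefix, names in clusters.items()}

# Hypothetical records; check digits are not validated here.
sample = [
    ("0-306-40615-2", "Example Press"),
    ("978-0-306-40615-7", "Example Pr."),
    ("0-306-41234-9", "Example  Press"),
    ("3-16-148410-0", "Sample Verlag"),
]

resolved = preferred_names(cluster_by_prefix(sample))
```

Clusters whose name variants disagree badly are the natural candidates for the hand-parsing step named on the slide; frequency alone will not separate imprints, mergers, or reassigned prefixes.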