Web content analysis often has two sequential and separate steps: Web Classification to identify the target Web pages and Web Information Extraction to extract the metadata contained in the target Web pages. This decoupled strategy is highly ineffective since the errors in Web classification will be propagated to Web information extraction and eventually accumulate to a high level. In this paper we study the mutual dependencies between these two steps and propose to combine them by using a model of Conditional Random Fields (CRFs). This model can be used to simultaneously recognize the target Web pages and extract the corresponding metadata. Systematic experiments in our project OfCourse for online course search show that this model significantly improves the F1 value for both of the two steps. We believe that our model can be easily generalized to many Web applications.

The revised "TEI Text Encoding in Libraries: Guidelines for Best Encoding Practices," currently in draft form, contain updated versions of the widely adopted encoding 'levels' - from fully automated conversion to content analysis and scholarly encoding. They also contain a substantially revised section on the TEI Header, designed to support interoperability between text collections and the use of complementary metadata schemas such as MARC and METS. The new Guidelines also reflect an organizational shift. Originally authored by the DLF-sponsored TEI Task Force, the current revision work is a partnership between members of the Task Force and the TEI Libraries SIG. As a result of this partnership, responsibility for the Guidelines will migrate to the SIG, allowing closer work with the TEI Consortium as a whole and a stronger basis for advocating for the needs of libraries in future TEI releases.

This new report summarizes the findings of research conducted by OCLC on what constitutes quality in library online catalogs from both end users’ and librarians’ points of view.

Key findings:

The end user’s experience of the delivery of wanted items is as important as, if not more important than, his or her discovery experience.

End users rely on and expect enhanced content including summaries/abstracts and tables of contents.

An advanced search option (supporting fielded searching) and facets help end users refine searches, navigate, browse and manage large result sets.

Important differences exist between the catalog data quality priorities of end users and those who work in libraries.

Librarians and library staff, like end users, approach catalogs and catalog data purposefully. End users generally want to find and obtain needed information; librarians and library staff generally have work responsibilities to carry out. The work roles of librarians and staff influence their data quality preferences.

Librarians’ choice of data quality enhancements reflects their understanding of the importance of accurate, structured data in the catalog.

The codes listed below have been recently approved for use in MARC 21 records. The codes will be added to MARC Code Lists for Relators, Sources, Description Conventions.

The codes should not be used in exchange records until after June 16, 2009. This 60-day waiting period is required to provide MARC 21 implementers time to include newly-defined codes in any validation tables they may apply to the MARC fields where the codes are used.

Classification Code Sources

The following code is for use in subfield $2 in field 084 in Bibliographic and Community Information records (Other Classification Number), in subfield $2 in field 084 in Classification records (Classification Scheme and Edition) and in subfield $2 in field 065 in Authority records (Other Classification Number).

Addition:

rilm

RILM classification system [use only after June 16, 2009]
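The announcement mentions validation tables that implementers may apply to the fields where these codes are used. A minimal sketch of such a table follows, assuming a simple in-memory lookup. The names RILM_SLOTS, USABLE_AFTER, and is_valid_source_code, and the record-type labels, are hypothetical and not part of MARC 21 or any particular library system. Only the field/subfield positions and the June 16, 2009 date come from the announcement.

```python
from datetime import date

# Hypothetical validation table: the (record type, field, subfield) slots
# in which the announcement says the classification source code may appear.
RILM_SLOTS = {
    ("bibliographic", "084", "2"),
    ("community", "084", "2"),
    ("classification", "084", "2"),
    ("authority", "065", "2"),
}

# Each code maps to the date after which it may appear in exchange records.
USABLE_AFTER = {"rilm": date(2009, 6, 16)}


def is_valid_source_code(code, record_type, field, subfield, on_date):
    """Return True if `code` may be used in the given slot on `on_date`."""
    usable_after = USABLE_AFTER.get(code)
    if usable_after is None:
        return False  # code is not known to this toy table
    if (record_type, field, subfield) not in RILM_SLOTS:
        return False  # code is not defined for this field/subfield
    # "until after June 16, 2009" is read here as strictly later than that date
    return on_date > usable_after
```

A check such as is_valid_source_code("rilm", "authority", "065", "2", date(2009, 7, 1)) would then pass, while the same code in an exchange record dated on or before June 16, 2009 would be rejected.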

Term, Name, Title Sources

The following codes are for use in subfield $2 in appropriate 6XX fields (Subject Added Entries/Index Terms) in Bibliographic and Community Information records; subfield $2 in fields 700-754 (Index Terms) in Classification records; subfield $2 in fields 700-788 (Heading Linking Entries) in Authority records; and subfield $f in field 040 (Cataloging Source) in Authority records.

OCLC is making WorldCat Local available for free to FirstSearch customers. Currently it is an OPAC replacement; however, plans are to add circulation, ERM, and other ILS functions. One effect of this would be to have the circulation records for dozens, hundreds, or thousands of libraries in a central location. What could be done with that data? Who would own it?

Libraries that subscribe to FirstSearch WorldCat will get the WorldCat Local "quick start" service as part of their subscription at no additional charge. WorldCat Local "quick start" offers libraries a locally branded catalog interface and simple search box that presents localized search results for print and electronic content along with the ability to search the entire WorldCat database and other resources via the Web.

Seventeen new class numbers were added to the main schedules and four new class numbers were added to the 19th Century schedule. One class number was canceled. Eighty-one MeSH terms were added to the index, including thirty-four new to the MeSH vocabulary as of 2009; in addition, seventy-one schedule records and five hundred and eighty index entries were updated since the 2008 edition was published on April 24, 2008.