
Assistant Professor Mohammad Aljayyousi of Philadelphia University Amman, Department of English Language and Literature, is a visiting scholar at the CCeH from October 2017 to March 2018. His stay is funded by the DAAD. During his time at Cologne University, Dr. Aljayyousi will contribute to the CCeH’s work in the field of literary studies and broaden his own skills in the Digital Humanities. In particular, he will work on his own research project “iNovel”. Dr. Aljayyousi is going to present and openly discuss his research at a public lecture on …

Abstract

iCriticism: An Interactive, Innovative and Inter-medial Approach to Literature.

The presentation will introduce a new approach to the study of literature in the digital age, tentatively called iCriticism. Broadly speaking, iCriticism is a response to the fact that reading now takes place in an ecosystem of devices, both print and digital, and it starts from the belief that the computer is a unique invention that is adaptable to a wide variety of uses. Within literary studies, the computer can be used in a humanistic way to best serve the purposes of the field. Some main principles of the approach, to be elaborated on in the presentation, include the following:

Texts are multi-dimensional and heterogeneous and the relation among their various dimensions, codes of significance, or levels is not heuristic.

The algorithmic, dynamic nature of traditional texts.

Rejection of formal logic and the CRUM (Computational-Representational Understanding of Mind) paradigm as the only option.

Material conditions, including textuality, are created in the space between physical and non-physical (human) factors.

Digitizing texts is a process of translation / rewriting that can result in pedagogical tools.

Computer technology can introduce fun and increase the engagement of students through attention to experiential aspects and to the multiple roles that the student can play: user-player-learner-reader-writer.

Last week (14 and 15 September 2017) a meeting of the XProc 3.0 working group took place in Aachen, organized by Achim Berndzen of xml-project and Gerrit Imsieke of le-tex, and hosted by LOGOI.

The meeting was extremely successful: consensus was reached on many topics and important roadblocks were overcome. I will tell you what the WG accomplished in a second. Before that, allow me to introduce XProc and XML pipelines and to explain why they are useful. (If you already know all this stuff, skip directly to the XProc 3 section, that’s OK. :))

XML pipelines? What are you talking about?

Everybody who has worked with XML knows that real-world applications are always born as simple transformations (“I’ll just convert this XML to HTML with XSLT”) but quickly develop into a big tangled web of unreadable code as soon as you have to deal with the inevitable…

small mistakes in the input (“wait, why is there a <p> inside an <em>?”),

flaws in the receiving applications (“let’s have a separate output for Internet Explorer 6, so that the poor students who access this from the library can still use it”) or

requests from the project collaborators (“could you make a summary version with only one sentence per chapter?”).

All these needs can be addressed, but doing it by piling fix upon fix onto the original core transformation is a nightmare in terms of maintenance and readability.

Small steps and scripts

A better way to solve all these issues is splitting monolithic transformations into smaller pieces, or steps. (More about our experience at the CCeH in splitting complicated transformations into focused steps will follow in a future article.)

Now that you have all these steps, how do you transform the input into the output in practice?

Are you going to run each step manually, clicking around in your XML editor? I hope not.

A much better way to run this split transformation is to create a shell script that takes the input file, applies the first step (fix the small mistakes), then the second (transform into HTML) and then, if requested, the third (uglify the HTML to make it IE6-compatible).
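To make this concrete, here is a rough sketch of what such a script could look like, assuming Saxon HE as the command-line XSLT processor; the stylesheet and file names (fix-mistakes.xsl, to-html.xsl, ie6-compat.xsl) are made up for illustration:

    #!/usr/bin/env bash
    # Sketch of the three-step transformation described above.
    # Assumes Saxon HE is available as saxon.jar; all stylesheet
    # names are hypothetical.
    set -euo pipefail

    input="$1"

    # Step 1: repair the small mistakes in the input
    java -jar saxon.jar -s:"$input" -xsl:fix-mistakes.xsl -o:fixed.xml

    # Step 2: transform the repaired XML into HTML
    java -jar saxon.jar -s:fixed.xml -xsl:to-html.xsl -o:output.html

    # Step 3, only if requested: uglify the HTML for IE6
    # (works only because the previous step emits well-formed XHTML)
    if [[ "${2:-}" == "--ie6" ]]; then
        java -jar saxon.jar -s:output.html -xsl:ie6-compat.xsl -o:output-ie6.html
    fi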

Such a script would work, but it has many problems:

Either you hardcode how to invoke the XSLT processor or you have to write an abstraction layer that allows you to call other XSLT processors.

Requires a working Bash shell environment (not that easy to get on Windows).

Does not provide any kind of validation of the intermediate results.

Requires a deserialization/serialization cycle for each step.

Rapidly gets very complex as soon as other steps, conditional steps and loops are added.

Works only on a single document.

We could address all these problems ourselves by writing a better script. Or we could avoid reinventing the wheel, make use of XProc and write a declarative XML pipeline.

Enter XML pipelines and XProc

XProc is a language for writing declarative XML pipelines.

An XML pipeline is a series of steps through which XML documents flow, just as in the shell script in the previous example. However, in contrast with a shell script, XProc pipelines are (see the sketch after this list):

Declarative: you state what you want and the XProc interpreter chooses the right tools. (A PDF transformation? Let’s use Apache FOP. An XSLT transformation? Let’s use libxslt. Oh, are we running inside oXygen? Let’s use the internal Saxon-EE engine then.)

Portable: pipelines can run wherever there is an XProc interpreter: Linux, Windows, Mac OS, you name it.

Specialized for XML: documents are not deserialized and serialized in each step.

Can have more than one input and produce more than one output.

Easily extended into intricate pipelines with loops and parallel branches.
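Here is a minimal sketch of what the first two steps of our example could look like as an XProc 1.0 pipeline; the stylesheet names are the same hypothetical ones as in the shell script above, and the conditional IE6 step is left out for brevity:

    <p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
      <p:input port="source"/>
      <p:output port="result"/>

      <!-- Step 1: repair the small mistakes in the input -->
      <p:xslt name="fix-mistakes">
        <p:input port="stylesheet">
          <p:document href="fix-mistakes.xsl"/>
        </p:input>
        <p:input port="parameters">
          <p:empty/>
        </p:input>
      </p:xslt>

      <!-- Step 2: transform the repaired XML into HTML -->
      <p:xslt name="to-html">
        <p:input port="stylesheet">
          <p:document href="to-html.xsl"/>
        </p:input>
        <p:input port="parameters">
          <p:empty/>
        </p:input>
      </p:xslt>
    </p:declare-step>

Note that no explicit wiring is needed: by default each step reads the output of the preceding one, and the pipeline’s result port exposes the output of the last step. An XProc processor such as XML Calabash takes care of choosing and invoking an XSLT engine for each p:xslt step.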

XProc 3.0

XProc 3.0 is the upcoming version of XProc. The original XProc 1.0 specification was published by the W3C in 2010, and since then users and implementers have found small problems, inconsistencies and ergonomic issues that make writing XProc pipelines harder than it should be.

The focus of XProc 3 is simplifying the language, making implementations behave in a more sensible way by default, and making it possible to process non-XML documents (think LaTeX or graphics files).

From 9 to 13 October 2017 the University of Cologne is hosting an EpiDoc autumn school in combination with an expert workshop on digital sigillography. During the first three days the autumn school will introduce the participants to EpiDoc, the encoding standard for epigraphic texts and materials. Wednesday afternoon is dedicated to presentations on advanced imaging technologies in the fields of epigraphy, papyrology and sigillography. On Thursday and Friday there will be an expert workshop focusing on digital formats and standards for the description and publication of seals and similar materials.

Abstract: “The traditional model of scholarship has been an exchange of ideas built up over time, using print; but in the second half of the twentieth century this became steadily more difficult, as the volume of academic publications increased, and the cost of printing rose. Cologne was the home of new approaches, particularly in epigraphy and papyrology – the Inschriften griechischer Städte aus Kleinasien series has transformed our understanding of the epigraphy of Asia Minor; and ZPE has stimulated new levels of conversation. In the home city of such innovation, I would like to ask what the 21st century might look like.”

Speech Recognition, Biometrics, Text-To-Speech, Natural Language Understanding and AI are key research areas for redefining the relationship between people and technology. Nuance’s Research team is working on all of these in order to develop a more human conversation with technology. This talk will highlight a few current research topics and trends in the company. Automotive solutions in cars on the road today, and others that will come out in the next few years, will be used to illustrate how achievements in Natural Language Understanding and AI help to create the next generation of digital assistants.

Ekaterina Kruchinina is a Research Manager in the Natural Language Understanding department at Nuance. Her principal research responsibilities are in NLU, machine learning, corpus annotation and the evaluation of NLU systems. Ekaterina joined Nuance as a Senior Research Scientist in 2012. Before joining Nuance, she worked as a research associate at the JulieLab at the Friedrich-Schiller-University of Jena. She received her PhD in 2012, supervised by Prof. Dr. Udo Hahn (Friedrich-Schiller-University Jena) and Prof. Ted Briscoe (University of Cambridge), with a dissertation titled “Event Extraction from Biomedical Texts Using Trimmed Dependency Graphs”. Ekaterina developed the relation extraction system JReX, which was ranked second in the BioNLP 2009 Shared Task on Event Extraction (at NAACL-HLT 2009). She has 12 years of academic and industry experience and has led the development and application of cutting-edge NLP technology for intelligent human-machine interfaces. Ekaterina speaks Russian, German, English and French. Her work has been featured in the article „Die Computerversteherin. Ein Job an der Schnittstelle von Mensch und Maschine“, c’t 04/2017.

Transkribus (https://transkribus.eu/Transkribus/) is a platform for the automated recognition, transcription and searching of handwritten historical documents. Transkribus is part of the EU-funded Recognition and Enrichment of Archival Documents (READ) project (http://read.transkribus.eu/). The core mission of the READ project is to make archival material more accessible through the development and dissemination of Handwritten Text Recognition (HTR) and other cutting-edge technologies. The workshop is aimed at institutions, researchers and students who are interested in the transcription, searching and publishing of historical documents. It will introduce participants to the technology behind the READ project and demonstrate the Transkribus transcription platform. Transkribus can be downloaded free of charge. Participants should bring their laptops to the workshop and, if possible, have Transkribus installed.