jeffreycwitt.com/feed.xml · Jeffrey C. Witt · This is my professional site. Find out about my ongoing work here.
Connecting Researchers and Institutions via IIIF (2018-10-15) jeffreycwitt.com/2018/10/15/leipzig-iiif-scta<p>Thank you. I am very glad to be here, and I am pleased to have the chance to tell you a little about why IIIF matters to researchers in medieval intellectual history and how IIIF can make collaboration between researchers and cultural heritage institutions more efficient.</p>
<p>I direct a digital archive of text data representing a medieval scholastic corpus. This archive is called the Scholastic Commentaries and Texts Archive (SCTA for short) and is part of a project to make available the text data hidden in medieval manuscripts. And we try to do this in a scholarly way.</p>
<p>This scholarly ambition creates a desire for completeness and transparency. In the digital world, a scholarly edition has the potential to make every editorial decision transparent. A researcher no longer has to rely solely on an editor’s interpretation, but can be put in a position to reconstruct the context of each editorial decision and retrace it.</p>
<p>As attractive as these possibilities are, they remain mere theory without the cooperation of the worldwide research community.</p>
<p>This raises the question: what incentives do institutions have to enter into such collaborations? They would probably concede that it would be wonderful if researchers produced editions of this kind. But committing to worldwide cooperation, like any switch to a new approach, can be costly and difficult, so we also have to make the benefits clear.</p>
<p>My thesis is that, if we organize ourselves well enough and have the right technology, that is, if we publish data according to common standards, cultural heritage institutions will get back as much as they invest, or even more.</p>
<p>In what follows I try to illustrate this possibility with an extended example.</p>
<p>One of the central texts in the SCTA comes from the twelfth century. It is a collection of “Sentences” by a certain Petrus Lombardus. This text was used and commented on by medieval authors throughout all the following centuries.</p>
<p>As part of an attempt to obtain as complete a picture of this commentary tradition as possible, we are trying to bring together all the witnesses of this Sentences text. We are trying not merely to make cross-references, but to make the witnesses themselves available and to offer ways of comparing them directly with one another on a single platform.</p>
<p>The problem, of course, is that no single institution holds all of these witnesses. On the contrary, they lie scattered across the entire world. Given this starting point, no institution has the inclination, the incentive, or the money to aim for a complete collection. Researchers, by contrast, would naturally be interested, but they have neither the resources to gather all the witnesses nor the time to examine and study all these manuscripts. In most cases, therefore, strong compromises are made. A few notable manuscripts are selected, and the rest of the tradition is passed over. Understandable as such selections are, the dream of completeness is nevertheless sacrificed, and many small but important manuscripts remain forgotten and are never integrated into the commentary tradition. And because they are not connected to the rest of the tradition, they are difficult to study and accordingly hard to appreciate. They stand isolated, cut off from the very tradition in which they would actually be significant.</p>
<p>There is an example here in Leipzig: a small fragment of a tiny portion of Petrus Lombardus’ Sentences. Under the usual pressures of money and time, this fragment would be forgotten. Yet in the context of the whole tradition, and as a point of comparison for it, it would still be important, for it is a unique witness containing parts of a marginal gloss that may well be one of a kind. In short: under the conventional approach just described, we neglect it not because it is unimportant, but because the barrier to access is too high to justify the effort.</p>
<p>With IIIF the situation looks different. A single image of this witness, made available by the Fragmentarium project in Switzerland, becomes immediately usable for me as a researcher. In this way we can all win, each with our different interests. For Leipzig naturally has an interest in all the manuscripts held in Leipzig, Fragmentarium has a general interest in fragments worldwide, and the SCTA has an interest in all the manuscripts that contain Lombardus’ text.</p>
<p>Here I can show an example.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/iiif-ldn-leipzig-example.png" alt="Slide 1" /></p>
<p>Here you can see that I have asked for all the manuscripts that contain Lombardus’ text. They lie scattered across the entire world, yet with the help of IIIF I have united all these witnesses in one place. Imagine a researcher who has no interest in fragments and knew nothing of the collection in Leipzig. Suddenly, through his interest in Lombardus and in already-known manuscripts, he discovers an interesting new manuscript and has immediate access to it. Without IIIF and the cooperation of cultural heritage institutions, this discovery would remain impossible.</p>
<p>Something happens when one can discover a new object within a network of relationships that already carries meaning. Suddenly it, too, generates an interest that it would not have in another context.</p>
<p>And when we have the tools right at hand to do something with this object, it becomes more likely that we will actually work with it.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/iiif-teiwebeditor-leipzig.png" alt="Slide 2: TEI web editor" /></p>
<p>In my case, as you can see here, I have created a simple text editor with which one can use already existing transcriptions to produce a new transcription that records all the variants in this fragment.</p>
<p>And with this new supplementary information, we can use and share the same information as annotations.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/iiif-leipzig-comparison.png" alt="Slide 3: Mirador table of contents" /></p>
<p>Here one can see that the table of contents of an edition can become a navigation aid for manuscripts.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/iiif-ldn-leipzig-transcriptions.png" alt="Slide 4: Mirador transcription" /></p>
<p>And the text of an edition can become an auxiliary text that makes it easier to explore the manuscript.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/iiif-leipzig-text-search.png" alt="Slide 5: Mirador search" /></p>
<p>Here one can also see that the text can become the basis of a search service with which one can navigate within the manuscript.</p>
<p>But this information is not confined to any particular website or interface. It is free and available for use and reuse.</p>
<p>For example, instead of an image-centered application like Mirador, we can display the same data a second time in a text-centered interface, in which the images now appear as annotations.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/iiif-lbp-leipzig-1.png" alt="Slide 6: LombardPress" /></p>
<p>Here we can see different versions of the text and consult the images as evidence for the editorial decisions. Here you can see that I am displaying the text of the Leipzig fragment.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/iiif-lbp-leipzig-2.png" alt="Slide 7: LombardPress" /></p>
<p>And it is just as easy to display the Leipzig fragment as it is to display an entirely different manuscript, one that is, for example, in Baltimore.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/iiif-lbp-leipzig-3.png" alt="Folie 8 LombardPress" />
<img src="{ site.assets_url }}iiif-collation-leipzig-1.png" alt="Folie 9 LombardPress" /></p>
<p>Und mit dem Text von diesen Handschriften können wir leicht Text vergleichen.</p>
<p><img src="{ site.assets_url }}iiif-adfontes-leipzig-1.png" alt="Folie 10 Ad fontes" /></p>
<p>Beyond that, I can be in an entirely different app and encounter this data once again in a new form. This app was designed for studying quotations. And when I search for a specific quotation, I find not only the text but also access to that text in every manuscript, along with the corresponding images of each manuscript.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/iiif-adfontes-leipzig-2.png" alt="Slide 11: Ad fontes" />
<img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/iiif-adfontes-leipzig-3.png" alt="Slide 12: Ad fontes" /></p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/custom-manifest-leipzig1.png" alt="Slide 13: Mirador, quotation, marginal note, manifest" /></p>
<p>And again, I can take the same quotation information, which here comes from the SCTA, and the IIIF canvas information, which comes from various cultural heritage institutions, and use them to create a new kind of “IIIF manifest”: a “manifest” that displays all the “canvases” containing a specific quotation.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/custom-manifest-leipzig2.png" alt="Folie 14" />
<img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/custom-manifest-leipzig-3.png" alt="Folie 15" /></p>
<p>Oder ein “Manifest” dass alle “Canvases” zeigt, die eine Randnotiz enthalten. Ich glaube, es ist nicht schwierig sich vorzustellen, wie nützlich ein solches Manifest sein kann. Wenn eine Forscherin oder ein Forscher Interesse an der Geschichte von Fußnoten oder Zitations-Praktiken hat, würden sie ein solches Manifest sehr wertvoll finden.</p>
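<p>A custom manifest of this kind is, at bottom, just a new JSON document that points at canvases already published elsewhere. The sketch below is only an illustration of the idea, not the SCTA’s actual code: the quotation ID, canvas URLs, and labels are hypothetical, and the JSON follows the general shape of a IIIF Presentation 2.x manifest (a manifest wrapping one sequence of canvases).</p>

```python
# Sketch: build a custom IIIF Presentation 2.x manifest that gathers
# canvases hosted by different institutions, all of which contain one
# specific quotation. IDs and URLs below are hypothetical examples.

def build_quotation_manifest(quotation_id, canvases):
    """Wrap externally hosted canvas references in a new manifest."""
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@type": "sc:Manifest",
        "@id": f"https://example.org/manifest/{quotation_id}",
        "label": f"Canvases containing quotation {quotation_id}",
        "sequences": [{
            "@type": "sc:Sequence",
            # the canvases remain hosted by the holding institutions;
            # the new manifest merely collects references to them
            "canvases": canvases,
        }],
    }

canvases = [
    {"@id": "https://fragmentarium.ms/iiif/canvas/F-abc1",
     "@type": "sc:Canvas", "label": "Leipzig fragment, recto"},
    {"@id": "https://images.library.example/iiif/canvas/w29-14r",
     "@type": "sc:Canvas", "label": "Baltimore witness, f. 14r"},
]

manifest = build_quotation_manifest("quotation-123", canvases)
print(manifest["label"])
```

<p>Any IIIF viewer that understands the Presentation API can then open this manifest as if it were an ordinary digitized object, even though its pages come from several repositories.</p>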
<p>But let us return, in closing, to the original question. It is very kind of the staff of these institutions to make examples like these possible through IIIF. But what do the institutions get back? Beyond the use of their images across the internet, it is also possible for these institutions to receive data in return, data that other researchers around the world have meanwhile produced.</p>
<p>In the process of preparing a critical edition, researchers often generate thousands of small data points that are highly relevant to the scattered manuscripts. This is far too much data to include in a book, but once the information is freed from the confines of a static book, it can benefit the many users of individual library collections. In the past we had no sensible way of giving this data back to the institutions, and for that reason we discarded whatever did not fit into a book.</p>
<p>To improve this situation, we have developed a method by which researchers and research communities can use IIIF to inform libraries and museums when they have created data related to their collections. And we have developed an approach that lets IIIF viewers (such as Mirador) import this data seamlessly into their user interface.</p>
<p>Let me finish with a couple of examples:</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/iiif-ldn-leipzig-fragmentarium.png" alt="Slide 16" /></p>
<p>Here you can see that I begin with a search at Fragmentarium (or at Leipzig University). I discover a manuscript of interest and import it into Mirador. So far so good. I can explore this manuscript, but it is still difficult to navigate within it. I need a table of contents and transcriptions. It would be a shame if Fragmentarium or other institutions had to produce this information, because I have already produced it as part of my own research.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/iiif-ldn-leipzig-announcement.png" alt="Slide 17" /></p>
<p>But with IIIF and a technology called “Linked Data Notifications,” I can now send a notification, and through this notification my research data becomes available and linked to these images from Fragmentarium.</p>
<p>After I, as a researcher, have sent a notification, another user in a completely different context, perhaps on the website of a library or some other institution, can gain access to this information.</p>
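<p>A Linked Data Notification of this kind is, in essence, a small JSON-LD document POSTed to an inbox advertised by the target resource. The sketch below is illustrative only: the inbox URL, manifest URLs, and actor ID are hypothetical placeholders, and the real SCTA workflow may shape the payload differently.</p>

```python
import json
import urllib.request

# Sketch: announce, via a Linked Data Notification, that supplementary
# research data (a table of contents, transcriptions) exists for a
# IIIF resource. All URLs here are hypothetical.

def build_notification(target_manifest, supplement_manifest):
    """Build an ActivityStreams 'Announce' payload linking new data to a target."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Announce",
        "actor": "https://scta.info",
        "object": supplement_manifest,   # the researcher's new data
        "target": target_manifest,       # the institution's existing images
    }

def send_notification(inbox_url, payload):
    """POST the notification to the target's advertised LDN inbox."""
    req = urllib.request.Request(
        inbox_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/ld+json"},
        method="POST",
    )
    return urllib.request.urlopen(req)  # network call; not executed in this sketch

payload = build_notification(
    "https://fragmentarium.ms/metadata/iiif/F-abc1/manifest.json",
    "https://scta.info/iiif/example/manifest-with-toc.json",
)
print(payload["type"])
```

<p>A viewer that polls the inbox can then discover the announcement and offer the linked data to its users, without the researcher and the institution ever coordinating directly.</p>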
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/iiif-ldn-leipzig-oldtoc.png" alt="Folie 18 screen shot of mirador list" /></p>
<p>Also, hier können Sie den Text sehen, wie er bei Fragmentarium scheint, mit minimalistischem Inhaltsverzeichnis und ohne Transkription.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/iiif-ldn-leipzigms.png" alt="Folie 19" />
Aber jetzt nach meiner Mitteilung kann ein Nutzer, ohne mich oder die SCTA zu kennen, per Klick eine Liste von verfügbaren ergänzenden Forschungsdaten bekommen.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/iiif-ldn-leipzig-newtoc.png" alt="Folie 20, 21" />
<img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/iiif-ldn-leipzigms.png" alt="Folie 21" />
<img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/iiif-ldn-leipzig-transcriptions2.png" alt="Folie 22" /></p>
<p>And then, with one click, the user can decide whether or not to import this information.</p>
<p>I am convinced that we are seeing here only the beginning of what is possible. But I hope to have made clear that we can realize these possibilities only if we work together. Concretely, this means that we must follow common standards such as IIIF. But I hope to have made it just as clear that this work pays off. The extra effort required to realize these possibilities repays itself almost automatically, and everyone can win: the cultural heritage institutions as well as the researchers and the research communities.</p>
<p>I now look forward to your questions, and I am also happy to show some of my demonstrations in a bit more detail.</p>
SCTA and Topic Modelling: a DAAD Report (2018-10-15) jeffreycwitt.com/2018/10/15/SCTA-und-topic-modelling-ein-DAAD-Bericht
<p>SCTA and Topic Modelling: a DAAD Report</p>
<p>With the explosion of data, the question of the future will not be, “Is this text or is this data available?” but rather, “Can we find this text or this passage in the pile of everything that is available?”</p>
<p>The scholarly community outsources the task of adequately selecting information at its own peril. When data is theoretically available but not yet findable, that is a problem of curation. When we have thousands or even millions of results, we cannot examine them all. We have to select, and that is curation. Curation is a kind of selection based on principles. Scholarly discovery demands curation based on scholarly principles. To outsource this curation and leave it, for example, to Google is to work with unscholarly results. We, the specialists, must take back the responsibility of learning and applying the new digital approaches, so that we are able to take part in the task of curation.</p>
<p>With the support of the German Academic Exchange Service (DAAD), I, as director of the SCTA (Scholastic Commentaries and Texts Archive, https://scta.info), have taken a first step in this direction, a first attempt to take on this responsibility.</p>
<p>In early October 2018, together with my colleague Dr. Thomas Köntges at the Digital Humanities Lab at Leipzig University, I attempted to apply a natural language processing technique, namely so-called “topic modelling,” to the SCTA corpus.</p>
<p>The basic idea is that, by combining computing power and domain expertise, we can build a profile of every paragraph in the scholastic corpus. With these profiles we can discover expected and unexpected connections across the entire corpus.</p>
<p>None of this would have been possible without the expertise of, and collaboration with, Dr. Thomas Köntges. Dr. Köntges has developed an important application called “ToPan,” with which one can analyze texts and generate “topics.”</p>
<p>In the image below you can see an example of a topic produced by Dr. Köntges’ application “ToPan.”</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/toPan-topic-modelling-viz.png" alt="Topan" /></p>
<p>With these topics or themes one can then differentiate and sort this corpus.</p>
<p>The only question is: how can a corpus as enormous as the SCTA corpus be fed into this application automatically? To accomplish this, I created a “CSV API” for the entire SCTA corpus. This API makes the millions of Latin words found in scholastic texts available in a form that an application like “ToPan” can understand.</p>
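<p>The shape of such an export can be sketched as follows: one row per paragraph, identified by a stable ID, with its full Latin text. The column names and paragraph IDs below are invented for illustration; the actual SCTA CSV API may use a different layout.</p>

```python
import csv
import io

# Sketch: flatten a corpus into one CSV row per paragraph so that a
# topic-modelling tool (such as ToPan) can ingest it automatically.
# Column names and paragraph IDs are hypothetical.

def paragraphs_to_csv(paragraphs):
    """paragraphs: iterable of (paragraph_id, latin_text) pairs."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["paragraph_id", "text"])  # header row
    for pid, text in paragraphs:
        writer.writerow([pid, text])
    return buf.getvalue()

sample = [
    ("lectio1-d17-p1", "utrum caritas sit aliquid creatum in anima"),
    ("lectio1-d17-p2", "de potentia absoluta et ordinata"),
]
print(paragraphs_to_csv(sample).splitlines()[0])
```
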
<p>The next step, before these results can become useful, is to publish them in such a way that they can be used by other “client applications.” Dr. Köntges has already developed a further application, called Metallo, to display these results. Together we modified this application so that it can expose the results as usable data, namely as “JSON data.”</p>
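<p>One plausible shape for such JSON output is a topic distribution per paragraph, with topics sorted by weight so that clients can read off the dominant discussion first. The field names below are guesses for illustration, not Metallo’s actual format.</p>

```python
import json

# Sketch: expose topic-model results as JSON, one record per paragraph.
# Field names and IDs are hypothetical; the real output may differ.

def topic_profile_record(paragraph_id, distribution):
    """distribution: mapping of topic id -> weight for one paragraph."""
    return {
        "paragraph": paragraph_id,
        "topics": [
            {"topic": t, "weight": w}
            # sort so the strongest topic comes first
            for t, w in sorted(distribution.items(), key=lambda kv: -kv[1])
        ],
    }

record = topic_profile_record("lectio1-d17-p1", {"t3": 0.6, "t7": 0.3, "t1": 0.1})
print(json.dumps(record))
```
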
<p>After these steps, we were in a position to use these results to improve our text and search service.</p>
<p>The most obvious application of these paragraph profiles is to allow users to group and sort search results by topic. In this way we avoid an unscholarly use of search results, in which we choose the first results simply because they appear first, and not because they are the best or fit our research best.</p>
<p>For example, in the first image below you can see a list of unsorted search results. The search service found the phrase “potentia absoluta” in many different paragraphs, but the paragraph profile and an associated topic indicate that the following paragraphs use the same phrase, “potentia absoluta,” in three different discussions.</p>
<p>A simple example would be a raw search for the word “Leiter,” which returns paragraphs having to do both with a mountain climber (“Leiter” as ladder) and with the head of a business (“Leiter” as manager). With the help of topic modelling we can sort out these different discussions, as one can see in the second image.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/lbp-topic-modelling-search-results1.png" alt="TopicModellingSearchResults1" /></p>
<p>Here a user can select a topic and see only the paragraphs that have something to do with that discussion.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/lbp-topic-modelling-search-results2.png" alt="TopicModellingSearchResults1" /></p>
<p>But sorting search results is only the beginning.</p>
<p>With a profile for every paragraph, we aim to build a recommendation service. Such a service would fulfill a traditional ambition, namely the ability to lead users to related discussions.</p>
<p>In this image we can see that this is a traditional goal.</p>
<p>Cremona 1618
https://books.google.com/books?id=h2IUiZ6aYZUC&amp;pg=PA66#v=onepage&amp;q&amp;f=false</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/scholion-cremona1618-example" alt="TopicModellingSearchResults1" /></p>
<p>Many more examples from the 16th and 17th centuries could be found.</p>
<p>But this ambition has persisted even into the modern era.</p>
<p>The scholion of the late nineteenth-century edition of Bonaventure is an apt example.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/bonaventure_scholion.png" alt="TopicModellingSearchResults1" /></p>
<p>These kinds of connections are important. They make us aware of the larger context. Some connections a specialist could foresee. We can expect that a commentary on Distinctio 17 may relate to many other commentaries on Distinctio 17.</p>
<p>But our expectations are also our limit, for we look for connections only where we expect them. And, obviously, the connections we do not expect remain hidden from us.</p>
<p>The help supplied here by well-meaning editors gives us only a sample of the connections. It is by no means comprehensive or scholarly. It is merely a selection based on the editor’s preferences. And although these selections can often be helpful, they nonetheless steer the direction of all subsequent research, contrary to every demand of scholarship or historical accuracy. Is the reference in the Bonaventure scholion to the parallel discussion in Gabriel Biel there only because Biel’s discussion is closely connected to Bonaventure’s? Closer or more important than all the discussions that took place in the nearly two hundred years between Bonaventure’s time and Biel’s, which are nevertheless not mentioned? It is more likely that, in the editor’s mind, Biel is one of the “big guys,” one of the “important scholastics,” and that the editor is therefore aware of this discussion. This process, however, is a vicious circle. Biel is listed, while many other later scholastics are not, because the editor believes Biel is more important. Subsequent researchers see this list and orient their work around it. Pressed for time, the researchers who follow this scholion decide to examine Biel’s discussion and overlook the others. And so the circle continues, and inevitably we discover only what our previous decisions allow us to discover.</p>
<p>What we need is a more scholarly and more comprehensive approach: an approach that reveals the discussions that our prejudices keep hidden.</p>
<p>“Topic modelling” can help us here. With the aid of massive computing power, we can consider the relevance of every paragraph, not only the paragraphs already known to us. The computer can build a profile of every paragraph, and we can use this profile to recommend and display related passages.</p>
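<p>One common way to turn such per-paragraph profiles into recommendations is to rank paragraphs by the similarity of their topic distributions, for example with cosine similarity. The sketch below is a generic illustration under that assumption, not the SCTA’s actual recommendation code; the paragraph IDs and vectors are made up.</p>

```python
import math

# Sketch: recommend related paragraphs by cosine similarity of their
# topic-distribution vectors. IDs and vectors are hypothetical.

def cosine(a, b):
    """Cosine similarity of two equal-length weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(target_id, profiles, k=3):
    """profiles: dict of paragraph_id -> topic-weight vector."""
    target = profiles[target_id]
    scored = [
        (pid, cosine(target, vec))
        for pid, vec in profiles.items() if pid != target_id
    ]
    return sorted(scored, key=lambda s: -s[1])[:k]

profiles = {
    "bonaventure-d17-p4": [0.7, 0.2, 0.1],
    "biel-d17-p9":        [0.6, 0.3, 0.1],  # expected connection
    "unknown-d17-p2":     [0.7, 0.2, 0.1],  # unexpected, but just as close
    "other-d5-p1":        [0.0, 0.1, 0.9],
}
for pid, score in recommend("bonaventure-d17-p4", profiles):
    print(pid, round(score, 3))
```

<p>The point of the toy data is the third entry: a paragraph from an author no editor thought to list scores just as high as the expected one, which is exactly the kind of connection a scholion would miss.</p>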
<p>During my time in Leipzig, Dr. Köntges and I designed an example to demonstrate these possibilities.</p>
<p>Below one can see what happens when one asks for more information about a given paragraph. First, one receives a list of paragraphs with a direct connection to the paragraph in question. These connections are findings made by a researcher: e.g., this paragraph quotes that one, and so on.</p>
<p>But below that is a new list of related paragraphs whose connections were determined by the computer. And in this case the computer has analyzed the entire corpus and can therefore recommend passages that lie beyond the prejudices of any editor.</p>
<p>And once again, in the same image, we can see these related paragraphs in a graphical display.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/lbp-recommendations-by-topic.gif" alt="TopicModellingSearchResults1" /></p>
<p>In the future we plan to bring the two approaches ever closer together, so that, by combining the properties produced by researchers with those produced by the computer, we can create an effective recommendation service: a service that allows us to see connections across the entire corpus in a scholarly and comprehensive way.</p>
Encountering the Text in the Information Age (2018-09-17) jeffreycwitt.com/2018/09/17/encountering-the-text
<p>Below is a list of readings I would like to use to structure our seminar.</p>
<p>Given the various time commitments we all have, I know that, despite best intentions, it is not always possible to do all the readings before a seminar. At the same time, our seminar will be infinitely more enriching if participants can devote some time to reading preparation.</p>
<p>In order to make it easier for everyone to do at least some reading, I have indicated which readings are considered <strong>focused</strong> readings and which are <strong>recommended</strong> or <strong>highly recommended</strong>. Please prioritize <strong>focused</strong> readings over <strong>recommended</strong> readings.</p>
<p>Additionally, for <strong>focused</strong> readings, I’ve also tried to indicate even smaller page ranges which constitute the core of what I would like to focus on during our time together. Obviously, the contents within these page ranges will be more rewarding if one can read them in the context of the entire text. But if one is pressed for time, concentrating attention on these page ranges will allow us to come together with some common understanding of the issues at play and will hopefully enable a rewarding discussion.</p>
<p>Each reading below has a file reference. During the course of the seminar, files corresponding to these references will be available <a href="https://drive.google.com/drive/folders/1cZlar7NtUIQlWvdu_yleNudahxjH6_XD?usp=sharing">here</a>. If possible, I recommend printing the <strong>focused</strong> readings out, so that during the seminar we can draw our attention toward each other and away from our screens.</p>
<h1 id="monday">Monday</h1>
<ol>
<li>Marshall McLuhan, “The Medium is the Message” in <em>Understanding Media</em>, pp. 7-21 <a href="https://drive.google.com/open?id=15w29PROI5FLfr0j1nnDRKAfaf_sR7-xh">file 01-01</a>
<ul>
<li><strong>Focused</strong>: p. 11, p. 18</li>
</ul>
</li>
<li>Nicholas Carr, “Introduction” in <em>The Shallows</em>, pp. 1-4 <a href="https://drive.google.com/open?id=1eG0LGpOJ7113S5Z2j6Vt8RE0AJea75n2">file 01-02</a>
<ul>
<li><strong>Focused</strong>: all</li>
</ul>
</li>
<li>Karl Marx, “[4. The Essence of the Materialist Conception of History.
Social Being and Social Consciousness]” in <em>German Ideology</em>, pdf pp. 1-2 <a href="https://drive.google.com/open?id=1K-Zvi0APWFl2riS7IwKDff9zwEumQMU2">file 01-04</a>
<ul>
<li><strong>Recommended</strong>: all</li>
</ul>
</li>
<li>Karl Marx, “Ruling Ideas” in <em>German Ideology</em>, pdf pp. 1-3 <a href="https://drive.google.com/open?id=1gKMu0ssKw8C7k7CK1TM9BywKsb1CMTsx">file 01-05</a>
<ul>
<li><strong>Recommended</strong>: all</li>
</ul>
</li>
<li>Plato, <em>The Phaedrus</em>, pdf pp. 1-36 <a href="https://drive.google.com/open?id=1shv7X75cF--_C5vqPN49k0TQcyqjFqgk">file 01-03</a>
<ul>
<li><strong>Focused</strong>: pp. 12-18 (speech in praise of the lover), pp. 28-30 (criteria of true rhetoric), pp. 32-36 (in defense of speech over the written word)</li>
</ul>
</li>
</ol>
<h1 id="tuesday">Tuesday</h1>
<ol>
<li>Walter Ong, “Orality of Language”, <em>Orality and Literacy</em>, pp. 5-15 <a href="https://drive.google.com/open?id=1wQedq3gKNtkEC6Kvq6SzQa_EfiW2xSAN">file 02-01</a>
<ul>
<li><strong>Recommended</strong></li>
</ul>
</li>
<li>Walter Ong, “Writing Restructures Consciousness”, <em>Orality and Literacy</em>, pp. 78-116 <a href="https://drive.google.com/open?id=1U56ohdoVmGXOiJ54Efi8siUH2ITprUw-">file 02-02</a>
<ul>
<li><strong>Focused</strong>: pp. 78-96, 101-103</li>
</ul>
</li>
<li>Nicholas Carr, “Tools of the Mind (C. 3)”, <em>The Shallows</em>, pp. 39-57 <a href="https://drive.google.com/open?id=1onwAvHGm_Yc6KhQDo5q9aTkRPUzH-ca4">file 02-03</a>
<ul>
<li><strong>Recommended</strong></li>
</ul>
</li>
<li>Walter Ong, “Print, Space, Closure”, <em>Orality and Literacy</em>, pp. 117-138 <a href="https://drive.google.com/open?id=1m4CVRprx1IIqXwZCo2nl_E8tMiWD-Yy4">file 02-04</a>
<ul>
<li><strong>Focused</strong>: pp. 119-121</li>
</ul>
</li>
<li>Nicholas Carr, “The Deepening Page (C. 4)”, <em>The Shallows</em>, pp. 58-77 <a href="https://drive.google.com/open?id=1db73sX0exDWgyj3hQtCcWYTzxVeiBLRH">file 02-05</a>
<ul>
<li><strong>Focused</strong>: pp. 61-63</li>
</ul>
</li>
<li>Michelle Levy and Tom Mole, “Materiality”, in <em>The Broadview Introduction to Book History</em>, pp. 3-27 <a href="https://drive.google.com/open?id=1TPdtAn8VUVrDHAtHcYp4bEAkYtNE1-vr">file 02-06</a>
<ul>
<li><strong>Recommended</strong></li>
</ul>
</li>
</ol>
<h1 id="wednesday">Wednesday</h1>
<ol>
<li>James Gleick, “Information Theory”, <em>The Information</em>, Chapter 7, pp. 204-232 <a href="https://drive.google.com/open?id=11O5NPPiBMxKX4lY60VZ5E4VpwLMhaAzG">file 03-01</a>
<ul>
<li><strong>Focused</strong>: pp. 221-232</li>
</ul>
</li>
<li>Sriram Vajapeyam, “Understanding Shannon’s Entropy Metric for Information”, pdf pp. 1-6 <a href="https://drive.google.com/open?id=1863InlNpdAPIenq9a_gkbqd7soRrWFpr">file 03-01a</a>
<ul>
<li><strong>Focused</strong>: all</li>
</ul>
</li>
<li>Vannevar Bush, “As we may think”, <em>The Atlantic</em>, pdf pp. 1-21 <a href="https://drive.google.com/open?id=1gj-RPsr2ozdtjBsKx4etd6mBn-Ya-XmH">file 03-02</a>
<ul>
<li><strong>Focused</strong>: all</li>
</ul>
</li>
<li>Ted Nelson, “Hyperworld” in Chapter 0, <em>Literary Machines</em>, pp. 0/1-13 <a href="https://drive.google.com/open?id=1QWabLriGyzV-ZY3SCuD6RP5dCAxgfnlo">file 03-03</a>
<ul>
<li><strong>Highly Recommended</strong></li>
</ul>
</li>
<li>Ted Nelson, “Hypertext” in Chapter 1, <em>Literary Machines</em>, pp. 1/14-19 <a href="https://drive.google.com/open?id=1QWabLriGyzV-ZY3SCuD6RP5dCAxgfnlo">file 03-03</a>
<ul>
<li><strong>Focused</strong>: all</li>
</ul>
</li>
<li>Ted Nelson, “2.1 An Electronic Literary System” in Chapter 2, <em>Literary Machines</em>, pp. 2/4-8 <a href="https://drive.google.com/open?id=1QWabLriGyzV-ZY3SCuD6RP5dCAxgfnlo">file 03-03</a>
<ul>
<li><strong>Highly Recommended</strong></li>
</ul>
</li>
<li>Ted Nelson, “2.2 What is Literature?” in Chapter 2, <em>Literary Machines</em>, pp. 2/9-12 <a href="https://drive.google.com/open?id=1QWabLriGyzV-ZY3SCuD6RP5dCAxgfnlo">file 03-03</a>
<ul>
<li><strong>Highly Recommended</strong></li>
</ul>
</li>
<li>Roland Barthes, “The Death of the Author”, pp. 142-148 <a href="https://drive.google.com/open?id=1RV3W0toGmJ6goLb7RWhypLrwxtKi6JWJ">file 03-04</a>
<ul>
<li><strong>Recommended</strong>: esp. 146-148</li>
</ul>
</li>
</ol>
<h1 id="thursday">Thursday</h1>
<ol>
<li>Sahle, Patrick. “Zwischen Mediengebundenheit und Transmedialisierung.” <em>Editio</em> 24 (2010): 23–36 <a href="https://drive.google.com/open?id=1_zvr0-NwPliRB1lltvgIuKvd4BwUjxpY">file 04-01</a>, Working/Rough Translation <a href="https://drive.google.com/open?id=1cRsl-dA1kmAvIwHikucrmMz4_GF2BEBB">file 04-01a</a>
<ul>
<li><strong>Focused</strong>: all</li>
</ul>
</li>
<li>DeRose, et al., “What Is Text, Really?”, <em>Journal of Computing in Higher Education</em>, vol. 1 (2), 1990, pp. 3-26 <a href="https://drive.google.com/open?id=1F-KfS6HGuP7mPK7BlQ_9fxwPdmcIe1ut">file 04-02</a>
<ul>
<li><strong>Focused</strong>: pp. 1-6</li>
</ul>
</li>
<li>“The Concept of a Work in WorldCat: An Application of FRBR”, pdf pp. 7-32 <a href="https://drive.google.com/open?id=1kguwUVYeA2AKa9VPtvned5lY1fToCnJk">file 04-03</a>
<ul>
<li><strong>Focused</strong>: pp. 3-8</li>
</ul>
</li>
<li>Wikipedia, “Functional Requirements for Bibliographic Records” <a href="https://drive.google.com/open?id=1glTY0r2aUnxEOyqN6D5Y7jYaqtvDXjE5">file 04-04</a> or <a href="https://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records">https://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records</a>
<ul>
<li><strong>Focused</strong>: all</li>
</ul>
</li>
</ol>
<h1 id="friday">Friday</h1>
<ol>
<li>Nicholas Carr, “The Very Image of a Book (C. 6)”, <em>The Shallows</em>, pp. 99-114 <a href="https://drive.google.com/open?id=1vKy3eAg4--cJDxMwo-YXLyJvbfO1wHUE">file 05-01</a>
<ul>
<li><strong>Focused</strong>: all</li>
</ul>
</li>
<li>Nicholas Carr, “The Juggler’s Brain (C. 7)”, <em>The Shallows</em>, pp. 115-143 <a href="https://drive.google.com/open?id=1MDGGOfQO6wGe6qESVX2Z9uP5UjE9gXJL">file 05-02</a>
<ul>
<li><strong>Focused</strong>: all</li>
</ul>
</li>
<li>Sven Birkerts, “Into the Electronic Millennium”, <em>Gutenberg Elegies</em>, pp. 117-133 <a href="https://drive.google.com/open?id=146FCBT-MV6gzLCQ11KH_Yg6yVTDJS5L2">file 05-03</a>
<ul>
<li><strong>Recommended</strong></li>
</ul>
</li>
<li>Sven Birkerts, “Perseus Unbound”, <em>Gutenberg Elegies</em>, pp. 134-140 <a href="https://drive.google.com/open?id=146FCBT-MV6gzLCQ11KH_Yg6yVTDJS5L2">file 05-03</a>
<ul>
<li><strong>Recommended</strong></li>
</ul>
</li>
<li>Sven Birkerts, “Hypertext: Of Mouse and Man”, <em>Gutenberg Elegies</em>, pp. 151-164 <a href="https://drive.google.com/open?id=146FCBT-MV6gzLCQ11KH_Yg6yVTDJS5L2">file 05-03</a>
<ul>
<li><strong>Recommended</strong></li>
</ul>
</li>
</ol>
<hr />
<h1>Traveling Imprimatur Demonstration (2017-12-15)</h1>
<p>In this demo, I’d like to show some of the early realizations of a system of quality control and imprimatur that can travel with an edition, freeing it from the confines of a particular publisher or particular presentation. In an <a href="http://lombardpress.org/2016/05/19/the-traveling-imprimatur">earlier post</a>, I described an early conception of this idea as a “traveling imprimatur”, but of late I have had some requests for live demonstrations of how this might work in production rather than just in theory. So here I want to offer a few more thoughts about why this idea is important before offering a video demonstration of this idea working in production.</p>
<h1 id="preface">Preface</h1>
<p>As preface, I’d like to recall why the idea of a traveling imprimatur is important and how it challenges outdated paradigms that are still unnecessarily directing how we migrate our shared cultural heritage to the new digital medium.</p>
<p>In a great article titled “Barely Beyond the Book?”, Joris van Zundert introduces an idea called “paradigmatic regression”.</p>
<p>Van Zundert describes acts of “paradigmatic regression” as:</p>
<blockquote>
<p>“acts of shaping that translate an expression of the paradigm of the new technology into an expression of a paradigm that is already known to the user.”</p>
<blockquote>
<p>(Joris van Zundert, “Barely Beyond the Book?” in <em>Digital Scholarly Editing: Theories and Practices</em>, eds. Matthew James Driscoll and Elena Pierazzo, (http://dx.doi.org/10.11647/OBP.0095.05), 83-106, 85)</p>
</blockquote>
</blockquote>
<p>I start with this idea because today many acts of publishing an edition online embody an act of paradigmatic regression.</p>
<p>The concept we are familiar with from the print world is that an edition is a thing that is experienced in one place. To experience a particular edition is to experience the presentation of this edition as represented in a particular published physical book. The experience of this particular edition is therefore exhausted by the presentation found in this printed book because this edition can be experienced nowhere else.</p>
<p>Accordingly, the imprimatur of an edition is tied to a particular presentation of this text, and thus is tightly coupled with the source or publisher of this presentation. If I want to view the edition that has been reviewed and carries the imprimatur of quality control, I can only view the text in the particular presentational form offered by a single publisher because, again, there is no other way for this edition to exist. The publisher who offers this presentation gains a monopoly over the “reviewed”, and therefore “authoritative”, text, because the review is associated with this particular presentation rather than the data underlying this presentation.</p>
<p>Today, we see acts of paradigmatic regression in the creation of digital editions because this paradigm is being re-enacted in the digital medium despite the fact that it is no longer necessary.</p>
<p>That is, all too often, we tend to see the essence of our edition as something that is presented on a particular webpage. If I want to experience that edition, I am required to travel to a particular page or website in order to encounter that edition.</p>
<p>Consequently, the way we think about review, quality control, and the imprimatur for this text continues to follow the old paradigm. A text is considered reviewed when a review is given for this particular online presentation of the edition. Thereby, the party responsible for this presentation on this particular website gains an unnecessary and often unearned monopoly over the reviewed and authoritative version of the text and the uses that can be made of it.</p>
<p>Thus, if one wants to see the reviewed text, one is needlessly forced to view that edition in one place and in one context only. Further uses and representations of this edition are prohibited precisely because the approval of the text is tied to a particular publication of this text rather than to the text itself. The authority and veracity of the imprimatur is once again tied to the source of the presentation, that is, the publisher or the website making the text visible, rather than to the data itself.</p>
<p>The big difference between the print enactment of this paradigm and the digital is that, in the latter case, the imprimatur is <strong>needlessly</strong> tied to the publisher rather than the text. It is no longer the medium that requires us to do this, but our “paradigmatic regression” to an older model with which we are already familiar and comfortable.</p>
<p>The digital medium makes it possible for us to decouple the imprimatur of a particular edition from whoever is publishing the text or whatever website at a given moment is presenting that text. In this way, the reviewed text becomes free for anyone to publish and free for anyone to make new and innovative uses of without ever losing its identity as the reviewed and authoritative text.</p>
<h1 id="demonstration">Demonstration</h1>
<p>In the following screencast, I want to offer some demonstrations of this new paradigm in action and of how this kind of “traveling imprimatur” can work in the real world. While still a work in progress, it is important to recognize that this is already operational and therefore technologically possible. Thus, the main obstacles to progress lie not in technological problems, but rather in generating the social and political will to adopt a new paradigm.</p>
<iframe width="100%" height="315" src="https://www.youtube.com/embed/oNzciuTgjr8" frameborder="0" gesture="media" allow="encrypted-media" allowfullscreen=""></iframe>
<h1>Politics and Society: The Patristic Legacy in the Middle Ages (2017-11-22)</h1>
<hr />
<h5 id="workshop-proposal-and-call-for-papers-for">Workshop Proposal and Call for Papers for:</h5>
<h4 id="xviiith-international-conference-on-patristics-studies">XVIIIth International Conference on Patristic Studies</h4>
<p>Oxford University
19 August-24 August 2019</p>
<hr />
<h3 id="politics-and-society-the-patristic-legacy-in-the-middle-ages">Politics and Society: The Patristic Legacy in the Middle Ages</h3>
<p>a workshop organized by John T. Slotemaker, Fairfield University and Jeffrey C. Witt, Loyola University Maryland</p>
<hr />
<p>The XVIIIth Oxford Patristics Conference (hereafter OPC) will take place in the Examination Schools on High Street, Oxford during August of 2019. The general call for papers has been issued (see: www.oxfordpatristics.com) and the deadline for both short communications and workshops is 31 August 2018. The present call for papers is to organize a workshop on <em>Politics and Society: The Patristic Legacy in the Middle Ages</em> within the <em>nachleben</em> (lit. ‘afterlife’) subdivision of the OPC.</p>
<p>The theme of this year’s workshop is <em>Politics and Society</em> broadly conceived. We invite proposals that examine how medieval thinkers used the Patristic inheritance to develop their own political and social worldviews. Papers might address questions such as: How did Patristic authors shape the way medieval thinkers theorized the proper relationship between church and state, or of an individual to his or her family? How were particular Patristic quotations used or misused to support various medieval political or social agendas? How did Patristic authors encourage or prevent medieval multi-cultural or inter-religious interactions? How were Patristic authors used to shape law (civil or canon) and legal institutions? How were Patristic authors used to guide or direct various social practices such as baptism, marriage, or last rites?</p>
<p>The theme is conceived broadly, and we are eager to consider proposals from a wide variety of points of view, including historical, theological, philosophical, and sociological perspectives. We are likewise interested in expanding our horizons and expectations of where Patristic sources were used in the Middle Ages: to that end, we encourage papers that look chronologically beyond the scholasticism of the 13th century (to both the early and the later Middle Ages) and that draw on a variety of sources (theological treatises, canon law, biblical commentaries, sermons, etc.).</p>
<p>If you wish to join this workshop, please consider submitting a proposal to John Slotemaker or Jeff Witt (<a href="mailto:johnslotemaker@gmail.com">johnslotemaker@gmail.com</a>, <a href="mailto:jeffreycwitt@gmail.com">jeffreycwitt@gmail.com</a>). We will be accepting proposals for this workshop through 30 June 2018. The workshop will consist of 12 papers, with each paper allotted 20 minutes plus 10 minutes for discussion. At the conclusion of the workshop, participants will be invited to submit their contributions as part of a collected volume to be published with Studia Patristica.</p>
<p><em>Nota bene</em>: by accepting your proposal we will assume your participation in the workshop and your desire to publish the essay with <em>Studia Patristica</em>.</p>
<h1>IIIF and Linked Data Notifications - Thoughts and Reflections (2017-02-28)</h1>
<p>A post by Jeffrey Witt (@jeffreycwitt)</p>
<h1 id="introduction">Introduction</h1>
<p>In the following, I offer some reflections on how the <a href="http://iiif.io">IIIF community</a> could use the emerging <a href="https://www.w3.org/TR/ldn/">Linked Data Notification</a> specification to facilitate the sharing of IIIF resources between research groups and libraries. This post is a sequel and companion to <a href="http://lombardpress.org/2016/04/16/iiif-webmentions/">my earlier description</a> of how Rafael Schwemmer (of text &amp; bytes and e-codices) and I used the <a href="https://www.w3.org/TR/webmention/">Webmention</a> specification to achieve similar results. See also my related post on using <a href="http://lombardpress.org/2017/01/24/linking-research/">linked data notifications to share discussions between connected resources</a>.</p>
<p>Caveat: none of the following has the approval or authority of the IIIF community; it is entirely speculative and experimental, designed primarily to move the discussion forward.</p>
<p>The main outcomes desired are as follows: First, we would like to create an automated way of allowing content providers to “announce” the publication of IIIF content (usually “supplemental”, i.e. a non-manifest resource) that has some kind of relationship or relevance to other IIIF content (usually, a manifest), particularly in cases where these relationships are not made explicit within the resource itself. Second, we want to create a standard serialization of these “announcements” and “content publications” so that users of this content can develop automated workflows for incorporating this related data into their systems.</p>
<h1 id="general-use-cases">General Use Cases:</h1>
<p>To understand the motivation behind these goals, it is helpful to look at a few emerging use cases.</p>
<p>The <a href="http://scta.info">SCTA</a> publishes a large number of IIIF ranges, transcription layers, and search services as separate stand-alone IIIF resources that relate to manifests, canvases, and images published and maintained by several independent libraries.</p>
<p>The <a href="https://www.princeton.edu/~geniza/">Princeton Geniza Lab</a> similarly maintains a database of transcriptions of Hebrew manuscripts scattered in more than 70 libraries.</p>
<p>The SCTA and Geniza Lab, despite being different projects with different datasets, should be able to adopt one common solution for announcing and publishing their “supplemental” data that can be understood and consumed by a plurality of libraries.</p>
<p>Moreover, the SCTA and Geniza Lab both, independently, have “supplemental” (non-manifest) data relevant to artifacts in the same libraries. For example, both independent research groups have “supplemental” data about manuscripts at the University of Pennsylvania and Cambridge.</p>
<p>The University of Pennsylvania and Cambridge should be able to receive, ingest, and use information from both research groups with one common workflow. In other words, they should not have to develop one mechanism to ingest information from the SCTA and a second workflow to ingest information from the Princeton Geniza Project.</p>
<p>Again, the SCTA has transcriptions and complicated ranges for manuscripts in the Harvard University collection. As Harvard thinks about building a IIIF workspace in which scholars can work, it would be nice if the workspace could automatically alert the user to available transcriptions, ranges, or services related to the canvas currently in focus. In an ideal world, Harvard would not even need to modify its original manifest, but the workspace could simply offer an “alert” to the user. The user could then decide to bring in the “foreign” content if they wanted to.</p>
<p>Ideally, we would like to achieve something like the following:</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/ldn-visualizations.png" alt="ldn-visualization" /></p>
<p>Or</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/ldn-visualizations1.png" alt="ldn-visualization" /></p>
<h1 id="new-attempts-with-linked-data-notifications">New attempts with Linked Data Notifications</h1>
<p>In an <a href="http://lombardpress.org/2016/04/16/iiif-webmentions/">earlier post</a>, we described trying to facilitate this data sharing via <a href="https://www.w3.org/TR/webmention/">Webmentions</a>. Here we consider what this might look like using <a href="https://www.w3.org/TR/ldn/">Linked Data Notifications</a>. Some previous discussion of the topic can be found on the IIIF-discuss board <a href="https://groups.google.com/forum/#!topic/iiif-discuss/DMGdfHcfH8o">here</a>.</p>
<h2 id="example-notifications">Example Notifications</h2>
<h3 id="example-1">Example 1</h3>
<p>Layer Notification: <a href="http://scta.info/iiif/rothwellcommentary/wettf15/notification/layer/transcription">http://scta.info/iiif/rothwellcommentary/wettf15/notification/layer/transcription</a></p>
<p>Compare to the earlier Webmention Layer Supplement: <a href="http://scta.info/iiif/rothwellcommentary/wettf15/supplement/layer/transcription">http://scta.info/iiif/rothwellcommentary/wettf15/supplement/layer/transcription</a></p>
<p>This is what I see as the simplest and perhaps IDEAL case. It is the announcement of an available layer related to an e-codices manifest. The wrapper is very simple. There is an “id” for the sender’s notification, a “source” to indicate the domain from which the announcement comes, a “target” (i.e. the manifest to which the announced material is related), and then the “object”. The object in this case is just the URL ID to the “supplemental” non-manifest layer that can be de-referenced independent of the notification or manifest.</p>
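The wrapper just described can be sketched as a small JSON structure. A minimal sketch, assuming the property names described above (id, source, target, object); the target manifest URL below is a hypothetical stand-in, not the real e-codices manifest URL:

```python
import json

# A minimal sketch of the layer-notification wrapper described above.
# "target" is a hypothetical manifest URL; the SCTA URLs are the ones
# from the example notification linked in the post.
notification = {
    "id": "http://scta.info/iiif/rothwellcommentary/wettf15/notification/layer/transcription",
    "source": "http://scta.info",
    "target": "https://example.org/e-codices/wettf15/manifest.json",  # hypothetical
    "object": "http://scta.info/iiif/rothwellcommentary/wettf15/layer/transcription",
}

print(json.dumps(notification, indent=2))
```

Because the object is just a URL, a consumer can de-reference it on its own, with or without the surrounding notification.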
<h3 id="example-2">Example 2</h3>
<p>Service Notification: <a href="http://scta.info/iiif/rothwellcommentary/wettf15/notification/service/searchwithin">http://scta.info/iiif/rothwellcommentary/wettf15/notification/service/searchwithin</a></p>
<p>Compare to the earlier Webmention Service Supplement: <a href="http://scta.info/iiif/rothwellcommentary/wettf15/supplement/service/searchwithin">http://scta.info/iiif/rothwellcommentary/wettf15/supplement/service/searchwithin</a></p>
<p>This example is fairly similar, except that the object does not point to a de-referencable link but provides the JSON object itself. There are no examples of a <code class="highlighter-rouge">@type: "service"</code> in the IIIF Search API, but I added it here because I expect that the client would use the <code class="highlighter-rouge">@type</code> property to know what kind of information is being announced and what to do with it. (This does, however, conflict with an example in the IIIF documentation where the value of “type” in the service block was “feature”; see <a href="http://iiif.io/api/annex/services/#geojson">http://iiif.io/api/annex/services/#geojson</a>. Something else besides “type” could be used; however, on this approach, it would have to be the same property on all announced objects.)</p>
<p>Once the client knows that it is a “service” and not a “layer” or “range”, it can check the service “profile” to know what kind of service it is and whether or not to incorporate it.</p>
<h3 id="example-3">Example 3</h3>
<p>Range Notification: <a href="http://scta.info/iiif/rothwellcommentary/wettf15/notification/ranges/toc">http://scta.info/iiif/rothwellcommentary/wettf15/notification/ranges/toc</a></p>
<p>Compare to the earlier Webmention Ranges Supplement: <a href="http://scta.info/iiif/rothwellcommentary/wettf15/supplement/ranges/toc">http://scta.info/iiif/rothwellcommentary/wettf15/supplement/ranges/toc</a></p>
<p>Here is a range announcement. The “object” property takes a single object that wraps a flat list of all the connected ranges being announced. The <code class="highlighter-rouge">@type</code> can be used to recognize this as a range. The <code class="highlighter-rouge">viewingHint</code> is set to “wrapper” to alert the client that this is a wrapper and should be discarded. Using a “wrapper” range like this also allows me to create a de-referencable id for the entire set of ranges (e.g. <a href="http://scta.info/iiif/rothwellcommentary/wettf15/ranges/toc/wrapper">http://scta.info/iiif/rothwellcommentary/wettf15/ranges/toc/wrapper</a>). Such a de-referencable collection of ranges would also allow me to provide just the link as the value of the “object” (as in “example 1” above). Further, if I had several different range sets for this manifest, I could send them to e-codices all at once as an array of de-referencable links to range wrappers.</p>
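The wrapper pattern and the client-side step of discarding it can be sketched as follows. A hedged sketch: the child range ids and labels are illustrative, not real SCTA data; only the wrapper id comes from the example above.

```python
# Sketch of the "wrapper" range pattern described above: a top-level range
# with viewingHint "wrapper" carries the flat list of real ranges, giving
# the whole set one de-referencable id. Child ids/labels are illustrative.
wrapper = {
    "@id": "http://scta.info/iiif/rothwellcommentary/wettf15/ranges/toc/wrapper",
    "@type": "sc:Range",
    "viewingHint": "wrapper",
    "ranges": [
        {"@id": "https://example.org/ranges/b1-d1", "@type": "sc:Range",
         "label": "Book 1, Distinction 1"},
        {"@id": "https://example.org/ranges/b1-d2", "@type": "sc:Range",
         "label": "Book 1, Distinction 2"},
    ],
}

def unwrap(r):
    """Client-side step: discard a wrapper range, keeping its contents."""
    if r.get("viewingHint") == "wrapper":
        return r["ranges"]
    return [r]

toc = unwrap(wrapper)
```

The client keeps only the contained ranges; the wrapper itself never appears in the table of contents it builds.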
<h1 id="sending-a-notification">Sending a Notification</h1>
<p>Sending a notification is a simple POST request.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/bash_send_notification.png" alt="bash_send_notification" /></p>
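The POST shown in the screenshot above can be sketched in Python. A hedged sketch: the inbox URL and target manifest are hypothetical stand-ins, and the request is built but not actually sent, to keep the example self-contained.

```python
import json
import urllib.request

# Sketch of sending an LDN notification: POST the JSON-LD body to the
# receiver's inbox. Inbox and target URLs here are hypothetical.
inbox = "https://example.org/inbox/"
body = json.dumps({
    "target": "https://example.org/e-codices/wettf15/manifest.json",
    "object": "http://scta.info/iiif/rothwellcommentary/wettf15/layer/transcription",
}).encode("utf-8")

req = urllib.request.Request(
    inbox,
    data=body,
    method="POST",
    headers={"Content-Type": "application/ld+json"},
)
# urllib.request.urlopen(req)  # would actually send the notification
```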
<h1 id="the-inbox">The Inbox</h1>
<p>“The Inbox” is a service described by the LDN spec; it accepts POST requests carrying announcements from “senders” and offers a list of notifications in response to GET requests from “consumers”.</p>
<p>On a generic GET request to the inbox endpoint, the inbox should return a list of received notifications.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/unfiltered_notifications.png" alt="unfiltered_notifications" /></p>
<p>On a GET request for a particular notification, the notification itself should be returned.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/single_notifications.png" alt="single_notifications" /></p>
<p>I have also modified this inbox, so a user/client could request a list of resources related to a particular manifest (or other resource).</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/filtered_announcements.png" alt="filtered_announcements" /></p>
<p>Now, theoretically, UPenn, Harvard, or Cambridge could just send a request to this inbox to see if there are any announcements about resources related to their own manifests.</p>
<p>In return, they will receive a list of notifications that they can crawl. They can then crawl the resources announced via these notifications and incorporate them into their own systems however they see fit.</p>
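The inbox behavior just described can be sketched as a small in-memory model: senders POST notifications, consumers GET the list, optionally filtered by the manifest they care about. This is a stand-in for the HTTP endpoints shown in the screenshots, not the actual inbox implementation; all URLs are illustrative.

```python
# In-memory sketch of the LDN inbox described above.
inbox = []

def receive(notification):
    """Stand-in for a sender's POST: store the notification."""
    inbox.append(notification)

def notifications(target=None):
    """Stand-in for a consumer's GET: all notifications, or only those
    whose 'target' matches a particular manifest."""
    if target is None:
        return list(inbox)
    return [n for n in inbox if n.get("target") == target]

receive({"target": "https://example.org/penn/manifest.json",
         "object": "https://example.org/ranges/toc"})
receive({"target": "https://example.org/harvard/manifest.json",
         "object": "https://example.org/layer/transcription"})

# A library asks only about its own manifest.
penn = notifications("https://example.org/penn/manifest.json")
```

The filtered request is exactly what lets a library poll one inbox and pull back only the supplemental resources relevant to its own collection.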
<h1 id="final-thoughts-and-reflections">Final Thoughts and Reflections</h1>
<p>What role would notifications play if there was a IIIF directory/registry (built from crawlers and sitemaps) that listed all acknowledged IIIF resources (not just manifests, but independent services, ranges, layers, etc)?</p>
<p>In this world, notifications would seem to be of primary use for announcing “updates”.
But, if the content of my range list changes or improves, what actually needs to be updated? Presumably, a registry of resources would store just the link to the content I am publishing. In this case, if my content updates, the URL would remain the same, and clients using this information would automatically get the most up-to-date information. The only update that seems necessary, then, is the “announcement” of a new resource (a new URL) that the crawler did not capture the first time around.</p>
<p>However, at present, the announcement wrapper seems to provide another <strong>CRITICAL</strong> service besides just the announcement of an update. The announcement wrapper is the <strong>only way</strong> (that I know of) to link, via the “target” property, a resource (for example, a range list) with a foreign manifest on another system.
Normally, a manifest is responsible for containing all the links that “lead out” to all connected resources. But here we are considering a case where a manifest does not know, ahead of time, about these connected resources. Currently, the IIIF API does not provide a mechanism to discover manifests from related supplemental material. Therefore, we need a mechanism to “lead in” from external resources to a manifest. At the moment, the announcement wrapper performs this function.</p>
<p>Compare, for example, the two links below:</p>
<ul>
<li>A notification of a set of ranges
<a href="http://scta.info/iiif/rothwellcommentary/wettf15/notification/ranges/toc">http://scta.info/iiif/rothwellcommentary/wettf15/notification/ranges/toc</a></li>
<li>And then the same set of ranges without the notification as external wrapper
<a href="http://scta.info/iiif/rothwellcommentary/wettf15/ranges/toc/wrapper">http://scta.info/iiif/rothwellcommentary/wettf15/ranges/toc/wrapper</a></li>
</ul>
<p>In the latter case, the list of ranges includes no references to the manifest, but only links to the canvas IDs. So, how can a crawler, by itself, make the association between this set of ranges and a foreign manifest that includes identical canvases?</p>
<p>The notification wrapper gives us a way to connect resources, even if the manifest does not contain the necessary connecting links within itself.</p>
<h1>Linking Research, the SCTA, LombardPress, and LinkedData Notifications (2017-01-24)</h1>
<div id="aim-of-the-scta">
<h2>Aims of the SCTA</h2>
<p>
<a href="http://scta.info">The Scholastic Commentaries and Texts Archive (SCTA)</a> is an RDF database, designed to generate unique RDF IDs for granular components, linking all paragraphs and sections together through a variety of relationships (isPartOf, references, abbreviates, copies, isRelatedTo, etc.) and linking these text parts to their manifestations in various books, manuscripts, and digital transcriptions (see <a href="http://lombardpress.org/2016/06/12/DTS-modeling-proposal/">http://lombardpress.org/2016/06/12/DTS-modeling-proposal/</a>).
</p>
<p>
A key feature of this approach is that we can create relationships across the entire corpus: relationships between discrete sections of enormous texts written over 500 years of continuous discourse. Each time a new text is edited and sources are identified, these asserted relationships can be inverted, and we can, for example, automatically collect all the places a paragraph written in the 12th century is discussed or referenced over the next 500 or so years of medieval thought.
</p>
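The inversion just described can be sketched in a few lines. A hedged sketch: the resource ids below are illustrative, not real SCTA ids, and the real system works over RDF triples rather than Python tuples.

```python
# Sketch of inverting asserted relationships: each "references" assertion
# (citing paragraph -> cited paragraph) is flipped so that, starting from
# any paragraph, we can list everything that later cites it.
references = [
    ("http://scta.info/resource/commentary1350-p4", "http://scta.info/resource/lombard-p9"),
    ("http://scta.info/resource/commentary1410-p2", "http://scta.info/resource/lombard-p9"),
]

referenced_by = {}
for citing, cited in references:
    referenced_by.setdefault(cited, []).append(citing)

# Everything that discusses the 12th-century paragraph, collected automatically.
later_discussions = referenced_by["http://scta.info/resource/lombard-p9"]
```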
</div>
<div id="unrealized-potential">
<h2>Unrealized Potential of Linking Secondary and Primary Sources</h2>
<p>This approach taps <em>only some</em> of the potential of linked research. While it helps us to link together the primary sources of the corpus as they become available, we do not yet have a mechanism to link together the many secondary articles quoting, referencing, and analyzing various parts of the SCTA corpus.</p>
<p>In an ideal world, we would like an automated way to collect (or be notified about) any discrete section of a primary source text within the corpus that has been cited or discussed in any secondary article.</p>
<p>With a list of referencing secondary articles, we can, in our display to the user, offer a list of distributed secondary articles (i.e. hosted anywhere) that discuss the primary source passage in question.</p>
</div>
<div id="using-linkeddata-notifications">
<h2>Using LinkedData Notifications to connect distributed scholarly discussion</h2>
<p>What might this look like in practice?</p>
<p>Let's imagine I'm writing an article about a topic discussed in scholastic philosophy and I'm quoting primary source material from the scholastic corpus. Because of the possibilities inherent in semantic markup, my authoring platform (currently the Jekyll blog you are reading) can embed meaningful metadata into every citation. Using <a href="https://rdfa.info/">RDFa</a>, my quotation of a passage can include a reference to the URL for that cited passage, where property="cito:discusses" and resource="http://scta.info/resource/b1d3qun-qnveid" are added to the blockquote element as attributes. (Think of this as a cutting-edge research footnote, designed for creating connections rather than for being siloed at the bottom of a printed page.) In this case, the property describes the relationship between the quotation and the targeted resource. Such a reference might look like the following:</p>
<blockquote property="cito:discusses" resource="http://scta.info/resource/b1d3qun-qnveid">Quod non videtur, quia secundum Augustinus in Sermone communi de uno martyre "si servasset in se homo bonum quod in illo creavit Deus, id est imaginem suam, semper laudaret dictum non solum lingua sed et vita" etc.</blockquote>
<p>Following the emerging specifications for <a href="https://www.w3.org/TR/ldn/">Linked Data Notifications</a>, this embedded link becomes the linchpin for aggregating a distributed discussion. Each resource in the SCTA database has an associated inbox, which can be found by simply de-referencing the targeted resource, searching for the property <em>http://www.w3.org/ns/ldp#inbox</em>, and retrieving the value of that property. Now, when this article is published, a "notification" that this target resource is being discussed in this article can be sent to the resource inbox. This notification is saved in the respective inbox and awaits use and consumption by other clients interacting with this resource.</p>
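The discovery-and-notify flow just described can be sketched as follows. A hedged sketch: the JSON-LD below stands in for what de-referencing the cited SCTA resource might return, and the inbox and article URLs are hypothetical; only the resource id and the ldp#inbox property name come from the text above.

```python
import json

# Stand-in for the JSON-LD returned by de-referencing the cited resource.
resource = json.loads("""
{
  "@id": "http://scta.info/resource/b1d3qun-qnveid",
  "http://www.w3.org/ns/ldp#inbox": {"@id": "https://example.org/inbox/b1d3qun-qnveid"}
}
""")

# Find the inbox by retrieving the value of the ldp#inbox property.
ldn_inbox = resource["http://www.w3.org/ns/ldp#inbox"]["@id"]

# The notification linking my article to the discussed passage.
notification = {
    "object": "https://example.org/my-article",  # hypothetical article URL
    "target": resource["@id"],
    "property": "cito:discusses",
}
# POST `notification` to `ldn_inbox` with Content-Type: application/ld+json
```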
<p>The following video shows the above described interactions in action.</p>
<iframe width="100%" height="400" src="https://www.youtube.com/embed/tM4-G7NZ4b8" frameborder="0" allowfullscreen></iframe>
</div>
<h1 id="digital-scholarly-editions-and-api-consuming-applications">Digital Scholarly Editions and API Consuming Applications (2016-11-02)</h1>
<p>Below is a video recording of my talk “Digital Scholarly Editions and API Consuming Applications” given at the University of Graz, September 24th, 2016.</p>
<iframe width="100%" height="450" src="https://www.youtube.com/embed/cI99Q_929Dg" frameborder="0" allowfullscreen=""></iframe>
<p>Comments and thoughts welcome.</p>
<h1>Creating Dynamic Custom IIIF Manifests and the Importance of Great Data (2016-10-22)</h1>
<h1 id="introduction">Introduction</h1>
<p>There is a lot of interest of late within the IIIF community to create GUI tools allowing scholars to explore material and create custom manifests or custom tables of contents.</p>
<p>This is all well and good. It is important, and it has a place. But I want to make sure we are not overlooking the power inherent in the production of strong data models and the publication of open access data.</p>
<p>In this post I describe a couple of examples of how strong data models and open data can allow us to construct <strong>dynamic</strong> IIIF manifests and collections (curated from libraries throughout the world) with a speed and scale that individual GUI manifest constructors cannot compete with.</p>
<h1 id="example-1-text-collections">Example 1: Text collections</h1>
<p>Over the last month, most of my work has involved implementing the Manifestation Surface data model I described in an <a href="/2016/08/09/surfaces-canvases-and-zones/">earlier post</a>. A central motivation behind this model and my implementation work is that the focus of the SCTA and LombardPress is slightly different than most of the main players in the IIIF community right now. Many IIIF implementers are primarily focused on building IIIF collections of codices that mirror their physical collections.</p>
<p>The SCTA, however, is an archive that has no physical collections. Rather, we collect ideas. Or, more specifically, Expressions of texts and their Manifestations. (See my <a href="/2016/06/12/DTS-modeling-proposal/">earlier post</a> for a description of the modified FRBR model we use at the SCTA.) These Expressions have their own hierarchies that do not correspond directly to the material hierarchy of a codex.</p>
<p>For example, an Expression may have Manifestations in many codices scattered throughout the world. Moreover, these Manifestations often constitute only a part of a codex, and many Expression Manifestations span several material codices.</p>
<p>A IIIF Manifest that focuses simply on the presentation of a full codex is great for many purposes, particularly for codicological studies. But when a scholar is focused on an Expression of a text and wants to see the Manifestations of that text, simply providing the scholar with a list of codices in which this Expression or part of this Expression is found leaves a lot of work still to be done. Further, at least in the world of medieval philosophy and theology texts, it is extremely common for a scholar to be an expert on a particular section of a text. For example, a scholar may be doing research only on Book I of William Rothwell’s commentary on Lombard’s <em>Sentences</em>. In this case, if a scholar asks for a list of Manifestations of said commentary and receives a list of every codex containing this commentary, they will end up with a lot of noise. Specifically, they will receive a list of codices that may contain only books 3 or 4 of the commentary. It will take the scholar further work to filter out which codices are relevant and which are not. Likewise, the reverse can happen. A scholar may be presented with codices that contain two or three other texts besides the text in question. They must then navigate into the codex and often spend a long time trying to find where the relevant text begins. In both cases, a further strain is placed on the scholar. This strain can be alleviated when we offer them dynamic collections of dynamic manifests that only display the texts or parts of texts in which they are interested.</p>
<p>The following screen shots show what our dynamic collections and manifests can offer the scholar.</p>
<p>In the first screen shot, you can see that we can produce a manifest of the codex 686 in the University of Pennsylvania collection (courtesy of <a href="http://openn.library.upenn.edu/">OPENN</a>).</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/Penn-Rothwell-Manifest.png" alt="Penn-Rothwell-Manifest" /></p>
<p>As you can see, this codex contains approximately 232 pages. But as a researcher working on the text of William of Rothwell, my interest is not in this codex directly, but rather in all the Manifestations of Rothwell’s texts. Thus I need, first, the capability to build dynamic collections that can show all Manifestations of this Text/Expression. Second, I also need the capability to build dynamic manifests that can provide the user with only those pages that include the relevant part of Rothwell’s text.</p>
<p>As one can see in the image below, the Penn text contains considerably fewer pages that correspond to the Rothwell text than are found in the codex as a whole. The rest of the pages correspond to an entirely different text. Nor do I want to be confined to Penn manuscripts only, since this same Expression also has Manifestations in the e-codices collection and the Royal Danish Library.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/RothwellText.png" alt="Rothwell-Text" /></p>
<p>Further, it is quite likely that I’m not even interested in the entirety of Rothwell’s commentary. Rather, I may only be interested in Book 1 of his commentary. In the screen shot below, we give users the option to create a dynamic collection for only Book 1 of Rothwell’s commentary. What should be noticed here is that this collection no longer includes a manifest from the Royal Danish Library, because that particular manuscript contains only Book 4 of Rothwell’s commentary. Thus, if we only gave the researcher a collection of entire codices that contain some part of Rothwell’s text, he or she would be immediately misled to think that there are three manifestations of the text of interest rather than two. Through dynamic collections like this, we hope to spare scholars the tedious labor of hunting for the material of actual interest and, in turn, help them find exactly what they need and then get to work.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/Rothwell-Book1.png" alt="Rothwell-Book1" /></p>
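<p>As a rough illustration, a dynamic collection like the Book 1 collection shown above could be serialized as a small IIIF Presentation 2.x collection document. All URLs below are invented placeholders for the sake of the sketch, not actual SCTA endpoints; note that the Royal Danish Library witness is simply absent, since that manuscript contains only Book 4:</p>

```json
{
  "@context": "http://iiif.io/api/presentation/2/context.json",
  "@id": "https://scta.example.org/iiif/rothwell-book1/collection",
  "@type": "sc:Collection",
  "label": "Rothwell, Sentences commentary, Book 1 (dynamic)",
  "manifests": [
    {
      "@id": "https://scta.example.org/iiif/rothwell-book1/penn-686/manifest",
      "@type": "sc:Manifest",
      "label": "UPenn Ms. Codex 686 (Book 1 pages only)"
    },
    {
      "@id": "https://scta.example.org/iiif/rothwell-book1/ecodices/manifest",
      "@type": "sc:Manifest",
      "label": "e-codices witness (Book 1 pages only)"
    }
  ]
}
```

<p>Because the collection is generated from the underlying RDF data rather than hand-curated, it can be rebuilt automatically whenever a new witness is registered.</p>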
<h1 id="example-2-custom-query-manifests">Example 2: Custom Query Manifests</h1>
<p>The second example is more experimental but also exciting. Using the SCTA metadata about our texts and their connections to Surfaces and Canvases, we can allow researchers (or the technical staff supporting a particular research group with particular research needs) to create dynamic manifests from custom SPARQL queries.</p>
<p>The admittedly lengthy SPARQL query shown below is one such example.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SELECT ?top_level ?top_level_title ?surface ?surface_title ?isurface ?canvas ?canvas_label ?canvas_width ?canvas_height ?image_height ?image_width ?image_type ?image_format ?image_service ?image_service_profile ?anno ?resource
{
?element &lt;http://scta.info/property/structureType&gt; &lt;http://scta.info/resource/structureElement&gt; .
?element &lt;http://scta.info/property/isInstanceOf&gt; &lt;http://scta.info/resource/hebr11_1&gt; .
?element &lt;http://scta.info/property/isPartOfStructureBlock&gt; ?paragraph .
?paragraph &lt;http://scta.info/property/isPartOfTopLevelExpression&gt; ?top_level .
?top_level &lt;http://purl.org/dc/elements/1.1/title&gt; ?top_level_title .
?paragraph &lt;http://scta.info/property/hasManifestation&gt; ?manifestation .
?manifestation &lt;http://scta.info/property/hasSurface&gt; ?surface .
?surface &lt;http://purl.org/dc/elements/1.1/title&gt; ?surface_title .
?surface &lt;http://scta.info/property/hasISurface&gt; ?isurface .
?surface &lt;http://scta.info/property/order&gt; ?order .
?isurface &lt;http://scta.info/property/hasCanvas&gt; ?canvas .
?canvas &lt;http://www.w3.org/2000/01/rdf-schema#label&gt; ?canvas_label .
?canvas &lt;http://www.w3.org/2003/12/exif/ns#width&gt; ?canvas_width .
?canvas &lt;http://www.w3.org/2003/12/exif/ns#height&gt; ?canvas_height .
?canvas &lt;http://iiif.io/api/presentation/2#hasImageAnnotations&gt; ?bn .
?bn &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#first&gt; ?anno .
?anno &lt;http://www.w3.org/ns/oa#hasBody&gt; ?resource .
?resource &lt;http://www.w3.org/2003/12/exif/ns#height&gt; ?image_height .
?resource &lt;http://www.w3.org/2003/12/exif/ns#width&gt; ?image_width .
?resource &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt; ?image_type .
?resource &lt;http://purl.org/dc/elements/1.1/format&gt; ?image_format .
?resource &lt;http://rdfs.org/sioc/services#has_service&gt; ?image_service .
OPTIONAL{
?image_service &lt;http://usefulinc.com/ns/doap#implements&gt; ?image_service_profile .
}
OPTIONAL{
?image_service &lt;http://purl.org/dc/terms/conformsTo&gt; ?image_service_profile .
}
}
ORDER BY ?top_level
</code></pre></div></div>
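<p>To give a sense of the pipeline, the sketch below shows how rows returned by a query like the one above might be folded into a minimal IIIF Presentation 2.x manifest. The function and row keys are illustrative assumptions (mirroring the SELECT variables above), not the actual SCTA build code:</p>

```python
# Sketch: turn SPARQL result rows into a minimal IIIF Presentation 2.x
# manifest. Row keys mirror the SELECT variables in the query above;
# this is an illustration, not the real SCTA build script.

def build_manifest(rows, manifest_id, label):
    """Group result rows by canvas and emit a manifest as a plain dict."""
    canvases = []
    seen = set()
    for row in rows:
        # Each canvas may appear in several rows; keep the first only.
        if row["canvas"] in seen:
            continue
        seen.add(row["canvas"])
        canvases.append({
            "@id": row["canvas"],
            "@type": "sc:Canvas",
            "label": row["canvas_label"],
            "height": row["canvas_height"],
            "width": row["canvas_width"],
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "on": row["canvas"],
                "resource": {
                    "@id": row["resource"],
                    "@type": "dctypes:Image",
                    "height": row["image_height"],
                    "width": row["image_width"],
                },
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": manifest_id,
        "@type": "sc:Manifest",
        "label": label,
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
    }
```

<p>In practice a script like this would run against the SPARQL endpoint, ordering canvases by the <code>?order</code> property before grouping.</p>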
<p>The screen shot below shows a number of examples using the above query to build dynamic manifests that include content from multiple providers in the same manifest.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/custom-manifests.png" alt="custom-manifests" /></p>
<p>While the SPARQL query is complicated, it allows us to ask the computer a question and to construct a manifest in response, rather than having to use a GUI to manually create such a manifest. A query like the one shown above could be used for all kinds of amazing research and pedagogical purposes. For example, we could ask the data set to construct a manifest for every page that contains a marginal note and then order those results by date. Such a query could be used to study how citation and reference practices changed over time. Again, we could ask the data set to show us a manifest of every instance of the name Augustine, and then sort those pages by date, region, and scribal hand, so that we could see how spellings and abbreviations of Augustine changed over time.</p>
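<p>The marginal-note example could take a shape like the sketch below. To be clear, the property and resource names here are invented for illustration and are not attested in the SCTA vocabulary:</p>

```sparql
# Hypothetical sketch only: property URIs below are invented
# for illustration, not drawn from the actual SCTA dataset.
SELECT ?canvas ?date
WHERE {
  ?note <http://scta.info/property/noteType> <http://scta.info/resource/marginalNote> .
  ?note <http://scta.info/property/isOnCanvas> ?canvas .
  ?note <http://scta.info/property/hasDate> ?date .
}
ORDER BY ?date
```

<p>The ordered canvas list returned by such a query could then be fed directly into a manifest builder like the one sketched above.</p>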
<p>With strong data models and open data, there seems to be no limit to the kind of questions we can ask and the kinds of manifests we can build.</p>
<p>Questions and comments welcome :)</p>IntroductionCreating an aggregated dataset from distributed sources - a report from the 2016 Basel meeting.2016-08-25T00:00:00+00:002016-08-25T00:00:00+00:00jeffreycwitt.com/2016/08/25/basel-workshop-report<p>The following is a report and summary of the main proposal discussed at the “Linked Data and the Medieval Scholastic Tradition,” workshop held at the University of Basel in August 17-19, 2016.</p>
<h1 id="the-problem-domain">The Problem Domain</h1>
<p>The Basel workshop was attended by representatives of several separate research projects based in Europe dealing with some aspect of the medieval scholastic tradition. These projects ranged from Sentences commentaries to Aristotelian commentaries to logical texts and logical commentaries. Each group aims to create data and to display that data both in print and in various online formats.</p>
<p>The central problem we observed is that, because at present each group works fairly independently, each team has developed its own ‘silo’ of 1) data input and creation, 2) data storage, and 3) data display.</p>
<p>There are a couple of notable problems with this approach.</p>
<p>First, this requires each group to construct a technology stack that, despite various differences, conforms to a fairly standard pattern. This results in several redundant technology stacks that are expensive and difficult to maintain. For example, each group is developing some kind of web form or data input interface. Most groups are then storing this data in a traditional relational MySQL database. Finally, each group is developing a web display that queries this database.</p>
<p>Second, because each group is developing this data pipeline on a single isolated server, their data is effectively isolated from the data of other related research groups. This causes two further problems. First, this isolation means that each group is in many cases producing highly redundant data. If a group needs prosopographical data, they have no choice but to create their own prosopography, even if another group has already created a similar prosopography stored on its own isolated server. Second, because each research group is dealing with a corner of a highly connected dataset, even when they are not producing redundant data, they are usually missing out on the opportunity to create and discover connections between their data and the data created by other groups.</p>
<p>Third and finally, because data creation interfaces, data storage, and data display interfaces are so tightly coupled, we are missing out on the opportunity to create re-usable interfaces and modular software. In other words, at present, building a great display application requires someone to also set up her own storage solution and populate that database with her own data, instead of simply being able to build a great display application using the data already being created by other groups.</p>
<p>We have summarized the basic problem of data siloing in the following graphic.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/2016-08-25-basel-workshop-report/data-silo-example.png" alt="data-silo-example" /></p>
<h1 id="a-proposal">A Proposal</h1>
<p>The proposal presented at the Basel workshop aimed primarily at de-coupling the distinct tasks of data-creation, data-storage, and data-display, while still allowing individual projects complete autonomy with their own data.</p>
<p>The central proposal is to create RDF IDs for every common resource within our common problem domain, and then to allow independent research groups to publish the data they have about this common resource according to a common data standard, such as a customized TEI schema or Open Annotation.</p>
<p>Groups that want their information pooled into a common dataset simply need to register these “data feeds” with a common registry. Using this registry, we can write a “build-script” that crawls all known resources and constructs a RDF dataset. This build-script would harvests key pieces of information about common resources as well as links to individual project datasets and other global data sources such as DBPedia and VIAF. Further, beyond merely aggregating known information, the build-script can also infer new connections that no individual group knows in isolation, but can be deduced from two different pieces of information known by two previously isolated research groups. For example, if one groups knows that author X cites author Y, and another group knows that author Z cites author Y, the central dataset alone will know that author Y is cited both by authors X and Z. This third assertion is something that can only be inferred when these two pieces of information, originally isolated from one another are brought together.</p>
<p>This RDF metadata can then function as a switchboard for all display applications. Display applications can query the public SPARQL endpoint directly for information about the location of encoded transcriptions or prosopographical information. In this way, we create a common pool of information from the work of each independent research group. Likewise, each display application is no longer limited to the information stored in its local data store, but has access to the pool of information known by the entire collective.</p>
<p>The below graphic illustrates how the data silos seen above have been transformed into a web of criss-crossing connections.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/2016-08-25-basel-workshop-report/united-data-set-example.png" alt="united-data-set-example" /></p>
<h1 id="demonstration">Demonstration</h1>
<p>During the course of the Basel workshop we constructed a couple of primitive examples to illustrate how this kind of distributed network of resources might work.</p>
<p>The RCS database has been collecting its own set of prosopographical data for authors of Sentences commentaries from which other related scholastic research projects could benefit. But up until now, this information has only been available in the RCS viewer which has localhost access to the RCS datastore.</p>
<p>In the proposed setup, the RCS project would be asked to publish a data feed as well as any other html data presentations it desired to present. These data feeds should be constructed according to a common standard such as the emerging Open Annotation standards. But for the present it was enough for RCS to simply publish its information in a simple XML feed. See below:</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/2016-08-25-basel-workshop-report/feed.png" alt="united-data-set-example" /></p>
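<p>A feed of this kind need not be elaborate. Purely as a hypothetical sketch (the element names and content are invented for illustration and do not reproduce the actual RCS feed), it might look roughly like:</p>

```xml
<!-- Hypothetical feed sketch: element names invented for illustration -->
<persons>
  <person id="http://scta.info/resource/herveusnatalis">
    <name>Herveus Natalis</name>
    <event type="floruit" date="ca. 1300">Active as a Sentences commentator</event>
  </person>
</persons>
```

<p>The point is only that the feed be machine-readable and published at a stable address; the build-script handles the rest.</p>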
<p>Once made available, we simply needed to register the address of this feed with the SCTA build script, and all of this information became available to every other project via the SCTA RDF triple store. For example, here is the LombardPress display page for the author Herveus Natalis.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/2016-08-25-basel-workshop-report/lbp-name-view.png" alt="united-data-set-example" /></p>
<p>On the right you can see two different data feeds, one from the RCS database and one from DBpedia. The LombardPress client does not have its own database; instead it queries the public SCTA RDF dataset for the information it needs. There it can find aggregated information about an individual author (such as the life events originally recorded by the RCS team) or links to information about this author in other datasets (such as DBpedia and the DBpedia abstract).</p>
<p>If every research team begins to prioritize data publication as much as html or print publication, this kind of data sharing and re-use can become a reality on a large scale.</p>
<p>What’s more, each research group that contributes information (via an information feed) to the aggregated SCTA dataset can also get new information that enhances its own particular website or presentational output. For example, the RCS dataset is focused on name and manuscript identification. But another research group has focused on procuring lists of questions contained within Sentences commentaries. See, for example, the question list below as seen in the LombardPress viewer.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/2016-08-25-basel-workshop-report/lbp-question-list.png" alt="united-data-set-example" /></p>
<p>Once the SCTA RDF dataset has recorded the association between the SCTA RDF ID and the ids used in other datasets, the RCS data set can send a request for all of these question lists and display them in its own viewer without ever having to recreate this data in its own data store. In the example below, you can see the RCS viewer re-using this same information in its own display.</p>
<p><img src="https://s3.amazonaws.com/lum-faculty-jcwitt-public/2016-08-25-basel-workshop-report/QQList.png" alt="united-data-set-example" /></p>
<p>At this stage, this work is a proposal and work-in-progress. Therefore, we welcome and openly solicit comments and feedback. Do you have a related data set? We’d love to hear about it and think with your team about how we can create an ever more deeply connected distributed dataset.</p>The following is a report and summary of the main proposal discussed at the “Linked Data and the Medieval Scholastic Tradition,” workshop held at the University of Basel in August 17-19, 2016.