November 1, 2009

Maintenance is the hardest and most expensive portion of an Information Technology project. When computers first came on the scene the rule of thumb was that the development of a program/system was about 1/10th the total cost. Maintenance was the expensive part of computer software. Bill Gates changed that by making maintenance and upgrades a profit center instead of a cost center. The cost and effort are still there, but now the consumer pays for them, not the manufacturer.

My project is only useful if it is current. If the site becomes “The Place To Go” for the latest and greatest knowledge of metadata standards for historians, it has to have the latest information. One of the sites we looked at this week was “The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials.” I was pleased to see they have a specific section on “Metadata.” Actually, I was scared because this promised to be what I thought my project was supposed to be. Anyway, I went to that site and followed some of their links. What did I find? ONE OF THEIR LINKS IS BROKEN. (The one to MPEG-7.) Keeping web sites up-to-date is a never-ending process.

I figure I can solve my maintenance problem by making the project a resource provided by the CHNM and then let them worry about it. (Sounds like a good idea to me 😉 )

Actually, that’s not far off. I would hope to have my grant be sponsored by GMU and work with the standards group of the W3C, in a cooperative endeavor. They are the ones working with the Semantic Web and have a users group that is developing just these types of standards. If this site could become part of the W3C overall effort, the community would have a vested interest in perpetuating and keeping the data current.

The second stage of the project – the development and melding of standards – is a long term project that will only succeed as part of a consortium of W3C, the NINCH and all the other standards and governing bodies that are out there. These bodies are more than willing to participate in projects if they see concrete results and a benefit to themselves. Therefore, an important component of the grant application will have to be the marketing of the idea to these other organizations and the benefits they will receive by participating.

As usual, we have an abundance of material and a scarcity of time to process it all. The only way New Media changes the equation is that now ABUNDANCE is in capital letters and time is in subscript. Many of the readings are right down my alley. They explain the need for tagging of data and what it can be used for. Of course the readings also ask the question “Do we really need to expend all of this effort?”

There is something to be said for the old stubby pencil solution. I currently work on a project where the customer wants a process automated. So far they have spent 7 million dollars on the development of a web-based automated system. I showed them that they could get the same functionality by hiring 4 people who updated by hand. The automated system will need maintenance of about $250,000/year and will probably need to be replaced in 10 years. Even if the total cost were $100,000 per year per person, it would be cheaper doing it manually than automating it. Just because something CAN be done does not mean it SHOULD be done.
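The arithmetic behind that claim can be sketched in a few lines. This is only a rough comparison using the figures quoted above; it assumes the 7 million spent so far is the full development cost, uses the 10-year replacement cycle as the horizon, and ignores inflation and everything else a real cost model would include.

```python
# Rough 10-year cost comparison using only the figures quoted in the post.
# Assumptions: development cost stays at $7M, no inflation, no other costs.
YEARS = 10                          # expected life of the automated system
development_cost = 7_000_000        # spent so far on the automated system
maintenance_per_year = 250_000      # estimated yearly maintenance
staff_count = 4                     # people updating by hand
cost_per_person_per_year = 100_000  # fully loaded cost per person

automated_total = development_cost + maintenance_per_year * YEARS
manual_total = staff_count * cost_per_person_per_year * YEARS

print(f"Automated over {YEARS} years: ${automated_total:,}")
print(f"Manual over {YEARS} years:    ${manual_total:,}")
print(f"Difference:                 ${automated_total - manual_total:,}")
```

Under these assumptions the automated system comes to $9.5 million over ten years against $4 million for the four people.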

Now that I got that off my chest, how does that relate to tagging for media on the web? I guess the biggest thing I can say is I don’t know what the long term benefits would be. If you had asked someone 20 years ago how putting a Universal Product Code (UPC) on every item would be beneficial, they would not have dreamed of an iPhone app which allows you to find the cheapest place to purchase something. Not only that, but I’m sure the Department of Defense did not envision GPS being used to hook into the iPhone app to know where you were when you asked the question so the app could tell you the closest place where the item is the cheapest. However, if products weren’t tagged with UPC codes and GPS data was not available through web services, this functionality would not work.

I have been thinking of Terese’s project of providing resources for teachers. If every teacher who used the web, and every software program that allowed teachers to create lesson plans, syllabi, and web sites, included tagging for the suggested age group or grade their work was aimed at, Terese could easily create a service that gathered that information and presented it to her users. You can’t do a Google search on “9-12” and hope you get information about classroom work aimed at 9 to 12 year olds. How do you know “9-12” doesn’t mean grades 9 through 12? But Terese can’t tag her information with one keyword while someone else tags with a different keyword. The computer needs some standards so programs can find and provide the right data. Google had an interface that allowed CHNM to provide this type of content, but once they removed their API that function no longer works. If everyone played well together the information would still be available.
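To make the “9-12” problem concrete, here is a minimal sketch of what an agreed tag might buy us. Everything in it is made up for illustration: the `Audience` and `Resource` structures, the "age" and "grade" scheme names, and the two sample titles are my assumptions, not any existing standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Audience:
    scheme: str  # hypothetical vocabulary: "age" or "grade"
    low: int
    high: int


@dataclass(frozen=True)
class Resource:
    title: str
    audience: Audience


def find_for(resources, scheme, value):
    """Return titles whose audience range, in the given scheme, includes value."""
    return [
        r.title
        for r in resources
        if r.audience.scheme == scheme and r.audience.low <= value <= r.audience.high
    ]


# Two invented resources that a plain text search for "9-12" cannot tell apart.
resources = [
    Resource("Lesson plan for 9 to 12 year olds", Audience("age", 9, 12)),
    Resource("Syllabus for grades 9 through 12", Audience("grade", 9, 12)),
]

print(find_for(resources, "age", 10))    # only the lesson plan
print(find_for(resources, "grade", 10))  # only the syllabus
```

The point is not the code but the agreement: once both authors say which scheme their “9-12” belongs to, a program can tell them apart.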

Krista had a post about the need for curation because, as the link she posted said, “Presenting endless volumes of content is no longer the defining characteristic of a good digital publisher. Instead, the core competency must shift to presenting the most relevant information.” Tagging allows a site to provide this context. I recommend you read her post and her link.

So getting back to the theme for this post. With tagging, we can allow computers to assist in figuring out the context of information, which the authors themselves provide. Without tagging we are forced to wade through hundreds and thousands of links returned by Google using their algorithms, over which we have no influence. If we want to help future scholars use our work, we need to provide the context for it.

October 26, 2009

I came across this article about John Dean fighting to stop a recording of him being posted on nixontapes.org. Stifling the publication of primary evidence by threatening a copyright infringement suit doesn’t seem right. I cannot find any other references to this fight, but those particular recordings are not on the site. It is easier if you are a scholar of the ancient world. I don’t think Julius Caesar will be coming back to sue over copyright infringement or libel.

October 18, 2009

ABSTRACT: Develop an interactive, collaborative location to gather information about the many different XML, DTD, and data sharing standards that exist within the Digital Humanities community. This project is the first phase of a much larger effort. This phase is the collection of known standards and soliciting feedback from the digital humanities community. This collection will also become an area where developers can learn how to make their collections and sites friendly to automated discovery and use.

Depending upon feedback this overall effort can then branch into a concerted effort to develop semantic web standards for History or Digital Humanities, or on a smaller scale develop middle layer tools that will allow one domain to interoperate with another.

WHY IS THIS PROJECT NEEDED: The National Endowment for the Humanities funds the development of innovation in the digital humanities. Many of the projects state that they want to enhance the ability of researchers and scholars to interchange information. In many cases they create a new set of metadata and definitions that enable workers in their particular field to share data. In the awards for last year the Alexandria Archive Institute was granted an award to develop a way to share data between archeological sites. They developed a set of metadata keywords and content. Indiana University developed an ontology for philosophical thought. Duke University developed an extension to TEI which includes a set of rules for encoding Whitman poems. The problem with this approach is that there are so many standards they cannot work together in a collaborative, automated way. The term “place” for an archeologist has been superseded by the term “site.” That kind of site exists at a “location,” and does not mean a web site where more information is available. In philosophical thought, however, “place” is meant to be the city where something happened, and not the country where an archeological “site” was discovered. A human can easily see what is meant in each case, but a computer needs to know that site means one thing in one collection of data and something else in another collection of data. An effort to overcome these problems is the Semantic Web effort. The main thrust of this effort is in the government and business community, although the Smithsonian Institution is currently working on promoting the use of the International Committee for Documentation (CIDOC) of the International Council of Museums (ICOM) Conceptual Reference Model (CRM), which is an ontology intended to facilitate the integration, mediation and interchange of heterogeneous cultural heritage information.
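As a rough illustration of what a “middle layer” between vocabularies might do with the place/site example, here is a small sketch. The collection names and the shared concept labels are invented for this example; they are not CIDOC CRM identifiers or terms from any of the projects mentioned.

```python
# Hypothetical crosswalk: (collection, local term) -> shared concept label.
# The labels are invented for illustration, not taken from CIDOC CRM.
CROSSWALK = {
    ("archeology", "site"): "excavation_location",
    ("archeology", "place"): "excavation_location",  # older term, superseded by "site"
    ("philosophy", "place"): "city_of_event",
    ("web", "site"): "web_site",
}


def resolve(collection, term):
    """Return the shared concept for a term as used in one collection, or None."""
    return CROSSWALK.get((collection, term.lower()))


print(resolve("archeology", "Site"))   # excavation_location
print(resolve("philosophy", "place"))  # city_of_event
print(resolve("philosophy", "site"))   # None: no agreed meaning recorded
```

A lookup table like this is the simplest possible version; the hard part, and the reason for the project, is getting the communities to agree on what goes in it.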

Someone who wanted to create a new web site and make their research, artifacts, and analysis available as services to other digital humanists would not know where to start. If the site only contained documents the TEI would be a good place to start, but the web is so much more than the digitization of documents. This project is the first step in an effort to allow these many different standards to interoperate.

FEATURES/FUNCTIONALITY: The initial effort would be in the development of a presence which would provide links to all known XML and DTD standards for the Digital Humanities with emphasis on History. These links would include a short description of the information, status, domain space, and functionality of the site. Each entry would also provide a place for users to comment on their individual experience with the technology defined in the link. This link page would also include a method for users to nominate other web sites to be included in the list, along with the reason they feel the site should be included. There would be another list of technology tools that have been created to assist in the creation of XML which follow the specific schemas and rules of the domain. Again there would be a place where users could nominate other sites. Another section would be an overview of the project and proposed course of action to continue the development of this effort. This section would be a wiki which would have a main page of project plan and rationale.

In a completely separate section, focused on an entirely different audience, would be a set of easy “how-tos” or links to easy to understand information on the use of machine readable information on web pages. This would include links to the API primer on the ProfHacker web site, etc. Sort of a one stop shop for how to make your web site/museum collection/document repository discoverable and useable by automated programs. It would also bring out the advantages and disadvantages of using different tools to make sites. This section would also have an area for users to nominate sites, locations, and books for others to use. In this area I foresee a grading system being employed, showing users what was most beneficial to other users.

AUDIENCE: There are two main audiences for this site. The first one is those people who are interested in promoting the free flow of information between sites that can be accomplished through the use and power of automated programs. These users include those organizations which have already spent considerable effort to develop their reference models and would like to have a different place to advertise their existence. For instance the CIDOC CRM has been over a decade in development. The George Mason Center for History and New Media (CHNM) provides a well known and respected location for people to come to participate in the development of a more universal definition.

The second audience is for people who want to create a web site and have their work be accessed and used by many others. Stand alone web sites with static data are limited in functionality. Researchers are always looking for ways to make their research more available and used and if they can easily enhance their web sites and data collections they would do so. This site would show them how to do that.

TECHNOLOGIES TO BE USED: This site would take advantage of basic HTML and a wiki, and would of course make the information it contains accessible through standard APIs.

USER-CONTRIBUTED/INTERACTIVE ELEMENTS: The basic functionality of this site is the user interaction. User input will be in all sections of the site. In some cases it will be the nomination of items, in others it will be interactive feedback on what was useful and what wasn’t. In order to provide some level of control over technology zealots, all input would be moderated before being posted to the site.

October 14, 2009

One thing leads to another and pretty soon I have all my project ideas shot down and have spent another evening/night surfing the web.

Today he posted that he was going to attend a workshop on APIs in digital humanities. Since my project ideas were set around APIs for maps/timelines etc., I thought I would go see what was happening at that conference. One thing led to another and it is now 2 hours later. For people who really want to know about APIs, a short little series of posts that explain them and why they are important is at Part 1, Part 2, Part 3.

As I was following some of the API sites mentioned I came across one site that is very good at providing historical information, using collaboration with people to gather information, using the Google Maps API, and using Twitter. The site is about the diary of Samuel Pepys. It uses Pepys’s diary from the 17th century and updates daily what old Pepys is doing that day. For instance today you could read the diary entry from Sunday Oct 14, 1666 as a blog entry. People can comment/contribute more information about the diary entry. You can also sign up to get Tweets from Sammy boy. If you go into the Encyclopedia section you can click on a map which uses the Google Maps API to show locations that Pepys talks about, information about his houses, where he worked etc. A very good site using many exciting features of the web and web 2.0.

This site uses APIs from other web providers to give an enhanced experience of reading Pepys. But it is missing something – providing APIs for other sites. If this site provided an API which would allow others to access data based on things like people mentioned, location, date, or any of the other things that are given as entries in the encyclopedia, think what could be done. This API should be able to collect data from the diary, from the collaboration posts, and from an integration with the Google API and current Google Maps. If this site provided an API with this type of information, someone else could gather data and enhance the Flickr interface to not only show photos put in by users of this site but, using the Flickr API, actively mine the data between the two sites and show current photos of places mentioned, historical photos, etc. With a data discovery engine, as new sites made London data discoverable, the integrated site could continually provide new views of data and link them back to Pepys in the 17th century.

It seems only fair that if you use APIs to enhance your site you should provide APIs of your data so other web sites/applications can use your data. This site has many contributors whose contributions are not easily found by other researchers because there is no API.
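To give a feel for what I mean, here is a very rough sketch of the kind of query such an API might answer. None of this exists on the Pepys site as far as I know: the `query` function, the field names, and the placeholder entries are all my own assumptions, and a real API would sit behind a web address rather than a Python function.

```python
import json
from datetime import date

# Placeholder records standing in for diary entries plus encyclopedia tags.
# The people, places, and text are dummy values, not real diary content.
ENTRIES = [
    {"date": date(1666, 10, 14), "people": ["Person A"],
     "places": ["Place X"], "text": "(placeholder entry text)"},
    {"date": date(1666, 10, 15), "people": ["Person A", "Person B"],
     "places": ["Place Y"], "text": "(placeholder entry text)"},
]


def query(entries, person=None, place=None, on=None):
    """Return matching entries as JSON; filters that are None are ignored."""
    matches = []
    for entry in entries:
        if person is not None and person not in entry["people"]:
            continue
        if place is not None and place not in entry["places"]:
            continue
        if on is not None and entry["date"] != on:
            continue
        # Dates become ISO strings so the result is plain JSON.
        matches.append({**entry, "date": entry["date"].isoformat()})
    return json.dumps(matches, indent=2)


print(query(ENTRIES, person="Person B"))
```

Another site could take that JSON, pull out the places, and ask the Flickr API for photos of them without a human ever reading the page.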

Well now I have to stop daydreaming about all the neat ways web sites can capture information from each other and get back to work on developing a project that hasn’t been done before and writing my historiography for my Britain in the 20th Century class. I am now saving all my work to my hard drive, sending it to my Google group account, and will make a CD of my work each week until the end of the semester. I WILL NOT LOSE MY WORK AGAIN!!!!

However, time lost because of curiosity about Twitter posts is gone forever.

October 13, 2009

The continuing saga of me vs Mac. Last month my disk drive went south. I took it into the shop and they reformatted and said everything was ok. I came back and restored everything from my TimeCapsule. It worked like a charm. Everything was there – did not miss a beat. On Saturday I sat down to do some work and the lovely question mark was on my screen. No disk drive again. Took the computer back in and this time they replaced the whole disk drive. I thought, this is no problem, I’ll just restore my disk drive from my TimeCapsule back up and I’ll be good as new. WRONG, WRONG, WRONG.

A warning for all of you Mac users who use TimeCapsule: when you restore from a backup it does not restore your backup settings. As a matter of fact it turns TimeCapsule off. I did not check this setting. After 30 years in this business you would think I would have learned my lesson. You would be wrong.

So my last backup was from September 19. Everything I have done since then is gone. I am such a happy camper. However, I have verified my backup is now working. I even restored from a backup I did yesterday to make sure the restore works correctly.

I think for my project I will make a device that comes out of your monitor, slaps you upside your head, and tells you “REMEMBER YOUR BACKUPS.” It is probably the most useful tool for New Media I can think of right now.

October 5, 2009

I don’t know if you all had a chance to review the link Professor Cohen gave us on Twitter concerning how people disseminate their scholarly research, but the Communicating Knowledge Report had some interesting surveys. The surveys were taken from scholars and researchers in the UK on how and why they publish their research. Peer reviewed journals are very important to 94% of the respondents, whereas Internet blogs and forums are either not important (70%) or not applicable (18%). When asked why they considered peer reviewed journals important, 74% said they needed them for career advancement. (18) But hidden in the numbers is the ironic statistic that “more than one third of respondents say that open access repositories are important to their research.” (17)

So most of these researchers consider web self publishing as not important, but value “open access.” I guess it all goes back to the “trust and rigor” implied in being published versus putting something on the web. Some of the initiatives talked about in the Bell article from the American Historical Association for providing peer reviewed web material may change some people’s minds. However, that article was written in 2005. Does anyone know the state of peer reviewed web publishing sites?

I also found it interesting that people in the humanities put so much more emphasis on publishing a chapter in a book, than the other disciplines.

Also, for a truly misleading bar graph I highly recommend Figure 2 on page 17, which shows the importance of peer reviewed journals for the different fields. All other bar graphs in the paper have a scale of 0% to 100%. Figure 2 starts at 80% and goes to 100%. The impression of importance is very skewed.
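A quick calculation shows how much an axis that starts at 80% can exaggerate a difference. The two percentages below are hypothetical values I picked for illustration; they are not read off Figure 2.

```python
def drawn_height(value, axis_min, axis_max):
    """Fraction of the axis height that a bar for `value` occupies."""
    return (value - axis_min) / (axis_max - axis_min)


# Hypothetical percentages for two fields, NOT taken from the report.
field_a, field_b = 95.0, 85.0

full_scale = drawn_height(field_a, 0, 100) / drawn_height(field_b, 0, 100)
truncated = drawn_height(field_a, 80, 100) / drawn_height(field_b, 80, 100)

print(f"Bar height ratio on a 0-100% axis:  {full_scale:.2f}")
print(f"Bar height ratio on an 80-100% axis: {truncated:.2f}")
```

With these made-up numbers, a gap that looks like about 12% on the full scale is drawn as a bar three times as tall on the truncated one.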

My reason for liking digital books doesn’t touch the problems discussed in the article for this week, but it is an aspect of digital books that wasn’t covered. My wife works with people with disabilities. Digital books allow them to increase the size of the font (which is also great for us older folks) or listen to the book. That computer generated voice may be annoying and grating to you, but it allows blind people to have access to material they could not get to before. Digital books allow people with other disabilities to manipulate the book in ways that make no sense to me, but make perfect sense to them.

There are problems with digital books – a business model, the lack of review processes, the stamp of respectability a book gets just because it got through a publisher to be published, but these books allow people to have access to material they otherwise would not get to read.

September 25, 2009

Just a silly question. When I reference an article that I pull off of JSTOR I just reference the magazine and author, as if I went and actually got the real publication. Some of the web sites we have been looking at have scanned in primary sources, like turn of the century newspapers. If I want to reference an article in one of those newspapers, do I reference it as if I actually had the newspaper? (This is what I would do if I went to a library and looked at it on microfiche.)

The reason I ask this is Carl showed how easy it is to manipulate scanned images. I don’t want to be using a fake, but I can’t see the paper and determine its authenticity. Do I cite the paper and the web site that provided the primary document?

New media makes primary documents more accessible – does it also make them more suspect?

I think the readings about collaboration have a direct relevance to what we have been talking about in our blogs. Mark Kornbluh concludes his presentation with:

“It is essential, however, to understand that librarians, archivists, curators, and scholars are as essential to the development of digital humanities as computer scientists and programmers. Digital humanities content requires curation. If we do not get the metadata right, all we have is junk. And if we do not figure out how to preserve digital objects, than scholarship will be fleeting.”

Isn’t that what we have been talking about with Carl’s digitization project and Lynn’s digitizing of the Arlington slave register? If we think about the digital projects we looked at in class, I can see the one concerning medieval canon law being around for a long time. The data is tagged following accepted standards and is presented in a Web 2.0 environment. Another content developer could easily take feeds from that site and create another site with other value added. Compare that with the Cleveland Corridor project, which is very ephemeral – in 20 years that train/bus line won’t exist and the ability to access the Flash will be gone. Since the data is accessed from Flash, other sites will not be able to use the data or access it. This limits the amount of collaboration that can be achieved.

I was very impressed with Zayna’s post about the Wikipedia articles. In fact I was so impressed I thought I would take our web design discussions to heart and just steal her approach and design. Using Zayna’s successful way to get a starting point, I went to look at featured articles and saw a recent featured article was the Ross Sea Party. So that is where I started my review.

Ross Sea Party

The discussion page only covered trivial, i.e. non-historical, issues. One person thought there was too much detail and, since nothing was done in 8 months, went in and fixed the article the way they thought it should be. The other topic of discussion was the use of English grammar versus American grammar. I was intrigued by the amount of vandalism this site experienced after it was named a “featured site.” People put up all sorts of things about Sarah Palin and other foolishness. Perhaps becoming a featured site is not such a good thing for a Wikipedia article. It gets you a lot of traffic, but that traffic includes a lot of crazy people. This problem was covered in great detail in Roy Rosenzweig’s article.

Battle of Grand Port

This article was found using the Featured content off of the main page. This one had very little discussion (none), but had a lot of history as people changed dates of ship launchings and other small details. However, there was very little evidence of vandalism of the site. If your site is not a top featured site I guess the vandalism is not nearly as much of a problem.

Second Boer War

I went to this site because the Boer War was the focus of my web site for Clio2. One thing I find amusing is the policy, discussed in Rosenzweig’s article, that all entries have to be neutral. Neutrality is almost impossible in discussing history, as is demonstrated quite convincingly in this article’s history and discussions. In the very beginning there was a controversy about what it should be named. To most South Africans today it should be called the Anglo-Boer war. To the majority black population it was basically between two groups of foreigners, not native Africans. To some people naming the war one way or another is not only not neutral; they consider it racist.

An interesting observation is the timing of changes to the article. It seems as if the article is left alone for months, and then all of a sudden there is a flurry of activity with more than one individual making changes, defending the changes and going back and forth. In some ways, reading the different discussions is more illuminating than reading the article itself.

Can Wikipedia be used by historians and should historians contribute to the project? Rosenzweig gives arguments on both sides, and takes great exception to the “no original research” and “neutrality” policies. I think we have to look at Wikipedia as a tool that can give some insights but needs to be used with a healthy dose of skepticism. After all, Wikipedia only shows the conventional wisdom, and going by that, the Wright Brothers would have believed flight was impossible.