Category Archives: Operating systems

I have learned so much this semester, it’s hard to know where to begin. But I guess I need to begin with metadata and taxonomy. Early in the semester I posted that “I read some articles this week that made me realize that much of what added value librarians provide to collections is in the form of metadata. I guess I always thought of librarians mainly as reference librarians or subject specialists — not as experts in classifying and indexing information.” But now I know (a little) about the value of good metadata, and taxonomies, and I’ve learned about metadata standards such as Dublin Core, that help to standardize metadata usage over the entire web. I’ve learned about the semantic web, and the idea of linking data and building ontologies that describe the relations between concepts. I’ve learned about the contrasting benefits of controlled vocabularies versus “folksonomy” (i.e., tagging). And I’ve learned a little about harvesting metadata, using PKPHarvester to harvest metadata from several databases and data providers.

I’ve also learned more about the Open Access movement, and open access initiatives; and about the issues of “freeing” information from behind paywalls. The main obstacle to this is that knowledge (and its associated data) is a currency that has value, and that making it freely available will necessitate basic structural changes in academia and in academic publishing.

Those structural changes include major changes in the role of the library and librarians in the production and preservation of knowledge. These changes present significant challenges to libraries in managing, curating and preserving digital materials and data. Librarians are increasingly expected to have the technical skills to design and select Content Management Systems for their libraries, to design, create, and maintain digital collections and digital repositories, and to train other librarians to do the same, often with limited technical staff and limited budgets. Open Source software is a boon to the small library or non-profit or museum that needs these types of functionality; but again that requires technical knowledge and skill on the part of the librarian to install, configure, and maintain operating systems and small in-house servers.

In order to gain those technical skills, I learned how to create several virtual machines with Linux stacks of various sorts, and to install and configure four different content management/digital repository software systems (Drupal, DSpace, EPrints, and Omeka). I created a sample digital collection in each one, and used the experience to compare each system’s strengths and weaknesses, and then to decide which sorts of digital collections (and environments) each system is best suited for. I then chose which system to use to host my digital collection, set up the system, entered the records and created the metadata, and then wrote a paper on the process, which will contribute toward my digital portfolio. In my case, I decided upon Drupal as the system I want to use for my digital collection. Drupal has a steep learning curve, and I really learned a lot about Drupal in a short time through the process of designing my collection, downloading and installing extra modules to provide the functions I needed, and troubleshooting the installation. I’m proud of how well my prototype digital collection works, but I already have plans to keep working on the prototype to get it working even better, and to extend its functions, and to redesign certain features. I’m turning into a Drupal geek already.

Librarians are also expected to conduct outreach to their various communities, in order to make the services of the library more accessible and useful. This seems to be especially needed in the community of humanities scholars. We read many articles about the obstacles that keep humanities scholars from embracing digital initiatives, and from using digital resources (and those articles confirmed my own observations). We learned about how humanities scholarship, data, and workflows are vastly different from those of the scientific community; identified some of the obstacles that prevent humanities scholars from using (or producing) digital resources, including digital repositories; and read about several digital humanities initiatives, both in the U.S. and in Europe.

I think one of the most enjoyable aspects of the course for me was just the chance to see so many different examples of digital collections; to interact with my fellow students over their collections and interests; and to explore what is already being done, what is possible and useful. A find that was very helpful to me was the UK Reading Experience Database (RED). This is a database that contains much of the same sorts of data that I wanted to collect in my own digital collection, so it gave me some assurance that I was on the right track with my ideas.

Finally, I have included a slideshow of some screen shots from my project.

I was able to install and configure DSpace with no problems; we followed these steps:

We set up a new virtual machine, and built, not a LAMP stack, but an LTPJ stack: Linux-Tomcat-PostgreSQL-Java. Once those programs were installed, we needed to create all the structure for DSpace: we used sudo to create Linux directories and users for DSpace, set their permissions, and then set up a related user and space in PostgreSQL. Then we set up a DSpace database and directories in Tomcat.

Once those structures were ready, we downloaded the DSpace source code and set up a configuration file, then used Maven to actually “build” the installation according to the configuration we specified. I’m guessing that means Maven compiled all the code using the modules and settings we specified in the configuration files. Then we used Ant to do a “fresh install” – I guess it installed the compiled binary code that Maven created.

Then we had to create a DSpace administrator/user at the Linux command line and edit some configuration files to give that user privileges; then we rebooted the system and were able to access DSpace from the browser and set up our collection.
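The outline above can be sketched as a shell script. This is only a sketch under assumptions, not the step-by-step commands Bruce gave us: the account name dspace, the install directory /dspace, the source location, and the build directory name are all placeholders (the build directory in particular varies by DSpace version), and the create-administrator call assumes a release that ships the bin/dspace launcher. By default the script only prints each command; nothing runs unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the DSpace install outline. All names and paths are assumptions.
DRY_RUN="${DRY_RUN:-1}"            # default: print commands instead of running them
DSPACE_USER="dspace"               # hypothetical Linux and PostgreSQL account name
DSPACE_DIR="/dspace"               # hypothetical install directory
DSPACE_SRC="$HOME/dspace-src"      # hypothetical location of the unpacked source
# Assumed build directory; the actual name varies by DSpace version.
BUILD_DIR="$DSPACE_SRC/dspace/target/dspace-installer"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

install_dspace() {
    # 1. Linux user and install directory, owned by that user.
    run sudo useradd -m "$DSPACE_USER"
    run sudo mkdir -p "$DSPACE_DIR"
    run sudo chown "$DSPACE_USER" "$DSPACE_DIR"
    # 2. Matching PostgreSQL role and database.
    run sudo -u postgres createuser --no-superuser --pwprompt "$DSPACE_USER"
    run sudo -u postgres createdb --owner="$DSPACE_USER" --encoding=UNICODE dspace
    # 3. Build with Maven, then install the result with Ant.
    run sh -c "cd '$DSPACE_SRC' && mvn package"
    run sh -c "cd '$BUILD_DIR' && ant fresh_install"
    # 4. Create the first DSpace administrator account.
    run sudo "$DSPACE_DIR/bin/dspace" create-administrator
}

install_dspace
```

Printing the commands first makes it easy to compare them against whichever set of instructions is being followed before anything touches the system.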

The two linked sets of installation instructions look like they would be followable, although the comments on those instructions show there is some room for error in interpretation. The details of the steps are different from what Bruce gave us, but they seem to follow the same general outline. I’m not sure I could follow them without technical support. Bruce’s step-by-step commands are probably best if you are going to try to do this without support; but the screenshots in the second link are probably helpful; and I like the clear delineation of steps in the first link.

Discuss either a) which module you decided to try from assignment 2 and how it enhances your collection; include if you like any problems or tips related to installation; or
b) now that you have some experience, how you feel overall about the suitability of Drupal for your collection.

It is clear that Drupal, in the hands of a trained Drupal programmer, would be a powerful and customized tool that could be used to manage my digital collection; although it seems that it is not really designed for the type of content I would like to include: many large searchable text files (in pdf or other formats, especially including files with specialized markup). When I say that it is not really designed for it, I mean that the native content types don’t lend themselves to it (although I have not experimented with the “book” type). Of course there are many modules that add that type of functionality; I saw several that seemed designed to make RDF-type relations between nodes; but I was too intimidated by all the dependencies to try to install such modules, and the help material was too highly technical for a casual Drupal user to understand.

I did find an apparently simple module that added some necessary functionality to my site, i.e., the ability to search attached text files. The module is called, appropriately, search-files. Here is a screenshot of the kind of output the module produces:

Because this is a crucial function for my collection, I decided to install it, even though it requires several “helper applications” in Linux.

Helper Applications

In order to extract text, this module calls ‘helper apps’ such as cat and pdftotext. Drupal administrators can configure any helpers they like. Helper apps need to be installed on the server and need to be setup to print to stdout.

I assumed that my Linux installation might already have these applications available, although I could enable them separately if need be. So I downloaded and installed search_files-6.x-1.6.

I had no difficulty installing it or configuring it in Drupal. But it can’t search the pdf files I have attached, so I’m assuming I also need to install the helper applications in Linux.
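One way to check whether the helpers are actually present is to ask the shell if each one is on the PATH. This is a minimal sketch: the function name check_helpers is made up for illustration, and the list covers only the two helpers named in the module documentation.

```shell
#!/bin/sh
# Report which text-extraction helpers are available on this server.
check_helpers() {
    missing=0
    for helper in "$@"; do
        if command -v "$helper" >/dev/null 2>&1; then
            echo "found: $helper"
        else
            echo "missing: $helper"
            missing=1
        fi
    done
    # Non-zero exit status when at least one helper is absent.
    return "$missing"
}

# The two helpers named in the module documentation.
check_helpers cat pdftotext || echo "install the missing helpers before indexing"
```

Note that pdftotext writes to a file by default; giving - as the output name (pdftotext file.pdf -) sends the text to stdout, which is what the module documentation asks for.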

UPDATE: as it turns out, this module worked in Drupal 5 but is broken in Drupal 6. Evidently it works in Drupal 7, so hopefully when I update my system I can get this working. Else I will need to find a different CMS, because this search functionality is crucial.

This (SIRLS 672) was my first course in the DigIn certificate program, so I did not quite know what to expect of myself or of the program, especially because I do not have professional experience in a library or with collections, except as a frequent user. Although I have a technical background, I had not worked as a programmer or database administrator in years. I discovered that although the course was more technical than I had anticipated, it was still within my capabilities. The examples of digital collections and the library-specific assignments were enlightening as to the scope of the kinds of projects involved and the kinds of skills needed to manage digital collections in a library or archival environment.

I have learned new technical concepts and skills in this course that will form the basis of a new and expanded conceptualization of digital collections. For example, I knew little about the inner workings of the internet before taking this class; now I understand the various data protocols and standards used, and the procedures used to get data from one node to another. I also did not know anything about the component parts underlying a digital collection (except a little about databases): now I understand, using the LAMP (Linux, Apache, MySQL, PHP) stack as an example, the basic relations between the operating system, the web server, the database management system, and the scripting language underlying a digital collection. I already knew some HTML but I learned a little more, and I learned about XML as a way of describing and structuring data, which was completely new to me. I had professional experience with the concepts underlying relational databases and database design, but it was a good review; and I was introduced to the specifics of MySQL and the scripting language PHP.

In addition to the technical aspects of the course, I learned about the controversies and issues surrounding digital information, such as the argument for open-source software, and the advantages/disadvantages of various system interfaces, such as the CLI (command line) versus a GUI (graphical user interface). I especially appreciated the opportunity to try tasks using a variety of methods and interfaces so that I could come to my own conclusions about my preferences. I also learned skills and methods related to project management, the importance of a technology plan, and a little about how technology projects are funded, especially through the e-rate program. Through the examples and the discussions, I learned about how all of these issues affect libraries, and the issues surrounding the creation and maintenance of digital collections in a library or archival setting. I have also learned about some of the initiatives in the digital humanities.

I especially have a new appreciation for the technical aspects underlying digital collections, and the prodigious amount of work that goes into designing, creating, and maintaining such collections. This knowledge gives a counterweight to the arguments in favor of free access for digital collections: while I agree that access should be as free as possible, I realize that digital collections do not come into being without a large price tag in terms of people-hours and expertise. I think that librarians will have an increasingly large role to play in creating and maintaining these collections, especially in this era of financial constraints.

As I write this I am impressed with how much I have learned, yet I feel a little trepidation because I’m afraid I may have learned just enough to be dangerous. I realize how far I am from being really proficient in any of these areas; but since the course description states that “this is not a course in network administration, web development or programming!” I feel a little better. I feel that I have achieved the stated goal, which is to learn “about server technology supporting digital collections in libraries, archives, cultural heritage organizations and other institutions.” I think I have indeed “gain[ed] confidence in [my] ability to learn new technologies as they are developed” and I have come to “understand basic information management architecture.” I hope and expect that this course will prove to be a firm foundation to build upon as I pursue my future in the digital humanities.

Discuss briefly how you went about learning HTML and which resources you used. Comment briefly on how helpful they were (or not), and indicate any intermediate or advanced modules or sections you reviewed. Provide a brief status report on installation of your practice system, if you have elected to try to bring one up.

LEARNING HTML: I used the recommended tutorial http://www.w3schools.com/html/default.asp
I had previous exposure to HTML, but it was long ago, so I did all of the HTML basic sections. I found the tutorial very useful. I especially like how it allows you to test the HTML you are writing and see the result immediately in a split screen. I intend to go on and do the more advanced sections. I see that this site also has a CSS tutorial. So I guess that’s next!

PRACTICE SYSTEM: I elected to install another virtual machine on my computer rather than to use another physical machine. I followed the standard install instructions, and everything worked as described. I was then successful in assigning static IP addresses to each of my two VMs, and I could ping each of them as well as access the Apache web server on each of them. I also tried to access Webmin, and could not for the new VM, so I had to go back to the Unit 4 instructions and install Webmin on the new VM. That worked fine, and then everything tested OK. I then edited the HOSTS file, and now each VM has a name in the file. I was able to access each VM by using the host name in the browser address bar.
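The HOSTS file edit amounts to mapping each static address to a name. The sketch below writes two entries to a scratch file rather than touching the real /etc/hosts; the addresses and the names vm1 and vm2 are invented for illustration and are not the values from my own setup. The helper lookup_host is likewise hypothetical.

```shell
#!/bin/sh
# Build a scratch hosts file; on a real system these lines would go in /etc/hosts.
HOSTS_FILE="$(mktemp)"
cat > "$HOSTS_FILE" <<'EOF'
# address        name
192.168.56.101  vm1
192.168.56.102  vm2
EOF

# Print the address recorded for a host name (first matching entry wins).
lookup_host() {
    awk -v name="$1" '
        $1 !~ /^#/ {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }' "$HOSTS_FILE"
}

lookup_host vm2    # prints 192.168.56.102
```

Each line is an address followed by one or more names, which is why the same name then works for ping and in the browser address bar.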

This week, you might reflect on the variety of presentations for this material – the lecture, the links (especially to Wikipedia entries), videos and podcasts. Which work best (or worst) for you, and why? How do these different kinds of presentations complement your own learning style (refer to any readings you did this week on learning styles)?

I read the Felder and Soloman article “Learning Styles and Strategies” that Bruce linked for us. I think I am, for the most part, a reflective, intuitive, verbal and sequential learner.

Reflective: I like to have time to process what I’m learning, and I need to review/summarize periodically for new material to stick. I have learned that I reflect best in writing, so these blog/discussion entries are helpful. I will probably use the blog more now to help summarize and reflect upon the week’s lessons even if that is not assigned.

Intuitive: As an intuitor, I definitely dislike rote memorization, and need to make connections/abstractions to understand material. I find that analogies and metaphors help me to do that, and I am always faster at grasping concepts than details.

Verbal: I tend to be more verbal than visual – I like to learn by reading, although sometimes visuals help me get the overall concept if it is very detailed. In my own studies I need to use visual aids if I am to remember details like dates (timelines, etc); so I have learned to use visuals to supplement my preferred verbal mode. I also tend to be an auditory learner, so I read aloud at times or use a text to speech mode to help me remember details. I definitely like the lectures the best, and find videos tedious mostly (although I do use them to reinforce material I have read first).

Sequential/Global?: I am able to think globally and often do, especially since I like to understand relationships and concepts rather than details. When I read texts with hyperlinks, I do tend to read (or skim) the entire document over first and then go back and click on links. I’m not sure if that is global or sequential. However, a disorganized lecture bugs me, and I tend to only click through to one level since I lose the overall thread of the discussion if I have to click through too many levels before I get back to the main lecture.

When I learn, I like to analyze, then synthesize. That means I like to look at the big picture, take it down to its component parts, analyze the relationships between them, then put things back together in new ways. So I tend to read first for the big picture, then look at details, then read all the detailed info, then put it back together by actually doing what I’ve read about. That is why I think I don’t have trouble with the assignments — by the time I get around to doing them I have already conceptualized them in my head.

We were instructed to add a user to the system in three different ways, so we could compare the methods to each other. Here is a summary of what each method entailed in my experience:

Assignment 2: Adding a user at the command line with adduser

This went as described. The plethora of different switches available for the commands was a little intimidating; I needed to go back and review them and to have the .pdf files open in front of me to make sure I understood what I was doing. And of course I had to use sudo in order to make these changes because I needed superuser powers!

The order of commands made sense: I had to use groupadd to create a new group before I could add a user, since each user needs to be assigned to a group when created, and the group needs to exist first. I named the group the same as the new user, which I named userme.

Then I used useradd to add the new user. The switch -g defines the user’s initial login group; this was the group I created previously with groupadd. The switch -G defines additional groups that the user belongs to; the command line we were instructed to use indicated that my new user also belonged to a group named users. The switch -m makes the user’s home directory if it doesn’t already exist.

Then I used the passwd command to give the new user a password.
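Put together, the three steps look roughly like the sketch below. The user name userme and the extra group users come from the assignment as described above; the wrapper functions run and add_user_with_group are my own illustration, not part of the course materials. The script only prints the commands unless DRY_RUN=0 is set, so it is safe to run as-is.

```shell
#!/bin/sh
# Dry-run sketch of the three steps; set DRY_RUN=0 to execute them for real.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

add_user_with_group() {
    new_user="$1"
    # The group has to exist before useradd can reference it with -g.
    run sudo groupadd "$new_user"
    # -g: initial login group, -G: additional groups, -m: create the home directory.
    run sudo useradd -g "$new_user" -G users -m "$new_user"
    # passwd prompts interactively for the new password.
    run sudo passwd "$new_user"
}

add_user_with_group userme
```

Swapping the first two commands would make useradd fail, because the group named by -g would not exist yet.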

As the assignment suggested, after I logged out and then logged in again as the new user (userme), I found I could not use the sudo command because my new user was not listed in the sudoers file, which meant I did not have administrator privileges.

So I logged out and back in as mebell. I used the grep command to do a string search of the /var/log/auth.log file for the string “userme” and I found an entry in the auth.log file that showed I tried to execute a command using sudo privileges from an unauthorized account.
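The search itself is a one-line grep. The sketch below wraps it in a small function so the log path can be swapped out; the function name auth_events_for is hypothetical, and I am assuming the log lives at /var/log/auth.log as it did on our course VMs. Reading that file may require sudo, depending on its permissions.

```shell
#!/bin/sh
# Show auth log lines that mention a user; the log path can be overridden.
auth_events_for() {
    user="$1"
    log_file="${2:-/var/log/auth.log}"
    # -F treats the user name as a fixed string, not a regular expression.
    grep -F -- "$user" "$log_file"
}

# Only attempt the search when the log is readable by the current account.
if [ -r /var/log/auth.log ]; then
    auth_events_for userme || echo "no matching entries"
fi
```

grep exits with a non-zero status when nothing matches, which is why the call is followed by a fallback message.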

I found using the command line easy and powerful, since I could tell it exactly what I wanted. But I had to know what I wanted, and I had to understand all the switches in order to use it properly, and had to understand the order of commands (for example, that I had to create the group before I created the user). Also, when there is an error message, I have to know where to look to fix the problem (like the auth.log file). Fortunately one can use the usermod command to change settings later for a user.

Adding a user with Webmin

I had no trouble logging into Webmin. Finding the correct menu items was also easy and fairly intuitive. I found it easy to create a group and a user (which I named useryou), and the default settings made it fast, but powerful. I like that the settings on the groups and users tabs had default values but also gave me drop-down menus so I could see the available choices. This would be handy if I had to configure several users and groups; I especially liked that you could create a user and have Webmin create a group for you, so that you didn’t have to know the order of the commands like you do with the command line. It seems that it would be easier to do the tasks without making a mistake, especially if you had several tasks to do, or if you had many users and groups; it would help avoid the typos one can make at the command line. Also, although we didn’t do this in the assignment, from the reading it seems that the Webmin batch commands could also be very powerful, if I wanted to create multiple users and groups and execute commands before/after creating new users.