In this era of information explosion, people rely on their electronic devices more than ever, and I wondered: how does this affect traditional libraries? During my internship at Shanghai Library this summer, I found that readers still crowded the reading rooms every day, and a long line even formed early in the morning waiting for the library to open. But my question remained: are there interesting phenomena and facts that we might not know? Fortunately, the library is managed and supported by advanced IT systems that provide a large quantity of data on readers and their lending records, and a statistical analysis of these data allows us to unveil some interesting facts. This was the scope of my internship, for which I performed some preliminary data analysis using the R programming language.

Distribution of Readers’ Ages

Readers are the people the library mainly serves, so my first research project focused on basic facts about them. I used a histogram to show the distribution of readers’ ages, dividing the ages into 5-year groups; the results are shown in Fig. 1. Clearly, the main reader groups are between the ages of 20 and 45.

Fig. 1
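
A minimal sketch of this kind of 5-year binning in R is shown here. The ages are randomly generated stand-ins, not the library's data, and the variable names are illustrative assumptions.

```r
# Synthetic ages for illustration only; not the library's data
set.seed(1)
ages <- sample(5:85, size = 1000, replace = TRUE)

# 5-year bins: [0,5), [5,10), ..., with the last bin closed at 90
breaks <- seq(0, 90, by = 5)
h <- hist(ages, breaks = breaks, right = FALSE, plot = FALSE)

# One row per bin: its lower edge and the number of readers in it
age_counts <- data.frame(from = head(h$breaks, -1), count = h$counts)
print(age_counts)

# plot(h, main = "Distribution of readers' ages", xlab = "Age")  # draws the chart
```

Changing `by = 5` to `by = 1` gives one-year bins instead.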

Next, I studied the distribution on a finer scale by grouping readers by single year of age. Fig. 2 indicates that readers are most concentrated in the range of 25 to 27 years old. The chart also presents some other interesting facts. For one thing, the number of readers rises slightly at ages 14 to 15, possibly because teenagers at those ages are busy preparing for the high school entrance exams and getting used to their new high school life. The number of readers then rises dramatically from age 20 and reaches its peak at about 30, probably because readers need a substitute for their college library and feel the urge to learn new things for their new jobs and possibly new ways of living. There is also a small plateau around age 60, which might be the result of more people picking up reading again after retirement. I found it quite interesting that such a simple chart reflects so many different stages and chapters of readers’ lives.

Fig. 2

According to the statistics in Fig. 3, female readers make up about 41.4% of the reader population and male readers about 58.6%.

Fig. 3

In addition, I plotted the age distribution of readers for each gender and compared the two; their structures are quite similar. More details are provided in Fig. 4.

Fig. 4

Furthermore, by comparing the age distributions of the two genders side by side, as shown in Fig. 5, I observed that there are slightly more female readers than male readers among those younger than 25, while among older readers there are significantly more males than females; indeed, males are the majority among readers aged 50 and above. This phenomenon also reminds me of my own grandparents: my grandmother likes to sit in front of the television all day, while my grandfather makes a cup of tea and reads for a while every day.

Fig. 5
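
The gender shares and the comparison by age can be computed with `table()` and `prop.table()`. The sketch below assumes a hypothetical data frame with `age` and `gender` columns filled with random values, and the three age bands are chosen only for illustration.

```r
# Synthetic readers for illustration only (hypothetical column names)
set.seed(2)
readers <- data.frame(
  age    = sample(10:80, 2000, replace = TRUE),
  gender = sample(c("F", "M"), 2000, replace = TRUE, prob = c(0.4, 0.6))
)

# Overall share of each gender, in percent
overall <- round(100 * prop.table(table(readers$gender)), 1)
print(overall)

# Age bands, then a band-by-gender table of counts
readers$band <- cut(readers$age, breaks = c(0, 25, 50, Inf), right = FALSE,
                    labels = c("under 25", "25-49", "50+"))
counts <- table(readers$band, readers$gender)

# Share of each gender within every age band (each row sums to about 100)
within_band <- round(100 * prop.table(counts, margin = 1), 1)
print(within_band)
```

Using `margin = 1` compares genders within an age band, which is the question asked above; `margin = 2` would instead show how each gender is spread across the bands.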

In statistics, the boxplot is another frequently used visualization tool; it can show angles and details that a histogram does not. Taking Fig. 6 as an example, the thick line in the middle of the box is the median and the two sides of the box are the first and third quartiles, so the box covers the middle 50% of the data. The crossbar at the end of each whisker marks the most extreme data point within 1.5 × IQR of the box, and the small circles beyond it are the remaining extreme values, drawn individually out to the minimum and maximum.
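
To make the parts of a boxplot concrete, here is a small sketch that computes them by hand for a made-up sample. Note that R's `boxplot()` uses Tukey's hinges for the box edges, which can differ slightly from `quantile()` for some sample sizes; for this sample the two agree.

```r
# Small made-up sample with one extreme value
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 40)

q   <- quantile(x, c(0.25, 0.5, 0.75))  # box edges and median
iqr <- IQR(x)                           # width of the box

# Fences: points outside these limits are drawn as circles
lower_fence <- q[["25%"]] - 1.5 * iqr
upper_fence <- q[["75%"]] + 1.5 * iqr

# Whiskers stop at the most extreme points still inside the fences
whiskers <- range(x[x >= lower_fence & x <= upper_fence])
outliers <- x[x < lower_fence | x > upper_fence]

print(whiskers)
print(outliers)

# boxplot.stats() returns the same pieces that boxplot() draws
print(boxplot.stats(x))
```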

Next, I investigated readers’ reading choices among three types of material: books, magazines and newspapers. Books are, of course, lent most often. Magazines also attract a significant number of readers, and newspapers enjoy a certain popularity, especially among older readers. See Fig. 7 below.

Fig. 7

Fig. 7 shows that many readers are very active, having borrowed hundreds of books or magazines. The boxplot in Fig. 8, however, shows that 75% of readers borrowed no more than 5 books, 4 magazines, or 2 newspapers, respectively. More research could be done to find out how to encourage readers to borrow more books, magazines or newspapers.

Fig. 8
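
The "75% of readers" figures correspond to the third quartile of the number of items borrowed, computed separately for each type. A minimal sketch, assuming a hypothetical table with one row per reader and type and randomly generated counts:

```r
# Synthetic loan counts per reader and material type (illustration only)
set.seed(3)
loans <- data.frame(
  type  = rep(c("book", "magazine", "newspaper"), each = 500),
  count = c(rgeom(500, 0.2), rgeom(500, 0.3), rgeom(500, 0.5)) + 1
)

# Third quartile of items borrowed, one value per type
q75 <- tapply(loans$count, loans$type, quantile, probs = 0.75)
print(q75)

# Share of readers of each type who borrowed no more than that many items
share_below <- mapply(
  function(ty, q) mean(loans$count[loans$type == ty] <= q),
  names(q75), q75
)
print(round(share_below, 2))
```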

Popular Books

Next, I analyzed another important factor, the books themselves, to discover what kinds of books readers seek most. There are different ways to classify books: for example, one scheme is used to suggest the right display shelf in bookstores, and another, more professional one is the Chinese Library Classification (CLC). I explored this question using sample data from the 2014 book lending history.

In R, summary() is a very handy function that can summarize tens of thousands of records within seconds. According to the standard book classification, the most popular book types in 2014 were, in order: Economy, Literature, Chinese Literature, Computer and Internet, History and Geography, Language, Light Industry, Philosophy, Pharmacy & Health, and Economics Administration. Taking a closer look at the preferences of different age groups gives the results summarized in Fig. 9.

Fig. 9
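
A ranking like this can be sketched in a few lines of R. The records below are randomly generated, the category list is shortened, and the column names and age breaks are assumptions made only for illustration.

```r
# Synthetic lending records (illustration only; column names are hypothetical)
set.seed(4)
cats <- c("Economics", "Literature", "History", "Language", "Philosophy")
loans <- data.frame(
  category = factor(sample(cats, 3000, replace = TRUE,
                           prob = c(0.35, 0.25, 0.2, 0.1, 0.1))),
  age      = sample(10:80, 3000, replace = TRUE)
)

# summary() on a factor counts the records in each category
overall <- sort(summary(loans$category), decreasing = TRUE)
print(overall)

# The same ranking within each age group, keeping the top 3 names
loans$group <- cut(loans$age, breaks = c(0, 20, 40, 60, Inf), right = FALSE)
top3 <- lapply(split(loans$category, loans$group),
               function(x) names(sort(table(x), decreasing = TRUE))[1:3])
print(top3)
```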

The findings from Fig. 9 are as follows:

People between 20 and 60 years old are most interested in Economics, which was at the top of their reading list in 2014. Readers older than 60, however, show noticeably less interest in Economics books.

Literature and Chinese Literature are the only two types of books that are among the top 10 across all age groups.

The young readers’ group, aged 1 to 20, is the one I am most interested in, and its book selection is quite distinct from that of other age groups. Fig. 10 provides more details. Note that Psychology is ranked No. 2 for this age group. This is not surprising to me, since I also read many books on psychology. Biography and Sociology are also in the top 10 for this age group. This may suggest that young people like me are seeking role models as they grow up and trying to learn how to fit into society. Does that make sense?

Fig. 10

Summary

The above summarizes the preliminary research from my summer internship in 2015. During the internship, I was able to apply the basic statistics I had taught myself to real-world scenarios that interest me a lot. It was an amazing experience. Through learning and practicing the R programming language, I found R to be a handy and powerful tool for data processing and visualization. There is more to learn, and I would love to keep refining my R skills and to conduct humanities research using modern technology. Wow, I really enjoyed the learning experience!

Acknowledgement

I am very grateful for the opportunity to intern and learn at the Shanghai Library System and Networking Center. I also thank Professor Annie Qu of UIUC for providing statistical guidance during my research.