Abstract on Web-based Medical Data Archive System

This web-based medical data archive system is a prototype of a proposed research paradigm that aims to build a distributed archival system over the Internet, helping medical practitioners maintain, share, update, search, and process medical information conveniently and consistently. Our system differs from existing systems in that it not only offers a full spectrum of online communication, processing, and annotation tools, but also provides powerful multimodal search functionalities to users. In addition, the database is always kept in a “live” mode such that information contributed by users is periodically indexed. At the time of publication, a preliminary version of our system, containing over 2000 medical images in different modalities along with associated annotation text for each image, is fully implemented and under evaluation. Finally, the current on-going development on the prototype system is described.

Contents:

1. Introduction

2. System Architecture

3. System Functionalities

4. Current Work

4.1 Image Indexing

4.2 Multimedia Presentation

4.3 More Modalities of the Information

5. Conclusion

6. References

1. Introduction

This system proposes a new paradigm for managing and archiving large-scale medical data. Specifically, this research demonstrates the new paradigm through a prototype web-based medical data archive system. The system allows storing, maintaining, sharing, updating, and retrieving medical files based on state-of-the-art multimedia database technologies and Internet facilities.

Medical files typically contain data in different modalities, such as text descriptions (either typed or handwritten), audio recordings, image files (including Nuclear Medicine, MRI, CAT, Mammography, Ultrasound Imagery, etc.), and possibly video clips. Although significant progress has been made towards archiving medical data in digital form, many medical centers and hospitals still archive medical files in physical form. As the world population grows quickly, and as new medical diagnostic technologies continue to be developed and applied in clinical practice, the number of modalities and the amount of medical data for each patient keep increasing. The traditional archival system exhibits several major problems in accommodating this rapidly growing demand for archiving and managing medical data.

Problems with Current System:

• More physical space will be necessary to maintain the medical files.

• More manpower will be necessary to maintain the medical files.

• It will become less efficient and more difficult to access and retrieve a particular medical file.

Proposed System for Recovery:

Even with the digital archival systems available today, medical files are typically inconsistent and fragmented, varying across medical institutions, medical centers, and hospitals. Substantial effort is still necessary to keep medical data archival systems consistent, efficient, and effective in storing and retrieving data.

With the fast development of computer technologies, especially multimedia database and Internet technologies, it is possible to archive all the medical data in a “central” system so that all medical practitioners and/or researchers in a membership-controlled community may share and access this system conveniently and consistently, without the hassle imposed by the conventional archival system.

This is the motivation of this research, as well as of the development of our prototype system. A typical scenario with the system follows. Assume that the “central” archival system is located on a computer server in Washington. A medical practitioner in Seattle is reviewing the medical files of a patient. He/she decides to consult the database for the diagnostic approaches previously taken by his/her colleagues in the same community for patients who exhibited similar medical symptoms.

To do so, he/she poses a query to the system through a standard browser on his/her PC, either at home or at the office. The query need not be a simple keyword-based query. It may be a complicated query consisting of data in different modalities.

An extreme case is that he/she could send the whole set of files of this patient as the query, which may include information on many symptoms. Our system not only allows the user to review all the files in different modalities by popping up different windows to present the files, but also allows the user to do online processing and/or annotation directly on top of the files, just like writing annotations on top of physical medical files. The user could then save the processed files and/or the original files, along with the newly written annotations, into his/her local folders. Alternatively, if he/she wishes, he/she could contribute them to the whole community by uploading the files back to the system, so that this information is archived in the system and may be shared by other colleagues in the community.

Applying multimedia technologies to medical archives has been a focus of research since the 1990s. With the advent of the Internet, more research has been reported on combining Internet technologies with multimedia research in designing and developing medical archive systems. Our system differs from existing systems in that it not only offers a full spectrum of online communication, processing, and annotation tools, but also provides powerful multimodal search functionalities to users. In addition, the database is always kept in “live” mode such that information contributed by users is periodically indexed. This paper is organized as follows. The next section describes the architecture of our system, and is followed by a section documenting the implemented functionalities of the currently developed preliminary system.

2. System Architecture

The architecture of the system follows the standard 3-tier, browser-server model. The client machine may be any machine with any operating system, as long as it is connected to the Internet and has a standard browser. After a user logs in to the system, a Java Applet is downloaded to start communication with the server. The server consists of a middleware and a backbone database system. The middleware is implemented using Java Servlets and Java Server, and the backbone uses an Oracle 9i database.

The communication between a client Applet and the server Servlet is implemented using object serialization. The communication between the middleware and the backbone uses JDBC (Java Database Connectivity) and RMI (Remote Method Invocation). In the current preliminary version of our system, the middleware is a J2EE-compliant web server (Java Web Server) hosted on a SUN Enterprise 250 server with a dual processor running Solaris 7, 1 GB memory, and 18 GB hard disk; the backbone machine is a Dell TPX 800 MHz Pentium III running Windows 2000 with 512 MB memory and 40 GB hard disk. While the whole system architecture is implemented in Java, some libraries (engines) of the processing, annotation, and querying tools are implemented in C++ for efficiency. Figure 1 shows the architecture.
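The Applet-to-Servlet exchange through object serialization can be illustrated with a minimal sketch. The message class `QueryRequest`, its fields, and the sample values are hypothetical, since the text does not describe the actual classes, and in-memory byte arrays stand in for the network connection between the client and the server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationSketch {
    // Hypothetical message type; the real system's classes are not described in the text.
    static class QueryRequest implements Serializable {
        private static final long serialVersionUID = 1L;
        final String userId;
        final String keywords;

        QueryRequest(String userId, String keywords) {
            this.userId = userId;
            this.keywords = keywords;
        }
    }

    // Client side: turn the request object into bytes that could be sent to the servlet.
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Server side: rebuild the object from the received bytes.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up user id and keywords, used only to show the round trip.
        byte[] wire = serialize(new QueryRequest("user-1", "chest MRI"));
        QueryRequest received = (QueryRequest) deserialize(wire);
        System.out.println(received.userId + " asked for: " + received.keywords);
    }
}
```

In a deployed Applet and Servlet pair, the same `ObjectOutputStream` and `ObjectInputStream` calls would wrap the connection's streams rather than byte arrays, and both sides would need the same version of the message class.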

3. System Functionalities

Our system offers users the following implemented functionalities in the preliminary version:

• Data management

Users may save browsed, annotated, or processed documents to their local machines or, if they wish, contribute these documents to the community by uploading them into the database, which allows data sharing among colleagues.

• Online data processing

Users may apply tools available in the system toolboxes to process the documents they are reviewing online. The modality-dependent toolboxes are developed in close collaboration with clinicians and medical researchers. The currently available tools include zooming in and out on either global or local areas of a document, arithmetic image processing (e.g., addition of two images, layer analysis for a composition of up to four individual images), pseudo coloring, thresholding, and histogram analysis. The algorithms used in these functionalities are taken from the existing literature. Figure 2 (a)-(c) show examples of some of these functionalities.

• Macro Language

Optionally, users may write or record their own retrieval/processing protocols and add their own customized analytical tools.

• Online annotation

Different graphical tools (e.g., ellipse, rectangle, line) as well as different text fonts and color choices are available for online annotation. Users may write or voice-dictate their annotations (provided that the local computer is equipped with a microphone) directly on top of a document in review, or, if they wish, in the blank space provided next to the document in the annotation window. The annotations and the graphical information are indexed separately after they are uploaded into the database. Figure 2(d) shows an example of annotating an image.

• Powerful querying capabilities

In addition to standard SQL queries, text-based and image-based queries are also available to users, and user-friendly graphical interfaces allow multimodal query and retrieval. The SQL query facilities are provided directly by Oracle 9i and are incorporated into the system. The text query functionalities are implemented using an Inverted File Indexing scheme. The image query functionalities are implemented using histogram indexing.

• Membership control

Password-protected membership control ensures that sensitive and private information is only available to members of the community.
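The Inverted File Indexing scheme mentioned under the querying capabilities can be sketched as follows. This is only an illustration of the general technique: the class name, the tokenization rule, the AND semantics for multi-term queries, and the sample annotation texts are all assumptions, not details of the implemented system.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class InvertedIndexSketch {
    // term -> sorted set of ids of the documents whose text contains that term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Split the text into lowercase terms and record the document id under each term.
    void addDocument(int docId, String text) {
        for (String term : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Return the ids of documents that contain every query term (assumed AND semantics).
    List<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : query.toLowerCase().split("[^a-z0-9]+")) {
            if (term.isEmpty()) {
                continue;
            }
            TreeSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);   // first term: start from its posting list
            } else {
                result.retainAll(docs);         // later terms: intersect posting lists
            }
        }
        if (result == null) {
            return new ArrayList<>();           // query had no usable terms
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        // Made-up annotation texts, used only for illustration.
        index.addDocument(1, "Chest MRI showing a small lesion");
        index.addDocument(2, "Ultrasound of the abdomen");
        index.addDocument(3, "Follow-up MRI, lesion unchanged");
        System.out.println(index.search("MRI lesion")); // prints [1, 3]
    }
}
```

Because each term maps directly to the documents that contain it, a query only touches the posting lists of its own terms instead of scanning every annotation.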

4. Current Work

The aim of developing the preliminary system was to assemble each component using existing technologies, as a proof of concept for the novel archival paradigm our system represents. The current work furthers the research by upgrading the preliminary system into a real application system, both by improving the performance of the developed functionalities and by developing new functionalities. Specifically, the following three areas are identified for further research and improvement.

4.1 Image Indexing

Image indexing is a well-studied area in multimedia research, as image-related search is notoriously expensive and often fails to deliver an effective retrieval (an effective retrieval is one in which the retrieved images are semantically related to the query posed). In this task, a number of state-of-the-art image indexing algorithms from this research community are applied and combined to replace the simple histogram-based indexing. One such indexing algorithm is the Geometric Histogram algorithm developed recently by Rao, Srihari and Zhang, which exploits geometric constraints of intensity values with respect to the spatial pixel distribution and applies these constraints to the histogram. The preliminary evaluation suggests that the geometric histogram has promising potential in image indexing, and it is therefore taken as a candidate image indexing algorithm in this task.
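For context, the simple histogram-based indexing that this task aims to replace can be sketched as below. This is not the Geometric Histogram algorithm. The bin count, the use of 8-bit grayscale intensities, and histogram intersection as the similarity measure are assumptions made for illustration, as the text does not specify them.

```java
class HistogramIndexSketch {
    static final int BINS = 8; // assumed bin count, chosen only for illustration

    // Build a normalized intensity histogram from 8-bit grayscale pixel values (0..255).
    // Assumes a non-empty pixel array.
    static double[] histogram(int[] pixels) {
        double[] h = new double[BINS];
        for (int p : pixels) {
            h[p * BINS / 256]++;          // map intensity 0..255 to a bin 0..BINS-1
        }
        for (int i = 0; i < BINS; i++) {
            h[i] /= pixels.length;        // normalize so the bins sum to 1
        }
        return h;
    }

    // Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones.
    static double similarity(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < BINS; i++) {
            sum += Math.min(a[i], b[i]);
        }
        return sum;
    }

    // Linear scan over the archived histograms; returns the index of the closest one.
    static int bestMatch(double[] query, double[][] archive) {
        int best = -1;
        double bestScore = -1.0;
        for (int i = 0; i < archive.length; i++) {
            double score = similarity(query, archive[i]);
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Tiny made-up "images": one dark, one bright.
        double[][] archive = {
            histogram(new int[] {10, 20, 30, 40}),
            histogram(new int[] {200, 210, 220, 250})
        };
        double[] query = histogram(new int[] {15, 25, 35, 45});
        System.out.println("Best match: image " + bestMatch(query, archive)); // prints "Best match: image 0"
    }
}
```

A plain intensity histogram discards where pixels are located, so two images with the same intensity distribution but different spatial layouts are indistinguishable to this index.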

In addition to a good indexing algorithm, a “good” data structure for organizing the database is also necessary when efficient retrieval is desired. (An efficient retrieval is one that completes within an acceptable response time after a query is posed.) Several advanced spatial data structures have been considered and reviewed, and the upgraded version of our system will be based on one of these structures instead of simple arrays of images.

4.2 Multimedia Presentation

Multimedia presentation is another well-studied area in multimedia research. After a query is processed and all the retrieved documents are ready to be delivered, multimedia presentation is concerned with how to present the information to the user who made the query. A good presentation not only delivers the retrieved documents to the user, but also delivers the information in the way the user feels most “comfortable” with. Since users differ in their personal interests and preferences, a “good” multimedia presentation should be individualized to match each user’s interests and preferences. This can be done by first tracking the user’s browsing patterns over a period of time, then analyzing the patterns to infer the user’s most likely interests and preferences, and finally forming a presentation plan based on the inference. This research is being carried out and is being incorporated into the next version of our system.
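The three steps just described (tracking, inference, and planning) can be illustrated with a deliberately simple sketch. It assumes that the only tracked signal is how often a user opens documents of each modality and that the presentation plan is just an ordering of retrieved modalities. The class, its methods, and the modality names are hypothetical, and the actual research may use far richer models.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PresentationPlanSketch {
    // modality name -> number of times the user opened a document of that modality
    private final Map<String, Integer> viewCounts = new HashMap<>();

    // Step 1 (tracking): record one browsing event.
    void recordView(String modality) {
        viewCounts.merge(modality, 1, Integer::sum);
    }

    // Step 2 (inference): treat the view count as the user's preference score.
    int preference(String modality) {
        return viewCounts.getOrDefault(modality, 0);
    }

    // Step 3 (planning): present the most preferred modalities first.
    // The sort is stable, so equally preferred modalities keep their retrieval order.
    List<String> plan(List<String> retrievedModalities) {
        List<String> ordered = new ArrayList<>(retrievedModalities);
        ordered.sort(Comparator.comparingInt((String m) -> preference(m)).reversed());
        return ordered;
    }

    public static void main(String[] args) {
        PresentationPlanSketch user = new PresentationPlanSketch();
        user.recordView("image");
        user.recordView("image");
        user.recordView("image");
        user.recordView("text");
        System.out.println(user.plan(Arrays.asList("text", "audio", "image"))); // prints [image, text, audio]
    }
}
```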

4.3 More Modalities of the Information

While the preliminary version of our system focused only on image and text data, new modalities of information such as audio, video, and mouse pointing are being added for the next version, as a general medical archive system is likely to contain information other than image and text. To meet this requirement, indexing algorithms for each of the media types are being developed, and the user interface is being upgraded to accommodate the interactions between a user and the system for the different modalities of information.

5. Conclusion

This system presents a new paradigm of archiving medical data based on combining state-of-the-art multimedia research with Internet facilities. The system concept offers a unique solution to the medical archiving problem. It surpasses the existing systems in the commercial and research sectors in that it combines convenient, efficient, effective, and flexible online processing, annotation, and multimodal querying capabilities. A preliminary version of our system, containing over 2000 medical images in different modalities along with associated annotation text for each image, is fully implemented and under evaluation. This technology will assist medical practitioners and researchers by enabling efficient management and sharing of medical data within or across a community, without being subject to geographical restrictions and without creating problems of inconsistent and fragmented medical data.