Login

Cracking The Vault (part 1)

Electronic documents are all well and good – but when you work on
them collaboratively, they can end up being more difficult to handle than
ordinary pieces of paper. Multiple versions, competing standards, access
permissions and revision history tracking are just some of the issues that
arise in a paperless office. This article discusses building and deploying
a document management system across your network – and also teaches
beginnners a little bit about designing Web-based applications with PHP and
mySQL in the process.Now, I don’t know about you guys, but I don’t think the paperless office is a
feasible idea. At least not without a lot more thinking, and a lot more
work.

I’m not being a pessimist here. I love the concept, and, just like
everyone else, my eyes light up at the thought of replacing the mounds of paper
piled up around my workstation with something a little more attractive (a Mr.
Potato Head, maybe?). And so, a while back, a few like-minded colleagues and
myself got together and decided to try and make a go of using only electronic
documents in our daily workday.

Obviously, we already use internal email
extensively, and swap documents over the network – however, since we are a
content production company, our work involves accepting and editing both digital
and printed material, and most people still prefer printing out an article or
report, as opposed to reading the digital version off a computer
screen.

After a very trying eight weeks, a couple of problems with this
approach became immediately visible in our weekly after-hours gripe session. And
so, a skunkworks project was born, a software development effort designed to
make it easier to manage collections of electronic documents in a networked
environment.

All this is, of course, by way of background. Over the next
few pages, I will be describing our unique requirements and the problems we
faced in greater detail, together with our proposed solution. And, after putting
in some thought, I will be guiding you through the process of building a
document management system, with the help of powerful open-source tools like PHP
and mySQL.

The goal here is two-fold: to introduce novice and
intermediate programmers to the process of designing and implementing a
Web-based application, and to PHP’s session handling, file upload and database
capabilities; and to offer developers, administrators, corporate efficiency
experts and people with messy desks a possible solution to their
woes.

Lofty goals, you scoff? Well, let’s see…

This article copyright Melonfire 2001. All rights
reserved.{mospagebreak title=Just Another Day At The Office} Before we
get into the nitty-gritty of PHP scripting, it is important to understand the
problems we faced, so that the functional requirements of the solution become
clear.

The primary problem we faced with electronic documents was one of
version control. Here are three scenarios which illustrate the
problem:

1. The company has just been contracted for a software
development project, and John is identified as the team lead. He is responsible
for publishing a schedule describing milestones and resource allocation. Before
committing delivery dates to the customer, John must communicate with various
department heads and obtain their agreement to deliver work product on specific
dates.

Typically, John would create a schedule, outlining the tasks as he
sees them (leaving dates and resource names empty). He would then send this
schedule to different department heads, asking each to fill in the missing data
in the areas that concern them. For example, Tim (Design Manager) would need to
assign a resource to the interface design aspects of the project and state how
long the task would take, while Roberta (Publications) would need to assign a
writer to the task of putting together a manual once the code is
released.

The problem here is that if John emails the document to each
person, he has to integrate the different versions he receives back into a
single picture, and again send the final draft out for approval to all design
heads. Alternatively, he could place it on the network – but how is he to know
when the various departments have finished adding to it, and the changes made to
the document by each?

2. A corollary of the problem above – once the
schedule is decided, John stores the document in a project folder on the
network, accessible to all. However, as per Murphy’s Law, the project is plagued
with delays, necessitating frequent updates of the schedule…and consequently,
different versions of the same schedule. Team members often find themselves
working off an outdated version of the schedule (“Oh, didn’t you know, we need
the widget tomorrow – look, it says so right here, in this new
schedule!”).

John also needs to assign permissions to the schedule (and
other group documents) carefully, to ensure that only authorized people are
viewing or making changes to it (this is particularly important for billing and
invoice information). If John is technically qualified, he can easily use
built-in Netware, Linux or Windows NT capabilities to assign document
permissions…but what if he is (horror of horrors!) a manager?!

3. The
third scenario is one unique to our particular area of business. As a content
development company, we have a small but talented team of freelancers developing
content for us. These guys (and gals) email material to us from home, sometimes
dropping in to the office to perform edits or corrections. Now, if they email
material to Harish, and drop in the next day expecting to edit it, they’re going
to be deeply wounded by the fact that Harish decided to go to the beach today,
and no one else can get through the BIOS password on his workstation.

A
similar problems arises on collaborative writing projects; if two or more
writers are working on different aspects of the same piece, an editor has to sit
down at the end and make sure that each writer’s changes and additions are
integrated into a final, workable article. And here’s another twist: what
happens if a new writer is brought in to work on the piece (these guys like to
call themselves “consultants”), and the work is spread piecemeal among the other
three members of the team? Obviously, there needs to be a way for new team
members to quickly obtain the most current version of a document, so as to get
up to speed quicker…

You may have faced similar situations at your own
workplace – if so, you’ll be glad you decided to read this article. If you’ve
never, ever, encountered one of these situations or variants thereof – is your
company hiring?

This article copyright Melonfire
2001. All rights reserved.{mospagebreak title=An Evil Plan Is Born}
Having understood the problems, it becomes easier to decide on the requirements
of the solution. An analysis of the problems above reveals that most of them
would be resolved if we had a system which:

1. offered a central storage
area for group documents;

2. used some manner of secure access to ensure
that only users with appropriate permissions viewed or edited sensitive
information;

3. made it possible to identify which users were working on
which documents at any point in time;

4. tracked the changes made to each
document;

5. offered a quick and easy way to locate documents in a large
repository, and categorize them according to type;

6. provided a
framework for multiple users to collaborate on a document, without imposing any
special software requirements or constraints on them.

I initially debated
restricting the system to text documents only; however, since we use a variety
of different applications for internal communication (spreadsheets, documents,
project schedules, schematics), I decided to also allow users to upload binary
files of any type (administrators should, of course, have the option to restrict
certain file types if necessary.)

This, therefore, constitutes the
initial feature set for our application. If I am able to meet these basic
requirements, it will make a lot of people very happy, and more features will be
requested. Once I have enough requests for more features, I plan to go to the
guy in the corner office and ask him to make me a manager, double my salary,
give me my own office and assign me to the project full-time. Ahhhh haa haa ha
ha!

This article
copyright Melonfire 2001. All rights
reserved.{mospagebreak title=Setting The Ground Rules} At this point, I
have also come up with a few ideas as to how this system is likely to work. I
plan to model it closely on the check-in/check-out adopted by many popular
source-control system, most notably CVS [http://www.cvshome.org/]. Incidentally,
we did consider using CVS – the developers are already quite comfortable with it
– but the managers had a “problem” with the command-line interface.

Here
is an initial draft of the basic rules that I plan to build into this
application:

1. Any user can log in to the system and add a file to the
repository. This user can also assign permissions to allow other users to view
(download) copies of the file, and modify (upload) new versions of the file.
Users can only view/modify files if they have appropriate permissions.

2.
Users with “view” permissions can download the most recent copy of a file for
viewing on their local systems.

3. Users with “modify” permissions can
“check out” a file from the storage area, make changes to it, and then “check
in” the new version of the file. Other users cannot access a file while it is
checked out.

4. A revision tracking system will enable users to enter
comments at the time of checking in a file; these comments serve as both a
reminder of the changes made to the document, and a notice of which users have
made the changes. However, unlike most source-control systems, I do not
currently plan to archive previous copies of a document (because it is not a
requirement at present – though this can easily be added), or provide a method
to “roll back” changes (because I have no idea how to track and revert changes
made to binary files like JPEG images or Microsoft Word DOCuments).

5.
All this activity will take place via a Web browser, thus freeing the user from
having to install any special software on their workstation.

And while
we’re deciding on the features and rules, let’s also decide on a name. I’m going
to call it The Vault, simply because it functions much like a bank vault (secure
access; user identification and tracking; sounds hip).

This article copyright Melonfire 2001. All rights
reserved.{mospagebreak title=Design Time} Now that the functional
requirements have been decided, it’s time to find someone I don’t particularly
like and shift the whole job over to him while I enjoy a long lunch and a
comfortable nap…

Just kidding. The next step is actually to begin
thinking about the database design, in accordance with the feature set
above.

This is a good time for you to download the source code, so that
you can refer to it throughout this article (you will need a Web server capable
of running PHP and a mySQL database).

This table contains information on the documents stored
in the system. Each document is assigned a unique ID, and the table also stores
information on the document category, the creation date, the document “owner”
(the user who initially added the file), and a user-specified description and
comment. The last field stores the current status of the document (checked
in/out).

This table stores permissions on a per-file basis;
permissions may be defined as “view” (1) or “modify” (2). The unique combination
of these three columns makes it possible to identify any user’s rights for any
file in the system.

Finally, this table has been created specifically to
store revision information when documents are checked back into the system.
Don’t worry about this one for the moment, I’ll be discussing it in detail at a
later stage.

At this point, I should say that the schema above
constitutes an initial draft only. As the software evolves over time, I plan to
modify this to add support for new features, and to optimize existing code.

This article
copyright Melonfire 2001. All rights
reserved.{mospagebreak title=Start Me Up} With the database design out of
the way, it’s time to actually start creating the scripts. We’ll begin at the
top, with the scripts which verify the user’s password. Here’s the initial login
form, “start.html”.

Assuming the username and password is correct, the script
initiates a session, and registers a session variable named $SESSION_UID,
containing the user’s ID; this variable will remain available throughout the
session, and will be used in many of the subsequent scripts. The script then
redirects the browser to “out.php”, which forms the main interface to the
system, via an HTTP header.

A login failure will redirect the browser to
the generic error handler, “error.php”, with an error code indicating the type
of error. I’ll be using this error handler extensively, to handle the different
types of errors possible.

It is important to note that calls to header()
and session_start() must take place before *any* output is sent to the browser.
Even something as minor as whitespace or a carriage return outside the PHP tags
can cause these calls to barf all over your script.

Finally, the
include()d file, “config.php”, contains some useful variables – the database
name, user name and password, together with the location of the data storage
area and a list of allowed file types.

This article copyright Melonfire
2001. All rights reserved.{mospagebreak title=Entry Points} Once the user
is successfully logged in, “out.php” takes over and generates a list of
documents available to the user; this list includes documents the user may have
uploaded, as well as documents the user has “view” rights for. If this is the
first time you are logging in, you will probably see nothing; once you add
documents, the screen will fill up.

Before
we look at this script, however, I want to draw your attention to the script
“add.php”, accessible from the top right corner of the page and used to add new
documents to the system. I’ll explain this first, since it offers a logical
starting point.

The first thing “add.php” (and every other script) does
is to verify the existence of a valid session – this is necessary to prevent
unauthorized users from viewing the pages. If a session doesn’t exist, the
browser is immediately redirected to the error handler.

I’m not going to go into details here – it’s essentially
a bunch of HTML tables, all dressed up to look pretty – but I will draw your
attention to the file “menu.inc”, which contains the main menu for the page and
is include()d on every page.

This article copyright Melonfire 2001. All rights
reserved.{mospagebreak title=Seeding The System} Since “add.php” is a
form, the script is divided into two sections; the first section displays the
form, while the second section processes the form data.

A couple of points to be noted about this form. First,
since I plan to use this to upload files, I’ve specified the form encoding type
to be “multipart/form-data” and added a form field of type “file”. And a little
further down, I’ve queried the database to generate a list of users so that
rights can be assigned appropriately. Note how I’m checking each user’s ID
against the current $SESSION_UID in order to pre-select the current user’s
name.

Although I’m going to use the results of the SELECT query in two
places – to generate a list for both “view” and “modify” rights – it isn’t
necessary to run the query twice. This is because the mysql_data_seek() function
takes you back to the top of the current resultset, allowing you to reuse query
results more than once.

The first thing to be done is to verify certain file
properties – for example, the file size must be greater than zero bytes, and the
file must be of an allowed file type. In order to perform these checks, I’m
using the four variables created by PHP whenever a file is uploaded – $file is
the temporary file name assigned by PHP, $file_size is the size of the uploaded
file, $file_type returns the MIME type, and $file_name is the original name of
the file.

Assuming everything checks out OK, I then process the
descriptive data entered by the user, INSERT it into the “data” table, process
the list of users with “view” and modify” rights, INSERT this data into the
“perms” table, rename the uploaded file and copy it to the storage area, and
redirect the browser back to “out.php” with a status message indicating
success.

There is an interesting chicken-and-egg situation here that you
may be familiar with. I need to rename the newly-uploaded file to “fileID.dat”;
however, I can only do this once the record has been inserted into the table and
an ID generated for it. Once the ID is generated, I would normally need to query
the table again to obtain the ID. However, the mysql_insert_id() function stores
the ID generated by the last INSERT operation, and can conveniently be used
here.

This
article copyright Melonfire 2001. All
rights reserved.{mospagebreak title=Red And Green Clouds} Let’s now take
a look at “out.php”, the PHP code used to generate the file list.

Pay attention to the query – I’m joining the “user”,
“data” and “perms” tables together to obtain detailed information (description,
creation date, owner) for those files which contain the logged-in user’s ID and
a permission value of 1 (view). Once I have a dataset, I use a “while” loop to
iterate through it and display the file list.

Next, the $filename
variable. As I’ve already mentioned, once a document is entered into the
repository, I don’t plan to retain the original file name, but rather hope to
rename it in the form “fileID.dat”. So I’ve generated a filename on the basis of
each file ID, and am using the filesize() function to display the size of the
file as part of the description.

The fixDate() function is used to turn
the mySQL timestamp into something a little more readable.

Depending on the value of the “status” column, an icon
and message is displayed for each file; green indicates that the file is
available, while red indicates that it is not.

And here’s what it all
looks like.

This article copyright Melonfire 2001. All rights
reserved.{mospagebreak title=Digging Deeper} Each of the files listed in
“out.php” is hyperlinked to the script “details.php”, which displays detailed
file information. This script is passed a file ID via the URL GET method, and
uses this ID to query the “data” table and obtain the file description, author
comment, owner, creation date and file status.

This article copyright Melonfire
2001. All rights reserved.{mospagebreak title=Basic Maintenance} As I’ve
just explained, the owner of a document has the ability to edit document
properties, or delete the document from the system. The scripts that accomplish
these tasks are “edit.php” and “delete.php” respectively, and I’ll briefly
explain them here.

“edit.php” is almost identical to “add.php” – it
contains both a form and a form processing script, and includes all of the same
form elements, with the exception of the file upload box. The document owner can
use this script to alter the file description, and grant/revoke user
permissions.

Since this is a pre-existing document, “edit.php” needs to
query the database in order to pre-fill the form with the current values of the
various fields. This piece of code is designed to obtain the list of allowed
user IDs from the database, create an array, and then use the array to
pre-select the usernames in the list boxes.

This article copyright Melonfire
2001. All rights reserved.{mospagebreak title=The D Word} Finally, we
come to “delete.php”, which takes care of removing a document from the file
system and the database.

Nothing too complex here – verify that the file can be
deleted, DELETE all records containing the file ID from the various tables,
delete the file (this will fail if your Web server doesn’t have the right
permissions) and go back to the main page.

At this point, I’ve built the
foundation for the system, although I have not yet addressed the scripts which
actually take care of checking documents in and out. That, together with an
explanation of how I plan to implement the revision history mechanism and the
search feature, will be addressed in the second part of this article. Download
the code, play with it, send me your thoughts/flames/savings…and come back
next time for more!