NOTE 1: The versions listed here are those that were used during
development. Luci may work with an older version of the same module, but
performance with an older module should be considered uncertain.

NOTE 2: It is possible that some of the modules listed here have module
requirements of their own.

Un-tar the archive to a directory from which you wish to run Luci. You should
consider installing Luci behind SSL. ( see the FEATURE LIST
for information on how Luci works with SSL and why Luci should reside behind a
Secure Socket Layer. )

Using your favorite editor, open the luci.conf.cgi and update the following
parameters in accordance with your environment:

NOTE: These settings are required for Luci to work properly. Any parameter
within the config file may be modified to tweak Luci to best suit your needs,
but you only need to change the required list to get Luci up and running.

- app_root_url - absolute URL of the application root directory.
( NOTE: the 's' in 'https' is recommended, but not required. Please see
the FEATURE LIST for information on how Luci works with SSL and why Luci
should reside behind a Secure Socket Layer. )

ex: https://www.yourdomain.com/text_only/parser

Under <system>

- default_target - absolute URL of the default site where Luci should
be directed. This site will be used if a user references the luci.cgi
directly.

ex: http://www.yourdomain.com/

- apache_2 - set true if running under apache 2, false otherwise

- allow_url - urls that Luci will treat as internal ( ie. allowed
for parse ). This directive should be repeated for every url under which
Luci should allow for parsing. If you leave this field empty, Luci will
allow parsing of any domain. See luci.conf.cgi for specific info on how
to set this directive.

- deny_url - urls that Luci will not allow for parse. This directive
follows the same conventions as the allow_url directive. Note: deny_url
overrides settings in the allow_url directive. If you leave this field
empty, Luci will allow parsing of any url permitted by allow_url. See
luci.conf.cgi for specific info on how to set this directive.

- cryptkey - the encryption key used with Crypt::CBC for generating
parameter names, and for en/decrypting passwords used in conjunction with
401 authorization ( ie. .htaccess ). If you are not familiar with this, set
it to some random string. As far as we know, cryptkey can be any string you
like; it need not be elaborate, but do not use the example given here.
( see the FEATURE LIST for information on how 401 authorization works with
Luci and why you need to set cryptkey. )

ex: cryptkey = aZ4eg3P

Under <cookies>

- domain - the domain under which you've installed Luci.

ex: yourdomain.com

- secure - Set to 1 if Luci is running under SSL. Used with cookies
set by Luci. According to CGI::Cookie: ``If the 'secure' attribute is set,
the cookie will only be sent to your script if the CGI request is occurring
on a secure channel, such as SSL.''
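The precedence between allow_url and deny_url described above can be
sketched as follows. This is a minimal Python illustration, not Luci's own
Perl code: the function name is hypothetical, and matching by URL prefix is
an assumption ( see luci.conf.cgi for how Luci actually matches these
directives ).

```python
def is_parse_allowed(url, allow_urls, deny_urls):
    """Return True if url may be parsed under the allow/deny rules."""
    # deny_url overrides allow_url, so it is checked first.
    # ASSUMPTION: entries are compared as simple URL prefixes.
    if any(url.startswith(prefix) for prefix in deny_urls):
        return False
    # An empty allow list means any url is allowed.
    if not allow_urls:
        return True
    return any(url.startswith(prefix) for prefix in allow_urls)


allow = ["http://www.yourdomain.com/"]
deny = ["http://www.yourdomain.com/private/"]

for candidate in (
    "http://www.yourdomain.com/index.html",
    "http://www.yourdomain.com/private/report.html",
    "http://www.example.org/",
):
    print(candidate, is_parse_allowed(candidate, allow, deny))
```

With both lists empty, every url is allowed, which matches the open-access
configuration discussed in the section on web crawlers.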

The configuration file ( luci.conf.cgi ) contains some information which you
may not want publicized; namely the cryptkey parameter. ( see 401
authorization in the FEATURE LIST for more information on cryptkey and
why it should be hidden ) The following methods may be used to hide the config.
( This is by no means a complete list, so please feel free to send us any
methods you feel should exist with this list, along with the OS used )

- hide luci.conf.cgi from your web server document root - By moving the
config file to a directory outside the web tree, anonymous users will not
be able to access your configuration details. To do this, first move
luci.conf.cgi to some directory outside of your web server document root,
then edit luci.cgi and index.cgi, and update the following line:

use constant LUCI_CONF => $path.$sep.'luci.conf.cgi';

to read:

use constant LUCI_CONF => "/path/to/hidden/luci.conf.cgi";

- run setuid - By running Luci setuid, you can probably leave the config
under the web server document root, as long as the permissions are set
properly.

NOTE: When running setuid, Luci runs as the owner of the Luci scripts, and
therefore has whatever permissions that user has on the system.
Please see the setuid man page for more information on how it works. When
running setuid you should run the application under a user account
that has minimal permissions on the system.

NOTE: As of version 1.3, we have renamed the configuration
file with the appended '.cgi'. This should allow you to leave the file
in place if you serve perl generated pages with the cgi extension. In this case
it is recommended that you set it with read only permissions so any user
attempting to read the file with their browser will get a forbidden error.

In release 1.2 and earlier, we encountered an issue where web crawling engines
that found an instance of Luci would spawn a large number of invocations
attempting to spider websites via this service. In an environment where Luci
is hosted without any url restriction, this was a real problem where the
crawler would saturate the host, using all available resources, in turn denying
a response to any other request. The issue is that crawling
engines treat sites reached via Luci as absent from their datastore, and so
begin indexing them, even though they should be considered duplicates. This
should be a non-issue for smaller sites that use the allow_url directive in
the configuration file, but for larger sites, or those that do not make use
of the allow_url directive, it can be a problem. For example, the googlebot
engine is not only fast, but also runs in a distributed environment. If it
were allowed to spider the internet via your Luci instance, it could quite
quickly exhaust all resources, bringing your server to a grinding halt.

The solution has two parts:

- a - As per the googlebot documentation, we've added the meta content
necessary to hide your Luci pages from a crawler that may attempt indexing
via your site. This is available as an option in the config, and is by default
turned on. If you have a small website, and do make use of the allow_url
directive in your luci.conf.cgi, you shouldn't be too concerned about robots
because they will only index those pages allowed by your Luci installation.

- b - For larger sites, or those allowing open access by not using
the allow_url directive, it would be good practice to at minimum leave
the option to include the nofollow meta turned on as is described in
- a - above. Additionally, you can include a robots.txt file at the
root of your site which is the standard method used by a website admin to
communicate with a crawler engine.

As an example: if you install Luci under
https://www.yourdomain.com/text_only/parser/, then you would create a
robots.txt file at https://www.yourdomain.com/robots.txt. The entries
necessary to hide this installation are as follows:

User-agent: *
Disallow: /text_only/parser

NOTE: The robots.txt file *MUST* exist at the root of your site, else it
will have no effect.
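To confirm that the entries above hide the installation, you can feed them
to a robots.txt parser. The sketch below uses Python's standard
urllib.robotparser module; the user agent string and the test URLs are
examples only, built from the sample install path used above.

```python
from urllib.robotparser import RobotFileParser

# The same two lines shown in the robots.txt example above.
rules = [
    "User-agent: *",
    "Disallow: /text_only/parser",
]

parser = RobotFileParser()
parser.parse(rules)

# A URL under the Luci install path should be off limits to crawlers.
luci_allowed = parser.can_fetch(
    "Googlebot", "https://www.yourdomain.com/text_only/parser/luci.cgi"
)
# The rest of the site should remain crawlable.
home_allowed = parser.can_fetch(
    "Googlebot", "https://www.yourdomain.com/index.html"
)

print("Luci path crawlable:", luci_allowed)
print("Home page crawlable:", home_allowed)
```

Note that robots.txt is advisory: it only affects crawlers that choose to
honour it.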

Below is a list of environments under which we know Luci will run. With
each is a description of what software versions were used, and what was
required to get Luci running. Some systems have special requirements:

- UNIX/Linux/OS X - Luci was developed in a UNIX environment, and
should work without issue. See BUGS for reporting issues found
when running/installing Luci in a UNIX environment.

- LWP will support https URLs if the Crypt::SSLeay module is installed.
- 501 Protocol scheme 'https' is not supported (Crypt::SSLeay not
installed).

Under IIS, the internet guest account (iusr) must have 'Read and Execute'
permissions set on the Crypt-SSLeay dlls, namely ssleay32.dll and
libeay32.dll. To set permissions, locate these files, choose Properties
from the context menu, and under the Security tab, add the iusr account to
the user names list with 'Read and Execute' checked.

- The following document was provided by John Newman from
http://www.newluna.com/, and may be of assistance for those
installing under Windows 2003:
http://luci.sourceforge.net/other/win2003_05_30_2006.pdf

If you intend to perform a non-network install, you will need to retrieve
the modules and copy them to your local machine manually. We cannot provide
them here, and would also prefer that the latest version of the modules be
used.

- test configuration - Once you have Luci installed and configured to
work on your server, using Luci is quite simple. To test your configuration,
simply point your web browser at the Luci install directory. You should see
a 'text only' version of the default_target which was set in the configuration
file.

- adding accessibility links to your pages - Using the provided
index.cgi you can quite easily add accessibility
links to all your pages. Use the following in your HTML to provide quick
and easy access to Luci. The index.cgi will take care
of forcing Luci to parse the appropriate page.

Luci is the bright, young colonial cousin of the venerable dowager Betsie,
the BBC's Education Text to Speech Internet Enhancer. While still
bearing a family resemblance to Betsie, Luci has been completely
re-written, mainly to accommodate SSL.
( see SSL in the FEATURE LIST for more information on why we wrote Luci )

Luci allows you to change the way your browser views web pages by simplifying
their content into a well-structured, text-only format, mainly for
accessibility purposes. Luci works equally well whether you wish to change
the font-size and colour scheme of your display to make it easier to read, or
want to send it to a text-reader.

Once in Luci's unified text rendering view, the user has several options for
adjusting the application's display settings ( by choosing font, colour scheme,
font size, and line height ). These settings are maintained as you continue to
browse within the allotted domain(s), or until you switch back to a
Graphical Version of the site.

Luci has many features; those we've deemed most important are listed
here. For more detailed information on what Luci can do, see the source and
the config; you may also find some detail in the changelog.

- Why SSL? - At our institution, a very important component of our
core services is made available to our clients ( namely students ) via secure
web. To make these services accessible, we required a parser that could
accommodate SSL.

At the time of this writing, Betsie was not capable of parsing secure content.
We attempted to modify Betsie, but after much ado decided that we would
benefit from a full re-write: it would add the ability to parse encrypted
content and let us take advantage of features that may not have been
available or feasible when Betsie was originally written. ( ex: OO concepts,
certain Perl Modules, etc... )

- Luci should reside behind SSL - In Figures 1 and 2, you can see how Luci
works. Basically, Luci acts as an intermediary between the user and some web
site. In both diagrams, you can easily see that Luci may communicate with
either secure or non-secure web servers.

In Figure 1, under http ( non-SSL ), the communication between the user and
Luci is not encrypted, and therefore, not secure. This can be dangerous
because those users accessing secure content using Luci in this scenario
would be transferring data intended for encryption across an un-encrypted
connection.

In Figure 2, Luci is hosted in a secure environment, and therefore,
communication between a user and the web server is now encrypted. This is how
Luci should be hosted.

Because of these security concerns, we've added a check in Luci that will fail
parsing of secure pages when Luci is hosted from http. In this case, users
will be presented with an error message when attempting to parse a secure
document. See DISCLAIMER.
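The rule behind this check can be sketched as follows. This is a minimal
Python illustration of the logic only, not Luci's Perl implementation; the
function name and its arguments are hypothetical.

```python
from urllib.parse import urlsplit


def may_parse(luci_url, target_url):
    """Return False only for a secure target requested through non-SSL Luci."""
    luci_secure = urlsplit(luci_url).scheme == "https"
    target_secure = urlsplit(target_url).scheme == "https"
    # The one refused combination is Figure 1 with a secure target:
    # encrypted content would travel unencrypted between Luci and the user.
    return luci_secure or not target_secure


print(may_parse("http://www.yourdomain.com/text_only/parser",
                "https://secure.yourdomain.com/"))
print(may_parse("https://www.yourdomain.com/text_only/parser",
                "https://secure.yourdomain.com/"))
```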

When navigating a framed page, Luci will maintain the frameset, emulating what
a user sees ( framewise ) when browsing with a non-text-only client. We've also
added a feature that will force a frameset reload if a user changes their style
settings. ( javascript required for this feature to work )

Luci not only stores all cookies that the user comes across in one single
cookie, but also stores any information pertaining to the creation of those
cookies within that cookie. We also have our own auth scheme which requires
setting a cookie to maintain authorization. ( see 401 auth below )

- size and excessive use -

The cookie specification as per RFC2109 ( see: http://rfc.net/rfc2109.html )
states that a user agent must accommodate cookies of at LEAST 4 kb
in size. ( quashing the myth that browsers only support cookies up to 4kb in
size ) ( see:
http://search.cpan.org/~gaas/libwww-perl-5.800/lib/HTTP/Cookies.pm#METHODS
for details on cookie creation ) As a result, the application
should work very well where cookie sizes are moderate. It is up to the
browser how it will behave once the cookie size surpasses the required 4 kb
minimum. ( truncation is a definite possibility ) Therefore, application
performance with respect to cookie size and excessive use is uncertain, and
most likely dependent on the user's browser.
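As a rough way to reason about this limit, the size of a cookie can be
compared against the 4096 byte minimum that RFC2109 asks user agents to
support. The Python sketch below is illustrative only: the cookie name is
hypothetical, and the measurement covers just the name=value pair, not any
attributes such as path or domain.

```python
MIN_COOKIE_BYTES = 4096  # per-cookie minimum a user agent should support


def cookie_bytes(name, value):
    """Size in bytes of the name=value pair as sent on the wire."""
    return len(f"{name}={value}".encode("utf-8"))


def within_guaranteed_size(name, value):
    """True if the cookie fits within the size every browser should accept."""
    return cookie_bytes(name, value) <= MIN_COOKIE_BYTES


# "luci_store" is a made-up name used only for this example.
print(within_guaranteed_size("luci_store", "a" * 100))
print(within_guaranteed_size("luci_store", "a" * 5000))
```

A cookie that fails this test may still work, but how it is handled is up
to the browser.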

When a user navigates a site that requires 401 authorization ( ex:
.htaccess ), Luci will provide them with our own homegrown login screen.
Once the user passes authorization, the credential information is encrypted,
and stored in an authorization cookie.

Using this model, you cannot be logged into more than one 401 site at a time.
Luci passes the credential information to the server upon each and every 401
request. If the user navigates to a separate domain, they will
be prompted for their credentials with respect to that domain, and any
existing authorization data will be overwritten.

The cryptkey parameter in the luci.conf.cgi is the key that is used to
en/decrypt the credential information that is stored in the authorization
cookie. As stated, you should change the cryptkey and hide the luci.conf.cgi
from being viewed via the web. If your cryptkey is publicly available ( ie.
if you use the one provided with the download ), and some intruder manages to
steal the authorization cookie from one of your users, they could then easily
decrypt the data and obtain their credential information.
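One way to produce a cryptkey that is not publicly known is to generate it
randomly. The sketch below uses Python's standard secrets module. The
length of 32 and the alphanumeric alphabet are assumptions made for this
example, since this document does not state any constraints on the key.

```python
import secrets
import string


def make_cryptkey(length=32):
    """Return a random alphanumeric string suitable for pasting into a config."""
    # ASSUMPTION: letters and digits only, to avoid characters that a
    # config parser might treat specially.
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


# Prints a line in the same form as the cryptkey example in the config notes.
print("cryptkey = " + make_cryptkey())
```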

Using the Luci configuration file, you can easily tweak Luci to best suit your
site's needs.

- templates -

All the HTML associated with Luci is templated, and available in the templates
directory provided with the distribution. With these templates, you can
quite easily port your site branding to Luci, etc...