Troubleshooting Web Page Problems

Some common problems with FTP,
character encoding, fonts, URLs, and images

FTP Issues

FTP has two modes of
operation: text (also called ASCII) and binary.
In text mode, end-of-line characters may be changed during
the transfer.
(This is considered a feature and not a bug, since different
platforms use different end-of-line conventions: Windows uses a
carriage return followed by a line feed, while Unix and Linux use
a line feed alone.)
For most files, especially graphics, applets, and other media,
you must use binary mode, or the files will be corrupted.
For text files such as web pages, the mode usually doesn't matter.
So make sure you upload your files in binary mode.
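To see why binary mode matters, here is a minimal Python sketch that simulates, in a simplified way, what a text-mode upload to a Windows-style system might do: every bare line feed (LF) becomes a carriage return plus line feed (CR LF). The function name `text_mode_transfer` is hypothetical, and real FTP software converts line endings to CR LF on the network and then to the receiving system's own convention.

```python
def text_mode_transfer(data: bytes) -> bytes:
    """Simplified simulation of an FTP text-mode (ASCII) transfer to a
    Windows-style system: each LF not already preceded by CR becomes CR LF.
    (Hypothetical helper; real FTP also converts on the receiving end.)"""
    out = bytearray()
    previous = None
    for byte in data:
        if byte == 0x0A and previous != 0x0D:  # a bare LF
            out.append(0x0D)                   # insert a CR in front of it
        out.append(byte)
        previous = byte
    return bytes(out)


# A web page survives: only its line endings change.
page = b"<p>Hello</p>\n<p>World</p>\n"
print(text_mode_transfer(page))  # b'<p>Hello</p>\r\n<p>World</p>\r\n'

# Every PNG image starts with these 8 bytes, which include a bare LF,
# so a text-mode transfer changes the file and corrupts the image.
png_signature = b"\x89PNG\r\n\x1a\n"
print(len(text_mode_transfer(png_signature)))  # 9 instead of 8
```

A web browser doesn't care which line ending a page uses, but an image viewer does care about every single byte of an image.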

Another common problem is trouble connecting.
This can happen when your computer or your ISP
has security barriers in place, such as virus scanners and
firewalls.
(It is also common to make a typo in the hostname, username,
or password, so always check those first!)
FTP is a strange protocol, invented before security
was a big concern.
It requires two separate connections between the two computers:
a control connection for commands, and a data connection for
each file transfer or directory listing.
The normal mode of operation is called active
mode.
In active mode the second (data) connection is made from
the server back to your computer.
Most firewalls will block this!
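In active mode, your FTP program tells the server where to connect back to by sending a PORT command (defined in RFC 959). The command lists your computer's IP address and a port number as six comma-separated numbers, with the port split into two parts, p1 * 256 + p2. A minimal Python sketch of building that command, assuming an IPv4 address; the helper name `port_command` and the address are made up for illustration:

```python
def port_command(ip: str, port: int) -> str:
    """Build an FTP PORT command: h1,h2,h3,h4 is the client's IPv4
    address, and p1,p2 encode the port number as p1 * 256 + p2."""
    high, low = divmod(port, 256)
    return "PORT " + ",".join(ip.split(".") + [str(high), str(low)])


print(port_command("192.168.1.10", 50000))  # PORT 192,168,1,10,195,80
```

Here the server would then open a new connection back to 192.168.1.10, port 50000. That incoming connection is exactly what a firewall is likely to refuse.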

To work around this issue you must use FTP in passive
mode.
In passive mode your computer makes both required
connections to the server,
which most firewalls will allow.
So make sure passive mode (or passive transfers)
is selected before trying to connect.
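In passive mode, your FTP program sends a PASV command instead, and the server replies with an address and port in the same six-number format, for example `227 Entering Passive Mode (192,168,1,2,19,137)`. Your computer then makes the data connection itself. A minimal sketch of decoding that reply in Python; `parse_pasv` is a hypothetical name and the sample reply is made up:

```python
import re


def parse_pasv(reply: str) -> tuple[str, int]:
    """Extract (host, port) from an FTP '227' reply to the PASV command."""
    match = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if match is None:
        raise ValueError("not a PASV reply: " + reply)
    numbers = [int(n) for n in match.groups()]
    host = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]  # same p1 * 256 + p2 rule as PORT
    return host, port


print(parse_pasv("227 Entering Passive Mode (192,168,1,2,19,137)."))
# ('192.168.1.2', 5001)
```

If you script your uploads, note that Python's standard `ftplib.FTP` client already uses passive mode by default; calling `set_pasv(False)` switches it to active mode.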

Character Encoding Issues

Another issue is the character encoding used in your
web page.
On every computer, all text is actually stored as a set of numbers.
The mapping that says which number represents which character is called
the character encoding (or simply the encoding).
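A short Python illustration (Python is used here only as a convenient way to show the numbers): the same character can be stored as different numbers, and even as a different count of bytes, depending on the encoding.

```python
print(ord("A"))                  # 65 in ASCII, ISO-8859-1, UTF-8, and Unicode alike
print("é".encode("iso-8859-1"))  # b'\xe9'      one byte (233)
print("é".encode("utf-8"))       # b'\xc3\xa9'  two bytes
print("é".encode("utf-16-le"))   # b'\xe9\x00'  two bytes, low byte first
```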

When you use a text editor such as Notepad (or TextEdit),
the text files are saved using some platform default
encoding.
On some modern systems this default is Unicode,
which here means UTF-16.
Sadly, this doesn't match the default encoding used by most
web browsers!
As a result, the numbers are interpreted incorrectly and you see
all sorts of junk on the screen.

There are many hundreds of different encoding schemes used in
the world today.
Some of the common encodings are UTF-8
and 8859-1 (also called
ISO-8859-1 or
ISO Latin I).
Windows systems have always used Microsoft encodings that
Microsoft and IBM call code pages.
By default, US and Western European versions of Windows XP use
an encoding called CP-1252 or
Windows-1252, and old US DOS
systems used CP-437.
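Code pages mostly agree on the first 128 numbers (plain ASCII) and differ in the upper half. As a quick sketch, here is one byte, 0x82, decoded three ways using Python's built-in codecs, where `cp437` and `cp1252` are Python's names for the code pages above:

```python
byte = b"\x82"
for codepage in ("cp437", "cp1252", "iso-8859-1"):
    print(codepage, repr(byte.decode(codepage)))
# cp437 'é'
# cp1252 '‚'
# iso-8859-1 '\x82'
```

The same number is an accented letter in DOS, a low quotation mark in Windows, and an invisible control code in ISO-8859-1.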

Microsoft likes to call the default Windows encoding
ANSI, perhaps to pretend it is some sort
of national standard encoding.
(I guess it is sort of standard considering the number of
Windows systems in use in the world today.)
I found this information at
scripts.sil.org/IWS-Chapter03:

When Windows was being developed, the American National Standards
Institute (ANSI) was in the process of drafting a
standard that eventually became ISO 8859-1 Latin 1.
Microsoft created their codepage 1252 for Western
European languages based on an early draft of the
ANSI proposal, and began to refer to this as
the ANSI codepage.
Codepage 1252 was finalised before
ISO 8859-1 was finalised, however, and the two are
not the same: codepage 1252 is a superset of
ISO 8859-1.

Later, apparently around the time of Windows 95 development,
Microsoft began to use the term ANSI in a different
sense to mean any of the Windows codepages, as opposed to
Unicode.
Therefore, currently in the context of Windows, the terms
ANSI text or ANSI codepage should be understood
to mean text that is encoded with any of the legacy 8-bit Windows
codepages rather than Unicode.
It really should not be used to mean the specific codepage
associated with the US version of Windows, which is codepage
1252.
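The "superset" claim in that quotation can be checked with Python's codec tables (a sketch that relies on Python's `cp1252` and `latin-1` codecs). The two agree on every printable ISO 8859-1 character and differ only in the range 0x80 to 0x9F, where ISO 8859-1 has invisible control codes and codepage 1252 has extra characters such as curly quotes:

```python
# Printable ISO 8859-1 characters: 0x20-0x7E and 0xA0-0xFF.
printable_latin1 = bytes(range(0x20, 0x7F)) + bytes(range(0xA0, 0x100))
print(printable_latin1.decode("cp1252") == printable_latin1.decode("latin-1"))  # True

# In 0x80-0x9F the two differ: codepage 1252 has "smart" quotes there.
smart_quotes = b"\x93Hello\x94"
print(repr(smart_quotes.decode("cp1252")))   # '“Hello”'
print(repr(smart_quotes.decode("latin-1")))  # '\x93Hello\x94'
```

(Python's `cp1252` codec leaves five bytes in that range, such as 0x81, undefined, so decoding those raises an error.)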

I don't currently have a Mac or Windows Vista, but I am seeing a large
number of student web pages encoded as Unicode
(UTF-16) and I suspect that is the
new default on at least one of these platforms.
Using a different encoding than
the web browser expects will likely make your page look
bad (or completely unreadable).

The fix is very simple:
choose Save As... in Notepad
and select the UTF-8 encoding (other editors may also offer
ISO-8859-1).
Then re-upload your web pages, making sure to use
binary mode transfers.
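If you have many pages to fix, the same conversion can be scripted. A minimal Python sketch, assuming the pages were saved as UTF-16 with a byte order mark (which is what Notepad's "Unicode" option writes); both function names are hypothetical:

```python
from pathlib import Path


def to_utf8(data: bytes, source_encoding: str = "utf-16") -> bytes:
    """Re-encode bytes from source_encoding to UTF-8.
    The 'utf-16' codec uses the byte order mark, if present, to choose
    the byte order and drops it. Line endings are left unchanged."""
    return data.decode(source_encoding).encode("utf-8")


def convert_file_to_utf8(path: str, source_encoding: str = "utf-16") -> None:
    """Rewrite one file in place as UTF-8 (keep a backup copy first!)."""
    file = Path(path)
    file.write_bytes(to_utf8(file.read_bytes(), source_encoding))


# Example usage (hypothetical file name):
# convert_file_to_utf8("index.html")
```

Working on raw bytes, rather than opening the file in text mode, avoids any accidental change to the line endings.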

It is possible to add an HTML tag to a web page to
indicate the encoding used.
However, some web servers override that and tell the browser
"this page uses the XYZ encoding",
so setting it in the web page won't always help.
To indicate the encoding used on a web page, add the following
tag in the HEAD section of the page:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=encoding">

And replace encoding with utf-8,
iso-8859-1, windows-1251,
utf-16, or whatever encoding you used to create
that web page.
The official list of encoding scheme names can be found at:
www.iana.org/assignments/character-sets.
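The precedence described above, where the server's header beats the META tag, can be sketched in Python. This is a simplification under stated assumptions: real browsers also check for a byte order mark first and scan for META tags more carefully, and the function name and the ISO-8859-1 fallback are assumptions for illustration:

```python
import re
from email.message import Message
from typing import Optional

# Finds charset=... in either <META CHARSET="..."> or the older
# <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=..."> form.
META_CHARSET = re.compile(rb"""charset\s*=\s*["']?([-\w.:]+)""", re.IGNORECASE)


def declared_charset(content_type: Optional[str], html: bytes,
                     default: str = "iso-8859-1") -> str:
    """Simplified guess at which encoding a browser will use for a page.
    content_type is the server's Content-Type header (or None);
    default is an assumed fallback, since servers and browsers differ."""
    # 1. A charset sent by the web server wins over anything in the page.
    if content_type:
        header = Message()
        header["Content-Type"] = content_type
        charset = header.get_content_charset()  # lower-cased, or None
        if charset:
            return charset
    # 2. Otherwise look for a META tag near the start of the page.
    match = META_CHARSET.search(html[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Otherwise fall back to a default.
    return default


meta = b'<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">'
print(declared_charset("text/html; charset=UTF-8", meta))  # utf-8 (the server wins)
print(declared_charset("text/html", meta))                 # windows-1252
```

To see what a live server actually sends, `urllib.request.urlopen(url).headers.get_content_charset()` returns the charset from its Content-Type header, or `None` if there isn't one.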

To view a page that has a weird encoding, you can tell the
browser to use that encoding.
Under the View menu of your web browser you can change
the encoding used by the browser.
When I see a page that doesn't look right, I try
ISO-8859-1 or UTF-8,
and usually one of those will work fine.
UTF-16 uses at least two bytes per character, not one.
So when every other character is a weird symbol
(on my system, a black diamond with a question mark in it),
the page was likely encoded as UTF-16 while
your web browser is set to ISO-8859-1,
UTF-8, or Windows-1252.
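A rough Python heuristic for spotting this situation in a downloaded file is sketched below. It is only a guess, and the function name and the one-quarter threshold are arbitrary assumptions. A UTF-16 file usually starts with a byte order mark (FF FE or FE FF), and English text stored as UTF-16 has a zero byte in every other position, which is where the "every other character" junk comes from:

```python
def looks_like_utf16(data: bytes) -> bool:
    """Guess whether data is UTF-16 text (a heuristic, not a guarantee)."""
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):  # byte order mark
        return True
    sample = data[:1000]
    zeros = sample.count(0)
    # Mostly-ASCII text in UTF-16 has one zero byte per character;
    # ordinary 8-bit text files almost never contain zero bytes.
    return zeros > 0 and zeros >= len(sample) // 4


print("Hi".encode("utf-16-le"))                           # b'H\x00i\x00'
print(looks_like_utf16("<p>Hi</p>".encode("utf-16-le")))  # True
print(looks_like_utf16(b"<p>Hi</p>"))                     # False
```

(The rare UTF-32 format also starts with FF FE, so this check would flag it too.)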

(If your web pages look normal on your system, it is because
the web browser uses the system default encoding when viewing
local files.
Once you upload your web pages the default encoding is set
by the web server instead, usually UTF-8 or
ISO-8859-1.)

Font Issues

A font (for the purpose of this discussion) is
a collection of tiny graphics, each associated with a number in
some encoding.
For example, most fonts associate the number
65 with an upper-case letter
A.
Since there are over a million possible characters, a given font
has graphics for only some subset of those characters
(often just a few hundred).
If you see a box or a weird question mark symbol, it is sometimes
because you used some character that the current font doesn't
have a graphic for.

This can be a problem since not all users have the same fonts
installed.
In that case a web browser will substitute a font that is installed
for the missing one.
So if you use a fancy font in a web page and it looks fine on your
screen, it may look awful on some other user's system if they don't
have those fonts installed!

Many fonts are not free, so Microsoft, Red Hat, Apple, and
other computer vendors pay license fees for the fonts they bundle
with their systems.
As a result, different systems almost always
have different sets of fonts installed on them.

The best advice is to use fewer fonts, choosing ones that you believe
will be available to your audience.
Provide an alternative font, and make sure your web pages look
okay in the browser's default font.

This isn't intended as a full discussion of fonts, but you
should know there are generic font families (such as serif,
sans-serif, and monospace): groups of fonts with similar characteristics.
You can specify a family to use if some specific font is not
installed, and the system will pick an appropriate font from it.
Here's an example of specifying styles for paragraphs:

<STYLE TYPE="text/css">
P { font-family: Georgia, "Times New Roman", serif; }
</STYLE>

The style for paragraphs says to use the Georgia font,
and if that isn't available, to try Times New Roman instead,
and if that isn't available either, to pick some default font in
the serif font family.
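The fallback rule can be modeled in a short Python sketch. It is a simplification: the set of installed fonts and the fonts a browser uses for each generic family are hypothetical inputs, and real browsers also fall back character by character when a font lacks a graphic for some character.

```python
GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


def pick_font(font_family: str, installed: set[str],
              generic_defaults: dict[str, str]) -> str:
    """Return the font a browser would use for a CSS font-family list.
    installed and generic_defaults are hypothetical descriptions of the
    user's system; generic_defaults must cover any generic family used."""
    # CSS font names are case-insensitive, so compare in lower case.
    installed_by_lower = {name.lower(): name for name in installed}
    for entry in font_family.split(","):
        name = entry.strip().strip("\"'")  # drop spaces and quotes
        if name.lower() in GENERIC_FAMILIES:
            return generic_defaults[name.lower()]
        if name.lower() in installed_by_lower:
            return installed_by_lower[name.lower()]
    # Assumption: with no match at all, the browser's default is a serif font.
    return generic_defaults["serif"]


defaults = {"serif": "DejaVu Serif", "sans-serif": "DejaVu Sans"}
rule = 'Georgia, "Times New Roman", serif'
print(pick_font(rule, {"Georgia", "Arial"}, defaults))          # Georgia
print(pick_font(rule, {"Times New Roman", "Arial"}, defaults))  # Times New Roman
print(pick_font(rule, {"Arial"}, defaults))                     # DejaVu Serif
```

Ending the list with a generic family guarantees the browser lands on a font of roughly the right style, even on a system with none of your named fonts.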

URL Encoding Issues

Many characters that are legal in a filename are not legal in
a URL or web link.
The most common problem is with spaces in the filenames.
It is easier to just use letters and digits (plus the extension)
for naming files; then you don't need to worry.
(While many web browsers are forgiving about such errors
and will try to guess what you meant, not all browsers are so nice.)

If you do include any unusual characters in your filenames, they
should be encoded using what is called URL
encoding or sometimes percent encoding.
You simply replace each special character with a percent sign
followed by two hex (hexadecimal) digits.
The two digits give the character's code number; for example,
a space is 20 in hex (32 in decimal).
So if you have saved an image file with the name
New York.gif, the space must be
encoded, and the IMG tag would look
something like this:

<IMG src="New%20York.gif">

You can view this
URL encoding reference
from w3schools.com
for a list of characters and their encoded equivalents.
(Note that even ordinary letters and digits can be encoded, but there is
no point in doing that.)
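The rule is mechanical enough to sketch in a few lines of Python. This hypothetical `percent_encode` helper keeps only the "unreserved" URL characters (letters, digits, `-`, `.`, `_`, `~`) and encodes every other byte of the name's UTF-8 form. It encodes `/` too, so use it on a single file name, not a whole path. Python's standard `urllib.parse.quote` does the same job, so the sketch compares itself against it:

```python
from urllib.parse import quote

UNRESERVED = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")


def percent_encode(filename: str) -> str:
    """Percent-encode a single file name (not a path: '/' is encoded too)."""
    parts = []
    for byte in filename.encode("utf-8"):  # a non-ASCII character becomes several bytes
        if byte in UNRESERVED:
            parts.append(chr(byte))
        else:
            parts.append(f"%{byte:02X}")   # a percent sign and two hex digits
    return "".join(parts)


for name in ["New York.gif", "café.png", "50%off.jpg"]:
    print(percent_encode(name), percent_encode(name) == quote(name, safe=""))
# New%20York.gif True
# caf%C3%A9.png True
# 50%25off.jpg True
```

Note that a literal percent sign in a file name must itself be encoded (as `%25`), which is one more reason to stick to letters and digits.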

Image Tag Issues

A common problem is having images not show up when you
view your web page.
Here are several common reasons images might not show up:

Using the wrong filename in an IMG tag.
If the image file is named foo.gif then you must
have an IMG tag like this:

<IMG SRC="foo.gif">

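If the file is really named Foo.gif,
foo.jpg, foo.gif.gif, or
anything else, the web browser won't be able to find it.
The file name used in the IMG tag must exactly match the file's actual name,
including upper- and lower-case letters (many web servers treat file names
as case sensitive, even though Windows does not).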

The image isn't in the same folder as your HTML file.
If the images are on your desktop while your images.html file is in (say)
My Documents, the images won't be found.

Using a full (complete, or absolute) pathname or URL instead
of just the filename.
When all the files are in the same folder, use only the image's filename.
Some students use
C:\Documents and Settings\user\Desktop\image project\foo.gif,
which is an absolute pathname.
This is a big mistake!
The web page won't work once it is uploaded to Blackboard.com and then downloaded
to your instructor's computer, where that path doesn't exist.
Just use the name of the file itself.

Something else might be wrong with your HTML file.
In that case, a web browser may give up before loading any images.
For example, a missing closing quote mark will confuse most web browsers.

Using weird characters in the names of the downloaded image files.
Many websites use web page and image files with bizarre names.
But you need not use the original name; you can rename the image files
when you download them (just don't change the extension, usually one of
.gif, .jpg,
or .png).

Use a simple, short name that contains nothing except letters and digits
(plus the extension).
Note that some people have Windows set to hide file extensions, so a file
that looks like it has the name foo.gif
may really have the name foo.gif.gif.
You can turn off this Windows feature so that you see
the entire names of files (in Windows XP: Tools, Folder Options, View tab,
then uncheck "Hide extensions for known file types").
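Several of the mistakes above can be caught automatically. Here is a minimal Python sketch, not a complete validator, that reads the IMG tags from a page and compares each SRC with a list of the file names in the page's folder (for example, from `os.listdir`). The class and function names are hypothetical; only the standard `html.parser`, `re`, and `urllib.parse` modules are used.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote


class ImgSrcCollector(HTMLParser):
    """Collect the SRC of every IMG tag (HTMLParser lowercases tag and attribute names)."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


SIMPLE_NAME = re.compile(r"[A-Za-z0-9]+\.(gif|jpe?g|png)", re.IGNORECASE)
ABSOLUTE = re.compile(r"^([A-Za-z]:|/|[A-Za-z][A-Za-z0-9+.-]*://)")  # C:..., /..., http://...


def check_images(html: str, files: list[str]) -> list[tuple[str, str]]:
    """Return (src, problem) pairs for the IMG tags in html.
    files: names of the files in the same folder as the HTML file."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    problems = []
    for src in parser.sources:
        if ABSOLUTE.match(src) or "\\" in src:
            problems.append((src, "absolute path or URL: use just the file name"))
            continue
        name = unquote(src)  # New%20York.gif -> New York.gif
        if not SIMPLE_NAME.fullmatch(name):
            problems.append((src, "unusual name: use only letters and digits plus the extension"))
        if name in files:
            continue  # exact match, including upper/lower case
        wrong_case = [f for f in files if f.lower() == name.lower()]
        doubled = [f for f in files if f.startswith(name + ".")]
        if wrong_case:
            problems.append((src, "case differs: the file is named " + wrong_case[0]))
        elif doubled:
            problems.append((src, "the file is really named " + doubled[0] + " (hidden extension?)"))
        else:
            problems.append((src, "no such file in this folder"))
    return problems


page = '<IMG SRC="foo.gif"><img src="bar.gif"><img src="ok.png">'
for src, problem in check_images(page, ["Foo.gif", "bar.gif.gif", "ok.png"]):
    print(src, "->", problem)
```

Taking the list of file names as a parameter, rather than reading the folder inside the function, keeps the check easy to try out on made-up examples.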