Apache 2.0 currently ships with "AddDefaultCharset iso-8859-1" in httpd.conf.
This should be fixed (by commenting out or removing it, or replacing it with
AddDefaultCharset Off) and the comment in httpd.conf should be corrected,
for the following reasons:
1) Charset information is important, but no charset information is much
preferable to wrong charset information (contrary to what the comment
in httpd.conf says).
2) Many document formats have their own internal way to specify character
encoding. It is often sufficient to rely on these, and it is often easier
for document authors and administrators to make sure these are correct
than to make sure that the served headers are correct.
3) In most parts of the world, including Europe and the Americas (because of
windows-1252), there is rarely any server that contains only iso-8859-1
documents, and there is rarely any server administrator who knows the
encodings of all the served documents (if s/he is even aware of character
encoding issues).
4) Upgraders from Apache 1.3 to Apache 2.0 often overlook this setting,
resulting in large numbers of files served wrongly with charset=iso-8859-1,
and an increasing number of complaints to ISPs and Web hosters. Fixing
this bug would make upgrading easier and more predictable, and would
reduce complaints to hosters, who have difficulty addressing them
because they are not familiar with character encoding issues.
5) In order to override the setting (and assuming users know how to do
this), users have to have FileInfo permissions for their .htaccess files.
httpd.conf as shipped contains an example of settings for UserDir
directories and similar cases where the users are allowed some
amount of configuration, but this is commented out. So it is likely
that users cannot fix the problem, even if they know the correct
encoding of their document and the correct way to set the HTTP
header.
6) The oft-cited default of iso-8859-1 for HTTP is something that exists
only on paper, but not at all in practice. If it were observed in
practice, "AddDefaultCharset iso-8859-1" would be unnecessary.
Because the default is not observed, this setting is harmful.
7) The comment in httpd.conf claims that this setting is a good start for
internationalization. This ignores the fact that many hosts already
contain a lot of internationalized documents.
In connection with this, the documentation for AddDefaultCharset should be
updated to clearly point out the potential dangers of using it (i.e.
only use this if you know the character encoding of the majority of
the documents on your server, and you know what the exceptions are and
make sure they are set correctly).

The new default value causes corruption for people upgrading to the new
version. Mislabeling Windows-1252 as iso-8859-1 can cause the euro symbol
to be rendered incorrectly and result in erroneous financial transactions.
The misleading Apache documentation and the change to apply a default charset
cause subtle differences that have significant impact, as well as non-subtle
ones. Because the web standard calls for the HTTP charset to override the
charset declared in the page, the change overrides even pages that declare
their own charset (i.e. pages that use the meta tag). The old behavior
should be restored right away.
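
The euro case can be checked with a minimal Python sketch. This is an
illustration only, not part of the original report, and the variable names
are made up for the example. In Windows-1252 the euro sign is the single
byte 0x80, while a strict iso-8859-1 decoder maps that byte to a C1 control
character:

```python
# Byte 0x80 is the euro sign in Windows-1252.
price = "€100".encode("cp1252")

# The same bytes read under the two labels.
as_cp1252 = price.decode("cp1252")
# A strict ISO-8859-1 decoder maps 0x80 to U+0080, a control character.
as_latin1 = price.decode("iso-8859-1")

print(repr(as_cp1252))
print(repr(as_latin1))
```

How a given client displays U+0080 varies, so the visible result is not
shown here; the point is only that the two labels disagree about the
same byte.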

I'm not enough of an expert in this area to make a decision about it,
but the problem with simply removing this directive is that it
creates problems with cross-site scripting. See:
http://httpd.apache.org/info/css-security/
and links from that page.
In fact, AddDefaultCharset was originally added to deal with these
problems, so simply removing it without addressing the CSS issue
would not be smart.
(See also bug 13986, which states
that Apache shouldn't set a default content-type by default.
This issue should probably be addressed alongside that one.)

OK, I think we need a clarification. We are not requesting the command AddDefaultCharset be eliminated.
We are requesting that its use in the default configuration to set the charset to iso 8859-1 be eliminated.
As for the security risk, the significant piece of the referenced document seems to be:
"In addition, web pages should explicitly set a character set to an appropriate value in all dynamically generated pages. "
We can all agree with this. The problem is iso 8859-1 is not an appropriate value for the majority of configurations.
The article notes that this used to be the default in some of the web standards and no longer is.
It is no longer the default precisely because it is not the best choice in the majority of cases, even in English-speaking markets these days.
Perhaps a better compromise solution is to at least ask the administrator what the value should be during the installation and
provide a list of the most common encodings for them to choose from.
Or default to UTF-8 and let people know clearly that is what you use.

SuSE 9.0 shipped Apache 2.0 with AddDefaultCharset utf8.
As a result, any other encoding declared in the html/xhtml/xml source sent to
the server was ignored.
That does not match the behaviour of Apache described on the
cross-site-scripting page, which says that AddDefaultCharset only takes
effect if a page-specific encoding is missing.
Is this a logic error in the behaviour of AddDefaultCharset?

To Joshua Slive: Then the faulty behaviour is on the browser's side, insofar as
a request is sent without a complete or appropriate header, i.e. one including
the encoding information. That was my first guess, with Mozilla.
Of course, this presumes that AddDefaultCharset is only activated if no
encoding information is available. Or, to extend the view, if no valid/accepted
encoding is sent in the request, given a list of encodings accepted by the server.
Would that still help with the CSS problem?

Is this issue still not resolved? I am Chinese and I am strongly on the side of
the reporter.
The problem, I suppose, arises from a problematic standard. AFAIK, the header
sent from the server overrides the one contained in a meta tag. All the browsers
I use conform to this behaviour, and the sorrows of non-Western Web developers
grow. For Chinese, we routinely use
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
to mark a page as Chinese. And this method allows us to place an ISO-8859-1
page on the same server/directory without worrying about server configurations.
I do not even know how to achieve this effect once "AddDefaultCharset" is
used.
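
A minimal Python sketch of the precedence described above. It is a
simplified model in which the HTTP header wins over the meta tag, not what
any particular browser implements, and the function name
effective_charset is made up for this illustration:

```python
import re

def effective_charset(http_content_type, html_bytes):
    """Pick the charset a client would use: HTTP header first, then meta tag."""
    header = re.search(r"charset=([\w-]+)", http_content_type, re.I)
    if header:
        return header.group(1).lower()
    meta = re.search(rb"charset=([\w-]+)", html_bytes, re.I)
    return meta.group(1).decode("ascii").lower() if meta else None

# A GB2312 page that labels itself with the meta tag quoted above.
page = ('<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'
        '<p>中文</p>').encode("gb2312")

# Server sends no charset: the meta tag decides.
without_default = effective_charset("text/html", page)
# Server sends the default charset: the meta tag is never consulted.
with_default = effective_charset("text/html; charset=ISO-8859-1", page)

print(without_default, repr(page.decode(without_default)[-9:]))
print(with_default, repr(page.decode(with_default)[-11:]))
```

With the server default in place, the page is decoded as iso-8859-1 and the
Chinese text no longer survives, even though the meta tag is correct.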
Security is important, but I do not think setting the default charset by the
SERVER is the right way to go. Indeed, I think the suggestion to use a default
charset has caused more problems than it has solved (see stories below). It is
the server-side SCRIPT that should take care of this. And I do not think the
comment in the conf file is correct: it really does harm, because setting it
will PREVENT Web developers, who should really be responsible for such issues,
from specifying the charset in their pages.
By the way, some stories. Several times I have been called by colleagues
because they cannot make Apache display Chinese characters correctly on a newly
installed box. I once translated the mission page for webstandards.org, and
after a site migration it no longer displayed Chinese. After several emails the
non-Western pages were moved to a special server or directory and it was OK. Now
the page is archived at
http://archive.webstandards.org/mission_gb2312.html
And it is wrong AGAIN, along with other translations like Japanese!
What is the use of security, if it makes things inaccessible?
(Not to mention that it is a wrong response for a security issue. Even the page
http://www.cert.org/tech_tips/malicious_code_mitigation.html#3
mentions only the use of a meta tag like the gb2312 example above.)
To Dietmar:
Your opinions about Accept-Charset are correct only if
1) A Chinese user can set his browser to accept only GB2312;
2) A Chinese user never needs to view ISO-8859-1 pages, or the browser supports
per-page configuration of Accept-Charset; and
3) If "Accept-Charset: gb2312" is sent to the server, the server will not send
the default "charset=ISO-8859-1".
I do not see that any of these holds.

I agree that shipping with an AddDefaultCharset preset is unsatisfactory,
and screws up users of servers with unresponsive admins.
Can we simply remove it from the default config to deal with the case of authors
having more clue than their sysops? I'd be happy with that, but I'm going to
ask for review in other fora where folks have relevant expertise.
Actually the solution is already available to users. There's a bunch of
AddCharset directives in the default httpd.conf that serve precisely this
purpose:
AddCharset ISO-8859-1 .iso8859-1 .latin1
AddCharset ISO-8859-2 .iso8859-2 .latin2 .cen
AddCharset ISO-8859-3 .iso8859-3 .latin3
etc.
So a fix would be to correct errors and omissions in that list, and leave it
to authors to control their charset using a suffix on the document name.
Of course that's ugly, but at least it works.
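
To illustrate the suffix approach, here is a rough Python sketch of how a
file name could be mapped to a charset. It is a simplified model, not
mod_mime's actual code; the table is only the three AddCharset lines
quoted above, and charset_for is a name made up for this example:

```python
# Subset of the AddCharset table quoted above (extension -> charset).
ADD_CHARSET = {
    "iso8859-1": "ISO-8859-1", "latin1": "ISO-8859-1",
    "iso8859-2": "ISO-8859-2", "latin2": "ISO-8859-2", "cen": "ISO-8859-2",
    "iso8859-3": "ISO-8859-3", "latin3": "ISO-8859-3",
}

def charset_for(filename):
    """Return the charset of the rightmost known extension, or None."""
    charset = None
    # Every dot-separated part after the base name counts as an extension.
    for ext in filename.split(".")[1:]:
        charset = ADD_CHARSET.get(ext.lower(), charset)
    return charset

print(charset_for("index.html.latin2"))
print(charset_for("index.html"))
```

A document named index.html.latin2 would then be labelled ISO-8859-2,
while a plain index.html gets no charset from this table.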
Also worth noting: mod_proxy_html 2.x will parse META elements in
HTML and XHTML documents and convert them to real HTTP headers.
See http://apache.webthing.com/mod_proxy_html/

I just wrote:
> So a fix would be to correct errors and omissions in that list, and leave it
> to authors to control their charset using a suffix on the document name.
> Of course that's ugly, but at least it works.
Hmmm, I neglected to add that the ugliness goes away if that's used with
mod_negotiation: perhaps we should ship with MultiViews on by default?
The other crucial issue is of course to document it!

In addition to the principal reasons given earlier there is also a pragmatic
reason not to use AddDefaultCharset in the default httpd.conf. Sending the
charset declaration triggers an obscure bug in MSIE with multipart forms, as
documented at
http://www.interactivetools.com/forum/gforum.cgi?post=34345;sb=post_latest_reply;so=ASC;forum_view=forum_view_collapsed
(at the bottom).
I know that Microsoft should fix their browser, but I spent a lot of time today
debugging an old script that didn't work after upgrading to Apache 2 because of
this. I think the right thing is not to trigger bugs in a product that is still
used by so many users by shipping a httpd.conf that contains this as a default.

I'm surprised that this bug is still around. The only justification for that
that I was able to find in the record is the pointer to the Cross-Site
Scripting (CSS) issue. However, this is based on a shallow understanding
of CSS. In order to avoid CSS, just setting some character encoding
is not good enough. A solution requires that the client side gets the
right character encoding. Of course, declaring iso-8859-1 as a default
doesn't work for a huge number of Web pages. So this default should be
removed as quickly as possible, and the documentation for CSS should be
updated to make clearer that it's not "declare an encoding" but
"declare the right encoding" that is important (also for other reasons
than just security).
I can easily provide more information (e.g. a page that shows how use
of the wrong encoding, such as declaring a page as iso-8859-1 that
isn't iso-8859-1 can lead to attacks) if contacted directly.
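
As one possible illustration of the point (not the page offered above, and
assuming a strict iso-8859-1 decoder on the client side), the following
Python sketch finds a character whose ISO-2022-JP encoding contains the
byte for "<". Text that is harmless in its real encoding then shows up as
markup characters when it is declared as iso-8859-1. The helper name
first_char_hiding is made up for this example:

```python
def first_char_hiding(byte, codec="iso2022_jp"):
    """Find a CJK character whose encoded form contains the given ASCII byte."""
    for code_point in range(0x4E00, 0x9FFF):
        ch = chr(code_point)
        try:
            encoded = ch.encode(codec)
        except UnicodeEncodeError:
            continue  # character not available in this encoding
        if byte in encoded:
            return ch, encoded
    return None

ch, raw = first_char_hiding(b"<")

# Decoded with the right encoding, there is no "<" in the text at all.
print(repr(raw.decode("iso2022_jp")))
# Decoded under the wrong label, a "<" appears out of nowhere.
print(repr(raw.decode("iso-8859-1")))
```

A filter that inspects the text in its real encoding sees nothing to
escape, while a client told to use iso-8859-1 sees a "<".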
