Request for Comments: Use default_charset As Default Character Encoding

Introduction

This RFC proposes using default_charset as the default character encoding.

PHP currently has no default encoding setting. This makes adopting PHP 5.4 difficult, because htmlentities/htmlspecialchars in PHP 5.4 now default to UTF-8. Applications that use other encodings must pass the proper encoding to htmlentities/htmlspecialchars explicitly for characters to be processed correctly. If users mix ISO-8859-1 and UTF-8 (or many other multibyte character encodings), this can cause security problems.

There are many encoding settings in php.ini and encoding parameters in functions that users simply ignore and leave at their defaults. However, secure programs must handle character encoding properly.

The objectives of this proposal are:

Setting the charset in the HTTP header has been recommended since the first XSS advisory, published by CERT and Microsoft in February 2000. (Better security)

There are too many encoding settings, and it is better to consolidate them.

If another multibyte string module is added in the future, it can use the new common INI settings. (No more module-specific INIs)

Proposal

Set default_charset="UTF-8" as the PHP default, both as the compiled-in default and in the php.ini-* files.

Use default_charset as the default for encoding-related php.ini settings and for module functions.
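As an illustrative sketch (the comments are explanatory, not quoted from the shipped php.ini-* files), the proposed default amounts to a single line in php.ini, matching the compiled-in value:

```ini
; Proposed PHP default; the same value is used when php.ini does not set it.
; Encoding-related settings that are left unset fall back to this value.
; It is also sent as the charset in the Content-Type response header.
default_charset = "UTF-8"
```

Keeping one top-level value means an application that only uses UTF-8 needs no other encoding configuration.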

Not touched

zend.script_encoding

For PHP 5.6 and master, introduce the following new php.ini settings. The old iconv.*/mbstring.* php.ini parameters will be removed in master (PHP 6). Using the iconv.*/mbstring.* php.ini parameters raises E_DEPRECATED in PHP 5.6 and later.

php.input_encoding

php.internal_encoding

php.output_encoding

iconv.input_encoding (Default: php.input_encoding)

iconv.internal_encoding (Default: php.internal_encoding)

iconv.output_encoding (Default: php.output_encoding)

mbstring.http_input (Default: php.input_encoding)

mbstring.internal_encoding (Default: php.internal_encoding)

mbstring.http_output (Default: php.output_encoding)
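A minimal php.ini sketch of this hierarchy, assuming an application that only needs UTF-8: only default_charset is set, and everything else is left commented out so that each setting inherits from the one above it. The layout of the actual php.ini-* files may differ; the comments only restate the fallback order listed above.

```ini
; Top-level default for all encoding settings.
default_charset = "UTF-8"

; New common settings (PHP 5.6 and master).
; When unset, each one falls back to default_charset.
;php.input_encoding =
;php.internal_encoding =
;php.output_encoding =

; Legacy module settings. When unset, each one falls back to the
; corresponding php.* setting. Setting them raises E_DEPRECATED
; in PHP 5.6 and later.
;iconv.input_encoding =
;iconv.internal_encoding =
;iconv.output_encoding =
;mbstring.http_input =
;mbstring.internal_encoding =
;mbstring.http_output =
```

To override only the input encoding, an administrator would uncomment php.input_encoding; iconv.input_encoding and mbstring.http_input then inherit that value without being set themselves.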

All functions that take an encoding option use php.internal_encoding as their default (e.g. htmlentities/mb_strlen/mb_regex/etc.).