Problem/Motivation

Right now ImageInterface::getExtension() is used to check if an image is a supported image type or if the file is an image file. This is a bad pattern because:

The extension is a filesystem component that can lie about the file's nature,

A file having .PNG extension on disk may have a JPEG or even a non-image content,

Image type is not the same thing as the image extension.

Being a filesystem component, the extension is not something that is linked to the toolkit (ImageToolkitInterface) but to the image (ImageInterface). Toolkits should have image types, images should have image type (talking about content nature) and extension (talking about external, filesystem file extension).

Image types are determined (badly, by extension) in GDToolkit::getInfo() based on a hardcoded list of extensions.

If a custom module wants to add a new image type (extension), he normally has to extend GDToolkit and fork GDToolkit::getInfo(). Ugly!

The list of possible image types is something that should be provided by the toolkit. Some toolkits may provide more types than the other.

Extension should be decoupled from image type. Images created from image resources (images not on disk) or temporary images have no extension while they still have an image type.

Proposed resolution

Add supportedTypes method to ImageToolkitInterface.

supportedTypes should return a list of IMAGETYPE_* constants. Note that IMAGETYPE_* constants are PHP constants and are not bundled to GD library.

Declare a simple ImageInterface::isSupported() method in ImageInterface with implementation in Image that will be used to test if a image type is supported by the current toolkit or if a file is a valid image.

ImageInterface::getExtension() should ALWAYS return the filesystem extension and will not be used for other purposes anymore. What would be the logic of returning an extension different from the real filesystem extension as long we will not use the extension as a way to check the image type or if a file is image?. ImageInterface::getExtension() will fallback to image_type_to_extension() for images created from resources or images without any extension on filesystem (like temporary files).

Related Issues

It turned out in #2063373-20: Cannot save image created from scratch that Image::getExtension() needs to check if the image extension is a valid one (.png, .gif, .jpg). It's not the only place. Extension are already checked in GDToolkit::getInfo(). This is a code duplication and, worst, a hardcoded list of extensions.

If a custom module wants to add a new image extension, he normally has to extend GDToolkit and fork GDToolkit::getInfo(). This is really ugly! But, if #2063373: Cannot save image created from scratch will get in, then will have to extend Image too and fork the ::getExtension(). In fact #2063373-20: Cannot save image created from scratch is wrong because the list of possible extensions is something that should be provided by the toolkit. Some toolkits may provide more extensions than the other.

The problem are the JPEG images (IMAGETYPE_JPEG). There's one type and 2 extensions. image_type_to_extension(IMAGETYPE_JPEG) always returns .jpeg extension. That's why we need an additional check to get the extension from the path. Even then, an empty extension may be returned (e.g. temporary uploaded files have no extension). In such situation fallback to .jpg.

Note that before this patch the returned extensions for JPEG images was always .jpg regardless of how the file was saved on disk,

I don't fully understand the need to add in that image_type_to_extension logic, can we have a test that justifies it?

There's nothing special with image_type_to_extension(). It's the "PHP way" to map from image type (IMAGETYPE_* constants) to image extensions. What would be the alternative? Hardcoding another mapping array? I see this being the straight way for mapping (excepting the fact that is not completely handle the JPEG case).

I cleaned also the other interface methods by adding the "public" prefix.

That said, and looking at the problems that might give, shouldn't the method being added be named getAvailableImageTypes (or getSupportedImageTypes) and return a list of IMAGETYPE_XXX constants. The getInfo() method would be changed to return the extension as is, or, if absent (e.g. new file), use image_type_to_extension() to define one and that one should be (will be) lowercase.

Though conceptually better, this might give new problems if e.g. the imagemagick toolkit wants to define more supported file types than GD supports. This because the IMAGETYPE_ constants and the image_type_to_extension() function are quite GD specific: Imagemagick toolkit would need to define its own constants and image_type_to_extension() may no longer be called in a non-strict GD environment (i.e. outside the GD toolkit class).

* @return array
* FALSE, if the file could not be found or is not an image. Otherwise, a
* keyed array containing information about the image:
* - "width": Width, in pixels.
* - "height": Height, in pixels.
* - "extension": Commonly used file extension for the image.
* - "mime_type": MIME type ('image/jpeg', 'image/gif', 'image/png').

This changed my mind. It outdates #14 (case is not to be considered) and #18 (option 1 is actually better).

So IMO getInfo() should just return jpg (for jpg images, as that is the "commonly" used extension) and ignore the actual extension (jpeg vs jpg) and thus also the case (JPG vs jpg).

This will make the code easier as it only has to call image_type_to_extension() and thus does not have to do an if on type (jpg versus other) and does not have to call the pathinfo and strltolower functions.

Furthermore, I would still favor to decouple extensions and image formats a bit more, thus changing the name of the method to the one as suggested in #18 (getSupportedImageTypes) or to getSupportedImageFormats() (which I now prefer as imagetype is GD specific speak). It should return a list of strings that denote the supported image formats, probably being a list of the commonly used extensions. Thus 1 per image format, thus it should not return jpg and jpeg but only jpg.

I'm not against abstracting more the toolkit and images objects but I think that should be a new issue. This issue was only to provide a unified way to check the extension of an image. Let's keep the issue inside the initial scope.

This patch is BC with existing behavior (as it should IMO). We can do the suggested renaming (getSupportedImageFormats) in a later stage. but let's get this done so that the other issues blocked on this can progress.

I deliberately didn't came back with a patch because I needed more time to think on this. The question is: For what purpose is used, in fact, ImageInterface::getExtension()? (edited)

Let's see:

grep -nr --exclude-dir=core/vendor ">getExtension(" core

This returns 16 occurrences:

2 are unrelated -- one referring PHP $fileinfo->getExtension() and the other related to Twig.

5 are used to test if file is image -- if ($image->getExtension()) {...}

only 9 (all in GDToolkit and Tests) are really care about extension.

So, yes, you're right, I agree that we should make now this step.

Brief plan:

Add getSupportedTypes method to ImageToolkitInterface.

[EDIT] getSupportedTypes should return one of thea list ofIMAGETYPE_* constants or NULL. We need to test and make sure first that IMAGETYPE_* are NOT provided by GD extension but by PHP itself.

Declare a simple isImage() method in ImageInterface with implementation in Image that will be used for file testing.

::getExtension() should ALWAYS return the filesystem extension and will not be used for other purposes anymore. What would be the logic of returning an extension different from the real filesystem extension as long we will not use the extension as a way to check the image type or if a file is image?

::getExtension() will fallback to image_type_to_extension() for images created from resources or images without any extension on filesystem (like temporary files)

In the PHP documentation they are introduced in http://www.php.net/manual/en/image.constants.php. Though the path does not contain GD in the url, it is found under: PHP Manual > Function Reference > Image Processing and Generation > GD > Predefined constants.

OTOH: The functions image_type_to_extension() and image_type_to_mime_type() have a note that This function does not require the GD image library.... I am not sure how to check this. I work on Windows and cannot (AFAIK) disable GD, but will check that tomorrow, to late now for me.

#29:
It turns out that I could disable GD, some simple test after disabling revealed that:
- IMAGETYPE_xxx constants are available.
- Calling gd_info() resulted in a Fatal error: Call to undefined function gd_info().

#28:
- Not sure that we should go for IMAGETYPE_xxx constants, see my remark in last paragraph of #18. Why not going for mime types (for getSupportedTypes)? They are registered, so no confusion possible and they may be assumed to be complete (covering all image types that a toolkit may consider supporting). We then could use finfo_file (see http://www.php.net/manual/en/function.finfo-file.php) in isImage and check if it is supported with getSupportedTypes.
- getInfo should better remain BC (as in #27)

I did a review purely on going through the diff (i.e. I did not even apply it and have a look at the resulting classes). So far, I did not find any shocking points. Below are some minor points I would like to see changed.

Personally, I still prefer to switch to mime types. We now have some kind of duplication with properties type versus mimeType and methods getType() versus getMimeType(). So why drop the concept of extension as base to decide image format specific code on and instead introduce a new concept 'type' while we already have mimeType. What, e.g., if a toolkit supports Google's WebP image format for which PHP has no IMAGETYPE_XXX constant?

But if you can convince me, I'm willing to accept the way as the current patch does.

IMAGETYPE_* constants seems to be the most abstracted way in PHP to represent image types. Also, I'm afraid that mime types are not consistent enough. For example I see that there are 2 mime types for JPEG images: image/jpeg and image/pjpeg (http://en.wikipedia.org/wiki/Internet_media_type#Type_image).

Below some inputs on some of your points. The others are fixed in the interdiff.txt.

#35.3: As $this->type is a protected property and is accessible from the class, why calling the getter?

Well, ... IMAGETYPE_xxx constants are not complete enough. The inconsistency mentioned seems to be an IE internal thing only and is (AFAICS) not listed on the IANA site.

#35.3: just 1 method less that needs to know about processInfo(). Moreover, it's a safe and sound programming principle: if a class defines getters and setters, they should be used in its own code as well.

But this is RTBC for me, even if you keep disagreeing on those points.

I've largely stayed out of this issue, mostly because @claudiu.cristea and @fietserwin both know the actual implications of the change better than I. But now that they've come to a consensus for this issue, I can vouch for the patch, and +1 RTBC.