Python 2 vs 3: Security Differences

Python 3 and Python 2 have various functional differences. On their own, they’re not necessarily better or worse (though arguably Python 3 should be an improvement), but any change may introduce risk. This post highlights and explains a few differences between the versions that have security implications. Keep these in mind when migrating Python applications between the versions.

The Current State of Python Versions 2 and 3

Python 3 was first released in 2008. One problem that affected adoption was that Python 3 was not backward compatible. Many developers continued working with previous versions, primarily version 2, to avoid the effort of refactoring their applications. For this reason, support for Python 2 has continued, and it has remained in active development after the release of version 3.0.

Version 2.7 is the current and final release of Python 2, which is still actively used by many developers. 2.7 became available in 2010, and the Python Software Foundation has announced it will reach end of life in 2020.

Python 3.6 is the current Python version, and it first became available in 2016. Python 3 is considered the future of the Python programming language.

The main language differences between Python 2 and Python 3 are highlighted below. While none of the changes are security oriented, each has some security implications you should keep in mind.

Exception Chaining

Python 3 chains exceptions by default. In Python 2, when code that is handling one exception raises another (for example, a program wrapping an error thrown by a library it calls), the traceback shows only the final exception and the details of the original are lost. In Python 3, the original exception is attached to the new one, and the traceback shows full details of both.

Any internals of your code or runtime behaviour create a security risk when exposed to the user. Exception chaining, although convenient for debugging, exposes much more information to a potential attacker who can see a traceback and learn how the software reacts to errors.

It is possible to disable chained exceptions in Python 3, if you believe the risk outweighs the benefits.
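As a minimal sketch of the difference, the example below raises the same error twice: once with Python 3's default chaining and once with the context suppressed using `raise ... from None`. The function names and the file path are hypothetical, chosen only for illustration.

```python
import traceback


def load_config_chained(path):
    """Hypothetical helper: re-raises with default (implicit) chaining."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        # The original OSError is attached as __context__ and will be
        # printed in the traceback along with its message.
        raise RuntimeError("configuration unavailable")


def load_config_suppressed(path):
    """Hypothetical helper: hides the original exception from the traceback."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        # "from None" suppresses the chained context in the traceback.
        raise RuntimeError("configuration unavailable") from None


def render_traceback(func, path):
    """Return the traceback text that calling func(path) would produce."""
    try:
        func(path)
    except RuntimeError as exc:
        return "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    return ""


if __name__ == "__main__":
    missing = "/nonexistent/private/app.conf"  # hypothetical path
    print(render_traceback(load_config_chained, missing))
    print(render_traceback(load_config_suppressed, missing))
```

The first traceback includes the underlying file error and the path that was probed. The second shows only the generic message. Whether to suppress the context is a trade-off, since the hidden detail is also what makes debugging easier.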

Input method

Python 2 offers the eval() function, which is known to be insecure because it evaluates the text passed as a parameter as a Python expression, accepting an optional second argument for global values to use during evaluation. In Python 2, the input() function is equivalent to eval(raw_input()), and is just as insecure.

Attackers can exploit these functions by supplying variable names or other expressions as input, for example to read the values of variables that hold sensitive information.

Therefore, in Python 2 a security best practice is not to use eval() or input(), and instead use raw_input() which does not evaluate variables in the input text.

What changed in Python 3? raw_input() has been removed, and input() now behaves the way raw_input() did in Python 2: it returns the text as a string without evaluating it, so it is safe to use.

However, if you run a Python 3 program in Python 2, every input() call reverts to the evaluating behaviour and can be exploited.
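One way to reduce that risk is to refuse to start under the wrong interpreter. The guard below is a minimal sketch, not an established convention. The function name `require_python3` is hypothetical, and the file must still parse under Python 2 for the check to run at all.

```python
import sys


def require_python3(version_info=sys.version_info):
    """Raise RuntimeError when the interpreter is older than Python 3."""
    major = tuple(version_info)[0]
    if major < 3:
        raise RuntimeError(
            "This program requires Python 3: input() evaluates its "
            "argument under Python 2."
        )


# Call the guard before any input() call is reached.
require_python3()
```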

To summarize:

Function    | Python 2.x                     | Python 3.x
eval()      | Evaluates variables – insecure | Evaluates variables – insecure
input()     | Evaluates variables – insecure | Does not evaluate variables, like raw_input() in Python 2
raw_input() | Does not evaluate variables    | Removed
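The following sketch illustrates the leak described above and two safer ways to parse user text. It runs under Python 3 and emulates Python 2's input() by calling eval() directly. `API_KEY` and the function names are hypothetical.

```python
import ast

API_KEY = "s3cr3t-token"  # hypothetical secret held in a module variable


def python2_style_input(text):
    """Emulate Python 2's input(): evaluate the text as an expression."""
    return eval(text)  # insecure, shown only to demonstrate the problem


def parse_number(text):
    """Treat the text strictly as an integer; anything else is rejected."""
    return int(text.strip())


def parse_literal(text):
    """Accept only Python literals (numbers, strings, lists, and so on)."""
    return ast.literal_eval(text)


if __name__ == "__main__":
    # Typing a variable name hands back that variable's value.
    print(python2_style_input("API_KEY"))
    # The safer parsers reject the same text with a ValueError.
    for parser in (parse_number, parse_literal):
        try:
            parser("API_KEY")
        except ValueError as exc:
            print(parser.__name__, "rejected input:", exc)
```

ast.literal_eval() is part of the standard library and accepts only literal syntax, so a variable name is rejected instead of being looked up.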

Integer Division

Python 2 treats integer division differently and less intuitively than Python 3. In Python 2, dividing two integers with / always returns an integer: the largest integer less than or equal to the exact answer. This is known as floor division, so 5 / 2 gives 2. In Python 3, the same expression gives 2.5, and floor division has to be requested explicitly with the // operator.
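A short illustration, written for Python 3, where the two behaviours are available as separate operators:

```python
# Python 3: "/" is true division and always produces a float.
true_result = 7 / 2        # 3.5

# "//" is floor division, which is what "/" did for integers in Python 2.
floor_result = 7 // 2      # 3

# Floor division rounds towards negative infinity, not towards zero.
negative_floor = -7 // 2   # -4

print(true_result, floor_result, negative_floor)
```

In Python 2, adding `from __future__ import division` at the top of a module makes / behave as it does in Python 3 for that module.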

Python 3’s true division is more intuitive, but it is not backward compatible with Python 2. It is particularly dangerous to execute Python 3 code in Python 2, because the change in integer-division behaviour raises no syntax error: the code runs and silently produces different results.

This also creates a potential security issue. For example, consider an app that asks for a numeric PIN, divides the PIN stored in the system by the user-provided one, and treats a result of 1 as success. In Python 3, only the correct PIN yields exactly 1. In Python 2, floor division yields 1 for every guess that is greater than half the stored PIN and no larger than the PIN itself, so many wrong guesses would be let through.

While this example is not realistic, it illustrates how relying on integer division to round numbers (in Python 2) or not round them (in Python 3) is very likely to change behaviour, and this could be exploited by attackers to circumvent protection mechanisms.
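The PIN scenario can be sketched as follows. The code runs under Python 3 and uses // to stand in for what / would do with integers under Python 2. The PIN value and function names are hypothetical.

```python
STORED_PIN = 4321  # hypothetical PIN held by the system


def check_pin_python3(guess):
    """True division: only the exact PIN gives a quotient of 1."""
    return STORED_PIN / guess == 1


def check_pin_python2(guess):
    """Floor division, as "/" behaves for integers in Python 2."""
    return STORED_PIN // guess == 1


def count_accepted(check):
    """Count how many four-digit guesses (1 to 9999) a check lets through."""
    return sum(1 for guess in range(1, 10000) if check(guess))


if __name__ == "__main__":
    print("accepted with true division: ", count_accepted(check_pin_python3))
    print("accepted with floor division:", count_accepted(check_pin_python2))
```

With these values, true division accepts one guess, while floor division accepts every guess from 2161 to 4321.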

Unicode

In Python 2, strings are ASCII by default. Using Unicode requires extra syntax: for example, when using print, you must wrap the input text in the unicode() function to handle special characters.

The Unicode standard is much more versatile than ASCII: it supports over 128,000 characters. In Python 3, Unicode is the default. You don’t need extra syntax to define Unicode values, because every string is a Unicode string.

While powerful, this change can open up a substantial phishing risk. For example, it is possible to point users to a URL in which characters of the domain name have been replaced with look-alike letters from non-Latin alphabets. A post by Xudong Zheng demonstrated what this would look like to users.

At first glance, the URL looks fine, but all is not quite as it seems. While the URL in the example Xudong provided looks like “apple.com”, it’s actually “xn--pple-43d.com”. By using, for example, the Cyrillic “а” (U+0430) rather than the ASCII “a” (U+0061), you can make some browsers display the look-alike name and obscure the real domain. So while the site appears to be secure, in fact the user is taken to a different domain which may contain security risks. This is known as a homograph attack.

This attack could be relevant in Python if targeted at a directory path parsed by the Python app, or a query parameter. For example, if the application exposes a path https://domain.com/Acme, an attacker might switch the letters “Acme” with characters in other alphabets, leading users to another account.

When porting an app from Python 2 to 3, you should consider that all inputs will be rendered as Unicode, possibly opening up a path for attackers to trick your users.
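A rough way to spot such look-alikes is to check which scripts the letters of an input come from. The sketch below uses the standard unicodedata module and treats the first word of each character's Unicode name as its script. That is a heuristic assumed for this example, not a complete confusable-character check, and the function names are hypothetical.

```python
import unicodedata


def letter_scripts(text):
    """Return the set of script names used by the letters in text.

    The script is approximated by the first word of the character's
    Unicode name, e.g. "LATIN" or "CYRILLIC".
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            scripts.add(name.split()[0])
    return scripts


def mixes_scripts(text):
    """Flag text whose letters come from more than one script."""
    return len(letter_scripts(text)) > 1


def ascii_domain(domain):
    """Return the ASCII (Punycode) form of a domain via the idna codec."""
    return domain.encode("idna").decode("ascii")


if __name__ == "__main__":
    genuine = "Acme"
    spoofed = "\u0410cme"  # first letter is Cyrillic capital A (U+0410)
    print(genuine == spoofed)
    print(letter_scripts(spoofed), mixes_scripts(spoofed))
    print(ascii_domain("\u0430pple.com"))
```

A name written entirely in one non-Latin script would pass the mixed-script check, so this only catches the partial substitutions described above.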

Vulnerabilities and security fixes in Python 3.6 vs. 2.7

Historic versions of both Python 2 and 3 had severe security vulnerabilities and should be avoided (see CVE Details for Python versions).

However, in the latest versions, 2.7.x and 3.6.x, most vulnerabilities have been fixed. Both Python 2 and 3 are currently actively maintained, and as long as you update to the latest version, you should be safe from major vulnerabilities. That being said, it is likely that Python 2 will receive less focus from the Python Software Foundation, and that it will receive less frequent security updates than Python 3.

Most significantly, Python 2 will reach end of life in 2020. From that point onwards, the language will not be maintained and security fixes will no longer be available. If you continue using Python 2, take into account that from 2020, the language will become insecure, and you should start planning your migration to Python 3.

Different dependency graphs, different vulnerable libraries

The lack of backward compatibility in Python 3 meant that pip, Python’s primary package manager, also had to create a 2.x and 3.x stream, and record which packages support which version. As a result, porting an application often requires changing the libraries your application uses, or at least their versions.

These libraries, like all software, are not perfect, and from time to time vulnerabilities are discovered. Once a vulnerability is disclosed, application owners need to rush to fix it before attackers exploit it, a race that Equifax, for instance, lost badly.

Fixes to vulnerabilities are often not ported to every possible version, so it’s critical that you include tests for known vulnerabilities as part of your porting process. If you’re sticking with Python 2 for any reason, the risk of vulnerabilities is even higher, as libraries are increasingly dropping support for the older version, including vulnerability fixes.