Python Best Practices and Tips by Toptal Developers


This resource contains a collection of Python best practices and Python tips provided by our Toptal network members. As such, this page will be updated on a regular basis to include additional information and cover emerging Python techniques. This is a community-driven project, so you are encouraged to contribute as well, and we are counting on your feedback.

Python is a high-level language used in many development areas, such as web development (Django, Flask), data analysis (SciPy, scikit-learn), desktop UI (wxPython, PyQt) and system administration (Ansible, OpenStack). The main advantage of Python is development speed. Python comes with a rich standard library, many third-party libraries and a clean syntax. All this lets developers focus on the problem they want to solve, rather than on language details or reinventing the wheel.

Be Consistent About Indentation in the Same Python File.

Indentation level in Python is really important, and mixing tabs and spaces is neither a smart nor a recommended practice. Python 3 goes further and refuses to run a file whose indentation mixes tabs and spaces inconsistently, raising a TabError. Python 2, by contrast, interprets each tab as if it were converted to spaces using 8-space tab stops. So in Python 2 a line can look correctly indented in your editor while the interpreter places it at a different indentation level, and you may have no clue which level it is actually considered to be at.
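You can see the Python 3 behavior with a short sketch. It compiles two made-up snippets in memory with the built-in compile(). In the first snippet, the body of the `if` uses 8 spaces on one line and a tab on the next. The helper name `indentation_is_rejected` is hypothetical.

```python
def indentation_is_rejected(source):
    """Return True if Python 3 refuses to compile source because of mixed indentation."""
    try:
        compile(source, "<example>", "exec")
    except TabError:  # subclass of IndentationError, raised by the tokenizer
        return True
    return False

# Same block: first line indented with 8 spaces, second line with one tab.
mixed = "if True:\n        x = 1\n\ty = 2\n"
spaces_only = "if True:\n    x = 1\n    y = 2\n"

print(indentation_is_rejected(mixed))        # True
print(indentation_is_rejected(spaces_only))  # False
```

Python 2 would accept the mixed snippet, because it expands the tab to column 8.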

For any code that someone else may read or use someday, stick with PEP 8 or your team's coding style to avoid confusion. PEP 8 strongly discourages mixing tabs and spaces in the same file.

… Formatting should be the task of the IDE. Developers already have enough work without worrying about the size of tabs or how many spaces an IDE will insert. The code should be formatted correctly, and displayed correctly on other configurations, without forcing developers to think about it.

Furthermore, it can be a good idea to avoid tabs altogether, because the semantics of tabs are not very well-defined in the computer world, and they can be displayed completely differently on different types of systems and editors. Also, tabs often get destroyed or wrongly converted during copy-paste operations, or when a piece of source code is inserted into a web page or other kind of markup code.

Please, Write Unit Tests and Doctests.

Skipping automated tests just for the sake of shipping quickly is a bad practice. Sooner or later a bug will surface, and it will happen on the production server, causing downtime for the customer. The cause will be a “completely” manually tested new feature that broke something “almost” unrelated. The development team may find the bug after many sleepless nights, but by then it will be too late.

This whole mess could often be avoided if the developer followed the best practice of writing unit tests and doctests. After implementing a new feature, the developer would then run the tests once across the whole project.
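Here is a minimal sketch of both kinds of test on one small function. The `slugify` function and its expected outputs are made up for illustration. The doctest lives in the docstring, and the unit tests live in a `unittest.TestCase`. Both can run from the same entry point.

```python
import doctest
import unittest

def slugify(title):
    """Turn a title into a lowercase, hyphen-separated slug.

    >>> slugify("Hello World")
    'hello-world'
    >>> slugify("  Python   Tips ")
    'python-tips'
    """
    return "-".join(title.lower().split())

class SlugifyTests(unittest.TestCase):
    # Edge cases that would clutter the docstring go into regular unit tests.
    def test_empty_title(self):
        self.assertEqual(slugify(""), "")

    def test_already_slugged(self):
        self.assertEqual(slugify("python-tips"), "python-tips")

if __name__ == "__main__":
    doctest.testmod()  # run the examples embedded in docstrings
    unittest.main()    # then run the TestCase classes
```

In a larger project, a test runner such as `python -m unittest discover` can run the whole suite after every new feature.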

How to Deal with Hash Table Based Operations?

Many people are confused by the fact that almost all hash-table-based operations in Python are done in place and return None rather than the modified object itself. The examples they usually remember are dict.update() and most of the mutating set methods:
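A small sketch of this behavior follows; the dictionaries and sets are made up. In-place methods return None, while the dict-unpacking syntax and set operators build new objects instead.

```python
defaults = {"host": "localhost", "port": 8000}
overrides = {"port": 9000}

# dict.update() mutates in place and returns None, so this is a bug:
broken = dict(defaults).update(overrides)   # broken is None, not a dict

# Either mutate first and then use the object itself...
config = dict(defaults)
config.update(overrides)

# ...or build a new dict without touching either input.
merged = {**defaults, **overrides}

tags = {"python", "tips"}
added = tags.add("pep8")       # also None; tags itself now holds 3 items
union = tags | {"testing"}     # set operators return a new set
```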

That said, some people find the loop version more obvious and readable. When we are dealing with huge amounts of data, we can simply replace the square brackets with parentheses (as David Beazley often suggests), and our code becomes a lazy generator expression. To make the loop version lazy, by contrast, we would need to move it into a separate generator function with yield, which can sometimes hurt readability even more. You can read more from David himself on this whole topic.
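The three forms can be compared side by side in a sketch. The "squares of even numbers" task is a hypothetical stand-in. Only the brackets differ between the eager and lazy comprehensions, while the lazy loop version needs its own generator function.

```python
def even_squares_list(numbers):
    # List comprehension: builds the whole result in memory at once.
    return [n * n for n in numbers if n % 2 == 0]

def even_squares_lazy(numbers):
    # Same expression in parentheses: a generator expression that
    # produces one item at a time, only when asked.
    return (n * n for n in numbers if n % 2 == 0)

def even_squares_loop(numbers):
    # Loop version: to stay lazy it has to become a generator function.
    for n in numbers:
        if n % 2 == 0:
            yield n * n

# The lazy forms can consume a large input without building a list of results.
print(sum(even_squares_lazy(range(1_000_000))))
```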

Contributors

Nikolay is a software engineer with a good knowledge of Python, algorithms and data structures. He has experience with scalable and highly loaded systems architecture - Web technologies, NoSQL, OpenStack - as well as experience leading groups of developers.

Why Is It Important to Write Documentation or Inline Comments?

Many developers think they are saving time by not commenting semi-obfuscated code, or that skipping docstrings will help them meet deadlines. Rest assured: before long, while reading your own code, you will hate yourself for not remembering what you did and why you did it that way.

Someday you will probably leave the company, and your code will haunt every team member who comes across this zombie-like code. There is simply no excuse for not writing documentation. Writing docstrings and commenting complex code sections does not take much time. Another approach is to name your functions, methods and variables to reflect the purpose of each component, making them “self-documenting”.
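A short sketch shows both habits together. The `apply_discount` function is a hypothetical example. It has a descriptive name and parameters, a docstring that states the contract, and a comment only where the reason is not obvious from the code.

```python
def apply_discount(price, discount_rate):
    """Return price reduced by discount_rate, a fraction between 0 and 1.

    Raises ValueError if discount_rate is outside [0, 1].
    """
    if not 0 <= discount_rate <= 1:
        raise ValueError("discount_rate must be between 0 and 1")
    # Round to cents so the result can be displayed directly as currency.
    return round(price * (1 - discount_rate), 2)

print(apply_discount(100, 0.25))  # 75.0
help(apply_discount)              # the docstring doubles as built-in help
```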

Feel the Power of the Python Logical Operators.

Python’s logical operators `and` and `or` don’t just return the Boolean values True or False; they return the value of one of their operands.
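The rule is easiest to see with literal operands. The values below are arbitrary: `or` returns the first truthy operand (or the last one), and `and` returns the first falsy operand (or the last one).

```python
# `or` returns the first truthy operand, or the last operand if none is truthy.
print(0 or "default")     # default
print("cached" or "db")   # cached
print(None or [])         # []

# `and` returns the first falsy operand, or the last operand if all are truthy.
print([] and "never")     # []
print("a" and "b")        # b
```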

This behavior of Python’s logical operators can speed up development and improve code readability. In the following example, an object is fetched from the cache or, on a cache miss, from the database:

A quick explanation of the provided example: it first tries to get the object from the cache (the check_in_cache() function). If that returns None instead of an object, it gets the object from the database (the pull_from_db() function). Written this way, it is better than the following snippet, written in the conventional way:

def get_obj():
    result = check_in_cache()
    if result is None:
        result = pull_from_db()
    return result

The first code example solves the problem in a single line of code, compared with four lines in the body of the second. On top of that, the first example is more expressive and readable.

Just one thing to watch for: be careful with functions that can return objects that are logically equivalent to False, such as an empty list. If check_in_cache() returns such an object, it will be treated as a miss, and your app will call pull_from_db() anyway. So, in cases where your functions could return these kinds of objects, consider an explicit is None check instead.
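The difference shows up as soon as the cached value is an empty list. In this sketch, pull_from_db() and both getter names are hypothetical. With `or` the valid but empty cached result is thrown away, while the explicit None check keeps it.

```python
def pull_from_db():
    # Hypothetical database query, used only for illustration.
    return ["row from db"]

def get_with_or(cached):
    # Any falsy cached value ([], 0, "", {}) falls through to the database.
    return cached or pull_from_db()

def get_with_is_none(cached):
    # Only a genuine miss (None) falls through to the database.
    return cached if cached is not None else pull_from_db()

print(get_with_or([]))        # ['row from db']
print(get_with_is_none([]))   # []
```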

Contributors

Nikolay is a self-directed and organized professional. He is a results-oriented problem solver with sharp analytical abilities and excellent communication skills. He is a highly competent software engineer with 4 years of experience in Python programming and web technologies.

You may think you will save some development time by silencing exceptions with pass “just for now”. But it will take hours, if not days, to find future bugs inside this code block later. Any exception will be masked by the pass, and the error will surface somewhere else, outside this try: except block, in code that may look perfectly innocent.

A quote from Aaron:

In my nearly ten years of experience writing applications in Python, both individually and as part of a team, this pattern has stood out as the single greatest drain on developer productivity and application reliability, especially over the long term.

If you really want to silence one or two well-expected exceptions, catch them explicitly instead of passing on everything. In Python, “explicit is better than implicit”, as in the code example below:
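In the following sketch, the function name and the “missing config file is fine” scenario are hypothetical. Only the one expected failure is caught and logged. Everything else still propagates, instead of being swallowed as it would be with a bare `except: pass`.

```python
import logging

logger = logging.getLogger(__name__)

def read_optional_config(path):
    """Return the file's text, or "" when the file does not exist."""
    try:
        with open(path) as fh:
            return fh.read()
    except FileNotFoundError:
        # The one well-expected failure: log it and fall back to a default.
        # Anything else (PermissionError, IsADirectoryError, a NameError from
        # a typo, ...) still propagates instead of being masked by `pass`.
        logger.info("No config file at %s; using defaults", path)
        return ""
```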

Why Should I Use Generator Comprehensions?

By using parentheses () instead of square brackets [] for a comprehension, we tell Python to create a generator rather than a list. This can be very useful if the full list is not needed, or if it is expensive to build because the list is long, each object is big, or the condition is expensive to compute.

At the time of writing, my main use case for generator comprehensions is picking a single object from a group of objects under some condition, when I expect many objects to satisfy the condition but only need one. Like in the example below:
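Here is a sketch of that use case, assuming a 10-item `short_list` and the hypothetical condition `x > 5`. The condition is chosen so that the first match is 6. The condition also records every value it checks, which makes the difference in work visible. It uses the built-in next(), which works in Python 2.6+ and Python 3.

```python
short_list = list(range(10))
checked = []  # every value the condition has been evaluated on

def is_wanted(x):
    # Hypothetical condition, chosen so that the first match is 6.
    checked.append(x)
    return x > 5

# List comprehension: tests all 10 items and builds [6, 7, 8, 9] first.
y_from_list = [x for x in short_list if is_wanted(x)][0]
list_checks = len(checked)

checked.clear()
# Generator comprehension: next() pulls values only until the first match.
y_from_gen = next(x for x in short_list if is_wanted(x))
gen_checks = len(checked)

print(y_from_list, list_checks)  # 6 10
print(y_from_gen, gen_checks)    # 6 7
```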

The parentheses still describe a comprehension, but they tell Python to create a generator instead of a list. That is, an iterator is set up that can produce the same values as the list above, but only as far as you ask it to. Calling next() on it (the .next() method in Python 2) asks the generator for its first value. Since only the first value (6 in this case) is pulled off and saved, the generator then goes away, and the remaining 7, 8, and 9 are never computed.

Given how small the data is and the extra overhead of setting up a generator, this works out to be slower than simply using the list. But let’s take a look at what happens with a longer list:
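A sketch of the longer-list comparison follows, assuming a 100,000-item `long_list` and the same hypothetical `x > 5` condition. It times both approaches with the standard timeit module. The absolute numbers depend on your machine and Python version, so no expected timings are shown.

```python
import timeit

long_list = list(range(100_000))

def first_via_list():
    # Evaluates the (hypothetical) condition on all 100,000 items first.
    return [x for x in long_list if x > 5][0]

def first_via_generator():
    # Stops as soon as the first matching item is produced.
    return next(x for x in long_list if x > 5)

for fn in (first_via_list, first_via_generator):
    seconds = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {seconds:.4f}s for 20 runs")
```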

As before, the generator comprehension just sets up an iterator. That iterator is asked for its first value with next(). The generator then runs until it can deliver that value. It checks 0, which fails the condition. It proceeds to check 1, which fails again. It keeps going until it finds the first value that passes the condition (again, 6), which is returned and saved into y. The remaining 99,993 values are never checked. As a result, this performs only slightly worse than the generator on the 10-item short_list, and an order of magnitude better than the list comprehension over long_list.

Contributors

Fascinated by the intersection of abstraction and reality, Allen found his calling in data science. He seeks new ways to empower decision making with data. With his Master of Science degree in Computational Intelligence and experience in data mining and predictive modeling, his specialty is in clustering, modeling, and predicting user behavior.

Don’t Make Everything a Class.

In Python, overusing classes and making everything a class is considered a bad practice. In his PyCon 2012 talk “Stop Writing Classes”, Jack Diederich pointed out that developers should stop creating classes and modules by reflex. Before creating one, developers should think hard; more often than not, a function would serve them better.
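A minimal illustration of the pattern being warned against is a class with an `__init__` and a single method, which is usually a function in disguise. The `Greeting`/`greet` names are hypothetical. functools.partial covers the “preset one argument” case without a class.

```python
from functools import partial

# A class whose only job is to store an argument for its one method...
class Greeting:
    def __init__(self, greeting):
        self.greeting = greeting

    def greet(self, name):
        return f"{self.greeting}, {name}!"

# ...does the same work as a plain function.
def greet(greeting, name):
    return f"{greeting}, {name}!"

# If a preset greeting is needed, partial() binds it without a class.
say_hello = partial(greet, "Hello")

print(Greeting("Hello").greet("Ada"))  # Hello, Ada!
print(say_hello("Ada"))                # Hello, Ada!
```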

In Python-Verse, `try: except: else` Construct Is a Natural Control Flow.

If you are coming from the C++ or Java world, confusion around try: except: else is natural. However, Python uses this construct very differently from C++ or Java. In Python, the try: except: else construct is considered good practice, as it helps to realize one of Python’s core philosophies: “It is easier to ask for forgiveness than permission”, or the “EAFP paradigm”.

Trying to avoid this practice results in messy, unpythonic code. On Stack Overflow, core Python developer Raymond Hettinger described the philosophy behind it:

In the Python world, using exceptions for flow control is common and normal. Even the Python core developers use exceptions for flow-control and that style is heavily baked into the language (i.e. the iterator protocol uses StopIteration to signal loop termination). In addition, the try-except-style is used to prevent the race-conditions inherent in some of the “look-before-you-leap” constructs.

For example, testing os.path.exists results in information that may be out-of-date by the time you use it. Likewise, Queue.full returns information that may be stale. The try-except-else style will produce more reliable code in these cases. In some other languages, that rule reflects their cultural norms as reflected in their libraries. The “rule” is also based in-part on performance considerations for those languages.
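The os.path.exists case from the quote can be sketched both ways. The function names are hypothetical. The look-before-you-leap version has a window in which the file can disappear between the check and the open(). The EAFP version just tries, and its `else` clause keeps the reading code out of the `try`, so read errors are not mistaken for a missing file.

```python
import os

def read_first_line_lbyl(path):
    # Look before you leap: the file may vanish between exists() and open().
    if os.path.exists(path):
        with open(path) as fh:
            return fh.readline().rstrip("\n")
    return None

def read_first_line_eafp(path):
    # Easier to ask forgiveness: attempt the operation and handle failure.
    try:
        fh = open(path)
    except FileNotFoundError:
        return None
    else:
        # Runs only when open() succeeded; errors raised while reading
        # propagate instead of being treated as "file not found".
        with fh:
            return fh.readline().rstrip("\n")
```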

Contributors

Nikolay is a self-directed and organized professional. He is a results-oriented problem solver with sharp analytical abilities and excellent communication skills. He is a highly competent software engineer with 4 years of experience in Python programming and web technologies.

Why You Should Avoid Using `from module import *` in Your Projects?

Using from module import * in your code can turn nice, clean modules into a nightmare. The practice may cause little trouble in small projects of up to ten modules developed by a small team. But when the project grows to mid-size and the work spreads across multiple teams in multiple locations, code using this practice will start to see bewildering errors caused by circular imports.

The from module import * wildcard style leads to namespace pollution. You get names in your local namespace that you didn’t expect. Imported names may shadow names defined in your own module, and you can’t figure out where certain names come from. Although it is a convenient shortcut, it should not appear in production code.

Let’s illustrate this with examples. As mentioned, the worst case would be the following code:
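One illustration, not necessarily the case the original had in mind, is a wildcard import of the standard os module. It silently replaces the built-in open() with the low-level os.open(). To avoid polluting the current module, this sketch runs the import inside a throwaway namespace dictionary.

```python
import os

# Simulate a module that does `from os import *`, in an isolated namespace.
wildcard_ns = {}
exec("from os import *", wildcard_ns)

# The built-in open() has been shadowed by os.open(), which takes flags and
# returns an integer file descriptor instead of a file object.
print(wildcard_ns["open"] is os.open)  # True

# One line imported all of these names; any of them can shadow your own.
print(len([name for name in wildcard_ns if not name.startswith("__")]))

# Explicit imports keep every name traceable to its source.
from os.path import exists, join
```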

Contributors

Nikolay is a self-directed and organized professional. He is a results-oriented problem solver with sharp analytical abilities and excellent communication skills. He is a highly competent software engineer with 4 years of experience in Python programming and web technologies.

Dealing With Pyc Files When Working With Git.

Python developers are familiar with the fact that Python automatically compiles a .py file into bytecode, which is then executed by the interpreter. This bytecode is cached in a .pyc file, which Python 2 writes to the same directory as its source file. A .pyc file is generated when a module is imported for the first time; the main script itself is compiled on each run and is not cached.

For example, in many Python web frameworks, when we create views.py, a file containing the logic for our views, we will most probably find a views.pyc file in the same directory after running an instance of our application under Python 2.

As developers, we often work with big Git codebases on projects shared by many developers and teams. Many features are developed at the same time, so we have to switch branches frequently to pair with other developers or to review and test someone’s code. Depending on the differences between two branches, we can end up with stale .pyc files from the other branch, which can lead to unexpected behavior. For example, Python 2 will still import a module from a leftover .pyc file even when its .py source does not exist on the current branch.

Git is a well-known source code management tool that provides hooks, a way to fire custom scripts when certain important actions occur. We can attach hooks to the most common actions, such as before or after committing, pushing, rebasing, merging, and similar.

One of the available hooks is the “post-checkout” hook, which fires after we check out another branch or a specific commit. We can put our code for cleaning .pyc files in this hook. Every Git repository has a .git folder in the project’s root, so we just need to edit (or create) the file .git/hooks/post-checkout and add the following code:
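Git runs any executable hook regardless of the language it is written in, so the hook can also be a Python script with a shebang line. This sketch assumes a `python3` on the PATH; the helper name `clean_bytecode` is hypothetical. It relies on Git running the hook from the root of the working tree and ignores the three arguments Git passes to post-checkout. It removes Python 3 `__pycache__` directories and Python 2 `.pyc`/`.pyo` files.

```python
#!/usr/bin/env python3
"""post-checkout hook: delete stale Python bytecode after switching branches."""
import shutil
from pathlib import Path

def clean_bytecode(root="."):
    """Remove __pycache__ dirs and .pyc/.pyo files under root; return the count."""
    removed = 0
    # Python 3 keeps bytecode in __pycache__ directories: remove them whole.
    for cache_dir in list(Path(root).rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed += 1
    # Python 2 writes .pyc (and .pyo with -O) next to each source file.
    for pattern in ("*.pyc", "*.pyo"):
        for compiled in list(Path(root).rglob(pattern)):
            compiled.unlink()
            removed += 1
    return removed

if __name__ == "__main__":
    clean_bytecode()
```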

When the Python 2 interpreter is invoked with the -O flag, optimized code is generated and stored in .pyo files. For that reason, our command checks for both .pyc and .pyo files and removes them (*.py[co]). Python 3 stores its compiled bytecode in __pycache__ directories, so we included a command that removes them as well.

Finally, after saving the hook file on Linux or macOS, we need to make sure it has execution permissions:

$ chmod +x .git/hooks/post-checkout

Contributors

Ivan is a developer with four years of experience in web development. He has strong skills in back-end development, mainly with Python language and Django and Pyramid frameworks. He also has good experience in front-end development, mainly using jQuery, AngularJS, and Bootstrap.

Should I Use Exceptions or Conditional Handling?

Python best practice is to use exceptions to handle “exceptional” cases, and unnecessary if checks may slow down your code. However, keep in mind that an exception that is actually raised and caught is slower than a plain if check. So when the “exceptional” case happens frequently, conditional handling may be the better choice; use exceptions wisely.
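A rough way to check this trade-off on your own machine is a timeit sketch that compares both styles for a key that is present and one that is missing. The dictionary and function names are made up. Results vary by Python version and hardware. Typically the try version is at least as fast when the key is present, and the if version wins clearly when misses are common, because raising and catching KeyError is the expensive part.

```python
import timeit

data = {"present": 1}

def with_if(key):
    # Look before you leap: an extra membership test on every call.
    if key in data:
        return data[key]
    return None

def with_try(key):
    # Ask forgiveness: no extra test, but a raised KeyError is costly.
    try:
        return data[key]
    except KeyError:
        return None

for key in ("present", "missing"):
    for fn in (with_if, with_try):
        seconds = timeit.timeit(lambda: fn(key), number=200_000)
        print(f"{fn.__name__}({key!r}): {seconds:.3f}s")
```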

Contributors

Nikolay is a self-directed and organized professional. He is a results-oriented problem solver with sharp analytical abilities and excellent communication skills. He is a highly competent software engineer with 4 years of experience in Python programming and web technologies.