Author: jneves

Are you running your development environment with different packages than production? And your tests? Or even different parts of production?

The common advice is to pin the versions of packages in a requirements.txt file (that is, to define which particular version of each package works with your program). That advice falls short because the most common way to pin dependencies is to install them in a virtual environment and run pip freeze > requirements.txt. This causes several problems:

The environment might have broken dependencies. pip install only checks the dependency requirements for the last package installed, so that’s the only one guaranteed to work.

The frozen environment will contain the direct dependencies and, recursively, the dependencies of dependencies. In the future it will be difficult to identify what your app actually requires (particularly when checking for version conflicts in dependencies of dependencies).
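To see the second problem in practice, the sketch below lists the packages in the current environment that no other installed package depends on, which is roughly the set of direct dependencies that a frozen file buries among everything else (pip list --not-required gives a similar view). It reads metadata with the standard library’s importlib.metadata. The simplified requirement parsing, which skips requirements that only apply to optional extras and doesn’t evaluate other environment markers, is an assumption of this sketch rather than a full PEP 508 parser.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a project name as PEP 503 describes (e.g. Foo_Bar -> foo-bar)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Return the normalized project name at the start of a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return normalize(match.group(1)) if match else None


def top_level_packages(requires_by_package):
    """Return the installed packages that no other installed package requires.

    requires_by_package maps each project name to its requirement strings.
    Requirements that only apply to an optional extra are skipped, and other
    environment markers are not evaluated (simplifications of this sketch).
    """
    installed = {normalize(name) for name in requires_by_package}
    required = set()
    for requirements in requires_by_package.values():
        for requirement in requirements:
            if re.search(r"\bextra\s*==", requirement):
                continue
            name = requirement_name(requirement)
            if name:
                required.add(name)
    return sorted(installed - required)


def installed_requirements():
    """Map every distribution in the current environment to its requirements."""
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            result[name] = dist.requires or []
    return result


if __name__ == "__main__":
    for name in top_level_packages(installed_requirements()):
        print(name)
```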

This isn’t the only solution: you can use pipconflictchecker to deal with the first problem, or you can use pipenv. But I’ve been well served by pip-tools for years, so I thought I’d share my workflow and how it solves the problems above.

First, install pip-tools in your virtual environment (pip install pip-tools). Then create a requirements.in file, which uses mostly the same format as a requirements.txt file. It lists the packages your app requires directly. I usually avoid setting versions in it, unless there’s a known issue with a specific package (in that case, the line above it has a comment with a link to the issue). This is the file you’ll be creating and updating, and every dependency in it should be used directly by your app.

The next step is “compiling” the requirements with pip-compile. This command fetches the dependency information for each package and calculates a set of versions that satisfies the requirements of every package, including the dependencies of dependencies. The result is saved to a file called requirements.txt. You’ll notice that, for dependencies of dependencies, pip-compile adds comments explaining which packages require them.
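Those “via” comments make requirements.txt easy to query. As a sketch, assuming the annotation layout that recent pip-compile versions write (a “# via” line under each pin, or a bare “# via” line followed by one “#   name” line per requirer), this hypothetical helper maps every pinned package to whatever pulled it in:

```python
import re

# A pinned line such as "django==4.2.7" or "uvicorn[standard]==0.23.2".
PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(\[[^\]]*\])?==")


def parse_via_comments(compiled_text):
    """Map each pinned package to the packages (or files) that required it."""
    via = {}
    current = None   # package whose annotations we are reading
    in_list = False  # True after a bare "# via" line
    for raw_line in compiled_text.splitlines():
        line = raw_line.strip()
        pin = PIN.match(line)
        if pin:
            current = pin.group(1).lower()
            via[current] = []
            in_list = False
        elif current and line.startswith("# via"):
            rest = line[len("# via"):].strip()
            if rest:
                via[current].append(rest)  # single-line form: "# via django"
            else:
                in_list = True  # multi-line form: names follow on "#   name" lines
        elif current and in_list and line.startswith("#"):
            via[current].append(line.lstrip("#").strip())
        else:
            in_list = False  # blank lines, hashes, etc. end a multi-line list
    return via
```

Run against a requirements.txt compiled from a requirements.in that only lists django, it should map asgiref and sqlparse to ["django"], and django itself to ["-r requirements.in"].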

Once you have the requirements.txt file, you can install it with pip install -r requirements.txt, but I prefer pip-tools’ pip-sync, which not only installs the packages but also uninstalls any package that isn’t in requirements.txt.
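Conceptually, pip-sync compares what’s installed with what’s pinned. The sketch below is not pip-sync’s code, just an illustration of that comparison: it works out which pins need installing or changing and which packages would be uninstalled, normalizing names as PEP 503 describes. The keep list of tooling packages that survive the sync is an assumption of this sketch.

```python
import re


def normalize(name):
    """PEP 503 name normalization, so 'Django' and 'django' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def sync_plan(installed, pinned, keep=("pip", "setuptools", "wheel", "pip-tools")):
    """Work out what a pip-sync-style tool would change.

    installed: {name: version} currently in the environment.
    pinned:    {name: version} read from requirements.txt.
    keep:      packages never uninstalled (assumed: the tooling itself).
    Returns (to_install, to_uninstall) as sorted lists.
    """
    installed_n = {normalize(k): v for k, v in installed.items()}
    pinned_n = {normalize(k): v for k, v in pinned.items()}
    protected = {normalize(k) for k in keep}
    # Anything installed but not pinned goes, unless it is protected tooling.
    to_uninstall = sorted(
        n for n in installed_n if n not in pinned_n and n not in protected
    )
    # Anything missing, or installed at a different version, gets (re)installed.
    to_install = sorted(
        f"{n}=={v}" for n, v in pinned_n.items() if installed_n.get(n) != v
    )
    return to_install, to_uninstall
```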

I also include the development/test dependencies in requirements.in. That’s the only way to guarantee that the dependency set in requirements.txt has the same versions that were tested. It’s possible to pick out a requirements-dev.txt subset of requirements.txt and run pip uninstall -r requirements-dev.txt before deploying to production, but I’ll admit I only do that for more complex projects.

With this setup, I can then use the following cheatsheet:

| Action | Instructions |
|---|---|
| Add a package as a dependency. | Add the package name in requirements.in. Run pip-compile. |
| Remove a dependency the app no longer uses. | Remove the package name from requirements.in. Run pip-compile. |
| Upgrade dependencies to the current versions. | Run pip-compile --upgrade. (1) |
| Update the local environment. | Run pip-sync. |
| Deploy | Run pip-sync on the server/docker image build. |
(1) I usually run this weekly to create a pull request, so the tests run and the changes can be QA’d.

Are you having problems with packaging your dependencies for a Lambda? You’ve done the AWS lambda python tutorials but now need to deal with something more complex? Do you just want to build a package to upload in the console or with terraform?

Zappa is a great tool that handles everything from publishing your code to integrating it with other AWS services, with the minimum modifications possible. As a result, people sometimes overlook that you can use it for smaller roles, like just packaging a lambda into a zip file that you can then upload in the console and/or use with terraform.

So, let’s imagine you have your code in a file called myfunction.py doing something like:
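For illustration, assume myfunction.py holds a plain Lambda handler like the one below. The function name handler, the name key in the event and the response shape are all hypothetical; in the console or terraform, the function’s handler setting would then point at myfunction.handler.

```python
import json


def handler(event, context):
    """Hypothetical Lambda entry point: greets whoever is named in the event."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```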

In the previous article, “Your queries are slow, but memory/cpu usage is low”, I mentioned that one of the reasons an app on AWS using RDS can be slow is IOPS budgeting. You’ve already gathered the main symptom from the title (that’s why you’re reading this): slow queries. You can confirm whether the issue is the IOPS budget, not because AWS made it easy to keep track of (they haven’t), but because the effect is visible in the metrics: you’ll notice a flatline in the maximum Read/Write IOPS.

If you’re seeing a flatline at 3 IOPS per GB of storage (eg: for a 200GB disk (EBS volume), that would be 600 IOPS; gp2 volumes have a floor of 100 IOPS, so a 10GB disk would flatline at 100), then you’ve run out of budget. If you see peaks going above and below that level, you still have budget to spare.
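The numbers above follow the rules for gp2 (General Purpose SSD) storage, which is the storage type this burst budget applies to: 3 IOPS per GB of baseline, with a floor of 100 and a ceiling of 16,000 IOPS. A small sketch of that arithmetic, which also tells you how much storage a given baseline needs (check AWS’s current EBS documentation, as these limits have changed over time):

```python
import math

# gp2 figures as documented by AWS for EBS; verify against current docs.
GP2_IOPS_PER_GIB = 3
GP2_MIN_BASELINE = 100
GP2_MAX_BASELINE = 16_000


def gp2_baseline_iops(size_gib):
    """Baseline IOPS a gp2 volume is held to once its burst credits run out."""
    return max(GP2_MIN_BASELINE, min(GP2_IOPS_PER_GIB * size_gib, GP2_MAX_BASELINE))


def gp2_size_for_baseline(target_iops):
    """Smallest volume size (GiB) whose baseline reaches target_iops."""
    if target_iops > GP2_MAX_BASELINE:
        raise ValueError("gp2 cannot provide that baseline; consider Provisioned IOPS")
    return math.ceil(target_iops / GP2_IOPS_PER_GIB)
```

For example, a 1000 IOPS baseline needs gp2_size_for_baseline(1000), that is 334 GB, which is why increasing storage stops being attractive around that point.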

If you see the flatline of slowness, you have two choices:

If you need less than 1000 IOPS, you can increase the allocated storage. You probably won’t need the space; you’re just doing it for the IOPS. If you have a Multi-AZ RDS instance, the downtime will be minimal (under a minute in my latest experience).

If you want more than 1000 IOPS then you can add Provisioned IOPS to your instance:

On the same screen as before, click Instance actions in the top right corner and choose Modify.

Under Storage type, choose: Provisioned IOPS (SSD).

A new option will then appear for choosing how many IOPS you want to provision.

Note: Remember that these changes will make your database slower during the conversion, so please do it at a time when your users don’t need it.
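If you’d rather script the change than click through the console, boto3’s RDS client exposes the same modification through modify_db_instance. The function below is a sketch: rds_client is assumed to be boto3.client("rds"), the instance identifier you pass is your own, and RDS validates the requested IOPS against the allocated storage, so check the allowed range the console shows before picking a number. Leaving ApplyImmediately off defers the change to the next maintenance window, which fits the note above.

```python
def switch_to_provisioned_iops(rds_client, instance_id, iops, apply_immediately=False):
    """Request Provisioned IOPS (io1) storage for an RDS instance.

    rds_client: assumed to be a boto3 RDS client, e.g. boto3.client("rds").
    apply_immediately=False defers the change to the next maintenance window.
    """
    return rds_client.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        StorageType="io1",
        Iops=iops,
        ApplyImmediately=apply_immediately,
    )
```

Usage would look like switch_to_provisioned_iops(boto3.client("rds"), "my-database", 2000), where "my-database" is a placeholder.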

Recently I changed a function to receive a single argument instead of the previous two. Unfortunately that resulted in another function failing later. The issue wasn’t lack of tests (that function had 100% coverage), but the way this function was tested.

One of the downsides of using mock objects in tests is that you lose the connection to the original object. So the test runs happily and passes, because mock_function is a plain Mock object. But we don’t need to accept this: a simple change makes the Mock object validate that the arguments are valid for the original function.

By adding autospec=True to the patch call, the arguments are validated against the original function signature, and in this case the call would raise a TypeError: function_changed() takes 1 positional argument but 2 were given and the test would fail. And now you’d know that something was wrong well before it gets deployed…
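A self-contained sketch of the situation described above, using hypothetical names: function_changed now takes one argument, function_calling still passes two, and a throwaway module object stands in for your application code so that patch.object has something to patch. The first test passes even though the call is wrong. The second wraps the call in assertRaises only to show the TypeError that autospec raises; in a real suite, that test would simply fail.

```python
import types
import unittest
from unittest import mock

# Hypothetical stand-in for your application module.
app = types.ModuleType("app")


def function_changed(value):
    """Takes a single argument now (it used to take two)."""
    return value * 2


def function_calling(a, b):
    # Bug: still calls function_changed with the old two-argument signature.
    return app.function_changed(a, b)


app.function_changed = function_changed
app.function_calling = function_calling


class FunctionCallingTest(unittest.TestCase):
    def test_plain_mock_hides_the_bug(self):
        with mock.patch.object(app, "function_changed") as mock_function:
            app.function_calling(1, 2)  # a plain Mock accepts any arguments
        mock_function.assert_called_once_with(1, 2)

    def test_autospec_exposes_the_bug(self):
        with mock.patch.object(app, "function_changed", autospec=True):
            # The autospecced mock checks the call against the real signature.
            with self.assertRaises(TypeError):
                app.function_calling(1, 2)
```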

You’re noticing timeouts on queries running on your PostgreSQL database? Or just slower responses to queries? And when you look at RDS monitoring, the CPU and memory look like the machine is doing nothing?

There are two main causes I’ve found for this: lock contention and IOPS limits. Today I’m talking about the more likely of the two: lock contention.

So, first things first: how do you identify whether you have lock contention? There are two queries that will help you see what’s going on:

SELECT COUNT(*) FROM pg_locks

This shows how many locks exist in your system at a point in time. If the database isn’t being used, the number should be close to zero (the query itself holds a couple of locks). If it’s being used, I’d expect the count to go up and regularly drop back down (we’re talking about low-load scenarios). If that doesn’t happen (or rarely happens), you might have slow transactions holding locks for a long time and making everything else wait for them.

You can confirm if that’s the case with this query:

SELECT * FROM pg_stat_activity WHERE state <> 'idle'

You’ll be looking for sessions with the state idle in transaction and/or a wait_event_type of Lock.
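If you check this often, the two queries can be bundled into a small helper. This sketch takes any DB-API cursor connected to PostgreSQL (a psycopg2 cursor, for example, which is an assumption) and returns the lock count plus the pids of sessions that are idle in a transaction or waiting on a lock. The wait_event_type column requires PostgreSQL 9.6 or later.

```python
LOCK_COUNT_SQL = "SELECT COUNT(*) FROM pg_locks"
ACTIVE_SQL = (
    "SELECT pid, state, wait_event_type, query "
    "FROM pg_stat_activity WHERE state <> 'idle'"
)


def lock_report(cursor):
    """Run both diagnostic queries and summarise the suspicious sessions."""
    cursor.execute(LOCK_COUNT_SQL)
    (lock_count,) = cursor.fetchone()
    cursor.execute(ACTIVE_SQL)
    rows = cursor.fetchall()
    return {
        "locks": lock_count,
        # Sessions holding a transaction open while doing nothing.
        "idle_in_transaction": [
            pid for pid, state, _, _ in rows if state == "idle in transaction"
        ],
        # Sessions currently blocked waiting for a lock.
        "waiting_on_lock": [pid for pid, _, wait, _ in rows if wait == "Lock"],
    }
```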

If you see these signs, there are 2 strategies you can use to speed things up:

Optimise the queries that are causing the locks. The quickest return will probably come from adding some indexes; I’d recommend taking a look at the PostgreSQL wiki’s index usage query to find where you’ll get the biggest return.

Reduce locks in your application (usually by eliminating queries). Remember those less-than-clear queries you were planning to clean up some other day? It might be the right time.

You did a zappa deploy and it failed with An error occurred (ValidationException) when calling the PutRule operation: Provided role <your lambda role> cannot be assumed by principal 'events.amazonaws.com'.?

You tried to create a lambda with a new handmade role only to be greeted by this cryptic error message. Or you tried to use an already existing role with lambda.

Translating the message: it means you haven’t authorized the events service (events.amazonaws.com) to assume the role, so the CloudWatch Events rules that Zappa creates can’t use it. So, how do we add that authorization?

Go to https://console.aws.amazon.com/iam/

Click roles on the left.

Click the role you want to use for lambda.

Click the Trust relationships tab.

Click the button Edit trust relationship.

If this role is only going to be used by lambda, you can just replace the policy with:
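As a sketch, a trust policy that should clear this error lets both lambda.amazonaws.com and events.amazonaws.com assume the role. The Python below only builds and prints that JSON so it can be pasted into the trust relationship editor; trim or extend the service list to match what the role is actually used for.

```python
import json


def trust_policy(*services):
    """IAM trust policy letting the given AWS services assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": list(services)},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Lambda runs the function; events.amazonaws.com is the service named in the error.
print(json.dumps(trust_policy("lambda.amazonaws.com", "events.amazonaws.com"), indent=2))
```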

You did a zappa deploy and it failed with InvalidParameterValueException: An error occurred (InvalidParameterValueException) when calling the CreateFunction operation: The role defined for the function cannot be assumed by Lambda?

You tried to create a lambda with a new handmade role only to be greeted by this cryptic error message. Or you tried to use an already existing role with lambda.

Translating the message: it means you haven’t authorized the lambda service to assume the role, so lambdas can’t use it. So, how do we add that authorization?

Go to https://console.aws.amazon.com/iam/

Click roles on the left.

Click the role you want to use for lambda.

Click the Trust relationships tab.

Click the button Edit trust relationship.

If this role is only going to be used by lambda, you can just replace the policy with:
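The same fix can be scripted with boto3 instead of the console. This sketch replaces the role’s trust policy with one that trusts only Lambda, using the IAM client’s update_assume_role_policy; iam_client is assumed to be boto3.client("iam"), and the role name is your own. Like pasting a new policy in the console, it overwrites the existing trust relationships.

```python
import json

# Trust policy that lets only the Lambda service assume the role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def allow_lambda_to_assume(iam_client, role_name):
    """Replace the role's trust policy with LAMBDA_TRUST_POLICY.

    iam_client: assumed to be a boto3 IAM client, e.g. boto3.client("iam").
    """
    return iam_client.update_assume_role_policy(
        RoleName=role_name,
        PolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
```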

How to use it?

Next, set your Sentry DSN as the value of the environment variable SENTRY_DSN, either in the zappa_settings.json file or using any of the other methods at https://github.com/miserlou/zappa/#setting-environment-variables
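In zappa_settings.json, environment variables live under each stage’s environment_variables key. A small sketch that sets SENTRY_DSN there programmatically; the stage name and DSN in the usage line are placeholders, and the function assumes the settings file is JSON (Zappa also accepts other formats).

```python
import json


def set_environment_variable(settings_path, stage, name, value):
    """Add or update one entry in a stage's environment_variables in zappa_settings.json."""
    with open(settings_path) as f:
        settings = json.load(f)
    stage_settings = settings[stage]  # KeyError if the stage doesn't exist
    stage_settings.setdefault("environment_variables", {})[name] = value
    with open(settings_path, "w") as f:
        json.dump(settings, f, indent=4)
    return settings
```

For example: set_environment_variable("zappa_settings.json", "dev", "SENTRY_DSN", "<your DSN>").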

Looking to delete old entries from a table because they’ve expired? Want to do it in an elegant way?

I usually like to split this kind of functionality into two different parts: the method that does the deleting and a standalone function that can be invoked from cron, a celery scheduled task or a django command.

As an example, let’s say we want to delete all the log entries on a system that are over 181 days (6 months) old.

You’ll notice the use of @classmethod: it’s needed so we can call the method on the class rather than on an object, as I’m doing in the next function (the one that can be called from a celery scheduled task, for instance):

from .models import LogEntry  # hypothetical: import LogEntry from wherever it is defined

def delete_expired_logs():
    LogEntry.delete_expired()

And with this you keep it elegant: all the model-related information stays in the model class, so if someone renames the timestamp field, they only have to change it in the delete_expired method, yet you can still easily call it from somewhere else, like a task or command.