Published: Fri, 24 February 2017


Handling Python project dependencies

Introduction

If you have ever started a Python project, you have probably heard of the requirements.txt file, which contains a list of dependencies:

pelican
beautifulsoup4
ghp-import
Pillow==4.0.0
markdown
pysvg>=0.2,<0.3

However, if you plan on shipping your project or library on PyPI, installing it in another project, or installing it in a virtualenv, you need to declare the dependencies directly in the setup.py file so that they are installed automatically along with your code.

from setuptools import setup, find_packages

setup(name='saucisson',
  version='1.0.0.dev0',
  license='Apache License (2.0)',
  packages=find_packages(),
  include_package_data=True,
  zip_safe=False,
  install_requires=[
    'colander >= 1.3.2',
    'colorama',
    'cornice >= 2.4',
    'cornice_swagger >= 0.5',
    'jsonschema',
    'jsonpatch',
    'python-dateutil',
    'pyramid > 1.8',
    'pyramid_multiauth >= 0.8',  # User on policy selected event.
    'transaction',
    'pyramid_tm',
    'requests',
    'structlog >= 16.1.0',
    'waitress',
    'ujson >= 1.35'])

This raises a lot of questions:

  • How do I update project dependencies?
  • What were the known working dependency versions at a given release?
  • How do I make sure a release of one of my dependencies doesn't break my release?
  • How do I make sure that my code is not used with an unsupported version of a dependency?

Avoid using a requirements file to handle your program's dependencies

Once your program (e.g. saucisson) is released, users will install it with pip install saucisson, which means that python setup.py install needs to pick up the right dependencies for it.

This means that setup.py, not the requirements.txt file, needs to know what your program's dependencies are. Instead, use requirements.txt to record a known working set of dependencies at the time of a release.

We usually add a build-requirements Makefile rule to our projects:

VIRTUALENV = virtualenv --python=python3


build-requirements:
    $(eval TEMPDIR := $(shell mktemp -d))
    $(VIRTUALENV) $(TEMPDIR)
    $(TEMPDIR)/bin/pip install -U pip  # Upgrade pip
    $(TEMPDIR)/bin/pip install -Ue .   # Install the current project in editable mode
    # Freeze the dependencies ignoring the dependency links.
    $(TEMPDIR)/bin/pip freeze | grep -v -- '^-e' > requirements.txt

If you run make build-requirements before tagging a release, you document the known working dependency set at the time of that release.

Then you can install your program later using this file as a dependency constraint file:

pip install saucisson -c requirements.txt
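For instance, the generated constraint file pins every dependency to the exact version that was tested at release time (the versions below are purely illustrative):

colander==1.3.2
colorama==0.3.7
requests==2.12.4
ujson==1.35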

Avoid pinning versions in your setup.py

You might be tempted to put dependency versions in your setup.py, i.e. "pinning" them to a specific version.

You don't need to: because you are using a constraint file, you are safe from future updates that might break your code.

On your CI, don't use a constraint file. This will help you detect when a newly released version breaks your tests, so you know you need to take action to support it.
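Concretely, the two install commands differ only by the constraint flag (a sketch of the idea, reusing the saucisson example):

# On CI: no constraint file, so the latest releases are picked up
pip install -Ue .

# For a release install: stick to the known working set
pip install saucisson -c requirements.txt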

With that in mind, there are still some cases where you want to constrain a version:

If you know that your project will not work with a lower version

If you are using a feature or API that didn't exist in earlier versions, or if you hit a bug that was only fixed in a later release, set a minimum version:

psycopg > 2.5
colander >= 1.3.2
ujson >= 1.35

This won't change the way pip handles your dependency: even without the constraint, pip will always try to install the latest version.

However, it makes it possible to detect when another library, or a project using your library, tries to use it with a lower version that won't work with your code.

If you know that your project doesn't support the next release yet

I insist that this applies only if a new version of a dependency has already been released and your test suite doesn't pass with it.

In that case, and only in that case, you can pin the dependency's version for the shortest possible time, until you port your project to it.

It's common to encounter breaking changes when upgrading frameworks:

pyramid < 1.8
django > 1.6, <= 1.8

The danger of doing this is that you might create pkg_resources.VersionConflict errors.

When Python starts and imports your library, pkg_resources looks at the requirements list and validates that all dependencies are installed with their expected versions. If that is not the case, it will not let your application start.

However, when pip installs a dependency, it only checks whether the package is already installed; it does not validate that the installed version matches the expected one.

If another library already installed the dependency at a greater version in your virtualenv, pip will not downgrade it to the mandated lower version.

An easy way to break things is to pin a maximum requests version, for instance:

requests < 2.13

If you do that, you can end up with a pkg_resources.VersionConflict error when running your program.

What is happening is that pkg_resources checks the dependencies and refuses to run the program if a greater requests version is installed.

This can happen when another package also needs requests as a dependency and pip has already installed it at a greater version.
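You can reproduce the check manually with pkg_resources; this snippet assumes a requests version greater than or equal to 2.13 is installed in the environment:

import pkg_resources

# Raises pkg_resources.VersionConflict because the installed version
# (e.g. 2.13.0) does not satisfy the specifier.
pkg_resources.require('requests < 2.13')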

So really do it only if you must.

What about test dependencies?

That's a good question, I am glad you asked.

Test dependencies are less of an issue. You can either use tests_require in your setup.py or a dev-requirements.txt file.

In the former case, you will need to run the tests using python setup.py test, which unfortunately doesn't install the dependencies in a virtualenv.

In the latter case, you will need to make sure your test dependencies are installed before running the tests, but tools like tox already do that for you.

Our take on this was to put test dependencies in a dev-requirements.txt file.
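For example, a dev-requirements.txt typically contains only the tooling needed to run the test suite (an illustrative list):

tox
pytest
flake8
mock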

We have a Makefile rule that knows whether the dev dependencies need to be installed before running the tests target.

As a bonus, it will automatically create a virtualenv if you don't already have one activated:

VIRTUALENV = virtualenv --python=python3
VENV := $(shell echo $${VIRTUAL_ENV-.venv})  # Use the activated virtualenv path or use .venv
PYTHON = $(VENV)/bin/python
INSTALL_STAMP = $(VENV)/.install.stamp
DEV_STAMP = $(VENV)/.dev_env_installed.stamp

install: $(INSTALL_STAMP)

$(INSTALL_STAMP): $(PYTHON) setup.py  # Refresh the virtualenv if setup.py changed
    $(VENV)/bin/pip install -U pip
    $(VENV)/bin/pip install -Ue .
    touch $(INSTALL_STAMP)

virtualenv: $(PYTHON)  # Create the virtualenv if needed (python executable not present)

$(PYTHON):
    $(VIRTUALENV) $(VENV)

install-dev: install $(DEV_STAMP)

# Refresh dev dependencies if dev-requirements.txt changed
$(DEV_STAMP): $(PYTHON) dev-requirements.txt
    $(VENV)/bin/pip install -Ur dev-requirements.txt
    touch $(DEV_STAMP)

tests-once: install-dev
    $(VENV)/bin/py.test tests/

serve: install
    $(VENV)/bin/python manage.py runserver
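With this in place, contributors only need the Makefile targets; virtualenv creation and dependency installation happen on demand:

make tests-once   # creates .venv if needed, installs dev dependencies, runs py.test
make serve        # installs the project and starts the development server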

Conclusion

As a conclusion, when working with Python dependencies there are three processes that need to work well together:

  • Automatically installing a project version in a known working state; this can be done using a requirements constraint file.
  • Keeping the project up to date with the next versions of its dependencies without breaking previously released versions.
  • Installing a project as part of another project without breaking that project through dependency version conflicts.

Our solution to this is as follows:

  • Keep your project dependencies clean of any version number.
  • At release time, document the tested dependency versions.
  • Use that known working state as a way to install a given release of the project.
  • Run your CI without the constraint file, to detect dependency updates that might break future releases of your code.

On our way to the future

In a perfect world, it would be great to know in advance which version of a library will stop working with your code.

The good news is that this future is already here: that is exactly what semantic versioning tries to address.

What can break your code?

  • Changing the name of functions or objects
  • Changing the name of parameters or their order in the function call
  • Changing the way to configure the project

In short, everything that changes the API exposed by the library will break the code of people relying on it.

What is Semantic Versioning?

Semantic versioning is a way to tell, just by looking at its version number, whether a new release of a dependency is compatible with your code.

Conversely, it is a way to tell the people relying on your code whether a new release is likely to break theirs.

In semantic versioning, the version number is made of a triplet of numbers separated by dots, e.g. 2.11.5.

They are called MAJOR.MINOR.PATCH.
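For example, here is how each kind of change moves a hypothetical version number:

1.4.2 -> 1.4.3   # PATCH: bug fix, no API change
1.4.3 -> 1.5.0   # MINOR: new backward-compatible feature
1.5.0 -> 2.0.0   # MAJOR: breaking change in the public API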

MAJOR version

The MAJOR number is the version of the public API that the library provides.

The API you provide consists of the public functions that you expect users of your library to call, or plugins of your project to implement.

As soon as the MAJOR version is greater than 0, the API is considered stable and is not expected to change unexpectedly.

It is really important to bump the MAJOR version to 1, if it isn't already, as soon as one project is using the code in production.

Note

If your project provides a web API or implements a protocol, it might be tempting to say that the MAJOR version is also the version of the implemented protocol.

You should not do that: if you break the code API but not the protocol, and the two are coupled, you will not be able to increment the MAJOR version. Instead, the protocol version should be tracked as a separate piece of version information, which can also follow semantic versioning.
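One way to keep the two decoupled (a minimal sketch, with hypothetical names) is to expose the protocol version separately from the package version:

# saucisson/__init__.py (hypothetical)
__version__ = '2.3.1'       # code API version, follows semantic versioning
HTTP_API_VERSION = '1.2'    # protocol version, tracked independently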

MINOR version

The MINOR number is the one that changes most often: it is incremented when you improve the software, add non-breaking features to an existing API, or refactor existing code.

PATCH version

The PATCH number is incremented when the previous release has bugs that affect users and should be fixed as soon as possible: unhandled exceptions, cases where the library fails although it should not.

Patch releases are for bug fixes only and should have a very low impact on the existing code base.

The case of semantic versioning dependencies

If your project's dependencies use semantic versioning, you theoretically know that the next MAJOR version is likely to break your project.

However, a MINOR version should not break it, and a PATCH version must definitely not break it.

With projects that use semantic versioning, you can define your dependencies like this:

kinto >=5.1,<6
kinto-redis >=1.1,<2
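Note that PEP 440 also offers a shorthand for this pattern, the compatible release operator, which expands to the same constraint:

kinto ~= 5.1    # equivalent to kinto >= 5.1, < 6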

Often people are afraid to increment the MAJOR version of their projects.

If that is the case, ask yourself why: it should not be a problem, because if your project follows semantic versioning, that is exactly how it is supposed to work.

Most of the time, the fear comes from also using the MAJOR number for something else; don't do that. You should not be worried about incrementing a version number: it means your project is alive.
