Building and Installing Packages using hatch
Last updated on 2025-05-06 | Edit this page
Overview
Questions
- How can we manage our Python environment?
- How can we install our own packages?
Objectives
- Use
conda
to manage Python environments - Understand what happens when we install a package
- Use
pip
andhatch
to install packages to our local environment
Introduction
In the first lesson, we showed how to use the PYTHONPATH
environment variable to enable us to import our modules and packages
from anywhere on our system. There are a few disadvantages to this
method:
- If we have two different versions of a package on our system at
once, it can be tedious to manually update
PYTHONPATH
whenever we want to switch between them. - If we have multiple Python environments on our system (using tools
such as
venv
orconda
), settingPYTHONPATH
will affect all of them. This can lead to unexpected dependency conflicts that can be very hard to debug. - If we share our software with others and require them to update
their own
PYTHONPATH
, they will need to install any requirements for our package separately, which can be error prone.
It would be preferable if we could install our package using
pip
, the same way that we would normally install external
Python packages. However, if we enter the top level directory of our
project and try the following:
We get the following error:
OUTPUT
ERROR: Directory '.' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.
In order to make our project installable, we need to add the either
the file pyproject.toml
or setup.py
to our
project. For modern Python projects, it is recommended to write only
pyproject.toml
. This was introduced by PEP 517, PEP 518 and PEP 621 as a standard way
to define a Python project, and all tools that build, install, and
publish Python packages are expected to use it.
By making our project pip
-installable, we’ll also make
it very easy to publish our packages on public repositories – this will
be covered in our lesson on package
publishing. After publishing our work, our users will be able to
download and install our package using pip
from any machine
of their chocie!
To begin, we’ll introduce the concept of a ‘Python environment’, and how these can help us manage our workflows.
Managing Python Environments
When working with Python, it can sometimes be beneficial to install packages to an isolated environment instead of installing them globally. Usually, this is done to manage competing dependencies:
- Project B might depend upon Project A, but may have been written to use version 1.0.
- Project C might also depend upon Project A, but may instead only work with version 2.0.
- If we install Project A globally and choose version 2.0, then Project B will not work. Similarly, if we choose version 1.0, Project C will not work.
A good way to handle these sorts of conflicts is to instead use
virtual environments for each project. A number of tools have
been developed to manage virtual environments, such as
venv
, which is a standard built-in Python tool, and
conda
, which is a powerful third-party tool.
Callout
You can pip install
packages into a conda
virtual environment.
If we’re using Linux, we can find which Python environment we’re using by calling:
If we’re using the default system environment, the result is something like the following:
OUTPUT
/usr/bin/python
To create a new conda
environment called “packaging”
with Python v3.12 installed, we can call:
BASH
# Create a new environment
$ conda create -n packaging python=3.12 matplotlib hatch -c conda-forge
# It is a good idea to clean up cached and unused files
$ conda clean --all
# You can always check available envs with
$ conda env list
Now we can activate our new conda environment:
BASH
# Start the env
$ conda activate packaging
# To check which packages are installed in the env run:
$ conda list
Checking which Python we’re running should now give a different result:
OUTPUT
/Users/username/miniforge3/envs/packaging/bin/python
If we now install a new package, it will be installed within our new virtual environment instead of being installed to the system libraries. For example:
We can check the location of numpy
using
pip show
:
site-packages
is a standard location to store installed
Python packages.
If we no longer wish to use this virtual environment, we can return to the base environment by calling:
Virtual environments are very useful when we’re testing our code, as they allow us to create a fresh Python environment without any of the installed packages we normally use in our work. This will be important later when we add dependencies to our package, as this allows us to test whether our users will be able to install and run our code properly using a fresh environment.
An Overview of TOML files
pyproject.toml
is a TOML file, which stands for ‘Tom’s
Obvious Minimal Langauge’ (named for its developer, Thomas
Preston-Werner, who cofounded GitHub). There are many configuration file
formats in common usage, such as YAML, JSON, and INI, but the Python
community chose TOML as it provides some benefits over the
competition:
- Designed to be human writable and human readable.
- Can map unambiguously to a hash table (a
dict
in Python). - It has a formal specification, so has an unambiguous set of rules.
A TOML file contains a series of key = value
pairs,
which may be grouped into sections using a header enclosed in square
brackets, such as [section name]
. The values are typed,
unlike some other formats where all values are strings. The available
types are strings, integers, floats, booleans, and dates. It is possible
to store lists of values in arrays, or store a series of key-value pairs
in tables. For example:
TOML
# file: mytoml.toml
int_val = 5
float_val = 0.5
string_val = "hello world"
bool_val = true
date_val = 2023-01-01T08:00:00
array = [1, 2, 3]
inline_table = {key = "value"}
# Section headings allow us to define tables over
# multiple lines
[header_table]
name = "John"
dob = 2002-03-05
# We can define subtables using dot notation
[header_table.subtable]
foo = "bar"
We can read this using the tomllib
library in
Python:
PYTHON
>>> import tomllib
>>> with open("mytoml.toml", "r") as f:
... data = toml.load(f)
>>> print(data)
The result is a dictionary object, with TOML types converted to their corresponding Python types:
{
'int_val': 5,
'float_val': 0.5,
'string_val': 'hello world',
'bool_val': True,
'date_val': datetime.datetime(2023, 1, 1, 8, 0),
'array': [1, 2, 3],
'inline_table': {'key': 'value'},
'header_table': {
'name': 'John',
'dob': datetime.date(2002, 3, 5),
'subtable': {
'foo': 'bar'
}
}
}
Installing our package with pyproject.toml
First, we will show how to write a relatively minimal
pyproject.toml
file so that we can install our projects
using pip
. We will then cover some additional tricks that
can be achieved with this file:
- Use alternative directory structures
- Include any data files needed by our code
- Generate an executable so that our scripts can be run directly from the command line
- Configure our development tools.
To make our package pip
-installable, we should add the
file pyproject.toml
to the top-level
learn-hatch
directory:
📁 learn-hatch
|
|____📜 pyproject.toml
|
|____📁 src
|
|____📦 epi_models
|
|____📜 __init__.py
|____📜 __main__.py
|
|____📁 models
| |
| |____📜 __init__.py
| |____📜 SIR.py
| |____📜 SEIR.py
| |____📜 SIS.py
| |____📜 utils.py
|
|____📁
plotting
|
|____📜 __init__.py
|____📜 plot_SIR.py
|____📜
plot_SEIR.py
|____📜 plot_SIS.py
The first section in our pyproject.toml
file should
specify which build system we wish to use, and additionally specify any
version requirements for packages used to build our code.
We will choose to use hatchling
, which requires the
following:
TOML
# file: pyproject.toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
-
requires
is set to a list of strings, each of which names a dependency of the build system and (optionally) its minimum version. This uses the same version syntax aspip
. -
build-backend
is set to a sub-module ofhatchling
which implements the PEP 517 build interface.
With our build system determined, we can add some metadata that defines our project. At a minimum, we should specify the name of the package, its version, and our dependencies:
TOML
# file: pyproject.toml
# Build system configuration
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
# Project metadata
[project]
name = "epi_models"
version = "0.1.0"
dependencies = [
"matplotlib",
]
# Hatch build configuration
[tool.hatch.build]
source = "src"
That’s all we need! We’ll discuss versioning in our lesson on publishing. With this done, we can install our package using:
This will automatically download and install our dependencies, and our package will be importable regardless of which directory we’re in.
The installed package can be found in the directory
/Users/username/miniforge3/envs/packaging/lib/python3.12/site-packages/epi_models
.
If we look inside our installed package, we’ll see that our files
have been copied, and there is also a __pycache__
directory:
__init__.py __main__.py models plotting __pycache__
The __pycache__
directory contains Python bytecode,
which is a lower-level version of Python that is understood by the
Python Virtual Machine (PVM). All of our Python code is converted to
bytecode when it is run or imported, and by pre-compiling our package it
can be imported much faster. If we look into the directories
models
and plotting
, we’ll see those have been
compiled to bytecode too.
If we wish to uninstall, we may call:
We can also create an ‘editable install’, in which any changes we make to our code are instantly recognised by any codes importing it – this mode can be very useful when developing our code, especially when working on documentation or tests.
Callout
The ability to create editable installs from a
pyproject.toml
-only build was standardised in PEP 660, and only recently
implemented in pip
. You may need to upgrade to use this
feature:
There are many other options we can add to our
pyproject.toml
to better describe our project. PEP 621 defines a minimum
list of possible metadata that all build tools should support, so we’ll
stick to that list. Each build tool will also define synonyms for some
metadata entries, and additional tool-specific metadata. Some of the
recommended core metadata keys are described below:
TOML
# file: pyproject.toml
# Build system configuration
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
# Project metadata
[project]
name = "epi_models"
version = "0.1.0"
dependencies = [
"matplotlib",
]
# A simple summary of the project
description = "My wonderful Python package"
# Full description of the project.
# Should be the path to your README file, relative to pyproject.toml
readme = "README.md"
# The Python version required by the project
requires-python = ">=3.8"
# The license of your project.
# Can be provided as a file or a text description.
# Discussed in the lesson on publishing
license = {file = "LICENSE.md"}
# or...
license = {text = "BDS 3-Clause License"}
# Authors / maintainers
# Each entry can have a name and/or an email
authors = [
{name = "My Name", email = "my.email@email.net"},
{name = "My Friend", email = "their.email@email.net"},
]
# dependencies: Array of Strings
# A list of requirements for our package
dependencies = [
"matplotlib",
]
# Project URLs
[project.urls]
homepage = "https://github.com/username/learn-hatch"
documentation = "https://github.com/username/learn-hatch"
repository = "https://github.com/username/learn-hatch"
# Hatch build configuration
[tool.hatch.build]
source = "src"
TODO: Discuss including package data file with hatch.
Installing Scripts
If our package contains any scripts and/or a __main__.py
file, we can run those from anywhere on our system after
installation:
With a little extra work, we can also install a simplified interface
that doesn’t require python -m
in front. This is how tools
like pip
can be invoked using two possible methods:
This can be achieved by adding a table scripts
under the
[project]
header:
TOML
# file: pyproject.toml
# Add entry points
[project.scripts]
epi_models = "epi_models.__main__:main"
This syntax means that we should create a console script
epi_models
, and that running it should call the function
main()
from the file epi_models/__main__.py
.
This will require a slight modification to our __main__.py
file. All that’s necessary is to move everything from the script into a
function main()
that takes no arguments, and then to call
main()
at the bottom of the script:
This will allow us to run our package as a script directly from the command line
Note that we’ll still be able to run our code using the longer form:
If we have multiple scripts in our package, these can all be given invidual console scripts. However, these will also need to have a function name as an entry point:
TOML
# file: pyproject.toml
[project.scripts]
epi_models = "epi_models.__main__:main"
epi_models_sir = "epi_models.plotting.plot_SIR:main"
TODO: remove this?
So how do these scripts work? When we activate a virtual environment,
a new entry is added to our PATH
environment variable
linking to /path/to/my/env/bin/
:
After installing our console scripts, we can find a new file in this
directory with the name we assigned to it. For example,
/path/to/my/env/bin/epi_models
:
PYTHON
#!/path/to/my/env/bin/python
# -*- coding: utf-8 -*-
import re
import sys
from epi_models.__main__ import main
if __name__ == '__main__':
sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
sys.exit(main())
Installing our project has automatically created a new Python file
that can be run as a command line script due to the hash-bang
(#!
) on the top line, and all it does it import our main
function and run it. As it’s contained with the bin/
directory of our Python environment, it’s available for use as long
we’re using that environment, but as soon as we call
deactivate
, it is removed from our PATH
.
Setting Dependency Versions
TODO: make this an info box.
Earlier, when setting dependencies
in our
pyproject.toml
, we chose to specify a minimum requirement
for numpy
, but not for pyyaml
:
This indicates that pip
should install any version of
numpy
greater than 1.20, but that any version of
pyyaml
will do. If our installed numpy
version
is less than 1.20, or if it isn’t installed at all, pip
will upgrade to the latest version that’s compatible with the rest of
our installed packages and our Python version. We’ll cover software
versioning in more detail in the lesson on
publishing, but now we’ll simply cover some ways to specify which
software versions we need:
TOML
"numpy >= 1.20" # Must be at least 1.20
"numpy > 1.20" # Must be greater than 1.20
"numpy == 1.20" # Must be exactly 1.20
"numpy <= 1.20" # Must be 1.20 at most
"numpy < 1.20" # Must be less than 1.20
"numpy == 1.*" # Must be any version 1
If we separate our clauses with commas, we can combine these requirements:
A useful shorthand is the ‘compatible release’ clause:
This is equivalent to:
That is, we require anything which is version 1, provided it’s greater than 1.20. This would include version 1.25, but exlude version 2.0. We’ll come back to this later when we discuss publishing.
Optional Dependencies
Sometimes we might have dependencies that only make sense for certain
kind of user. For example, a developer of our library might need any
libraries we use to run unit tests or build documentation, but an end
user would not. These can be added as
optional-dependencies
:
TOML
# file: pyproject.toml
[project.optional-dependencies]
test = [
"pytest >= 5.0.0",
]
doc = [
"sphinx",
]
These dependencies can be installed by adding the name of each
optional dependency group in square brackets after telling
pip
what we want to install:
BASH
$ pip install ".[test]" # Include testing dependencies
$ pip install ".[doc]" # Include documentation dependencies
$ pip install ".[test,doc]" # Include all dependencies
Key Points
- We can configure our projects for
pip
installation by setting up apyproject.toml
file. - Simple projects require very little to be added to this file, but there are many optional extras we can add.