zavod can be installed as a standalone Python application, or by using Docker as a runtime environment. In order to choose the correct installation path, consider the following questions: Do you just want to execute the existing crawlers, or change them and add your own data sources? Getting
zavod to run inside a Docker container is very easy, but it makes working on the code harder and stands in the way of debugging a crawler as it is being developed.
In any case, you will need to check out the OpenSanctions repository, which houses the
zavod application, to your computer.
The steps below assume you're working within a checkout of that repository.
If you have Docker installed on your computer, you can use the supplied
docker-compose configuration to build and run a container that hosts the application.
Once the container images have been built, you can run the
zavod tool within the container:
$ docker-compose run --rm app zavod --help
# Or, run a specific subcommand:
$ docker-compose run --rm app zavod crawl datasets/ua/edr/ua_edr.yml
# You can also just run a shell inside the container, and then execute multiple
# commands in sequence:
$ docker-compose run --rm app bash
container$ zavod crawl datasets/ua/edr/ua_edr.yml
# The above command to spawn an interactive shell is also available as:
$ make shell
The Docker environment gives the commands inside the container access to the
data/ directory in the current working directory, i.e. the repository root. Any generated outputs and the copy of the processing database are stored in that directory.
Python virtual environment
The application is a fairly stand-alone Python application, albeit with a large number of library dependencies. For that reason, we recommend that you never install
zavod directly into your system Python, and instead always use a virtual environment. Within a fresh virtual environment (Python >= 3.10), you should be able to install zavod.
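A minimal sketch of preparing such an environment with Python's built-in venv module is shown below. The directory name .venv is an assumption chosen for illustration, and the zavod installation step itself is not part of this sketch; it would be run afterwards, inside the activated environment.

```shell
# Stop early if the interpreter is older than the documented minimum (3.10).
python3 -c 'import sys; assert sys.version_info >= (3, 10), "Python >= 3.10 required"'

# Create an isolated environment in ./.venv (hypothetical directory name).
python3 -m venv .venv

# Activate it for the current shell session.
. .venv/bin/activate

# Print the environment prefix to confirm which interpreter is now in use.
python -c 'import sys; print(sys.prefix)'
```

After activation, any packages you install go into .venv rather than into the system Python.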
If you encounter any errors during the installation, try searching for the error message together with the name of the affected library used by
zavod (e.g. SQLAlchemy, Python-Levenshtein, click).
zavod has dependencies on PyICU (a library used for transliterating names from other alphabets into the Latin character set) and Plyvel (a fast and feature-rich Python interface to LevelDB). Installing and configuring both libraries can be complex because of their system dependencies. Consider following the PyICU and Plyvel documentation when installing them.
Plyvel on Mac OS X: issue
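Because PyICU is built against the system ICU libraries, it can help to check whether those are discoverable before installing. The sketch below assumes that pkg-config is the lookup mechanism on your platform and that the ICU package is registered under the name icu-i18n; the variable icu_status is hypothetical and exists only for this example. The PyICU documentation remains the authoritative source for your operating system.

```shell
# Probe for ICU development files via pkg-config (assumed lookup mechanism).
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists icu-i18n; then
  icu_status="found"
  echo "ICU development files found, version $(pkg-config --modversion icu-i18n)"
else
  icu_status="missing"
  echo "ICU development files not found via pkg-config"
fi
```

A "missing" result does not prove that ICU is absent, only that pkg-config cannot locate it; in that case the PyICU installation notes describe the alternatives.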
ZAVOD_DATA_PATH is the main working directory for the system. By default it will contain cached artifacts and the generated output data. This defaults to the
data/ subdirectory of the current working directory when the
zavod command is invoked.
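As an illustration of the default described above, the following sketch sets ZAVOD_DATA_PATH explicitly so that the location no longer depends on where the command is invoked. Creating the directory up front is an assumption made for convenience, not a documented requirement.

```shell
# Keep an existing ZAVOD_DATA_PATH if one is set; otherwise fall back to
# ./data, mirroring the documented default.
export ZAVOD_DATA_PATH="${ZAVOD_DATA_PATH:-$PWD/data}"

# Create the directory if it does not exist yet (assumed to be harmless).
mkdir -p "$ZAVOD_DATA_PATH"

echo "zavod data directory: $ZAVOD_DATA_PATH"
```

Exporting an absolute path this way keeps cached artifacts and outputs in one place even if later commands are started from a different directory.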
ZAVOD_RESOLVER_PATH must be set to the path to a nomenklatura resolver JSON lines file. It can be an empty file. e.g.
True) - When true, attempts to sync PEP positions with our positions database, requiring
ZAVOD_OPENSANCTIONS_API_KEY to be set with a valid key. Usually best set to