Context
            zavod.context.Context
    The context is a utility object that is passed as an argument into crawlers and other runners.
It supports creating and emitting (storing) entities, accessing metadata and logging errors and warnings. It also has functions for fetching data from the web and storing it in the dataset's data folder.
            cache
  
      property
  
    A cache object for storing HTTP responses and other data.
            conn
  
      property
  
    Expose a database connection to the ETL store.
            data_url
  
      property
  
    The URL of the source data for the dataset.
            lang = None
  
      instance-attribute
  
    Default language for statements emitted from this dataset.
            timestamps
  
      property
  
    An index of the first_seen time of every statement previously emitted by the dataset. This is used to determine whether a statement is new.
            version
  
      property
  
    The current version of the dataset.
            audit_data(data, ignore=[])
    Print the formatted data object if it contains any fields not explicitly excluded by the ignore list. This is used to warn about unexpected data in the source by removing the fields one by one and then inspecting the rest.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `data` | `Dict[Any, Any]` | A mapping which is to be checked. | required |
| `ignore` | `List[Any]` | List of string keys to be skipped when checking the mapping. | `[]` |
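The "remove fields one by one, then inspect the rest" pattern described for `audit_data` can be pictured with a small standalone sketch. The helper `leftover_fields` and the sample `row` are hypothetical and only imitate the check described above; this is not zavod's implementation.

```python
from typing import Any, Dict, Iterable


def leftover_fields(data: Dict[Any, Any], ignore: Iterable[Any] = ()) -> Dict[Any, Any]:
    """Return the entries of `data` whose keys are not listed in `ignore`."""
    ignored = list(ignore)
    return {key: value for key, value in data.items() if key not in ignored}


# A hypothetical source record (assumption, for illustration only).
row = {"name": "Jane Doe", "birth_date": "1970-01-01", "row_id": "42", "remarks": "n/a"}

# Consume the fields the crawler understands, removing them as it goes.
name = row.pop("name")
birth_date = row.pop("birth_date")

# Whatever remains, minus the ignored keys, is unexpected and worth reporting.
unexpected = leftover_fields(row, ignore=["row_id"])
if unexpected:
    print("Unexpected data:", unexpected)
```

In a crawler, the same idea would presumably be expressed as a call to `context.audit_data(row, ignore=["row_id"])` after popping the known fields.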
          
            begin(clear=False)
    Prepare the context for running the exporter.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `clear` | `bool` | Remove the existing resources and issues from the dataset. | `False` |
            clear_url(fingerprint)
    Remove a given URL from the cache using its request fingerprint.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `fingerprint` | | The unique fingerprint of the request. | required |
Returns: None
            close()
    Flush and tear down the context.
            debug_lookups()
    Output a list of unused lookup options.
            emit(entity, external=False, origin=None)
    Send an entity from the crawling/runner process to be stored.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `entity` | `Entity` | The entity to be stored. | required |
| `external` | `bool` | Whether the entity is an enrichment candidate or already part of the dataset. | `False` |
| `origin` | `Optional[str]` | Set the origin for statements where none has been provided. | `None` |
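A minimal sketch of how a crawler might combine `make`, `make_id` and `emit`. Because the snippet has to run without zavod installed, `StubContext` and `StubEntity` are hypothetical stand-ins that mimic only the calls used here; the stub `make_id` does not reproduce the real hashing scheme, and `ROWS` is invented sample data. Setting `entity.id` and calling `entity.add(prop, value)` follow the followthemoney entity interface.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sample data standing in for a parsed source file.
ROWS = [
    {"name": "Jane Doe", "birth_date": "1970-01-01"},
    {"name": "John Roe", "birth_date": "1982-05-17"},
]


class StubEntity:
    """Hypothetical stand-in for zavod.entity.Entity (illustration only)."""

    def __init__(self, schema: str) -> None:
        self.schema = schema
        self.id: Optional[str] = None
        self.props: Dict[str, List[str]] = {}

    def add(self, prop: str, value: str) -> None:
        self.props.setdefault(prop, []).append(value)


class StubContext:
    """Hypothetical stand-in exposing only the Context methods used below."""

    def __init__(self) -> None:
        self.emitted: List[Tuple[StubEntity, bool, Optional[str]]] = []

    def make(self, schema: str) -> StubEntity:
        return StubEntity(schema)

    def make_id(self, *parts: str, prefix=None, hash_prefix=None) -> str:
        # NOT the real scheme: the real method produces a hash-based ID.
        return "stub-" + "-".join(str(part) for part in parts)

    def emit(self, entity: StubEntity, external: bool = False, origin=None) -> None:
        self.emitted.append((entity, external, origin))


def crawl(context) -> None:
    """Turn each source row into a Person entity and hand it to the context."""
    for row in ROWS:
        entity = context.make("Person")
        entity.id = context.make_id(row["name"], row["birth_date"])
        entity.add("name", row["name"])
        entity.add("birthDate", row["birth_date"])
        context.emit(entity)


context = StubContext()
crawl(context)
```

With a real `Context`, only the `crawl` function would be needed; the runner supplies the context object.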
          
            export_resource(path, mime_type=None, title=None)
    Register a file as a data resource exported by the dataset.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `path` | `Path` | The file path of the exported resource. | required |
| `mime_type` | `Optional[str]` | MIME type of the resource, will be guessed otherwise. | `None` |
| `title` | `Optional[str]` | A human-readable description. | `None` |

Returns:
| Type | Description | 
|---|---|
| `DataResource` | The generated resource object which has been saved. |
            fetch_html(url, params=None, headers=None, auth=None, cache_days=None, method='GET', data=None, absolute_links=False)
    Execute an HTTP request using the context's session and return
an HTML DOM object based on the response. If a cache_days argument
is provided, a cache will be used for the given number of days.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `url` | `str` | The URL to be fetched. | required |
| `params` | `ParamsType` | URL query parameters to be included in the URL. | `None` |
| `headers` | `_Headers` | HTTP request headers to be included. | `None` |
| `auth` | `_Auth` | HTTP basic authorization username and password to be included. | `None` |
| `cache_days` | `Optional[int]` | Number of days to retain cached responses for. | `None` |
| `method` | `str` | The HTTP method to use for the request. | `'GET'` |
| `data` | `Optional[_Body]` | The data to be sent in the request body. | `None` |
| `absolute_links` | `bool` | Whether to convert relative links to absolute links. | `False` |
Returns: An lxml-based DOM of the web page that has been returned.
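The `cache_days` behaviour described above (reuse a stored response while it is younger than the given number of days) can be sketched with a toy in-memory cache. Everything here is an assumption made for illustration: `ExpiringCache`, `fetch_text_cached` and `fake_fetch` are hypothetical names, the cache is keyed on the URL alone, and a fake clock replaces real time so the example stays deterministic. It does not describe how zavod stores or fingerprints cached requests.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

SECONDS_PER_DAY = 24 * 60 * 60


class ExpiringCache:
    """Toy in-memory cache whose entries expire after a number of days."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str, max_age_days: int) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, body = entry
        if self._clock() - stored_at > max_age_days * SECONDS_PER_DAY:
            return None  # too old, treat as a miss
        return body

    def set(self, key: str, body: str) -> None:
        self._store[key] = (self._clock(), body)


def fetch_text_cached(
    url: str,
    fetch: Callable[[str], str],
    cache: ExpiringCache,
    cache_days: Optional[int] = None,
) -> str:
    """Return the body for `url`, consulting the cache only if cache_days is set."""
    if cache_days is not None:
        cached = cache.get(url, cache_days)
        if cached is not None:
            return cached
    body = fetch(url)
    if cache_days is not None:
        cache.set(url, body)
    return body


# Deterministic demo: a fake clock and a fake fetcher that records its calls.
now = [0.0]
cache = ExpiringCache(clock=lambda: now[0])
calls: List[str] = []


def fake_fetch(url: str) -> str:
    calls.append(url)
    return "<html><body>demo</body></html>"


fetch_text_cached("https://example.org/list", fake_fetch, cache, cache_days=7)
fetch_text_cached("https://example.org/list", fake_fetch, cache, cache_days=7)  # cache hit
now[0] += 8 * SECONDS_PER_DAY
fetch_text_cached("https://example.org/list", fake_fetch, cache, cache_days=7)  # expired
```

After these three calls the fake fetcher has been invoked twice: once for the initial request and once after the cached entry aged past seven days.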
            fetch_json(url, params=None, headers=None, auth=None, cache_days=None, method='GET', data=None)
    Execute an HTTP request using the context's session and return
a JSON-decoded object based on the response. If a cache_days argument
is provided, a cache will be used for the given number of days.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `url` | `str` | The URL to be fetched. | required |
| `params` | `ParamsType` | URL query parameters to be included in the URL. | `None` |
| `headers` | `_Headers` | HTTP request headers to be included. | `None` |
| `auth` | `_Auth` | HTTP basic authorization username and password to be included. | `None` |
| `cache_days` | `Optional[int]` | Number of days to retain cached responses for. | `None` |
| `method` | `str` | The HTTP method to use for the request. | `'GET'` |
Returns:
| Type | Description | 
|---|---|
| `Any` | The JSON-decoded response body. |
            fetch_resource(name, url, auth=None, headers=None, method='GET', data=None)
    Fetch a URL into a file located in the current run folder, if it does not exist.
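The "fetch only if the file does not exist" behaviour can be illustrated with a standalone sketch. `fetch_if_missing`, `fake_download` and the temporary `run_folder` are hypothetical; a callable stands in for the HTTP request so the example runs offline. It is not zavod's implementation.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def fetch_if_missing(
    folder: Path, name: str, url: str, download: Callable[[str], bytes]
) -> Path:
    """Write the content of `url` to folder/name unless that file already exists."""
    path = folder / name
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(download(url))
    return path


downloads: List[str] = []


def fake_download(url: str) -> bytes:
    # Stand-in for an HTTP request; records each call for inspection.
    downloads.append(url)
    return b"id,name\n1,Jane Doe\n"


run_folder = Path(tempfile.mkdtemp())
first = fetch_if_missing(run_folder, "source.csv", "https://example.org/source.csv", fake_download)
second = fetch_if_missing(run_folder, "source.csv", "https://example.org/source.csv", fake_download)
```

The second call returns the same path without downloading again, which is convenient when re-running a crawler during development.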
            fetch_response(url, headers=None, auth=None, method='GET', data=None)
    Execute an HTTP request using the context's session.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `url` | `str` | The URL to be fetched. | required |
| `headers` | `_Headers` | HTTP request headers to be included. | `None` |
| `auth` | `_Auth` | HTTP basic authorization username and password to be included. | `None` |
| `method` | `str` | The HTTP method to use for the request. | `'GET'` |
| `data` | `Optional[_Body]` | The data to be sent in the request body. | `None` |
Returns: A response object.
            fetch_text(url, params=None, headers=None, auth=None, cache_days=None, method='GET', data=None)
    Execute an HTTP request using the context's session and return
the decoded response body. If a cache_days argument is provided, a
cache will be used for the given number of days.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `url` | `str` | The URL to be fetched. | required |
| `params` | `ParamsType` | URL query parameters to be included in the URL. | `None` |
| `headers` | `_Headers` | HTTP request headers to be included. | `None` |
| `auth` | `_Auth` | HTTP basic authorization username and password to be included. | `None` |
| `cache_days` | `Optional[int]` | Number of days to retain cached responses for. | `None` |
| `method` | `str` | The HTTP method to use for the request. | `'GET'` |
| `data` | `Optional[_Body]` | The data to be sent in the request body. | `None` |
Returns:
| Type | Description | 
|---|---|
| `Optional[str]` | The decoded response body as a string. |
            flush()
    Flush the context to ensure all data is written to disk.
            get_resource_path(name)
    Get the path to a file in the dataset data folder.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `name` | `PathLike` | The name of the file, relative to the dataset data folder. | required |

Returns:
| Type | Description | 
|---|---|
| `Path` | The full path to the file. |
            inspect(obj)
    Display an object in a form suitable for inspection.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `obj` | `Any` | The object to be logged in pretty print. | required |
            lookup(lookup, value, *, warn_unmatched=False)
    Invoke a datapatch lookup defined in the dataset metadata.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `lookup` | `str` | The name of the lookup. The key under the dataset lookups property. | required |
| `value` | `Optional[str]` | The data value to look up. | required |
| `warn_unmatched` | `bool` | Whether to log a warning if no match is found. | `False` |
            lookup_value(lookup, value, default=None, *, warn_unmatched=False)
    Invoke a datapatch lookup defined in the dataset metadata, returning the value attribute.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `lookup` | `str` | The name of the lookup. The key under the dataset lookups property. | required |
| `value` | `Optional[str]` | The data value to look up. | required |
| `default` | `Optional[str]` | The default value to use if the lookup doesn't match the value. | `None` |
| `warn_unmatched` | `bool` | Whether to log a warning if no match is found. | `False` |
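How `default` and `warn_unmatched` interact can be shown with a simplified stand-in. The `LOOKUPS` table, the lookup name `type.gender` and the function `lookup_value_sketch` are hypothetical, and matching here is plain exact-string comparison; the actual datapatch lookups configured in dataset metadata may support richer matching rules.

```python
import logging
from typing import Dict, Optional

log = logging.getLogger("lookup_sketch")

# Hypothetical, simplified lookup table: lookup name -> {source value: result}.
LOOKUPS: Dict[str, Dict[str, str]] = {
    "type.gender": {"m": "male", "f": "female"},
}


def lookup_value_sketch(
    lookup: str,
    value: Optional[str],
    default: Optional[str] = None,
    *,
    warn_unmatched: bool = False,
) -> Optional[str]:
    """Return the mapped value, or `default` when nothing matches."""
    options = LOOKUPS[lookup]
    if value is not None and value in options:
        return options[value]
    if warn_unmatched:
        log.warning("No match in lookup %r for value %r", lookup, value)
    return default


matched = lookup_value_sketch("type.gender", "m")
fallback = lookup_value_sketch("type.gender", "x", default="other", warn_unmatched=True)
```

Passing the raw value as `default` is one way to keep unmatched source data while still normalising the known cases.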
          
            make(schema)
    Make a new entity with some dataset context set.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `schema` | `Union[str, Schema]` | The entity's type name. | required |

Returns:
| Type | Description | 
|---|---|
| `Entity` | A newly created entity object of the given type, with no ID. |
            make_id(*parts, prefix=None, hash_prefix=None)
    Make a hash-based entity ID from a list of strings, prefixed with the dataset prefix.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `prefix` | `Optional[str]` | Use this prefix in the slug, but not the hash. | `None` |
| `hash_prefix` | `Optional[str]` | Use this prefix in the hash, but not the slug. | `None` |
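The difference between `prefix` (visible in the slug, not hashed) and `hash_prefix` (hashed, not visible) can be illustrated with a standalone sketch. The function `make_id_sketch`, the constant `DATASET_PREFIX`, the use of SHA-1 and the output format are all assumptions chosen for illustration; the real method may hash and format IDs differently.

```python
import hashlib
from typing import Optional

DATASET_PREFIX = "xx-demo"  # hypothetical dataset prefix


def make_id_sketch(
    *parts: str, prefix: Optional[str] = None, hash_prefix: Optional[str] = None
) -> str:
    """Build '<dataset prefix>[-<prefix>]-<hash>' from the given parts."""
    digest = hashlib.sha1()
    if hash_prefix is not None:
        # hash_prefix changes the hash but never shows up in the slug.
        digest.update(hash_prefix.encode("utf-8"))
        digest.update(b"\x00")
    for part in parts:
        digest.update(part.encode("utf-8"))
        digest.update(b"\x00")  # separator so ("ab", "c") differs from ("a", "bc")
    slug_parts = [DATASET_PREFIX]
    if prefix:
        # prefix shows up in the slug but does not affect the hash.
        slug_parts.append(prefix)
    slug_parts.append(digest.hexdigest())
    return "-".join(slug_parts)


plain = make_id_sketch("Jane Doe", "1970-01-01")
with_prefix = make_id_sketch("Jane Doe", "1970-01-01", prefix="person")
with_hash_prefix = make_id_sketch("Jane Doe", "1970-01-01", hash_prefix="v2")
```

In this sketch `plain` and `with_prefix` end in the same hash, while `with_hash_prefix` ends in a different one.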
          
            make_slug(*parts, strict=True, prefix=None)
    Make a slug-based entity ID from a list of strings, using the dataset prefix.
            parse_resource_xml(name)
    Parse a file in the resource folder into an XML tree.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `name` | `PathLike` | The resource name or relative file path. | required |

Returns:
| Type | Description | 
|---|---|
| `_ElementTree` | An lxml element tree of the parsed XML. |
            zavod.entity.Entity
    
              Bases: StatementEntity
Entity for sanctions list entries and adjacent objects.
Adds utility methods to the EntityProxy for extracting data from sanctions lists and for reporting parsing errors to structured logging.
            add_cast(schema, prop, values, cleaned=False, fuzzy=False, format=None, lang=None, original_value=None)
    Set a property on an entity. If the entity is of a schema that doesn't have the given property, also modify the schema (e.g. if something has a birthDate, assume it's a Person, not a LegalEntity).
            add_schema(schema)
    Try to apply the given schema to the current entity, making it more
specific (e.g. turning a LegalEntity into a Company). This raises an
exception if the current and new type are incompatible.
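The specialisation rule described for `add_schema` (keep the more specific of two compatible types, fail on incompatible ones) can be sketched with a toy hierarchy. The `PARENTS` table is a simplified, single-inheritance assumption and `more_specific` is a hypothetical helper; the real schema model can be richer, and the exception type raised by zavod is not specified here.

```python
from typing import Dict, List, Optional

# Toy, single-parent schema hierarchy (assumption for illustration).
PARENTS: Dict[str, Optional[str]] = {
    "Thing": None,
    "LegalEntity": "Thing",
    "Person": "LegalEntity",
    "Organization": "LegalEntity",
    "Company": "Organization",
}


def ancestry(schema: str) -> List[str]:
    """Return the schema followed by all of its ancestors."""
    chain: List[str] = []
    current: Optional[str] = schema
    while current is not None:
        chain.append(current)
        current = PARENTS[current]
    return chain


def more_specific(current: str, new: str) -> str:
    """Pick the more specific of two schemata, or raise if they are unrelated."""
    if current in ancestry(new):
        return new  # new descends from current, so it is more specific
    if new in ancestry(current):
        return current  # current is already at least as specific
    raise ValueError(f"Incompatible schemata: {current} and {new}")


specialised = more_specific("LegalEntity", "Company")
unchanged = more_specific("Company", "LegalEntity")
```

Calling `more_specific("Person", "Company")` raises `ValueError` in this sketch, since neither type descends from the other.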
            unsafe_add(prop, value, cleaned=False, fuzzy=False, format=None, quiet=False, schema=None, dataset=None, seen=None, lang=None, original_value=None, origin=None)
    Add a statement to the entity, possibly the value.