Helpers
zavod.helpers
Data cleaning and entity generation helpers.
This module contains a number of functions that are useful for parsing real-world data (like XML, CSV, date formats) and converting it into FollowTheMoney entity structures. Factory methods are provided for handling common entity patterns as a way to reduce boilerplate code and improve consistency across datasets.
A typical use might look like this:
from zavod import Context
from zavod import helpers as h

def crawl(context: Context) -> None:
    # ... fetch some data
    for row in data:
        entity = context.make("Person")
        entity.id = context.make_id(row.get("id"))
        # Using the helper guarantees consistent handling of the
        # attributes, and in this case will also automatically
        # generate a full name for the entity:
        h.apply_name(
            entity,
            first_name=row.get("first_name"),
            patronymic=row.get("patronymic"),
            last_name=row.get("last_name"),
            prefix=row.get("title"),
        )
        context.emit(entity)
Any data wrangling code that is repeated in three or more crawlers should be considered for inclusion in the helper library.
apply_address(context, entity, address)
Link the given entity to the given address and emit the address.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
context | Context | The runner context used for emitting entities. | required |
entity | Entity | The thing located at the given address. | required |
address | Optional[Entity] | The address entity, usually constructed with make_address. | required |
Source code in zavod/helpers/addresses.py
apply_date(entity, prop, text)
Apply a date value to an entity, parsing it if necessary and cleaning it up.
Uses the dates configuration of the dataset to parse the date.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
entity | Entity | The entity to which the date will be applied. | required |
prop | str | The property to which the date will be applied. | required |
text | DateValue | The date value to be applied. | required |
Source code in zavod/helpers/dates.py
apply_dates(entity, prop, texts)
Apply a list of date values to an entity, parsing them if necessary and cleaning them up.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
entity | Entity | The entity to which the dates will be applied. | required |
prop | str | The property to which the dates will be applied. | required |
texts | Iterable[DateValue] | The iterable of date values to be applied. | required |
Source code in zavod/helpers/dates.py
apply_name(entity, full=None, name1=None, first_name=None, given_name=None, name2=None, second_name=None, middle_name=None, name3=None, patronymic=None, matronymic=None, name4=None, name5=None, tail_name=None, last_name=None, maiden_name=None, prefix=None, suffix=None, alias=False, name_prop='name', is_weak=False, quiet=False, lang=None)
A standardised way to set a name for a person or other entity, which handles normalising the categories of names found in source data to the correct properties (e.g. "family name" becomes "lastName").
Parameters:
Name | Type | Description | Default |
---|---|---|---|
entity | Entity | The entity to set the name on. | required |
full | Optional[str] | The full name if available (this will otherwise be generated). | None |
name1 | Optional[str] | The first name if numeric parts are used. | None |
first_name | Optional[str] | The first name. | None |
given_name | Optional[str] | The given name (also first name). | None |
name2 | Optional[str] | The second name if numeric parts are used. | None |
second_name | Optional[str] | The second name. | None |
middle_name | Optional[str] | The middle name. | None |
name3 | Optional[str] | The third name if numeric parts are used. | None |
patronymic | Optional[str] | The patronymic (father-derived) name. | None |
matronymic | Optional[str] | The matronymic (mother-derived) name. | None |
name4 | Optional[str] | The fourth name if numeric parts are used. | None |
name5 | Optional[str] | The fifth name if numeric parts are used. | None |
tail_name | Optional[str] | A secondary last name. | None |
last_name | Optional[str] | The last/family name. | None |
maiden_name | Optional[str] | The maiden name (before marriage). | None |
prefix | Optional[str] | A prefix to the name (e.g. "Mr"). | None |
suffix | Optional[str] | A suffix to the name (e.g. "Jr"). | None |
alias | bool | Whether this is an alias name. | False |
name_prop | str | The property to set the full name on. | 'name' |
is_weak | bool | Whether this is a weak alias name. | False |
quiet | bool | If set, do not raise errors on invalid properties. | False |
lang | Optional[str] | The language of the name. | None |
Source code in zavod/helpers/names.py
assert_dom_hash(node, hash, raise_exc=False, text_only=False)
Assert that a DOM node has a given SHA1 hash.
Source code in zavod/helpers/change.py
assert_html_url_hash(context, url, hash, path=None, raise_exc=False, text_only=False)
Assert that an HTML document located at the URL has a given SHA1 hash.
Source code in zavod/helpers/change.py
assert_url_hash(context, url, hash, raise_exc=False, auth=None, headers=None)
Assert that a document located at the URL has a given SHA1 hash.
Source code in zavod/helpers/change.py
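The three assert_*_hash helpers share one idea: hash the content once, record the digest in the crawler, and get notified when the source changes. A minimal sketch of that comparison using only hashlib, assuming the content is hashed as UTF-8 text with SHA1 and that, when raise_exc is False, a mismatch is logged as a warning rather than raised (that behaviour is an assumption). text_sha1 and check_hash are hypothetical names, not zavod functions:

```python
import hashlib
import logging

log = logging.getLogger(__name__)


def text_sha1(text: str) -> str:
    """Return the hex SHA1 digest of a UTF-8 encoded string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def check_hash(content: str, expected: str, raise_exc: bool = False) -> bool:
    """Compare the SHA1 of `content` with a previously recorded digest.

    Hypothetical behaviour: raise on mismatch if raise_exc is set,
    otherwise log a warning and return False.
    """
    actual = text_sha1(content)
    if actual == expected:
        return True
    message = f"Content hash changed: expected {expected}, got {actual}"
    if raise_exc:
        raise AssertionError(message)
    log.warning(message)
    return False


# Record the digest once, then compare on every crawl.
recorded = text_sha1("<table>...</table>")
print(check_hash("<table>...</table>", recorded))  # True
```

A warning (rather than an exception) lets a crawl continue while still flagging that the parsing logic may need review.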
cells_to_str(row)
Return the string value of each HtmlElement value in the passed dictionary.
Useful when all you need is the string value of each cell in a table row.
Source code in zavod/helpers/html.py
check_no_year(text)
Check for a few formats in which dates are given as day/month, with no year specified.
clean_note(text)
Remove a set of specific text sections from notes supplied by sanctions data publishers. These include cross-references to the Security Council web site and the Interpol web site.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | Union[Optional[str], List[Optional[str]]] | The note text from the source. | required |
Returns:
Type | Description |
---|---|
List[str] | A cleaned version of the text. |
Source code in zavod/helpers/text.py
convert_excel_cell(book, cell)
Convert an Excel cell to a string, handling different types.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
book | Book | The Excel workbook. | required |
cell | Cell | The Excel cell. | required |
Returns:
Type | Description |
---|---|
Optional[str] | The cell value as a string, or None. |
Source code in zavod/helpers/excel.py
convert_excel_date(value)
Convert an Excel date to a string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
value | Optional[Union[str, int, float]] | The Excel date value (e.g. 44876). | required |
Returns:
Type | Description |
---|---|
Optional[str] | The date value as a string, or None. |
Source code in zavod/helpers/excel.py
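To make the example value in the table concrete, here is a sketch of how an Excel serial number maps to a calendar date. It assumes the workbook uses Excel's default 1900 date system (not the 1904 system) and that an ISO YYYY-MM-DD string is the desired output; excel_serial_to_iso is a hypothetical name, not zavod's implementation:

```python
from datetime import date, timedelta
from typing import Optional, Union

# In Excel's default 1900 date system, serial numbers from 61 onwards are
# day offsets from 1899-12-30 (the offset absorbs Excel's fictitious
# 29 February 1900).
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_iso(value: Optional[Union[str, int, float]]) -> Optional[str]:
    """Turn an Excel serial date such as 44876 into an ISO date string."""
    if value is None or (isinstance(value, str) and not value.strip()):
        return None
    # Drop any fractional part, which encodes the time of day.
    days = int(float(value))
    return (EXCEL_EPOCH + timedelta(days=days)).isoformat()


print(excel_serial_to_iso(44876))  # 2022-11-11
```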
copy_address(entity, address)
Assign the full address text and country directly to the given entity.
This is an alternative to using apply_address when the address should be inlined into the entity, instead of emitting a separate address object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
entity | Entity | The entity to be assigned the address. | required |
address | Optional[Entity] | The address entity to be copied into the entity. | required |
Source code in zavod/helpers/addresses.py
extract_cryptos(text)
Extract cryptocurrency addresses from text.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | Optional[str] | The text to extract from. | required |
Returns:
Type | Description |
---|---|
Dict[str, str] | The cryptocurrency IDs found, each with its currency code. |
Source code in zavod/helpers/crypto.py
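A sketch of regex-based extraction, assuming the returned dictionary is keyed by address with the currency code as the value. The patterns and currency codes below are illustrative assumptions (an Ethereum-style hex address and a Bech32 Bitcoin address), not the patterns zavod uses, and find_crypto_addresses is a hypothetical name:

```python
import re
from typing import Dict, Optional

# Illustrative patterns only: real-world address formats have more variants
# (e.g. legacy Bitcoin addresses), and these are not zavod's patterns.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "ETH": re.compile(r"\b0x[0-9a-fA-F]{40}\b"),
    # Bech32 character set excludes 1, b, i and o after the "bc1" prefix.
    "BTC": re.compile(r"\bbc1[02-9ac-hj-np-z]{11,71}\b"),
}


def find_crypto_addresses(text: Optional[str]) -> Dict[str, str]:
    """Map each address found in `text` to the currency code of its pattern."""
    found: Dict[str, str] = {}
    if text is None:
        return found
    for currency, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found[match.group(0)] = currency
    return found
```

Keying by address deduplicates repeated mentions of the same wallet in one text.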
extract_date(dataset, text)
cached
Extract a date from the provided text using the predefined formats in the dataset metadata. If the text doesn't match any format, the original text is returned.
Source code in zavod/helpers/dates.py
extract_years(text)
Try to locate year numbers in a string such as 'circa 1990'. This will fail if any numbers that don't look like years are found in the string, a strong indicator that a more precise date is encoded (e.g. '1990 Mar 03').
This is bounded to years between 1800 and 2100.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | A string to extract years from. | required |
Returns:
Type | Description |
---|---|
List[str] | A list of year strings. |
Source code in zavod/helpers/dates.py
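A sketch of the year-extraction rule described above, assuming that "fail" means returning an empty list and that duplicates are dropped while keeping the order of first appearance. find_years is a hypothetical stand-in, not the zavod source:

```python
import re
from typing import List

NUMBERS = re.compile(r"\d+")


def find_years(text: str) -> List[str]:
    """Return the year-like numbers (1800-2100) in `text`, in order.

    If any number falls outside that range (e.g. the day in '1990 Mar 03'),
    the text probably holds a more precise date, so nothing is returned.
    """
    years: List[str] = []
    for match in NUMBERS.finditer(text):
        number = match.group(0)
        if not 1800 <= int(number) <= 2100:
            return []
        if number not in years:
            years.append(number)
    return years
```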
format_address(summary=None, po_box=None, street=None, house=None, house_number=None, postal_code=None, city=None, county=None, state=None, state_district=None, state_code=None, country=None, country_code=None)
cached
Given the components of a postal address, format it into a single line using some country-specific templating logic.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
summary | Optional[str] | A short description of the address. | None |
po_box | Optional[str] | The PO box/mailbox number. | None |
street | Optional[str] | The street or road name. | None |
house | Optional[str] | The descriptive name of the house. | None |
house_number | Optional[str] | The number of the house on the street. | None |
postal_code | Optional[str] | The postal code or ZIP code. | None |
city | Optional[str] | The city or town name. | None |
county | Optional[str] | The county or district name. | None |
state | Optional[str] | The state or province name. | None |
state_district | Optional[str] | The state or province district name. | None |
state_code | Optional[str] | The state or province code. | None |
country | Optional[str] | The name of the country (words, not ISO code). | None |
country_code | Optional[str] | A pre-normalized country code. | None |
Returns:
Type | Description |
---|---|
str | A single-line string with the formatted address. |
Source code in zavod/helpers/addresses.py
is_empty(text)
Check whether the given text is empty: either it is None, or it has zero length once stripped of surrounding whitespace.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | Optional[str] | Text to be checked. | required |
Returns:
Type | Description |
---|---|
bool | Whether the text is empty or not. |
Source code in zavod/helpers/text.py
links_to_dict(el)
Return a dictionary of the text content and href of each anchor element in the passed HtmlElement.
Useful when the link labels are consistent and can be used as keys.
Source code in zavod/helpers/html.py
make_address(context, full=None, remarks=None, summary=None, po_box=None, street=None, street2=None, street3=None, city=None, place=None, postal_code=None, state=None, region=None, country=None, country_code=None, key=None, lang=None)
Generate an address schema object adjacent to the main entity.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
context | Context | The runner context used for making and emitting entities. | required |
full | Optional[str] | The full address as a single string. | None |
remarks | Optional[str] | Delivery remarks for the address. | None |
summary | Optional[str] | A short description of the address. | None |
po_box | Optional[str] | The PO box/mailbox number. | None |
street | Optional[str] | The street or road name. | None |
street2 | Optional[str] | The street or road name, line 2. | None |
street3 | Optional[str] | The street or road name, line 3. | None |
city | Optional[str] | The city or town name. | None |
place | Optional[str] | The name of a smaller locality (same as city). | None |
postal_code | Optional[str] | The postal code or ZIP code. | None |
state | Optional[str] | The state or province name. | None |
region | Optional[str] | The region or district name. | None |
country | Optional[str] | The country name (words, not ISO code). | None |
country_code | Optional[str] | A pre-normalized country code. | None |
key | Optional[str] | An optional key to be included in the ID of the address. | None |
lang | Optional[str] | The language of the address details. | None |
Returns:
Type | Description |
---|---|
Optional[Entity] | A new entity of type Address. |
Source code in zavod/helpers/addresses.py
make_identification(context, entity, number, doc_type=None, country=None, summary=None, start_date=None, end_date=None, authority=None, key=None, passport=False)
Create an Identification or Passport object linked to a passport holder.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
context | Context | The context used for making entities. | required |
entity | Entity | The entity that holds the passport. | required |
number | Optional[str] | The passport number. | required |
doc_type | Optional[str] | The type of document (e.g. "passport", "national id"). | None |
country | Optional[str] | The country that issued the passport. | None |
summary | Optional[str] | A summary of the passport details. | None |
start_date | Optional[str] | The date the passport was issued. | None |
end_date | Optional[str] | The date the passport expires. | None |
authority | Optional[str] | The issuing authority. | None |
key | Optional[str] | An optional key to be included in the ID of the identification. | None |
passport | bool | Whether the identification is a passport or not. | False |
Returns:
Type | Description |
---|---|
Optional[Entity] | A new entity of type Identification or Passport. |
Source code in zavod/helpers/identification.py
make_name(full=None, name1=None, first_name=None, given_name=None, name2=None, second_name=None, middle_name=None, name3=None, patronymic=None, matronymic=None, name4=None, name5=None, tail_name=None, last_name=None, prefix=None, suffix=None)
Provides a standardised way of assembling the components of a human name. This does a whole lot of cultural ignorance work, so YMMV.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
full | Optional[str] | The full name if available (this will otherwise be generated). | None |
name1 | Optional[str] | The first name if numeric parts are used. | None |
first_name | Optional[str] | The first name. | None |
given_name | Optional[str] | The given name (also first name). | None |
name2 | Optional[str] | The second name if numeric parts are used. | None |
second_name | Optional[str] | The second name. | None |
middle_name | Optional[str] | The middle name. | None |
name3 | Optional[str] | The third name if numeric parts are used. | None |
patronymic | Optional[str] | The patronymic (father-derived) name. | None |
matronymic | Optional[str] | The matronymic (mother-derived) name. | None |
name4 | Optional[str] | The fourth name if numeric parts are used. | None |
name5 | Optional[str] | The fifth name if numeric parts are used. | None |
tail_name | Optional[str] | A secondary last name. | None |
last_name | Optional[str] | The last/family name. | None |
prefix | Optional[str] | A prefix to the name (e.g. "Mr"). | None |
suffix | Optional[str] | A suffix to the name (e.g. "Jr"). | None |
Returns:
Type | Description |
---|---|
Optional[str] | The full name. |
Source code in zavod/helpers/names.py
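A simplified sketch of the assembly idea, assuming an explicit full name wins and that otherwise the non-empty parts are joined with spaces in a prefix, given, middle, patronymic, family, suffix order. The real helper accepts many more parts and its exact ordering may differ; assemble_name is hypothetical:

```python
from typing import Optional


def assemble_name(
    full: Optional[str] = None,
    prefix: Optional[str] = None,
    first_name: Optional[str] = None,
    middle_name: Optional[str] = None,
    patronymic: Optional[str] = None,
    last_name: Optional[str] = None,
    suffix: Optional[str] = None,
) -> Optional[str]:
    """Return `full` if given, else join the non-empty parts with spaces."""
    if full is not None and full.strip():
        return full.strip()
    parts = [prefix, first_name, middle_name, patronymic, last_name, suffix]
    cleaned = [part.strip() for part in parts if part is not None and part.strip()]
    return " ".join(cleaned) if cleaned else None


print(assemble_name(first_name="Ivan", patronymic="Petrovich", last_name="Sidorov"))
# Ivan Petrovich Sidorov
```

A fixed Western-style order is exactly the kind of cultural assumption the description above warns about.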
make_occupancy(context, person, position, no_end_implies_current=True, current_time=settings.RUN_TIME, start_date=None, end_date=None, birth_date=None, death_date=None, categorisation=None, status=None, propagate_country=True)
Creates and returns an Occupancy entity if the arguments meet our criteria for PEP position occupancy, otherwise returns None. Also adds the position countries and the role.pep topic to the person if an Occupancy is returned.
Emit the person after calling this to include these changes.
Unless status is overridden, Occupancies are only returned if end_date is None or is no more than the after-office period before current_time.
current_time defaults to the process start date and time.
The after-office threshold is determined based on the position topics.
Occupancy.status is set to:
- current if end_date is None and no_end_implies_current is True, otherwise status will be unknown.
- current if end_date is some date in the future, unless the dataset's coverage.end is a date in the past, in which case status will be unknown.
- ended if end_date is some date in the past.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
context | Context | The context to create the entity in. | required |
person | Entity | The person holding the position. They will be added to the holder property. | required |
position | Entity | The position held by the person. This will be added to the post property. | required |
no_end_implies_current | bool | Set this to True if a dataset is regularly maintained and it can be assumed that no end date implies the person is currently occupying this position. In this case, status will be set to current; otherwise it will be unknown. | True |
current_time | datetime | Defaults to the run time of the current crawl. | RUN_TIME |
start_date | Optional[str] | Set if the date the person started occupying the position is known. | None |
end_date | Optional[str] | Set if the date the person left the position is known. | None |
status | Optional[OccupancyStatus] | Overrides determining the PEP occupancy status. | None |
Source code in zavod/helpers/positions.py
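The status rules listed above can be expressed as a small pure function. This is a sketch only: it uses plain date objects, treats an end date equal to the current date as past, and takes the after-office window as a number of days. after_office_days is a hypothetical parameter (zavod derives the window from the position topics), and occupancy_status and within_after_office are hypothetical names:

```python
from datetime import date, timedelta
from typing import Optional


def occupancy_status(
    end_date: Optional[date],
    current_date: date,
    no_end_implies_current: bool = True,
    coverage_end: Optional[date] = None,
) -> str:
    """Apply the documented status rules to one occupancy."""
    if end_date is None:
        return "current" if no_end_implies_current else "unknown"
    if end_date > current_date:
        # A future end date is only trusted while the dataset is still covered.
        if coverage_end is not None and coverage_end < current_date:
            return "unknown"
        return "current"
    return "ended"


def within_after_office(
    end_date: Optional[date], current_date: date, after_office_days: int
) -> bool:
    """Keep occupancies without an end date, or ending within the window."""
    if end_date is None:
        return True
    return end_date >= current_date - timedelta(days=after_office_days)
```

Separating the filter from the status decision mirrors the description: an occupancy is first kept or dropped, then labelled.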
make_pdf_page_images(pdf_path)
Split a PDF file into PNG images of its pages.
This requires pdftoppm to be installed on the system, which is part of the poppler-utils package on Debian-based systems.
Source code in zavod/helpers/pdf.py
make_position(context, name, summary=None, description=None, country=None, topics=None, subnational_area=None, organization=None, inception_date=None, dissolution_date=None, number_of_seats=None, wikidata_id=None, source_url=None, lang=None, id_hash_prefix=None)
Creates a Position entity.
Position categorisation should then be fetched using zavod.logic.pep.categorise and the result's is_pep checked.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
context | Context | The context to create the entity in. | required |
name | str | The name of the position. | required |
summary | Optional[str] | A short summary of the position. | None |
description | Optional[str] | A longer description of the position. | None |
country | Optional[Union[str, Iterable[str]]] | The country or countries the position is in. | None |
subnational_area | Optional[str] | The state or district the position is in. | None |
organization | Optional[Entity] | The organization the position is a part of. | None |
inception_date | Optional[Iterable[str]] | The date the position was created. | None |
dissolution_date | Optional[Iterable[str]] | The date the position was dissolved. | None |
number_of_seats | Optional[str] | The number of seats that can hold the position. | None |
wikidata_id | Optional[str] | The Wikidata QID of the position. | None |
source_url | Optional[str] | The URL of the source the position was found in. | None |
lang | Optional[str] | The language of the position details. | None |
Returns:
Type | Description |
---|---|
Entity | A new entity of type Position. |
Source code in zavod/helpers/positions.py
make_sanction(context, entity, key=None, program=None, program_key=None, start_date=None, end_date=None)
Create and return a sanctions object derived from the dataset metadata.
The country, authority, sourceUrl, and subject entity properties are automatically set.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
context | Context | The runner context with dataset metadata. | required |
entity | Entity | The entity to which the sanctions object will be linked. | required |
key | Optional[str] | An optional key to be included in the ID of the sanction. | None |
program | Optional[str] | An optional program name. | None |
program_key | Optional[str] | An optional key for looking up the program ID in the YAML configuration. | None |
start_date | Optional[str] | An optional start date for the sanction. | None |
end_date | Optional[str] | An optional end date for the sanction. | None |
Returns:
Type | Description |
---|---|
Entity | A new entity of type Sanction. |
Source code in zavod/helpers/sanctions.py
make_security(context, isin)
Make a security entity.
Source code in zavod/helpers/securities.py
multi_split(text, splitters)
Sequentially attempt to split a text based on an array of splitting criteria.
This is useful for strings where multiple separators are used to separate values, e.g. test,other/misc. A special case of this is itemised lists like a) test b) other c) misc, which sanction-makers seem to love.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | Optional[Union[str, Iterable[Optional[str]]]] | A text or list of texts to be split up further. | required |
splitters | Iterable[str] | A sequence of text splitting criteria to be applied to the text. | required |
Returns:
Type | Description |
---|---|
List[str] | Fully subdivided text snippets. |
Source code in zavod/helpers/text.py
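A sketch of sequential splitting, assuming each splitter is a literal substring applied to every fragment produced so far, and that fragments which are empty after stripping whitespace are dropped. split_sequentially is a hypothetical name, not the zavod source:

```python
from typing import Iterable, List, Optional, Union


def split_sequentially(
    text: Optional[Union[str, Iterable[Optional[str]]]],
    splitters: Iterable[str],
) -> List[str]:
    """Split every fragment by each splitter in turn."""
    if text is None:
        fragments: List[str] = []
    elif isinstance(text, str):
        fragments = [text]
    else:
        fragments = [t for t in text if t is not None]
    for splitter in splitters:
        next_fragments: List[str] = []
        for fragment in fragments:
            next_fragments.extend(fragment.split(splitter))
        fragments = next_fragments
    # Drop the empty pieces left behind by leading or doubled separators.
    return [f.strip() for f in fragments if f.strip()]


print(split_sequentially("a) test b) other c) misc", ["a)", "b)", "c)"]))
# ['test', 'other', 'misc']
```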
parse_date(text, formats, default=None)
Parse a date two ways: first, try to apply a set of structured formats and return a partial date if any of them parse correctly. Otherwise, apply extract_years on the remaining string.
Source code in zavod/helpers/dates.py
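The two-step strategy can be sketched with the standard library alone. This assumes strptime-style formats and represents a partial date by truncating the ISO string to the precision the matched format carries; the year fallback here is a simple regex rather than extract_years itself, and parse_date_sketch is hypothetical:

```python
import re
from datetime import datetime
from typing import List, Sequence

YEAR = re.compile(r"\b(?:1[89]\d\d|20\d\d|2100)\b")


def parse_date_sketch(text: str, formats: Sequence[str]) -> List[str]:
    """Try structured formats first, then fall back to bare years."""
    for fmt in formats:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
        # Only keep the precision the matching format actually provides.
        if "%d" in fmt:
            return [parsed.strftime("%Y-%m-%d")]
        if "%m" in fmt or "%b" in fmt or "%B" in fmt:
            return [parsed.strftime("%Y-%m")]
        return [parsed.strftime("%Y")]
    return YEAR.findall(text)
```

Truncating to the matched precision avoids inventing a day or month that the source never stated.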
parse_html_table(table, header_tag='th', skiprows=0)
Parse an HTML table into a generator yielding a dict for each row.
Returns:
Type | Description |
---|---|
None | Generator of dict per row, where the keys are the table headings slugified with _ as the separator and the values are the HtmlElement of each cell. |
See also
- zavod.helpers.cells_to_str
- zavod.helpers.links_to_dict
Source code in zavod/helpers/html.py
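A sketch of the row-to-dict idea using the standard library's xml.etree.ElementTree on a well-formed table fragment (zavod itself works on lxml HtmlElement objects). The slugification rule, the header detection and the parse_table and cells_to_text helpers are assumptions made for illustration; cells_to_text plays the role described for cells_to_str:

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List, Optional


def slugify_heading(text: str) -> str:
    """Lower-case a heading and join its words with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def parse_table(
    table: ET.Element, header_tag: str = "th", skiprows: int = 0
) -> Iterator[Dict[str, ET.Element]]:
    """Yield one {heading: cell element} dict per data row."""
    headers: Optional[List[str]] = None
    for index, row in enumerate(table.iter("tr")):
        if index < skiprows:
            continue
        header_cells = row.findall(header_tag)
        if headers is None:
            if header_cells:
                headers = [slugify_heading("".join(c.itertext())) for c in header_cells]
            # Rows before the header row carry no usable keys.
            continue
        cells = row.findall("td")
        if cells:
            yield dict(zip(headers, cells))


def cells_to_text(row: Dict[str, ET.Element]) -> Dict[str, str]:
    """Replace each cell element with its whitespace-trimmed text."""
    return {key: "".join(cell.itertext()).strip() for key, cell in row.items()}
```

Yielding elements rather than strings keeps links and nested markup available to the caller, which is why a separate text conversion step exists.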
parse_pdf_table(context, path, headers_per_page=False, preserve_header_newlines=False, start_page=None, end_page=None, skiprows=0, page_settings=None)
Parse the largest table on each page of a PDF file and yield their rows as dictionaries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path | Path to the PDF file. | required |
headers_per_page | bool | Set to True if the headers are repeated on each page. | False |
preserve_header_newlines | bool | Don't slugify newlines in headers, e.g. when the line breaks are meaningful. | False |
start_page | Optional[int] | The first page to process (1-indexed). | None |
end_page | Optional[int] | The last page to process (1-indexed). | None |
skiprows | int | The number of rows to skip before processing table headers. | 0 |
page_settings | Optional[Callable[[Page], Tuple[Page, Dict[str, Any]]]] | A function that takes a Page and returns a tuple of a (possibly modified) Page and a dictionary of settings. | None |
|
Pro tip
Save debug images in the page settings function to help with debugging.
- https://github.com/jsvine/pdfplumber?tab=readme-ov-file#drawing-methods
- https://github.com/jsvine/pdfplumber?tab=readme-ov-file#visually-debugging-the-table-finder
Source code in zavod/helpers/pdf.py
parse_xlsx_sheet(context, sheet, skiprows=0, header_lookup=None)
Parse an Excel sheet into a sequence of dictionaries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
context | Context | Crawler context. | required |
sheet | Worksheet | The Excel sheet. | required |
skiprows | int | The number of rows to skip. | 0 |
header_lookup | Optional[str] | The lookup key for translating headers. | None |
Source code in zavod/helpers/excel.py
postcode_pobox(text)
Handle postcode values that sometimes have a PO box stuffed into them, returning the postcode and the PO box separately.
Returns:
Type | Description |
---|---|
Tuple[Optional[str], Optional[str]] | Tuple of (postcode, po_box). |
Source code in zavod/helpers/addresses.py
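A sketch of the split, assuming the PO box appears on its own in the postcode field in forms like "P.O. Box 1234" and that only the box number should be returned. The pattern and split_postcode_pobox are hypothetical, not zavod's implementation:

```python
import re
from typing import Optional, Tuple

# Matches "PO Box 12", "P.O. Box 12" and "Box 12" (case-insensitive).
PO_BOX = re.compile(r"^\s*(?:P\.?\s*O\.?\s*)?Box\s+(\S+)\s*$", re.IGNORECASE)


def split_postcode_pobox(text: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (postcode, po_box); at most one of them is set."""
    if text is None or not text.strip():
        return None, None
    match = PO_BOX.match(text)
    if match is not None:
        return None, match.group(1)
    return text.strip(), None
```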
remove_bracketed(text)
Helps to deal with property values where additional info has been supplied in brackets that makes it harder to parse the value. Examples:
- Russia (former USSR)
- 1977 (as Muhammad Da'ud Salman)
It's probably not useful in all of these cases to try and parse and derive meaning from the bracketed bit, so we'll just discard it.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | Optional[str] | Text with sub-text in brackets. | required |
Returns:
Type | Description |
---|---|
Optional[str] | Text that was not in brackets. |
Source code in zavod/helpers/text.py
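A sketch of the discard step, assuming only non-nested round brackets need handling and that leftover whitespace should be collapsed. strip_bracketed is a hypothetical name, not the zavod source:

```python
import re
from typing import Optional

# One pair of round brackets with no brackets inside.
BRACKETED = re.compile(r"\([^()]*\)")


def strip_bracketed(text: Optional[str]) -> Optional[str]:
    """Drop (...) sections and tidy the whitespace left behind."""
    if text is None:
        return None
    return " ".join(BRACKETED.sub(" ", text).split())


print(strip_bracketed("Russia (former USSR)"))  # Russia
```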
remove_namespace(el)
Remove namespace in the passed XML/HTML document in place and return an updated element tree.
If the namespaces in a document define multiple tags with the same local tag name, this will create ambiguity and lead to errors. Most XML documents, however, only actively use one namespace.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
el | ElementOrTree | The root element or tree to remove namespaces from. | required |
Returns:
Type | Description |
---|---|
ElementOrTree | An updated element tree with the namespaces removed. |
Source code in zavod/helpers/xml.py
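A sketch of the same operation with the standard library's xml.etree.ElementTree, which stores namespaced names as {uri}local (zavod's helper operates on lxml trees; local_name and strip_namespaces are hypothetical names). Attribute names are stripped too, which is an assumption about the desired behaviour:

```python
import xml.etree.ElementTree as ET


def local_name(name: str) -> str:
    """Strip a '{namespace-uri}' prefix from an ElementTree tag or attribute name."""
    return name.split("}", 1)[1] if name.startswith("{") else name


def strip_namespaces(root: ET.Element) -> ET.Element:
    """Remove namespaces from every tag and attribute name, in place."""
    for el in root.iter():
        if isinstance(el.tag, str):
            el.tag = local_name(el.tag)
        for name in list(el.attrib):
            if name.startswith("{"):
                el.attrib[local_name(name)] = el.attrib.pop(name)
    return root
```

As the caveat above warns, two namespaces that share a local name would collapse into one tag after this step.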
replace_months(dataset, text)
Rewrite month names to the Latin form to get a date string ready for parsing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | Dataset | The dataset which contains a date format specification. | required |
text | str | The string inside of which month names will be replaced. | required |
Returns:
Type | Description |
---|---|
str | A string in which month names are normalized. |
Source code in zavod/helpers/dates.py
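A sketch of whole-word month replacement, assuming the dataset supplies a mapping from source month names to English month names. FRENCH_MONTHS and normalize_months are hypothetical; in zavod the names would come from the dataset's date configuration, whose exact shape is not shown here:

```python
import re
from typing import Dict

# Hypothetical mapping for a French-language source.
FRENCH_MONTHS: Dict[str, str] = {
    "janvier": "January",
    "février": "February",
    "mars": "March",
}


def normalize_months(text: str, months: Dict[str, str]) -> str:
    """Replace whole-word month names (case-insensitively) with their target form."""
    if not months:
        return text
    lookup = {name.lower(): target for name, target in months.items()}
    # Longest names first, so a longer name wins over a shorter one it contains.
    names = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


print(normalize_months("3 février 2020", FRENCH_MONTHS))  # 3 February 2020
```

The word boundaries keep place names such as "Marseille" from being rewritten.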
split_comma_names(context, text)
Split a string of multiple names that may contain company and individual names, some including commas, into individual names without breaking partnership names like "A, B and C Inc" or individuals like "Smith, Jane".
To make life easier, commas are stripped from company type suffixes like "Blue, LLC".
If the string can't be split into whole names reliably, a datapatch is looked up under the comma_names key, which should contain a list of names in the names attribute. If no match is found, the name is returned as a single-item list, and a warning is emitted.