webchanges.jobs module

Jobs.

class webchanges.jobs.BrowserJob(**kwargs)

Bases: UrlJobBase

Retrieve a URL using a real web browser (use_browser: true).

Parameters:

kwargs (Any)

use_browser: bool | str | None = True
proxy_username: str = ''
proxy_password: str = ''
chromium_connection_errors = ('net::ERR_CONNECTION_CLOSED', 'net::ERR_CONNECTION_RESET', 'net::ERR_CONNECTION_REFUSED', 'net::ERR_CONNECTION_ABORTED', 'net::ERR_CONNECTION_FAILED', 'net::ERR_NAME_NOT_RESOLVED', 'net::ERR_INTERNET_DISCONNECTED', 'net::ERR_SSL_PROTOCOL_ERROR', 'net::ERR_ADDRESS_INVALID', 'net::ERR_ADDRESS_UNREACHABLE', 'net::ERR_SSL_CLIENT_AUTH_CERT_NEEDED', 'net::ERR_TUNNEL_CONNECTION_FAILED', 'net::ERR_NO_SSL_VERSIONS_ENABLED', 'net::ERR_SSL_VERSION_OR_CIPHER_MISMATCH', 'net::ERR_SSL_RENEGOTIATION_REQUESTED', 'net::ERR_PROXY_AUTH_UNSUPPORTED', 'net::ERR_CERT_ERROR_IN_SSL_RENEGOTIATION', 'net::ERR_BAD_SSL_CLIENT_AUTH_CERT', 'net::ERR_CONNECTION_TIMED_OUT', 'net::ERR_HOST_RESOLVER_QUEUE_TOO_LARGE', 'net::ERR_SOCKS_CONNECTION_FAILED', 'net::ERR_SOCKS_CONNECTION_HOST_UNREACHABLE', 'net::ERR_ALPN_NEGOTIATION_FAILED', 'net::ERR_SSL_NO_RENEGOTIATION', 'net::ERR_WINSOCK_UNEXPECTED_WRITTEN_BYTES', 'net::ERR_SSL_DECOMPRESSION_FAILURE_ALERT', 'net::ERR_SSL_BAD_RECORD_MAC_ALERT', 'net::ERR_PROXY_AUTH_REQUESTED', 'net::ERR_PROXY_CONNECTION_FAILED', 'net::ERR_MANDATORY_PROXY_CONFIGURATION_FAILED', 'net::ERR_PRECONNECT_MAX_SOCKET_LIMIT', 'net::ERR_SSL_CLIENT_AUTH_PRIVATE_KEY_ACCESS_DENIED', 'net::ERR_SSL_CLIENT_AUTH_CERT_NO_PRIVATE_KEY', 'net::ERR_PROXY_CERTIFICATE_INVALID', 'net::ERR_NAME_RESOLUTION_FAILED', 'net::ERR_NETWORK_ACCESS_DENIED', 'net::ERR_TEMPORARILY_THROTTLED', 'net::ERR_SSL_CLIENT_AUTH_SIGNATURE_FAILED', 'net::ERR_MSG_TOO_BIG', 'net::ERR_WS_PROTOCOL_ERROR', 'net::ERR_ADDRESS_IN_USE', 'net::ERR_SSL_PINNED_KEY_NOT_IN_CERT_CHAIN', 'net::ERR_CLIENT_AUTH_CERT_TYPE_UNSUPPORTED', 'net::ERR_SSL_DECRYPT_ERROR_ALERT', 'net::ERR_WS_THROTTLE_QUEUE_TOO_LARGE', 'net::ERR_SSL_SERVER_CERT_CHANGED', 'net::ERR_SSL_UNRECOGNIZED_NAME_ALERT', 'net::ERR_SOCKET_SET_RECEIVE_BUFFER_SIZE_ERROR', 'net::ERR_SOCKET_SET_SEND_BUFFER_SIZE_ERROR', 'net::ERR_SOCKET_RECEIVE_BUFFER_SIZE_UNCHANGEABLE', 'net::ERR_SOCKET_SEND_BUFFER_SIZE_UNCHANGEABLE', 
'net::ERR_SSL_CLIENT_AUTH_CERT_BAD_FORMAT', 'net::ERR_ICANN_NAME_COLLISION', 'net::ERR_SSL_SERVER_CERT_BAD_FORMAT', 'net::ERR_CT_STH_PARSING_FAILED', 'net::ERR_CT_STH_INCOMPLETE', 'net::ERR_UNABLE_TO_REUSE_CONNECTION_FOR_PROXY_AUTH', 'net::ERR_CT_CONSISTENCY_PROOF_PARSING_FAILED', 'net::ERR_SSL_OBSOLETE_CIPHER', 'net::ERR_WS_UPGRADE', 'net::ERR_READ_IF_READY_NOT_IMPLEMENTED', 'net::ERR_NO_BUFFER_SPACE', 'net::ERR_SSL_CLIENT_AUTH_NO_COMMON_ALGORITHMS', 'net::ERR_EARLY_DATA_REJECTED', 'net::ERR_WRONG_VERSION_ON_EARLY_DATA', 'net::ERR_TLS13_DOWNGRADE_DETECTED', 'net::ERR_SSL_KEY_USAGE_INCOMPATIBLE', 'net::ERR_INVALID_ECH_CONFIG_LIST', 'net::ERR_ECH_NOT_NEGOTIATED', 'net::ERR_ECH_FALLBACK_CERTIFICATE_INVALID', 'net::ERR_PROXY_UNABLE_TO_CONNECT_TO_DESTINATION', 'net::ERR_PROXY_DELEGATE_CANCELED_CONNECT_REQUEST', 'net::ERR_PROXY_DELEGATE_CANCELED_CONNECT_RESPONSE')
get_location()

Get the ‘location’ of the job, i.e. the (user_visible) URL.

Returns:

The user_visible_url or URL of the job.

Return type:

str

set_base_location(location)

Sets the job’s location (command or url) to location. Used for changing location (uuid).

Parameters:

location (str)

Return type:

None

static get_user_agent_platform()

Return type:

str

retrieve(job_state, headless=True, response_handler=None, content_handler=None, return_data=None)

Runs the job to retrieve the data, and returns the data, the ETag, and the mime_type.

Parameters:
  • job_state (JobState) – The JobState object, to keep track of the state of the retrieval.

  • headless (bool) – For browser-based jobs, whether headless mode should be used.

  • response_handler (Callable[[Page, str, Literal['commit', 'domcontentloaded', 'load', 'networkidle'] | None, str | None], Response] | None)

  • content_handler (Callable[[Page], tuple[str | bytes, str, str]] | None)

  • return_data (Callable[[Page, str, Literal['commit', 'domcontentloaded', 'load', 'networkidle'] | None, str | None], tuple[str | bytes, str, str]] | None)

Raises:
  • ValueError – If there is a problem with the value supplied in one of the keys in the configuration file.

  • TypeError – If the value provided in one of the directives is not of the correct type.

  • ImportError – If the playwright package is not installed.

  • BrowserResponseError – If a browser error or an HTTP response code between 400 and 599 is received.

Returns:

The data retrieved, the ETag, and the mime_type.

Return type:

tuple[str | bytes, str, str]

format_error(exception, tb)

Format the error of the job if one is encountered.

Parameters:
  • exception (Exception) – The exception.

  • tb (str) – The traceback.format_exc() string.

Returns:

A string to display and/or use in reports.

Return type:

str

ignore_error(exception)

Determine whether the error of the job should be ignored.

Parameters:

exception (Exception) – The exception.

Returns:

True if the error should be ignored, False otherwise.

Return type:

bool
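
As a rough illustration of how a tuple such as chromium_connection_errors could be consulted when deciding whether to ignore an error, the sketch below checks whether an exception's text names one of the listed codes. The function name is_connection_error, the shortened tuple, and the substring-matching rule are assumptions made for this example; the actual logic in ignore_error may differ.

```python
# Hypothetical, shortened stand-in for BrowserJob.chromium_connection_errors.
CONNECTION_ERRORS = (
    'net::ERR_CONNECTION_CLOSED',
    'net::ERR_CONNECTION_RESET',
    'net::ERR_NAME_NOT_RESOLVED',
)


def is_connection_error(exception: Exception, ignore_connection_errors: bool) -> bool:
    """Return True if ignoring is enabled and the error text names a listed code."""
    if not ignore_connection_errors:
        return False
    message = str(exception)
    return any(code in message for code in CONNECTION_ERRORS)
```

With this sketch, an error is only ignored when the job opts in, so an unlisted failure still surfaces in reports.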

additions_only: bool | float | str | None = None
block_elements: list[str] | None = None
command: str = ''
compared_versions: int | None = None
contextlines: int | None = None
cookies: dict[str, str] | None = None
data: str | list | dict | None = None
data_as_json: bool | None = None
deletions_only: bool | None = None
diff_filters: str | list[str | dict[str, Any]] | None = None
diff_tool: str | None = None
differ: dict[str, Any] | None = None
empty_as_transient: bool | None = None
enabled: bool | None = None
encoding: str | None = None
evaluate: str | None = None
filters: Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | list[Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | dict[str, Any]] | None = None
fingerprints: dict[str, str | dict[str, Any]] | None = None
classmethod from_dict(data, filenames)

Create a JobBase class from a dict, checking that all keys are recognized (i.e. listed in __required__ or __optional__).

Parameters:
  • data (dict) – Job data in dict format (e.g. from the YAML jobs file).

  • filenames (list[Path])

Returns:

A JobBase type object.

Return type:

JobBase
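
The key check described for from_dict can be pictured roughly as follows. This is a minimal sketch, not the library's code: the REQUIRED and OPTIONAL tuples, the helper name check_keys, and the choice of ValueError are all assumptions standing in for the per-class __required__ and __optional__ lists.

```python
# Hypothetical key sets; the real __required__ and __optional__ are defined per job class.
REQUIRED = ('url',)
OPTIONAL = ('name', 'timeout', 'use_browser')


def check_keys(data: dict) -> dict:
    """Raise ValueError if a key is not recognized or a required key is missing."""
    allowed = set(REQUIRED) | set(OPTIONAL)
    unknown = sorted(set(data) - allowed)
    if unknown:
        raise ValueError(f'Unrecognized directive(s): {", ".join(unknown)}')
    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise ValueError(f'Missing required directive(s): {", ".join(missing)}')
    # Return a copy so the caller's dict (e.g. parsed from YAML) is left untouched.
    return dict(data)
```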

get_fips_guid()

Calculate the GUID as a SHA256 hash of the location (URL or command).

Returns:

The GUID.

Return type:

str

get_guid()

Calculate the GUID, currently a simple SHA1 hash of the location (URL or command).

Returns:

The GUID.

Return type:

str
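
The two GUID calculations described above (SHA1 for get_guid, SHA256 for get_fips_guid) might look like the sketch below. It assumes the location is encoded as UTF-8 and that the GUID is the hexadecimal digest; neither detail is stated in this reference, and the function names are illustrative only.

```python
import hashlib


def sha1_guid(location: str) -> str:
    """SHA-1 hex digest of the location (UTF-8 encoding is an assumption)."""
    return hashlib.sha1(location.encode('utf-8')).hexdigest()


def sha256_guid(location: str) -> str:
    """SHA-256 hex digest of the location (UTF-8 encoding is an assumption)."""
    return hashlib.sha256(location.encode('utf-8')).hexdigest()
```

Because the digest depends only on the location string, the same URL or command always maps to the same GUID, while the SHA256 variant is twice as long (64 hex characters versus 40).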

get_headers(job_state, user_agent='webchanges/3.36.1rc1 (+https://pypi.org/project/webchanges/)', include_cookies=True)

Get headers and modify them to add cookies and conditional request. If headers don’t contain User-Agent, either the default one or the one provided as user_agent is added.

Parameters:
  • job_state (JobState) – The job state.

  • user_agent (str | None) – The user agent string.

  • include_cookies (bool) – Whether to include cookies (from self.cookies) as a Cookie header.

Returns:

The headers.

Return type:

Headers
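
A simplified picture of the header handling described for get_headers is sketched below using plain dicts instead of the Headers class. The helper name build_headers, the default agent string, and the use of If-None-Match with a stored ETag for the conditional request are assumptions for illustration.

```python
from typing import Dict, Optional


def build_headers(
    headers: Dict[str, str],
    cookies: Optional[Dict[str, str]],
    etag: Optional[str],
    user_agent: str = 'example-agent/1.0',
    include_cookies: bool = True,
) -> Dict[str, str]:
    """Return a copy of headers with User-Agent, Cookie and If-None-Match added."""
    result = dict(headers)
    # Add a User-Agent only if the caller did not supply one (case-insensitive check).
    if not any(key.lower() == 'user-agent' for key in result):
        result['User-Agent'] = user_agent
    if include_cookies and cookies:
        result['Cookie'] = '; '.join(f'{k}={v}' for k, v in cookies.items())
    # Conditional request: lets the server answer 304 if the ETag still matches.
    if etag:
        result['If-None-Match'] = etag
    return result
```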

get_indexed_location()

Get the job number plus its ‘location’, i.e. the (user_visible) URL or command. Typically used in error displays.

Returns:

The job number followed by a colon and the ‘location’ of the job, i.e. its user_visible_url, URL, or command.

Return type:

str

get_proxy()

Return the correct proxy, depending on whether the URL is http or https.

Return type:

str | None
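
Choosing a proxy by URL scheme, as get_proxy is described as doing, can be sketched as follows. The proxies mapping and the helper name select_proxy are hypothetical; where the library actually reads its proxy settings from is not covered here.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def select_proxy(url: str, proxies: Dict[str, str]) -> Optional[str]:
    """Pick the proxy registered for the URL's scheme, or None if there is none."""
    scheme = urlsplit(url).scheme.lower()
    return proxies.get(scheme)
```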

guid: str = ''
headers = Headers({}, encoding='utf-8')
http_client: Literal['httpx', 'requests', 'curl_cffi'] | None = None
http_credentials: str | None = None
http_version: Literal['v1', 'v2', 'v2tls', 'v2_prior_knowledge', 'v3', 'v3only'] | None = None
ignore_cached: bool | None = None
ignore_connection_errors: bool | None = None
ignore_default_args: bool | str | list[str] | None = None
ignore_dh_key_too_small: bool | None = None
ignore_http_error_codes: list[int | str] | int | str | None = None
ignore_https_errors: bool | None = None
ignore_timeout_errors: bool | None = None
ignore_too_many_redirects: bool | None = None
impersonate: str | None = None
index_number: int = 0
init_script: str | None = None
initialization_js: str | None = None
initialization_url: str | None = None
is_enabled()

Returns whether job is enabled.

Returns:

Whether the job is enabled.

Return type:

bool

is_markdown: bool | None = None
classmethod job_documentation()

Generates simple jobs documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

kind: str | None = None
loop: AbstractEventLoop | None = None
main_thread_enter()

Called from the main thread before running the job. No longer needed (does nothing).

Return type:

None

main_thread_exit()

Called from the main thread after running the job. No longer needed (does nothing).

Return type:

None

static make_guid(name)

Calculate the GUID from a string (currently a simple SHA1).

Returns:

The GUID.

Parameters:

name (str)

Return type:

str

markdown_padded_tables: bool | None = None
max_tries: int | None = None
method: Literal['GET', 'OPTIONS', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE'] | None = None
mime_type: str | None = None
monospace: bool | None = None
name: str | None = None
navigate: str | None = None
no_conditional_request: bool | None = None
no_redirects: bool | None = None
note: str | None = None
params: str | list | dict[str, str] | None = None
pretty_name()

Get the ‘pretty name’ of a job, i.e. either its ‘name’ (if defined) or the ‘location’ (user_visible_url, URL or command).

Returns:

The ‘pretty name’ of the job.

Return type:

str

proxy: str | None = None
referer: str | None = None
retries: int | None = None
serialize()

Serialize the Job object, excluding its index_number (e.g. for saving).

Returns:

A dict with the Job object serialized.

Return type:

dict

set_to_monospace()

If unset, sets the monospace flag to True (will not override).

Return type:

None

ssl_no_verify: bool | None = None
stderr: str | None = None
suppress_error_ended: bool | None = None
suppress_errors: bool | None = None
suppress_repeated_errors: bool | None = None
switches: list[str] | None = None
timeout: float | None = None
to_dict()

Return all defined (not None) Job object directives, required and optional, as a serializable dict, converting Headers objects (which are not JSON serializable) to dicts.

Returns:

A dict with all job directives as keys, ignoring those that are extras.

Return type:

dict

tz: str | None = None
classmethod unserialize(data, filenames=None)

Unserialize a dict with job data (e.g. from the YAML jobs file) into a JobBase type object.

Parameters:
  • data (dict) – The dict with job data (e.g. from the YAML jobs file).

  • filenames (list[Path] | None)

Returns:

A JobBase type object.

Return type:

JobBase

url: str = ''
user_data_dir: str | None = None
user_visible_url: str | None = None
validate()

Checks all instance attributes against class type hints.

Return type:

None

wait_for: int | str | None = None
wait_for_function: str | dict[str, str] | None = None
wait_for_navigation: str | tuple[str, ...] | None = None
wait_for_selector: str | dict[str, str] | list[str | dict[str, str]] | None = None
wait_for_timeout: float | None = None
wait_for_url: str | None = None
wait_until: Literal['commit', 'domcontentloaded', 'load', 'networkidle'] | None = None
with_defaults(config)

Obtain a Job object that also contains defaults from the configuration.

Parameters:

config (_Config) – The configuration as a dict.

Returns:

A JobBase object.

Return type:

JobBase

exception webchanges.jobs.BrowserResponseError(args, status_code=None)

Bases: Exception

Raised by ‘url’ jobs with ‘use_browser: true’ (i.e. using Playwright) when an HTTP error response status code is received and is not one of the other Exceptions.

Parameters:
  • args (tuple[Any, ...]) – Tuple with the underlying error args, typically a string with the error text.

  • status_code (int | None) – The HTTP status code received.

Return type:

None

add_note(note, /)

Add a note to the exception.

args

with_traceback(tb, /)

Set self.__traceback__ to tb and return self.
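
To show how an exception with the signature given for BrowserResponseError (args, status_code=None) can carry an HTTP status alongside the error text, here is a small stand-in class. ResponseError and describe are hypothetical names for this sketch and do not reproduce the library's implementation or message format.

```python
from typing import Any, Optional, Tuple


class ResponseError(Exception):
    """Hypothetical stand-in for an exception that carries an HTTP status code."""

    def __init__(self, args: Tuple[Any, ...], status_code: Optional[int] = None) -> None:
        # Unpack the tuple so that self.args holds the underlying error args.
        super().__init__(*args)
        self.status_code = status_code


def describe(error: ResponseError) -> str:
    """Build a short message from the status code and the first error argument."""
    text = str(error.args[0]) if error.args else ''
    if error.status_code is not None:
        return f'{error.status_code} {text}'.strip()
    return text
```

A caller can then branch on error.status_code, for example to skip codes listed in a directive such as ignore_http_error_codes.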

class webchanges.jobs.Job(**kwargs)

Bases: JobBase

Job class for jobs.

Parameters:

kwargs (Any)

get_location()

Get the ‘location’ of the job, i.e. the (user_visible) URL or command.

Returns:

The user_visible_url, the URL, or the command of the job.

Return type:

str

get_indexed_location()

Get the job number plus its ‘location’, i.e. the (user_visible) URL or command. Typically used in error displays.

Returns:

The job number followed by a colon and the ‘location’ of the job, i.e. its user_visible_url, URL, or command.

Return type:

str

pretty_name()

Get the ‘pretty name’ of a job, i.e. either its ‘name’ (if defined) or the ‘location’ (user_visible_url, URL or command).

Returns:

The ‘pretty name’ of the job.

Return type:

str
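
The fallback described for pretty_name can be expressed in a single line. The sketch below assumes that the location is resolved in the order user_visible_url, url, command, which follows the order in which the description lists them but is not otherwise confirmed here.

```python
from typing import Optional


def pretty_name(name: Optional[str], user_visible_url: Optional[str], url: str, command: str) -> str:
    """Return name if set, else the first non-empty of user_visible_url, url, command."""
    return name or user_visible_url or url or command
```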

retrieve(job_state, headless=True)

Runs the job to retrieve the data, and returns the data, the ETag, and the mime_type.

Parameters:
  • job_state (JobState) – The JobState object, to keep track of the state of the retrieval.

  • headless (bool) – For browser-based jobs, whether headless mode should be used.

Returns:

The data retrieved, the ETag, and the mime_type.

Return type:

tuple[str | bytes, str, str]

additions_only: bool | float | str | None = None
block_elements: list[str] | None = None
command: str = ''
compared_versions: int | None = None
contextlines: int | None = None
cookies: dict[str, str] | None = None
data: str | list | dict | None = None
data_as_json: bool | None = None
deletions_only: bool | None = None
diff_filters: str | list[str | dict[str, Any]] | None = None
diff_tool: str | None = None
differ: dict[str, Any] | None = None
empty_as_transient: bool | None = None
enabled: bool | None = None
encoding: str | None = None
evaluate: str | None = None
filters: Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | list[Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | dict[str, Any]] | None = None
fingerprints: dict[str, str | dict[str, Any]] | None = None
format_error(exception, tb)

Format the error of the job if one is encountered.

Parameters:
  • exception (Exception) – The exception.

  • tb (str) – The traceback.format_exc() string.

Returns:

A string to display and/or use in reports.

Return type:

str

classmethod from_dict(data, filenames)

Create a JobBase class from a dict, checking that all keys are recognized (i.e. listed in __required__ or __optional__).

Parameters:
  • data (dict) – Job data in dict format (e.g. from the YAML jobs file).

  • filenames (list[Path])

Returns:

A JobBase type object.

Return type:

JobBase

get_fips_guid()

Calculate the GUID as a SHA256 hash of the location (URL or command).

Returns:

The GUID.

Return type:

str

get_guid()

Calculate the GUID, currently a simple SHA1 hash of the location (URL or command).

Returns:

The GUID.

Return type:

str

get_proxy()

Return the correct proxy, depending on whether the URL is http or https.

Return type:

str | None

guid: str = ''
headers = Headers({}, encoding='utf-8')
http_client: Literal['httpx', 'requests', 'curl_cffi'] | None = None
http_credentials: str | None = None
http_version: Literal['v1', 'v2', 'v2tls', 'v2_prior_knowledge', 'v3', 'v3only'] | None = None
ignore_cached: bool | None = None
ignore_connection_errors: bool | None = None
ignore_default_args: bool | str | list[str] | None = None
ignore_dh_key_too_small: bool | None = None
ignore_error(exception)

Determine whether the error of the job should be ignored.

Parameters:

exception (Exception) – The exception.

Returns:

True or the string with the number of the HTTPError code if the error should be ignored, False otherwise.

Return type:

bool

ignore_http_error_codes: list[int | str] | int | str | None = None
ignore_https_errors: bool | None = None
ignore_timeout_errors: bool | None = None
ignore_too_many_redirects: bool | None = None
impersonate: str | None = None
index_number: int = 0
init_script: str | None = None
initialization_js: str | None = None
initialization_url: str | None = None
is_enabled()

Returns whether job is enabled.

Returns:

Whether the job is enabled.

Return type:

bool

is_markdown: bool | None = None
classmethod job_documentation()

Generates simple jobs documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

kind: str | None = None
loop: AbstractEventLoop | None = None
main_thread_enter()

Called from the main thread before running the job. No longer needed (does nothing).

Return type:

None

main_thread_exit()

Called from the main thread after running the job. No longer needed (does nothing).

Return type:

None

static make_guid(name)

Calculate the GUID from a string (currently a simple SHA1).

Returns:

The GUID.

Parameters:

name (str)

Return type:

str

markdown_padded_tables: bool | None = None
max_tries: int | None = None
method: Literal['GET', 'OPTIONS', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE'] | None = None
mime_type: str | None = None
monospace: bool | None = None
name: str | None = None
navigate: str | None = None
no_conditional_request: bool | None = None
no_redirects: bool | None = None
note: str | None = None
params: str | list | dict[str, str] | None = None
proxy: str | None = None
referer: str | None = None
retries: int | None = None
serialize()

Serialize the Job object, excluding its index_number (e.g. for saving).

Returns:

A dict with the Job object serialized.

Return type:

dict

set_base_location(location)

Sets the job’s location (command or url) to location. Used for changing location (uuid).

Parameters:

location (str)

Return type:

None

set_to_monospace()

If unset, sets the monospace flag to True (will not override).

Return type:

None
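
The "will not override" behaviour of set_to_monospace amounts to defaulting only an unset value. A minimal sketch, with resolve_monospace as a hypothetical free function rather than the actual method:

```python
from typing import Optional


def resolve_monospace(current: Optional[bool]) -> bool:
    """Default an unset (None) flag to True; keep an explicit True or False."""
    return True if current is None else current
```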

ssl_no_verify: bool | None = None
stderr: str | None = None
suppress_error_ended: bool | None = None
suppress_errors: bool | None = None
suppress_repeated_errors: bool | None = None
switches: list[str] | None = None
timeout: float | None = None
to_dict()

Return all defined (not None) Job object directives, required and optional, as a serializable dict, converting Headers objects (which are not JSON serializable) to dicts.

Returns:

A dict with all job directives as keys, ignoring those that are extras.

Return type:

dict
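
The filtering and conversion that to_dict is described as performing could be sketched like this. The helper name directives_to_dict and the assumption that only the headers directive needs converting are illustrative; the real method works on the Job object's attributes.

```python
from typing import Any, Dict


def directives_to_dict(directives: Dict[str, Any]) -> Dict[str, Any]:
    """Drop unset (None) directives and convert the headers mapping to a plain dict."""
    result: Dict[str, Any] = {}
    for key, value in directives.items():
        if value is None:
            continue
        # A Headers-like mapping is copied with dict() so that json.dumps can handle it.
        result[key] = dict(value) if key == 'headers' else value
    return result
```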

tz: str | None = None
classmethod unserialize(data, filenames=None)

Unserialize a dict with job data (e.g. from the YAML jobs file) into a JobBase type object.

Parameters:
  • data (dict) – The dict with job data (e.g. from the YAML jobs file).

  • filenames (list[Path] | None)

Returns:

A JobBase type object.

Return type:

JobBase

url: str = ''
use_browser: bool | str | None = False
user_data_dir: str | None = None
user_visible_url: str | None = None
validate()

Checks all instance attributes against class type hints.

Return type:

None

wait_for: int | str | None = None
wait_for_function: str | dict[str, str] | None = None
wait_for_navigation: str | tuple[str, ...] | None = None
wait_for_selector: str | dict[str, str] | list[str | dict[str, str]] | None = None
wait_for_timeout: float | None = None
wait_for_url: str | None = None
wait_until: Literal['commit', 'domcontentloaded', 'load', 'networkidle'] | None = None
with_defaults(config)

Obtain a Job object that also contains defaults from the configuration.

Parameters:

config (_Config) – The configuration as a dict.

Returns:

A JobBase object.

Return type:

JobBase

class webchanges.jobs.JobBase(**kwargs)

Bases: object

The base class for Jobs.

Parameters:

kwargs (Any)

index_number: int = 0
url: str = ''
command: str = ''
use_browser: bool | str | None = False
additions_only: bool | float | str | None = None
block_elements: list[str] | None = None
compared_versions: int | None = None
contextlines: int | None = None
cookies: dict[str, str] | None = None
data: str | list | dict | None = None
data_as_json: bool | None = None
deletions_only: bool | None = None
differ: dict[str, Any] | None = None
diff_filters: str | list[str | dict[str, Any]] | None = None
diff_tool: str | None = None
empty_as_transient: bool | None = None
enabled: bool | None = None
encoding: str | None = None
evaluate: str | None = None
filters: Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | list[Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | dict[str, Any]] | None = None
fingerprints: dict[str, str | dict[str, Any]] | None = None
guid: str = ''
headers = Headers({}, encoding='utf-8')
http_client: Literal['httpx', 'requests', 'curl_cffi'] | None = None
http_version: Literal['v1', 'v2', 'v2tls', 'v2_prior_knowledge', 'v3', 'v3only'] | None = None
http_credentials: str | None = None
ignore_cached: bool | None = None
ignore_connection_errors: bool | None = None
ignore_default_args: bool | str | list[str] | None = None
ignore_dh_key_too_small: bool | None = None
ignore_http_error_codes: list[int | str] | int | str | None = None
ignore_https_errors: bool | None = None
ignore_timeout_errors: bool | None = None
ignore_too_many_redirects: bool | None = None
impersonate: str | None = None
init_script: str | None = None
initialization_js: str | None = None
initialization_url: str | None = None
is_markdown: bool | None = None
kind: str | None = None
loop: AbstractEventLoop | None = None
markdown_padded_tables: bool | None = None
max_tries: int | None = None
method: Literal['GET', 'OPTIONS', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE'] | None = None
mime_type: str | None = None
monospace: bool | None = None
name: str | None = None
navigate: str | None = None
no_conditional_request: bool | None = None
no_redirects: bool | None = None
note: str | None = None
params: str | list | dict[str, str] | None = None
proxy: str | None = None
referer: str | None = None
retries: int | None = None
ssl_no_verify: bool | None = None
stderr: str | None = None
suppress_error_ended: bool | None = None
suppress_errors: bool | None = None
suppress_repeated_errors: bool | None = None
switches: list[str] | None = None
timeout: float | None = None
tz: str | None = None
user_data_dir: str | None = None
user_visible_url: str | None = None
wait_for: int | str | None = None
wait_for_function: str | dict[str, str] | None = None
wait_for_navigation: str | tuple[str, ...] | None = None
wait_for_selector: str | dict[str, str] | list[str | dict[str, str]] | None = None
wait_for_timeout: float | None = None
wait_for_url: str | None = None
wait_until: Literal['commit', 'domcontentloaded', 'load', 'networkidle'] | None = None
classmethod job_documentation()

Generates simple jobs documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

get_location()

Get the ‘location’ of the job, i.e. the (user_visible) URL or command.

Returns:

The user_visible_url, the URL, or the command of the job.

Return type:

str

get_indexed_location()

Get the job number plus its ‘location’, i.e. the (user_visible) URL or command. Typically used in error displays.

Returns:

The job number followed by a colon and the ‘location’ of the job, i.e. its user_visible_url, URL, or command.

Return type:

str
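
For orientation, the string described above (job number, colon, location) might be assembled as in this sketch. The literal "Job" prefix and the exact spacing are assumptions; only the number, colon, and location are given by the description.

```python
def indexed_location(index_number: int, location: str) -> str:
    """Job number, a colon, then the location (the exact wording is assumed)."""
    return f'Job {index_number}: {location}'
```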

set_base_location(location)

Sets the job’s location (command or url) to location. Used for changing location (uuid).

Parameters:

location (str)

Return type:

None

pretty_name()

Get the ‘pretty name’ of a job, i.e. either its ‘name’ (if defined) or the ‘location’ (user_visible_url, URL or command).

Returns:

The ‘pretty name’ of the job.

Return type:

str

serialize()

Serialize the Job object, excluding its index_number (e.g. for saving).

Returns:

A dict with the Job object serialized.

Return type:

dict

validate()

Checks all instance attributes against class type hints.

Return type:

None

classmethod unserialize(data, filenames=None)

Unserialize a dict with job data (e.g. from the YAML jobs file) into a JobBase type object.

Parameters:
  • data (dict) – The dict with job data (e.g. from the YAML jobs file).

  • filenames (list[Path] | None)

Returns:

A JobBase type object.

Return type:

JobBase

to_dict()

Return all defined (not None) Job object directives, required and optional, as a serializable dict, converting Headers objects (which are not JSON serializable) to dicts.

Returns:

A dict with all job directives as keys, ignoring those that are extras.

Return type:

dict

classmethod from_dict(data, filenames)

Create a JobBase class from a dict, checking that all keys are recognized (i.e. listed in __required__ or __optional__).

Parameters:
  • data (dict) – Job data in dict format (e.g. from the YAML jobs file).

  • filenames (list[Path])

Returns:

A JobBase type object.

Return type:

JobBase

with_defaults(config)

Obtain a Job object that also contains defaults from the configuration.

Parameters:

config (_Config) – The configuration as a dict.

Returns:

A JobBase object.

Return type:

JobBase
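
One plausible reading of with_defaults is that configuration defaults fill in only the directives a job leaves unset. The sketch below works on plain dicts and assumes that job-level values take precedence; apply_defaults is a hypothetical helper, and the real method operates on Job objects and the _Config structure.

```python
from typing import Any, Dict


def apply_defaults(job: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where defaults fill only directives the job left unset."""
    merged = dict(job)
    for key, value in defaults.items():
        if merged.get(key) is None:
            merged[key] = value
    return merged
```

Returning a new dict rather than mutating the input mirrors the description, which speaks of obtaining a Job object that also contains the defaults.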

get_fips_guid()

Calculate the GUID as a SHA256 hash of the location (URL or command).

Returns:

The GUID.

Return type:

str

static make_guid(name)

Calculate the GUID from a string (currently a simple SHA1).

Returns:

The GUID.

Parameters:

name (str)

Return type:

str

get_guid()

Calculate the GUID, currently a simple SHA1 hash of the location (URL or command).

Returns:

The GUID.

Return type:

str

retrieve(job_state, headless=True)

Runs the job to retrieve the data, and returns the data, the ETag, and the mime_type.

Parameters:
  • job_state (JobState) – The JobState object, to keep track of the state of the retrieval.

  • headless (bool) – For browser-based jobs, whether headless mode should be used.

Returns:

The data retrieved, the ETag, and the mime_type.

Return type:

tuple[str | bytes, str, str]

main_thread_enter()

Called from the main thread before running the job. No longer needed (does nothing).

Return type:

None

main_thread_exit()

Called from the main thread after running the job. No longer needed (does nothing).

Return type:

None

format_error(exception, tb)

Format the error of the job if one is encountered.

Parameters:
  • exception (Exception) – The exception.

  • tb (str) – The traceback.format_exc() string.

Returns:

A string to display and/or use in reports.

Return type:

str

ignore_error(exception)

Determine whether the error of the job should be ignored.

Parameters:

exception (Exception) – The exception.

Returns:

True or the string with the number of the HTTPError code if the error should be ignored, False otherwise.

Return type:

bool

is_enabled()

Returns whether job is enabled.

Returns:

Whether the job is enabled.

Return type:

bool

set_to_monospace()

If unset, sets the monospace flag to True (will not override).

Return type:

None

get_proxy()

Return the correct proxy, depending on whether the URL is http or https.

Return type:

str | None

exception webchanges.jobs.NotModifiedError

Bases: Exception

Raised when an HTTP 304 response status code (Not Modified client redirection) is received or the strong validation ETag matches the previous one; this indicates that there was no change in content.

add_note(note, /)

Add a note to the exception.

args

with_traceback(tb, /)

Set self.__traceback__ to tb and return self.
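
The two conditions under which NotModifiedError is described as being raised (an HTTP 304 status, or a strong ETag equal to the previous one) can be sketched as below. NotModified and check_modified are hypothetical names, and treating a "W/" prefix as marking a weak validator follows the HTTP convention rather than anything stated in this reference.

```python
from typing import Optional


class NotModified(Exception):
    """Hypothetical stand-in for NotModifiedError."""


def check_modified(status_code: int, new_etag: Optional[str], old_etag: Optional[str]) -> None:
    """Raise NotModified on HTTP 304 or when a strong ETag equals the previous one."""
    if status_code == 304:
        raise NotModified('304 Not Modified')
    # Weak validators start with W/ and do not guarantee byte-for-byte identity.
    if new_etag and not new_etag.startswith('W/') and new_etag == old_etag:
        raise NotModified('ETag unchanged')
```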

class webchanges.jobs.ShellJob(**kwargs)

Bases: Job

Run a shell command and get its standard output.

Parameters:

kwargs (Any)

get_location()

Get the ‘location’ of the job, i.e. the command.

Returns:

The command of the job.

Return type:

str

set_base_location(location)

Sets the job’s location (command or url) to location. Used for changing location (uuid).

Parameters:

location (str)

Return type:

None

retrieve(job_state, headless=True)

Runs job to retrieve the data, and returns data, ETag (which is blank) and mime_type (also blank).

Parameters:
  • job_state (JobState) – The JobState object, to keep track of the state of the retrieval.

  • headless (bool) – For browser-based jobs, whether headless mode should be used.

Returns:

The data retrieved, the ETag, and the mime_type.

Raises:
  • subprocess.CalledProcessError – Subclass of SubprocessError, raised when a process returns a non-zero exit status.

  • subprocess.TimeoutExpired – Subclass of SubprocessError, raised when a timeout expires while waiting for a child process.

Return type:

tuple[str | bytes, str, str]
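
The behaviour documented for ShellJob.retrieve (standard output as data, blank ETag and mime_type, and the two subprocess exceptions) is consistent with a call to subprocess.run along the lines below. This is a sketch under that assumption; run_command is a hypothetical helper and the options the library actually passes are not shown in this reference.

```python
import subprocess
from typing import Optional, Tuple


def run_command(command: str, timeout: Optional[float] = None) -> Tuple[str, str, str]:
    """Run command in a shell and return (stdout, '', '') for data, ETag, mime_type."""
    completed = subprocess.run(
        command,
        shell=True,
        capture_output=True,
        text=True,
        check=True,       # raises subprocess.CalledProcessError on a non-zero exit status
        timeout=timeout,  # raises subprocess.TimeoutExpired when the limit is exceeded
    )
    return completed.stdout, '', ''
```

Since the command string is passed to a shell, it should only ever come from a trusted jobs file.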

format_error(exception, tb)

Format the error of the job if one is encountered.

Parameters:
  • exception (Exception) – The exception.

  • tb (str) – The traceback.format_exc() string.

Returns:

A string to display and/or use in reports.

Return type:

str

additions_only: bool | float | str | None = None
block_elements: list[str] | None = None
command: str = ''
compared_versions: int | None = None
contextlines: int | None = None
cookies: dict[str, str] | None = None
data: str | list | dict | None = None
data_as_json: bool | None = None
deletions_only: bool | None = None
diff_filters: str | list[str | dict[str, Any]] | None = None
diff_tool: str | None = None
differ: dict[str, Any] | None = None
empty_as_transient: bool | None = None
enabled: bool | None = None
encoding: str | None = None
evaluate: str | None = None
filters: Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | list[Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | dict[str, Any]] | None = None
fingerprints: dict[str, str | dict[str, Any]] | None = None
classmethod from_dict(data, filenames)

Create a JobBase class from a dict, checking that all keys are recognized (i.e. listed in __required__ or __optional__).

Parameters:
  • data (dict) – Job data in dict format (e.g. from the YAML jobs file).

  • filenames (list[Path])

Returns:

A JobBase type object.

Return type:

JobBase

get_fips_guid()

Calculate the GUID as a SHA256 hash of the location (URL or command).

Returns:

The GUID.

Return type:

str

get_guid()

Calculate the GUID, currently a simple SHA1 hash of the location (URL or command).

Returns:

The GUID.

Return type:

str

get_indexed_location()

Get the job number plus its ‘location’, i.e. the (user_visible) URL or command. Typically used in error displays.

Returns:

The job number followed by a colon and the ‘location’ of the job, i.e. its user_visible_url, URL, or command.

Return type:

str

get_proxy()

Return the correct proxy, depending on whether the URL is http or https.

Return type:

str | None

guid: str = ''
headers = Headers({}, encoding='utf-8')
http_client: Literal['httpx', 'requests', 'curl_cffi'] | None = None
http_credentials: str | None = None
http_version: Literal['v1', 'v2', 'v2tls', 'v2_prior_knowledge', 'v3', 'v3only'] | None = None
ignore_cached: bool | None = None
ignore_connection_errors: bool | None = None
ignore_default_args: bool | str | list[str] | None = None
ignore_dh_key_too_small: bool | None = None
ignore_error(exception)

Determine whether the error of the job should be ignored.

Parameters:

exception (Exception) – The exception.

Returns:

True or the string with the number of the HTTPError code if the error should be ignored, False otherwise.

Return type:

bool
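
For HTTP errors, the decision presumably depends on ignore_http_error_codes, whose type allows a single int, a string, or a list of either. The sketch below shows one way such a value could be matched against a status code. The comma-separated form and the '4xx' class pattern are assumptions of this example, and sketch_should_ignore is a hypothetical helper that always returns a bool.

```python
from typing import List, Optional, Union

Codes = Union[List[Union[int, str]], int, str, None]


def sketch_should_ignore(status_code: int, ignore_http_error_codes: Codes) -> bool:
    """Return True if status_code matches one of the configured codes."""
    if ignore_http_error_codes is None:
        return False
    codes = ignore_http_error_codes
    if isinstance(codes, (int, str)):
        codes = [codes]
    # Normalize to lowercase strings, splitting comma-separated values.
    patterns: List[str] = []
    for item in codes:
        for part in str(item).split(','):
            part = part.strip().lower()
            if part:
                patterns.append(part)
    code = str(status_code)
    # Match either the exact code ('404') or its class ('4xx').
    return code in patterns or (code[0] + 'xx') in patterns
```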

ignore_http_error_codes: list[int | str] | int | str | None = None
ignore_https_errors: bool | None = None
ignore_timeout_errors: bool | None = None
ignore_too_many_redirects: bool | None = None
impersonate: str | None = None
index_number: int = 0
init_script: str | None = None
initialization_js: str | None = None
initialization_url: str | None = None
is_enabled()

Returns whether job is enabled.

Returns:

Whether the job is enabled.

Return type:

bool

is_markdown: bool | None = None
classmethod job_documentation()

Generates simple jobs documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

kind: str | None = None
loop: AbstractEventLoop | None = None
main_thread_enter()

Called from the main thread before running the job. No longer needed (does nothing).

Return type:

None

main_thread_exit()

Called from the main thread after running the job. No longer needed (does nothing).

Return type:

None

static make_guid(name)

Calculate the GUID from a string (currently a simple SHA1).

Returns:

the GUID.

Parameters:

name (str)

Return type:

str

markdown_padded_tables: bool | None = None
max_tries: int | None = None
method: Literal['GET', 'OPTIONS', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE'] | None = None
mime_type: str | None = None
monospace: bool | None = None
name: str | None = None
navigate: str | None = None
no_conditional_request: bool | None = None
no_redirects: bool | None = None
note: str | None = None
params: str | list | dict[str, str] | None = None
pretty_name()

Get the ‘pretty name’ of a job, i.e. either its ‘name’ (if defined) or the ‘location’ (user_visible_url, URL or command).

Returns:

The ‘pretty name’ of the job.

Return type:

str
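
The fallback order described for pretty_name() can be sketched as a chain of `or` expressions. The standalone function below is illustrative only; treating empty strings the same as None is an assumption of this example.

```python
from typing import Optional


def sketch_pretty_name(
    name: Optional[str] = None,
    user_visible_url: Optional[str] = None,
    url: str = '',
    command: str = '',
) -> str:
    """Return the name if defined, otherwise the job's location."""
    location = user_visible_url or url or command
    return name or location
```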

proxy: str | None = None
referer: str | None = None
retries: int | None = None
serialize()

Serialize the Job object, excluding its index_number (e.g. for saving).

Returns:

A dict with the Job object serialized.

Return type:

dict

set_to_monospace()

If unset, sets the monospace flag to True (will not override).

Return type:

None

ssl_no_verify: bool | None = None
stderr: str | None = None
suppress_error_ended: bool | None = None
suppress_errors: bool | None = None
suppress_repeated_errors: bool | None = None
switches: list[str] | None = None
timeout: float | None = None
to_dict()

Return all defined (not None) Job object directives, required and optional, as a serializable dict, converting Headers objects (which are not JSON serializable) to dicts.

Returns:

A dict with all job directives as keys, ignoring those that are extras.

Return type:

dict
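
The filtering and conversion that to_dict() describes can be sketched as below. MappingProxyType stands in for the Headers class, which is not reproduced here; sketch_to_dict and the sample directives are hypothetical.

```python
import json
from collections.abc import Mapping
from types import MappingProxyType
from typing import Any


def sketch_to_dict(directives: Mapping) -> dict:
    """Drop None values and turn non-dict mappings into plain dicts."""
    result: dict = {}
    for key, value in directives.items():
        if value is None:
            continue
        if isinstance(value, Mapping) and not isinstance(value, dict):
            # Stand-in for converting a Headers object to a dict.
            value = dict(value)
        result[key] = value
    return result


directives: dict = {
    'url': 'https://example.com',
    'name': None,
    'headers': MappingProxyType({'Accept': 'text/html'}),
}
serialized = json.dumps(sketch_to_dict(directives), sort_keys=True)
```

Without the conversion step, json.dumps() would raise a TypeError on the mapping object.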

tz: str | None = None
classmethod unserialize(data, filenames=None)

Unserialize a dict with job data (e.g. from the YAML jobs file) into a JobBase type object.

Parameters:
  • data (dict) – The dict with job data (e.g. from the YAML jobs file).

  • filenames (list[Path] | None)

Returns:

A JobBase type object.

Return type:

JobBase

url: str = ''
use_browser: bool | str | None = False
user_data_dir: str | None = None
user_visible_url: str | None = None
validate()

Checks all instance attributes against class type hints.

Return type:

None

wait_for: int | str | None = None
wait_for_function: str | dict[str, str] | None = None
wait_for_navigation: str | tuple[str, ...] | None = None
wait_for_selector: str | dict[str, str] | list[str | dict[str, str]] | None = None
wait_for_timeout: float | None = None
wait_for_url: str | None = None
wait_until: Literal['commit', 'domcontentloaded', 'load', 'networkidle'] | None = None
with_defaults(config)

Obtain a Job object that also contains defaults from the configuration.

Parameters:

config (_Config) – The configuration as a dict.

Returns:

A JobBase object.

Return type:

JobBase
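
A minimal sketch of merging configuration defaults into a job, using plain dicts in place of the Job and _Config objects. It assumes that values set on the job take precedence and that defaults only fill directives that are unset (None); the actual precedence rules are not stated in this reference.

```python
from typing import Any


def sketch_with_defaults(job: dict, defaults: dict) -> dict:
    """Return a copy of job with unset directives filled from defaults."""
    merged: dict = dict(job)
    for key, value in defaults.items():
        if merged.get(key) is None:
            merged[key] = value
    return merged


job_directives: dict = {'url': 'https://example.com', 'timeout': 5.0, 'max_tries': None}
config_defaults: dict = {'timeout': 60.0, 'max_tries': 3}
merged = sketch_with_defaults(job_directives, config_defaults)
```

Returning a copy leaves the original job untouched, so the unmerged directives remain available for serialization.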

exception webchanges.jobs.TransientBrowserError(*args)

Bases: Exception

Raised by BrowserJob when the browser returns a transient error: either a PlaywrightTimeoutError or one of the browser's network errors in the 100-199 range (connection-related errors).

args[0] contains either the string ‘PlaywrightTimeoutError’ or the text of the browser error.

Parameters:

args (object)

Return type:

None

add_note(note, /)

Add a note to the exception

args
with_traceback(tb, /)

Set self.__traceback__ to tb and return self.
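
The classification of browser errors as transient can be sketched as a substring check against known error names. The tuple below lists only a few of the net::ERR_* names from BrowserJob.chromium_connection_errors; the exception class and the helper are hypothetical stand-ins, and matching by substring is an assumption of this example.

```python
# A small subset of connection-related error names, for illustration.
SKETCH_CONNECTION_ERRORS = (
    'net::ERR_CONNECTION_CLOSED',
    'net::ERR_CONNECTION_RESET',
    'net::ERR_CONNECTION_TIMED_OUT',
    'net::ERR_NAME_NOT_RESOLVED',
)


class SketchTransientBrowserError(Exception):
    """Hypothetical stand-in for TransientBrowserError."""


def sketch_raise_if_transient(message: str) -> None:
    """Raise if message mentions a known connection-related error."""
    for name in SKETCH_CONNECTION_ERRORS:
        if name in message:
            # args[0] carries the text of the browser error.
            raise SketchTransientBrowserError(name)
```

A caller can then catch the transient exception separately and treat it as a candidate for a retry rather than a hard failure.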

exception webchanges.jobs.TransientHTTPError(*args, status_code)

Bases: Exception

Raised by subclasses of UrlJobBase when one of these HTTP response status codes is received:

  • 429 Too Many Requests

  • 500 Internal Server Error

  • 502 Bad Gateway

  • 503 Service Unavailable

  • 504 Gateway Timeout

Parameters:
  • args (object)

  • status_code (int)

Return type:

None

status_code: int
add_note(note, /)

Add a note to the exception

args
with_traceback(tb, /)

Set self.__traceback__ to tb and return self.
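
An exception with a keyword-only status_code, as in the signature above, can be sketched like this. The class and helper names are hypothetical; only the list of status codes comes from the description of TransientHTTPError.

```python
# Status codes listed above as transient.
TRANSIENT_STATUS_CODES = frozenset({429, 500, 502, 503, 504})


class SketchTransientHTTPError(Exception):
    """Hypothetical stand-in for TransientHTTPError."""

    def __init__(self, *args: object, status_code: int) -> None:
        super().__init__(*args)
        self.status_code = status_code


def sketch_raise_for_transient(status_code: int, reason: str = '') -> None:
    """Raise SketchTransientHTTPError for a transient status code."""
    if status_code in TRANSIENT_STATUS_CODES:
        message = (str(status_code) + ' ' + reason).strip()
        raise SketchTransientHTTPError(message, status_code=status_code)
```

Keeping status_code as an attribute lets callers branch on the numeric code without parsing the message text.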

class webchanges.jobs.UrlJob(**kwargs)

Bases: UrlJobBase

Retrieve a URL from a web server.

Parameters:

kwargs (Any)

get_location()

Get the ‘location’ of the job, i.e. the (user_visible) URL.

Returns:

The user_visible_url or URL of the job.

Return type:

str

set_base_location(location)

Sets the job’s location (command or url) to location. Used for changing location (uuid).

Parameters:

location (str)

Return type:

None

retrieve(job_state, headless=True)

Runs job to retrieve the data, and returns data, ETag and media type.

Parameters:
  • job_state (JobState) – The JobState object, to keep track of the state of the retrieval.

  • headless (bool) – For browser-based jobs, whether headless mode should be used.

Returns:

The data retrieved, the ETag, and the media type (formerly known as MIME type).

Raises:

NotModifiedError – If an HTTP 304 response is received.

Return type:

tuple[str | bytes, str, str]
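
The shape of the return value and the HTTP 304 case can be sketched without any network access. The helper below is hypothetical and takes an already-received status code, body, and header dict. Reading the media type from Content-Type and the ETag from the ETag header is an assumption of this example, not a description of the actual retrieval code.

```python
from typing import Dict, Tuple, Union


class SketchNotModifiedError(Exception):
    """Hypothetical stand-in for NotModifiedError."""


def sketch_finish_retrieve(
    status_code: int,
    body: Union[str, bytes],
    headers: Dict[str, str],
) -> Tuple[Union[str, bytes], str, str]:
    """Return (data, etag, media type), or raise on HTTP 304."""
    if status_code == 304:
        raise SketchNotModifiedError(status_code)
    etag = headers.get('ETag', '')
    # Drop parameters such as '; charset=utf-8' from the media type.
    media_type = headers.get('Content-Type', '').split(';')[0].strip()
    return body, etag, media_type
```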

format_error(exception, tb)

Format the error of the job if one is encountered.

Parameters:
  • exception (Exception) – The exception.

  • tb (str) – The traceback.format_exc() string.

Returns:

A string to display and/or use in reports.

Return type:

str

ignore_error(exception)

Determine whether the error of the job should be ignored.

Parameters:

exception (Exception) – The exception.

Returns:

True if the error should be ignored, False otherwise.

Return type:

bool

additions_only: bool | float | str | None = None
block_elements: list[str] | None = None
command: str = ''
compared_versions: int | None = None
contextlines: int | None = None
cookies: dict[str, str] | None = None
data: str | list | dict | None = None
data_as_json: bool | None = None
deletions_only: bool | None = None
diff_filters: str | list[str | dict[str, Any]] | None = None
diff_tool: str | None = None
differ: dict[str, Any] | None = None
empty_as_transient: bool | None = None
enabled: bool | None = None
encoding: str | None = None
evaluate: str | None = None
filters: Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | list[Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | dict[str, Any]] | None = None
fingerprints: dict[str, str | dict[str, Any]] | None = None
classmethod from_dict(data, filenames)

Create a JobBase object from a dict, checking that all keys are recognized (i.e. listed in __required__ or __optional__).

Parameters:
  • data (dict) – Job data in dict format (e.g. from the YAML jobs file).

  • filenames (list[Path])

Returns:

A JobBase type object.

Return type:

JobBase

get_fips_guid()

Calculate the GUID as a SHA256 hash of the location (URL or command).

Returns:

the GUID.

Return type:

str

get_guid()

Calculate the GUID, currently a simple SHA1 hash of the location (URL or command).

Returns:

the GUID.

Return type:

str

get_headers(job_state, user_agent='webchanges/3.36.1rc1 (+https://pypi.org/project/webchanges/)', include_cookies=True)

Get headers and modify them to add cookies and conditional request. If headers don’t contain User-Agent, either the default one or the one provided as user_agent is added.

Parameters:
  • job_state (JobState) – The job state.

  • user_agent (str | None) – The user agent string.

  • include_cookies (bool) – Whether to include cookies (from self.cookies) as a Cookie header.

Returns:

The headers.

Return type:

Headers
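
The header handling described for get_headers() can be sketched with plain dicts. The example assumes that the conditional request is made by sending a previously stored ETag as If-None-Match; the function name, its parameters, and the placeholder user agent string are hypothetical.

```python
from typing import Dict, Optional


def sketch_get_headers(
    headers: Dict[str, str],
    cookies: Optional[Dict[str, str]] = None,
    old_etag: str = '',
    user_agent: str = 'sketch-agent/0.0',
    include_cookies: bool = True,
) -> Dict[str, str]:
    """Return a copy of headers with User-Agent, Cookie and ETag added."""
    result = dict(headers)
    # Add a User-Agent only if the job did not define one.
    if not any(key.lower() == 'user-agent' for key in result):
        result['User-Agent'] = user_agent
    if include_cookies and cookies:
        result['Cookie'] = '; '.join(k + '=' + v for k, v in cookies.items())
    if old_etag:
        # Conditional request: the server may answer 304 Not Modified.
        result['If-None-Match'] = old_etag
    return result
```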

get_indexed_location()

Get the job number plus its ‘location’, i.e. the (user_visible) URL or command. Typically used in error displays.

Returns:

The job number followed by a colon and the ‘location’ of the job, i.e. its user_visible_url, URL, or command.

Return type:

str

get_proxy()

Return the correct proxy, depending on whether the URL is http or https.

Return type:

str | None

guid: str = ''
headers = Headers({}, encoding='utf-8')
http_client: Literal['httpx', 'requests', 'curl_cffi'] | None = None
http_credentials: str | None = None
http_version: Literal['v1', 'v2', 'v2tls', 'v2_prior_knowledge', 'v3', 'v3only'] | None = None
ignore_cached: bool | None = None
ignore_connection_errors: bool | None = None
ignore_default_args: bool | str | list[str] | None = None
ignore_dh_key_too_small: bool | None = None
ignore_http_error_codes: list[int | str] | int | str | None = None
ignore_https_errors: bool | None = None
ignore_timeout_errors: bool | None = None
ignore_too_many_redirects: bool | None = None
impersonate: str | None = None
index_number: int = 0
init_script: str | None = None
initialization_js: str | None = None
initialization_url: str | None = None
is_enabled()

Returns whether job is enabled.

Returns:

Whether the job is enabled.

Return type:

bool

is_markdown: bool | None = None
classmethod job_documentation()

Generates simple jobs documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

kind: str | None = None
loop: AbstractEventLoop | None = None
main_thread_enter()

Called from the main thread before running the job. No longer needed (does nothing).

Return type:

None

main_thread_exit()

Called from the main thread after running the job. No longer needed (does nothing).

Return type:

None

static make_guid(name)

Calculate the GUID from a string (currently a simple SHA1).

Returns:

the GUID.

Parameters:

name (str)

Return type:

str

markdown_padded_tables: bool | None = None
max_tries: int | None = None
method: Literal['GET', 'OPTIONS', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE'] | None = None
mime_type: str | None = None
monospace: bool | None = None
name: str | None = None
navigate: str | None = None
no_conditional_request: bool | None = None
no_redirects: bool | None = None
note: str | None = None
params: str | list | dict[str, str] | None = None
pretty_name()

Get the ‘pretty name’ of a job, i.e. either its ‘name’ (if defined) or the ‘location’ (user_visible_url, URL or command).

Returns:

The ‘pretty name’ of the job.

Return type:

str

proxy: str | None = None
referer: str | None = None
retries: int | None = None
serialize()

Serialize the Job object, excluding its index_number (e.g. for saving).

Returns:

A dict with the Job object serialized.

Return type:

dict

set_to_monospace()

If unset, sets the monospace flag to True (will not override).

Return type:

None

ssl_no_verify: bool | None = None
stderr: str | None = None
suppress_error_ended: bool | None = None
suppress_errors: bool | None = None
suppress_repeated_errors: bool | None = None
switches: list[str] | None = None
timeout: float | None = None
to_dict()

Return all defined (not None) Job object directives, required and optional, as a serializable dict, converting Headers objects (which are not JSON serializable) to dicts.

Returns:

A dict with all job directives as keys, ignoring those that are extras.

Return type:

dict

tz: str | None = None
classmethod unserialize(data, filenames=None)

Unserialize a dict with job data (e.g. from the YAML jobs file) into a JobBase type object.

Parameters:
  • data (dict) – The dict with job data (e.g. from the YAML jobs file).

  • filenames (list[Path] | None)

Returns:

A JobBase type object.

Return type:

JobBase

url: str = ''
use_browser: bool | str | None = False
user_data_dir: str | None = None
user_visible_url: str | None = None
validate()

Checks all instance attributes against class type hints.

Return type:

None

wait_for: int | str | None = None
wait_for_function: str | dict[str, str] | None = None
wait_for_navigation: str | tuple[str, ...] | None = None
wait_for_selector: str | dict[str, str] | list[str | dict[str, str]] | None = None
wait_for_timeout: float | None = None
wait_for_url: str | None = None
wait_until: Literal['commit', 'domcontentloaded', 'load', 'networkidle'] | None = None
with_defaults(config)

Obtain a Job object that also contains defaults from the configuration.

Parameters:

config (_Config) – The configuration as a dict.

Returns:

A JobBase object.

Return type:

JobBase

class webchanges.jobs.UrlJobBase(**kwargs)

Bases: Job

The base class for jobs that use the ‘url’ key. Includes UrlJob and BrowserJob.

Parameters:

kwargs (Any)

get_headers(job_state, user_agent='webchanges/3.36.1rc1 (+https://pypi.org/project/webchanges/)', include_cookies=True)

Get headers and modify them to add cookies and conditional request. If headers don’t contain User-Agent, either the default one or the one provided as user_agent is added.

Parameters:
  • job_state (JobState) – The job state.

  • user_agent (str | None) – The user agent string.

  • include_cookies (bool) – Whether to include cookies (from self.cookies) as a Cookie header.

Returns:

The headers.

Return type:

Headers

additions_only: bool | float | str | None = None
block_elements: list[str] | None = None
command: str = ''
compared_versions: int | None = None
contextlines: int | None = None
cookies: dict[str, str] | None = None
data: str | list | dict | None = None
data_as_json: bool | None = None
deletions_only: bool | None = None
diff_filters: str | list[str | dict[str, Any]] | None = None
diff_tool: str | None = None
differ: dict[str, Any] | None = None
empty_as_transient: bool | None = None
enabled: bool | None = None
encoding: str | None = None
evaluate: str | None = None
filters: Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | list[Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | dict[str, Any]] | None = None
fingerprints: dict[str, str | dict[str, Any]] | None = None
format_error(exception, tb)

Format the error of the job if one is encountered.

Parameters:
  • exception (Exception) – The exception.

  • tb (str) – The traceback.format_exc() string.

Returns:

A string to display and/or use in reports.

Return type:

str

classmethod from_dict(data, filenames)

Create a JobBase object from a dict, checking that all keys are recognized (i.e. listed in __required__ or __optional__).

Parameters:
  • data (dict) – Job data in dict format (e.g. from the YAML jobs file).

  • filenames (list[Path])

Returns:

A JobBase type object.

Return type:

JobBase

get_fips_guid()

Calculate the GUID as a SHA256 hash of the location (URL or command).

Returns:

the GUID.

Return type:

str

get_guid()

Calculate the GUID, currently a simple SHA1 hash of the location (URL or command).

Returns:

the GUID.

Return type:

str

get_indexed_location()

Get the job number plus its ‘location’, i.e. the (user_visible) URL or command. Typically used in error displays.

Returns:

The job number followed by a colon and the ‘location’ of the job, i.e. its user_visible_url, URL, or command.

Return type:

str

get_location()

Get the ‘location’ of the job, i.e. the (user_visible) URL or command.

Returns:

The user_visible_url, the URL, or the command of the job.

Return type:

str

get_proxy()

Return the correct proxy, depending on whether the URL is http or https.

Return type:

str | None

guid: str = ''
headers = Headers({}, encoding='utf-8')
http_client: Literal['httpx', 'requests', 'curl_cffi'] | None = None
http_credentials: str | None = None
http_version: Literal['v1', 'v2', 'v2tls', 'v2_prior_knowledge', 'v3', 'v3only'] | None = None
ignore_cached: bool | None = None
ignore_connection_errors: bool | None = None
ignore_default_args: bool | str | list[str] | None = None
ignore_dh_key_too_small: bool | None = None
ignore_error(exception)

Determine whether the error of the job should be ignored.

Parameters:

exception (Exception) – The exception.

Returns:

True or the string with the number of the HTTPError code if the error should be ignored, False otherwise.

Return type:

bool

ignore_http_error_codes: list[int | str] | int | str | None = None
ignore_https_errors: bool | None = None
ignore_timeout_errors: bool | None = None
ignore_too_many_redirects: bool | None = None
impersonate: str | None = None
index_number: int = 0
init_script: str | None = None
initialization_js: str | None = None
initialization_url: str | None = None
is_enabled()

Returns whether job is enabled.

Returns:

Whether the job is enabled.

Return type:

bool

is_markdown: bool | None = None
classmethod job_documentation()

Generates simple jobs documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

kind: str | None = None
loop: AbstractEventLoop | None = None
main_thread_enter()

Called from the main thread before running the job. No longer needed (does nothing).

Return type:

None

main_thread_exit()

Called from the main thread after running the job. No longer needed (does nothing).

Return type:

None

static make_guid(name)

Calculate the GUID from a string (currently a simple SHA1).

Returns:

the GUID.

Parameters:

name (str)

Return type:

str

markdown_padded_tables: bool | None = None
max_tries: int | None = None
method: Literal['GET', 'OPTIONS', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE'] | None = None
mime_type: str | None = None
monospace: bool | None = None
name: str | None = None
navigate: str | None = None
no_conditional_request: bool | None = None
no_redirects: bool | None = None
note: str | None = None
params: str | list | dict[str, str] | None = None
pretty_name()

Get the ‘pretty name’ of a job, i.e. either its ‘name’ (if defined) or the ‘location’ (user_visible_url, URL or command).

Returns:

The ‘pretty name’ of the job.

Return type:

str

proxy: str | None = None
referer: str | None = None
retries: int | None = None
retrieve(job_state, headless=True)

Runs job to retrieve the data, and returns data, ETag and MIME type.

Parameters:
  • job_state (JobState) – The JobState object, to keep track of the state of the retrieval.

  • headless (bool) – For browser-based jobs, whether headless mode should be used.

Returns:

The data retrieved, the ETag, and the mime_type.

Return type:

tuple[str | bytes, str, str]

serialize()

Serialize the Job object, excluding its index_number (e.g. for saving).

Returns:

A dict with the Job object serialized.

Return type:

dict

set_base_location(location)

Sets the job’s location (command or url) to location. Used for changing location (uuid).

Parameters:

location (str)

Return type:

None

set_to_monospace()

If unset, sets the monospace flag to True (will not override).

Return type:

None

ssl_no_verify: bool | None = None
stderr: str | None = None
suppress_error_ended: bool | None = None
suppress_errors: bool | None = None
suppress_repeated_errors: bool | None = None
switches: list[str] | None = None
timeout: float | None = None
to_dict()

Return all defined (not None) Job object directives, required and optional, as a serializable dict, converting Headers objects (which are not JSON serializable) to dicts.

Returns:

A dict with all job directives as keys, ignoring those that are extras.

Return type:

dict

tz: str | None = None
classmethod unserialize(data, filenames=None)

Unserialize a dict with job data (e.g. from the YAML jobs file) into a JobBase type object.

Parameters:
  • data (dict) – The dict with job data (e.g. from the YAML jobs file).

  • filenames (list[Path] | None)

Returns:

A JobBase type object.

Return type:

JobBase

url: str = ''
use_browser: bool | str | None = False
user_data_dir: str | None = None
user_visible_url: str | None = None
validate()

Checks all instance attributes against class type hints.

Return type:

None

wait_for: int | str | None = None
wait_for_function: str | dict[str, str] | None = None
wait_for_navigation: str | tuple[str, ...] | None = None
wait_for_selector: str | dict[str, str] | list[str | dict[str, str]] | None = None
wait_for_timeout: float | None = None
wait_for_url: str | None = None
wait_until: Literal['commit', 'domcontentloaded', 'load', 'networkidle'] | None = None
with_defaults(config)

Obtain a Job object that also contains defaults from the configuration.

Parameters:

config (_Config) – The configuration as a dict.

Returns:

A JobBase object.

Return type:

JobBase

webchanges.jobs.represent_headers(dumper, data)
Parameters:
  • dumper (SafeDumper)

  • data (Headers)

Return type:

MappingNode