webchanges.jobs module
Jobs.
- class webchanges.jobs.BrowserJob(**kwargs)
Bases:
UrlJobBase
Retrieve a URL using a real web browser (use_browser: true).
- Parameters:
kwargs (Any)
- use_browser: bool | str | None = True
- proxy_username: str = ''
- proxy_password: str = ''
- chromium_connection_errors = ('net::ERR_CONNECTION_CLOSED', 'net::ERR_CONNECTION_RESET', 'net::ERR_CONNECTION_REFUSED', 'net::ERR_CONNECTION_ABORTED', 'net::ERR_CONNECTION_FAILED', 'net::ERR_NAME_NOT_RESOLVED', 'net::ERR_INTERNET_DISCONNECTED', 'net::ERR_SSL_PROTOCOL_ERROR', 'net::ERR_ADDRESS_INVALID', 'net::ERR_ADDRESS_UNREACHABLE', 'net::ERR_SSL_CLIENT_AUTH_CERT_NEEDED', 'net::ERR_TUNNEL_CONNECTION_FAILED', 'net::ERR_NO_SSL_VERSIONS_ENABLED', 'net::ERR_SSL_VERSION_OR_CIPHER_MISMATCH', 'net::ERR_SSL_RENEGOTIATION_REQUESTED', 'net::ERR_PROXY_AUTH_UNSUPPORTED', 'net::ERR_CERT_ERROR_IN_SSL_RENEGOTIATION', 'net::ERR_BAD_SSL_CLIENT_AUTH_CERT', 'net::ERR_CONNECTION_TIMED_OUT', 'net::ERR_HOST_RESOLVER_QUEUE_TOO_LARGE', 'net::ERR_SOCKS_CONNECTION_FAILED', 'net::ERR_SOCKS_CONNECTION_HOST_UNREACHABLE', 'net::ERR_ALPN_NEGOTIATION_FAILED', 'net::ERR_SSL_NO_RENEGOTIATION', 'net::ERR_WINSOCK_UNEXPECTED_WRITTEN_BYTES', 'net::ERR_SSL_DECOMPRESSION_FAILURE_ALERT', 'net::ERR_SSL_BAD_RECORD_MAC_ALERT', 'net::ERR_PROXY_AUTH_REQUESTED', 'net::ERR_PROXY_CONNECTION_FAILED', 'net::ERR_MANDATORY_PROXY_CONFIGURATION_FAILED', 'net::ERR_PRECONNECT_MAX_SOCKET_LIMIT', 'net::ERR_SSL_CLIENT_AUTH_PRIVATE_KEY_ACCESS_DENIED', 'net::ERR_SSL_CLIENT_AUTH_CERT_NO_PRIVATE_KEY', 'net::ERR_PROXY_CERTIFICATE_INVALID', 'net::ERR_NAME_RESOLUTION_FAILED', 'net::ERR_NETWORK_ACCESS_DENIED', 'net::ERR_TEMPORARILY_THROTTLED', 'net::ERR_SSL_CLIENT_AUTH_SIGNATURE_FAILED', 'net::ERR_MSG_TOO_BIG', 'net::ERR_WS_PROTOCOL_ERROR', 'net::ERR_ADDRESS_IN_USE', 'net::ERR_SSL_PINNED_KEY_NOT_IN_CERT_CHAIN', 'net::ERR_CLIENT_AUTH_CERT_TYPE_UNSUPPORTED', 'net::ERR_SSL_DECRYPT_ERROR_ALERT', 'net::ERR_WS_THROTTLE_QUEUE_TOO_LARGE', 'net::ERR_SSL_SERVER_CERT_CHANGED', 'net::ERR_SSL_UNRECOGNIZED_NAME_ALERT', 'net::ERR_SOCKET_SET_RECEIVE_BUFFER_SIZE_ERROR', 'net::ERR_SOCKET_SET_SEND_BUFFER_SIZE_ERROR', 'net::ERR_SOCKET_RECEIVE_BUFFER_SIZE_UNCHANGEABLE', 'net::ERR_SOCKET_SEND_BUFFER_SIZE_UNCHANGEABLE', 
'net::ERR_SSL_CLIENT_AUTH_CERT_BAD_FORMAT', 'net::ERR_ICANN_NAME_COLLISION', 'net::ERR_SSL_SERVER_CERT_BAD_FORMAT', 'net::ERR_CT_STH_PARSING_FAILED', 'net::ERR_CT_STH_INCOMPLETE', 'net::ERR_UNABLE_TO_REUSE_CONNECTION_FOR_PROXY_AUTH', 'net::ERR_CT_CONSISTENCY_PROOF_PARSING_FAILED', 'net::ERR_SSL_OBSOLETE_CIPHER', 'net::ERR_WS_UPGRADE', 'net::ERR_READ_IF_READY_NOT_IMPLEMENTED', 'net::ERR_NO_BUFFER_SPACE', 'net::ERR_SSL_CLIENT_AUTH_NO_COMMON_ALGORITHMS', 'net::ERR_EARLY_DATA_REJECTED', 'net::ERR_WRONG_VERSION_ON_EARLY_DATA', 'net::ERR_TLS13_DOWNGRADE_DETECTED', 'net::ERR_SSL_KEY_USAGE_INCOMPATIBLE', 'net::ERR_INVALID_ECH_CONFIG_LIST', 'net::ERR_ECH_NOT_NEGOTIATEDnet::ERR_ECH_FALLBACK_CERTIFICATE_INVALID', 'net::ERR_PROXY_UNABLE_TO_CONNECT_TO_DESTINATIONnet::ERR_PROXY_DELEGATE_CANCELED_CONNECT_REQUESTnet::ERR_PROXY_DELEGATE_CANCELED_CONNECT_RESPONSE')
- get_location()
Get the ‘location’ of the job, i.e. the (user_visible) URL.
- Returns:
The user_visible_url or URL of the job.
- Return type:
str
- set_base_location(location)
Sets the job’s location (command or url) to location. Used for changing location (uuid).
- Parameters:
location (str)
- Return type:
None
- static get_user_agent_platform()
- Return type:
str
- retrieve(job_state, headless=True, response_handler=None, content_handler=None, return_data=None)
Runs the job to retrieve the data, and returns the data, the ETag, and the mime_type.
- Parameters:
job_state (JobState) – The JobState object, to keep track of the state of the retrieval.
headless (bool) – For browser-based jobs, whether headless mode should be used.
response_handler (Callable[[Page, str, Literal['commit', 'domcontentloaded', 'load', 'networkidle'] | None, str | None], Response] | None)
content_handler (Callable[[Page], tuple[str | bytes, str, str]] | None)
return_data (Callable[[Page, str, Literal['commit', 'domcontentloaded', 'load', 'networkidle'] | None, str | None], tuple[str | bytes, str, str]] | None)
- Raises:
ValueError – If there is a problem with the value supplied in one of the keys in the configuration file.
TypeError – If the value provided in one of the directives is not of the correct type.
ImportError – If the playwright package is not installed.
BrowserResponseError – If a browser error or an HTTP response code between 400 and 599 is received.
- Returns:
The data retrieved, the ETag, and the mime_type.
- Return type:
tuple[str | bytes, str, str]
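As a rough illustration of the content_handler signature above (Callable[[Page], tuple[str | bytes, str, str]]), the sketch below returns the page HTML with a blank ETag and a fixed mime type. It assumes the tuple order is data, ETag, mime_type. PageLike, FakePage and html_content_handler are hypothetical names used only so the snippet runs without Playwright; a real Playwright Page also exposes content().

```python
from typing import Protocol


class PageLike(Protocol):
    """Minimal shape of the page object this sketch relies on (hypothetical)."""

    def content(self) -> str: ...


def html_content_handler(page: PageLike) -> tuple[str, str, str]:
    """Return (data, etag, mime_type) for the rendered page."""
    data = page.content()  # serialized HTML of the current document
    # A blank ETag and 'text/html' are assumptions of this sketch.
    return data, '', 'text/html'


class FakePage:
    """Stand-in for a browser page, for demonstration only."""

    def content(self) -> str:
        return '<html><body>Hello</body></html>'


data, etag, mime_type = html_content_handler(FakePage())
print(mime_type)
```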
- format_error(exception, tb)
Format the error of the job if one is encountered.
- Parameters:
exception (Exception) – The exception.
tb (str) – The traceback.format_exc() string.
- Returns:
A string to display and/or use in reports.
- Return type:
str
- ignore_error(exception)
Determine whether the error of the job should be ignored.
- Parameters:
exception (Exception) – The exception.
- Returns:
True if the error should be ignored, False otherwise.
- Return type:
bool
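One plausible way such a check could combine the ignore_connection_errors directive with chromium_connection_errors is sketched below. This is an assumption for illustration, not the library's implementation; should_ignore and CONNECTION_ERRORS are hypothetical names, and the tuple is only a short excerpt of the full list.

```python
# Short excerpt of the codes listed in chromium_connection_errors.
CONNECTION_ERRORS = (
    'net::ERR_CONNECTION_CLOSED',
    'net::ERR_CONNECTION_RESET',
    'net::ERR_CONNECTION_REFUSED',
    'net::ERR_NAME_NOT_RESOLVED',
)


def should_ignore(exception: Exception, ignore_connection_errors: bool) -> bool:
    """Return True if the flag is set and the error text names a connection error."""
    if not ignore_connection_errors:
        return False
    message = str(exception)
    # Substring match, since browser errors usually embed the code in a longer message.
    return any(code in message for code in CONNECTION_ERRORS)


error = RuntimeError('Page.goto: net::ERR_CONNECTION_RESET at https://example.com')
print(should_ignore(error, ignore_connection_errors=True))
```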
- additions_only: bool | float | str | None = None
- block_elements: list[str] | None = None
- command: str = ''
- compared_versions: int | None = None
- contextlines: int | None = None
- cookies: dict[str, str] | None = None
- data: str | list | dict | None = None
- data_as_json: bool | None = None
- deletions_only: bool | None = None
- diff_filters: str | list[str | dict[str, Any]] | None = None
- diff_tool: str | None = None
- differ: dict[str, Any] | None = None
- empty_as_transient: bool | None = None
- enabled: bool | None = None
- encoding: str | None = None
- evaluate: str | None = None
- filters: Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | list[Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | dict[str, Any]] | None = None
- fingerprints: dict[str, str | dict[str, Any]] | None = None
- classmethod from_dict(data, filenames)
Create a JobBase class from a dict, checking that all keys are recognized (i.e. listed in __required__ or __optional__).
- Parameters:
data (dict) – Job data in dict format (e.g. from the YAML jobs file).
filenames (list[Path])
- Returns:
A JobBase type object.
- Return type:
JobBase
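The key check described for from_dict can be pictured with a toy class. This is a minimal sketch under stated assumptions: ToyJob, its directive lists, and the use of ValueError are all hypothetical and stand in for the real job classes.

```python
class ToyJob:
    """Hypothetical job class with declared required and optional keys."""

    __required__ = ('url',)
    __optional__ = ('name', 'timeout')

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

    @classmethod
    def from_dict(cls, data: dict) -> 'ToyJob':
        """Build a job from a dict, rejecting keys that are not declared."""
        allowed = set(cls.__required__) | set(cls.__optional__)
        unknown = sorted(set(data) - allowed)
        if unknown:
            raise ValueError(f"Unrecognized directive(s): {', '.join(unknown)}")
        missing = [key for key in cls.__required__ if key not in data]
        if missing:
            raise ValueError(f"Missing directive(s): {', '.join(missing)}")
        return cls(**data)


job = ToyJob.from_dict({'url': 'https://example.com', 'name': 'Example'})
print(job.name)
```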
- get_fips_guid()
Calculate the GUID as a SHA256 hash of the location (URL or command).
- Returns:
the GUID.
- Return type:
str
- get_guid()
Calculate the GUID, currently a simple SHA1 hash of the location (URL or command).
- Returns:
the GUID.
- Return type:
str
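The two GUID flavours described above (SHA1 and SHA256 of the location) can be sketched with the standard library. The UTF-8 encoding and the function names sha1_guid and sha256_guid are assumptions for illustration, not the library's own code.

```python
import hashlib


def sha1_guid(location: str) -> str:
    """Hex SHA1 digest of the location (URL or command)."""
    return hashlib.sha1(location.encode('utf-8')).hexdigest()


def sha256_guid(location: str) -> str:
    """Hex SHA256 digest of the location, for FIPS-style environments."""
    return hashlib.sha256(location.encode('utf-8')).hexdigest()


location = 'https://example.com/'
print(len(sha1_guid(location)), len(sha256_guid(location)))
```

Because the digest depends only on the location string, the same URL or command always maps to the same GUID.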
- get_headers(job_state, user_agent='webchanges/3.36.1 (+https://pypi.org/project/webchanges/)', include_cookies=True)
Get headers and modify them to add cookies and conditional request. If headers don’t contain User-Agent, either the default one or the one provided as user_agent is added.
- Parameters:
job_state (JobState) – The job state.
user_agent (str | None) – The user agent string.
include_cookies (bool) – Whether to include cookies (from self.cookies) as a Cookie header.
- Returns:
The headers.
- Return type:
Headers
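The header handling described for get_headers can be approximated with plain dicts. This is only a sketch: build_headers and its parameters are hypothetical, and using If-None-Match with a previously stored ETag is an assumption about how the conditional request is formed.

```python
from __future__ import annotations


def build_headers(
    headers: dict[str, str],
    cookies: dict[str, str] | None,
    previous_etag: str | None,
    user_agent: str = 'example-agent/1.0',
    include_cookies: bool = True,
) -> dict[str, str]:
    """Return a copy of headers with User-Agent, Cookie and conditional fields added."""
    result = dict(headers)
    # Add the default User-Agent only if the caller did not supply one.
    if not any(key.lower() == 'user-agent' for key in result):
        result['User-Agent'] = user_agent
    if include_cookies and cookies:
        result['Cookie'] = '; '.join(f'{k}={v}' for k, v in cookies.items())
    if previous_etag:
        # Ask the server to reply 304 if the content is unchanged.
        result['If-None-Match'] = previous_etag
    return result


built = build_headers({'Accept': 'text/html'}, {'session': 'abc'}, '"v1"')
print(sorted(built))
```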
- get_indexed_location()
Get the job number plus its ‘location’, i.e. the (user_visible) URL or command. Typically used in error displays.
- Returns:
The job number followed by a colon and the ‘location’ of the job, i.e. its user_visible_url, URL, or command.
- Return type:
str
- get_proxy()
Return the correct proxy, depending on whether the URL is http or https.
- Return type:
str | None
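Choosing a proxy by URL scheme, as get_proxy is described to do, might look like the sketch below. Where the two candidate proxies come from is not stated here, so pick_proxy and its http_proxy and https_proxy parameters are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def pick_proxy(url: str, http_proxy: str | None, https_proxy: str | None) -> str | None:
    """Return the proxy matching the scheme of url, or None if there is none."""
    scheme = urlsplit(url).scheme.lower()
    if scheme == 'https':
        return https_proxy
    if scheme == 'http':
        return http_proxy
    return None


print(pick_proxy('https://example.com', 'http://proxy:3128', 'http://secure-proxy:3129'))
```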
- guid: str = ''
- headers = Headers({}, encoding='utf-8')
- http_client: Literal['httpx', 'requests', 'curl_cffi'] | None = None
- http_credentials: str | None = None
- http_version: Literal['v1', 'v2', 'v2tls', 'v2_prior_knowledge', 'v3', 'v3only'] | None = None
- ignore_cached: bool | None = None
- ignore_connection_errors: bool | None = None
- ignore_default_args: bool | str | list[str] | None = None
- ignore_dh_key_too_small: bool | None = None
- ignore_http_error_codes: list[int | str] | int | str | None = None
- ignore_https_errors: bool | None = None
- ignore_timeout_errors: bool | None = None
- ignore_too_many_redirects: bool | None = None
- impersonate: str | None = None
- index_number: int = 0
- init_script: str | None = None
- initialization_js: str | None = None
- initialization_url: str | None = None
- is_enabled()
Returns whether job is enabled.
- Returns:
Whether the job is enabled.
- Return type:
bool
- is_markdown: bool | None = None
- classmethod job_documentation()
Generates simple jobs documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- kind: str | None = None
- loop: AbstractEventLoop | None = None
- main_thread_enter()
Called from the main thread before running the job. No longer needed (does nothing).
- Return type:
None
- main_thread_exit()
Called from the main thread after running the job. No longer needed (does nothing).
- Return type:
None
- static make_guid(name)
Calculate the GUID from a string (currently a simple SHA1).
- Returns:
the GUID.
- Parameters:
name (str)
- Return type:
str
- markdown_padded_tables: bool | None = None
- max_tries: int | None = None
- method: Literal['GET', 'OPTIONS', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE'] | None = None
- mime_type: str | None = None
- monospace: bool | None = None
- name: str | None = None
- navigate: str | None = None
- no_conditional_request: bool | None = None
- no_redirects: bool | None = None
- note: str | None = None
- params: str | list | dict[str, str] | None = None
- pretty_name()
Get the ‘pretty name’ of a job, i.e. either its ‘name’ (if defined) or the ‘location’ (user_visible_url, URL or command).
- Returns:
The ‘pretty name’ of the job.
- Return type:
str
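The fallback order described for the ‘location’ and ‘pretty name’ can be expressed in a few lines. This is a sketch using free functions with hypothetical names; the exact text returned by get_indexed_location (here "Job N: location") is an assumption.

```python
from __future__ import annotations


def location_of(user_visible_url: str | None, url: str, command: str) -> str:
    """Prefer the user-visible URL, then the URL, then the command."""
    return user_visible_url or url or command


def pretty_name_of(name: str | None, user_visible_url: str | None, url: str, command: str) -> str:
    """Use the name if defined, otherwise fall back to the location."""
    return name or location_of(user_visible_url, url, command)


def indexed_location_of(index_number: int, location: str) -> str:
    """Job number, a colon, then the location."""
    return f'Job {index_number}: {location}'


print(pretty_name_of(None, None, 'https://example.com', ''))
```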
- proxy: str | None = None
- referer: str | None = None
- retries: int | None = None
- serialize()
Serialize the Job object, excluding its index_number (e.g. for saving).
- Returns:
A dict with the Job object serialized.
- Return type:
dict
- set_to_monospace()
If unset, sets the monospace flag to True (will not override).
- Return type:
None
- ssl_no_verify: bool | None = None
- stderr: str | None = None
- suppress_error_ended: bool | None = None
- suppress_errors: bool | None = None
- suppress_repeated_errors: bool | None = None
- switches: list[str] | None = None
- timeout: float | None = None
- to_dict()
Return all defined (not None) Job object directives, required and optional, as a serializable dict, converting Headers objects (which are not JSON serializable) to dicts.
- Returns:
A dict with all job directives as keys, ignoring those that are extras.
- Return type:
dict
- tz: str | None = None
- classmethod unserialize(data, filenames=None)
Unserialize a dict with job data (e.g. from the YAML jobs file) into a JobBase type object.
- Parameters:
data (dict) – The dict with job data (e.g. from the YAML jobs file).
filenames (list[Path] | None)
- Returns:
A JobBase type object.
- Return type:
JobBase
- url: str = ''
- user_data_dir: str | None = None
- user_visible_url: str | None = None
- validate()
Checks all instance attributes against class type hints.
- Return type:
None
- wait_for: int | str | None = None
- wait_for_function: str | dict[str, str] | None = None
- wait_for_navigation: str | tuple[str, ...] | None = None
- wait_for_selector: str | dict[str, str] | list[str | dict[str, str]] | None = None
- wait_for_timeout: float | None = None
- wait_for_url: str | None = None
- wait_until: Literal['commit', 'domcontentloaded', 'load', 'networkidle'] | None = None
- with_defaults(config)
Obtain a Job object that also contains defaults from the configuration.
- Parameters:
config (_Config) – The configuration as a dict.
- Returns:
A JobBase object.
- Return type:
JobBase
- exception webchanges.jobs.BrowserResponseError(args, status_code=None)
Bases:
Exception
Raised by ‘url’ jobs with ‘use_browser: true’ (i.e. using Playwright) when an HTTP error response status code is received and is not one of the other Exceptions.
- Parameters:
args (tuple[Any, ...]) – Tuple with the underlying error args, typically a string with the error text.
status_code (int | None) – The HTTP status code received.
- Return type:
None
- add_note(note, /)
Add a note to the exception
- args
- with_traceback(tb, /)
Set self.__traceback__ to tb and return self.
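To show how the documented signature (args, status_code=None) might be consumed, the sketch below defines a local stand-in with the same shape and formats it for display. The stand-in class body and the describe helper are hypothetical; in real code the exception would be imported from webchanges.jobs.

```python
from __future__ import annotations


class BrowserResponseError(Exception):
    """Local stand-in mirroring the documented constructor signature."""

    def __init__(self, args: tuple, status_code: int | None = None) -> None:
        Exception.__init__(self, *args)
        self.status_code = status_code


def describe(error: BrowserResponseError) -> str:
    """Build a short message from the error text and optional status code."""
    text = str(error.args[0]) if error.args else ''
    if error.status_code is not None:
        return f'HTTP {error.status_code}: {text}'
    return text


try:
    raise BrowserResponseError(('Not Found',), 404)
except BrowserResponseError as exc:
    message = describe(exc)
print(message)
```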
- class webchanges.jobs.Job(**kwargs)
Bases:
JobBase
Job class for jobs.
- Parameters:
kwargs (Any)
- get_location()
Get the ‘location’ of the job, i.e. the (user_visible) URL or command.
- Returns:
The user_visible_url, the URL, or the command of the job.
- Return type:
str
- get_indexed_location()
Get the job number plus its ‘location’, i.e. the (user_visible) URL or command. Typically used in error displays.
- Returns:
The job number followed by a colon and the ‘location’ of the job, i.e. its user_visible_url, URL, or command.
- Return type:
str
- pretty_name()
Get the ‘pretty name’ of a job, i.e. either its ‘name’ (if defined) or the ‘location’ (user_visible_url, URL or command).
- Returns:
The ‘pretty name’ of the job.
- Return type:
str
- retrieve(job_state, headless=True)
Runs the job to retrieve the data, and returns the data, the ETag, and the mime_type.
- Parameters:
job_state (JobState) – The JobState object, to keep track of the state of the retrieval.
headless (bool) – For browser-based jobs, whether headless mode should be used.
- Returns:
The data retrieved, the ETag, and the mime_type.
- Return type:
tuple[str | bytes, str, str]
- additions_only: bool | float | str | None = None
- block_elements: list[str] | None = None
- command: str = ''
- compared_versions: int | None = None
- contextlines: int | None = None
- cookies: dict[str, str] | None = None
- data: str | list | dict | None = None
- data_as_json: bool | None = None
- deletions_only: bool | None = None
- diff_filters: str | list[str | dict[str, Any]] | None = None
- diff_tool: str | None = None
- differ: dict[str, Any] | None = None
- empty_as_transient: bool | None = None
- enabled: bool | None = None
- encoding: str | None = None
- evaluate: str | None = None
- filters: Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | list[Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | dict[str, Any]] | None = None
- fingerprints: dict[str, str | dict[str, Any]] | None = None
- format_error(exception, tb)
Format the error of the job if one is encountered.
- Parameters:
exception (Exception) – The exception.
tb (str) – The traceback.format_exc() string.
- Returns:
A string to display and/or use in reports.
- Return type:
str
- classmethod from_dict(data, filenames)
Create a JobBase class from a dict, checking that all keys are recognized (i.e. listed in __required__ or __optional__).
- Parameters:
data (dict) – Job data in dict format (e.g. from the YAML jobs file).
filenames (list[Path])
- Returns:
A JobBase type object.
- Return type:
JobBase
- get_fips_guid()
Calculate the GUID as a SHA256 hash of the location (URL or command).
- Returns:
the GUID.
- Return type:
str
- get_guid()
Calculate the GUID, currently a simple SHA1 hash of the location (URL or command).
- Returns:
the GUID.
- Return type:
str
- get_proxy()
Return the correct proxy, depending on whether the URL is http or https.
- Return type:
str | None
- guid: str = ''
- headers = Headers({}, encoding='utf-8')
- http_client: Literal['httpx', 'requests', 'curl_cffi'] | None = None
- http_credentials: str | None = None
- http_version: Literal['v1', 'v2', 'v2tls', 'v2_prior_knowledge', 'v3', 'v3only'] | None = None
- ignore_cached: bool | None = None
- ignore_connection_errors: bool | None = None
- ignore_default_args: bool | str | list[str] | None = None
- ignore_dh_key_too_small: bool | None = None
- ignore_error(exception)
Determine whether the error of the job should be ignored.
- Parameters:
exception (Exception) – The exception.
- Returns:
True or the string with the number of the HTTPError code if the error should be ignored, False otherwise.
- Return type:
bool
- ignore_http_error_codes: list[int | str] | int | str | None = None
- ignore_https_errors: bool | None = None
- ignore_timeout_errors: bool | None = None
- ignore_too_many_redirects: bool | None = None
- impersonate: str | None = None
- index_number: int = 0
- init_script: str | None = None
- initialization_js: str | None = None
- initialization_url: str | None = None
- is_enabled()
Returns whether job is enabled.
- Returns:
Whether the job is enabled.
- Return type:
bool
- is_markdown: bool | None = None
- classmethod job_documentation()
Generates simple jobs documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- kind: str | None = None
- loop: AbstractEventLoop | None = None
- main_thread_enter()
Called from the main thread before running the job. No longer needed (does nothing).
- Return type:
None
- main_thread_exit()
Called from the main thread after running the job. No longer needed (does nothing).
- Return type:
None
- static make_guid(name)
Calculate the GUID from a string (currently a simple SHA1).
- Returns:
the GUID.
- Parameters:
name (str)
- Return type:
str
- markdown_padded_tables: bool | None = None
- max_tries: int | None = None
- method: Literal['GET', 'OPTIONS', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE'] | None = None
- mime_type: str | None = None
- monospace: bool | None = None
- name: str | None = None
- navigate: str | None = None
- no_conditional_request: bool | None = None
- no_redirects: bool | None = None
- note: str | None = None
- params: str | list | dict[str, str] | None = None
- proxy: str | None = None
- referer: str | None = None
- retries: int | None = None
- serialize()
Serialize the Job object, excluding its index_number (e.g. for saving).
- Returns:
A dict with the Job object serialized.
- Return type:
dict
- set_base_location(location)
Sets the job’s location (command or url) to location. Used for changing location (uuid).
- Parameters:
location (str)
- Return type:
None
- set_to_monospace()
If unset, sets the monospace flag to True (will not override).
- Return type:
None
- ssl_no_verify: bool | None = None
- stderr: str | None = None
- suppress_error_ended: bool | None = None
- suppress_errors: bool | None = None
- suppress_repeated_errors: bool | None = None
- switches: list[str] | None = None
- timeout: float | None = None
- to_dict()
Return all defined (not None) Job object directives, required and optional, as a serializable dict, converting Headers objects (which are not JSON serializable) to dicts.
- Returns:
A dict with all job directives as keys, ignoring those that are extras.
- Return type:
dict
- tz: str | None = None
- classmethod unserialize(data, filenames=None)
Unserialize a dict with job data (e.g. from the YAML jobs file) into a JobBase type object.
- Parameters:
data (dict) – The dict with job data (e.g. from the YAML jobs file).
filenames (list[Path] | None)
- Returns:
A JobBase type object.
- Return type:
JobBase
- url: str = ''
- use_browser: bool | str | None = False
- user_data_dir: str | None = None
- user_visible_url: str | None = None
- validate()
Checks all instance attributes against class type hints.
- Return type:
None
- wait_for: int | str | None = None
- wait_for_function: str | dict[str, str] | None = None
- wait_for_navigation: str | tuple[str, ...] | None = None
- wait_for_selector: str | dict[str, str] | list[str | dict[str, str]] | None = None
- wait_for_timeout: float | None = None
- wait_for_url: str | None = None
- wait_until: Literal['commit', 'domcontentloaded', 'load', 'networkidle'] | None = None
- with_defaults(config)
Obtain a Job object that also contains defaults from the configuration.
- Parameters:
config (_Config) – The configuration as a dict.
- Returns:
A JobBase object.
- Return type:
JobBase
- class webchanges.jobs.JobBase(**kwargs)
Bases:
object
The base class for Jobs.
- Parameters:
kwargs (Any)
- index_number: int = 0
- url: str = ''
- command: str = ''
- use_browser: bool | str | None = False
- additions_only: bool | float | str | None = None
- block_elements: list[str] | None = None
- compared_versions: int | None = None
- contextlines: int | None = None
- cookies: dict[str, str] | None = None
- data: str | list | dict | None = None
- data_as_json: bool | None = None
- deletions_only: bool | None = None
- differ: dict[str, Any] | None = None
- diff_filters: str | list[str | dict[str, Any]] | None = None
- diff_tool: str | None = None
- empty_as_transient: bool | None = None
- enabled: bool | None = None
- encoding: str | None = None
- evaluate: str | None = None
- filters: Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | list[Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | dict[str, Any]] | None = None
- fingerprints: dict[str, str | dict[str, Any]] | None = None
- guid: str = ''
- headers = Headers({}, encoding='utf-8')
- http_client: Literal['httpx', 'requests', 'curl_cffi'] | None = None
- http_version: Literal['v1', 'v2', 'v2tls', 'v2_prior_knowledge', 'v3', 'v3only'] | None = None
- http_credentials: str | None = None
- ignore_cached: bool | None = None
- ignore_connection_errors: bool | None = None
- ignore_default_args: bool | str | list[str] | None = None
- ignore_dh_key_too_small: bool | None = None
- ignore_http_error_codes: list[int | str] | int | str | None = None
- ignore_https_errors: bool | None = None
- ignore_timeout_errors: bool | None = None
- ignore_too_many_redirects: bool | None = None
- impersonate: str | None = None
- init_script: str | None = None
- initialization_js: str | None = None
- initialization_url: str | None = None
- is_markdown: bool | None = None
- kind: str | None = None
- loop: AbstractEventLoop | None = None
- markdown_padded_tables: bool | None = None
- max_tries: int | None = None
- method: Literal['GET', 'OPTIONS', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE'] | None = None
- mime_type: str | None = None
- monospace: bool | None = None
- name: str | None = None
- navigate: str | None = None
- no_conditional_request: bool | None = None
- no_redirects: bool | None = None
- note: str | None = None
- params: str | list | dict[str, str] | None = None
- proxy: str | None = None
- referer: str | None = None
- retries: int | None = None
- ssl_no_verify: bool | None = None
- stderr: str | None = None
- suppress_error_ended: bool | None = None
- suppress_errors: bool | None = None
- suppress_repeated_errors: bool | None = None
- switches: list[str] | None = None
- timeout: float | None = None
- tz: str | None = None
- user_data_dir: str | None = None
- user_visible_url: str | None = None
- wait_for: int | str | None = None
- wait_for_function: str | dict[str, str] | None = None
- wait_for_navigation: str | tuple[str, ...] | None = None
- wait_for_selector: str | dict[str, str] | list[str | dict[str, str]] | None = None
- wait_for_timeout: float | None = None
- wait_for_url: str | None = None
- wait_until: Literal['commit', 'domcontentloaded', 'load', 'networkidle'] | None = None
- classmethod job_documentation()
Generates simple jobs documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- get_location()
Get the ‘location’ of the job, i.e. the (user_visible) URL or command.
- Returns:
The user_visible_url, the URL, or the command of the job.
- Return type:
str
- get_indexed_location()
Get the job number plus its ‘location’, i.e. the (user_visible) URL or command. Typically used in error displays.
- Returns:
The job number followed by a colon and the ‘location’ of the job, i.e. its user_visible_url, URL, or command.
- Return type:
str
- set_base_location(location)
Sets the job’s location (command or url) to location. Used for changing location (uuid).
- Parameters:
location (str)
- Return type:
None
- pretty_name()
Get the ‘pretty name’ of a job, i.e. either its ‘name’ (if defined) or the ‘location’ (user_visible_url, URL or command).
- Returns:
The ‘pretty name’ of the job.
- Return type:
str
- serialize()
Serialize the Job object, excluding its index_number (e.g. for saving).
- Returns:
A dict with the Job object serialized.
- Return type:
dict
- validate()
Checks all instance attributes against class type hints.
- Return type:
None
- classmethod unserialize(data, filenames=None)
Unserialize a dict with job data (e.g. from the YAML jobs file) into a JobBase type object.
- Parameters:
data (dict) – The dict with job data (e.g. from the YAML jobs file).
filenames (list[Path] | None)
- Returns:
A JobBase type object.
- Return type:
JobBase
- to_dict()
Return all defined (not None) Job object directives, required and optional, as a serializable dict, converting Headers objects (which are not JSON serializable) to dicts.
- Returns:
A dict with all job directives as keys, ignoring those that are extras.
- Return type:
dict
- classmethod from_dict(data, filenames)
Create a JobBase class from a dict, checking that all keys are recognized (i.e. listed in __required__ or __optional__).
- Parameters:
data (dict) – Job data in dict format (e.g. from the YAML jobs file).
filenames (list[Path])
- Returns:
A JobBase type object.
- Return type:
JobBase
- with_defaults(config)
Obtain a Job object that also contains defaults from the configuration.
- Parameters:
config (_Config) – The configuration as a dict.
- Returns:
A JobBase object.
- Return type:
JobBase
- get_fips_guid()
Calculate the GUID as a SHA256 hash of the location (URL or command).
- Returns:
the GUID.
- Return type:
str
- static make_guid(name)
Calculate the GUID from a string (currently a simple SHA1).
- Returns:
the GUID.
- Parameters:
name (str)
- Return type:
str
- get_guid()
Calculate the GUID, currently a simple SHA1 hash of the location (URL or command).
- Returns:
the GUID.
- Return type:
str
- retrieve(job_state, headless=True)
Runs the job to retrieve the data, and returns the data, the ETag, and the mime_type.
- Parameters:
job_state (JobState) – The JobState object, to keep track of the state of the retrieval.
headless (bool) – For browser-based jobs, whether headless mode should be used.
- Returns:
The data retrieved, the ETag, and the mime_type.
- Return type:
tuple[str | bytes, str, str]
- main_thread_enter()
Called from the main thread before running the job. No longer needed (does nothing).
- Return type:
None
- main_thread_exit()
Called from the main thread after running the job. No longer needed (does nothing).
- Return type:
None
- format_error(exception, tb)
Format the error of the job if one is encountered.
- Parameters:
exception (Exception) – The exception.
tb (str) – The traceback.format_exc() string.
- Returns:
A string to display and/or use in reports.
- Return type:
str
- ignore_error(exception)
Determine whether the error of the job should be ignored.
- Parameters:
exception (Exception) – The exception.
- Returns:
True or the string with the number of the HTTPError code if the error should be ignored, False otherwise.
- Return type:
bool
- is_enabled()
Returns whether job is enabled.
- Returns:
Whether the job is enabled.
- Return type:
bool
- set_to_monospace()
If unset, sets the monospace flag to True (will not override).
- Return type:
None
- get_proxy()
Return the correct proxy, depending on whether the URL is http or https.
- Return type:
str | None
- exception webchanges.jobs.NotModifiedError
Bases:
Exception
Raised when an HTTP 304 response status code (Not Modified client redirection) is received or the strong validation ETag matches the previous one; this indicates that there was no change in content.
- add_note(note, /)
Add a note to the exception
- args
- with_traceback(tb, /)
Set self.__traceback__ to tb and return self.
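The two conditions named above (a 304 status, or a strong ETag equal to the previous one) can be sketched as a small check. The class here is a local stand-in and check_not_modified is a hypothetical helper. Treating ETags that start with W/ as weak follows the HTTP convention; how the library itself detects weak validators is not stated here.

```python
from __future__ import annotations


class NotModifiedError(Exception):
    """Local stand-in for the exception documented above."""


def check_not_modified(status_code: int, etag: str, previous_etag: str | None) -> None:
    """Raise NotModifiedError when the response indicates unchanged content."""
    if status_code == 304:
        raise NotModifiedError('304 Not Modified')
    is_strong = bool(etag) and not etag.startswith('W/')
    if is_strong and etag == previous_etag:
        raise NotModifiedError('ETag unchanged')


try:
    check_not_modified(200, '"abc"', '"abc"')
    unchanged = False
except NotModifiedError:
    unchanged = True
print(unchanged)
```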
- class webchanges.jobs.ShellJob(**kwargs)
Bases:
Job
Run a shell command and get its standard output.
- Parameters:
kwargs (Any)
- get_location()
Get the ‘location’ of the job, i.e. the command.
- Returns:
The command of the job.
- Return type:
str
- set_base_location(location)
Sets the job’s location (command or url) to location. Used for changing location (uuid).
- Parameters:
location (str)
- Return type:
None
- retrieve(job_state, headless=True)
Runs job to retrieve the data, and returns data, ETag (which is blank) and mime_type (also blank).
- Parameters:
job_state (JobState) – The JobState object, to keep track of the state of the retrieval.
headless (bool) – For browser-based jobs, whether headless mode should be used.
- Returns:
The data retrieved, the ETag, and the mime_type.
- Raises:
subprocess.CalledProcessError – Subclass of SubprocessError, raised when a process returns a non-zero exit status.
subprocess.TimeoutExpired – Subclass of SubprocessError, raised when a timeout expires while waiting for a child process.
- Return type:
tuple[str | bytes, str, str]
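The behaviour described for a shell job (capture standard output, blank ETag and mime_type, raise on non-zero exit or timeout) maps naturally onto subprocess.run. The sketch below is an assumption about the general approach, not the library's code; run_command is a hypothetical helper and the use of shell=True is assumed.

```python
from __future__ import annotations

import subprocess


def run_command(command: str, timeout: float | None = None) -> tuple[str, str, str]:
    """Run command in a shell and return (stdout, etag, mime_type)."""
    completed = subprocess.run(
        command,
        shell=True,           # the command is a single shell string
        check=True,           # non-zero exit raises CalledProcessError
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode output to str
        timeout=timeout,      # TimeoutExpired if the command runs too long
    )
    # ETag and mime_type are blank for shell output.
    return completed.stdout, '', ''


data, etag, mime_type = run_command('echo hello', timeout=10)
print(data.strip())
```

A caller that wants to report failures rather than crash can catch subprocess.CalledProcessError and subprocess.TimeoutExpired around the call.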
- format_error(exception, tb)
Format the error of the job if one is encountered.
- Parameters:
exception (Exception) – The exception.
tb (str) – The traceback.format_exc() string.
- Returns:
A string to display and/or use in reports.
- Return type:
str
- additions_only: bool | float | str | None = None
- block_elements: list[str] | None = None
- command: str = ''
- compared_versions: int | None = None
- contextlines: int | None = None
- cookies: dict[str, str] | None = None
- data: str | list | dict | None = None
- data_as_json: bool | None = None
- deletions_only: bool | None = None
- diff_filters: str | list[str | dict[str, Any]] | None = None
- diff_tool: str | None = None
- differ: dict[str, Any] | None = None
- empty_as_transient: bool | None = None
- enabled: bool | None = None
- encoding: str | None = None
- evaluate: str | None = None
- filters: Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | list[Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | dict[str, Any]] | None = None
- fingerprints: dict[str, str | dict[str, Any]] | None = None
- classmethod from_dict(data, filenames)
Create a JobBase class from a dict, checking that all keys are recognized (i.e. listed in __required__ or __optional__).
- Parameters:
data (dict) – Job data in dict format (e.g. from the YAML jobs file).
filenames (list[Path])
- Returns:
A JobBase type object.
- Return type:
JobBase
- get_fips_guid()
Calculate the GUID as a SHA256 hash of the location (URL or command).
- Returns:
the GUID.
- Return type:
str
- get_guid()
Calculate the GUID, currently a simple SHA1 hash of the location (URL or command).
- Returns:
the GUID.
- Return type:
str
- get_indexed_location()
Get the job number plus its ‘location’, i.e. the (user_visible) URL or command. Typically used in error displays.
- Returns:
The job number followed by a colon and the ‘location’ of the job, i.e. its user_visible_url, URL, or command.
- Return type:
str
- get_proxy()
Return the correct proxy, depending on whether the URL is http or https.
- Return type:
str | None
- guid: str = ''
- headers = Headers({}, encoding='utf-8')
- http_client: Literal['httpx', 'requests', 'curl_cffi'] | None = None
- http_credentials: str | None = None
- http_version: Literal['v1', 'v2', 'v2tls', 'v2_prior_knowledge', 'v3', 'v3only'] | None = None
- ignore_cached: bool | None = None
- ignore_connection_errors: bool | None = None
- ignore_default_args: bool | str | list[str] | None = None
- ignore_dh_key_too_small: bool | None = None
- ignore_error(exception)
Determine whether the error of the job should be ignored.
- Parameters:
exception (Exception) – The exception.
- Returns:
True or the string with the number of the HTTPError code if the error should be ignored, False otherwise.
- Return type:
bool
- ignore_http_error_codes: list[int | str] | int | str | None = None
- ignore_https_errors: bool | None = None
- ignore_timeout_errors: bool | None = None
- ignore_too_many_redirects: bool | None = None
- impersonate: str | None = None
- index_number: int = 0
- init_script: str | None = None
- initialization_js: str | None = None
- initialization_url: str | None = None
- is_enabled()
Returns whether job is enabled.
- Returns:
Whether the job is enabled.
- Return type:
bool
- is_markdown: bool | None = None
- classmethod job_documentation()
Generates simple jobs documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- kind: str | None = None
- loop: AbstractEventLoop | None = None
- main_thread_enter()
Called from the main thread before running the job. No longer needed (does nothing).
- Return type:
None
- main_thread_exit()
Called from the main thread after running the job. No longer needed (does nothing).
- Return type:
None
- static make_guid(name)
Calculate the GUID from a string (currently a simple SHA1).
- Returns:
the GUID.
- Parameters:
name (str)
- Return type:
str
- markdown_padded_tables: bool | None = None
- max_tries: int | None = None
- method: Literal['GET', 'OPTIONS', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE'] | None = None
- mime_type: str | None = None
- monospace: bool | None = None
- name: str | None = None
- navigate: str | None = None
- no_conditional_request: bool | None = None
- no_redirects: bool | None = None
- note: str | None = None
- params: str | list | dict[str, str] | None = None
- pretty_name()
Get the ‘pretty name’ of a job, i.e. either its ‘name’ (if defined) or the ‘location’ (user_visible_url, URL or command).
- Returns:
The ‘pretty name’ of the job.
- Return type:
str
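The fallback described for pretty_name can be sketched as two small functions. Both function names are hypothetical, and the precedence among user_visible_url, URL and command is assumed from the order in which the description lists them.

```python
from __future__ import annotations


def location_of(user_visible_url: str | None, url: str, command: str) -> str:
    """Pick the first non-empty value among user_visible_url, url and command."""
    return user_visible_url or url or command


def pretty_name_of(name: str | None, location: str) -> str:
    """Return the name if one is defined, otherwise fall back to the location."""
    return name if name else location


print(pretty_name_of(None, location_of(None, 'https://example.com', '')))
```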
- proxy: str | None = None
- referer: str | None = None
- retries: int | None = None
- serialize()
Serialize the Job object, excluding its index_number (e.g. for saving).
- Returns:
A dict with the Job object serialized.
- Return type:
dict
- set_to_monospace()
If unset, sets the monospace flag to True (will not override).
- Return type:
None
- ssl_no_verify: bool | None = None
- stderr: str | None = None
- suppress_error_ended: bool | None = None
- suppress_errors: bool | None = None
- suppress_repeated_errors: bool | None = None
- switches: list[str] | None = None
- timeout: float | None = None
- to_dict()
Return all defined (not None) Job object directives, required and optional, as a serializable dict, converting Headers objects (which are not JSON serializable) to dicts.
- Returns:
A dict with all job directives as keys, ignoring those that are extras.
- Return type:
dict
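The behavior described for to_dict (drop undefined directives, convert header objects to plain dicts) might look roughly like the sketch below. HeadersSketch and directives_to_dict are hypothetical stand-ins written for this example and do not reproduce the library's code.

```python
import json
from collections.abc import Mapping
from typing import Any


class HeadersSketch(Mapping):
    """Hypothetical stand-in for a Headers object that json.dumps cannot serialize."""

    def __init__(self, items: dict[str, str]) -> None:
        self._items = dict(items)

    def __getitem__(self, key: str) -> str:
        return self._items[key]

    def __iter__(self):
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)


def directives_to_dict(directives: dict[str, Any]) -> dict[str, Any]:
    """Keep only directives that are not None; turn header objects into plain dicts."""
    result: dict[str, Any] = {}
    for key, value in directives.items():
        if value is None:
            continue
        result[key] = dict(value) if isinstance(value, HeadersSketch) else value
    return result


directives = {
    'url': 'https://example.com',
    'timeout': None,
    'headers': HeadersSketch({'Accept': 'text/html'}),
}
print(json.dumps(directives_to_dict(directives)))
```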
- tz: str | None = None
- classmethod unserialize(data, filenames=None)
Unserialize a dict with job data (e.g. from the YAML jobs file) into a JobBase type object.
- Parameters:
data (dict) – The dict with job data (e.g. from the YAML jobs file).
filenames (list[Path] | None)
- Returns:
A JobBase type object.
- Return type:
JobBase
- url: str = ''
- use_browser: bool | str | None = False
- user_data_dir: str | None = None
- user_visible_url: str | None = None
- validate()
Checks all instance attributes against class type hints.
- Return type:
None
- wait_for: int | str | None = None
- wait_for_function: str | dict[str, str] | None = None
- wait_for_navigation: str | tuple[str, ...] | None = None
- wait_for_selector: str | dict[str, str] | list[str | dict[str, str]] | None = None
- wait_for_timeout: float | None = None
- wait_for_url: str | None = None
- wait_until: Literal['commit', 'domcontentloaded', 'load', 'networkidle'] | None = None
- with_defaults(config)
Obtain a Job object that also contains defaults from the configuration.
- Parameters:
config (_Config) – The configuration as a dict.
- Returns:
A JobBase object.
- Return type:
JobBase
- exception webchanges.jobs.TransientBrowserError(*args)
Bases:
Exception
Raised by BrowserJob when a transient error is returned by the browser, either as a PlaywrightTimeoutError or as a browser error listed in the 100-199 Connection related errors.
The args[0] will contain the string ‘PlaywrightTimeoutError’ or the text of the browser error.
- Parameters:
args (object)
- Return type:
None
- add_note(note, /)
Add a note to the exception
- args
- with_traceback(tb, /)
Set self.__traceback__ to tb and return self.
- exception webchanges.jobs.TransientHTTPError(*args, status_code)
Bases:
Exception
Raised by subclasses of UrlJobBase when one of these HTTP response status codes is received:
429 Too Many Requests
500 Internal Server Error
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
- Parameters:
args (object)
status_code (int)
- Return type:
None
- status_code: int
- add_note(note, /)
Add a note to the exception
- args
- with_traceback(tb, /)
Set self.__traceback__ to tb and return self.
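To make the list of transient status codes concrete, here is a minimal sketch of an exception with the same (*args, status_code) shape and a check that raises it. TransientHTTPErrorSketch, TRANSIENT_STATUS_CODES and check_status are names invented for this example; how the library itself performs the check is not shown in this reference.

```python
TRANSIENT_STATUS_CODES = frozenset({429, 500, 502, 503, 504})


class TransientHTTPErrorSketch(Exception):
    """Hypothetical stand-in mirroring the documented signature (*args, status_code)."""

    def __init__(self, *args: object, status_code: int) -> None:
        super().__init__(*args)
        self.status_code = status_code


def check_status(status_code: int) -> None:
    """Raise the transient error for the listed codes; do nothing otherwise."""
    if status_code in TRANSIENT_STATUS_CODES:
        raise TransientHTTPErrorSketch(f'HTTP {status_code}', status_code=status_code)


try:
    check_status(503)
except TransientHTTPErrorSketch as error:
    print(error.status_code)
```

Keeping status_code as a keyword-only attribute lets a caller decide on a retry without parsing the message text.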
- class webchanges.jobs.UrlJob(**kwargs)
Bases:
UrlJobBase
Retrieve a URL from a web server.
- Parameters:
kwargs (Any)
- get_location()
Get the ‘location’ of the job, i.e. the (user_visible) URL.
- Returns:
The user_visible_url or URL of the job.
- Return type:
str
- set_base_location(location)
Sets the job’s location (command or url) to location. Used for changing location (uuid).
- Parameters:
location (str)
- Return type:
None
- retrieve(job_state, headless=True)
Runs job to retrieve the data, and returns data, ETag and media type.
- Parameters:
job_state (JobState) – The JobState object, to keep track of the state of the retrieval.
headless (bool) – For browser-based jobs, whether headless mode should be used.
- Returns:
The data retrieved, the ETag, and the media type (formerly known as MIME type).
- Raises:
NotModifiedError – If an HTTP 304 response is received.
- Return type:
tuple[str | bytes, str, str]
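A caller of retrieve has to handle both the three-element return value and the exception raised on an HTTP 304 response. The sketch below shows one way that could look; NotModifiedSketch, retrieve_sketch and fetch_or_keep are hypothetical stand-ins and do not call webchanges.

```python
from __future__ import annotations


class NotModifiedSketch(Exception):
    """Hypothetical stand-in for the NotModifiedError named above."""


def retrieve_sketch(
    status: int, body: str | bytes, etag: str, media_type: str
) -> tuple[str | bytes, str, str]:
    """Return (data, ETag, media type), or raise when the server answers 304."""
    if status == 304:
        raise NotModifiedSketch('304 Not Modified')
    return body, etag, media_type


def fetch_or_keep(previous: str | bytes, status: int, body: str | bytes) -> str | bytes:
    """Keep the previously stored data when nothing has changed."""
    try:
        data, _etag, _media_type = retrieve_sketch(status, body, '"v2"', 'text/html')
    except NotModifiedSketch:
        return previous
    return data


print(fetch_or_keep(b'old', 304, b''))
```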
- format_error(exception, tb)
Format the error of the job if one is encountered.
- Parameters:
exception (Exception) – The exception.
tb (str) – The traceback.format_exc() string.
- Returns:
A string to display and/or use in reports.
- Return type:
str
- ignore_error(exception)
Determine whether the error of the job should be ignored.
- Parameters:
exception (Exception) – The exception.
- Returns:
True if the error should be ignored, False otherwise.
- Return type:
bool
- additions_only: bool | float | str | None = None
- block_elements: list[str] | None = None
- command: str = ''
- compared_versions: int | None = None
- contextlines: int | None = None
- cookies: dict[str, str] | None = None
- data: str | list | dict | None = None
- data_as_json: bool | None = None
- deletions_only: bool | None = None
- diff_filters: str | list[str | dict[str, Any]] | None = None
- diff_tool: str | None = None
- differ: dict[str, Any] | None = None
- empty_as_transient: bool | None = None
- enabled: bool | None = None
- encoding: str | None = None
- evaluate: str | None = None
- filters: Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | list[Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | dict[str, Any]] | None = None
- fingerprints: dict[str, str | dict[str, Any]] | None = None
- classmethod from_dict(data, filenames)
Create a JobBase class from a dict, checking that all keys are recognized (i.e. listed in __required__ or __optional__).
- Parameters:
data (dict) – Job data in dict format (e.g. from the YAML jobs file).
filenames (list[Path])
- Returns:
A JobBase type object.
- Return type:
JobBase
- get_fips_guid()
Calculate the GUID as a SHA256 hash of the location (URL or command).
- Returns:
the GUID.
- Return type:
str
- get_guid()
Calculate the GUID, currently a simple SHA1 hash of the location (URL or command).
- Returns:
the GUID.
- Return type:
str
- get_headers(job_state, user_agent='webchanges/3.36.1 (+https://pypi.org/project/webchanges/)', include_cookies=True)
Get headers and modify them to add cookies and conditional request. If headers don’t contain User-Agent, either the default one or the one provided as user_agent is added.
- Parameters:
job_state (JobState) – The job state.
user_agent (str | None) – The user agent string.
include_cookies (bool) – Whether to include cookies (from self.cookies) as a Cookie header.
- Returns:
The headers.
- Return type:
Headers
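The header handling described for get_headers can be approximated with plain dicts. In this sketch, build_headers and DEFAULT_USER_AGENT are hypothetical, and using If-None-Match with a stored ETag as the conditional request is an assumption based on standard HTTP practice; the library may add other conditional headers as well.

```python
from __future__ import annotations

DEFAULT_USER_AGENT = 'example-agent/0.0'  # placeholder, not the real default


def build_headers(
    headers: dict[str, str],
    cookies: dict[str, str] | None = None,
    etag: str | None = None,
    user_agent: str | None = DEFAULT_USER_AGENT,
    include_cookies: bool = True,
) -> dict[str, str]:
    """Return a copy of headers with User-Agent, Cookie and If-None-Match added."""
    result = dict(headers)
    # Add a User-Agent only when the caller has not already set one.
    if user_agent and not any(key.lower() == 'user-agent' for key in result):
        result['User-Agent'] = user_agent
    # Cookies are sent as a single Cookie header of name=value pairs.
    if include_cookies and cookies:
        result['Cookie'] = '; '.join(f'{k}={v}' for k, v in cookies.items())
    # Conditional request: ask the server to reply 304 if the ETag still matches.
    if etag:
        result['If-None-Match'] = etag
    return result


print(build_headers({'Accept': 'text/html'}, cookies={'a': '1'}, etag='"abc"'))
```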
- get_indexed_location()
Get the job number plus its ‘location’, i.e. the (user_visible) URL or command. Typically used in error displays.
- Returns:
The job number followed by a colon and the ‘location’ of the job, i.e. its user_visible_url, URL, or command.
- Return type:
str
- get_proxy()
Return the correct proxy, depending on whether the URL is http or https.
- Return type:
str | None
- guid: str = ''
- headers = Headers({}, encoding='utf-8')
- http_client: Literal['httpx', 'requests', 'curl_cffi'] | None = None
- http_credentials: str | None = None
- http_version: Literal['v1', 'v2', 'v2tls', 'v2_prior_knowledge', 'v3', 'v3only'] | None = None
- ignore_cached: bool | None = None
- ignore_connection_errors: bool | None = None
- ignore_default_args: bool | str | list[str] | None = None
- ignore_dh_key_too_small: bool | None = None
- ignore_http_error_codes: list[int | str] | int | str | None = None
- ignore_https_errors: bool | None = None
- ignore_timeout_errors: bool | None = None
- ignore_too_many_redirects: bool | None = None
- impersonate: str | None = None
- index_number: int = 0
- init_script: str | None = None
- initialization_js: str | None = None
- initialization_url: str | None = None
- is_enabled()
Returns whether job is enabled.
- Returns:
Whether the job is enabled.
- Return type:
bool
- is_markdown: bool | None = None
- classmethod job_documentation()
Generates simple jobs documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- kind: str | None = None
- loop: AbstractEventLoop | None = None
- main_thread_enter()
Called from the main thread before running the job. No longer needed (does nothing).
- Return type:
None
- main_thread_exit()
Called from the main thread after running the job. No longer needed (does nothing).
- Return type:
None
- static make_guid(name)
Calculate the GUID from a string (currently a simple SHA1).
- Returns:
the GUID.
- Parameters:
name (str)
- Return type:
str
- markdown_padded_tables: bool | None = None
- max_tries: int | None = None
- method: Literal['GET', 'OPTIONS', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE'] | None = None
- mime_type: str | None = None
- monospace: bool | None = None
- name: str | None = None
- navigate: str | None = None
- no_conditional_request: bool | None = None
- no_redirects: bool | None = None
- note: str | None = None
- params: str | list | dict[str, str] | None = None
- pretty_name()
Get the ‘pretty name’ of a job, i.e. either its ‘name’ (if defined) or the ‘location’ (user_visible_url, URL or command).
- Returns:
The ‘pretty name’ of the job.
- Return type:
str
- proxy: str | None = None
- referer: str | None = None
- retries: int | None = None
- serialize()
Serialize the Job object, excluding its index_number (e.g. for saving).
- Returns:
A dict with the Job object serialized.
- Return type:
dict
- set_to_monospace()
If unset, sets the monospace flag to True (will not override).
- Return type:
None
- ssl_no_verify: bool | None = None
- stderr: str | None = None
- suppress_error_ended: bool | None = None
- suppress_errors: bool | None = None
- suppress_repeated_errors: bool | None = None
- switches: list[str] | None = None
- timeout: float | None = None
- to_dict()
Return all defined (not None) Job object directives, required and optional, as a serializable dict, converting Headers objects (which are not JSON serializable) to dicts.
- Returns:
A dict with all job directives as keys, ignoring those that are extras.
- Return type:
dict
- tz: str | None = None
- classmethod unserialize(data, filenames=None)
Unserialize a dict with job data (e.g. from the YAML jobs file) into a JobBase type object.
- Parameters:
data (dict) – The dict with job data (e.g. from the YAML jobs file).
filenames (list[Path] | None)
- Returns:
A JobBase type object.
- Return type:
JobBase
- url: str = ''
- use_browser: bool | str | None = False
- user_data_dir: str | None = None
- user_visible_url: str | None = None
- validate()
Checks all instance attributes against class type hints.
- Return type:
None
- wait_for: int | str | None = None
- wait_for_function: str | dict[str, str] | None = None
- wait_for_navigation: str | tuple[str, ...] | None = None
- wait_for_selector: str | dict[str, str] | list[str | dict[str, str]] | None = None
- wait_for_timeout: float | None = None
- wait_for_url: str | None = None
- wait_until: Literal['commit', 'domcontentloaded', 'load', 'networkidle'] | None = None
- with_defaults(config)
Obtain a Job object that also contains defaults from the configuration.
- Parameters:
config (_Config) – The configuration as a dict.
- Returns:
A JobBase object.
- Return type:
JobBase
- class webchanges.jobs.UrlJobBase(**kwargs)
Bases:
Job
The base class for jobs that use the ‘url’ key. Includes UrlJob and BrowserJob.
- Parameters:
kwargs (Any)
- get_headers(job_state, user_agent='webchanges/3.36.1 (+https://pypi.org/project/webchanges/)', include_cookies=True)
Get headers and modify them to add cookies and conditional request. If headers don’t contain User-Agent, either the default one or the one provided as user_agent is added.
- Parameters:
job_state (JobState) – The job state.
user_agent (str | None) – The user agent string.
include_cookies (bool) – Whether to include cookies (from self.cookies) as a Cookie header.
- Returns:
The headers.
- Return type:
Headers
- additions_only: bool | float | str | None = None
- block_elements: list[str] | None = None
- command: str = ''
- compared_versions: int | None = None
- contextlines: int | None = None
- cookies: dict[str, str] | None = None
- data: str | list | dict | None = None
- data_as_json: bool | None = None
- deletions_only: bool | None = None
- diff_filters: str | list[str | dict[str, Any]] | None = None
- diff_tool: str | None = None
- differ: dict[str, Any] | None = None
- empty_as_transient: bool | None = None
- enabled: bool | None = None
- encoding: str | None = None
- evaluate: str | None = None
- filters: Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | list[Literal['absolute_links', 'ascii85', 'base64', 'beautify', 'format-json', 'format-xml', 'hexdump', 'html2text', 'ical2text', 'jsontoyaml', 'pretty-xml', 'remove_repeated', 'reverse', 'sha1sum', 'sort', 'strip'] | dict[str, Any]] | None = None
- fingerprints: dict[str, str | dict[str, Any]] | None = None
- format_error(exception, tb)
Format the error of the job if one is encountered.
- Parameters:
exception (Exception) – The exception.
tb (str) – The traceback.format_exc() string.
- Returns:
A string to display and/or use in reports.
- Return type:
str
- classmethod from_dict(data, filenames)
Create a JobBase class from a dict, checking that all keys are recognized (i.e. listed in __required__ or __optional__).
- Parameters:
data (dict) – Job data in dict format (e.g. from the YAML jobs file).
filenames (list[Path])
- Returns:
A JobBase type object.
- Return type:
JobBase
- get_fips_guid()
Calculate the GUID as a SHA256 hash of the location (URL or command).
- Returns:
the GUID.
- Return type:
str
- get_guid()
Calculate the GUID, currently a simple SHA1 hash of the location (URL or command).
- Returns:
the GUID.
- Return type:
str
- get_indexed_location()
Get the job number plus its ‘location’, i.e. the (user_visible) URL or command. Typically used in error displays.
- Returns:
The job number followed by a colon and the ‘location’ of the job, i.e. its user_visible_url, URL, or command.
- Return type:
str
- get_location()
Get the ‘location’ of the job, i.e. the (user_visible) URL or command.
- Returns:
The user_visible_url, the URL, or the command of the job.
- Return type:
str
- get_proxy()
Return the correct proxy, depending on whether the URL is http or https.
- Return type:
str | None
- guid: str = ''
- headers = Headers({}, encoding='utf-8')
- http_client: Literal['httpx', 'requests', 'curl_cffi'] | None = None
- http_credentials: str | None = None
- http_version: Literal['v1', 'v2', 'v2tls', 'v2_prior_knowledge', 'v3', 'v3only'] | None = None
- ignore_cached: bool | None = None
- ignore_connection_errors: bool | None = None
- ignore_default_args: bool | str | list[str] | None = None
- ignore_dh_key_too_small: bool | None = None
- ignore_error(exception)
Determine whether the error of the job should be ignored.
- Parameters:
exception (Exception) – The exception.
- Returns:
True or the string with the number of the HTTPError code if the error should be ignored, False otherwise.
- Return type:
bool
- ignore_http_error_codes: list[int | str] | int | str | None = None
- ignore_https_errors: bool | None = None
- ignore_timeout_errors: bool | None = None
- ignore_too_many_redirects: bool | None = None
- impersonate: str | None = None
- index_number: int = 0
- init_script: str | None = None
- initialization_js: str | None = None
- initialization_url: str | None = None
- is_enabled()
Returns whether job is enabled.
- Returns:
Whether the job is enabled.
- Return type:
bool
- is_markdown: bool | None = None
- classmethod job_documentation()
Generates simple jobs documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- kind: str | None = None
- loop: AbstractEventLoop | None = None
- main_thread_enter()
Called from the main thread before running the job. No longer needed (does nothing).
- Return type:
None
- main_thread_exit()
Called from the main thread after running the job. No longer needed (does nothing).
- Return type:
None
- static make_guid(name)
Calculate the GUID from a string (currently a simple SHA1).
- Returns:
the GUID.
- Parameters:
name (str)
- Return type:
str
- markdown_padded_tables: bool | None = None
- max_tries: int | None = None
- method: Literal['GET', 'OPTIONS', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE'] | None = None
- mime_type: str | None = None
- monospace: bool | None = None
- name: str | None = None
- navigate: str | None = None
- no_conditional_request: bool | None = None
- no_redirects: bool | None = None
- note: str | None = None
- params: str | list | dict[str, str] | None = None
- pretty_name()
Get the ‘pretty name’ of a job, i.e. either its ‘name’ (if defined) or the ‘location’ (user_visible_url, URL or command).
- Returns:
The ‘pretty name’ of the job.
- Return type:
str
- proxy: str | None = None
- referer: str | None = None
- retries: int | None = None
- retrieve(job_state, headless=True)
Runs job to retrieve the data, and returns data, ETag and media type.
- Parameters:
job_state (JobState) – The JobState object, to keep track of the state of the retrieval.
headless (bool) – For browser-based jobs, whether headless mode should be used.
- Returns:
The data retrieved, the ETag, and the mime_type.
- Return type:
tuple[str | bytes, str, str]
- serialize()
Serialize the Job object, excluding its index_number (e.g. for saving).
- Returns:
A dict with the Job object serialized.
- Return type:
dict
- set_base_location(location)
Sets the job’s location (command or url) to location. Used for changing location (uuid).
- Parameters:
location (str)
- Return type:
None
- set_to_monospace()
If unset, sets the monospace flag to True (will not override).
- Return type:
None
- ssl_no_verify: bool | None = None
- stderr: str | None = None
- suppress_error_ended: bool | None = None
- suppress_errors: bool | None = None
- suppress_repeated_errors: bool | None = None
- switches: list[str] | None = None
- timeout: float | None = None
- to_dict()
Return all defined (not None) Job object directives, required and optional, as a serializable dict, converting Headers objects (which are not JSON serializable) to dicts.
- Returns:
A dict with all job directives as keys, ignoring those that are extras.
- Return type:
dict
- tz: str | None = None
- classmethod unserialize(data, filenames=None)
Unserialize a dict with job data (e.g. from the YAML jobs file) into a JobBase type object.
- Parameters:
data (dict) – The dict with job data (e.g. from the YAML jobs file).
filenames (list[Path] | None)
- Returns:
A JobBase type object.
- Return type:
JobBase
- url: str = ''
- use_browser: bool | str | None = False
- user_data_dir: str | None = None
- user_visible_url: str | None = None
- validate()
Checks all instance attributes against class type hints.
- Return type:
None
- wait_for: int | str | None = None
- wait_for_function: str | dict[str, str] | None = None
- wait_for_navigation: str | tuple[str, ...] | None = None
- wait_for_selector: str | dict[str, str] | list[str | dict[str, str]] | None = None
- wait_for_timeout: float | None = None
- wait_for_url: str | None = None
- wait_until: Literal['commit', 'domcontentloaded', 'load', 'networkidle'] | None = None
- with_defaults(config)
Obtain a Job object that also contains defaults from the configuration.
- Parameters:
config (_Config) – The configuration as a dict.
- Returns:
A JobBase object.
- Return type:
JobBase
- webchanges.jobs.represent_headers(dumper, data)
- Parameters:
dumper (SafeDumper)
data (Headers)
- Return type:
MappingNode