webchanges.filters module
Filters.
- class webchanges.filters.AbsoluteLinksFilter(state)
Bases: FilterBase
Replace relative HTML <a> href links with absolute ones.
- Parameters:
state (JobState) – the JobState.
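To illustrate the transformation this filter describes, here is a minimal sketch using only the standard library. The function name absolutize_links, the base_url argument, and the regular expression are assumptions made for illustration; the actual filter may be implemented differently and handle more cases (for example single-quoted or unquoted attribute values).

```python
import re
from urllib.parse import urljoin

# Matches the href attribute of an <a> tag when the value is double-quoted.
_HREF_RE = re.compile(r'(<a\s(?:[^>]*?\s)?href=")([^"]*)(")', re.IGNORECASE)


def absolutize_links(html: str, base_url: str) -> str:
    """Hypothetical helper: resolve each <a href="..."> against base_url."""

    def _replace(match: re.Match) -> str:
        prefix, href, suffix = match.groups()
        # urljoin leaves absolute URLs unchanged and resolves relative ones.
        return f'{prefix}{urljoin(base_url, href)}{suffix}'

    return _HREF_RE.sub(_replace, html)
```

Called with the URL of the monitored page as base_url, a link such as /docs would be rewritten to include the scheme and host of that page, while links that are already absolute pass through unchanged.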
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
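The shape of the normalized output can be sketched as follows. This is not the library's implementation: the helper name normalize_filter_spec, the assumed legacy syntax (comma-separated kinds, each optionally written as kind:value), and the 'value' key used for non-dict subfilters are all hypothetical, and no validity checking is performed.

```python
from typing import Any, Iterator, Union


def normalize_filter_spec(
    filter_spec: Union[str, list, None],
) -> Iterator[tuple[str, dict[str, Any]]]:
    """Hypothetical sketch: yield (filter_kind, subfilter) pairs."""
    if not filter_spec:
        return
    if isinstance(filter_spec, str):
        # Assumed legacy form, e.g. "strip,keep_lines_containing:foo".
        items: list = []
        for part in filter_spec.split(','):
            kind, _, value = part.strip().partition(':')
            items.append({kind: value} if value else kind)
    else:
        items = filter_spec
    for item in items:
        if isinstance(item, str):
            # A bare filter name carries no subfilter information.
            yield item, {}
        else:
            for kind, sub in item.items():
                # 'value' is a placeholder key chosen for this sketch only.
                yield kind, (sub if isinstance(sub, dict) else {'value': sub})
```

Whatever form the specification arrives in, the caller can then iterate over uniform (filter_kind, subfilter) tuples.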
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.Ascii85(state)
Bases: FilterBase
Convert bytes data (e.g. images) into an ascii85 string.
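Ascii85 encoding is more compact than Base64: it encodes every 4 bytes as 5 characters (25% overhead), whereas Base64 encodes every 3 bytes as 4 characters (about 33% overhead).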
- Parameters:
state (JobState) – the JobState.
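The size difference between the two encodings can be checked with the standard library's base64 module. This is a sketch for comparison only; it assumes nothing about how the filter itself performs the encoding, and the sample data is arbitrary.

```python
import base64

# 1200 bytes of sample binary data (contains no all-zero 4-byte groups,
# which Ascii85 would otherwise abbreviate to a single 'z').
raw = bytes(range(200)) * 6

a85 = base64.a85encode(raw)
b64 = base64.b64encode(raw)

# Overhead relative to the size of the original data.
a85_overhead = len(a85) / len(raw) - 1
b64_overhead = len(b64) / len(raw) - 1

print(f'Ascii85: {len(a85)} bytes ({a85_overhead:.0%} overhead)')
print(f'Base64:  {len(b64)} bytes ({b64_overhead:.0%} overhead)')
```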
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.AutoMatchFilter(state)
Bases: FilterBase
Base class for filters that are applied automatically when they exactly match one or more directives.
MATCH is a dict of {directive: text to match}.
- Parameters:
state (JobState) – the JobState.
- MATCH: dict[str, str] | None = None
- match()
Check whether the filter matches (i.e. needs to be executed).
- Returns:
True if match is found.
- Return type:
bool
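The exact-match check described above can be sketched as a comparison between the MATCH dict and a job's directives. The function name directives_match and the plain-dict representation of a job are assumptions made for this illustration, not the class's actual code.

```python
from typing import Optional


def directives_match(
    match: Optional[dict[str, str]],
    job_directives: dict[str, str],
) -> bool:
    """Hypothetical sketch of an exact-match test against job directives."""
    if match is None:
        # Without a MATCH dict the filter is never applied automatically.
        return False
    # Every directive named in MATCH must be present with exactly that text.
    return all(job_directives.get(key) == value for key, value in match.items())
```

For example, a MATCH of {'url': 'https://example.com/'} would match only a job whose url directive is exactly that string.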
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.Base64(state)
Bases: FilterBase
Convert bytes data (e.g. images) into a base64 string.
Base64 encoding causes an overhead of 33–37% relative to the size of the original binary data.
- Parameters:
state (JobState) – the JobState.
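The overhead figure can be reproduced with the standard library's base64 module. This sketch assumes nothing about how the filter performs the encoding; it only shows that unwrapped Base64 output is one third larger than its input and that inserting line breaks increases the overhead slightly.

```python
import base64

# 3000 zero bytes stand in for binary data such as an image.
raw = bytes(3000)

plain = base64.b64encode(raw)      # no line breaks
wrapped = base64.encodebytes(raw)  # newline after every 76 output characters


def overhead(encoded: bytes, original: bytes) -> float:
    """Return the relative size increase of encoded over original."""
    return len(encoded) / len(original) - 1


print(f'unwrapped: {overhead(plain, raw):.1%}')
print(f'wrapped:   {overhead(wrapped, raw):.1%}')
```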
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.BeautifyFilter(state)
Bases: FilterBase
Beautify HTML (requires Python package BeautifulSoup and optionally jsbeautifier and/or cssbeautifier).
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Filter (process) the data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.CSSFilter(state)
Bases: FilterBase
Filter XML/HTML using CSS selectors.
- Parameters:
state (JobState) – the JobState.
- EXPR_NAMES: dict[str, str]
- expression: str
- exclude: str
- namespaces: dict[str, str]
- skip: int
- maxitems: int
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.Csv2TextFilter(state)
Bases: FilterBase
Convert CSV to plaintext.
- Parameters:
state (JobState) – the JobState.
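One possible way to turn CSV into plain text is sketched below with the standard library's csv module. The function name csv_to_text and the "header: value" line layout are assumptions chosen for illustration; the filter's actual output format and options may differ.

```python
import csv
import io


def csv_to_text(data: str) -> str:
    """Hypothetical sketch: render each CSV row as 'header: value' pairs."""
    # The first row is treated as the header row.
    reader = csv.DictReader(io.StringIO(data))
    lines = []
    for row in reader:
        lines.append(', '.join(f'{key}: {value}' for key, value in row.items()))
    return '\n'.join(lines)
```

Rendering one row per line keeps a later line-based diff readable, since a changed cell affects only the line of its row.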
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.DeleteLinesContainingFilter(state)
Bases: FilterBase
Remove lines matching a regular expression.
- Parameters:
state (JobState) – the JobState.
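The line-removal behavior described above can be sketched with the standard library's re module. The function name delete_lines_matching and its signature are hypothetical; how the filter receives its pattern through the subfilter is not shown here.

```python
import re


def delete_lines_matching(text: str, pattern: str) -> str:
    """Hypothetical sketch: drop every line in which pattern is found."""
    regex = re.compile(pattern)
    # keepends=True preserves the original line endings of the kept lines.
    kept = (line for line in text.splitlines(keepends=True) if not regex.search(line))
    return ''.join(kept)
```

This sketch uses re.search, so the pattern may match anywhere in a line unless it is anchored with ^ or $.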
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
self (DeleteLinesContainingFilter | GrepIFilter)
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.ElementByClassFilter(state)
Bases: FilterBase
Get all HTML elements matching a class.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.ElementByIdFilter(state)
Bases: FilterBase
Get all HTML elements matching an ID.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.ElementByStyleFilter(state)
Bases: FilterBase
Get all HTML elements matching a style.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.ElementByTagFilter(state)
Bases:
FilterBase
Get all HTML elements matching a tag.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.ElementsBy(filter_by, name, value=None)
Bases:
HTMLParser, ABC
Initialize and reset this instance.
If convert_charrefs is True (the default), all character references are automatically converted to the corresponding Unicode characters.
- Parameters:
filter_by (FilterBy)
name (str)
value (Any)
- get_html()
- Return type:
str
- handle_starttag(tag, attrs)
- Parameters:
tag (str)
attrs (list[tuple[str, str | None]])
- Return type:
None
- handle_endtag(tag)
- Parameters:
tag (str)
- Return type:
None
- handle_data(data)
- Parameters:
data (str)
- Return type:
None
- CDATA_CONTENT_ELEMENTS = ('script', 'style')
- RCDATA_CONTENT_ELEMENTS = ('textarea', 'title')
- check_for_whole_start_tag(i)
- clear_cdata_mode()
- close()
Handle any buffered data.
- feed(data)
Feed data to the parser.
Call this as often as you want, with as little or as much text as you want (may include '\n').
- get_starttag_text()
Return full source of start tag: ‘<…>’.
- getpos()
Return current line number and offset.
- goahead(end)
- handle_charref(name)
- handle_comment(data)
- handle_decl(decl)
- handle_entityref(name)
- handle_pi(data)
- handle_startendtag(tag, attrs)
- parse_bogus_comment(i, report=1)
- parse_comment(i, report=True)
- parse_declaration(i)
- parse_endtag(i)
- parse_html_declaration(i)
- parse_marked_section(i, report=1)
- parse_pi(i)
- parse_starttag(i)
- reset()
Reset this instance. Loses all unprocessed data.
- set_cdata_mode(elem, *, escapable=False)
- unknown_decl(data)
- updatepos(i, j)
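To make the role of handle_starttag, handle_endtag, handle_data and get_html more concrete, here is a minimal sketch of an HTMLParser subclass that collects elements by tag name. The class name `TagCollector` and its internals are hypothetical and are not taken from webchanges; the sketch only uses the standard library's html.parser and treats self-closing tags simplistically.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the markup of every element with a given tag name (hypothetical helper)."""

    def __init__(self, name: str) -> None:
        super().__init__()  # convert_charrefs is True by default
        self._name = name
        self._depth = 0  # greater than 0 while inside a matching element
        self._parts = []

    def get_html(self) -> str:
        """Return the collected markup as one string."""
        return ''.join(self._parts)

    def handle_starttag(self, tag, attrs):
        if tag == self._name:
            self._depth += 1
        if self._depth:
            # Keep the original source of the start tag, attributes included.
            self._parts.append(self.get_starttag_text() or '')

    def handle_endtag(self, tag):
        if self._depth:
            self._parts.append(f'</{tag}>')
            if tag == self._name:
                self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)
```

A caller would feed the document with feed(), call close() to flush buffered data, and then read the result from get_html().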
- class webchanges.filters.ExecuteFilter(state)
Bases:
FilterBase
Filter using a command.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
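As an illustration of what filtering through a command can look like, the sketch below pipes the data to an external process and returns its standard output together with the unchanged media type. The helper `run_command_filter` and the `UPPER_CMD` example command are hypothetical; how ExecuteFilter actually reads its command from the subfilter is not shown in this reference and is not assumed here.

```python
import subprocess
import sys


def run_command_filter(command: list, data: str, mime_type: str) -> tuple:
    """Send data to the command's stdin and return (stdout, mime_type)."""
    result = subprocess.run(
        command,
        input=data,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout, mime_type


# Hypothetical example command: upper-case whatever arrives on stdin.
UPPER_CMD = [sys.executable, '-c', 'import sys; sys.stdout.write(sys.stdin.read().upper())']
```

Passing the command as a list avoids shell interpretation of the arguments, which matters when the filtered data or the command comes from a configuration file.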
- class webchanges.filters.FilterBase(state)
Bases:
object
The base class for filters.
- Parameters:
state (JobState) – the JobState.
- method: str
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
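The contract documented for filter() (take data, a media type and a subfilter dict; return the processed data with its media type) can be sketched without importing webchanges. `SketchFilterBase` below is a stand-in written for this example, not the real FilterBase, and `UppercaseFilter` is a hypothetical subclass.

```python
from __future__ import annotations

from typing import Any


class SketchFilterBase:
    """Stand-in for the documented base class (not the webchanges implementation)."""

    def __init__(self, state: Any) -> None:
        self.state = state

    def filter(
        self, data: str | bytes, mime_type: str, subfilter: dict[str, Any]
    ) -> tuple[str | bytes, str]:
        raise NotImplementedError


class UppercaseFilter(SketchFilterBase):
    """Hypothetical filter that upper-cases text and leaves the media type alone."""

    def filter(
        self, data: str | bytes, mime_type: str, subfilter: dict[str, Any]
    ) -> tuple[str | bytes, str]:
        if isinstance(data, bytes):
            # Assumption for this sketch: bytes input is UTF-8 text.
            data = data.decode('utf-8')
        return data.upper(), mime_type
```

Because every filter returns the same (data, media type) pair, the output of one filter can be handed directly to the next one in a chain.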
- class webchanges.filters.FormatJsonFilter(state)
Bases:
FilterBase
Convert to formatted JSON.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
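Converting to formatted JSON generally means parsing the input and serializing it again with indentation. The sketch below does this with the standard library only; the function name `format_json`, the default indent of 4, and the returned `application/json` media type are assumptions made for the example and may differ from what FormatJsonFilter does.

```python
from __future__ import annotations

import json


def format_json(data: str | bytes, indent: int = 4) -> tuple[str, str]:
    """Parse JSON text and return it pretty-printed, with an assumed media type."""
    parsed = json.loads(data)  # json.loads accepts both str and bytes
    formatted = json.dumps(parsed, ensure_ascii=False, indent=indent)
    return formatted, 'application/json'
```

Pretty-printing places each key and array item on its own line, so a later line-based diff reports only the values that actually changed.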
- class webchanges.filters.FormatXMLFilter(state)
Bases:
FilterBase
Convert to formatted XML using lxml.etree.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.GrepFilter(state)
Bases:
FilterBase
Deprecated; use keep_lines_containing instead.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Filter (process) the data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.GrepIFilter(state)
Bases:
FilterBase
Deprecated; use delete_lines_containing instead.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
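GrepFilter and GrepIFilter are documented as deprecated in favor of keep_lines_containing and delete_lines_containing. The sketch below shows the two line-selection behaviors those names suggest, using regular expressions from the standard library. The function names `keep_lines_matching` and `delete_lines_matching` are hypothetical, and the options the real filters accept are not assumed here.

```python
import re


def keep_lines_matching(data: str, pattern: str) -> str:
    """Return only the lines in which the regular expression is found."""
    regex = re.compile(pattern)
    return ''.join(line for line in data.splitlines(keepends=True) if regex.search(line))


def delete_lines_matching(data: str, pattern: str) -> str:
    """Return the lines in which the regular expression is not found."""
    regex = re.compile(pattern)
    return ''.join(line for line in data.splitlines(keepends=True) if not regex.search(line))
```

Splitting with keepends=True preserves the original line endings, so the kept lines are returned exactly as they appeared in the input.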
- class webchanges.filters.HexDumpFilter(state)
Bases:
FilterBase
Convert string to hex dump format.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
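A hex dump typically shows the bytes of each block as two-digit hexadecimal values next to a printable rendering. The sketch below is one possible layout, not necessarily the one HexDumpFilter produces: the function name `hex_dump`, the 16-byte width, the UTF-8 encoding of string input, and the column spacing are all assumptions.

```python
from __future__ import annotations


def hex_dump(data: str | bytes, width: int = 16) -> str:
    """Render data as lines of hex bytes followed by their printable ASCII form."""
    raw = data.encode('utf-8') if isinstance(data, str) else data
    lines = []
    for offset in range(0, len(raw), width):
        block = raw[offset:offset + width]
        hex_part = ' '.join(f'{byte:02x}' for byte in block)
        # Replace non-printable bytes with '.' in the text column.
        text_part = ''.join(chr(byte) if 32 <= byte < 127 else '.' for byte in block)
        # Pad the hex column so the text column lines up on short final blocks.
        lines.append(f'{hex_part:<{width * 3 - 1}}  {text_part}')
    return '\n'.join(lines)
```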
- class webchanges.filters.Html2TextFilter(state)
Bases:
FilterBase
Convert a string consisting of HTML to Unicode plain text for easy difference checking.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Filter (process) the data.
Subfilter key can be method and any method-specific option to be passed to it. The following method keys are supported:
html2text (default): Use html2text Python library to extract text (in Markdown).
  options: See https://github.com/Alir3z4/html2text/blob/master/docs/usage.md#available-options; however, the following options are set to non-default values:
    unicode_snob = True
    body_width = 0
    ignore_images = True
    single_line_break = True
    wrap_links = False
bs4: Use Beautiful Soup Python library to extract plain text.
  options:
    parser: the type of markup you want to parse (currently supported are html, xml, and html5) or the name of the parser library you want to use (currently supported options are lxml, html5lib and html.parser) as per https://www.crummy.com/software/BeautifulSoup/bs4/doc/#specifying-the-parser-to-use. Different parsers are compared at https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser. Note: html5lib requires having the html5lib Python package already installed. Defaults to 'lxml'.
    separator: Strings will be concatenated using this separator. Defaults to '' (empty string).
    strip: If True, strings will be stripped before being concatenated. Defaults to False.
strip_tags: A simple and fast regex-based HTML tag stripper.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
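Because each filter returns both the data and its media type, filters can be chained by feeding one filter's output tuple into the next. The sketch below shows that idea under stated assumptions: run_chain, strip_whitespace and to_upper are hypothetical stand-ins, not functions from webchanges.

```python
from __future__ import annotations

from typing import Callable, Tuple, Union

Payload = Union[str, bytes]
FilterFn = Callable[[Payload, str], Tuple[Payload, str]]


def strip_whitespace(data: Payload, mime_type: str) -> tuple[Payload, str]:
    """Hypothetical filter: decode bytes if needed and strip surrounding whitespace."""
    text = data.decode() if isinstance(data, bytes) else data
    return text.strip(), "text/plain"


def to_upper(data: Payload, mime_type: str) -> tuple[Payload, str]:
    """Hypothetical filter: upper-case text, leaving the media type unchanged."""
    text = data.decode() if isinstance(data, bytes) else data
    return text.upper(), mime_type


def run_chain(filters: list[FilterFn], data: Payload, mime_type: str) -> tuple[Payload, str]:
    """Apply each filter in order, threading (data, mime_type) through the chain."""
    for apply_filter in filters:
        data, mime_type = apply_filter(data, mime_type)
    return data, mime_type


result = run_chain([strip_whitespace, to_upper], b"  hello ", "application/octet-stream")
```

Returning the media type alongside the data lets a later filter see that an earlier one changed the kind of content, for example from binary input to plain text.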
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
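A helper like raise_import_error is typically used around an optional import, so that the user learns which filter needs which package. The following is a sketch of that pattern only: the function name raise_missing_package, the message wording, and the package and filter names are hypothetical.

```python
def raise_missing_package(package_name: str, filter_name: str, error_message: str) -> None:
    """Raise an ImportError naming the package and the filter that needs it."""
    # The message wording here is hypothetical, not taken from the library.
    raise ImportError(
        f"Python package '{package_name}' is required by filter '{filter_name}' "
        f"but could not be imported: {error_message}"
    )


try:
    import package_that_is_not_installed_xyz  # hypothetical name, expected to be missing
except ImportError as exc:
    try:
        raise_missing_package("package_that_is_not_installed_xyz", "example_filter", str(exc))
    except ImportError as wrapped:
        message = str(wrapped)
```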
- method: str
- class webchanges.filters.Ical2TextFilter(state)
Bases:
FilterBase
Convert iCalendar to plaintext (requires Python package vobject).
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.JQFilter(state)
Bases:
FilterBase
Parse, transform, and extract data from JSON as text using jq.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.JsontoYamlFilter(state)
Bases:
FilterBase
Convert JSON to formatted YAML. An alternative to format-json.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.KeepLinesContainingFilter(state)
Bases:
FilterBase
Keep only lines matching a regular expression.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
self (KeepLinesContainingFilter | GrepFilter)
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
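As an illustration of what keeping only lines that match a regular expression means, here is a minimal standard-library sketch. It is not the code of KeepLinesContainingFilter; the function name keep_lines_matching and the sample text are assumptions.

```python
import re


def keep_lines_matching(text: str, pattern: str) -> str:
    """Return only the lines of text in which the regular expression is found."""
    regex = re.compile(pattern)
    # keepends=True preserves each kept line's original newline.
    return "".join(line for line in text.splitlines(keepends=True) if regex.search(line))


sample = "price: 10\nstock: low\nprice: 12\n"
kept = keep_lines_matching(sample, r"^price")
print(kept)
```

The pattern is searched within each line separately, so an anchor such as ^ refers to the start of that line.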
- class webchanges.filters.LxmlParser(filter_kind, subfilter, expr_key, job)
Bases:
object
- Parameters:
filter_kind (str)
subfilter (dict[str, Any])
expr_key (str)
job (JobBase)
- EXPR_NAMES: dict[str, str] = {'css': 'a CSS selector', 'xpath': 'an XPath expression'}
- parser: _FeedParser
- method: str
- expression: str
- namespaces: dict[str, str] | None
- skip: int
- feed(data)
- Parameters:
data (str)
- Return type:
None
- get_filtered_data(job_index_number=None)
- Parameters:
job_index_number (int | None)
- Return type:
str
- class webchanges.filters.OCRFilter(state)
Bases:
FilterBase
Convert text in images to plaintext (requires Python packages pytesseract and Pillow).
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.Pdf2TextFilter(state)
Bases:
FilterBase
Convert PDF to plaintext (requires Python package pdftotext and its dependencies).
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.PrettyXMLFilter(state)
Bases:
FilterBase
Pretty-print XML using xml.dom.minidom.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
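PrettyXMLFilter is described as pretty-printing XML with xml.dom.minidom. The sketch below shows the underlying standard-library call on a made-up document. The two-space indent is an assumption, and the filter itself may use different settings.

```python
from xml.dom import minidom

# A compact, single-line XML document (sample data invented for this example).
raw = "<root><item id='1'>a</item><item id='2'/></root>"

# parseString builds a DOM tree; toprettyxml serializes it with one element per line.
pretty = minidom.parseString(raw).toprettyxml(indent="  ")
print(pretty)
```

Putting each element on its own line makes line-based diffs of XML far easier to read than diffs of a single long line.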
- class webchanges.filters.PypdfFilter(state)
Bases:
FilterBase
Convert PDF to plaintext (requires Python package pypdf).
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.ReSubFilter(state)
Bases:
FilterBase
Replace text with regular expressions using Python’s re.sub.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
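Since ReSubFilter is described as a wrapper around Python's re.sub, a short standalone example of re.sub may help show the kind of substitution involved. The sample text and patterns are invented for illustration, and how the filter's subfilter settings map onto these arguments is not shown here.

```python
import re

text = "Updated 2024-05-01 12:30:05 | views: 1532"

# Replace a volatile timestamp and a counter with fixed placeholders so that
# they no longer register as changes between two snapshots.
text = re.sub(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", "<timestamp>", text)
text = re.sub(r"views: \d+", "views: <n>", text)
print(text)
```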
- class webchanges.filters.RegexFindall(state)
Bases:
FilterBase
Extract text with regular expressions using Python’s re.findall.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
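RegexFindall is described as extracting text with Python's re.findall. The standalone example below shows that call on invented sample data. Joining the matches with newlines is an assumption made for the example, not a statement about the filter's output format.

```python
import re

html = '<a href="/a">A</a> <a href="/b">B</a>'

# With one capturing group, re.findall returns a list of the group's matches.
links = re.findall(r'href="([^"]+)"', html)

# Assumption for illustration: present the matches one per line.
result = "\n".join(links)
print(result)
```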
- class webchanges.filters.RegexMatchFilter(state)
Bases:
FilterBase
Base class for filters that automatically match one or more directives.
Same as AutoMatchFilter but MATCH is a dict of {directive: Regular Expression Object}, where a Regular Expression Object is a compiled regex.
- Parameters:
state (JobState) – the JobState.
- MATCH: dict[str, re.Pattern] | None = None
- match()
Check whether the filter matches (i.e. needs to be executed).
- Returns:
True if match is found.
- Return type:
bool
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method
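To illustrate the MATCH attribute of RegexMatchFilter, here is a minimal sketch of how a dict of {directive: compiled regex} could be checked against a job's directives. The function regex_directives_match and the plain-dict representation of the job are hypothetical, and requiring every pattern to match is an assumption rather than confirmed behavior.

```python
import re


def regex_directives_match(match, directives):
    """Return True if every pattern in `match` matches its directive value."""
    if match is None:
        # No MATCH defined: the filter is not an automatch filter
        return False
    return all(
        name in directives and pattern.match(str(directives[name])) is not None
        for name, pattern in match.items()
    )


# Hypothetical MATCH definition keyed by directive name
MATCH = {"url": re.compile(r"https?://example\.com/.*")}

print(regex_directives_match(MATCH, {"url": "https://example.com/page"}))
```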
- class webchanges.filters.RemoveDuplicateLinesFilter(state)
Bases:
FilterBase
Remove duplicate lines (case sensitive).
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
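The behavior described for RemoveDuplicateLinesFilter can be sketched as follows. This is an illustration only: the function name is hypothetical, and keeping the first occurrence of each line in its original position is an assumption.

```python
def remove_duplicate_lines(text: str) -> str:
    """Drop lines already seen earlier in the text (case-sensitive comparison)."""
    seen = set()
    kept = []
    for line in text.splitlines():
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)


# "a" and "A" are different lines because the comparison is case sensitive
print(remove_duplicate_lines("a\nb\na\nA"))
```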
- class webchanges.filters.RemoveRepeatedFilter(state)
Bases:
FilterBase
Remove repeated lines (uniq).
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
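RemoveRepeatedFilter is described as behaving like uniq, which collapses only adjacent repeats. A minimal sketch under that reading, with a hypothetical function name and line-based splitting as an assumption:

```python
from itertools import groupby


def remove_repeated_lines(text: str) -> str:
    """Collapse runs of identical consecutive lines into a single line."""
    # groupby groups consecutive equal items, so non-adjacent repeats survive
    return "\n".join(key for key, _group in groupby(text.splitlines()))


print(remove_repeated_lines("a\na\nb\na"))
```

Unlike a full duplicate removal, the final "a" is kept here because it does not directly follow another "a".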
- class webchanges.filters.ReverseFilter(state)
Bases:
FilterBase
Reverse sort input items.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.Sha1SumFilter(state)
Bases:
FilterBase
Calculate the SHA-1 checksum of the content.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.Sha256SumFilter(state)
Bases:
FilterBase
Calculate the SHA-256 checksum of the content.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
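The checksum computed by Sha1SumFilter and Sha256SumFilter can be reproduced with the standard library. The sketch below is not the filters' code: the helper name checksum is hypothetical, and encoding str input as UTF-8 and returning a hexadecimal digest are assumptions.

```python
import hashlib


def checksum(data, algorithm="sha256"):
    """Return the hex digest of `data` using the named hashlib algorithm."""
    if isinstance(data, str):
        # Assumption: text is hashed as its UTF-8 encoding
        data = data.encode("utf-8")
    return hashlib.new(algorithm, data).hexdigest()


print(checksum("abc", "sha1"))
print(checksum("abc", "sha256"))
```

Because the digest replaces the content, a job using such a filter reports that something changed without showing what changed.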
- class webchanges.filters.ShellPipeFilter(state)
Bases:
FilterBase
Filter using a shell command.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.SortFilter(state)
Bases:
FilterBase
Sort input items.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Filter (process) the data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
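As an illustration of what SortFilter's "sort input items" could look like, here is a minimal sketch. The function name and the separator and reverse parameters are hypothetical, and plain code-point ordering is an assumption; the actual filter may order items differently.

```python
def sort_items(text: str, separator: str = "\n", reverse: bool = False) -> str:
    """Split `text` on `separator`, sort the items, and join them again."""
    return separator.join(sorted(text.split(separator), reverse=reverse))


print(sort_items("b\na\nc"))
print(sort_items("b,a,c", separator=",", reverse=True))
```

Sorting before comparison is useful when a source returns the same items in a varying order, since the order change alone would otherwise register as a difference.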
- class webchanges.filters.StripFilter(state)
Bases:
FilterBase
Strip leading and trailing whitespace.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.StripLinesFilter(state)
Bases:
FilterBase
Deprecated; use strip with subfilter splitlines instead.
- Parameters:
state (JobState) – the JobState.
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- method: str
- class webchanges.filters.XPathFilter(state)
Bases:
FilterBase
Filter XML/HTML using XPath expressions.
- Parameters:
state (JobState) – the JobState.
- EXPR_NAMES: dict[str, str]
- expression: str
- exclude: str
- classmethod auto_process(state, data, mime_type)
Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.
- Parameters:
state (JobState) – The JobState object.
data (str | bytes) – The data to be processed (filtered).
mime_type (str)
- Returns:
The output from the chain of filters (filtered data).
- Return type:
tuple[str | bytes, str]
- classmethod filter_chain_needs_bytes(filter_name)
Checks whether the first filter requires data in bytes (not Unicode).
- Parameters:
filter_name (str | list[str | dict[str, Any]] | None) – The filter.
- Returns:
True if the first filter requires data in bytes.
- Return type:
bool
- classmethod filter_documentation()
Generates simple filter documentation for use in the --features command line argument.
- Returns:
A string to display.
- Return type:
str
- classmethod is_bytes_filter_kind(filter_kind)
Checks whether the filter requires data in bytes (not Unicode).
- Parameters:
filter_kind (str) – The filter name.
- Returns:
True if the filter requires data in bytes.
- Return type:
bool
- match()
Method used by automatch filters.
- Returns:
True if an automatch filter.
- Return type:
bool
- classmethod normalize_filter_list(filter_spec, job_index_number=None)
Generates a list of filters that has been checked for its validity.
- Parameters:
filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.
job_index_number (int | None) – The job index number.
- Returns:
Iterator of filter_kind, subfilter (where subfilter is a dict).
- Return type:
Iterator[tuple[str, dict[str, Any]]]
- classmethod process(filter_kind, subfilter, job_state, data, mime_type)
Process the filter.
- Parameters:
filter_kind (str) – The name of the filter.
subfilter (dict[str, Any]) – The subfilter information.
job_state (JobState) – The JobState object (containing the Job).
data (str | bytes) – The data upon which to apply the filter.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
- raise_import_error(package_name, filter_name, error_message)
Raise ImportError for missing package.
- Parameters:
package_name (str) – The name of the module/package that could not be imported.
filter_name (str) – The name of the filter that needs the package.
error_message (str) – The error message from ImportError.
- Raises:
ImportError.
- Return type:
None
- namespaces: dict[str, str]
- method: str
- skip: int
- maxitems: int
- filter(data, mime_type, subfilter)
Method used by the filter to process data.
- Parameters:
data (str | bytes) – The data to be filtered (processed).
subfilter (dict[str, Any]) – The subfilter information.
mime_type (str)
- Returns:
The data and media type (fka MIME type) of the data after the filter has been applied.
- Return type:
tuple[str | bytes, str]
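The expression, skip and maxitems attributes of XPathFilter suggest selecting elements and then trimming the result list. The sketch below shows that idea with the standard library's xml.etree.ElementTree, which supports only a subset of XPath and is used here as a stand-in. The function select_elements is hypothetical, and the semantics assumed for skip (drop the first matches) and maxitems (keep at most that many, with 0 meaning no limit) are not confirmed by this reference.

```python
import xml.etree.ElementTree as ET


def select_elements(xml_text, expression, skip=0, maxitems=0):
    """Return serialized elements matching `expression`, trimmed by skip/maxitems."""
    root = ET.fromstring(xml_text)
    matches = root.findall(expression)
    # Assumed semantics: discard the first `skip` matches
    matches = matches[skip:]
    # Assumed semantics: 0 means "no limit"
    if maxitems:
        matches = matches[:maxitems]
    return [ET.tostring(element, encoding="unicode").strip() for element in matches]


doc = "<items><item>a</item><item>b</item><item>c</item></items>"
print(select_elements(doc, ".//item", skip=1, maxitems=1))
```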