webchanges.filters module

Filters.

class webchanges.filters.AbsoluteLinksFilter(state)

Bases: FilterBase

Replace relative HTML <a> href links with absolute ones.

Parameters:

state (JobState) – the JobState.
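As an illustration of what replacing relative links with absolute ones involves, the following is a minimal sketch using only the standard library. It is not the webchanges implementation: the helper name absolutize_links, the regular expression, and the example URLs are assumptions made for this example.

```python
import re
from urllib.parse import urljoin

# Matches the href value of an <a> tag when it is written with double quotes.
# Hypothetical pattern for illustration; a real HTML parser is more robust.
_HREF = re.compile(r'(<a\b[^>]*?\bhref=")([^"]*)(")', re.IGNORECASE)


def absolutize_links(html: str, base_url: str) -> str:
    """Resolve every <a href="..."> value against base_url."""
    return _HREF.sub(
        lambda m: m.group(1) + urljoin(base_url, m.group(2)) + m.group(3),
        html,
    )


page = '<a href="/docs/index.html">Docs</a> <a href="https://example.org/x">X</a>'
result = absolutize_links(page, "https://example.com/a/b.html")
print(result)
```

Links that are already absolute pass through unchanged, because urljoin returns the second argument as-is when it carries its own scheme.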

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]
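The normalization described above can be pictured with the sketch below. It is an assumption-laden illustration, not the library's code: the function name normalize_sketch, the comma/colon legacy syntax, the "value" key used for unnamed parameters, and the filter names in the usage lines are all hypothetical, and no validity checking is performed.

```python
from __future__ import annotations

from typing import Any, Iterator


def normalize_sketch(
    filter_spec: str | list[str | dict[str, Any]] | None,
) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (filter_kind, subfilter) pairs from a mixed filter specification."""
    if filter_spec is None:
        return
    if isinstance(filter_spec, str):
        # Assumed legacy form: "kind1,kind2:parameter"
        for item in filter_spec.split(","):
            kind, _, value = item.strip().partition(":")
            yield kind, ({"value": value} if value else {})
        return
    for item in filter_spec:
        if isinstance(item, str):
            # A bare filter name has no subfilter information.
            yield item, {}
        else:
            for kind, sub in item.items():
                if sub is None:
                    yield kind, {}
                elif isinstance(sub, dict):
                    yield kind, sub
                else:
                    # Wrap a scalar parameter in a dict (hypothetical key).
                    yield kind, {"value": sub}


legacy = list(normalize_sketch("html2text,grep:foo"))
modern = list(normalize_sketch(["strip", {"css": {"selector": "p"}}]))
```

Whatever the input form, the consumer only ever sees pairs of a filter name and a dict, which is what the documented return type describes.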

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.Ascii85(state)

Bases: FilterBase

Convert bytes data (e.g. images) into an ascii85 string.

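Ascii85 encodes every 4 bytes as 5 ASCII characters (about 25% overhead), which is more compact than Base64's 4 characters per 3 bytes (about 33% overhead).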

Parameters:

state (JobState) – the JobState.
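The size difference between the two encodings can be checked with Python's standard base64 module. This is a standalone sketch and does not call the filter itself; the sample payload is arbitrary.

```python
import base64

# 1024 bytes of sample binary data (no all-zero 4-byte groups, so
# Ascii85's "z" shortcut for zero groups does not apply here).
raw = bytes(range(256)) * 4

a85 = base64.a85encode(raw)
b64 = base64.b64encode(raw)

print(len(raw), len(a85), len(b64))

# Both encodings are lossless.
roundtrip = base64.a85decode(a85)
```

For this payload, Ascii85 produces 5 characters for every 4 bytes and Base64 produces 4 characters for every 3 bytes (with padding on the final group).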

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.AutoMatchFilter(state)

Bases: FilterBase

Base class for filters that automatically exactly match one or more directives.

MATCH is a dict of {directive: text to match}.

Parameters:

state (JobState) – the JobState.

MATCH: dict[str, str] | None = None
match()

Check whether the filter matches (i.e. needs to be executed).

Returns:

True if match is found.

Return type:

bool
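A rough picture of exact matching against a MATCH dict is given below. The classes FakeJob and SketchAutoMatch are hypothetical stand-ins invented for this example; how the real base class reads directives from the job is not shown in this reference and is assumed here to be plain attribute lookup.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FakeJob:
    """Hypothetical stand-in for a job with two directives."""

    url: str
    kind: str


class SketchAutoMatch:
    # {directive: text to match}; None means "never matches automatically".
    MATCH: dict[str, str] | None = None

    def __init__(self, job: FakeJob) -> None:
        self.job = job

    def match(self) -> bool:
        if self.MATCH is None:
            return False
        # Every listed directive must equal the expected text exactly.
        return all(
            getattr(self.job, directive, None) == text
            for directive, text in self.MATCH.items()
        )


class ExampleSiteFilter(SketchAutoMatch):
    MATCH = {"url": "https://example.com/", "kind": "url"}


hit = ExampleSiteFilter(FakeJob("https://example.com/", "url")).match()
miss = ExampleSiteFilter(FakeJob("https://example.com/other", "url")).match()
```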

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.Base64(state)

Bases: FilterBase

Convert bytes data (e.g. images) into a base64 string.

Base64 encoding causes an overhead of 33–37% relative to the size of the original binary data.

Parameters:

state (JobState) – the JobState.
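One plausible reading of the 33–37% range is that the lower figure is the bare 4-characters-per-3-bytes expansion and the upper figure includes the line breaks of MIME-style output. The sketch below checks that arithmetic with the standard base64 module; the interpretation is an assumption, and the payload size is chosen only so the numbers come out even.

```python
import base64

raw = b"\x00" * 5700  # divisible by 3 and by 57 (bytes per 76-character line)

plain = base64.b64encode(raw)      # no line breaks
mime = base64.encodebytes(raw)     # newline after every 76 characters
crlf = mime.replace(b"\n", b"\r\n")  # CRLF line endings, as used in email


def overhead(encoded: bytes) -> float:
    """Percentage growth relative to the original binary size."""
    return (len(encoded) / len(raw) - 1) * 100


print(f"{overhead(plain):.1f}% {overhead(mime):.1f}% {overhead(crlf):.1f}%")
```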

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.BeautifyFilter(state)

Bases: FilterBase

Beautify HTML (requires Python package BeautifulSoup and optionally jsbeautifier and/or cssbeautifier).

Parameters:

state (JobState) – the JobState.

filter(data, mime_type, subfilter)

Filter (process) the data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.CSSFilter(state)

Bases: FilterBase

Filter XML/HTML using CSS selectors.

Parameters:

state (JobState) – the JobState.

EXPR_NAMES: dict[str, str]
expression: str
exclude: str
namespaces: dict[str, str]
skip: int
maxitems: int
filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.Csv2TextFilter(state)

Bases: FilterBase

Convert CSV to plaintext.

Parameters:

state (JobState) – the JobState.
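To give a sense of what converting CSV to plaintext can look like, here is a small sketch built on the standard csv module. The helper csv_to_text and its line_format parameter are hypothetical; the options the actual filter accepts through its subfilter are not listed in this reference.

```python
import csv
import io


def csv_to_text(data: str, line_format: str) -> str:
    """Render each CSV row as one line of text using a format template."""
    # The first row is treated as the header and supplies the field names.
    reader = csv.DictReader(io.StringIO(data))
    return "\n".join(line_format.format(**row) for row in reader)


source = "name,price\nwidget,9.99\ngadget,19.50\n"
text = csv_to_text(source, "{name} costs {price}")
print(text)
```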

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.DeleteLinesContainingFilter(state)

Bases: FilterBase

Remove lines matching a regular expression.

Parameters:

state (JobState) – the JobState.
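Removing lines that match a regular expression can be sketched in a few lines with the standard re module. The function name delete_lines_matching and the sample text are assumptions for illustration; this is not the filter's own code.

```python
import re


def delete_lines_matching(text: str, pattern: str) -> str:
    """Drop every line in which the regular expression finds a match."""
    rx = re.compile(pattern)
    # keepends=True preserves the original line terminators of kept lines.
    return "".join(
        line for line in text.splitlines(keepends=True) if not rx.search(line)
    )


sample = "Headline\nAdvertisement: buy now\nBody text\n"
cleaned = delete_lines_matching(sample, r"^Advertisement:")
print(cleaned)
```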

filter(data, mime_type, subfilter)

Method used by the filter to process data.

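Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns: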

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.ElementByClassFilter(state)

Bases: FilterBase

Get all HTML elements matching a class.

Parameters:

state (JobState) – the JobState.

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.ElementByIdFilter(state)

Bases: FilterBase

Get all HTML elements matching an ID.

Parameters:

state (JobState) – the JobState.
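The idea of extracting elements by ID can be sketched with the standard html.parser module, as below. The class IdCollector and the helper elements_by_id are hypothetical names, and the parsing backend the filter actually uses is not stated in this reference.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class IdCollector(HTMLParser):
    """Collect the markup of every element whose id equals wanted_id."""

    def __init__(self, wanted_id: str) -> None:
        super().__init__()
        self.wanted_id = wanted_id
        self.depth = 0  # > 0 while inside a matching element
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth or dict(attrs).get("id") == self.wanted_id:
            self.parts.append(self.get_starttag_text())
            if tag not in VOID:
                self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> do not change the nesting depth.
        if self.depth or dict(attrs).get("id") == self.wanted_id:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            self.parts.append(f"</{tag}>")
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def elements_by_id(html: str, wanted_id: str) -> str:
    parser = IdCollector(wanted_id)
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


doc = '<div><p id="price">Cost: <b>5</b></p><p id="other">x</p></div>'
found = elements_by_id(doc, "price")
print(found)
```

The depth counter is what keeps nested children of the matching element in the output while everything outside it is skipped.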

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.ElementByStyleFilter(state)

Bases: FilterBase

Get all HTML elements matching a style.

Parameters:

state (JobState) – the JobState.

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.ElementByTagFilter(state)

Bases: FilterBase

Get all HTML elements matching a tag.

Parameters:

state (JobState) – the JobState.

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.ElementsBy(filter_by, name, value=None)

Bases: HTMLParser, ABC

Initialize and reset this instance.

If convert_charrefs is True (the default), all character references are automatically converted to the corresponding Unicode characters.

Parameters:
  • filter_by (FilterBy)

  • name (str)

  • value (Any)
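
To illustrate how an HTMLParser subclass can select elements by tag or by attribute, here is a minimal sketch. It is not the ElementsBy implementation: the names `Match` and `ElementCollector` are hypothetical, and the nesting logic is deliberately simple (it only counts nested tags with the same name as the matched element).

```python
from __future__ import annotations

from enum import Enum
from html.parser import HTMLParser


class Match(Enum):  # hypothetical stand-in for FilterBy
    ATTRIBUTE = 1
    TAG = 2


class ElementCollector(HTMLParser):
    """Collect the source of every element matching a tag name or attribute."""

    def __init__(self, match: Match, name: str, value: str | None = None) -> None:
        super().__init__()
        self._match, self._name, self._value = match, name, value
        self._depth = 0  # > 0 while inside a matching element
        self._tag = ''  # tag name of the outermost matching element
        self._parts: list[str] = []

    def _matches(self, tag: str, attrs: list[tuple[str, str | None]]) -> bool:
        if self._match is Match.TAG:
            return tag == self._name
        return (self._name, self._value) in attrs

    def handle_starttag(self, tag, attrs):
        if self._depth == 0:
            if not self._matches(tag, attrs):
                return
            self._tag = tag
        if tag == self._tag:
            self._depth += 1
        self._parts.append(self.get_starttag_text() or '')

    def handle_endtag(self, tag):
        if self._depth == 0:
            return
        self._parts.append(f'</{tag}>')
        if tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self._parts.append('\n')  # one matched element per line

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)

    def get_html(self) -> str:
        return ''.join(self._parts)


def collect(html: str, match: Match, name: str, value: str | None = None) -> str:
    parser = ElementCollector(match, name, value)
    parser.feed(html)
    parser.close()
    return parser.get_html()
```

With this sketch, `collect(page, Match.TAG, 'p')` would return every `<p>` element, and `collect(page, Match.ATTRIBUTE, 'id', 'x')` only the element whose `id` is `x`.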

get_html()

Return type:

str

handle_starttag(tag, attrs)

Parameters:
  • tag (str)

  • attrs (list[tuple[str, str | None]])

Return type:

None

handle_endtag(tag)

Parameters:

tag (str)

Return type:

None

handle_data(data)

Parameters:

data (str)

Return type:

None

CDATA_CONTENT_ELEMENTS = ('script', 'style')
RCDATA_CONTENT_ELEMENTS = ('textarea', 'title')
check_for_whole_start_tag(i)
clear_cdata_mode()
close()

Handle any buffered data.

feed(data)

Feed data to the parser.

Call this as often as you want, with as little or as much text as you want (may include '\n').

get_starttag_text()

Return full source of start tag: ‘<…>’.

getpos()

Return current line number and offset.

goahead(end)
handle_charref(name)
handle_comment(data)
handle_decl(decl)
handle_entityref(name)
handle_pi(data)
handle_startendtag(tag, attrs)
parse_bogus_comment(i, report=1)
parse_comment(i, report=True)
parse_declaration(i)
parse_endtag(i)
parse_html_declaration(i)
parse_marked_section(i, report=1)
parse_pi(i)
parse_starttag(i)
reset()

Reset this instance. Loses all unprocessed data.

set_cdata_mode(elem, *, escapable=False)
unknown_decl(data)
updatepos(i, j)
class webchanges.filters.ExecuteFilter(state)

Bases: FilterBase

Filter using a command.

Parameters:

state (JobState) – the JobState.
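
Filtering through a command generally means piping the data to the command's standard input and reading the result from its standard output. The following is a sketch of that technique using the standard library; the function name `run_command_filter`, the timeout and the error handling are assumptions, not the behavior of ExecuteFilter.

```python
import subprocess
import sys


def run_command_filter(data: str, command: list[str], timeout: float = 10.0) -> str:
    """Pipe data to the command's stdin and return its stdout."""
    result = subprocess.run(
        command,
        input=data,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Portable demonstration: use the current Python interpreter as the command.
upper = [sys.executable, '-c', 'import sys; sys.stdout.write(sys.stdin.read().upper())']
```

Passing the command as a list avoids shell interpretation of the arguments, which matters when any part of the command comes from configuration.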

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.FilterBase(state)

Bases: object

The base class for filters.

Parameters:

state (JobState) – the JobState.

method: str
classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]
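
One common way to implement a name-based dispatch like the one process() describes is a registry that maps each filter_kind to a callable. The sketch below shows that pattern only; `REGISTRY`, `register` and the `strip` example filter with its `chars` key are hypothetical, and the job_state argument is omitted for brevity.

```python
from __future__ import annotations

from typing import Any, Callable

FilterFunc = Callable[[str, str, dict], tuple]
REGISTRY: dict[str, FilterFunc] = {}  # hypothetical registry keyed by filter_kind


def register(kind: str) -> Callable[[FilterFunc], FilterFunc]:
    """Decorator that records a filter function under its kind."""
    def decorator(func: FilterFunc) -> FilterFunc:
        REGISTRY[kind] = func
        return func
    return decorator


@register('strip')
def strip_filter(data: str, mime_type: str, subfilter: dict[str, Any]) -> tuple[str, str]:
    # 'chars' is an illustrative option; None strips whitespace.
    return data.strip(subfilter.get('chars')), mime_type


def process(filter_kind: str, subfilter: dict[str, Any], data: str, mime_type: str) -> tuple[str, str]:
    """Look up the filter by name and apply it, returning (data, mime_type)."""
    try:
        func = REGISTRY[filter_kind]
    except KeyError:
        raise ValueError(f'Unknown filter kind: {filter_kind}') from None
    return func(data, mime_type, subfilter)
```

Returning the media type alongside the data lets each filter in a chain tell the next one what kind of content it produced.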

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool
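
The two checks above can be pictured as follows: only the first filter in a chain sees the raw retrieved data, so only its input type decides whether bytes are needed. This is a sketch under assumptions; the set `BYTES_KINDS` and its member names are placeholders, not the library's list of bytes filters.

```python
from __future__ import annotations

from typing import Any

# Placeholder names for filters that would need raw bytes as input.
BYTES_KINDS = {'binary-example', 'image-example'}


def is_bytes_kind(filter_kind: str) -> bool:
    """True if the named filter needs bytes rather than str."""
    return filter_kind in BYTES_KINDS


def chain_needs_bytes(chain: list[str | dict[str, Any]] | None) -> bool:
    """True if the first filter in the chain needs bytes."""
    if not chain:
        return False
    first = chain[0]
    # A list entry is either a bare kind or a one-key dict {kind: subfilter}.
    kind = first if isinstance(first, str) else next(iter(first))
    return is_bytes_kind(kind)
```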

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None
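
A helper of this kind is typically called from an `except ImportError` block around an optional dependency. The sketch below shows the pattern; the message wording and the names `raise_missing_package` and `load_optional` are assumptions for illustration.

```python
import importlib
from types import ModuleType


def raise_missing_package(package_name: str, filter_name: str, error_message: str) -> None:
    """Raise an ImportError that names both the package and the filter needing it."""
    raise ImportError(
        f"Python package '{package_name}' cannot be imported; "
        f"cannot use filter '{filter_name}'. ({error_message})"
    )


def load_optional(package_name: str, filter_name: str) -> ModuleType:
    """Import an optional dependency or fail with an actionable message."""
    try:
        return importlib.import_module(package_name)
    except ImportError as e:
        raise_missing_package(package_name, filter_name, str(e))
        raise  # not reached; keeps type checkers aware that this branch never returns
```

Deferring the import until the filter is actually used keeps optional packages optional: users who never configure the filter never need the dependency.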

class webchanges.filters.FilterBy(*values)

Bases: Enum

ATTRIBUTE = 1
TAG = 2
class webchanges.filters.FormatJsonFilter(state)

Bases: FilterBase

Convert to formatted JSON.

Parameters:

state (JobState) – the JobState.
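
Formatting JSON before comparison makes diffs line-oriented and stable. A minimal sketch with the standard library follows; the function name `format_json` and the `indent` and `sort_keys` parameters are illustrative and may not match the options this filter accepts.

```python
import json


def format_json(data: str, indent: int = 4, sort_keys: bool = False) -> str:
    """Parse a JSON string and re-serialize it with consistent indentation."""
    parsed = json.loads(data)
    return json.dumps(parsed, ensure_ascii=False, indent=indent, sort_keys=sort_keys)


print(format_json('{"b": 1, "a": [1, 2]}', indent=2, sort_keys=True))
```

Sorting keys is useful when the source does not guarantee key order, since otherwise a reordering alone would show up as a change.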

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.FormatXMLFilter(state)

Bases: FilterBase

Convert to formatted XML using lxml.etree.

Parameters:

state (JobState) – the JobState.
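
The filter itself relies on lxml.etree. To keep the example free of third-party dependencies, the sketch below swaps in the standard library's xml.dom.minidom, so its output details (declaration, indentation) may differ from what the filter produces. The function name `format_xml` is hypothetical.

```python
from xml.dom import minidom


def format_xml(data: str, indent: str = '  ') -> str:
    """Pretty-print an XML string, one element per line."""
    pretty = minidom.parseString(data).toprettyxml(indent=indent)
    # Drop whitespace-only lines, which appear when the input was already indented.
    return '\n'.join(line for line in pretty.splitlines() if line.strip())


print(format_xml('<a><b>1</b><c/></a>'))
```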

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.GrepFilter(state)

Bases: FilterBase

Deprecated; use keep_lines_containing instead.

Parameters:

state (JobState) – the JobState.
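
The behavior implied by the replacement's name, keep_lines_containing, can be sketched as follows. This is an illustration of the technique, not the filter's code; the parameter names `text` and `regex` are assumptions.

```python
from __future__ import annotations

import re


def keep_lines(data: str, text: str | None = None, regex: str | None = None) -> str:
    """Keep only the lines that contain `text` or match `regex`."""
    if (text is None) == (regex is None):
        raise ValueError('Provide exactly one of text or regex')
    # A plain text search is expressed as an escaped regular expression.
    pattern = re.compile(regex if regex is not None else re.escape(text))
    return ''.join(line for line in data.splitlines(keepends=True) if pattern.search(line))
```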

filter(data, mime_type, subfilter)

Filter (process) the data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.GrepIFilter(state)

Bases: FilterBase

Deprecated; use delete_lines_containing instead.

Parameters:

state (JobState) – the JobState.
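
The behavior implied by the replacement's name, delete_lines_containing, is the inverse of a line-keeping filter: lines that match are dropped and everything else passes through. A minimal sketch, with assumed parameter names `text` and `regex`:

```python
from __future__ import annotations

import re


def delete_lines(data: str, text: str | None = None, regex: str | None = None) -> str:
    """Remove the lines that contain `text` or match `regex`."""
    if (text is None) == (regex is None):
        raise ValueError('Provide exactly one of text or regex')
    # A plain text search is expressed as an escaped regular expression.
    pattern = re.compile(regex if regex is not None else re.escape(text))
    return ''.join(line for line in data.splitlines(keepends=True) if not pattern.search(line))
```

This is handy for removing volatile lines, such as timestamps or session tokens, that would otherwise trigger a change report on every run.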

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.HexDumpFilter(state)

Bases: FilterBase

Convert string to hex dump format.

Parameters:

state (JobState) – the JobState.
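
A hex dump shows each block of bytes as hexadecimal values next to a printable rendering. The sketch below illustrates the idea; the 16-byte row width, the column separator and the use of `.` for non-printable bytes are assumptions about layout, not the filter's exact output format.

```python
from __future__ import annotations


def hexdump(data: str | bytes, width: int = 16) -> str:
    """Render data as rows of hex bytes followed by their printable ASCII form."""
    raw = data.encode('utf-8', errors='replace') if isinstance(data, str) else data
    rows = []
    for offset in range(0, len(raw), width):
        block = raw[offset:offset + width]
        hex_part = ' '.join(f'{byte:02x}' for byte in block)
        # Printable ASCII is shown as-is; everything else becomes '.'.
        text_part = ''.join(chr(byte) if 31 < byte < 127 else '.' for byte in block)
        rows.append(f'{hex_part}  {text_part}')
    return '\n'.join(rows)


print(hexdump('Hi!\n'))
```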

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.Html2TextFilter(state)

Bases: FilterBase

Convert a string consisting of HTML to Unicode plain text for easy difference checking.

Parameters:

state (JobState) – the JobState.

filter(data, mime_type, subfilter)

Filter (process) the data.

The subfilter can contain the key method and any method-specific options to be passed to that method. The following method keys are supported:

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]
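
To show what an HTML-to-text conversion involves, here is a dependency-free sketch built on the standard library's html.parser. It does not correspond to any of the filter's methods; the class name `TextExtractor`, the set of block-level tags and the whitespace handling are all assumptions.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, breaking lines at block-level elements."""

    SKIP = {'script', 'style'}
    BLOCK = {'p', 'div', 'br', 'li', 'tr', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes entities such as &amp;
        self._skip_depth = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append('\n')

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._chunks.append('\n')

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace and drop empty lines.
        lines = (' '.join(line.split()) for line in ''.join(self._chunks).splitlines())
        return '\n'.join(line for line in lines if line)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Comparing text rather than markup means that purely cosmetic changes to a page's HTML, such as renamed CSS classes, do not register as differences.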

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]
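
The `(data, mime_type)` return value makes filters easy to chain, since each result can be fed to the next filter. The sketch below shows the idea with two toy filters; `REGISTRY`, `run_chain`, and the filter functions are hypothetical stand-ins, not webchanges names.

```python
from typing import Any, Callable, Dict, List, Tuple, Union

Data = Union[str, bytes]
FilterFunc = Callable[[Data, str, Dict[str, Any]], Tuple[Data, str]]


def strip_filter(data: Data, mime_type: str, subfilter: Dict[str, Any]) -> Tuple[Data, str]:
    """Toy filter: remove leading and trailing whitespace."""
    return data.strip(), mime_type


def upper_filter(data: Data, mime_type: str, subfilter: Dict[str, Any]) -> Tuple[Data, str]:
    """Toy filter: convert the data to upper case."""
    return data.upper(), mime_type


# Hypothetical registry mapping a filter kind to its implementation.
REGISTRY: Dict[str, FilterFunc] = {"strip": strip_filter, "upper": upper_filter}


def run_chain(chain: List[Tuple[str, Dict[str, Any]]], data: Data, mime_type: str) -> Tuple[Data, str]:
    """Apply each (filter_kind, subfilter) pair in order."""
    for kind, subfilter in chain:
        if kind not in REGISTRY:
            raise ValueError(f"Unknown filter kind: {kind}")
        data, mime_type = REGISTRY[kind](data, mime_type, subfilter)
    return data, mime_type
```

A filter that changes the representation (for example, PDF bytes to text) would return a different media type, which is why the type travels with the data.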

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None
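
A helper like this is typically called from an `except ImportError` branch around an optional import. The following standalone sketch shows that pattern; the message wording and the `load_optional` helper are assumptions, not the text or API of webchanges.

```python
import importlib
from types import ModuleType


def raise_import_error(package_name: str, filter_name: str, error_message: str) -> None:
    """Raise an ImportError that names both the package and the filter."""
    raise ImportError(
        f"Python package '{package_name}' is required by filter "
        f"'{filter_name}' but could not be imported ({error_message})"
    )


def load_optional(package_name: str, filter_name: str) -> ModuleType:
    """Import an optional dependency, or fail with a descriptive error."""
    try:
        return importlib.import_module(package_name)
    except ImportError as e:
        raise_import_error(package_name, filter_name, str(e))
        raise  # Unreachable; keeps type checkers satisfied.
```

Deferring the import to the moment a filter runs means users only need the optional packages for the filters they actually configure.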

method: str

class webchanges.filters.Ical2TextFilter(state)

Bases: FilterBase

Convert iCalendar to plaintext (requires Python package vobject).

Parameters:

state (JobState) – the JobState.

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str

class webchanges.filters.JQFilter(state)

Bases: FilterBase

Parse, transform, and extract data from JSON as text using jq.

Parameters:

state (JobState) – the JobState.

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str

class webchanges.filters.JsontoYamlFilter(state)

Bases: FilterBase

Convert JSON to formatted YAML. An alternative to format-json.

Parameters:

state (JobState) – the JobState.

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str

class webchanges.filters.KeepLinesContainingFilter(state)

Bases: FilterBase

Keep only lines matching a regular expression.

Parameters:

state (JobState) – the JobState.
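
The line-keeping behavior can be approximated in a few lines of standard-library Python. This is a sketch only: the function name and the `text` / `re` subfilter keys are assumptions about how such a filter might be configured.

```python
import re
from typing import Any, Dict


def keep_lines_containing(data: str, subfilter: Dict[str, Any]) -> str:
    """Return only the lines that contain a substring or match a regex."""
    if "text" in subfilter:
        needle = subfilter["text"]

        def keep(line: str) -> bool:
            return needle in line

    elif "re" in subfilter:
        pattern = re.compile(subfilter["re"])

        def keep(line: str) -> bool:
            # search() matches anywhere in the line, not only at its start.
            return pattern.search(line) is not None

    else:
        raise ValueError("Expected a 'text' or 're' key in the subfilter")
    # keepends=True preserves the original line terminators.
    return "".join(line for line in data.splitlines(keepends=True) if keep(line))
```

For example, `keep_lines_containing("a1\nb\na2\n", {"re": r"a\d"})` keeps the first and third lines and drops the second.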

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • self (KeepLinesContainingFilter | GrepFilter)

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str

class webchanges.filters.LxmlParser(filter_kind, subfilter, expr_key, job)

Bases: object

Parameters:
  • filter_kind (str)

  • subfilter (dict[str, Any])

  • expr_key (str)

  • job (JobBase)

EXPR_NAMES: dict[str, str] = {'css': 'a CSS selector', 'xpath': 'an XPath expression'}
parser: _FeedParser
method: str
expression: str
namespaces: dict[str, str] | None
skip: int
feed(data)

Parameters:

data (str)

Return type:

None

get_filtered_data(job_index_number=None)

Parameters:

job_index_number (int | None)

Return type:

str

class webchanges.filters.OCRFilter(state)

Bases: FilterBase

Convert text in images to plaintext (requires Python packages pytesseract and Pillow).

Parameters:

state (JobState) – the JobState.

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str

class webchanges.filters.Pdf2TextFilter(state)

Bases: FilterBase

Convert PDF to plaintext (requires Python package pdftotext and its dependencies).

Parameters:

state (JobState) – the JobState.

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str

class webchanges.filters.PrettyXMLFilter(state)

Bases: FilterBase

Pretty-print XML using xml.dom.minidom.

Parameters:

state (JobState) – the JobState.
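
For reference, pretty-printing with `xml.dom.minidom` from the standard library looks roughly like the sketch below. The function name and the two-space indent are illustrative choices, not necessarily what the filter uses.

```python
from xml.dom import minidom


def pretty_xml(data: str) -> str:
    """Parse an XML string and return it re-indented, one element per line."""
    document = minidom.parseString(data)
    # toprettyxml() prepends an XML declaration and indents nested elements.
    return document.toprettyxml(indent="  ")
```

Called on `"<a><b>1</b></a>"`, this returns an XML declaration followed by `<a>`, an indented `<b>1</b>`, and `</a>`, each on its own line. Normalizing the layout this way keeps whitespace-only changes in the source from showing up as differences.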

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str

class webchanges.filters.PypdfFilter(state)

Bases: FilterBase

Convert PDF to plaintext (requires Python package pypdf).

Parameters:

state (JobState) – the JobState.

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str

class webchanges.filters.ReSubFilter(state)

Bases: FilterBase

Replace text with regular expressions using Python’s re.sub.

Parameters:

state (JobState) – the JobState.
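
The underlying operation is a plain `re.sub` call. The sketch below wraps it in a function driven by a subfilter dict; the key names `pattern` and `repl`, and the empty-string default for `repl`, are assumptions for illustration.

```python
import re
from typing import Any, Dict


def re_sub(data: str, subfilter: Dict[str, Any]) -> str:
    """Replace every match of subfilter['pattern'] with subfilter['repl']."""
    pattern = subfilter["pattern"]
    # Assumed default: with no replacement given, matches are deleted.
    repl = subfilter.get("repl", "")
    return re.sub(pattern, repl, data)
```

A typical use is masking volatile values such as counters or timestamps, for example `re_sub("id=123 id=456", {"pattern": r"\d+", "repl": "N"})`, so that they do not trigger change reports.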

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str

class webchanges.filters.RegexFindall(state)

Bases: FilterBase

Extract text with regular expressions using Python’s re.findall.

Parameters:

state (JobState) – the JobState.
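
A minimal extraction sketch follows. It uses `re.finditer`, the iterator counterpart of `re.findall`, so that each match can be reformatted with `Match.expand`. The `pattern` and `repl` keys, the whole-match default, and the newline join are assumptions, not confirmed behavior of this filter.

```python
import re
from typing import Any, Dict


def re_findall(data: str, subfilter: Dict[str, Any]) -> str:
    """Return all matches of a pattern, one per line."""
    pattern = re.compile(subfilter["pattern"])
    # Assumed default template: \g<0> stands for the whole match.
    repl = subfilter.get("repl", r"\g<0>")
    return "\n".join(match.expand(repl) for match in pattern.finditer(data))
```

With a template such as `r"\1:\2"`, captured groups can be rearranged in the output instead of returning the raw match.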

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str)

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str)

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]
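
Because each filter returns both the data and its media type, filters can be applied one after another. The following is a rough sketch of that idea only; strip_step, upper_step and run_chain_sketch are hypothetical names and plain functions stand in for filter classes.

```python
from typing import Callable, List, Tuple, Union

Data = Union[str, bytes]
FilterFn = Callable[[Data, str], Tuple[Data, str]]


def strip_step(data: Data, mime_type: str) -> Tuple[Data, str]:
    """Hypothetical step: strip surrounding whitespace."""
    text = data.decode("utf-8") if isinstance(data, bytes) else data
    return text.strip(), mime_type


def upper_step(data: Data, mime_type: str) -> Tuple[Data, str]:
    """Hypothetical step: upper-case the text."""
    text = data.decode("utf-8") if isinstance(data, bytes) else data
    return text.upper(), mime_type


def run_chain_sketch(steps: List[FilterFn], data: Data, mime_type: str) -> Tuple[Data, str]:
    """Feed the (data, mime_type) output of each step into the next one."""
    for step in steps:
        data, mime_type = step(data, mime_type)
    return data, mime_type


result = run_chain_sketch([strip_step, upper_step], "  hi \n", "text/plain")
```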

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.RegexMatchFilter(state)

Bases: FilterBase

Base class for filters that automatically match one or more directives.

Same as AutoMatchFilter but MATCH is a dict of {directive: Regular Expression Object}, where a Regular Expression Object is a compiled regex.

Parameters:

state (JobState) – The JobState object.

MATCH: dict[str, re.Pattern] | None = None
match()

Check whether the filter matches (i.e. needs to be executed).

Returns:

True if match is found.

Return type:

bool
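
As an illustration of a MATCH dict of {directive: compiled regex}, the sketch below checks a set of job directives against such a dict. It assumes that every listed directive must match and that a job can be represented as a plain dict; neither assumption is confirmed by this reference, and matches_sketch is a hypothetical name.

```python
import re
from typing import Dict, Optional

# Illustrative MATCH value: directive name mapped to a compiled regex.
MATCH: Dict[str, "re.Pattern[str]"] = {"url": re.compile(r"https://example\.org/.*")}


def matches_sketch(match: Optional[Dict[str, "re.Pattern[str]"]], directives: Dict[str, str]) -> bool:
    """Return True if every directive in `match` is present and matches its regex."""
    if match is None:
        return False
    return all(
        isinstance(directives.get(name), str) and pattern.match(directives[name]) is not None
        for name, pattern in match.items()
    )
```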

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method
class webchanges.filters.RemoveDuplicateLinesFilter(state)

Bases: FilterBase

Remove duplicate lines (case sensitive).

Parameters:

state (JobState) – The JobState object.
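
A minimal sketch of case-sensitive duplicate removal, assuming that the first occurrence of each line is kept and that line order is preserved (this reference does not state either). The function name is hypothetical.

```python
def remove_duplicate_lines_sketch(text: str) -> str:
    """Drop every line that has already appeared earlier (case sensitive)."""
    seen = set()
    kept = []
    for line in text.splitlines():
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)
```

Unlike a uniq-style filter, this removes duplicates anywhere in the input, not only adjacent ones.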

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.RemoveRepeatedFilter(state)

Bases: FilterBase

Remove repeated lines (uniq).

Parameters:

state (JobState) – The JobState object.
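
The uniq behaviour can be sketched as follows, assuming it mirrors the Unix uniq utility (only adjacent repeats are collapsed) and that items are lines. The function name is hypothetical.

```python
from itertools import groupby


def remove_repeated_sketch(text: str) -> str:
    """Collapse each run of identical adjacent lines into a single line."""
    return "\n".join(key for key, _group in groupby(text.splitlines()))
```

A line that reappears later, after a different line, is kept.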

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.ReverseFilter(state)

Bases: FilterBase

Reverse sort input items.

Parameters:

state (JobState) – The JobState object.

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.Sha1SumFilter(state)

Bases: FilterBase

Calculate the SHA-1 checksum of the content.

Parameters:

state (JobState) – The JobState object.
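
A SHA-1 checksum of the content can be computed with the standard library as sketched below. The function name, the UTF-8 encoding of str input and the hexadecimal output format are assumptions, not documented behaviour of this filter.

```python
import hashlib
from typing import Union


def sha1_hex_sketch(data: Union[str, bytes]) -> str:
    """Return the SHA-1 digest of `data` as a 40-character hex string."""
    if isinstance(data, str):
        data = data.encode("utf-8")  # assumption: text is hashed as UTF-8
    return hashlib.sha1(data).hexdigest()
```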

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.Sha256SumFilter(state)

Bases: FilterBase

Calculate the SHA-256 checksum of the content.

Parameters:

state (JobState) – The JobState object.
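
One plausible use of a checksum is to detect that content changed without storing the content itself. The sketch below shows that idea with the standard library; both function names are hypothetical and the UTF-8 encoding of str input is an assumption.

```python
import hashlib
from typing import Union


def sha256_hex_sketch(data: Union[str, bytes]) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    if isinstance(data, str):
        data = data.encode("utf-8")  # assumption: text is hashed as UTF-8
    return hashlib.sha256(data).hexdigest()


def has_changed_sketch(old: Union[str, bytes], new: Union[str, bytes]) -> bool:
    """Compare two snapshots by digest only."""
    return sha256_hex_sketch(old) != sha256_hex_sketch(new)
```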

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.ShellPipeFilter(state)

Bases: FilterBase

Filter using a shell command.

Parameters:

state (JobState) – The JobState object.
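
Filtering through a shell command generally means writing the data to the command's standard input and reading the result from its standard output. The sketch below shows that pattern with subprocess.run; it is not the library's implementation, and the names shell_pipe_sketch and UPPERCASE_COMMAND are hypothetical. Since the command string is interpreted by the shell, it should only come from a trusted source.

```python
import subprocess
import sys


def shell_pipe_sketch(data: str, command: str) -> str:
    """Run `command` in a shell, send `data` to its stdin and return its stdout."""
    completed = subprocess.run(
        command,
        input=data,
        shell=True,  # the command string is interpreted by the shell
        capture_output=True,
        text=True,  # str in, str out
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout


# Example command: reuses the current Python interpreter to upper-case stdin.
UPPERCASE_COMMAND = (
    f'"{sys.executable}" -c "import sys; sys.stdout.write(sys.stdin.read().upper())"'
)
```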

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.SortFilter(state)

Bases: FilterBase

Sort input items.

Parameters:

state (JobState) – The JobState object.
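
Sorting input items can be sketched as below, assuming that items are lines and that ordering is plain case-sensitive string comparison. The function name and its reverse parameter are hypothetical and do not describe this filter's actual subfilter options.

```python
def sort_lines_sketch(text: str, reverse: bool = False) -> str:
    """Sort the lines of `text`; `reverse` is a hypothetical switch for descending order."""
    return "\n".join(sorted(text.splitlines(), reverse=reverse))
```

Sorting is useful when a source returns the same items in a varying order, so that reordering alone is not reported as a change.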

filter(data, mime_type, subfilter)

Filter (process) the data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.StripFilter(state)

Bases: FilterBase

Strip leading and trailing whitespace.

Parameters:

state (JobState) – The JobState object.
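
The difference between stripping the whole text and stripping each line can be sketched as follows. The function name is hypothetical, and the splitlines parameter only loosely models the subfilter of the same name mentioned for the deprecated StripLinesFilter; its exact semantics in the library are an assumption.

```python
def strip_sketch(text: str, splitlines: bool = False) -> str:
    """Strip whitespace from the whole text, or from every line if `splitlines` is set."""
    if splitlines:
        return "\n".join(line.strip() for line in text.splitlines())
    return text.strip()
```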

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

method: str
class webchanges.filters.StripLinesFilter(state)

Bases: FilterBase

Deprecated; use strip with subfilter splitlines instead.

Parameters:

state (JobState) – The JobState object.

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that has been checked for its validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of filter_kind, subfilter (where subfilter is a dict).

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str) – The media type (fka MIME type) of the data.

Returns:

The data and media type (fka MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raise ImportError for missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None
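
A helper of this shape is typically called from an except block around an optional import. The sketch below shows that pattern with standalone functions; the message wording, the function names raise_import_error_sketch and load_optional, and the use of importlib are assumptions for illustration only.

```python
import importlib
from types import ModuleType


def raise_import_error_sketch(package_name: str, filter_name: str, error_message: str) -> None:
    """Raise an ImportError naming both the missing package and the filter."""
    raise ImportError(
        f"Python package '{package_name}' is not installed; cannot use filter "
        f"'{filter_name}' ({error_message})."
    )


def load_optional(package_name: str, filter_name: str) -> ModuleType:
    """Import an optional dependency, or fail with a filter-specific message."""
    try:
        return importlib.import_module(package_name)
    except ImportError as e:
        raise_import_error_sketch(package_name, filter_name, str(e))
        raise  # Not reached; keeps the return type consistent for type checkers.
```

Naming the filter in the message tells the user which job configuration triggered the failed import.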

method: str

class webchanges.filters.XPathFilter(state)

Bases: FilterBase

Filter XML/HTML using XPath expressions.

Parameters:

state (JobState) – The JobState object.

EXPR_NAMES: dict[str, str]
expression: str
exclude: str

classmethod auto_process(state, data, mime_type)

Processes all automatic filters (those with “MATCH” set) in JobState.Job over the data.

Parameters:
  • state (JobState) – The JobState object.

  • data (str | bytes) – The data to be processed (filtered).

  • mime_type (str) – The media type (formerly known as MIME type) of the data.

Returns:

The output from the chain of filters (filtered data).

Return type:

tuple[str | bytes, str]

classmethod filter_chain_needs_bytes(filter_name)

Checks whether the first filter requires data in bytes (not Unicode).

Parameters:

filter_name (str | list[str | dict[str, Any]] | None) – The filter.

Returns:

True if the first filter requires data in bytes.

Return type:

bool

classmethod filter_documentation()

Generates simple filter documentation for use in the --features command-line argument.

Returns:

A string to display.

Return type:

str

classmethod is_bytes_filter_kind(filter_kind)

Checks whether the filter requires data in bytes (not Unicode).

Parameters:

filter_kind (str) – The filter name.

Returns:

True if the filter requires data in bytes.

Return type:

bool

match()

Method used by automatch filters.

Returns:

True if the filter is an automatch filter.

Return type:

bool

classmethod normalize_filter_list(filter_spec, job_index_number=None)

Generates a list of filters that have been checked for validity.

Parameters:
  • filter_spec (str | list[str | dict[str, Any]] | None) – A list of either filter_kind, subfilter (where subfilter is a dict) or a legacy string-based filter list specification.

  • job_index_number (int | None) – The job index number.

Returns:

Iterator of (filter_kind, subfilter) tuples, where subfilter is a dict.

Return type:

Iterator[tuple[str, dict[str, Any]]]

classmethod process(filter_kind, subfilter, job_state, data, mime_type)

Process the filter.

Parameters:
  • filter_kind (str) – The name of the filter.

  • subfilter (dict[str, Any]) – The subfilter information.

  • job_state (JobState) – The JobState object (containing the Job).

  • data (str | bytes) – The data upon which to apply the filter.

  • mime_type (str) – The media type (formerly known as MIME type) of the data.

Returns:

The data and media type (formerly known as MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]

raise_import_error(package_name, filter_name, error_message)

Raises ImportError for a missing package.

Parameters:
  • package_name (str) – The name of the module/package that could not be imported.

  • filter_name (str) – The name of the filter that needs the package.

  • error_message (str) – The error message from ImportError.

Raises:

ImportError.

Return type:

None

namespaces: dict[str, str]
method: str
skip: int
maxitems: int

filter(data, mime_type, subfilter)

Method used by the filter to process data.

Parameters:
  • data (str | bytes) – The data to be filtered (processed).

  • subfilter (dict[str, Any]) – The subfilter information.

  • mime_type (str) – The media type (formerly known as MIME type) of the data.

Returns:

The data and media type (formerly known as MIME type) of the data after the filter has been applied.

Return type:

tuple[str | bytes, str]
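
For readers unfamiliar with XPath-based selection, the following sketch shows the general idea using only the standard library. It is not the XPathFilter implementation: xml.etree.ElementTree supports only a subset of XPath, the function name select_elements is hypothetical, and the meaning given to skip (drop that many leading matches) and maxitems (keep at most that many, with 0 meaning no limit) is an assumption based on the attribute names.

```python
import xml.etree.ElementTree as ET


def select_elements(xml_text: str, path: str, skip: int = 0, maxitems: int = 0) -> str:
    """Return the serialized elements matching path, one per line."""
    root = ET.fromstring(xml_text)
    matches = root.findall(path)
    # Assumed semantics: drop the first `skip` matches...
    matches = matches[skip:]
    # ...then keep at most `maxitems` of the rest (0 means keep all).
    if maxitems:
        matches = matches[:maxitems]
    return "\n".join(ET.tostring(el, encoding="unicode").strip() for el in matches)


doc = "<root><item>a</item><item>b</item><item>c</item><item>d</item></root>"
selected = select_elements(doc, ".//item", skip=1, maxitems=2)
```

With the sample document above, selected holds the second and third item elements, each on its own line.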