Changelog
This changelog mostly follows ‘keep a changelog’. Release numbering mostly follows Semantic Versioning (<major>.<minor>.<patch>). Release date is UTC. Major backward incompatible (breaking) changes will be introduced in major versions with advance notice in the Deprecations section. Documentation updates are ongoing and mostly unlisted here.
Development
The unreleased versions can be installed as follows (git needs to be installed):
pip install git+https://github.com/mborsetti/webchanges.git@unreleased
Unreleased documentation is here.
Contributions are always welcomed, and you can check out the wish list for inspiration.
Version 3.22
2024-04-25
⚠ Breaking Changes
Developers integrating custom Python code (hooks.py) should refer to the “Internals” section below for important changes.
Changed
Snapshot database
Moved the snapshot database from the “user_cache” directory (typically not backed up) to the “user_data” directory. The new paths are (typically):
Linux: ~/.local/share/webchanges or $XDG_DATA_HOME/webchanges
macOS: ~/Library/Application Support/webchanges
Windows: %LOCALAPPDATA%\webchanges\webchanges
Renamed the file from cache.db to snapshots.db to more clearly denote its contents.
Introduced a new command line option --database to specify the filename for the snapshot database, replacing the previous --cache option (which is deprecated but still supported).
Many thanks to Markus Weimar for pointing this problem out in issue #75.
Modified the command line argument --test-differ to accept a second parameter, specifying the maximum number of diffs to generate.
Updated the command line argument --dump-history to display the mime_type attribute when present.
Enhanced differs functionality:
Standardized headers for deepdiff and imagediff to align more closely with those of unified.
Improved the google_ai differ:
Enhanced error handling: the differ now continues operation and reports errors rather than failing outright when Google API errors occur.
Changed the default prompt for improved results; it is now:
Analyze this unified diff and create a summary listing only the changes:\n\n{unified_diff}
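The Linux locations listed under “Snapshot database” above can be resolved along these lines. This is only a sketch of the lookup order suggested by that entry, not the code webchanges uses; the function names linux_user_data_dir and snapshot_db_path are hypothetical.

```python
import os
from pathlib import Path


def linux_user_data_dir(app_name: str = "webchanges") -> Path:
    """Return $XDG_DATA_HOME/<app_name>, else ~/.local/share/<app_name>.

    Hypothetical helper: it mirrors the two Linux paths named in the
    changelog and is not taken from the webchanges source.
    """
    xdg_data_home = os.environ.get("XDG_DATA_HOME", "").strip()
    base = Path(xdg_data_home) if xdg_data_home else Path.home() / ".local" / "share"
    return base / app_name


def snapshot_db_path(filename: str = "snapshots.db") -> Path:
    """Return the assumed default location of the snapshot database on Linux."""
    return linux_user_data_dir() / filename


print(snapshot_db_path())
```

An explicit filename passed with --database would presumably take priority over any such default.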
Fixed
Fixed an AttributeError Exception when the fallback HTTP client package requests is not installed, as reported by yubiuser in issue #76.
Addressed a ValueError in the --test-differ command, a regression reported by Markus Weimar in issue #79.
To prevent overlooking changes, webchanges now refrains from saving a new snapshot if a differ operation fails with an Exception.
Internals
New mime_type attribute: we are now capturing and storing the data type (as a MIME type) alongside data in the snapshot database to facilitate future automation of filtering, diffing, and reporting. Developers using custom Python code will need to update their filter and retrieval methods in classes inheriting from FilterBase and JobBase, respectively, to accommodate the mime_type attribute. Detailed updates are available in the hooks documentation.
Updated terminology: References to cache in object names have been replaced with ssdb (snapshot database).
Introduced a new NamedTuple, Snapshot, to streamline the process of retrieving and saving data to the database.
Version 3.21
2024-04-16
Added
Job selectable differs: The differ, i.e. the method by which changes are detected and summarized, can now be selected job by job. Also gone is the restriction to have only unified diffs, HTML table diff, or calling an outside executable, as differs have become modular.
Python programmers can write their own custom differs using the hooks.py file.
Backward-compatibility is preserved, so your current jobs will continue to work.
New differs:
difflib to report element-by-element changes in JSON or XML structured data.
imagediff (BETA) to report an image showing changes in an image being tracked.
ai_google (BETA) to have a Generative AI provide a summary of changes (free API key required). We use Google’s Gemini Pro 1.5 since it is the first model that can ingest 1M tokens, allowing it to analyze changes in long documents (up to 350,000 words, or about 700 pages single-spaced) such as terms and conditions, privacy policies, etc. where summarization adds the most value and which other models can’t handle. The differ can call the Gen AI model to summarize a unified diff or to find and summarize the differences itself. Also supported is Gemini 1.0, but it can handle a lower number of tokens.
Changed
Filter absolute_links now converts URLs of the action, href and src attributes in any HTML tag, as well as the data attribute of the <object> tag; it previously converted only the href attribute of <a> tags.
Updated explanatory text and error messages for increased clarity.
You can now select jobs to run by their url/command instead of their number, e.g. webchanges https://test.com is just as valid as webchanges 1.
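To illustrate the kind of conversion the absolute_links entry above describes, the sketch below resolves the action, href and src attributes of any tag, plus the data attribute of <object>, against a base URL using only the standard library. It merely lists the resolved URLs and is not the filter's actual implementation; LinkCollector and absolute_urls are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes resolved in any tag, as listed in the changelog entry.
URL_ATTRIBUTES = {"action", "href", "src"}


class LinkCollector(HTMLParser):
    """Collects (tag, attribute, absolute URL) triples from an HTML document."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue  # attribute present without a value
            if name in URL_ATTRIBUTES or (tag == "object" and name == "data"):
                self.links.append((tag, name, urljoin(self.base_url, value)))


def absolute_urls(html: str, base_url: str):
    """Return every URL-bearing attribute resolved against base_url."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<a href="/a">x</a><img src="b.png"><form action="../c"></form>'
for entry in absolute_urls(sample, "https://example.com/dir/page.html"):
    print(entry)
```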
Deprecated
Job directive diff_tool. Replaced with the command differ (see here).
Fixed
Internals
Improved speed of creating a unified diff for an HTML report.
Reduced excessive logging from httpx’s sub-modules hpack and httpcore when running with -vv.
Version 3.20.2
2024-03-16
Fixed
Parsing the “to” address for the sendmail email reporter.
Version 3.20.1
2024-03-16
Fixed
Regression introduced in supporting sending to multiple “to” addresses.
Version 3.20
2024-03-15
Added
re.findall filter to extract, delete or replace non-overlapping text using Python re.findall.
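For readers unfamiliar with Python's re.findall, on which this filter is built, the following shows how it returns non-overlapping matches scanned left to right. The helper extract_matches is hypothetical and does not reproduce the filter's own sub-directives.

```python
import re


def extract_matches(pattern: str, text: str) -> str:
    """Join every non-overlapping match of a pattern, one per line.

    With a single capture group, re.findall returns the captured text of
    each match rather than the whole match.
    """
    return "\n".join(re.findall(pattern, text))


text = "Price: 10 USD, was 12 USD, shipping 3 USD"
print(extract_matches(r"(\d+) USD", text))

# Matches never overlap: "aaaa" contains two matches of "aa", not three.
print(re.findall(r"aa", "aaaa"))
```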
Changed
--test-reporter now allows testing of reporters that are not enabled; if a reporter is not enabled, a warning will be issued. This simplifies testing.
email reporter (both SMTP and sendmail) supports sending to multiple “to” addresses.
Fixed
Reports from jobs with monospace: true were not being rendered correctly in Gmail.
Version 3.19.1
2024-03-07
Fixed
Version 3.19
2024-02-28
Fixed
Under certain circumstances, certain default jobs directives declared in the configuration file would not be applied to jobs.
Fixed automatic fallback to requests when the required HTTP client package httpx is missing.
Added
block_elements directive for jobs with use_browser: true is supported again and can be used to improve speed by preventing binary and media content loading, while providing all elements required for dynamic web page loading (see the advanced section of the documentation for a suggestion of elements to block). This was available under Pyppeteer and has been reintroduced for Playwright.
init_script directive for jobs with use_browser: true to execute a JavaScript in Chrome after launching it and before navigating to url. This can be useful to e.g. unset certain default Chrome navigator properties by calling a JavaScript function to do so.
Version 3.18.1
2024-02-20
Fixed
Fixed regression whereby configuration key empty-diff was inadvertently renamed empty_diff.
Version 3.18
2024-02-19
Fixed
Fixed incorrect handling of HTTP client libraries when httpx is not installed (should gracefully fall back to requests). Reported by drws as an add-on to issue #66.
Added
Job directive enabled to allow disabling of a job without removing or commenting it in the jobs file (contributed by James Hewitt upstream).
webhook reporter has a new rich_text config option for preformatted rich text for Slack (contributed by K̶e̶v̶i̶n̶ upstream).
Changed
Command line argument --errors now uses conditional requests to improve speed. Do not use to test newly modified jobs since websites reporting no changes from the last snapshot stored by webchanges are skipped; use --test instead.
If the simplejson library is installed, it will be used instead of the built-in json module (see https://stackoverflow.com/questions/712791).
Version 3.17.2
2023-12-11
Fixed
Version 3.17.1
2023-12-10
Fixed
Version 3.17
2023-12-10
Added
You can now specify a reporter name after the command line argument --errors to send the output to the reporter specified. For example, to be notified by email of any jobs that result in an error or that, after filtering, return no data (indicating they may no longer be monitoring resources as expected), run webchanges --errors email (requested by yubiuser in #63).
You can now suppress the footer in an html report using the new footer: false sub-directive in config.yaml (same as the one already existing with text and markdown).
Internal
Fixed a regression on the default User-Agent header for url jobs with the use_browser: true directive.
Version 3.16
2023-12-07
Added
The HTTP/2 network protocol (the same used by major browsers) is now used in url jobs. This allows the monitoring of certain websites that block requests made with older protocols like HTTP/1.1. This is implemented by using the HTTPX and h2 HTTP client libraries instead of the requests one used previously.
Notes:
Handling of data served by sites whose encoding is misconfigured is done slightly differently by HTTPX, and if you newly encounter instances where extended characters are rendered as � try adding encoding: ISO-8859-1 to that job.
To revert to the use of the requests HTTP client library, use the new job sub-directive http_client: requests (in individual jobs or in the configuration file for all url jobs) and install requests by running pip install --upgrade webchanges[requests].
If the system is misconfigured and the HTTPX HTTP client library is not found, an attempt to use the requests one will be made. This behaviour is transitional and will be removed in the future.
HTTP/2 is theoretically faster than HTTP/1.1 and preliminary testing confirmed this.
New pypdf filter to convert pdf to text without having to separately install OS dependencies. If you’re using pdf2text (and its OS dependencies), I suggest you switch to pypdf as it’s much faster; however do note that the raw and physical sub-directives are not supported. Install the required library by running pip install --upgrade webchanges[pypdf].
New absolute_links filter to convert relative links in HTML <a> tags to absolute ones. This filter is not needed if you are already using the beautify or html2text filters (requested by Paweł Szubert in #62).
New {jobs_files} substitution for the subject of the email reporter. This will be replaced by the name of the jobs file(s) different than the default jobs.yaml in parentheses, with a prefix of jobs- in the name removed. To use, replace the subject line for your reporter(s) in config.yaml with e.g. [webchanges] {count} changes{jobs_files}: {jobs}.
html reports now have a configurable title to set the HTML document title, defaulting to [webchanges] {count} changes{jobs_files}: {jobs}.
Added reference to a Docker implementation to the documentation (requested by yubiuser in #64).
Changed
url jobs will use the HTTPX library instead of requests if it’s installed since it uses the HTTP/2 network protocol (when the h2 library is also installed) as browsers do. To revert to the use of requests even if HTTPX is installed on the system, add http_client: requests to the relevant jobs or make it a default by editing the configuration file to add the sub-directive http_client: requests for url jobs under job_defaults.
The beautify filter converts relative links to absolute ones; use the new absolute_links: false sub-directive to disable.
Internal
Removed transitional support for the beautifulsoup<4.11 library (i.e. older than 7 April 2022) for the beautify filter.
Removed dependency on the requests library and its own dependency on the urllib3 library.
Code cleanup, including removing support for Python 3.8.
Version 3.15
2023-10-25
Added
Support for Python 3.12.
data_as_json job directive for url jobs to indicate that data entered as a dict should be serialized as JSON instead of urlencoded and, if missing, the header Content-Type set to application/json instead of application/x-www-form-urlencoded.
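The difference between the two serializations mentioned in the data_as_json entry can be sketched with the standard library alone. The function encode_body and its signature are hypothetical; they illustrate the described behaviour and are not how webchanges builds its requests.

```python
import json
from urllib.parse import urlencode


def encode_body(data, data_as_json=False, headers=None):
    """Serialize a dict as JSON or as a urlencoded form body.

    The Content-Type header is only set when the caller has not already
    supplied one (header names are compared case-insensitively).
    """
    headers = dict(headers or {})
    if data_as_json:
        body = json.dumps(data)
        default_type = "application/json"
    else:
        body = urlencode(data)
        default_type = "application/x-www-form-urlencoded"
    if not any(name.lower() == "content-type" for name in headers):
        headers["Content-Type"] = default_type
    return body, headers


print(encode_body({"q": "a b"}))
print(encode_body({"q": "a b"}, data_as_json=True))
```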
Changed
Removed
Support for Python 3.8. A reminder that older Python versions are supported for 3 years after being obsoleted by a new major release (i.e. about 4 years since their original release).
Internals
Upgraded build environment to use the build frontend and pyproject.toml, eliminating setup.py.
Migrated to pyproject.toml the configuration of all tools that support it.
Increased the default timeout for url jobs with use_browser: true (i.e. using Playwright) to 120 seconds.
Version 3.14
2023-09-01
Added
When running in verbose (-v) mode, if a url job with use_browser: true fails with a Playwright error, capture and save in the temporary folder a screenshot, a full page image, and the HTML contents of the page at the moment of the error (see logs for filenames).
Version 3.13
2023-08-28
Added
Reports have a new separate configuration option to split reports into one-per-job.
url jobs without use_browser have a new retries directive to specify the number of times to retry a job that errors before giving up. Using retries: 1 or higher will often solve the ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')) error received from a misconfigured server at the first connection.
remove_duplicates filter has a new adjacent sub-directive to de-duplicate non-adjacent lines or items.
css and xpath have a new sort subfilter to sort matched elements lexicographically.
Command line arguments:
New --footnote to add a custom footnote to reports.
New --change-location to keep job history when the url or command changes.
--gc-database and --clean-database now have optional argument RETAIN-LIMIT to allow increasing the number of retained snapshots from the default of 1.
New --detailed-versions to display detailed version and system information, inclusive of the versions of dependencies and, in certain Linux distributions (e.g. Debian), of system libraries. It also reports available memory and disk space.
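The distinction between adjacent and non-adjacent de-duplication introduced with the adjacent sub-directive can be illustrated as follows. This is a sketch only: the function name is hypothetical, and it assumes that adjacent being true means only consecutive repeats are dropped while false drops every later repeat.

```python
from itertools import groupby


def remove_duplicate_lines(lines, adjacent=True):
    """Drop duplicate items while keeping the first occurrence of each.

    adjacent=True  removes only consecutive repeats (assumed semantics).
    adjacent=False removes every later repeat, wherever it appears.
    """
    if adjacent:
        return [key for key, _group in groupby(lines)]
    return list(dict.fromkeys(lines))  # dicts preserve insertion order


lines = ["a", "a", "b", "a"]
print(remove_duplicate_lines(lines))
print(remove_duplicate_lines(lines, adjacent=False))
```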
Changed
command jobs now have improved error reporting which includes the error text from the failed command.
--rollback-database now confirms the date (in ISO-8601 format) to roll back the database to and, if webchanges is being run in interactive mode, the user will be asked for positive confirmation before proceeding with the irreversible deletion.
Internals
Added bandit testing to improve the security of code.
headers are now turned into strings before being passed to Playwright (addresses the error playwright._impl._api_types.Error: extraHTTPHeaders[13].value: expected string, got number).
Exclude tests from being recognized as package during build (contributed by Max in #54).
Refactored and cleaned up some tests.
Initial testing with Python 3.12.0-rc1, but a reported bug in typing.TypeVar prevents the pyee dependency of playwright from loading, causing a failure. Awaiting a fix in Python 3.12.0-rc2 to retry.
Version 3.12
2022-11-19
Added
Support for Python 3.11. Please note that the lxml dependency may fail to install on Windows due to this bug and that therefore for now webchanges can only be run in Python 3.10 on Windows. [Update: lxml wheels for Python 3.11 on Windows are available as of 2022-12-13].
Removed
Support for Python 3.7. As a reminder, older Python versions are supported for 3 years after being obsoleted by a new major release; support for Python 3.8 will be removed on or about 5 October 2023.
Fixed
Job sorting for reports is now case-insensitive.
Documentation on how to anonymously monitor GitHub releases (due to changes in GitHub) (contributed by Luis Aranguren upstream).
Handling of method subfilter for filter html2text (reported by kongomondo upstream).
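The case-insensitive job sorting mentioned in this section can be pictured with a plain Python sort. The job names are made up and the snippet is not taken from the webchanges source; it only shows why a case-sensitive sort groups capitalized names first.

```python
job_names = ["beta", "Alpha", "gamma", "Delta"]

# Default string ordering is case-sensitive: uppercase letters sort first.
case_sensitive = sorted(job_names)

# str.casefold normalizes case so names interleave alphabetically.
case_insensitive = sorted(job_names, key=str.casefold)

print(case_sensitive)
print(case_insensitive)
```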
Internals
Jobs base class now has a __is_browser__ attribute, which can be used with custom hooks to identify jobs that run a browser so they can be executed in the correct parallel processing queue.
Fixed static typing to conform to the latest mypy checks.
Extended type checking to testing scripts.
Version 3.11
2022-09-22
Notice
Support for Python 3.7 will be removed on or about 22 October 2022 as older Python versions are supported for 3 years after being obsoleted by a new major release.
Added
The new no_conditional_request directive for url jobs turns off conditional requests for those extremely rare websites that don’t handle them (e.g. Google Flights).
Selecting the database engine and the maximum number of changed snapshots saved is now set through the configuration file, and the command line arguments --database-engine and --max-snapshots are used to override such settings. See documentation for more information. Suggested by jprokos in #43.
New configuration setting empty-diff within the display configuration for backwards compatibility only: use the additions_only job directive instead to achieve the same result. Reported by bbeevvoo in #47.
Aliased the command line arguments --gc-cache with --gc-database, --clean-cache with --clean-database and --rollback-cache with --rollback-database for clarity.
The configuration file (e.g. conf.yaml) can now contain keys starting with a _ (underscore) for remarks (they are ignored).
Changed
Reports are now sorted alphabetically and therefore you can use the name directive to affect the order by which your jobs are displayed in reports.
Implemented measures for url jobs using browser: true to avoid being detected: webchanges now passes all the headless Chrome detection tests here. Brought to attention by amammad in #45.
Running webchanges --test (without specifying a JOB) will now check the hooks file (if any) for syntax errors in addition to the config and jobs file. Error reporting has also been improved.
No longer showing the text returned by the server when a 404 - Not Found error HTTP status code is returned for all url jobs (previously only for jobs with use_browser: true).
Fixed
Bug in command line arguments --config and --hooks. Contributed by Klaus Sperner in PR #46.
Job directive compared_versions now works as documented and testing has been added to the test suite. Reported by jprokos in #43.
The output of command line argument --test-differ now takes into consideration compared_versions.
Markdown containing code in a link text now converts correctly in HTML reports.
Internals
The job kind of shell has been renamed command to better reflect what it does and the way it’s described in the documentation, but shell is still recognized for backward compatibility.
Readthedocs build upgraded to Python 3.10.
Version 3.10.3
2022-07-22
Added
url jobs with use_browser: true that receive an error HTTP status code from the server will now include the text returned by the server in the error message (e.g. “Rate exceeded.”, “upstream request timeout”, etc.), except if HTTP status code 404 - Not Found is received.
Changed
The command line argument --jobs used to specify a jobs file now accepts a glob pattern, e.g. wildcards, to specify multiple files. If more than one file matches the pattern, their contents will be concatenated before a job list is built. Useful e.g. if you have multiple jobs files that run on different schedules and you want to clean the snapshot database of URLs/commands no longer monitored (“garbage collect”) using --gc-cache (e.g. webchanges --jobs *.yaml --gc-cache).
The command line argument --list will now list the full path of the jobs file(s).
Traceback information for Python Exceptions is suppressed by default. Use the command line argument --verbose (or -v) to display it.
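The glob-and-concatenate behaviour described for --jobs can be approximated as below. This is a sketch under stated assumptions, not the actual loader: the function name read_jobs_files is hypothetical, files are taken in sorted order, and a YAML document separator (---) is assumed between files.

```python
import glob
from pathlib import Path


def read_jobs_files(pattern: str) -> str:
    """Concatenate the text of every file matching a glob pattern.

    Files are read in sorted order and joined with a YAML document
    separator; both choices are assumptions made for illustration.
    """
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No jobs file matches {pattern!r}")
    return "\n---\n".join(
        Path(path).read_text(encoding="utf-8").strip() for path in paths
    )
```

When calling the real command from a shell, quoting the pattern (e.g. '*.yaml') stops the shell from expanding it before the program sees it.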
Fixed
Internals
The source distribution is now available on PyPI to support certain packagers like fpm.
Improved handling and reporting of Playwright browser errors (for url jobs with use_browser: true).
Version 3.10.2
2022-06-22
⚠ Breaking Changes
Due to a fix to the html2text filter (see below), the first time you run this new version you may get a change report with deletions and additions of lines that look identical. This will happen one time only and will prevent future such change reports.
Added
You can now run the command line argument --test without specifying a JOB; this will check the config (default: config.yaml) and jobs (default: jobs.yaml) files for syntax errors.
New job directive compared_versions allows change detection to be made against multiple saved snapshots; useful for monitoring websites that change between a set of states (e.g. they are running A/B testing).
New command line argument --check-new to check if a new version of webchanges is available.
Error messages for url jobs failing with HTTP status codes of 400 and higher now include any text returned by the website (e.g. “Rate exceeded.”, “upstream request timeout”, etc.). Not implemented in jobs with use_browser: true due to limitations in Playwright.
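The idea behind compared_versions, detecting a change only when new data matches none of several recent snapshots, can be sketched in a few lines. The function below is hypothetical and assumes snapshots are ordered newest first; it is not the logic webchanges actually runs.

```python
def has_changed(new_data, saved_snapshots, compared_versions=1):
    """Report a change only if new_data matches none of the recent snapshots.

    saved_snapshots is assumed to be ordered newest first; only the first
    compared_versions entries are considered.
    """
    recent = saved_snapshots[:compared_versions]
    return new_data not in recent


# A site alternating between two states "A" and "B" (e.g. A/B testing).
snapshots = ["B", "A"]
print(has_changed("A", snapshots, compared_versions=1))  # differs from latest
print(has_changed("A", snapshots, compared_versions=2))  # matches an older one
```

With a single compared version the alternation would be reported on every flip; comparing against two versions suppresses it.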
Changed
On Linux and macOS systems, for security reasons we now check that the hooks file and the directory it is located in are owned and writeable by only the user who is running the job (and not by its group or by other users), identical to what we do with the jobs file if any job uses the shellpipe filter. An explanatory ImportWarning message will be issued if the permissions are not correct and the import of the hooks module is skipped.
The command line argument -v or --verbose now shows reduced verbosity logging output while -vv (or --verbose --verbose) shows full verbosity.
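A permission check of the kind described above for the hooks file can be written with os.stat on POSIX systems. This is a sketch rather than the project's implementation; the function names are hypothetical, and os.getuid is not available on Windows.

```python
import os
import stat
from pathlib import Path


def is_private_to_user(path: Path) -> bool:
    """True if path is owned by the current user and not writable by group or others."""
    info = path.stat()
    owned_by_user = info.st_uid == os.getuid()
    writable_by_others = bool(info.st_mode & (stat.S_IWGRP | stat.S_IWOTH))
    return owned_by_user and not writable_by_others


def hooks_file_is_safe(hooks_file: Path) -> bool:
    """Apply the check to both the file and the directory containing it."""
    hooks_file = hooks_file.resolve()
    return is_private_to_user(hooks_file) and is_private_to_user(hooks_file.parent)
```

If the check fails, running chmod go-w on the file and its directory is the usual remedy.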
Fixed
The html2text filter no longer retains spaces found in the HTML after the end of the text on a line; these are not displayed in HTML, so retaining them was a bug in the conversion library used. This was causing a change report to be issued whenever the number of such invisible spaces changed.
The cookies directive was not adding cookies correctly to the header for jobs with browser: true.
The wait_for_timeout job directive was not accepting integers (only floats). Reported by Markus Weimar in #39.
Improved the usefulness of the message of FileNotFoundError exceptions in filters execute and shellpipe and in reporter run_command.
Fixed an issue in the legacy parser used by the xpath filter which under specific conditions caused more html than expected to be returned.
Fixed how we determine if a new version has been released (due to an API change by PyPI).
When adding custom JobBase classes through the hooks file, their configuration file entries are no longer causing warnings to be issued as unrecognized directives.
Internals
Changed bootstrapping logic so that when using -vv the logs will include messages relating to the registration of the various classes.
Improved execution speed of certain informational command line arguments.
Updated the vendored version of packaging.version.parse() to 21.3, released on 2021-11-27.
Changed the import logic for the packaging.version.parse() function so that if packaging is found to be installed, it will be imported from there instead of from the vendored module.
urllib3 is now an explicit dependency due to the refactoring of the requests package (we previously used requests.packages.urllib3). Has no effect since urllib3 is already being installed as a dependency of requests.
Added py.typed marker file to implement PEP 561.
Version 3.10.1
2022-05-03
Fixed
KeyError: 'indent' error when using beautify filter. Reported by César de Tassis Filho in #37.
Version 3.10
2022-05-02
⚠ Breaking changes
Pyppeteer has been replaced with Playwright
This change only affects jobs with use_browser: true (i.e. those running on a browser to run JavaScript). If none of your jobs have use_browser: true, there’s nothing new here (and nothing to do).
Must do
If any of your jobs have use_browser: true, you MUST:
Install the new dependencies:
pip install --upgrade webchanges[use_browser]
(Optional) ensure you have an up-to-date Google Chrome browser:
webchanges --install-chrome
Additionally, if any of your use_browser: true jobs use the wait_for directive, it needs to be replaced with one of:
wait_for_function if you were specifying a JavaScript function (see here for full function details),
wait_for_selector if you were specifying a selector string or xpath string (see here for full function details), or
wait_for_timeout if you were specifying a timeout; however, this function should only be used for debugging because it “is going to be flaky”, so use one of the other two wait_for directives if you can (full details here).
Optionally, the values of wait_for_function and wait_for_selector can now be dicts to take full advantage of all the features offered by those functions in Playwright (see documentation links above).
If you are using the wait_for_navigation directive, it is now called wait_for_url and offers both glob pattern and regex matching; wait_for_navigation will act as an alias for now but a deprecation warning will be issued.
If you are using the chromium_revision or _beta_use_playwright directives in your configuration file, you should delete them to prevent future errors (for now only a deprecation warning is issued).
Finally, if you are using the experimental block_elements sub-directive, it is not (yet?) implemented in Playwright and is simply ignored.
Improvements
wait_until has additional functionality, and now takes one of:
load (default): Consider operation to be finished when the load event is fired.
domcontentloaded: Consider operation to be finished when the DOMContentLoaded event is fired.
networkidle (old networkidle0 and networkidle2 map into this): Consider operation to be finished when there are no network connections for at least 500 ms.
commit (new): Consider operation to be finished when network response is received and the document started loading.
New directives
The following directives are new to the Playwright implementation:
referer: Referer header value (a string). If provided, it will take precedence over the referer header value set by the headers sub-directive.
initialization_url: A url to navigate to before the url (e.g. a home page where some state gets set).
initialization_js: Only used in conjunction with initialization_url, a JavaScript to execute after loading initialization_url and before navigating to the url (e.g. to emulate a log in).
Advanced usage
ignore_default_args directive for url jobs with use_browser: true (using Chrome) to control how Playwright launches Chrome.
In addition, the new --no-headless command line argument will run the Chrome browser in “headed” mode, i.e. displaying the website as it loads it, to facilitate debugging and testing (e.g. webchanges --test 1 --no-headless --test-reporter email).
See more details of the new directives in the updated documentation.
Freeing space by removing Pyppeteer
You can free up disk space if no other packages use Pyppeteer by, in order:
Removing the downloaded Chromium images by deleting the entire directory (and its subdirectories) shown by running:
python -c "import pathlib; from pyppeteer.chromium_downloader import DOWNLOADS_FOLDER; print(pathlib.Path(DOWNLOADS_FOLDER).parent)"
Uninstalling the Pyppeteer package by running:
pip uninstall pyppeteer
Rationale
The implementation of use_browser: true jobs (i.e. those running on a browser to run JavaScript) using Pyppeteer and the Chromium browser it uses has been very problematic, as the library:
is in alpha,
is very slow,
defaults to years-old obsolete versions of Chromium,
can be insecure (e.g. found that TLS certificates were disabled for downloading browsers!),
creates conflicts with imports (e.g. requires obsolete version of websockets),
is poorly documented,
is poorly maintained,
may require OS-specific dependencies that need to be separately installed,
does not work with Arm-based processors,
is prone to crashing,
and outright freezes with the current version of Python (3.10)!
Pyppeteer’s open issues now exceed 130 and are growing almost daily.
Playwright has none of the issues above, the core dev team apparently is the same who wrote Puppeteer (of which Pyppeteer is a port to Python), and is supported by the deep pockets of Microsoft. The Python version is officially supported and up-to-date, and (in our configuration) uses the latest stable version of Google Chrome out of the box without the contortions of manually having to pick and set revisions.
Playwright has been in beta testing within webchanges for months and has been performing very well (significantly more so than Pyppeteer).
Documentation
Advanced
If you subclassed JobBase in your hooks.py file, and are defining a retrieve method, please note that the number of arguments has been increased to 3 as follows:
def retrieve(self, job_state: JobState, headless: bool = True) -> tuple[Union[str, bytes], str]:
    """Runs job to retrieve the data, and returns data and ETag.
    :param job_state: The JobState object, to keep track of the state of the retrieval.
    :param headless: For browser-based jobs, whether headless mode should be used.
    :returns: The data retrieved and the ETag.
    """
Version 3.9.2
2022-04-13
⚠ Last release using Pyppeteer
This is the last release using Pyppeteer for jobs with use_browser: true, which will be replaced by Playwright in release 3.10, forthcoming hopefully in a few weeks. See above for more information on how to prepare – and start using Playwright now!
Added
New ignore_dh_key_too_small directive for url jobs to overcome the ssl.SSLError: [SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:1129) error.
New indent sub-directive for the beautify filter (requires BeautifulSoup version 4.11.0 or later).
New --dump-history JOB command line argument to print all saved snapshot history for a job.
Playwright only: new --no-headless command line argument to help with debugging and testing (e.g. run webchanges --test 1 --no-headless). Not available for Pyppeteer.
Extracted Discord reporting from webhooks into its own discord reporter to fix it not working and to add embedding functionality as well as color (contributed by Michał Ciołek upstream; reported by jprokos in #33).
Fixed
We are no longer rewriting to disk the entire database at every run. Now it’s only rewritten if there are changes (and minimally) and, obviously, when running with the --gc-cache or --clean-cache command line argument. Reported by JsBergbau upstream. Also updated documentation suggesting to run --clean-cache or --gc-cache periodically.
A ValueError is no longer raised if an unknown directive is found in the configuration file, but a Warning is issued instead. Reported by c0deing in #26.
The kind job directive (used for custom job classes in hooks.py) was undocumented and not fully functioning.
For jobs with use_browser: true and a switch directive containing --window-size, turn off Playwright’s default fixed viewport (of 1280x720) as it overrides --window-size.
Email headers (“From:”, “To:”, etc.) now have title case per RFC 2076. Reported by fdelapena in #29.
Documentation
Added warnings for Windows users to run Python in UTF-8 mode. Reported by Knut Wannheden in #25.
Added suggestion to run --clean-cache or --gc-cache periodically to compact the database file.
Continued improvements.
Internals
Updated licensing file to GitHub naming standards and updated its contents to more clearly state that this software redistributes source code of release 2.21 dated 30 July 2020 of urlwatch (https://github.com/thp/urlwatch/tree/346b25914b0418342ffe2fb0529bed702fddc01f) retaining its license, which is distributed as part of the source code.
Pyppeteer has been removed from the test suite.
Deprecated
webchanges.jobs.ShellError exception in favor of Python’s native subprocess.SubprocessError one and its subclasses.
Version 3.9.1
2022-01-27
Fixed
Config file directives checker would incorrectly reject reports added through hooks.py. Reported by Knut Wannheden in #24.
Version 3.9
2022-01-26
Changed
The method bs4 of filter html2text has a new strip sub-directive which is passed to BeautifulSoup, and its default value has changed to false to conform to BeautifulSoup’s default. This gives better output in most cases. To restore the previous non-standard behavior, add the strip: true sub-directive to the html2text filter of jobs.
Pyppeteer (used for url jobs with use_browser: true) is now crashing during certain tests with Python 3.7. There will be no new development to fix this as the use of Pyppeteer will soon be deprecated in favor of Playwright. See above to start using Playwright now (highly suggested).
Added
The method bs4 of filter html2text now accepts the sub-directives separator and strip.
When using the command line argument --test-diff, the output can now be sent to a specific reporter by also specifying the --test-reporter argument. For example, if running on a machine with a web browser, you can see the HTML version of the last diff(s) from job 1 with webchanges --test-diff 1 --test-reporter browser on your local browser.
New filter remove-duplicate-lines. Contributed by Michael Sverdlin upstream here (with modifications).
New filter csv2text. Contributed by Michael Sverdlin upstream here (with modifications).
The html report type has a new job directive monospace which sets the output to use a monospace font. This can be useful e.g. for tabular text extracted by the pdf2text filter.
The command_run report type has a new environment variable WEBCHANGES_CHANGED_JOBS_JSON.
Opt-in to use Playwright for jobs with use_browser: true instead of pyppeteer (see above).
Fixed
During conversion of Markdown to HTML:
* Code blocks were not rendered without wrapping and in monospace font;
* Spaces immediately after ` (code block opening) were being dropped.
The email reporter’s sendmail sub-directive was not passing the from sub-directive (when specified) to the sendmail executable as an -f command line argument. Contributed by Jonas Witschel upstream here (with modifications).
HTML characters were not being unescaped when the job name is determined from the <title> tag of the data monitored (if present).
Command line argument --test-diff was only showing the last diff instead of all saved ones.
The command_run report type was not setting variables count and jobs (always 0). Contributed by Brian Rak in #23.
Documentation
Updated the “recipe” for monitoring Facebook public posts.
Improved documentation for filter pdf2text.
Internals
Support for Python 3.10 (except for url jobs with use_browser using pyppeteer since it does not yet support it; use Playwright instead).
Improved speed of detection and handling of lines starting with spaces during conversion of Markdown to HTML.
Logging (--verbose) now shows thread IDs to help with debugging.
Known issues
Pyppeteer (used for url jobs with use_browser: true) is now crashing during certain tests with Python 3.7. There will be no new development to fix this as the use of Pyppeteer will soon be deprecated in favor of Playwright. See above to start using Playwright now (highly suggested).
Version 3.8.3
2021-08-29
Fixed
Fixed incorrect handling of timeout when checking if new version has been released.
Internals
DictType hints for configuration.
Version 3.8.2
2021-08-19
⚠ Breaking Changes (dependencies)
Filter pdf2text’s dependency Python package pdftotext in its latest version 2.2.0 has changed the way it displays text to no longer try to emulate formatting (columns etc.). This is generally a welcome improvement as changes in formatting no longer trigger change reports, but if you want to return to the previous layout we have added a physical sub-directive which you need to set to true on the jobs affected. Note that otherwise all your pdf2text jobs will report changes (in formatting) the first time they are run after the pdftotext Python package is updated.
Changed
Updated default Chromium executables to revisions equivalent to Chromium 92.0.4515.131 (latest stable release); this fixes the unsupported browser error thrown by certain websites. Use webchanges --chromium-directory to locate where older revisions were downloaded so that you can delete them manually.
Added
Filter pdf2text now supports the raw and physical sub-directives, which are passed to the underlying Python package pdftotext (version 2.2.0 or higher).
New --chromium-directory command line argument displays the directory where the downloaded Chromium executables are located to facilitate the deletion of older revisions.
Footer now indicates if the run was made with a jobs file whose stem name is not the default ‘jobs’, to ease identification when running webchanges with a variety of jobs files.
Fixed
Fixed legacy code handling the --edit-config command line argument to allow editing of a configuration file with YAML syntax errors (#15 by Markus Weimar).
Telegram reporter documentation was missing instructions on how to notify channels (#16 by Sean Tauber).
Internals
Type hints are checked during pre-commit by mypy.
Imports are rearranged during pre-commit by isort.
Now testing all database engines, including redis, and more, adding 4 percentage points of code coverage to 81%.
The name of a FilterBase subclass is always its __kind__ + Filter (e.g. the class for the element-by-id filter is named ElementByIDFilter and not GetElementByID).
Version 3.8.1
2021-08-03
Fixed
Files in the new _vendored directory are now installed correctly.
Version 3.8
2021-07-31
Added
url jobs with use_browser: true (i.e. using Pyppeteer) now recognize data and method directives, which makes it possible, for example, to send a POST HTTP request using a browser with JavaScript support.
New tz key for report in the configuration sets the timezone for the diff in reports (useful if running e.g. on a cloud server in a different timezone). See documentation.
New run_command reporter to execute a command and pass the report text as its input. Suggested by Marcos Alano upstream here.
New remove_repeated filter to remove repeated lines (similar to Unix’s uniq). Suggested by Michael Sverdlin upstream here.
The user_visible_url job directive now applies to all types of jobs, including command ones. Suggested by kongomongo upstream here.
The --delete-snapshot command line argument now works with Redis database engine (--database-engine redis). Contributed by Scott MacVicar with pull request #13 (https://github.com/mborsetti/webchanges/pull/13).
The execute filter (and shellpipe) sets more environment variables to allow for more flexibility; see improved documentation (including more examples).
Negative job indices are allowed; for example, run webchanges -1 to only run the last job of your jobs list, or webchanges --test -2 to test the second to last job of your jobs list.
Configuration file is now checked for invalid directives (e.g. typos) when program is run.
Whenever a HTTP client error (4xx) response is received, in --verbose mode the content of the response is displayed with the error.
If a newer version of webchanges has been released to PyPI, an advisory notice is printed to stdout and added to the report footer (if footer is enabled).
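The negative job indices mentioned above behave much like Python's negative list indexing. The following is only a minimal sketch of that idea, not the code used by webchanges: the helper name resolve_job_index and the sample job names are hypothetical, and it assumes positive indices are 1-based (as in "jobs #2 and #3").

```python
from typing import List


def resolve_job_index(index: int, jobs: List[str]) -> str:
    """Hypothetical helper: map a 1-based or negative command line index to a job."""
    if index == 0 or abs(index) > len(jobs):
        raise IndexError(f"Job index {index} out of range (1..{len(jobs)})")
    # Positive indices are 1-based; negative indices count back from the end.
    return jobs[index - 1] if index > 0 else jobs[index]


jobs = ["job A", "job B", "job C"]
print(resolve_job_index(-1, jobs))  # job C (the last job)
print(resolve_job_index(-2, jobs))  # job B (the second to last job)
```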
Fixed
The html2text filter’s method strip_tags was returning HTML character references (e.g. &gt;, &#62;, &#x3e;) instead of the corresponding Unicode characters.
Fixed a rare case when html report would not correctly reconstruct a clickable link from Markdown for items inside elements in a list.
When using the --edit or --edit-config command line arguments to edit jobs or configuration files, symbolic links are no longer overwritten. Reported by snowman upstream here.
Internals
--verbose command line argument will now list configuration keys ‘missing’ from the file, i.e. keys for which default values have been used.
tox testing can now be run in parallel using tox --parallel.
Additional testing, adding 3 percentage points of coverage to 78%.
bump2version now follows PEP440 and has new documentation in the file .bumpversion.txt (cannot document .bumpversion.cfg as remarks get deleted at every version bump).
Added a vendored version of packaging.version.parse() from Packaging 20.9, released on 2021-02-20, used to check if the version in PyPI is higher than the current one.
Migrated from unmaintained Python package AppDirs to its friendly fork platformdirs, which is maintained and offers more functionality. Unless used by another package, you can uninstall appdirs with pip uninstall appdirs.
Version 3.7
2021-06-27
⚠ Breaking Changes
Removed Python 3.6 support to simplify code. Older Python versions are supported for 3 years after being obsoleted by a new major release; as Python 3.7 was released on 27 June 2018, the last date of Python 3.6 support was 26 June 2021.
Changed
Improved telegram reporter now uses MarkdownV2 and preserves most formatting of HTML sites processed by the html2text filter, e.g. clickable links, bolding, underlining, italics and strikethrough
Added
New filter execute to filter the data using an executable without invoking the shell (as shellpipe does), thereby avoiding the additional security risks that come with shell invocation
New sub-directive silent for telegram reporter to receive a notification with no sound (true/false) (default: false)
GitHub Issues templates for bug reports and feature requests
Fixed
Job headers stored in the configuration file (config.yaml) are now merged correctly and case-insensitively with those present in the job (in jobs.yaml). A header in the job replaces a header by the same name if already present in the configuration file; otherwise it is added to the ones present in the configuration file.
Fixed TypeError: expected string or bytes-like object error in cookiejar (called by requests module) caused by some cookies being read from the jobs YAML file in other formats
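The case-insensitive header merge described above can be illustrated as follows. This is a rough sketch under stated assumptions and not the project's actual code: the function name merge_headers and the sample header values are made up for illustration.

```python
from typing import Dict


def merge_headers(config_headers: Dict[str, str], job_headers: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: job headers override config headers, ignoring case."""
    merged = dict(config_headers)
    # Map lower-cased names to the spelling currently stored in `merged`.
    lower_to_key = {name.lower(): name for name in merged}
    for name, value in job_headers.items():
        existing = lower_to_key.get(name.lower())
        if existing is not None:
            del merged[existing]  # replace the header of the same name
        merged[name] = value
        lower_to_key[name.lower()] = name
    return merged


config = {"User-Agent": "from-config", "Accept": "text/html"}
job = {"user-agent": "from-job", "X-Extra": "1"}
print(merge_headers(config, job))
```

Here the job's user-agent replaces the configuration's User-Agent despite the different capitalization, while X-Extra is simply added.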
Internals
Strengthened security with bandit to catch common security issues
Standardized code formatting with black
Improved pre-commit speed by using local libraries when practical
More improvements to type hinting (moving towards testing with mypy)
Removed module jobs_browser.py (needed only for Python 3.6)
Version 3.6.1
2021-05-28
Reminder
Older Python versions are supported for 3 years after being obsoleted by a new major release. As Python 3.7 was released on 27 June 2018, the codebase will be streamlined by removing support for Python 3.6 on or after 27 June 2021.
Added
Clearer results messages for --delete-snapshot command line argument
Fixed
Version 3.6
2021-05-14
Added
Run a subset of jobs by adding their index number(s) as command line arguments. For example, run webchanges 2 3 to only run jobs #2 and #3 of your jobs list. Run webchanges --list to find the job numbers. Suggested by Dan Brown upstream here. API is experimental and may change in the near future.
Support for ftp:// URLs to download a file from an ftp server
Fixed
Sequential job numbering (skip numbering empty jobs). Suggested by Markus Weimar in issue #9.
Readthedocs.io failed to build autodoc API documentation
Error processing jobs with URL/URIs starting with file:///
Internals
Improvements of errors and DeprecationWarnings during the processing of job directives and their inclusion in tests
Additional testing adding 3 percentage points of coverage to 75%
Temporary database being written during run is now in memory-first (handled by SQLite3) (speed improvement)
Updated algorithm that assigns a job to a subclass based on directives found
Migrated to using the pathlib standard library
Version 3.5.1
2021-05-06
Fixed
Crash with RuntimeError: dictionary changed size during iteration when using custom headers; updated testing scenarios
Autodoc not building API documentation
Version 3.5
2021-05-04
Added
New sub-directives to the strip filter:
* chars: Set of characters to be removed (default: whitespace)
* side: One-sided removal, either left (leading characters) or right (trailing characters)
* splitlines: Whether to apply the filter on each line of text (true/false) (default: false, i.e. apply to the entire data)
--delete-snapshot command line argument: Removes the latest saved snapshot of a job from the database; useful if a change in a website (e.g. layout) requires modifying filters, as the invalid snapshot can be deleted and webchanges rerun to create a truthful diff
--log-level command line argument to control the amount of logging displayed by the -v argument
ignore_connection_errors, ignore_timeout_errors, ignore_too_many_redirects and ignore_http_error_codes directives now work with url jobs having use_browser: true (i.e. using Pyppeteer) when running in Python 3.7 or higher
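To make the interaction of the chars, side and splitlines sub-directives of the strip filter concrete, here is a minimal sketch built on Python's str.strip family. It is an illustration under assumptions, not the filter's actual implementation; the function name strip_filter is hypothetical.

```python
from typing import Optional


def strip_filter(data: str, chars: Optional[str] = None,
                 side: Optional[str] = None, splitlines: bool = False) -> str:
    """Hypothetical re-creation of the strip filter's sub-directives."""

    def strip_one(text: str) -> str:
        if side == "left":
            return text.lstrip(chars)   # leading characters only
        if side == "right":
            return text.rstrip(chars)   # trailing characters only
        return text.strip(chars)        # both sides (chars=None means whitespace)

    if splitlines:
        # Apply the stripping to every line instead of the data as a whole.
        return "\n".join(strip_one(line) for line in data.splitlines())
    return strip_one(data)


print(repr(strip_filter("  a  \n  b  ", splitlines=True)))
print(repr(strip_filter("xxhixx", chars="x", side="left")))
```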
Changed
Diff-filter additions_only will no longer report additions that consist exclusively of added empty lines (issue #6, contributed by Fedora7)
Diff-filter deletions_only will no longer report deletions that consist exclusively of deleted empty lines
The job’s index number is included in error messages for clarity
--smtp-password now checks that the credentials work with the SMTP server (i.e. logs in)
Fixed
First run after install was not creating new files correctly (inherited from urlwatch); now webchanges creates the default directory, config and/or jobs files if not found when running (issue #8, contributed by rtfgvb01)
--test-diff command line argument was showing historical diffs in wrong order; now showing most recent first
An error is now raised when a url job with use_browser: true returns no data due to an HTTP error (e.g. proxy_authentication_required)
Jobs were included in email subject line even if there was nothing to report after filtering with additions_only or deletions_only
hexdump filter now correctly formats lines with fewer than 16 bytes
sha1sum and hexdump filters now accept data that is bytes (not just text)
An error is now raised when a legacy minidb database is found but cannot be converted because the minidb package is not installed
Removed extra unneeded file from being installed
Wrong ETag was being captured when a URL redirection took place
Internals
url jobs using use_browser: true (i.e. using Pyppeteer) now capture and save the ETag
Snapshot timestamps are more accurate (reflect when the job was launched)
Each job now has a run-specific unique index_number, which is assigned sequentially when loading jobs, to use in errors and logs for clarity
Improvements in the function chunking text into numbered lines, which is used by certain reporters (e.g. Telegram)
More tests, increasing code coverage by an additional 7 percentage points to 72% (although keyring testing had to be dropped due to issues with GitHub Actions)
Additional cleanup of code and documentation
Known issues
url jobs with use_browser: true (i.e. using Pyppeteer) will at times display the below error message in stdout (terminal console). This does not affect webchanges as all data is downloaded, and hopefully it will be fixed in the future (see Pyppeteer issue #225):
future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
Future exception was never retrieved
Version 3.4.1
2021-04-17
Internals
Temporary database (sqlite3 database engine) is copied to permanent one exclusively using SQL code instead of partially using a Python loop
Known issues
url jobs with use_browser: true (i.e. using Pyppeteer) will at times display the below error message in stdout (terminal console). This does not affect webchanges as all data is downloaded, and hopefully it will be fixed in the future (see Pyppeteer issue #225):
future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
Future exception was never retrieved
Version 3.4
2021-04-12
⚠ Breaking Changes
Stopped the database from growing without bound. The fix only works when running in Python 3.7 or higher and using the new, default, sqlite3 database engine. In this scenario only the latest 4 snapshots are kept, and older ones are purged after every run; the number is selectable with the new --max-snapshots command line argument. To keep the existing grow-to-infinity behavior, run webchanges with --max-snapshots 0.
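The purge described above (keep the newest N snapshots per job, with 0 meaning unlimited) can be sketched with Python's built-in sqlite3 module. This is an illustration only: the table name snapshots, its columns (guid, timestamp, data) and the function purge_old_snapshots are assumptions made for the example and do not reflect the actual database schema or code of webchanges.

```python
import sqlite3


def purge_old_snapshots(conn: sqlite3.Connection, max_snapshots: int = 4) -> int:
    """Keep only the newest `max_snapshots` rows per job; 0 keeps everything."""
    if max_snapshots == 0:
        return 0
    deleted = 0
    guids = [row[0] for row in conn.execute("SELECT DISTINCT guid FROM snapshots")]
    for guid in guids:
        # Delete every row of this job that is not among its newest N timestamps.
        cur = conn.execute(
            "DELETE FROM snapshots WHERE guid = ? AND timestamp NOT IN ("
            "SELECT timestamp FROM snapshots WHERE guid = ? "
            "ORDER BY timestamp DESC LIMIT ?)",
            (guid, guid, max_snapshots),
        )
        deleted += cur.rowcount
    conn.commit()
    return deleted


# Demo with an in-memory database and a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshots (guid TEXT, timestamp REAL, data TEXT)")
conn.executemany(
    "INSERT INTO snapshots VALUES (?, ?, ?)",
    [("job-a", float(t), f"v{t}") for t in range(1, 7)] + [("job-b", 1.0, "v1")],
)
deleted = purge_old_snapshots(conn, max_snapshots=4)
remaining = [row[0] for row in conn.execute(
    "SELECT timestamp FROM snapshots WHERE guid = 'job-a' ORDER BY timestamp")]
print(deleted, remaining)  # 2 [3.0, 4.0, 5.0, 6.0]
```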
Added
--max-snapshots command line argument sets the number of snapshots to keep stored in the database; defaults to 4. If set to 0 an unlimited number of snapshots will be kept. Only applies to Python 3.7 or higher and only works if the default sqlite3 database is being used.
no_redirects job directive (for url jobs) to disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection (true/false). Suggested by snowman upstream here.
Reporter prowl for the Prowl push notification client for iOS (only). Contributed by nitz upstream in PR 633.
Filter jq to parse, transform, and extract ASCII JSON data. Contributed by robgmills upstream in PR 626.
Filter pretty-xml as an alternative to format-xml (backwards-compatible with urlwatch 2.28)
Alert user when the jobs file contains unrecognized directives (e.g. typo)
Changed
Job name is truncated to 60 characters when derived from the title of a page (no directive name is found in a url job)
--test-diff command line argument displays all saved snapshots (no longer limited to 10)
Fixed
Diff (change) data is no longer lost if webchanges is interrupted mid-execution or encounters an error in reporting: the permanent database is updated only at the very end (after reports are dispatched)
use_browser: false was not being interpreted correctly
Jobs file (e.g. jobs.yaml) is now loaded only once per run
Internals
Database sqlite3 engine now saves new snapshots to a temporary database, which is copied over to the permanent one at execution end (i.e. database.close())
Upgraded SMTP email message internals to use Python’s email.message.EmailMessage instead of email.mime (obsolete)
Pre-commit documentation linting using doc8
Added logging to sqlite3 database engine
Additional testing increasing overall code coverage by an additional 4 percentage points to 65%
Renamed legacy module browser.py to jobs_browser.py for clarity
Renamed class JobsYaml to YamlJobsStorage for consistency and clarity
Known issues
url jobs with use_browser: true (i.e. using Pyppeteer) will at times display the below error message in stdout (terminal console). This does not affect webchanges as all data is downloaded, and hopefully it will be fixed in the future (see Pyppeteer issue #225):
future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
Future exception was never retrieved
Version 3.2.6
2021-03-21
Changed
Tweaked colors (esp. green) of HTML reporter to work with Dark Mode
Restored API documentation using Sphinx’s autodoc (removed in 3.2.4 as it was not building correctly)
Internals
Replaced custom atomic_rename function with built-in os.replace() (new in Python 3.3) that does the same thing
Added type hinting to the entire code
Added new tests, increasing coverage to 61%
GitHub Actions CI now runs faster as it’s set to cache required packages from prior runs
Known issues
Discovered that upstream (legacy) urlwatch 2.22 code has the database growing to infinity; run webchanges --clean-cache periodically to discard old snapshots until this is addressed in a future release
url jobs with use_browser: true (i.e. using Pyppeteer) will at times display the below error message in stdout (terminal console). This does not affect webchanges as all data is downloaded, and hopefully it will be fixed in the future (see Pyppeteer issue #225):
future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
Future exception was never retrieved
Version 3.2
2021-03-08
Added
Job directive note: adds a freetext note appearing in the report after the job header
Job directive wait_for_navigation for url jobs with use_browser: true (i.e. using Pyppeteer): wait for navigation to reach a URL starting with the specified one before extracting content. Useful when the URL redirects elsewhere before displaying content you’re interested in and Pyppeteer would capture the intermediate page.
Command line argument --rollback-cache TIMESTAMP: roll back the snapshot database to a previous time, useful when you miss notifications; see here. Does not work with database engine minidb or textfiles.
Command line argument --cache-engine ENGINE: specify minidb to continue using the database structure used in prior versions and urlwatch 2. New default sqlite3 creates a smaller database due to data compression with msgpack and offers additional features; migration from old minidb database is done automatically and the old database preserved for manual deletion.
Job directive block_elements for url jobs with use_browser: true (i.e. using Pyppeteer) (⚠ ignored in Python < 3.7) (experimental feature): specify resource types (elements) to skip requesting (downloading) in order to speed up retrieval of the content; only resource types supported by Chromium are allowed (typical list includes stylesheet, font, image, and media). ⚠ On certain sites it seems to totally freeze execution; test before use.
Changed
A new, more efficient indexed database is used and only the most recent saved snapshot is migrated the first time you run this version. This has no effect on the ordinary use of the program other than reducing the number of historical results from --test-diff until more snapshots are captured. To continue using the legacy database format, launch with --database-engine minidb and ensure that the package minidb is installed.
If any jobs have use_browser: true (i.e. are using Pyppeteer), the maximum number of concurrent threads is set to the number of available CPUs instead of the default to avoid instability due to Pyppeteer’s high usage of CPU
Default configuration now specifies the use of Chromium revisions equivalent to Chrome 89.0.4389.72 for url jobs with use_browser: true (i.e. using Pyppeteer) to increase stability. Note: if you already have a configuration file and want to upgrade to this version, see here. The Chromium revisions used now are ‘linux’: 843831, ‘win64’: 843846, ‘win32’: 843832, and ‘mac’: 843846.
Temporarily removed code autodoc from the documentation as it was not building correctly
Fixed
Specifying chromium_revision had no effect (bug introduced in version 3.1.0)
Improved the text of the error message when jobs.yaml has a mistake in the job parameters
Internals
Removed dependency on minidb package and are now directly using Python’s built-in sqlite3, allowing for better control and increased functionality
Database is now smaller due to data compression with msgpack
Migration from an old schema database is automatic and the last snapshot for each job will be migrated to the new one, preserving the old database file for manual deletion
No longer backing up database to *.bak now that it can be rolled back
New command line argument --database-engine allows selecting engine and accepts sqlite3 (default), minidb (legacy compatibility, requires package by the same name) and textfiles (creates a text file of the latest snapshot for each job)
When running in Python 3.7 or higher, jobs with use_browser: true (i.e. using Pyppeteer) are a bit more reliable as they are now launched using asyncio.run(), and therefore Python takes care of managing the asyncio event loop, finalizing asynchronous generators, and closing the threadpool, tasks that previously were handled by custom code
11 percentage point increase in code testing coverage, now also testing jobs that retrieve content from the internet and (for Python 3.7 and up) use Pyppeteer
Known issues
url jobs with use_browser: true (i.e. using Pyppeteer) will at times display the below error message in stdout (terminal console). This does not affect webchanges as all data is downloaded, and hopefully it will be fixed in the future (see Pyppeteer issue #225):
future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
Future exception was never retrieved
Version 3.1.1
2021-02-08
Fixed
Documentation was failing to build at https://webchanges.readthedocs.io/
Version 3.1
2021-02-07
Added
Can specify different values of chromium_revision (used in jobs with use_browser: true, i.e. using Pyppeteer) based on OS by specifying keys linux, mac, win32 and/or win64
If shellpipe filter returns an error it now shows the error text
Show deprecation warning if running on the lowest Python version supported (mentioning the 3 years support from the release date of the next major version)
Fixed
Internals
First PyPI release with new continuous integration (CI) and continuous delivery (CD) pipeline based on bump2version, git tags, and GitHub Actions
Moved continuous integration (CI) testing from Travis to GitHub Actions
Moved linting (flake8) and documentation build testing from pytest to the pre-commit framework
Added automated pre-commit local testing using tox
Added continuous integration (CI) testing on macOS platform
Version 3.0.3
2020-12-21
⚠ Breaking Changes
Compatibility with urlwatch 2.22, including the ⚠ breaking change of removing the ability to write custom filters that do not take a subfilter as argument (see here upstream)
Inadvertently released as a PATCH instead of a MAJOR release as it should have been under Semantic Versioning rules given the incompatible API change upstream (see discussion here upstream)
Added
Changed
The Markdown reporter now supports limiting the report length via the max_length parameter of the submit method. The length limiting logic is smart in the sense that it will try trimming the details first, followed by omitting them completely, followed by omitting the summary. If a part of the report is omitted, a note about this is added to the report. (# 572 upstream by Denis Kasak)
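The three-step length limiting described above (trim the details, then omit them, then omit the summary) might look roughly like the sketch below. It is a simplified illustration and not the reporter's actual code: the function limit_report, the NOTE text and the blank-line separators are all assumptions.

```python
NOTE = "[Part of the report was omitted due to the length limit]"


def limit_report(summary: str, details: str, max_length: int) -> str:
    """Hypothetical sketch of the trim-details / omit-details / omit-summary order."""
    full = f"{summary}\n\n{details}"
    if len(full) <= max_length:
        return full
    # Step 1: trim the details, leaving room for the summary and the note.
    room = max_length - len(summary) - len(NOTE) - 4  # 4 = two "\n\n" separators
    if room > 0:
        return f"{summary}\n\n{details[:room]}\n\n{NOTE}"
    # Step 2: omit the details completely.
    without_details = f"{summary}\n\n{NOTE}"
    if len(without_details) <= max_length:
        return without_details
    # Step 3: omit the summary as well, keeping only (part of) the note.
    return NOTE[:max_length]


trimmed = limit_report("Summary", "x" * 500, 120)
print(len(trimmed))  # 120
```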
Fixed
Make imports thread-safe. This might increase startup times a bit, as dependencies are imported on boot instead of when first used, but importing in Python is not (yet) thread-safe, so we cannot import new modules from the parallel worker threads reliably (# 559 upstream by Scott MacVicar)
Write Unicode-compatible YAML files
Internals
Upgraded to use of subprocess.run
Version 3.0.2
2020-12-06
Fixed
Logic error in reading EDITOR environment variable (# 1 contributed by MazdaFunSun)
Version 3.0.1
2020-12-05
Added
New format-json sub-directive sort_keys sets whether JSON dictionaries should be sorted (defaults to false)
New markdown directive for webhook reporter for services such as Mattermost, which expects Markdown-formatted text
Code autodoc, highlighting just how badly the code needs documentation!
Output from diff_tool: wdiff is colorized in html reports
Reports now show date/time of diffs when using an external diff_tool
Changed and deprecated
Reporter slack has been renamed to webhook as it works with any webhook-enabled service such as Discord. Updated documentation with Discord example. The name slack, while deprecated and in line to be removed in a future release, is still recognized.
Improvements in report colorization code
Fixed
Fixed format-json filter from unexpectedly reordering contents of dictionaries
Fixed documentation for additions_only and deletions_only to specify that value of true is required
No longer creating a config directory if command line contains both --config and --urls. Allow running on read-only systems (e.g. using redis or a database cache residing on a writeable volume)
Deprecation warnings now use the DeprecationWarning category, which is always printed
All filters take a subfilter (# 600 upstream by Martin Monperrus)
Version 3.0
2020-11-12
Milestone
Initial release of webchanges, based on reworking of code from urlwatch 2.21 dated 30 July 2020.
Added
Relative to urlwatch 2.21:
If no job name is provided, the title of an HTML page will be used for a job name in reports
The Python html2text package (used by the html2text filter, previously known as pyhtml2text) is now initialized with the following purpose-optimized non-default options: unicode_snob = True, body_width = 0, single_line_break = True, and ignore_images = True
The output from html2text filter is reconstructed into HTML (for html reports), preserving basic formatting such as bolding, italics, underlining, list bullets, etc. as well as, most importantly, rebuilding clickable links
HTML formatting uses color (green or red) and strikethrough to mark added and deleted lines
HTML formatting is radically more legible and useful, including long lines wrapping around
HTML reports are now rendered correctly by email clients that override stylesheets (e.g. Gmail)
Filter format-xml reformats (pretty-prints) XML
webchanges --errors will run all jobs and list all errors and empty responses (after filtering)
Browser jobs now recognize cookies, headers, http_proxy, https_proxy, and timeout sub-directives
The revision number of Chromium browser to use can be selected with chromium_revision
Can set the user directory for the Chromium browser with user_data_dir
Chromium can be directed to ignore HTTPS errors with ignore_https_errors
Chromium can be directed as to when to consider a page loaded with wait_until
Additional command line arguments can be passed to Chromium with switches
New browser reporter to display HTML-formatted report on a local browser
New additions_only directive to report only added lines (useful when monitoring only new content)
New deletions_only directive to report only deleted lines
New contextlines directive to set the number of context lines in the unified diff
Support for Python Version 3.9
Backward compatibility with urlwatch 2.21 (except running on Python 3.5 or using lynx, which is replaced by the built-in html2text filter)
Changed and deprecated
Relative to urlwatch 2.21:
Navigation by full browser is now accomplished by specifying the url and adding the use_browser: true directive. The navigate directive has been deprecated for clarity and will trigger a warning; it will be removed in a future release
The name of the default program configuration file has been changed to config.yaml; if at program launch urlwatch.yaml is found and no config.yaml exists, it is copied over for backward-compatibility.
In Windows, the location of config files has been moved to %USERPROFILE%\Documents\webchanges where they can be more easily edited (they are indexed there) and backed up
The html2text filter defaults to using the Python html2text package (with optimized defaults) instead of re
keyring Python package is no longer installed by default
html2text and markdown2 Python packages are installed by default
Installation of Python packages required by a feature is now made easier with pip extras (e.g. pip install -U webchanges[ocr,pdf2text])
The name of the default job’s configuration file has been changed to jobs.yaml; if at program launch urls.yaml is found and no jobs.yaml exists, it is copied over for backward-compatibility
The html2text filter’s re method has been renamed strip_tags; the old name is deprecated and will trigger a warning
The grep filter has been renamed keep_lines_containing; the old name is deprecated and will trigger a warning; it will be removed in a future release
The grepi filter has been renamed delete_lines_containing; the old name is deprecated and will trigger a warning; it will be removed in a future release
Both the keep_lines_containing and delete_lines_containing filters accept text (default) in addition to re (regular expressions)
--test command line argument is used to test a job (formerly --test-filter, deprecated and will be removed in a future release)
--test-diff command line argument is used to test a job’s diff (formerly --test-diff-filter, deprecated and will be removed in a future release)
-V command line argument added as an alias to --version
If a filename for --jobs, --config or --hooks is supplied without a path and the file is not present in the current directory, webchanges now looks for it in the default configuration directory
If a filename for --jobs or --config is supplied without a ‘.yaml’ suffix, webchanges now looks for one with such a suffix
In Windows, --edit defaults to using built-in notepad.exe if %EDITOR% or %VISUAL% are not set
When using --job command line argument, if there’s no file by that name in the specified directory, webchanges will look in the default one before giving up.
The use of the kind directive in jobs.yaml configuration files has been deprecated (but is, for now, still used internally); it will be removed in a future release
The slack webhook reporter allows the setting of maximum report length (for, e.g., usage with Discord) using the max_message_length sub-directive
Legacy lib/hooks.py file is no longer supported; hooks.py needs to be in the same directory as the configuration files.
The database (cache) file is backed up at every run to *.bak
The mix of default and optional dependencies has been updated (see documentation) to enable “Just works”
Dependencies are now specified as PyPI extras to simplify their installation
Changed timing from datetime to timeit.default_timer
Upgraded concurrent execution loop to concurrent.futures.ThreadPoolExecutor.map
Reports’ elapsed time now always has at least 2 significant digits
Expanded (only slightly) testing
Using flake8 to check PEP-8 compliance and more
Using coverage to check unit testing coverage
Upgraded Travis CI to Python Version 3.9 from Version 3.9-dev and cleaned up pip installs
Removed
Relative to urlwatch 2.21:
The html2text filter’s lynx method is no longer supported; use html2text instead
Python 3.5 (obsoleted by 3.6 on December 23, 2016) is no longer supported
Fixed
Relative to urlwatch 2.21:
The html2text filter’s html2text method defaults to Unicode handling
HTML href links ending with spaces are no longer broken by xpath replacing spaces with %20
Initial config file no longer has directives sorted alphabetically; they are now saved logically (e.g. ‘enabled’ is always the first sub-directive)
The presence of the data directive in a job would force the method to POST preventing PUTs
Security
Relative to urlwatch 2.21:
None
Documentation changes
Relative to urlwatch 2.21:
Complete rewrite of the documentation
Known bugs
None