Examples
Running webchanges
Checking different sources at different intervals
You can divide your jobs into multiple job lists depending on how often you want to check. For example, you can have a daily.yaml job list for daily jobs, and a weekly.yaml for weekly ones. You then set up the scheduler to run webchanges, defining which job list to use, at different intervals. For example, in Linux/macOS using cron (crontab -e):
0 0 * * * webchanges --jobs daily.yaml
0 0 * * 0 webchanges --jobs weekly  # alias for weekly.yaml (if 'weekly' isn't found)
Alternatively, you can select a subset of jobs to run only a few of them. For example, if you want to run all jobs every day at midnight and, in addition, run jobs 1 and 4 at noon, you can do the following (in Linux/macOS using crontab):
0 0 * * * webchanges
0 12 * * * webchanges 1 4
Getting reports via different channels for different sources
Job-specific alerts (reports) are not a feature of webchanges, but you can work around this by creating multiple configurations and job lists, and running webchanges multiple times specifying --jobs and --config.
For example, you can create two configuration files, e.g. config-slack.yaml and config-email.yaml (the first set up for Slack reporting and the second for email reporting), and two job lists, e.g. slack.yaml and email.yaml (the first containing jobs you want to be notified of via Slack, the second containing jobs you want to be notified of via email). You can then run webchanges similarly to the example below (taken from Linux/macOS crontab -e):
00 00 * * * webchanges --jobs slack.yaml --config config-slack.yaml
05 00 * * * webchanges --jobs email --config config-email # .yaml not necessary if no conflict
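On systems without cron, the same pairing of job lists and configuration files could be driven from a small script. The following is only a sketch: RUNS, build_command and run_all are hypothetical names, and the only webchanges options it assumes are the --jobs and --config flags shown above.

```python
import subprocess

# Hypothetical pairing of job lists and configuration files,
# using the file names from the example above.
RUNS = [
    ("slack.yaml", "config-slack.yaml"),
    ("email.yaml", "config-email.yaml"),
]


def build_command(jobs: str, config: str) -> list[str]:
    """Return the argument list for a single webchanges run."""
    return ["webchanges", "--jobs", jobs, "--config", config]


def run_all() -> None:
    """Run webchanges once per (job list, configuration) pair."""
    for jobs, config in RUNS:
        # check=False: a failure in one run does not prevent the next one.
        subprocess.run(build_command(jobs, config), check=False)
```

Calling run_all() from whatever scheduler is available would then send each group of jobs through its own reporting configuration.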
Comparing with several latest snapshots
If a webpage frequently changes between several known stable states (e.g. A/B layout testing), it may be desirable to have changes reported only if the content (webpage, URI, command result, etc.) changes into a new unknown state. You can use compared_versions to do this.
url: https://example.com/
compared_versions: 3
In this example, changes are only reported if the webpage becomes different from the latest three distinct states. The differences are shown relative to the closest match.
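The rule described above can be sketched as follows. This illustrates the described behavior and is not webchanges' actual implementation: the function name, the newest-first history list, and the use of difflib similarity to pick the "closest match" are all assumptions.

```python
from difflib import SequenceMatcher


def check_against_history(new: str, history: list[str], compared_versions: int = 3):
    """Return None if `new` matches a recent state, else the closest recent state.

    `history` holds earlier snapshots, newest first.
    """
    # Collect the latest `compared_versions` distinct states, newest first.
    recent: list[str] = []
    for snapshot in history:
        if snapshot not in recent:
            recent.append(snapshot)
        if len(recent) == compared_versions:
            break

    if new in recent:
        return None  # a known state: nothing to report

    # Unknown state: pick the most similar known state to diff against.
    return max(
        recent,
        key=lambda old: SequenceMatcher(None, old, new).ratio(),
        default="",
    )


history = ["layout A price 10", "layout B price 10", "layout A price 10"]
print(check_against_history("layout B price 10", history))  # known state
print(check_against_history("layout B price 12", history))  # new state
```

With this history, a page flipping back to layout B produces no report, while a price change is reported relative to the layout B snapshot.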
Receiving a report for every run
If you are watching pages that seldom change, but you still want to be notified every time webchanges runs to know it’s still working, you can add a job that monitors the output of the date command, for example:
name: Run date
command: date
Since the output of date changes every second, this job should produce a report every time webchanges is run.
Watching specific sites
Facebook posts
If you want to be notified of new posts on a public Facebook page, you can use the following job pattern; just replace USERNAME with the name of the user (which can be found by navigating to the user’s page in your browser):
name: USERNAME's Facebook posts
url: https://m.facebook.com/USERNAME/pages/permalink/?view_type=tab_posts
filter:
- xpath: //div[@data-ft='{"tn":"*s"}']
- html2text: strip_tags
additions_only: true
Facebook events
If you want to be notified of new events on a public Facebook page, you can use the following job pattern; just replace USERNAME with the name of the user (which can be found by navigating to the user’s page in your browser):
name: USERNAME's Facebook events
url: https://m.facebook.com/USERNAME/pages/permalink/?view_type=tab_events
filter:
  - css:
      selector: div#objects_container
      exclude: 'div.x, #m_more_friends_who_like_this, img'
  - re.sub:
      pattern: '(/events/\d*)[^"]*'
      repl: '\1'
  - html2text:
additions_only: true
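To see what the re.sub step in this job does, the same pattern and replacement can be tried in plain Python. The HTML fragment below is made up for illustration (real Facebook markup will differ); the pattern keeps the event path and drops everything after it up to the closing quote, which is presumably there to remove query parameters that vary between requests.

```python
import re

# Pattern and replacement copied from the job above.
PATTERN = r'(/events/\d*)[^"]*'
REPL = r'\1'

# Hypothetical HTML fragment; real Facebook markup will differ.
html = '<a href="/events/1234567890?acontext=abc&ref=page">Concert</a>'

# Keep "/events/<digits>" and discard the rest of the attribute value.
cleaned = re.sub(PATTERN, REPL, html)
print(cleaned)  # <a href="/events/1234567890">Concert</a>
```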
GitHub releases
This is an example of how to anonymously watch the GitHub “releases” page of a project to be notified of new releases (i.e. the latest/top-most tag):
url: https://github.com/git/git/releases
filter:
  - xpath:
      path: //*[@class="Link--primary"]
      maxitems: 1
  - html2text:
If you only want to monitor the latest release and not include pre-releases:
url: https://github.com/Novik/ruTorrent/releases/latest
filter:
- xpath: //*[@class="ml-1"]
- html2text:
Note that, if you have a GitHub account, the easiest way to be notified is to simply “watch” the project and subscribe to email notifications.
Resolving typical issues
Below are some job configurations that have helped to solve typical issues.
Setting default headers
Many websites expect to receive headers that look like they came from a browser, and will fail if they don’t. You can set default headers for HTTP requests by entering them in config.yaml under job_defaults. If a headers key is also found in a job, the headers for that job are merged (case-insensitively) one by one with the defaults, with any conflict resolved in favor of the header specified in the job.
Below are headers extracted from Google Chrome 120.0.6099.225 running in incognito mode, in the YAML format used for defaults in the config.yaml file (note that the “Accept-Encoding” header is set by the Python HTTP client library based on the encoding protocols it supports):
job_defaults:
  url:
    headers:
      Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
      Accept-Language: en-US,en;q=0.9
      DNT: 1
      Sec-CH-UA: '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"'
      Sec-CH-UA-Mobile: ?0
      Sec-CH-UA-Platform: '"Windows"'
      Sec-Fetch-Dest: document
      Sec-Fetch-Mode: navigate
      Sec-Fetch-Site: none
      Sec-Fetch-User: ?1
      Upgrade-Insecure-Requests: 1
      User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
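The merge rule for default and job-level headers (case-insensitive, with the job winning on conflict) can be sketched as below. This is an illustration of the rule as described, not webchanges' own code; merge_headers is a hypothetical helper, and which spelling of the header name survives the merge is an assumption.

```python
def merge_headers(defaults: dict[str, str], job: dict[str, str]) -> dict[str, str]:
    """Merge two header mappings case-insensitively; job values win."""
    merged: dict[str, tuple[str, str]] = {}
    # Later sources overwrite earlier ones, so the job is applied last.
    for source in (defaults, job):
        for name, value in source.items():
            merged[name.lower()] = (name, value)
    return {name: value for name, value in merged.values()}


defaults = {"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US,en;q=0.9"}
job = {"user-agent": "my-monitor/1.0"}
print(merge_headers(defaults, job))
```

Here the job's user-agent replaces the default User-Agent even though the capitalization differs, while Accept-Language is inherited unchanged.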
Changing the default timeout
By default, url jobs time out after 60 seconds. If you want a different timeout period, use the timeout directive to specify it in number of seconds, or set it to 0 to never time out.
url: https://example.com/
timeout: 300
Ignoring TLS/SSL errors
Setting ssl_no_verify to true may be useful during local development or testing.
When set to true, webchanges requests will accept any TLS certificate presented by the server, and will ignore hostname mismatches and/or expired certificates. Because this will make your application vulnerable to man-in-the-middle (MitM) attacks, never use it outside of local development or testing.
url: https://example.com/
ssl_no_verify: true
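To make concrete what is being switched off, here is the equivalent using only Python's standard ssl module. It is a sketch of the general technique, not a description of how webchanges or its HTTP client library implements ssl_no_verify; insecure_context is a hypothetical name.

```python
import ssl


def insecure_context() -> ssl.SSLContext:
    """Return a TLS context that skips certificate and hostname checks.

    For local development or testing only: connections made with this
    context are open to man-in-the-middle attacks.
    """
    ctx = ssl.create_default_context()
    # Hostname checking must be disabled before verify_mode can be relaxed.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


ctx = insecure_context()
print(ctx.verify_mode == ssl.CERT_NONE)
```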
Ignoring HTTP connection errors
In some cases, it might be useful to ignore (temporary) network errors to avoid notifications being sent. You can set the errors directive of the display section to false in the configuration file to suppress error reporting globally, for all jobs that end up with any type of error. To ignore network errors for specific jobs only, use the ignore_connection_errors directive in the job. For connection errors during local development or testing with an invalid TLS certificate, use the ssl_no_verify directive above instead.
url: https://example.com/
ignore_connection_errors: true
Similarly, you might want to ignore some (temporary) HTTP errors on the server side by using ignore_http_error_codes:
url: https://example.com/
ignore_http_error_codes: 408, 429, 500, 502, 503, 504
Or, if you prefer, you can ignore all HTTP errors by giving ignore_http_error_codes whole classes of status codes:
url: https://example.com/
ignore_http_error_codes: 4xx, 5xx
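How such a value could be matched against a response status is sketched below. The parsing shown (comma-separated entries, with "4xx" style entries covering a whole class) is an assumption based on the two examples above, and is_ignored is a hypothetical helper rather than part of webchanges.

```python
def is_ignored(status: int, ignore_http_error_codes: str) -> bool:
    """Return True if `status` matches an entry such as '429' or '5xx'."""
    for entry in ignore_http_error_codes.split(","):
        entry = entry.strip().lower()
        if not entry:
            continue
        if entry.endswith("xx"):
            # Class entry: compare only the first digit (4xx, 5xx, ...).
            if str(status)[0] == entry[0]:
                return True
        elif entry == str(status):
            return True
    return False


print(is_ignored(503, "408, 429, 500, 502, 503, 504"))  # True
print(is_ignored(404, "408, 429, 500, 502, 503, 504"))  # False
print(is_ignored(404, "4xx, 5xx"))                      # True
```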
Receive short notifications only containing the URL
If you only want to be alerted that there is a change without any information about the change itself, you can use a reporter that uses text and set report -> text -> details to false to avoid details being sent; you can also set report -> text -> footer to false to make the report even shorter.
Don’t forget that you can also use the user_visible_url directive to customize the URL shown in the report (e.g. when watching a REST API endpoint but wanting to show the “web-visible” URL in the report).
If you want the alert for one job only (of many), consider using the sha1sum filter instead.
For example, for email set these in the configuration file (webchanges --edit-config):
report:
  # ...
  text:
    details: false
    footer: false
    # ...
  email:
    html: false
    # ...
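As for the per-job alternative mentioned above, the idea behind a sha1sum filter is that the watched content is reduced to a fixed-length digest, so a report can only show that something changed, not what. A minimal sketch of that idea using Python's hashlib follows; the function is illustrative and the exact input handling of the webchanges filter is an assumption.

```python
import hashlib


def sha1sum(data: str) -> str:
    """Return the hexadecimal SHA-1 digest of `data` (UTF-8 encoded)."""
    return hashlib.sha1(data.encode("utf-8")).hexdigest()


before = sha1sum("price: 10")
after = sha1sum("price: 12")
# Any change in the content yields a different 40-character digest.
print(before != after, len(before))
```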