Under the hood

Parallelism

All jobs are run in parallel threads for optimum speed.

Jobs that don’t have use_browser: true are run first, using the default maximum number of workers set by Python’s concurrent.futures.ThreadPoolExecutor. From Python 3.8 onward this is the number of processors on the machine plus 4, capped at 32 (before Python 3.8 it was the number of processors multiplied by 5).

Jobs that have use_browser: true (and therefore require the Google Chrome browser to run) are run next. Their maximum number of workers is the number of processors on the machine or, if it is known and gives a lower figure, the available physical memory (as reported by the Python package psutil) divided by 200 MB.
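The limit for browser jobs can be sketched as follows. This is a minimal illustration and not webchanges' own code: the function names and the MEMORY_PER_BROWSER_JOB constant are hypothetical, 200 MB is assumed to mean 200 MiB, and the floor of one worker is an added assumption.

```python
import os
from typing import Optional

# Assumed per-job memory budget, taken from the "200 MB" figure in the text
# (interpreted here as 200 MiB).
MEMORY_PER_BROWSER_JOB = 200 * 1024 * 1024


def browser_max_workers(cpu_count: Optional[int], available_memory: Optional[int]) -> int:
    """Return the lower of the CPU count and available memory / 200 MB."""
    workers = cpu_count or 1  # os.cpu_count() may return None
    if available_memory is not None:  # memory is only used "if known"
        workers = min(workers, available_memory // MEMORY_PER_BROWSER_JOB)
    return max(workers, 1)  # assumption: always keep at least one worker


def available_physical_memory() -> Optional[int]:
    """Available physical memory in bytes, or None if psutil is not installed."""
    try:
        import psutil
    except ImportError:
        return None
    return psutil.virtual_memory().available


if __name__ == "__main__":
    print(browser_max_workers(os.cpu_count(), available_physical_memory()))
```

On a machine with 8 processors and about 1 GB of free memory, this sketch would allow 5 browser jobs at a time rather than 8.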

You can see the number of threads employed on your machine by running webchanges with --verbose and searching the DEBUG log messages for the text max_workers.

Use of conditional requests (timestamp, ETag)

After a website (url) has been checked for the first time, all subsequent checks are made as conditional requests by setting the HTTP header If-Modified-Since and, if an ETag was returned, the header If-None-Match. This is also true for jobs where use_browser is set to true (i.e. using Google Chrome).
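As a rough sketch of how such headers could be assembled, assuming the time of the previous check is stored as a Unix timestamp and the ETag as the string the server returned (the function name and both parameters are hypothetical, not webchanges' API):

```python
from email.utils import formatdate
from typing import Dict, Optional


def conditional_headers(timestamp: Optional[float], etag: Optional[str]) -> Dict[str, str]:
    """Build the conditional request headers for a subsequent check."""
    headers: Dict[str, str] = {}
    if timestamp is not None:
        # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
        headers["If-Modified-Since"] = formatdate(timestamp, usegmt=True)
    if etag:
        # Only sent if the server returned an ETag on a previous check.
        headers["If-None-Match"] = etag
    return headers
```

A first check has neither value stored, so the sketch returns an empty dictionary and the request is unconditional.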

The conditional request is an optimization to speed up execution: if there are no changes to the resource, the server doesn’t need to send it but instead just sends a 304 HTTP response code which webchanges understands.
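On the client side, handling a 304 could look like the sketch below. It uses the standard library's urllib purely for illustration (which HTTP client webchanges uses is not stated here), and fetch_if_changed with its parameters is a hypothetical name. Note that urllib reports a 304 by raising HTTPError, so the status is checked in the exception handler.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Tuple


def fetch_if_changed(
    url: str,
    headers: Dict[str, str],
    stored_body: bytes,
    opener: Callable = urllib.request.urlopen,
) -> Tuple[bytes, bool]:
    """Return (body, modified); on a 304 the previously stored body is reused."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener(request) as response:
            return response.read(), True
    except urllib.error.HTTPError as error:
        if error.code == 304:  # Not Modified: the server sent no body
            return stored_body, False
        raise  # any other HTTP error is a real failure
```

The opener parameter exists only so that the function can be exercised without network access; in normal use the default is left in place.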

In the extremely rare cases where the web server does not correctly process conditional requests (e.g. Google Flights), conditional requests can be turned off for a job with the no_conditional_request: true directive.

Details

With the If-Modified-Since HTTP request header, the server sends back the requested resource, with a 200 status, only if it was last modified after the given date. If the resource has not been modified since then, the response is a 304 without any body; the Last-Modified response header of a previous request contains the date of last modification.
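The server-side decision can be illustrated with a short sketch. The function name is hypothetical, and it assumes the header carries a well-formed HTTP date in GMT and that the resource's modification time is a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def status_for_if_modified_since(if_modified_since: str, last_modified: datetime) -> int:
    """Return 200 if the resource changed after the given date, else 304."""
    since = parsedate_to_datetime(if_modified_since)
    # HTTP dates have one-second resolution, so sub-second precision is dropped.
    return 200 if last_modified.replace(microsecond=0) > since else 304


if __name__ == "__main__":
    modified = datetime(2024, 1, 2, tzinfo=timezone.utc)
    print(status_for_if_modified_since("Mon, 01 Jan 2024 00:00:00 GMT", modified))
```

A resource modified exactly at the given date counts as not modified, so the sketch answers 304 in that case.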

With the If-None-Match HTTP request header, for GET and HEAD methods, the server returns the requested resource, with a 200 status, only if it doesn’t have an ETag matching any of the given ones. For other methods, the request is processed only if the ETag of the resource, if one exists, doesn’t match any of the values listed. When the condition fails for GET and HEAD methods, the server must return HTTP status code 304 (Not Modified). The comparison with the stored ETag uses the weak comparison algorithm, meaning two files are considered identical if their content is equivalent; they don’t have to be identical byte by byte. For example, two pages that differ only by the creation date in their footer would still be considered identical. When used in combination with If-Modified-Since, If-None-Match takes precedence (if the server supports it).
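A minimal sketch of the weak comparison and the resulting status for GET and HEAD requests follows. The function names are hypothetical, and the header parsing is deliberately simplified: it splits on commas and so assumes no ETag contains one, and it treats "*" as matching because the resource is assumed to exist.

```python
def _opaque_tag(etag: str) -> str:
    """Strip surrounding whitespace and the weakness indicator W/, if present."""
    etag = etag.strip()
    return etag[2:] if etag.startswith("W/") else etag


def weak_match(etag_a: str, etag_b: str) -> bool:
    """Weak comparison: the opaque tags must be equal; W/ prefixes are ignored."""
    return _opaque_tag(etag_a) == _opaque_tag(etag_b)


def if_none_match_status(if_none_match: str, current_etag: str) -> int:
    """Status a server would return to a GET or HEAD request."""
    if if_none_match.strip() == "*":
        return 304  # "*" matches any existing representation
    candidates = if_none_match.split(",")  # simplification, see note above
    if any(weak_match(tag, current_etag) for tag in candidates):
        return 304  # condition failed: the client's copy is still current
    return 200  # no listed ETag matches: send the resource
```

Under this comparison W/"v1" and "v1" match, which is what lets a server treat equivalent but not byte-identical content as unchanged.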