Differs

Overview

A differ is applied to the filtered data if it has changed from the previous run(s). A differ summarizes the changes in the data and produces the content of the report sent to you. The output of the differ can be further filtered using any of the filters listed in Filters (see Filtering the diff below).

At the moment, the following differs are available:

unified: (default) Compares data line-by-line, showing changed lines in a “unified format”;

command: Executes an outside command that acts as a differ;

deepdiff: Compares structured data (JSON, YAML, or XML) element-by-element;

table: A Python version of the unified differ where the changes are displayed as an HTML table;

wdiff: Compares data word-by-word, highlighting changed words and maintaining line breaks.

In addition, the following BETA differs are available:

ai_google: Detects and summarizes changes using Generative AI (free API key required);

image: Detects changes in an image and displays them as overlay over a grayscale version of the old image.

A differ is specified using the job directive differ. To select a differ with its default directive values, simply use the name of the differ as the directive’s value:

url: https://example.net/unified.html
differ: unified  # this entire line can be omitted as it's the default differ

url: https://example.net/deepdiff.html
differ: deepdiff  # use the deepdiff differ with its default values

Otherwise, the differ directive is a dictionary, and the name key contains the name of the differ:

url: https://example.net/unified_no_range.html
differ:
  name: unified
  range_info: false

`unified` (default)

This is the default differ used when the differ job directive is not specified (except, for backward compatibility, when in the configuration file the html report has the deprecated diff key set to table).

It does a line-by-line comparison of the data and reports lines that have been added (+), deleted (-), or changed. Changed lines are displayed twice: once marked as “deleted” (-) representing the old content, and once as “added” (+) representing the new content. Results are displayed in the unified format (the “unified diff”).

For HTML reports, webchanges colorizes the unified diff for easier legibility.

Examples

Using default settings:

url: https://example.net/unified.html
differ: unified  # this can also be omitted as it's the default

Range information lines

Range information lines (those starting with @@) can be suppressed using range_info: false:

url: https://example.net/unified_no_range.html
differ:
  name: unified
  range_info: false
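
To see what suppressing range information means, the sketch below builds a unified diff with Python's difflib and drops the lines starting with @@. It only illustrates the effect; the sample data is made up and this is not necessarily how webchanges implements range_info.

```python
import difflib

old = ["alpha", "beta", "gamma"]
new = ["alpha", "BETA", "gamma"]

diff = list(difflib.unified_diff(old, new, "old", "new", lineterm=""))
# Drop the range information lines, i.e. those starting with "@@".
without_range_info = [line for line in diff if not line.startswith("@@")]

print("\n".join(without_range_info))
```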

Context lines

The context_lines directive sets the number of context lines shown around each change in a unified diff, overriding the default of 3 (Python’s default), or of 0 if the job contains additions_only: true or deletions_only: true.

Example using 5 context lines:

url: https://example.com/#lots_of_contextlines
differ:
  name: unified
  context_lines: 5

Output:

---------------------------------------------------------------------------
CHANGED: https://example.com/#lots_of_contextlines
---------------------------------------------------------------------------
--- @ Thu, 01 Oct 2020 00:00:00 +0000
... @ Thu, 01 Oct 2020 01:00:00 +0000
@@ -1,15 +1,15 @@
 This is line 10
 This is line 11
 This is line 12
 This is line 13
 This is line 14
-This is line 15
+This is line fifteen
 This is line 16
 This is line 17
 This is line 18
 This is line 19
 This is line 20

The same example using the default number of context lines, i.e. 3:

url: https://example.com/#default_contextlines

Output:

---------------------------------------------------------------------------
CHANGED: https://example.com/#default_contextlines
---------------------------------------------------------------------------
--- @ Thu, 01 Oct 2020 00:00:00 +0000
... @ Thu, 01 Oct 2020 01:00:00 +0000
@@ -1,15 +1,15 @@
 This is line 12
 This is line 13
 This is line 14
-This is line 15
+This is line fifteen
 This is line 16
 This is line 17
 This is line 18
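
The effect of the number of context lines can be reproduced with Python's difflib, whose unified_diff function takes the count as its n parameter. This is a sketch on made-up data that mirrors the two outputs above, not webchanges' own code.

```python
import difflib

old = [f"This is line {i}" for i in range(1, 31)]
new = list(old)
new[14] = "This is line fifteen"  # change line 15 only


def context_count(n):
    """Count unchanged context lines in a unified diff built with n context lines."""
    diff = difflib.unified_diff(old, new, lineterm="", n=n)
    return sum(1 for line in diff if line.startswith(" "))


print(context_count(5))  # 10: five lines before and five after the change
print(context_count(3))  # 6: Python's default of three on each side
```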

Optional directives

context_lines (int): The number of lines on each side surrounding changes to include in the report (default: 3).
range_info (true/false): Whether to include line range information lines (those starting with @@) (default: true).

Changed in version 3.21: Became a standalone differ. Added the range_info and context_lines directives, the latter replacing the job directive contextlines (added in version 3.0).

`ai_google`

Added in version 3.21: as BETA.

Added in version 3.33.

Prefaces a unified diff with a textual summary of changes generated by any of Google’s Gemini Generative AI models called via an API call. This can be free of charge for most developers.

Gemini models are the first widely available models with a large context window (currently 1 million tokens), which allows them to analyze changes in long documents (of 350,000 words, or about 700 single-spaced pages) such as terms and conditions, privacy policies, etc. that other models can’t handle. For clarity, these models can handle up to approximately 700,000 words, but a comparison between two versions needs approximately half of this for the old text and the other half for the new text. They are also offered with a free tier.

Important

Requires a system environment variable GEMINI_API_KEY containing the Google Cloud AI Studio API Key, which you obtain here and which itself requires a Google Cloud account.

Warning

Gemini offers free use for developers for feedback and testing only (no production use; your data is used to train their models). If your use qualifies, you must create a free of charge plan, which you obtain by creating an API key from a (separate) Google Cloud project with billing disabled. Otherwise we highly recommend that you set up a budget with threshold notification enabled to avoid the potential for surprises!

By default, we specify the Gemini 2.0 Flash model (gemini-2.0-flash) since it’s the last released model that allows 1,000,000 tokens per minute on the free tier, and (if you are on a paid plan) is cheaper than Gemini 2.5 or Gemini 3. You can find the full list of models here. To evaluate responses between models side-by-side, use the tool here.

Tip

If you can fit your input within the 250,000 tokens per minute rate limit of the free tier, we highly recommend trying the Gemini 2.5 Flash (gemini-2.5-flash) and Gemini 3 Flash Preview (gemini-3-flash-preview) models, with which we have been having great success.

You can change the default model in the configuration file as follows:

differ_defaults:
  _note: Default directives that are applied to individual differs.
  unified: {}
  ai_google:
    model: gemini-2.5-pro
  command: {}
  deepdiff: {}
  image: {}
  table: {}
  wdiff: {}

Note

These models work with 38 languages and are available in over 220 countries and territories.

Warning

Generative AI can “hallucinate” (make things up), so always double-check the AI-generated summary with the accompanying unified diff.

The default prompt asks the Generative AI model to make the comparison (see below for the default prompt). However, to save tokens and time (and potentially $), you might want the model to only summarize the differences from a unified diff by using a prompt similar to the one here:

differ:
  name: ai_google
  prompt: >-
    Describe the differences between the two versions of text as summarized in this unified diff.
    Only highlight the most significant modifications.\n\n{unified_diff}

More information about writing input prompts for these models can be found here. You may also use the “Help me write” function in Vertex AI or ask the model itself (in AI Studio) to suggest prompts that are appropriate to your use case.

Example

Using the default prompt, a summary is prefaced to a unified diff:

The new version simply updates the time from 00:00:00 UTC to 01:00:00 UTC. This represents a difference of 1 hour.

--- @ Thu, 01 Oct 2020 00:00:00 +0000
+++ @ Thu, 01 Oct 2020 01:00:00 +0000
@@ -1 +1 @@
-Sat Oct 1 00:00:00 UTC 2020
+Sat Oct 1 01:00:00 UTC 2020
---
Summary by Google Generative AI model gemini-2.0-flash

Tip

You can do “dry-runs” of this (or any) differ on an existing job by editing the differ in the job file and running e.g. webchanges --test-differ 1 --test-reporter browser. Don’t forget to revert your job file if you don’t like the new outcome!

Mandatory environment variable

GEMINI_API_KEY: Must contain your Google Cloud AI Studio API Key.

Optional directives

model (str): A model code (default: gemini-2.0-flash).
system_instructions (str): Optional tone and style instructions for the model (default: see below).
prompt (str): The prompt sent to the model. The strings {unified_diff}, {unified_diff_new}, {old_text} and {new_text} will be replaced by the respective content, and any \n in the prompt will be replaced by a newline (default: see below).
timeout (float): The number of seconds before timing out the API call (default: 300).

Data to diff

additions_only (bool): provide a summary of only the new text (i.e. the lines added per unified diff).
prompt_ud_context_lines (int): If {unified_diff} is present in the prompt, the number of context lines in the unified diff sent to the model (default: 999). If the resulting prompt is estimated to be too big for the model to handle, the unified diff is recalculated with the default number of context lines (3). Note that this unified diff is separate from the diff included in the report itself.
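
The fallback described for prompt_ud_context_lines can be pictured with the sketch below. It is an assumption-laden illustration: diff_for_prompt and max_chars are hypothetical names, and a simple character count stands in for whatever size estimate webchanges actually uses.

```python
import difflib


def diff_for_prompt(old_text, new_text, context_lines=999, max_chars=1_000_000):
    """Build the unified diff for {unified_diff}, falling back to 3 context lines.

    max_chars is a hypothetical stand-in for the model's real input limit.
    """
    def build(n):
        lines = difflib.unified_diff(
            old_text.splitlines(), new_text.splitlines(), lineterm="", n=n
        )
        return "\n".join(lines)

    unified_diff = build(context_lines)
    if len(unified_diff) > max_chars:  # too big: retry with the default context
        unified_diff = build(3)
    return unified_diff
```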

Model tuning

temperature (float between 0.0 and 2.0): The model’s Temperature parameter, which controls randomness; higher values increase diversity (default: 0.0).
thinking_budget (int): Only for Gemini 2.5: The model’s thinking budget; see model documentation (default: unset, effect varies by model as per documentation).
thinking_level (‘low’, ‘medium’, or ‘high’): For Gemini 3 and above, the model’s thinking level; see model documentation (default: unset).
top_k (int of 1 or greater): The model’s TopK parameter, i.e. k most likely next tokens to sample from at each step. Lower k focuses on higher probability tokens (default: Google’s default, which is model-dependent, but typically 1; see model documentation).
top_p (float between 0.0 and 1.0): The model’s TopP parameter, or the cumulative probability cutoff for token selection. Lower p means sampling from a smaller, more top-weighted nucleus and reduces diversity (default: 1.0 if temperature is 0.0 (default), otherwise Google’s default, which is model-dependent, but typically 0.95 or 1.0; see model documentation).
tools (list): Data passed on to the API’s ‘tool’ field, which calls a piece of code that enables the system to interact with external systems to perform an action, or set of actions, outside of knowledge and scope of the model (see here).

Note

You can learn about the Temperature, TopK and TopP parameters here and here. In general, temperature increases creativity and diversity in phrasing, while top-p and top-k influence the variety of individual words, with low values potentially leading to repetitive summaries. The only way to get these “right” is through experimentation with actual data, as the results are highly dependent on the input and subject to your personal preferences.

Underlying unified diff

unified (dict): Directives passed to unified differ, which prepares the unified diff attached to this report. Example:

command: date
differ:
  name: ai_google
  unified:
    context_lines: 5
    range_info: false

Default system instructions and prompts:

Special variables for prompt

When present in the prompt text, the following will be replaced:

{old_text}: Replaced with the old text.
{new_text}: Replaced with the new (currently retrieved) text.
{unified_diff}: Replaced with a unified_diff, with 999 context lines unless changed by prompt_ud_context_lines (see above).
{unified_diff_new}: Replaced with the added lines from the unified_diff, with the initial + stripped (i.e. roughly the new text).
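
The substitution of these variables can be sketched in Python as follows. fill_prompt is a hypothetical helper written for illustration only; it assumes that literal \n sequences are expanded before the variables are replaced, which may not match the order webchanges uses.

```python
import re


def fill_prompt(prompt, values):
    """Replace the special variables in a prompt and expand literal \\n sequences.

    Hypothetical helper; `values` maps variable names to their text.
    """
    text = prompt.replace("\\n", "\n")  # a literal backslash-n becomes a newline
    pattern = r"\{(unified_diff_new|unified_diff|old_text|new_text)\}"
    # A single pass, so text inserted for one variable is never substituted again.
    return re.sub(pattern, lambda m: values.get(m.group(1), ""), text)


prompt = "Summarize this diff.\\n\\n{unified_diff}"
print(fill_prompt(prompt, {"unified_diff": "-old line\n+new line"}))
```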

Default

System instructions

You are a skilled journalist tasked with analyzing two versions of a text and summarizing the key differences in meaning between them. The audience for your summary is already familiar with the text’s content, so you can focus on the most significant changes.

Instructions:

Carefully examine the old version of the text, provided within the <old_version> and </old_version> tags.

Carefully examine the new version of the text, provided within the <new_version> and </new_version> tags.

Compare the two versions, identifying areas where the meaning differs. This includes additions, removals, or alterations that change the intended message or interpretation.

Ignore changes that do not affect the overall meaning, even if the wording has been modified.

Summarize the identified differences, except those ignored, in a clear and concise manner, explaining how the meaning has shifted or evolved in the new version compared to the old version only when necessary. Be specific and provide examples to illustrate your points when needed.

If there are only additions to the text, then summarize the additions.

Use Markdown formatting to structure your summary effectively. Use headings, bullet points, and other Markdown elements as needed to enhance readability.

Restrict your analysis and summary to the information provided within the <old_version> and <new_version> tags. Do not introduce external information or assumptions.

Prompt

<old_version> {old_text} </old_version>

<new_version> {new_text} </new_version>

With `additions_only`

System instructions

You are a skilled journalist. Your task is to summarize the provided text in a clear and concise manner. Restrict your analysis and summary only to the text provided. Do not introduce any external information or assumptions.

Format your summary using Markdown. Use headings, bullet points, and other Markdown elements where appropriate to create a well-structured and easily readable summary.

Prompt

{unified_diff_new}

No changes are tracked here prior to v3.33 as the differ was in BETA; please refer to the changelog.

Changed in version 3.33: Removed the BETA tag. Added thinking_level and media_resolution sub-directives.

`command`

Calls an external differ. The old data and the new data are each written to a temporary file, and the names of the two files are appended to the command. The external program must exit with a status of 0 if no differences are found, a status of 1 if any differences are found, or any other status to signify an error (mimicking wdiff’s behavior).

If your differ outputs HTML, you should set is_html to true.

If wdiff is called, its output will be colorized when displayed on stdout (typically a screen) and for HTML reports. However, we strongly recommend you use the built-in wdiff differ instead!

Tip

Use the job directive monospace if you want to use a monospace font in the report.

Example

url: https://example.net/command.html
differ:
  name: command
  command: python mycustomscript.py
  is_html: true  # if the custom differ outputs HTML
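
As a rough idea of what a script such as mycustomscript.py could contain, the sketch below compares two files and returns the exit status described above. compare_files is a hypothetical name, and because this sketch prints plain text rather than HTML, is_html would be left at false for it.

```python
import difflib
import sys


def compare_files(old_path, new_path):
    """Print a diff of two files and return an exit status (0, 1, or 2)."""
    try:
        with open(old_path, encoding="utf-8") as f:
            old = f.read().splitlines()
        with open(new_path, encoding="utf-8") as f:
            new = f.read().splitlines()
    except OSError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 2  # any status other than 0 or 1 signals an error
    diff = list(difflib.unified_diff(old, new, "old", "new", lineterm=""))
    if not diff:
        return 0  # no differences found
    print("\n".join(diff))
    return 1  # differences found


# In the script itself, the two file names arrive as the last two arguments:
#     sys.exit(compare_files(sys.argv[-2], sys.argv[-1]))
```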

Note

See this note for the file security settings required to run jobs with this differ in Linux.

Changed in version 3.21: Was previously a job sub-directive by the name of diff_tool.

Changed in version 3.29: Added is_html sub-directive.

Changed in version 3.33: Added context_lines sub-directive.

Required directives

command: The command to execute.

Optional directives

is_html (true/false): Whether the output of the command is HTML, for correct formatting in reports (default: false).


`deepdiff`

Added in version 3.21.

Inspects structured data (JSON, YAML, or XML) on an element-by-element basis and reports which elements have changed, using a customized report based on the DeepDiff module of the deepdiff library.

Examples

url: https://example.net/deepdiff_json.html
differ: deepdiff

url: https://example.net/deepdiff_xml_ignore_oder.html
differ:
  name: deepdiff
  data_type: xml  # override deriving it from data type (fka MIME type)
  ignore_order: true

Output:

--- @ Thu, 01 Oct 2020 00:00:00 +0000
+++ @ Thu, 01 Oct 2020 01:00:00 +0000
• Type of root['Items'][0]['CurrentInventory'] changed from int to NoneType and value changed from "1" to None.
• Type of root['Items'][0]['Description'] changed from str to NoneType and value changed from "Gadget" to None.

With `compact`

url: https://example.net/deepdiff_xml_ignore_oder.html
differ:
  name: deepdiff
  compact: true  #  more compact YAML-style report, does not report type changes

Output:

--- @ Thu, 01 Oct 2020 00:00:00 +0000
+++ @ Thu, 01 Oct 2020 01:00:00 +0000
• ⊤['Items'][0]['CurrentInventory']: "1" ⮕ None.
• ⊤['Items'][0]['Description']: "Gadget" ⮕ None.

Optional directives

data_type (json, yaml, or xml): The type of data being analyzed if different than the data’s media type (fka MIME type), defaulting to json if unable to derive.
ignore_order (true/false): Whether to ignore the order in which the items have appeared (default: false).
ignore_string_case (true/false): Whether to ignore case when comparing strings (default: false, i.e. comparisons are case-sensitive).
significant_digits (int): The number of digits AFTER the decimal point to be used in the comparison (default: no limit).
compact (true/false): Produce a more compact YAML-style report which also ignores type changes (e.g. “type changed from NoneType to str”).

Note

When you set ignore_order: true, DeepDiff treats lists as if they were sets. To compare two sets it needs to pair up the items, and its default strategy is to try to hash the objects in the list. However, dictionaries are not hashable in Python. So when DeepDiff finds a dictionary in new_data that has even a tiny difference from its counterpart in old_data, it can’t be sure they are “the same object, but modified”, and it reports that the entire old dictionary is gone and that the entire new dictionary has been added. The report will therefore show a change for the entire, and potentially large, dictionary, not just for the changed nested value(s).
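
The consequence described in this note can be reproduced with a deliberately simplified stand-in. The sketch below does not use DeepDiff or its algorithm; unordered_changes is a hypothetical function that matches list items as whole values (through a canonical JSON string), which is enough to show why one changed nested value surfaces as a whole dictionary removed and a whole dictionary added.

```python
import json


def unordered_changes(old_items, new_items):
    """Compare two lists as if they were sets, ignoring order.

    Items are matched by a canonical JSON string because dicts are not hashable.
    """
    def canonical(item):
        return json.dumps(item, sort_keys=True)

    old_keys = {canonical(i) for i in old_items}
    new_keys = {canonical(i) for i in new_items}
    removed = [i for i in old_items if canonical(i) not in new_keys]
    added = [i for i in new_items if canonical(i) not in old_keys]
    return removed, added


old_data = [{"id": 1, "name": "Gadget", "stock": 1}, {"id": 2, "name": "Widget", "stock": 7}]
new_data = [{"id": 2, "name": "Widget", "stock": 7}, {"id": 1, "name": "Gadget", "stock": 0}]

removed, added = unordered_changes(old_data, new_data)
print(removed)  # the whole old dictionary for id 1
print(added)    # the whole new dictionary for id 1
```

Only the stock value changed, yet both complete dictionaries are reported, while the reordered but identical item with id 2 is not.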

Required packages

To run jobs with this differ, you need to first install additional Python packages as follows:

uv pip install --upgrade webchanges[deepdiff]

Changed in version 3.30: Added support for YAML data.

Changed in version 3.30.1: Added compact sub-directive.

`image`

Added in version 3.21: As BETA.

Note

This differ is currently in BETA, mostly because it’s unclear what more needs to be developed, changed or parametrized in order to make the differ work with the vast variety of images. Feedback welcomed here.

Highlights changes in an image by overlaying them in yellow on a greyscale version of the original image. Only works with HTML reports.

Examples

Monitor a URL of an image directly, and see if the image changes:

url: https://sources.example.net/productimage.jpg
filters:
  - ascii85
differ:
  name: image
  data_type: ascii85

Extract an image URL from an HTML <img> tag and monitor if this URL changes:

url: https://www.example.net/productpage.html
filters:
  - xpath: //div[@class="image"]/img/@src
differ:
  name: image
  data_type: url

Optional directives

This differ is currently in BETA and these directives may change in the future.

data_type (url, filename, ascii85 or base64): The type of data to process: a link to the image, the path to the file containing the image, or the image itself encoded as Ascii85 or RFC 4648 Base64 text (default: url).
mse_threshold (float): The minimum mean squared error (MSE) between two images to consider them changed; requires the package numpy to be installed (default: 2.5).
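
For intuition about mse_threshold, the sketch below computes a mean squared error on two tiny made-up grayscale images in plain Python. It is an assumption that a value equal to the threshold counts as changed, and the differ's real computation (color channels, scaling, use of numpy) may differ.

```python
def mean_squared_error(old_pixels, new_pixels):
    """Return the MSE between two equally sized sequences of pixel values."""
    if len(old_pixels) != len(new_pixels):
        raise ValueError("images must have the same number of pixels")
    squared = [(a - b) ** 2 for a, b in zip(old_pixels, new_pixels)]
    return sum(squared) / len(squared)


MSE_THRESHOLD = 2.5  # the documented default of mse_threshold

old_image = [10, 10, 10, 10]  # tiny made-up grayscale image (values 0-255)
new_image = [10, 10, 10, 14]  # one pixel differs by 4

mse = mean_squared_error(old_image, new_image)
print(mse, mse >= MSE_THRESHOLD)  # 4.0 True
```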

Note

If you pass a url or filename to the differ, it will detect changes only if the url or filename changes, not if the image behind the url/filename does; no change will be reported if the url or filename changes but the image doesn’t. To detect changes in an image when the url or filename doesn’t change, build a job that captures the image itself encoded in Ascii85 (preferably, see the ascii85 filter) or Base64 and set data_type accordingly.
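
To make the two text encodings concrete, the sketch below encodes a few bytes with Python's standard base64 module. The bytes are a made-up stand-in for a real image file, and this is not the code of the ascii85 filter itself.

```python
import base64

image_bytes = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG signature, standing in for a whole file

as_ascii85 = base64.a85encode(image_bytes).decode("ascii")
as_base64 = base64.b64encode(image_bytes).decode("ascii")

# Both encodings are plain text and decode back to the original bytes.
assert base64.a85decode(as_ascii85) == image_bytes
assert base64.b64decode(as_base64) == image_bytes
print(len(as_ascii85), len(as_base64))
```

Ascii85 represents every 4 bytes with 5 characters, whereas Base64 represents every 3 bytes with 4 characters, so the Ascii85 text is the shorter of the two.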

Required packages

To run jobs with this differ, you need to first install additional Python packages as follows:

uv pip install --upgrade webchanges[imagediff]

In addition, you can only run it with a default configuration of webchanges, which installs the httpx HTTP client library; requests is not supported.

`table`

Similar to unified, it performs a line-by-line comparison and reports lines that have been added, deleted, or changed. However, this is reported in an HTML table format showing a side by side, line by line comparison of text with inter-line and intra-line change highlights produced by Python’s difflib.HtmlDiff class.

Example

url: https://example.net/table.html
differ: table

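Output:

1  This line is the same                              1  This line is the same
2  This line is in the left file but not the right
3  Another line that is the same                      2  Another line that is the same
                                                      3  This line is in the right file but not the left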

For backwards compatibility, this is the default differ for an html reporter with the configuration setting diff (deprecated) set to table.

Optional directives

tabsize (int): Tab stop spacing (default: 8).
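
The side-by-side table described in this section comes from Python's difflib.HtmlDiff class, which can be tried on its own as sketched below. The sample lines are made up, and the way webchanges wraps and styles the resulting table is not shown.

```python
import difflib

old_lines = [
    "This line is the same",
    "This line is in the left file but not the right",
    "Another line that is the same",
]
new_lines = [
    "This line is the same",
    "Another line that is the same",
    "This line is in the right file but not the left",
]

# tabsize is HtmlDiff's tab stop spacing; 8 is its default.
table_html = difflib.HtmlDiff(tabsize=8).make_table(old_lines, new_lines)
print(table_html)
```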

Changed in version 3.21: Became a standalone differ (previously only accessible through configuration file settings). Added the tabsize directive.

`wdiff`

Added in version 3.24.

Performs a word-by-word comparison highlighting words that have been added (added) or deleted (deleted). Changed words are displayed twice: once marked as “deleted” (deleted) representing the old word(s), and the new word(s) as “added” (added). Line breaks are maintained.

It is similar to GNU’s Wdiff, but requires no external dependency.

When unchanged lines are skipped, they are reported using @@. For example, @@ 1...22 @@ means that lines 1 to 22 are skipped from the report as they are unchanged.
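
A word-by-word comparison can be sketched with difflib.SequenceMatcher from Python's standard library. word_diff is a hypothetical function handling a single line, and it marks changes with GNU wdiff's text notation ([-deleted-] and {+added+}); the built-in differ's own markup and its handling of line breaks are not reproduced here.

```python
import difflib


def word_diff(old_line, new_line):
    """Mark deleted words as [-word-] and added words as {+word+}."""
    old_words, new_words = old_line.split(), new_line.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
        if tag == "equal":
            out.extend(old_words[i1:i2])
    return " ".join(out)


print(word_diff("The time now is 00:00:00.00 UTC", "The time now is 01:00:00.00 UTC"))
# The time now is [-00:00:00.00-] {+01:00:00.00+} UTC
```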

Example

command: echo The time now is %time% UTC  # Windows
differ: wdiff

Output:

--- @ Thu, 01 Oct 2020 00:00:00 +0000
+++ @ Thu, 01 Oct 2020 01:00:00 +0000
The time now is 00:00:00.00 01:00:00.00 UTC

Checked 1 source in 0.1 seconds with webchanges.

Optional directives

context_lines (int): The number of context lines on each side of changes to provide surrounding content to better understand the changes (default: 3).
range_info (true/false): Include range information lines for unreported lines (default: true).

