WebParser measure


Measure=WebParser reads and parses information from web pages.

The measure uses Perl Compatible Regular Expressions to extract information from any web page or local file.

Usage

WebParser measures take the form:

[MeasureParent]
Measure=WebParser
URL=https://SomeSite.com
RegExp=(?siU)<Item>(.*)</Item>.*<Item>(.*)</Item>

This example creates two StringIndex values in what is referred to as the "parent" WebParser measure. The information is generally used in subsequent "child" WebParser measures:

[MeasureChild1]
Measure=WebParser
URL=[MeasureParent]
StringIndex=1

[MeasureChild2]
Measure=WebParser
URL=[MeasureParent]
StringIndex=2

The two child measures now return the strings captured into StringIndex 1 and 2 by the parent measure. These can then be used with MeasureName and other options in meters.

Note: More information and examples for WebParser can be found at WebParser Tutorial and RSS/Atom Feed Tutorial.

Note: WebParser was previously a plugin measure.

In many existing skins you might see the syntax:

[MeasureParent]
Measure=Plugin
Plugin=WebParser
or
Plugin=WebParser.dll
or
Plugin=Plugins\WebParser.dll

WebParser still works with those forms, and changing existing skins to the new Measure=WebParser syntax is entirely optional. However, new skins created going forward should use the correct syntax for accuracy and clarity. WebParser is a measure, and not a plugin.

[MeasureParent]
Measure=WebParser

Options

General measure options

All general measure options are valid.

URL

URL to the site or file to be downloaded and parsed. If the name of another WebParser measure is used, e.g. URL=[SomeMeasure], then the value of the parent measure is used, generally by referring to a specific StringIndex number.

WebParser cannot use cookies or other session-based authentication, so it cannot be used to retrieve information from web sites that require a login. However, WebParser can be used on sites that support HTTP authentication, e.g. URL=https://myname:mypassword@somesite.com.

WebParser can read and parse local files on your computer by using the file:// URI scheme. E.g. URL=file://#CURRENTPATH#SomeFile.txt. Note that this must be a fully qualified path to the file.

If you want to use the current value of a measure in a dynamic way as a Section Variable, rather than as a reference to a "parent" WebParser measure, you must prefix the name of the measure with the & character.

Example: URL=https://somesite.com/[&WebMeasure]
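A minimal sketch of this, assuming a hypothetical String measure named MeasureSearchTerm that holds the search text, and a placeholder RegExp. Because [&MeasureSearchTerm] is resolved dynamically, the WebParser measure also needs DynamicVariables=1, and a changed value is only used at the next UpdateRate cycle or after an Update command:

[MeasureSearchTerm]
Measure=String
String=Rainmeter

[MeasureSearch]
Measure=WebParser
; The current value of MeasureSearchTerm is inserted into the URL
URL=https://somesite.com/search?q=[&MeasureSearchTerm]
RegExp=(?siU)<title>(.*)</title>
DynamicVariables=1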


A note on URL encoding: WebParser will automatically URL-Encode, also known as Percent Encoding, any characters after the protocol://host/path/ portion of the URL that are not one of the following:

Unreserved URL-safe characters:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~
Reserved URL-delimiter characters:
!*'();:@&=+$,/?%#[]


This means that a URL like:
https://somesite.com?search=I live in München
Would be sent as:
https://somesite.com?search=I%20live%20in%20M%C3%BCnchen


This encoding is transparently done before the URL is sent to the remote site. This is not done when the protocol is file://.

RegExp

The Perl compatible regular expression used in parsing.

StringIndex

Defines which captured string from the RegExp this measure returns. This option is generally used in a child measure to determine which of the captured values in a parent measure to use.

Note: There is a limit of 99 StringIndex values allowed on a single WebParser measure.

StringIndex2

StringIndex2 is used when a measure that takes its data from another WebParser measure (i.e. its URL points to a parent measure) also has its own RegExp. In this case, StringIndex defines which captured string from the parent measure's RegExp is used as input, and StringIndex2 defines which captured string from this measure's own RegExp is returned as the value of the measure.

More information on using StringIndex2 can be found here.

Note: If the RegExp is not defined in this measure, the StringIndex2 has no effect.
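A minimal sketch of the pattern, assuming a hypothetical feed whose items use <item>, <title> and <link> tags. The parent captures the whole first <item> block into StringIndex 1, and the child applies its own RegExp to that text and returns its second capture:

[MeasureParent]
Measure=WebParser
URL=https://somesite.com/feed.xml
RegExp=(?siU)<item>(.*)</item>

[MeasureItemLink]
Measure=WebParser
URL=[MeasureParent]
; Use the first <item> block captured by the parent as input
StringIndex=1
RegExp=(?siU)<title>(.*)</title>.*<link>(.*)</link>
; Return the second capture of this measure's RegExp (the link)
StringIndex2=2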

UpdateRate Default: 600

The number of update cycles between downloads of the webpage. This is relative to the config's main Update rate and any UpdateDivider on the measure, so the formula is Update X UpdateDivider X UpdateRate = how often, in milliseconds, the measure connects to the site.

Note: Use some caution in deciding how often to connect to a site with WebParser. Accessing a site excessively can cause your computer to be seen as an "attack" and be blocked. UpdateRate defaults to 600 as a safety measure, and it should not be changed unless there is a specific reason to connect to the site more or less often.
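As a worked example of the formula, assume Update=1000 and the default UpdateDivider of 1. The default UpdateRate of 600 gives 1000 x 1 x 600 = 600,000 milliseconds, so the site is contacted every 10 minutes. The sketch below uses an illustrative UpdateRate=3600 on a hypothetical parent measure to connect once an hour instead:

[Rainmeter]
Update=1000

[MeasureParent]
Measure=WebParser
URL=https://SomeSite.com
RegExp=(?siU)<title>(.*)</title>
; 1000 ms x 1 (UpdateDivider) x 3600 = 3,600,000 ms = once per hour
UpdateRate=3600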

In order to override the UpdateRate set on a WebParser measure, to have it connect to the site and download the data "right now", the !CommandMeasure bang must be used, with the name of the "parent" measure as the first parameter, and "Update" as the second.

LeftMouseUpAction=[!SetOption WebMeasure URL "https://SomeNewSite.com"][!CommandMeasure WebMeasure Update]


More information on UpdateRate is at WebParser: How UpdateRate Works.

DecodeCharacterReference Default: 0

Automatically decodes HTML Character References. This will eliminate the need to use a Substitute statement to translate character references like &quot;, &amp;, &lt;, and &gt; to the actual character. Valid values are:

  • 0: Does nothing (default).
  • 1: Decodes both numeric character references and character entity references.
  • 2: Decodes only numeric character references.
  • 3: Decodes only character entity references.

Note: This option is used on the child measures actually returning the value of a StringIndex.
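A sketch of a child measure using this option, assuming MeasureParent is the parent measure from the Usage example and its first capture contains character references:

[MeasureChild1]
Measure=WebParser
URL=[MeasureParent]
StringIndex=1
; A captured "Tom &amp; Jerry" would be returned as "Tom & Jerry"
DecodeCharacterReference=1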

DecodeCodePoints Default: 0

When set to 1, automatically decodes Unicode Code Point values in the form \uXXXX or \UXXXXXXXX. Codes from \u0000 to \uFFFF are supported.

Note: This option is used on the child measures actually returning the value of a StringIndex.
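For example, if StringIndex 2 of a hypothetical parent measure captures the text caf\u00e9 from a JSON response, a child measure like the following would return café:

[MeasureDescription]
Measure=WebParser
URL=[MeasureParent]
StringIndex=2
; \u00e9 is decoded to the character é
DecodeCodePoints=1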

Debug Default: 0

Logs DEBUG messages to the Rainmeter log or to a file. Valid values are:

  • 0: Does not log DEBUG messages from WebParser.
  • 1: Logs DEBUG messages to the log. Rainmeter must also be in Debug mode.
  • 2: Saves the downloaded webpage to WebParserDump.txt in the current skin folder. This can be useful because some web servers send different content depending on which client requests it. Remember to remove this option from the skin once it is working correctly.

Hint: The StringIndex values to use in child measures can be determined by setting Debug=1 on the measure that has the RegExp option. The matched strings and their StringIndex numbers will then be displayed in the Rainmeter log.

Debug2File

If the Debug option is set to 2, this option can be set to the path and name of the file to use for the downloaded webpage instead of WebParserDump.txt in the current skin folder.

Note: The folder for the file must already exist.
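A sketch of a parent measure that saves the raw download while its RegExp is being developed. The file name PageDump.txt is hypothetical; #CURRENTPATH# places it in the skin folder, which always exists:

[MeasureParent]
Measure=WebParser
URL=https://SomeSite.com
RegExp=(?siU)<title>(.*)</title>
; Save the downloaded page so the RegExp can be tested against it
Debug=2
Debug2File=#CURRENTPATH#PageDump.txt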

Download Default: 0

If Download=1, the resource at URL is downloaded to the Windows TEMP folder, and the full path to the downloaded file is returned as the string value of the measure. The measure can then be used with MeasureName on an Image meter to download and display images from a site.

Note: When used on a child measure, the download itself is treated as a parent function, and any FinishAction on the measure will be executed if the download succeeds, and any OnDownloadErrorAction on the measure will be executed if the download fails.

DownloadFile

If the Download option is set to 1, this option defines a relative path and file name where the downloaded file will be saved instead of in Windows TEMP.

A folder named DownloadFile will be created in the current skin folder, and the defined relative path and file name will be created under it. It is not possible to specify an absolute path.

Note: This file is not a temporary file so it is not deleted after unloading a skin or exiting Rainmeter.
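A sketch of downloading and displaying an image, assuming StringIndex 3 of a hypothetical parent measure holds an image URL. With DownloadFile=Thumb.png, the image is saved under the DownloadFile folder in the current skin folder instead of in Windows TEMP:

[MeasureImage]
Measure=WebParser
URL=[MeasureParent]
StringIndex=3
Download=1
DownloadFile=Thumb.png

[MeterImage]
Meter=Image
; The measure returns the path to the downloaded file
MeasureName=MeasureImage
W=100
PreserveAspectRatio=1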

ErrorString

The value of the measure will be set to the string defined in this option if the RegExp results in a regular expression parsing error.

Note: While there might be cases where this option is desirable, the OnRegExpErrorAction option might be a more flexible and robust way to deal with regular expression errors.

LogSubstringErrors Default: 1

If set to "0", this will suppress logging of "Not enough substring" errors. This can be useful when for instance you are using lookahead assertions in a regular expression, and missing (captures) should not be treated as an "error".

Note: This option is set on the parent WebParser measure.

CodePage Default: 0

Specifies the code page of the downloaded URL=https:// web page or external file read with URL=file://.

Most web sites today are encoded with the Unicode UTF-8 standard. This is the default for WebParser, which handles such sites seamlessly. No CodePage option is needed.

However, some older web sites may be encoded with a language-specific character set. The encoding of a web page can generally be determined by viewing its raw HTML source and checking the "charset" value of the meta tag in the "head" section of the page (e.g. <meta charset="UTF-8">).

Some examples:

  • CodePage=1200 : Unicode UTF-16 LE (Little Endian)
  • CodePage=1251 : ANSI Cyrillic; Cyrillic (Windows)
  • CodePage=1252 : ANSI Latin 1; Western European (Windows)
  • CodePage=28605 : ISO 8859-15 Latin 9
  • CodePage=65001 : Unicode UTF-8

In addition, a local file parsed with URL=file:// may be encoded in something other than the ANSI encoding (really ASCII plus locale-specific "extended ASCII") used as the default in most Windows-based text editors. Most often this will be Unicode UTF-16 LE, in which case the CodePage=1200 option must be used to tell WebParser how to interpret the file.
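A sketch of reading such a file, assuming a hypothetical Data.txt saved as UTF-16 LE in the skin folder:

[MeasureLocalFile]
Measure=WebParser
URL=file://#CURRENTPATH#Data.txt
; Data.txt is assumed to be saved as Unicode UTF-16 LE
CodePage=1200
; Capture the contents of the file into StringIndex 1
RegExp=(?siU)^(.*)$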

Codepage definitions and more information can be found at Code Page Identifiers.

Additional general help with Unicode encoding in Rainmeter can be found at Unicode in Rainmeter.

ProxyServer Default: /auto

Proxy server to use with the measure. The following settings are valid:

  • /auto
    This will use the proxy settings contained in the options for Internet Explorer. (default)
  • /none
    This will make a direct connection, and will not use any proxy setting.
  • ServerName:Port
    This will connect to the proxy server hostname or IP address and port defined. Port is often optional with proxy servers.

The measure doesn't support any authentication, so only use proxy settings that do not require it.

This setting is applied when the measure is initialized on skin load / refresh, and cannot be changed dynamically.

Examples: ProxyServer=/none, ProxyServer=192.168.1.1:8080, ProxyServer=ProxyHostname.net

[WebParserParentMeasure]
ProxyServer=localhost:8080

This option can also be set in the Rainmeter.data file. If set there, it will be used as the global setting for all WebParser measures unless overridden in an individual measure(s). This must be set in a [WebParser] section in Rainmeter.data.

[WebParser]
ProxyServer=localhost:8080

UserAgent Default: Rainmeter WebParser plugin

Specifies a custom User Agent String to be sent when the parent WebParser measure connects to a remote resource using HTTP(S).

If you want to find out what the User Agent String is for the browser you use, you can connect to WhatIsMyBrowser in your browser and copy the string it returns. Other common User Agent String values can be found at UserAgentString.

This setting is applied when the measure is initialized on skin load / refresh, and cannot be changed dynamically.

[WebParserParentMeasure]
UserAgent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36 Edg/92.0.902.84

This option can also be set in the Rainmeter.data file. If set there, it will be used as the global setting for all WebParser measures unless overridden in an individual measure(s). This must be set in a [WebParser] section in Rainmeter.data.

[WebParser]
UserAgent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36 Edg/92.0.902.84

Header

Specifies one or more custom HTTP Header Fields to be sent when the parent WebParser measure connects to a remote resource using HTTP(S).

Example: Header=Cache-Control: no-cache

Flags Default: Resync

Specifies one or more flags to be used by a parent WebParser measure. These flags control some aspects of how WebParser connects to the remote resource via HTTP/HTTPS.


Multiple flags are set by separating them with the | pipe character.


Supported Flags are:

  • Resync
    Only downloads if the resource has been modified since the last time it was downloaded. Otherwise the cache is used. This is the default if no Flags are set.
  • ForceReload
    Forces a download of the requested resource from the origin server, not from the cache.
  • NoCookies
    Does not add cookie headers to requests, and does not add any cookies set by the resource to the cookie database. Does not "send" or "accept" cookies from any resource.
  • NoCacheWrite
    Does not add the returned resource data to the cache.
  • Hyperlink
    Forces a download if there was no Expires time and no LastModified time returned from the server when determining whether to reload the item from the network.
  • TempFile
    Causes a temporary file to be created if the file cannot be cached.
  • NoAuth
    Does not attempt URL-based authentication.
  • PragmaNoCache
    Forces the request to be resolved by the origin server, even if a cached copy exists on a proxy.
  • Secure
    Uses secure transaction semantics. This translates to using Secure Sockets Layer/Private Communications Technology (SSL/PCT) and is only meaningful in HTTP requests.
  • IgnoreCertName
    Disables checking of SSL/PCT-based certificates that are returned from the server against the host name given in the request. WinINet functions use a simple check against certificates by comparing for matching host names and simple wildcarding rules. Use with caution.
  • IgnoreCertDate
    Disables checking of SSL/PCT-based certificates for proper validity dates. Use with caution.
  • IgnoreHTTPRedirect
    Disables detection of this special type of redirect. When this flag is used, WinINet transparently allows redirects from HTTPS to HTTP URLs. Use with caution.
  • IgnoreHTTPSRedirect
    Disables the detection of this special type of redirect. When this flag is used, WinINet transparently allows redirects from HTTP to HTTPS URLs. Use with caution.

[WebParserParentMeasure]
Flags=ForceReload | NoCookies

ForceReload Default: 0

WebParser reads the resource only if it has been modified since last read. This can be overridden with ForceReload=1.

Note: This option has been deprecated in favor of the Flags option, and should not be used in new skins.

Action Options

These options are only valid on parent measures that connect to a site or file with URL, and / or have a RegExp or Download option. They are not valid on child measures.

FinishAction

Bangs or other actions that are executed when the resource has been downloaded and the regular expression (PCRE) parsing is done.

Since WebParser is a "threaded" measure, Rainmeter does not wait for it to return information from the resource. FinishAction can be used to make sure that parts of the skin that depend on the measure's values don't generate errors or undesirable displays while WebParser is still retrieving and parsing the information, and that they are updated as soon as WebParser is done.

Note: FinishAction is executed once the resource has been connected to and the regular expression parsing is complete, whether the RegExp parsing succeeds or fails. There are two exceptions: it is not executed if the resource cannot be connected to, and it is not executed if the parsing fails AND an OnRegExpErrorAction option is set on the measure.
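A minimal sketch, assuming the parent measure from the Usage example and meters that display its child measures. FinishAction updates all meters and redraws the skin as soon as the new data is available, rather than waiting for the next skin update:

[MeasureParent]
Measure=WebParser
URL=https://SomeSite.com
RegExp=(?siU)<Item>(.*)</Item>.*<Item>(.*)</Item>
; Refresh the display immediately once parsing is done
FinishAction=[!UpdateMeter *][!Redraw]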

OnConnectErrorAction

Bangs or other actions that are executed if WebParser is unable to connect to the hostname or file resource in URL.

This might be due to network unavailability on the local computer, or network or server problems at the remote resource. In the case of the file:// protocol, this would be triggered by not finding the defined local file.

Note: It will take some time, perhaps as long as 10-20 seconds, for WebParser to "time out" and execute this action if the resource cannot be connected to over the network.

OnRegExpErrorAction

Bangs or other actions that are executed if the resource is connected to, but the regular expression defined in RegExp is unable to successfully parse the information. It will not be executed if the resource cannot be connected to.

This might be due to an incorrectly defined RegExp option, a change to a web site that causes the parsing to fail, or a remote server problem where the site is accessible, but the specific page defined on the URL can't be found or causes a redirect to an HTTP error condition.

Note: If this option is defined and the parsing fails, FinishAction will not be executed.
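A sketch of reporting these errors, assuming a hypothetical String meter named MeterStatus. Each action only sets the meter's text and redraws the skin; neither one updates the measure again, so a failure does not trigger repeated connections:

[MeasureParent]
Measure=WebParser
URL=https://SomeSite.com
RegExp=(?siU)<Item>(.*)</Item>
OnConnectErrorAction=[!SetOption MeterStatus Text "Unable to connect"][!UpdateMeter MeterStatus][!Redraw]
OnRegExpErrorAction=[!SetOption MeterStatus Text "Unable to parse page"][!UpdateMeter MeterStatus][!Redraw]

[MeterStatus]
Meter=String
FontColor=255,255,255,255
AntiAlias=1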

OnDownloadErrorAction

Bangs or other actions that are executed if a measure has a Download option set to 1, and the download of the resource fails.

This may be due to a missing file on the remote host, or the inability of Windows to save the file locally in either the Windows TEMP location or the location specified in a DownloadFile option.

The action will not be executed if the connection to or parsing of the resource in the parent URL option fails. It is executed if all else succeeds, but the specific download process fails.

Note: When Download is used on a child measure, the download itself is treated as a parent function, and any FinishAction on the measure will be executed if the download succeeds, and any OnDownloadErrorAction on the measure will be executed if the download fails.
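A sketch of a downloading child measure using both actions, assuming StringIndex 2 of a hypothetical parent measure holds an image URL and a hypothetical Image meter named MeterThumb displays it. The meter is shown when the download succeeds and hidden when it fails:

[MeasureThumb]
Measure=WebParser
URL=[MeasureParent]
StringIndex=2
Download=1
; On a downloading child, FinishAction runs when the download succeeds
FinishAction=[!ShowMeter MeterThumb][!UpdateMeter MeterThumb][!Redraw]
OnDownloadErrorAction=[!HideMeter MeterThumb][!Redraw]

[MeterThumb]
Meter=Image
MeasureName=MeasureThumb
W=100
PreserveAspectRatio=1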


Note: If some condition causes the error actions to be executed, care should be taken not to create an endless loop or cause the skin to repeatedly try to access a resource in a short period of time. For instance, it would not be wise to have these actions automatically update the measure they are on, or refresh the skin.

Measure Commands

Commands that can be sent to a parent WebParser measure using the !CommandMeasure bang.

These are only valid with a target parent measure that connects to a site or file with URL, and / or has a RegExp option. They are not valid with a child measure as the target.

Update

This will cause a WebParser parent measure to override any current UpdateRate setting, and immediately access and parse the resource defined in URL.

Example: [!CommandMeasure MeasureName "Update"]


More information on UpdateRate is at WebParser: How UpdateRate Works.

Reset

This will cause a WebParser parent measure to reset all values for the parent and any related child measures to their initial empty values.

If a WebParser measure is able to connect and parse information from a web site or file, then that information is "remembered", and is only replaced when new information is successfully received on subsequent connections to the resource. This is both to allow a seamless transition from the old data to the new, and to allow a skin to continue displaying information if it is temporarily unable to connect or parse the resource.

Generally, the above behavior works best. However, this command might be used in conjunction with the OnConnectErrorAction and / or OnRegExpErrorAction actions if the skin design makes it desirable that WebParser "forget" old information when some error condition is triggered on subsequent connections to the resource.

Example: [!CommandMeasure MeasureName "Reset"]
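A sketch of pairing Reset with the error actions so that stale data is cleared rather than displayed when a later connection or parse fails. MeasureParent is assumed to be a parent measure like the one in the Usage example; Reset only clears values and is not an Update command, so it does not make the measure reconnect:

[MeasureParent]
Measure=WebParser
URL=https://SomeSite.com
RegExp=(?siU)<Item>(.*)</Item>.*<Item>(.*)</Item>
; Forget old values and refresh the display on either error
OnConnectErrorAction=[!CommandMeasure MeasureParent "Reset"][!UpdateMeter *][!Redraw]
OnRegExpErrorAction=[!CommandMeasure MeasureParent "Reset"][!UpdateMeter *][!Redraw]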

WebParser and Dynamic Variables

Dynamic variables can be used with the WebParser measure. There are some things specific to WebParser that should be kept in mind when doing things in a dynamic way in WebParser measures:

WebParser uses UpdateRate to determine how often the measure should actually access the site or file. While you can dynamically change any option on a WebParser measure, the measure will not use the changes and access the site again until the next UpdateRate is reached. Just using !Update or !UpdateMeasure will NOT override the UpdateRate.

In order to have a dynamic change make WebParser parse the site "right now", use the !CommandMeasure bang with the "parent" WebParser measure as the first parameter and "Update" as the second.

LeftMouseUpAction=[!SetOption WebMeasure URL "https://SomeNewSite.com"][!CommandMeasure WebMeasure Update]

If you dynamically change an option on a "child" WebParser measure that depends on a "parent" measure, (like StringIndex for instance) you MUST use !CommandMeasure with "Update", targeting the "parent" WebParser measure. The values of child WebParser measures are a function of the parent measure, and are only updated when the parent is. You should never use !CommandMeasure on a "child" measure.
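A sketch of switching which captured item a child measure returns, using the MeasureParent and MeasureChild1 sections from the Usage example and a hypothetical meter named MeterItem. The child's StringIndex is changed with !SetOption, and the Update command is then sent to the parent, not the child:

[MeterItem]
Meter=String
MeasureName=MeasureChild1
FontColor=255,255,255,255
AntiAlias=1
; Show the second captured item instead of the first
LeftMouseUpAction=[!SetOption MeasureChild1 StringIndex 2][!CommandMeasure MeasureParent Update]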

If you want to use the current value of a measure in a dynamic way as a Section Variable, rather than as a reference to a "parent" WebParser measure, you must prefix the name of the measure with the & character.

URL=https://SomeSite.com/[&WebMeasure]

Examples

Retrieve the site title, first item and link from Slashdot's RSS feed.

[Rainmeter]
Update=1000
DynamicWindowSize=1
AccurateText=1
BackgroundMode=2
SolidColor=0,0,0,255

[MeasureRSSParent]
Measure=WebParser
URL=https://rss.slashdot.org/Slashdot/slashdotMain
RegExp=(?siU)<title>(.*)</title>.*<item>.*<title>(.*)</title>.*<link>(.*)</link>

[MeasureRSSTitle]
Measure=WebParser
URL=[MeasureRSSParent]
StringIndex=1

[MeasureRSSItemTitle]
Measure=WebParser
URL=[MeasureRSSParent]
StringIndex=2

[MeasureRSSItemLink]
Measure=WebParser
URL=[MeasureRSSParent]
StringIndex=3

[MeterRSSTitle]
Meter=String
MeasureName=MeasureRSSTitle
FontSize=14
FontColor=222,255,227,255
StringStyle=Bold
AntiAlias=1

[MeterRSSItemTitle]
Meter=String
MeasureName=MeasureRSSItemTitle
Y=2R
FontSize=11
FontColor=255,255,255,255
StringStyle=Bold
AntiAlias=1
LeftMouseUpAction=["[MeasureRSSItemLink]"]
DynamicVariables=1

Retrieve, download and display the latest four submissions from the Rainmeter area on deviantART.

[Rainmeter]
Update=1000
DynamicWindowSize=1
AccurateText=1

[Variables]
Item=.*<item>.*<title>(.*)</title>.*<link>(.*)</link>.*<pubDate>(.*)</pubDate>.*role="author".*>(.*)<.*<media:thumbnail url="(.*)"

[MeasureDA]
Measure=WebParser
URL=https://backend.deviantart.com/rss.xml?q=in%3Acustomization%2Fskins%2Fsysmonitor%2Frainmeter+sort%3Atime&type=deviation
RegExp=(?siU)#Item##Item##Item##Item#
UpdateRate=300

[MeasureTitle1]
Measure=WebParser
URL=[MeasureDA]
StringIndex=1

[MeasureLink1]
Measure=WebParser
URL=[MeasureDA]
StringIndex=2

[MeasurePubDate1]
Measure=WebParser
URL=[MeasureDA]
StringIndex=3

[MeasureAuthor1]
Measure=WebParser
URL=[MeasureDA]
StringIndex=4

[MeasureImage1]
Measure=WebParser
URL=[MeasureDA]
Download=1
StringIndex=5

[MeasureTitle2]
Measure=WebParser
URL=[MeasureDA]
StringIndex=6

[MeasureLink2]
Measure=WebParser
URL=[MeasureDA]
StringIndex=7

[MeasurePubDate2]
Measure=WebParser
URL=[MeasureDA]
StringIndex=8

[MeasureAuthor2]
Measure=WebParser
URL=[MeasureDA]
StringIndex=9

[MeasureImage2]
Measure=WebParser
URL=[MeasureDA]
Download=1
StringIndex=10

[MeasureTitle3]
Measure=WebParser
URL=[MeasureDA]
StringIndex=11

[MeasureLink3]
Measure=WebParser
URL=[MeasureDA]
StringIndex=12

[MeasurePubDate3]
Measure=WebParser
URL=[MeasureDA]
StringIndex=13

[MeasureAuthor3]
Measure=WebParser
URL=[MeasureDA]
StringIndex=14

[MeasureImage3]
Measure=WebParser
URL=[MeasureDA]
Download=1
StringIndex=15

[MeasureTitle4]
Measure=WebParser
URL=[MeasureDA]
StringIndex=16

[MeasureLink4]
Measure=WebParser
URL=[MeasureDA]
StringIndex=17

[MeasurePubDate4]
Measure=WebParser
URL=[MeasureDA]
StringIndex=18

[MeasureAuthor4]
Measure=WebParser
URL=[MeasureDA]
StringIndex=19

[MeasureImage4]
Measure=WebParser
URL=[MeasureDA]
Download=1
StringIndex=20

[MeterBackground]
Meter=Image
W=106
H=306
SolidColor=10,20,30,255
LeftMouseUpAction=["https://www.deviantart.com/rainmeter/gallery/45661692/system-monitoring"]

[MeterImage1]
Meter=Image
MeasureName=MeasureImage1
X=6
Y=17
W=95
H=50
PreserveAspectRatio=2
AntiAlias=1
ToolTipTitle=[MeasureTitle1]
ToolTipType=1
ToolTipText=[MeasureAuthor1]#CRLF#[MeasurePubDate1]
LeftMouseUpAction=["[MeasureLink1]"]
DynamicVariables=1

[MeterImage2]
Meter=Image
MeasureName=MeasureImage2
X=6
Y=91
W=95
H=50
PreserveAspectRatio=2
AntiAlias=1
ToolTipTitle=[MeasureTitle2]
ToolTipType=1
ToolTipText=[MeasureAuthor2]#CRLF#[MeasurePubDate2]
LeftMouseUpAction=["[MeasureLink2]"]
DynamicVariables=1

[MeterImage3]
Meter=Image
MeasureName=MeasureImage3
X=6
Y=165
W=95
H=50
PreserveAspectRatio=2
AntiAlias=1
ToolTipTitle=[MeasureTitle3]
ToolTipType=1
ToolTipText=[MeasureAuthor3]#CRLF#[MeasurePubDate3]
LeftMouseUpAction=["[MeasureLink3]"]
DynamicVariables=1

[MeterImage4]
Meter=Image
MeasureName=MeasureImage4
X=6
Y=239
W=95
H=50
PreserveAspectRatio=2
AntiAlias=1
ToolTipTitle=[MeasureTitle4]
ToolTipType=1
ToolTipText=[MeasureAuthor4]#CRLF#[MeasurePubDate4]
LeftMouseUpAction=["[MeasureLink4]"]
DynamicVariables=1