WebParser plugin

Plugin=WebParser reads and parses information from web pages.

The plugin uses Perl Compatible Regular Expressions to extract information from any web page or local file.

Usage

WebParser measures take the form:

[MeasureParent]
Measure=Plugin
Plugin=WebParser
URL=http://SomeSite.com
RegExp="(?siU)<Item>(.*)</Item>.*<Item>(.*)</Item>"

This example creates two StringIndex values in what is referred to as the "parent" WebParser measure. The information is generally used in subsequent "child" WebParser measures:

[MeasureChild1]
Measure=Plugin
Plugin=WebParser
URL=[MeasureParent]
StringIndex=1

[MeasureChild2]
Measure=Plugin
Plugin=WebParser
URL=[MeasureParent]
StringIndex=2

The values of the two child measures are now the information parsed into StringIndexes 1 and 2 by the parent measure. These can then be used with MeasureName and other options in meters.

Note: More information and examples for WebParser can be found in the WebParser Tutorial and the RSS/Atom Feed Tutorial.

Options

General measure options

All general measure options are valid.

URL

URL to the site or file to be downloaded and parsed. If the name of another WebParser measure is used, e.g. URL=[SomeMeasure], then the value of the parent measure is used, generally by referring to a specific StringIndex number.

WebParser cannot use cookies or other session-based authentication, so it cannot retrieve information from web sites that require a login. However, WebParser can be used on sites that support HTTP authentication, e.g. http://myname:mypassword@somesite.com.

WebParser can read and parse local files on your computer by using the file:// URI scheme. E.g. URL=file://#CURRENTPATH#SomeFile.txt.
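
As a minimal sketch, a measure that parses a local file might look like the following. The measure name, the file name and the RegExp are placeholders chosen for illustration, not values required by the plugin:

```ini
[MeasureLocalFile]
Measure=Plugin
Plugin=WebParser
; SomeFile.txt is a hypothetical file in the same folder as the skin file
URL=file://#CURRENTPATH#SomeFile.txt
; Capture the text of the first <Item> element found in the file
RegExp="(?siU)<Item>(.*)</Item>"
StringIndex=1
```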

If you want to use the current value of a measure in a dynamic way as a Section Variable, rather than as a reference to a "parent" WebParser measure, you must prefix the name of the measure with the & character.

URL=http://SomeSite.com\[&WebMeasure]

RegExp

The Perl compatible regular expression used in parsing.

FinishAction

A bang or action that is executed when the page has been downloaded and the parsing is done. This option is only valid on measures that connect to a site or file with URL, and not on child measures.
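
For illustration, a parent measure could write a line to the Rainmeter log each time it finishes parsing. This is only a sketch: the URL, the RegExp and the log text are assumptions, and any other bang or action could be used in place of !Log.

```ini
[MeasureParent]
Measure=Plugin
Plugin=WebParser
URL=http://SomeSite.com
RegExp="(?siU)<Item>(.*)</Item>"
; Executed once the download and the parsing are both complete
FinishAction=[!Log "MeasureParent finished parsing"]
```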

StringIndex

Defines which captured string from the RegExp this measure returns. This option is generally used in a child measure to determine which of the captured values in a parent measure to use.

StringIndex2

The second string index is used when a RegExp is defined in a measure that uses data from another WebParser measure (i.e. the URL points to a parent measure). In this case, StringIndex defines the index of the result of the parent measure's RegExp, and StringIndex2 defines the index of this measure's RegExp (i.e. it defines the string that the measure returns).

More information on using StringIndex2 can be found here.

Note: If the RegExp is not defined in this measure, the StringIndex2 has no effect.
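
A sketch of how StringIndex and StringIndex2 might be combined is shown here. The <Name> and <Value> tags and the measure names are hypothetical and assume that each <Item> captured by the parent contains both elements:

```ini
[MeasureParent]
Measure=Plugin
Plugin=WebParser
URL=http://SomeSite.com
RegExp="(?siU)<Item>(.*)</Item>.*<Item>(.*)</Item>"

[MeasureChild]
Measure=Plugin
Plugin=WebParser
URL=[MeasureParent]
; Start from the first string captured by the parent's RegExp
StringIndex=1
; Parse that string again with this measure's own RegExp
RegExp="(?siU)<Name>(.*)</Name>.*<Value>(.*)</Value>"
; Return the second capture of this measure's RegExp, the <Value> text
StringIndex2=2
```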

UpdateRate Default: 600

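The number of measure updates that must pass between downloads of the webpage. This is relative to the config's main Update rate (in milliseconds) and any UpdateDivider on the measure, so the formula is Update × UpdateDivider × UpdateRate = "how often the measure connects to the site", in milliseconds. With the defaults of Update=1000, UpdateDivider=1 and UpdateRate=600, that is 600,000 milliseconds, or once every 10 minutes.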
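
As a worked example of the formula, the following sketch connects to the site once every 30 minutes. The option values are chosen only for illustration:

```ini
[Rainmeter]
Update=1000

[MeasureParent]
Measure=Plugin
Plugin=WebParser
URL=http://SomeSite.com
RegExp="(?siU)<Item>(.*)</Item>"
UpdateDivider=1
UpdateRate=1800
; Update x UpdateDivider x UpdateRate
; 1000 x 1 x 1800 = 1,800,000 milliseconds = 30 minutes between downloads
```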

Notes: Use caution when deciding how often to connect to a site with WebParser. Accessing a site too frequently can be seen as an "attack" and result in your computer being blocked. The UpdateRate option defaults to 600 as a safety measure, and should not be changed unless there is a specific reason to connect more or less often to the site.

In order to override the UpdateRate set on a WebParser measure, to have it connect to the site and download the data "right now", the !CommandMeasure bang must be used, with the name of the "parent" measure as the first parameter, and "Update" as the second.

LeftMouseUpAction=[!SetOption WebMeasure URL "http://SomeNewSite.com"][!CommandMeasure WebMeasure Update]

DecodeCharacterReference Default: 0

Automatically decodes HTML Character References. This will eliminate the need to use a Substitute statement to translate character references like &quot;, &amp;, &lt;, and &gt; to the actual character. Valid values are:

  • 0: Does nothing (default).
  • 1: Decodes both numeric character references and character entity references.
  • 2: Decodes only numeric character references.
  • 3: Decodes only character entity references.

Debug Default: 0

Logs DEBUG messages to the Rainmeter log or to a file. Valid values are:

  • 0: Does not log DEBUG messages from WebParser.
  • 1: Logs DEBUG messages to the log. Rainmeter must also be in Debug mode.
  • 2: Saves the downloaded webpage to WebParserDump.txt in the current skin folder. This can be useful since some web servers send different information depending which client requests it. Remember to remove this from your config once you have it working correctly.

Hint: To determine which StringIndex values to use in a child measure, set Debug=1 on the measure that has the RegExp option. The matched strings and their StringIndex numbers will then be displayed in the Rainmeter log.

Debug2File

If the Debug option is set to 2, this option can be set to the path and name of the file to use for the downloaded webpage instead of WebParserDump.txt in the current skin folder.

Note: The folder for the file must already exist.
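
A minimal sketch of the two options used together follows. The Dumps folder and the file name are hypothetical, and the folder is assumed to already exist in the current skin folder:

```ini
[MeasureParent]
Measure=Plugin
Plugin=WebParser
URL=http://SomeSite.com
RegExp="(?siU)<Item>(.*)</Item>"
; Save the raw downloaded page for inspection
Debug=2
; Hypothetical target; the Dumps folder must be created beforehand
Debug2File=#CURRENTPATH#Dumps\SomeSite.txt
```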

Download Default: 0

If Download=1, the URL is downloaded to the Windows TEMP folder and the path and name of the file are returned as the string value. The measure can then be used with MeasureName on an Image meter to download images from a site and display them.

DownloadFile

If the Download option is set to 1, this option defines a relative path and file name where the downloaded file will be saved instead of in Windows TEMP.

A folder DownloadFile will be created in the current folder, and the defined relative path and file name will be created under that. It is not possible to specify an absolute path.

Note: This file is not a temporary file so it is not deleted after unloading a skin or exiting Rainmeter.
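
The following sketch shows one way the two options might be combined to download an image and display it. The image URL, the relative path and the section names are assumptions made for illustration:

```ini
[MeasureImage]
Measure=Plugin
Plugin=WebParser
; Hypothetical direct link to an image file
URL=http://SomeSite.com/SomeImage.png
Download=1
; Saved as DownloadFile\Images\Latest.png under the current folder
DownloadFile=Images\Latest.png

[MeterImage]
Meter=Image
; The string value of the measure is the path of the downloaded file
MeasureName=MeasureImage
```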

ErrorString

The value of the measure will be set to the string defined in this option if the RegExp results in a regular expression parsing error.

ForceReload Default: 0

WebParser reads the resource only if it has been modified since it was last read. This can be overridden with ForceReload=1.

ProxyServer Default: /auto

Proxy server to use with the plugin. The following settings are valid:

  • /auto
    This will use the proxy settings contained in the options for Internet Explorer. (default)
  • /none
    This will make a direct connection, and will not use any proxy setting.
  • ServerName:Port
    This will connect to the proxy server hostname or IP address and port defined. Port is often optional with proxy servers.

This option can also be set in the Rainmeter.data file. If set there, it will be used as the global setting for all WebParser measures unless overridden in individual measures.

Note: The plugin doesn't support any authentication, so only use proxy settings that do not require it.

Examples: ProxyServer=/none, ProxyServer=192.168.1.1:8080, ProxyServer=ProxyHostname.net

CodePage Default: 0

Defines the code page of the downloaded URL=http:// web page or external file read with URL=file://.

Most web sites today are encoded with the Unicode UTF-8 standard. This is the default for WebParser, and it will seamlessly handle the site. No CodePage option is needed.

However, some older web sites are encoded with a character set specific to a language or region. On a web site, the encoding used can generally be determined by viewing the raw HTML source and checking the "charset" meta value in the "head" section of the page (e.g. meta charset="UTF-8").

Some Examples are:

  • CodePage=1200 : Unicode UTF-16 LE (Little Endian)
  • CodePage=1251 : ANSI Cyrillic; Cyrillic (Windows)
  • CodePage=1252 : ANSI Latin 1; Western European (Windows)
  • CodePage=28605 : ISO 8859-15 Latin 9
  • CodePage=65001 : Unicode UTF-8

In addition, there are times when an external local file to be parsed with URL=file:// will be encoded in something other than the ANSI encoding (really ASCII plus "extended ASCII" specific to the locale of the computer) used as the default in most Windows-based text editors. Primarily this will be Unicode UTF-16 LE. In this case, the CodePage=1200 option must be used to tell WebParser how to interpret the resource being read.
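
For example, a local file saved as Unicode UTF-16 LE might be read with a measure like this sketch. The measure name, the file name and the RegExp are placeholders:

```ini
[MeasureLocalUnicode]
Measure=Plugin
Plugin=WebParser
; Hypothetical file saved as UTF-16 LE in the same folder as the skin file
URL=file://#CURRENTPATH#SomeFile.txt
; Tell WebParser to interpret the file as Unicode UTF-16 LE
CodePage=1200
RegExp="(?siU)<Item>(.*)</Item>"
StringIndex=1
```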

Codepage definitions and more information can be found at Windows code pages.

Additional general help with Unicode encoding in Rainmeter can be found at Character Encoding in Rainmeter.

WebParser and Dynamic Variables

Dynamic variables can be used with the WebParser plugin. There are some things specific to WebParser that should be kept in mind when doing things in a dynamic way in WebParser measures:

WebParser uses UpdateRate to determine how often the plugin should actually access the site or file. While you can dynamically change any option on a WebParser measure, the plugin will not use the changes and access the site again until the next UpdateRate is reached. Just using !Update or !UpdateMeasure will NOT override the UpdateRate.

In order to have a dynamic change make WebParser parse the site "right now", you will use the !CommandMeasure bang with the "parent" WebParser measure as the first parameter, and "Update" as the second.

LeftMouseUpAction=[!SetOption WebMeasure URL "http://SomeNewSite.com"][!CommandMeasure WebMeasure Update]

If you dynamically change an option (StringIndex, for instance) on a "child" WebParser measure that depends on a "parent" measure, you MUST use !CommandMeasure with "Update", targeting the "parent" WebParser measure. The values of child WebParser measures are a function of the parent measure, and are only updated when the parent is. You should never use !CommandMeasure on a "child" measure.
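
As a sketch of this pattern, a click action could change the StringIndex of a child measure and then refresh it through its parent. The meter name is hypothetical, and MeasureChild1 and MeasureParent are assumed to be defined as in the Usage section:

```ini
[MeterChild1]
Meter=String
MeasureName=MeasureChild1
; Point the child at the second captured value, then update the parent
; so that the child picks up the change right away
LeftMouseUpAction=[!SetOption MeasureChild1 StringIndex 2][!CommandMeasure MeasureParent Update]
```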

If you want to use the current value of a measure in a dynamic way as a Section Variable, rather than as a reference to a "parent" WebParser measure, you must prefix the name of the measure with the & character.

URL=http://SomeSite.com\[&WebMeasure]

Examples

Retrieve the site title, first item and link from Slashdot's RSS feed.

[Rainmeter]
Update=1000
DynamicWindowSize=1
BackgroundMode=2
SolidColor=0,0,0,255

[MeasureRSSParent]
Measure=Plugin
Plugin=WebParser
URL=http://slashdot.org/slashdot.rdf
RegExp="(?siU)<title>(.*)</title>.*<item>.*<title>(.*)</title>.*<link>(.*)</link>"

[MeasureRSSTitle]
Measure=Plugin
Plugin=WebParser
URL=[MeasureRSSParent]
StringIndex=1

[MeasureRSSItemTitle]
Measure=Plugin
Plugin=WebParser
URL=[MeasureRSSParent]
StringIndex=2

[MeasureRSSItemLink]
Measure=Plugin
Plugin=WebParser
URL=[MeasureRSSParent]
StringIndex=3

[MeterRSSTitle]
Meter=String
MeasureName=MeasureRSSTitle
FontSize=14
FontColor=222,255,227,255
StringStyle=Bold
AntiAlias=1

[MeterRSSItemTitle]
Meter=String
MeasureName=MeasureRSSItemTitle
Y=2R
FontSize=11
FontColor=255,255,255,255
StringStyle=Bold
AntiAlias=1
LeftMouseUpAction=["[MeasureRSSItemLink]"]
DynamicVariables=1

Retrieve the title, download and display an image for the first item in the Customize.org RSS feed.

[Rainmeter]
Update=1000
DynamicWindowSize=1

[MeasureCustoParent]
Measure=Plugin
Plugin=WebParser
URL=http://customize.org/feeds/submissions
RegExp="(?siU).*<item>.*<title>(.*)</title>.*<description>.*src="(.*)".*</description>.*<link>(.*)</link>"

[MeasureTitle]
Measure=Plugin
Plugin=WebParser
URL=[MeasureCustoParent]
StringIndex=1

[MeasureImage]
Measure=Plugin
Plugin=WebParser
URL=[MeasureCustoParent]
StringIndex=2
Download=1

[MeasureLink]
Measure=Plugin
Plugin=WebParser
URL=[MeasureCustoParent]
StringIndex=3

[MeterTitle]
Meter=String
MeasureName=MeasureTitle
FontSize=12
FontColor=255,255,255,255
SolidColor=0,0,0,255
AntiAlias=1

[MeterImage]
Meter=Image
MeasureName=MeasureImage
Y=2R
W=80
H=60
PreserveAspectRatio=2
LeftMouseUpAction=["[MeasureLink]"]
DynamicVariables=1