WebParser: Debugging RegExp


One of the keys to using WebParser in Rainmeter is the all-important RegExp option.

It is not the intention of this guide to teach WebParser or regular expressions. There is a lot of help to get you started at:

What we are going to focus on is the methods you can use to debug your RegExp (regular expression) option in Rainmeter, to build, test and find problems when parsing a web site or local file.

Overview

Just as a quick overview, WebParser is used with a parent measure to connect to a resource, a web site or local file, and parse the text that is returned into StringIndex values that can be used in child measures to extract the individual elements of information you want. You can then display the information in String, Image or other meters.

At its simplest, you use WebParser like this:

[Rainmeter]
Update=1000
AccurateText=1
DynamicWindowSize=1

[MeasureSite]
Measure=WebParser
URL=https://www.tell-my-ip.com/index.html
RegExp=(?siU)<td.*>Your IP Address.*<td>(.*)</td>

[MeasureIP]
Measure=WebParser
URL=[MeasureSite]
StringIndex=1

[MeterIP]
Meter=String
MeasureName=MeasureIP
FontSize=12
FontColor=255,255,255,255
AntiAlias=1

Where you provide a URL to connect to, and a RegExp to parse the text, finding and capturing the information you want with (.*) into StringIndex numbers. You then use the information in those StringIndexes to display the data.

Generally, you are going to be capturing multiple bits of information from the site, so you can use multiple child measures and meters to create your skin.

[MeasureSite]
Measure=WebParser
URL=http://www.tell-my-ip.com/index.html
RegExp=(?siU)<td.*>Your IP Address.*<td>(.*)</td>.*<td.*>Country:.*<img src="(.*)"> (.*)</td>.*<td>Region.*<td>(.*)</td>.*<td>City.*<td>(.*)</td>.*<td>ISP:.*<td>(.*)</td>.*<td>Latitude:.*<td>(.*)</td>.*<td>Longitude:.*<td>(.*)</td>

In the above, we have eight captures (.*), creating eight StringIndex numbers. So we would create eight child WebParser measures to pick off each one, and we can then display the data any way we like in meters. Again, refer to WebParser Tutorial for a full explanation of how WebParser works.

Starting our RegExp

To create that RegExp option, what you will generally want to do is get a copy of the text / HTML that the website is returning to WebParser, and use that as a reference to iteratively build your regular expression. In our example above, this is part of the text that the site returns, which we use to create our RegExp.

<thead>
<tr><th colspan="2" style="background-color: #ccc"><b>IP Address Location</b></th></tr>
</thead>
<tbody>
<tr><td width="100">Your IP Address:</td><td>72.205.25.79</td></tr>
<tr><td>Country:</td><td> <img src="flags/us.png"> United States</td></tr>
<tr><td>Region:</td><td>Virginia</td></tr>
<tr><td>City:</td><td>Great Falls</td></tr>
<tr><td>ISP:</td><td>Cox Communications</td></tr>
<tr><td>Latitude:</td><td>38.9981689453</td></tr>
<tr><td>Longitude:</td><td>-77.2883224487</td></tr>
<tr><td>Host:</td><td>ip72-205-25-79.dc.dc.cox.net</td></tr>
<tr><td>Anonymity Level:</td><td>Elite (or no proxy)</td></tr>
<tr><td>Map:</td><td><a href='detailed.html'>View your IP on satellite map (Big)</a></td></tr>
</tbody>

We then start building our RegExp, using <td.*>Your IP Address.*<td>(.*)</td> to search for <td.*>Your IP Address.*<td>, then capture that IP address into a StringIndex with (.*) until we hit </td>.

So we would start with:

[MeasureSite]
Measure=WebParser
URL=http://www.tell-my-ip.com/index.html
RegExp=(?siU)<td.*>Your IP Address.*<td>(.*)</td>

[MeasureIP]
Measure=WebParser
URL=[MeasureSite]
StringIndex=1

And add on to the RegExp, saving and refreshing the skin as we go to ensure it is working as we want.

[MeasureSite]
Measure=WebParser
URL=http://www.tell-my-ip.com/index.html
RegExp=(?siU)<td.*>Your IP Address.*<td>(.*)</td>.*<td.*>Country:.*<img src="(.*)"> (.*)</td>.*<td>Region.*<td>(.*)</td>.*<td>City.*<td>(.*)</td>.*<td>ISP:.*<td>(.*)</td>.*<td>Latitude:.*<td>(.*)</td>.*<td>Longitude:.*<td>(.*)</td>

Use About / Skins to check that our measures are returning the correct values.

Getting the HTML / text

So what is the best way to get a copy of the text the site is returning to WebParser, so we can use that to create our RegExp? There are a few options...

View page source

If you connect to the site in your web browser, (Internet Explorer, Google Chrome, Firefox, whatever you use) you can right click and select "View page source" from the context menu. That will open a tab with the HTML code that the site is sending to the browser displayed as text. You can copy and paste that into a local text file on your PC, to use while you are working on your skin.

However... There is an issue that can cause this approach to give you some problems. Many websites will tailor the code that is output based on detecting the browser you are using. This is generally to ensure compatibility in many different browsers, not all of which support the standards for HTML / CSS / Javascript in quite the same way. The upshot of this is that what is returned to your Google Chrome browser may be different than what is returned to the WebParser measure. This can cause a lot of confusion and frustration. We recommend that you don't use "View page source" in your browser. There are more reliable methods...

Debug=2

If you set Debug=2 on the parent WebParser measure, (the one with the RegExp on it) and load the skin, it will create a text file WebParserDump.txt in the same folder as your skin .ini file. WebParserDump.txt is simply an exact copy of what the site sent to WebParser, and is an excellent resource when building your RegExp option.

[MeasureSite]
Measure=WebParser
URL=http://www.tell-my-ip.com/index.html
RegExp=
Debug=2

As you can see, we don't even have a RegExp option yet, just a WebParser measure that connects to the site and downloads the HTML to our WebParserDump.txt text file.

Open that file in your favorite text editor, and it can be used as a reference as you continue.

Note: When you are done building and testing your RegExp option, be sure to remove the Debug=2 line, so it isn't constantly downloading and writing that file as you actually use your skin.

This is a good approach. It ensures that you have exactly the same output from the site that WebParer gets, and if you "refresh" the skin from time to time you can be sure that it is also the most current version of the information from the site. However there is one more method that can be even more useful...

RainRegExp

RainRexExp is an add-on utility you can use with WebParser / Rainmeter to create, test and debug your RegExp regular expression options. The way the tool works is to actually connect to the site you are using in your URL option, download the text exactly as WebParser would see it, (it basically tells the site it IS WebParser) and give you a way to build and test your RegExp while seeing exactly what is being set in the StringIndex values.

Get the tool here: RainRegExp.zip. Simply unzip the files into any folder you desire, and run the RainRegExp.exe executable.

(1): Type or paste in the URL to connect to and press "Connect".

(2): The HTML / text returned by the site will be in this panel. You can scroll through it or use the "Search" capability to find the text you want to search for.

(3): Create your RegExp statement here. Click on "Parse" to test as you go. Once you are happy that your expression is working correctly, click on "Copy" to copy the entire RegExp to the Windows clipboard, so you can paste it in your skin.

(4): The output of the "Parse" test will be here. It will consist of the StringIndex number that will be assigned in WebParser and the current value based on the text from the site.

Hope this helps with creating and debugging those pesky RegExp statements.