RSS Feed Tutorial
One of the most common uses of the WebParser plugin in Rainmeter is to create a skin to parse and display information from an RSS or Atom feed.
This guide is not intended as a general purpose tutorial for the WebParser plugin, nor does it attempt to explore all that regular expressions can do. A great deal of additional help is available at Regular Expression Options.
RSS and Atom feeds follow certain patterns that make them fairly easy to parse. While there are differences between the RSS and Atom standards, which we will touch on a bit later, let's use a simple RSS output to outline how they are laid out.
The pattern is a header section which contains "tags" for the title, description, link and lastBuildDate of the overall feed. This is followed by a series of one or more item entries, each of which has its own set of title/description/link/pubDate tags. It should be noted that there can be MANY other tags in a feed. However, in an RSS feed, the ones we are going to look at are the ones you can expect to find in almost every feed, and they will have the information we want for a simple RSS skin.
Sample RSS feed
What we want to do is take some representative example of an RSS feed, and get that header title, link and last build date information, then a few "items", with their title, link and publication dates. Then we will build a simple skin to display our results.
For our example of an RSS feed, let's use the feed from the site http://www.bbc.co.uk/news/. The RSS feed is found at the URL http://feeds.bbci.co.uk/news/rss.xml. If we go to that URL in our web browser, we get XML output that looks like this:
Since a feed is meant to be read by a computer program that will parse the data from the XML output, there is often no attempt made to have the text be particularly formatted or easy to read. Sometimes a feed in your browser WILL have some HTML formatting applied to make it easier to view. If it does, you should right-click and use "view page source" (or some variant of that depending on your browser), so you are looking at the raw output and not formatted data.
If we look through this output, we can spot the patterns of XML "tags" that we described above.
We have the title of the feed in
<title>BBC News - Home</title>
a link to the feed in
and the date the feed was last updated in
<lastBuildDate>Wed, 13 Nov 2013 14:26:44 GMT</lastBuildDate>.
We can then see the first of our <item> tags
This is followed by a title of
<title>UK recovery takes hold, says Bank</title>
a link of
and a bit later on, a publication date of
<pubDate>Wed, 13 Nov 2013 12:46:25 GMT</pubDate>.
We know when we are done with any given item when we see the </item> tag.
This is followed by more <item> tags, each with their own titles, links and publication dates.
Building our skin
First, let's get the skin started.
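A minimal starting point might look like the sketch below. The option values are assumptions chosen for illustration; nothing about the feed requires these particular settings.

```ini
[Rainmeter]
; Update the skin once per second (1000 milliseconds).
Update=1000
; Let the skin window resize itself to fit the meters.
DynamicWindowSize=1
AccurateText=1
```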
Now we will start building our first WebParser measure.
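As a sketch, the parent measure could begin like this. The section name MeasureSite and the UpdateRate value are illustrative assumptions. This also assumes a Rainmeter version where WebParser is used with Measure=WebParser; older versions load it with Measure=Plugin and Plugin=WebParser instead.

```ini
[MeasureSite]
; Parent measure: downloads the feed and will later hold the RegExp option.
Measure=WebParser
URL=http://feeds.bbci.co.uk/news/rss.xml
; Assumed refresh interval: download again every 600 measure updates.
UpdateRate=600
```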
The next thing we will need is the all-important RegExp option. Again, this guide is not a tutorial for the WebParser measure, nor a tutorial for Regular Expressions. However, at a high level what we want to do is use a regular expression in the "parent measure" to search the feed and capture parts of it into StringIndex numbers that we will use later in "child measures".
We can begin by getting the title of the feed into a StringIndex.
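A first version of the option, added to the parent measure, might look like the following. The leading (?siU) flags are an assumption on my part: they are standard PCRE flags that make the match span line breaks, ignore case, and stop each .* at the earliest possible point.

```ini
; Capture the feed title into StringIndex 1.
RegExp=(?siU)<title>(.*)</title>
```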
So we are searching for <title>, capturing () any number of characters of any kind .* into StringIndex number 1, and ending the capture when the text </title> is found.
Let's extend that RegExp option and get the link for the site and the last build date into StringIndex numbers 2 and 3.
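Extended that way, the option could read as in this sketch, again assuming the (?siU) flags and that the feed lists these header tags in this order:

```ini
; StringIndex 1 = feed title, 2 = feed link, 3 = last build date.
RegExp=(?siU)<title>(.*)</title>.*<link>(.*)</link>.*<lastBuildDate>(.*)</lastBuildDate>
```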
Notice that we use .* without parentheses to "skip" any number of any characters, without capturing anything.
So now we have parsed those three bits of information into StringIndex numbers 1, 2 and 3. We can build some child WebParser measures to hold each of those values, which we can then use in meters.
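The child measures might be sketched as follows. Each one points its URL option at the parent measure and picks one captured value by number. The section names are hypothetical, and the parent is assumed to be named MeasureSite.

```ini
[MeasureSiteTitle]
Measure=WebParser
; Refer to the parent measure instead of downloading the feed again.
URL=[MeasureSite]
StringIndex=1

[MeasureSiteLink]
Measure=WebParser
URL=[MeasureSite]
StringIndex=2

[MeasureSiteDate]
Measure=WebParser
URL=[MeasureSite]
StringIndex=3
```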
Getting the first feed item
Now that we have retrieved the site header information into our first three StringIndex numbers, let's get the first of the items from the feed. We do that by again extending our RegExp option.
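With the first item appended, the option could look like this sketch (same assumed flags as before):

```ini
; StringIndex 1-3 = feed header, 4-6 = title, link and pubDate of the first item.
RegExp=(?siU)<title>(.*)</title>.*<link>(.*)</link>.*<lastBuildDate>(.*)</lastBuildDate>.*<item>.*<title>(.*)</title>.*<link>(.*)</link>.*<pubDate>(.*)</pubDate>
```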
The added parsing code will skip to the first <item>, then find the <title>, <link> and <pubDate>. It will put them in StringIndex numbers 4, 5 and 6. Just count the instances of (.*) in the expression to determine which StringIndex numbers contain each value.
Following the same pattern as above, we can add new child measures to reference these StringIndex numbers.
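For the first item, that could be sketched as below. The section names are hypothetical and the parent is assumed to be named MeasureSite.

```ini
[MeasureItem1Title]
Measure=WebParser
URL=[MeasureSite]
StringIndex=4

[MeasureItem1Link]
Measure=WebParser
URL=[MeasureSite]
StringIndex=5

[MeasureItem1Date]
Measure=WebParser
URL=[MeasureSite]
StringIndex=6
```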
Adding more feed items
Adding another item is as simple as extending our RegExp option to get the next <item> into the next three StringIndex numbers.
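As a sketch, the item pattern is simply repeated at the end of the expression (same assumed flags as before):

```ini
; StringIndex 4-6 = first item, 7-9 = second item.
RegExp=(?siU)<title>(.*)</title>.*<link>(.*)</link>.*<lastBuildDate>(.*)</lastBuildDate>.*<item>.*<title>(.*)</title>.*<link>(.*)</link>.*<pubDate>(.*)</pubDate>.*<item>.*<title>(.*)</title>.*<link>(.*)</link>.*<pubDate>(.*)</pubDate>
```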
And create some more child WebParser measures for those values.
Adding a third item is just more of the same.
That RegExp is getting pretty long...
If you are retrieving a number of items, let's say five for instance, that RegExp is going to get pretty long and a bit hard to debug. Let's use a little trick with a variable to simplify it a bit. Go back up to the top of your skin right under the [Rainmeter] section, and create a new [Variables] section.
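The new section holds the per-item pattern once, under the variable name Item used in the rest of this guide:

```ini
[Variables]
; One feed item: skip to <item>, then capture title, link and pubDate.
Item=.*<item>.*<title>(.*)</title>.*<link>(.*)</link>.*<pubDate>(.*)</pubDate>
```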
And change our RegExp option in our parent measure to use that new variable.
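The shortened option might then look like this sketch, assuming the (?siU) flags and a variable named Item in [Variables]:

```ini
; #Item# expands to the full per-item pattern before the expression is run.
RegExp=(?siU)<title>(.*)</title>.*<link>(.*)</link>.*<lastBuildDate>(.*)</lastBuildDate>#Item#
```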
Since the variable #Item# is replaced in the RegExp with .*<item>.*<title>(.*)</title>.*<link>(.*)</link>.*<pubDate>(.*)</pubDate>, we still get our parse of the site header information into StringIndex numbers 1, 2 and 3, and our first <item> into StringIndex numbers 4, 5 and 6. However, the RegExp is quite a bit shorter and easier to read and debug. Adding more items is as simple as repeating that variable.
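For five items, the variable is repeated five times, as in this sketch (same assumed flags and variable name):

```ini
; 3 header captures + 5 items x 3 captures each = 18 StringIndex numbers.
RegExp=(?siU)<title>(.*)</title>.*<link>(.*)</link>.*<lastBuildDate>(.*)</lastBuildDate>#Item##Item##Item##Item##Item#
```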
That will get a total of five items, with the total of StringIndex numbers at 18. Now you just create all the child measures you need to pick off those 18 values, and display them as you like in meters.
A few more advanced suggestions
An RSS or Atom feed can contain the HTML codes for some characters, ones that you would want to display as the actual character instead of the code. For instance, the raw data might contain the code &lt; which is the < character. You don't want to display the code, but the actual character.
If you add the option DecodeCharacterReference=1 to any or all desired child measures, Rainmeter will translate and display the correct character for the embedded HTML code.
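On a title child measure, that could look like the sketch below. The section name is hypothetical, and the parent is assumed to be named MeasureSite.

```ini
[MeasureItem1Title]
Measure=WebParser
URL=[MeasureSite]
StringIndex=4
; Convert HTML character codes in the captured text to the real characters.
DecodeCharacterReference=1
```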
Generally, you will need this only on the "title" child measures, as the "link" and "pubdate" entries will almost never contain any of these HTML codes.
One of the other things you will run into in feeds, more often in Atom feeds than in RSS, is the <![CDATA[ ]]> tag around some data. It will take the form <![CDATA[Some Text in here]]>. This special construct in XML just means that any text between the opening <![CDATA[ and the closing ]]> is treated as literal characters, so characters that are otherwise illegal in XML data (like &, <, >, " and others) cause no problems.
While it would be complicated and annoying to use your regular expression to "parse around" this tag, you certainly don't want that text displayed as part of your skin output. You can use the Substitute option to find and remove this text on any or all child measures. Note that sometimes the feed will start and end the tag with
> and sometimes not. We should use Substitute to take care of both cases.
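A minimal sketch of such a substitution on a child measure follows. It removes only the literal opening and closing markers; if your feed writes the tag in another variant, that variant would need its own "pattern":"replacement" pair added to the list.

```ini
; Replace the opening and closing CDATA markers with nothing.
Substitute="<![CDATA[":"","]]>":""
```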
Regular Expression Substitute
Sometimes a feed will be formatted in such a way that there are leading spaces or tabs in front of a value. An example would be:
We don't want to capture and display those leading tab characters in this example, so we can use the RegExpSubstitute option instead of the normal Substitute, so we can take advantage of the additional power of regular expressions in our substitution.
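On a child measure, the two options might be combined as in this sketch, which strips leading white space and the CDATA markers in one pass:

```ini
; Treat each Substitute pattern as a regular expression.
RegExpSubstitute=1
; 1) leading white space, 2) opening CDATA marker, 3) closing CDATA marker.
Substitute="^\s+":"","<!\[CDATA\[":"","\]\]>":""
```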
We are using the regular expression construct at the beginning to say "if the start of the string ^ contains one or more 'white space' characters \s+, then change them to an empty string". Note that since the [ and ] characters are "reserved" in regular expressions, we needed to "escape" them using the \ character in our Substitute when we use RegExpSubstitute.
In our example above, we used our #Item# variable in our RegExp option to retrieve five "items" from the feed. What if there are currently only four items in the feed? Our regular expression will fail, and no information at all will be obtained.
You can protect against this by using what is known in regular expressions as a "lookahead assertion". What this does is basically say "if the following text exists, then parse it this way. If not, then just skip that part of the regular expression without failing". The format for using a lookahead assertion in regular expressions looks a bit daunting, but in concept it is not too hard to wrap your head around. I strongly suggest looking at the guide at WebParser - Lookahead Assertions.
To cut to the chase, if we change our #Item# variable to this:
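A sketch of the variable, wrapped in a conditional lookahead and using the same Item name as before:

```ini
[Variables]
; (?(?=.*<item>) ... ) : run the item pattern only if another <item> exists.
Item=(?(?=.*<item>).*<item>.*<title>(.*)</title>.*<link>(.*)</link>.*<pubDate>(.*)</pubDate>)
```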
Then we are saying "look ahead" (? for a string that equals (?=.*<item>). Note the trailing parenthesis to end the "look for" part of the expression. If the string .*<item> is found, then execute the parse .*<item>.*<title>(.*)</title>.*<link>(.*)</link>.*<pubDate>(.*)</pubDate>. Note that we then follow the entire thing with another ending parenthesis, to end the "look ahead" part of the expression.
If the lookahead assertion fails, only that part contained in it will be ignored, and any other parts of the expression will still be evaluated and return data.
Putting it all together
Here is a full skin, using all we have learned so far, that will parse and display the first five items from our RSS feed site. You can copy this code and paste it into a new skin to test it out, modify the output meters as you like, and you are well on your way!
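The following is a condensed sketch rather than a definitive skin. The parent measure parses all five items, but to keep it short only the feed header and the first item get child measures and meters; the remaining items follow the same pattern with StringIndex numbers 7 through 18. All section names, sizes and colors are assumptions made for illustration.

```ini
[Rainmeter]
Update=1000
DynamicWindowSize=1
AccurateText=1

[Variables]
; One optional feed item: title, link and pubDate (three captures).
Item=(?(?=.*<item>).*<item>.*<title>(.*)</title>.*<link>(.*)</link>.*<pubDate>(.*)</pubDate>)

[MeasureSite]
; Parent measure: StringIndex 1-3 = feed header, 4-18 = five items.
Measure=WebParser
URL=http://feeds.bbci.co.uk/news/rss.xml
RegExp=(?siU)<title>(.*)</title>.*<link>(.*)</link>.*<lastBuildDate>(.*)</lastBuildDate>#Item##Item##Item##Item##Item#
UpdateRate=600

[MeasureSiteTitle]
Measure=WebParser
URL=[MeasureSite]
StringIndex=1
; Clean up leading white space, CDATA markers and HTML character codes.
RegExpSubstitute=1
Substitute="^\s+":"","<!\[CDATA\[":"","\]\]>":""
DecodeCharacterReference=1

[MeasureSiteLink]
Measure=WebParser
URL=[MeasureSite]
StringIndex=2

[MeasureItem1Title]
Measure=WebParser
URL=[MeasureSite]
StringIndex=4
RegExpSubstitute=1
Substitute="^\s+":"","<!\[CDATA\[":"","\]\]>":""
DecodeCharacterReference=1

[MeasureItem1Link]
Measure=WebParser
URL=[MeasureSite]
StringIndex=5

[MeterSiteTitle]
Meter=String
MeasureName=MeasureSiteTitle
X=5
Y=5
W=300
FontSize=13
FontColor=255,255,255,255
SolidColor=0,0,0,200
AntiAlias=1
ClipString=1
; Clicking the title opens the feed's link in the default browser.
DynamicVariables=1
LeftMouseUpAction=["[MeasureSiteLink]"]

[MeterItem1Title]
Meter=String
MeasureName=MeasureItem1Title
X=5
; Place this meter 5 pixels below the bottom of the previous one.
Y=5R
W=300
FontSize=11
FontColor=255,255,255,255
SolidColor=0,0,0,200
AntiAlias=1
ClipString=1
DynamicVariables=1
LeftMouseUpAction=["[MeasureItem1Link]"]
```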
Atom vs. RSS feeds
The instructions and examples above should work for virtually any RSS feed site. However, if the site you are parsing is based on the Atom feed standard, there are some differences we need to take into account. Here is a sample of an Atom feed:
As you can see, there are differences in both the names of the "tags" used to identify data values, and some differences in formatting within the tags themselves.
Fortunately, we can for the most part just change our RegExp to account for these differences, and the balance of what we learned making our RSS skin works just fine. It should also be noted that the order of the tags in different Atom feeds can vary, where RSS feeds generally have the tags always in the same order. This may mean some re-ordering of the StringIndex numbers you use with your measures.
Here is an example output from the Atom feed on the Rainmeter forums:
And the variable for the #Item# and the parent WebParser measure you can use to parse the same five items from this feed:
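As a hedged sketch only: the pattern below assumes a generic Atom layout in which the feed header and each <entry> list <title>, then <link href="..."/>, then <updated>, in that order. That order is an assumption, and the URL is a placeholder, not the actual address of any particular feed. Check the raw output of your feed and reorder the pieces (and the StringIndex numbers in your child measures) to match.

```ini
[Variables]
; One optional Atom entry. <title.*> allows for attributes on the tag;
; the link value is read from the href attribute of <link>.
Item=(?(?=.*<entry>).*<entry>.*<title.*>(.*)</title>.*<link.*href="(.*)".*<updated>(.*)</updated>)

[MeasureSite]
Measure=WebParser
; Placeholder: substitute the URL of the Atom feed you want to parse.
URL=https://example.com/feed.atom
; StringIndex 1-3 = feed title, link and updated date; 4-18 = five entries.
RegExp=(?siU)<title.*>(.*)</title>.*<link.*href="(.*)".*<updated>(.*)</updated>#Item##Item##Item##Item##Item#
```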