Unicode in Rainmeter


Given the international audience of Rainmeter, one of the things that comes up regularly is how to use and display non-ASCII character sets in skins.

Encoding standards

Without getting into all the arcane details of how encoding is done in the computer world, there are basically two ways things you use in Rainmeter are going to be encoded.

ANSI

This is an informal (and really inaccurate) term used by text editors for files encoded to display the ASCII - Extended ASCII character set. ASCII is the first 129 characters, including A-Z, a-z, numbers, punctuation, and some control characters and symbols.

The "extended" part of the set from characters 129 to 256 will be based on the Windows Codepage (locale) active in your Windows system. For people in the USA and Western Europe, this will be Windows-1252 (ISO 8859-1), which will display international characters from European languages. Things like umlauts, accents and graves. However, if you for instance have your Windows locale set to "Russian", it will use Windows-1251 (ISO 8859-5), which will replace the European characters with Cyrillic.

If you won't be trying to use characters other than standard (0-128) ASCII in your skins, or do not intend to distribute your skins to folks using other languages on their system, then just staying with the default ANSI encoding in your text editor is going to work fine for any files used in Rainmeter, and you probably don't even need to read the rest of this. However...

Unicode

Unicode is a standard for displaying a much wider variety of characters on any system, without worrying about localization or code pages. It includes all the ASCII characters, pretty much every character in every alphabet, as well as a large number of special characters and symbols.

There are several ways that characters are encoded in the Unicode standard. The ones you will see most often, and which we will deal with below are:

  • Unicode / UTF-16 LE / UCS-2 Little Endian
    This 16-bit encoding standard is what Windows uses by default, to manage and display information in all Windows applications.
  • UTF-8
    This 8-bit encoding standard is the most popular way of encoding text in the World-Wide-Web.
  • UTF-8 With BOM
    A variant of the UTF-8 standard with a Byte Order Mark.

Using Unicode in Rainmeter

Rainmeter skins

Like all Windows applications (since Windows 98), Rainmeter manages and displays all text using the UTF-16 LE Unicode standard. If you wish to embed Unicode characters in Rainmeter .ini (skin) .inc (include) or text files read by the Quote plugin, simply encode the file as:

  • Unicode (Windows Notepad.exe)
  • UCS-2 LE BOM (Notepad++)
  • UTF-16 LE (Sublime Text)

It's that simple. Now you can embed strings like Самое прекрасное в стандартах то, что есть так много, чтобы выбрать из. in your file(s) and Rainmeter will handle them properly on any user's system.

We will refer to this as UTF-16 from here out. While the naming convention varies from editor to editor, it is basically the same standard in each.

NEVER encode any of these files in UTF-8. Rainmeter and the Quote plugin will not be able to properly read them.

Using Unicode with WebParser

In order to reduce the amount of data that must be transmitted to support Unicode, and to maintain compatibility with the widest range of legacy operating systems and applications, web sites that you access with the WebParser plugin will almost always be encoded in UTF-8. The WebParser plugin automatically uses this encoding standard when retrieving web sites, and converts the text to UTF-16 for Rainmeter to handle. There is nothing you need to do with any local file encoding for this to work correctly and seamlessly on any user's system.

Using the file:// protocol with WebParser to read local files

If you are going to be creating a local text file containing Unicode characters that you want to parse with WebParser using the URL=File://SomeFile.txt URL option, you can encode this file as UTF-16. However, there is another step required in Rainmeter to have this work correctly.

As noted above, WebParser assumes all resources it accesses (web sites or files) are encoded in the web-standard UTF-8. When parsing a local file encoded in UTF-16, you must set a Codepage=1200 option on the parent WebParser measure reading the file. Using the Codepage of 1200 will tell WebParser that the resource is encoded as UTF-16.

An alternative is to encode the file as UTF-8. This will allow WebParser to read the file correctly without any Codepage setting. I tend to recommend using UTF-16 however, as I prefer the relative simplicity of just encoding all files in UTF-16 in Rainmeter if Unicode is desired. (The only exception then will be some rare cases with Lua scripting involving reading/writing external files, as noted below.)

Using Unicode with Lua scripting

Note: Unicode with Lua requires at least Rainmeter 3.0.

If you are using a Lua script with your skin, any use of Unicode pretty much anywhere in the skin will require that that .lua script file be encoded as UTF-16 to work. This would include:

  • Embedding Unicode characters in your Lua script file.
  • Having the Lua read variables, options or measure values from the skin that contain Unicode characters.
  • Having the Lua use the value of a WebParser measure that contains Unicode characters.

Simply encode the .lua file as UTF-16, and it will work seamlessly when you are using Unicode in the Lua or in Rainmeter.

NEVER encode a .lua script file in UTF-8. The Rainmeter implementation of Lua will not be able to properly read it.

Caveats with Unicode in Lua

The Lua programming language is an older language, and is designed to be very platform independent. The result is that the native string libraries in Lua are written in such a way that the handling of multi-byte Unicode characters is imperfect. Without getting too far under the covers, the language internally always handles strings as 8-bit single-byte characters. What this means is that certain string functions that use hard-coded character positions (i.e. "return the 4th character from the string") can have incorrect results with multi-byte Unicode character strings.

This will normally not be an issue, as that kind of syntax is going to be very rare in the context of Lua with Rainmeter, but it is worth understanding the implications described at the end of this post.

Reading and writing external Unicode files in Lua

As noted above, you must always save the Lua script .lua file encoded as UTF-16 to use Unicode with Lua and Rainmeter. However, there may be times you want to read or write to external files from your Lua script. If you want to use Unicode with these external files, things have to be handled in a particular way.

I mentioned earlier that Lua always treats text as 8-bit single-byte characters. What the means in this context is that in effect Lua "internally" views all text as UTF-8, even though we always have the .lua file encoded as UTF-16 to work with Rainmeter. So Lua requires that any text file it reads or creates itself be encoded in UTF-8 to handle Unicode characters properly.

Reading an external file with Unicode in Lua

To read an external file which will contain Unicode characters, you should encode the external file in UTF-8 with BOM. In Notepad++ that is UTF-8 with BOM, and in Notepad it is just UTF-8 (which always has a BOM).

You can do this UTF-8 without BOM, but I recommend against it. The reason is that there is literally no difference between a file encoded as ANSI with just "Hello world" in it and the same file encoded as UTF-8 without BOM. The first 128 characters of ANSI and the first 128 characters of UTF-8 are exactly the same, and it is only either the fact that the file already contains Unicode characters, or the BOM, that specifically identifies the encoding.

The upshot of this is that any text editor opening this file will see it as ANSI. If you encode the file as UTF-8 without BOM, save it, and open it again, it will be ANSI. You won't be able to add Unicode characters to it until you change the encoding to UTF-8 again, add the Unicode characters while it is still open, and save it. Then when it is opened again, it will be seen as UTF-8. Adding the BOM to the file makes it unambiguously UTF-8, and will save a lot of head scratching by users trying to edit your files. In any case, the Windows Notepad.exe editor simply won't save a UTF-8 file without BOM. It always adds it when you save.

Anyway, as long as your .lua is encoded in UTF-16, and the external file is encoded in UTF-8, there is nothing special you need to do to read in the file contents in Lua.

Writing an external file with Unicode in Lua

This is also quite simple. When Lua creates a file, it will open the file in Windows and write any string of characters you want to the file. Lua can't, and doesn't need to, "decide" or "ask" that the file be encoded in ANSI or UTF-8.

If you plan to write Unicode characters to the file, you will just need to first write the BOM (Byte Order Mark) to the file after you open it. Then just write away with any characters you want.

The act of writing the BOM to the first three characters of the file is literally all that is needed to turn an ANSI file into a UTF-8 with BOM file in Windows.

Ok, what is the BOM sequence of three characters? It is "EF BB BF" in hex. To write it to the file in Lua, you can either use the string  or the Lua character escape codes \239\187\191.

So after you open a new file in write mode, first just:

FileHandle:write('\239\187\191') or FileHandle:write('')

Then you can write anything else you want or need to the file, and when done, close it.

The file you created will be UTF-8 with BOM, and will properly support all Unicode characters.

Reading and writing in Lua example:

LuaUTF8.ini (not encoded in anything special, just ANSI):

[Rainmeter]
Update=1000
DynamicWindowSize=1

[MeasureRead]
Measure=Calc
Formula=1
IfEqualValue=1
IfEqualAction=[!CommandMeasure MeasureScript "Read()"][!UpdateMeter *][!Redraw]

[MeasureWrite]
Measure=Calc
Formula=1
IfEqualValue=1
IfEqualAction=[!CommandMeasure MeasureScript "Write()"]

[MeasureScript]
Measure=Script
ScriptFile=#CURRENTPATH#LuaUTF8.lua
UpdateDivider=-1

[MeterUTF8]
Meter=String
FontSize=12
FontColor=255,255,255,255
SolidColor=60,60,60,255
Padding=10,10,10,10
AntiAlias=1

LuaUTF8.lua (encoded in UTF-16 LE):

function Initialize()

end

function Read()
local FilePath = SKIN:MakePathAbsolute('TestIn.txt')

local File = io.open(FilePath, 'r')

if not File then
print('LuaUTF8: unable to open file for read at ' .. FilePath)
return
end

local Contents = {}
for Line in File:lines() do
table.insert(Contents, Line)
end

File:close()

SKIN:Bang('!SetOption', 'MeterUTF8', 'Text', Contents[1])
end

function Write()
local FilePath = SKIN:MakePathAbsolute('TestOut.txt')

local File = io.open(FilePath, 'w')

if not File then
print('LuaUTF8: unable to open file for write at ' .. FilePath)
return
end

local BOM = '\239\187\191'

File:write(BOM)
File:write('Бешеные псы и англичане')

File:close()
end

TestIn.txt (encoded in UTF-8 with BOM):

Испытание некоторых русский текст