Using Non-ASCII chars in Arduino and other micro-processors
How to code messages in Your Own (non-English) Language

by Matthew Ford 12th October 2013 (original 1st September 2013)
© Forward Computing and Control Pty. Ltd. NSW Australia
All rights reserved.

How to display Chinese, Italian, Russian and other Non-English languages
on your Android mobile's pfodApp
using Arduino and other micro-processors
via Unicode encoded as UTF-8.

Summary

If you are using the Arduino IDE, then everything almost always just works. Open the IDE, paste the characters you want displayed into your print() statement, then compile and upload to the processor. If you are connecting to the pfodApp, that is all you need to do.

If you are writing a web server you need to start every page you serve with
<html><head><meta http-equiv="Content-Type" content="text/html;charset=utf-8">

If you are trying to test using the Arduino SerialMonitor, forget it. As of V1.5.3, the Arduino SerialMonitor only accepts ASCII chars. Hopefully this will be fixed in some future release.

If you don't get the output you expected, OR you are not using the Arduino IDE as your editor, OR you are programming some other micro-processor, then read on for solutions.
(Note: Notepad on Windows can edit non-ASCII by saving and reading as UTF-8 files, just choose UTF-8 as the encoding when saving.)

Introduction – Why non-ASCII characters in UTF-8

Outputting messages or adding code comments in your own (non-English) language is very convenient. Also, if you are writing a micro-processor driven web server that will serve web pages containing non-ASCII characters, or if you are writing a pfodDevice where users want to see the menus in their own (non-English) language, then you need to put these non-ASCII characters in your code and have them compiled and uploaded to your micro-processor.

Unicode has become the standard means of handling the multitude of characters: about 110,000 characters covering 100 scripts. There are many ways of encoding Unicode characters, but UTF-8 has the advantage that it is completely compatible with plain ASCII. That means if you are only sending ASCII characters, the bytes are identical whether you call the encoding ASCII or UTF-8. Because of this ASCII compatibility, UTF-8 has become the de facto encoding for storing and transmitting Unicode.
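The ASCII compatibility can be seen in a small encoder. This is a minimal sketch in plain C++ (so it can be tried on a desktop compiler as well as adapted for a micro-processor); the function name encodeUtf8 is just an illustrative choice, not part of any Arduino or pfod library.

```cpp
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8 bytes.
// Returns an empty string for values that are not valid characters
// (surrogates, or anything above U+10FFFF).
std::string encodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // ASCII range: a single byte, identical to the ASCII code.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogates are not characters
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, encodeUtf8('A') gives the single byte 'A', while encodeUtf8(0xE8) (the letter è) gives the two bytes 0xC3 0xA8. Every byte of a non-ASCII character has its top bit set, so it can never be mistaken for an ASCII character.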

The Arduino IDE explicitly reads and writes its sketches in UTF-8 encoding. The Arduino gcc-avr compiler also uses UTF-8 encoded files by default. But, as noted above, the Arduino SerialMonitor does not.

Displaying Non-ASCII Characters

Sending UTF-8 encoded characters is only half the issue. The receiving display device, pfodApp or web browser needs to correctly display the characters. To do this the receiving device needs to a) process the bytes as UTF-8 encoded characters and b) have the required font installed to display the resulting characters.
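To illustrate point a), here is a rough sketch of how a receiver can tell bytes from characters. It assumes the input is valid UTF-8, and the name countUtf8Chars is hypothetical. Continuation bytes always have the bit pattern 10xxxxxx, so counting every byte that is not a continuation byte counts the characters.

```cpp
#include <cstddef>

// Count the characters (code points) in a NUL-terminated UTF-8 string.
// Assumes the bytes are valid UTF-8; no error checking is done.
std::size_t countUtf8Chars(const char* s) {
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        // Bytes of the form 10xxxxxx continue the previous character,
        // so only count bytes that start a new character.
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For example, the word pèsca written as "p\303\250sca" is 6 bytes long but countUtf8Chars reports 5 characters. A receiver that treated each byte as one character would show two wrong symbols in place of the è.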

pfodApp works as expected, processing the received bytes using UTF-8 encoding. As mentioned above, for the web browser to correctly interpret the characters you need to tell it that the page is encoded in UTF-8. You do that by starting every web page you serve with
<html><head><meta http-equiv="Content-Type" content="text/html;charset=utf-8">
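As a minimal sketch of this in code, the helper below wraps a body in a page that starts with the line above. The names kUtf8PageStart and makeUtf8Page are made up for illustration, and how the resulting bytes are actually sent depends on your own web server code.

```cpp
#include <string>

// The page start from the text above: tells the browser the bytes are UTF-8.
const char kUtf8PageStart[] =
    "<html><head><meta http-equiv=\"Content-Type\" "
    "content=\"text/html;charset=utf-8\">";

// Hypothetical helper: build a minimal page around a UTF-8 encoded body.
std::string makeUtf8Page(const std::string& body) {
    std::string page = kUtf8PageStart;
    page += "</head><body>";
    page += body;  // body bytes are passed through unchanged
    page += "</body></html>";
    return page;
}
```

Calling makeUtf8Page("p\303\250sca") returns a page whose body bytes are unchanged; the meta tag is what tells the browser to decode them as UTF-8.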

But getting the receiving encoding correct only satisfies point a) above. Once the receiving software has read and decoded the character, it needs to display it on the screen. Most computers and mobile devices do NOT have all 110,000 character shapes available to display every Unicode character.

For example, here is a sketch which displays 4 buttons (which do nothing) on a pfodApp. The buttons are “Hello World” translated into Chinese, Russian and Hindi.

Here is the display of the buttons by pfodApp running on my Nexus Android phone.

My Nexus mobile does not have the necessary Hindi font loaded, so it just displays missing characters, although you would expect an Android mobile in India to have the font installed by default.

Coding and compiling UTF-8 characters when all you have is ASCII

If you are not using the Arduino IDE or some other editor that supports UTF-8 OR your compiler or assembler only supports ASCII then you can still code and send non-ASCII characters in UTF-8 format by first converting the characters to the equivalent UTF-8 bytes and then coding those bytes directly.

The UTF8converter that can be downloaded from here does the conversion for you.

Encoding UTF-8 using UTF8converter

I have provided a UTF8converter that can be downloaded from here. This program allows you to paste in the characters you want to display to your user and then convert them to the correct UTF-8 sequence of bytes (as octal) for inserting into the code of your pfodDevice.
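The kind of conversion involved can be sketched in a few lines of C++. This is only an illustration of the idea, not the UTF8converter's own code; the name toOctalEscapes is hypothetical, and the input is assumed to already be UTF-8 bytes.

```cpp
#include <cstdio>
#include <string>

// Turn UTF-8 bytes into ASCII-only text suitable for pasting between
// the quotes of a C string literal. Non-ASCII bytes become \ooo escapes.
std::string toOctalEscapes(const std::string& utf8) {
    std::string out;
    for (unsigned char byte : utf8) {
        if (byte >= 0x80) {
            // Non-ASCII byte: always three octal digits (0x80 is \200).
            char buf[5];  // backslash + 3 digits + terminating NUL
            std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(byte));
            out += buf;
        } else if (byte == '"' || byte == '\\') {
            // Characters that would end or confuse the literal.
            out += '\\';
            out += static_cast<char>(byte);
        } else {
            out += static_cast<char>(byte);  // plain ASCII is copied as-is
        }
    }
    return out;
}
```

Given the bytes of pèsca, this sketch produces the text p\303\250sca, which any ASCII-only editor and compiler can handle. Control characters such as newlines are not handled here and would need their own escapes.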

Downloading, Installation and Running of UTF8converter

To run the application, download the jar file, UTF8converter1_0_1.jar. Save it in a directory which you can write to.

Running UTF8converter on Windows machines

Double click on the jar file and it should run. If it does not, you probably do not have Java installed. To install Java, go to www.java.com and download and install the Java runtime.

Running UTF8converter on Non-Windows machines

Put the downloaded UTF8converter1_0_1.jar file in a directory.
Then, from a terminal window, change directory to where the UTF8converter1_0_1.jar file is and run the command:
java -jar UTF8converter1_0_1.jar
If the UTF8converter window does not appear, go to www.java.com and download and install Java.

Also, on Mac OS you can assign "Jar Launcher" as the default app to use when you double-click a jar file, as follows (I don't believe you need the developer tools installed for this):
i) Click once on the .jar file in the Finder and then from the menubar in the Finder select "File -> Get Info".
ii) Click on "Open with" and from the popup menu select "Other". A file browser window will open.
iii) In this window, go to the /System/Library/CoreServices folder and select 'Jar Launcher'.
iv) Then make sure the "Always Open With" checkbox is checked and then click Add.
v) Then click the "Change all" button so that any jar file will be opened automatically.
vi) Finally, close the Info window and now when you double-click any of your jar files they should run automatically.
(see http://macosx.com/tech-support/how-to-execute-a-jar-file-in-os-x/9549.html )

Using the UTF8converter

Run the UTF8converter, as described above, and then type or paste the text you want to convert.

For example, using Google Translate,
“Geniuses eat a peach and engage in fishing.”
in Italian, with the accents, becomes
Genî mangiano una pèsca e pratichino la pésca.

Pasting this into the UTF8converter and converting gives

These are the UTF-8 bytes representing the text. Most of them are just standard ASCII, except for the characters with accents, which have been replaced with their UTF-8 equivalents (in octal). Hex \x.. is not used because a hex escape has no fixed length: if the character after the two hex digits is itself a hex digit ('0' to '9' or 'a' to 'f'), the C compiler treats it as part of the escape. Octal escapes stop after three digits, which avoids this problem. The GCC compiler used by Arduino also does not accept all Unicode escape sequences, for example \u0020.
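The difference between octal and hex escapes can be shown with a short example. The French word réel is used here purely as an illustration (it is not from the text above), because its é is followed directly by an 'e', which is a valid hex digit.

```cpp
#include <cstring>

// "réel": the é is the two UTF-8 bytes 0xC3 0xA9, followed by 'e'.
// Written with hex escapes in one piece, "r\xc3\xa9el", the compiler
// reads \xa9e as a single escape that does not fit in a char.

// Octal escapes stop after three digits, so the following 'e' is safe.
const char kOctal[] = "r\303\251el";

// Hex can still be used if the literal is ended right after the escape.
// Adjacent string literals are joined by the compiler.
const char kHexSplit[] = "r\xc3\xa9" "el";
```

Both arrays hold the same 5 bytes plus the terminating NUL. Because every non-ASCII UTF-8 byte is at least \200, the octal form always has exactly three digits, so a digit that follows it cannot be absorbed into the escape either.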

Right clicking the UTF-8 field and choosing “copy” from the popup menu copies the bytes to your clipboard to paste into your code.

This method will work for any language and will display that language on any pfodApp, provided the mobile has the appropriate font necessary to display the characters.



Android™ is a trademark of Google Inc. For use of the Arduino name see http://arduino.cc/en/Main/FAQ


pfodDevice™ and pfodApp™ are trade marks of Forward Computing and Control Pty. Ltd.


©Copyright 1996-2020 Forward Computing and Control Pty. Ltd. ACN 003 669 994