Chapter 6. Unicode and Non-ASCII Support
AspEmail can send messages in alphabets other than US-ASCII
by supporting the "Quoted-Printable" format, which is described in
RFC 2045. In this format, characters with codes less than 33 or greater
than 126, as well as the "=" character itself, are represented by an "=" followed
by a two-digit hexadecimal representation of the character's value. (RFC 2045 allows
spaces and tabs to be left as-is, except at the end of a line.)
For example, the decimal value 12 (US-ASCII form feed)
is represented as =0C, and the decimal value 61 (US-ASCII "=") must be represented
as =3D.
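To make the substitution rule concrete, here is a minimal Python sketch of the per-byte encoding described above. It is not AspEmail's encoder. The function name `qp_encode_bytes` is hypothetical. The sketch deliberately ignores the other RFC 2045 rules: the 76-character line limit with soft line breaks, and the option of leaving spaces and tabs unencoded. Python's standard `quopri` module is used only to check that the output decodes correctly.

```python
import quopri


def qp_encode_bytes(data: bytes) -> str:
    """Hypothetical helper: basic Quoted-Printable substitution only.

    Bytes 33..126 are copied as-is, except "=" (61), which is always escaped.
    Every other byte (including space) becomes "=" plus two uppercase hex
    digits. Line-length limits and soft line breaks are NOT handled here.
    """
    out = []
    for b in data:
        if 33 <= b <= 126 and b != 61:
            out.append(chr(b))
        else:
            out.append("=%02X" % b)
    return "".join(out)


print(qp_encode_bytes(b"\x0c"))   # =0C  (form feed)
print(qp_encode_bytes(b"1+1=2"))  # 1+1=3D2

# The standard-library decoder accepts the result and restores the original bytes.
sample = b"caf\xe9 au lait"
assert quopri.decodestring(qp_encode_bytes(sample).encode("ascii")) == sample
```

Encoding the space as =20 is always permitted. Real encoders usually leave it literal to keep the text readable.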
AspEmail automatically encodes the message body in the Quoted-Printable format
if the ContentTransferEncoding property is set to
the string "Quoted-Printable" (letter case is immaterial).
You should also set the Charset property
to the appropriate character set. The following code snippet sends
a message in Russian:
<% @codepage=1251 %>
<%
...
Mail.Charset = "Windows-1251"
Mail.Body = "Сообщение по-русски."
Mail.ContentTransferEncoding = "Quoted-Printable"
%>
The directive <% @codepage=1251 %> instructs
the ASP interpreter to treat the hard-coded characters in the script
as Cyrillic (1251 is the Windows Cyrillic code page). As a result,
the Body property receives the Russian text as a proper Unicode string.
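As a rough illustration of what the Charset and ContentTransferEncoding settings in the snippet above mean at the byte level, the following Python sketch converts the same Russian text to Windows-1251 bytes and then applies Quoted-Printable with the standard `quopri` module. This is an analogy, not AspEmail's code. The variable names are illustrative only.

```python
import quopri

# The Russian sample text from the snippet above, held as a Unicode string
# (what the Body property receives once @codepage=1251 is in effect).
body = "Сообщение по-русски."

# Charset = "Windows-1251": each Cyrillic character maps to one cp1251 byte.
cp1251_bytes = body.encode("cp1251")

# ContentTransferEncoding = "Quoted-Printable": bytes above 126 become =XX.
# quopri leaves the space literal, which RFC 2045 permits inside a line.
qp_text = quopri.encodestring(cp1251_bytes).decode("ascii")
print(qp_text)  # =D1=EE=EE=E1=F9=E5=ED=E8=E5 =EF=EE-=F0=F3=F1=F1=EA=E8.

# A receiving mail client reverses both steps.
assert quopri.decodestring(qp_text.encode("ascii")).decode("cp1251") == body
```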
If you wish to send a message whose headers, such as Subject:,
To: or From:, contain non-US-ASCII characters, use
the Mail.EncodeHeader method to encode the character string according to
RFC 1522 (since superseded by RFC 2047). The method takes one required parameter,
the header string, and one optional parameter, the character set, which defaults to "ISO-8859-1".
For example:
<% @codepage=1251 %>
<%
Mail.Subject = Mail.EncodeHeader("Тема По-русски", "Windows-1251")
Mail.FromName = Mail.EncodeHeader("Иван", "Windows-1251")
Mail.AddAddress "stein@somecompany.no", Mail.EncodeHeader("Штейн", "Windows-1251")
%>
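The encoded result of EncodeHeader is an RFC 1522/2047 "encoded-word" of the form =?charset?encoding?text?=. The following Python sketch builds one using the base64 ("B") encoding. The documentation above does not say whether AspEmail uses the "B" or the "Q" (Quoted-Printable-like) encoding, so treat this only as an illustration of the header format. The helper name `encode_word_b` is hypothetical. The standard library's `email.header.decode_header` is used to confirm that the result is readable by a mail client.

```python
import base64
from email.header import decode_header


def encode_word_b(text: str, charset: str) -> str:
    """Hypothetical helper: build one RFC 2047 'B' encoded-word.

    Produces =?charset?B?payload?=, where payload is the base64 form of the
    text's bytes in the given charset. It does not split long text into
    several encoded-words, so keep input short (an encoded-word may be at
    most 75 characters long).
    """
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return "=?%s?B?%s?=" % (charset, payload)


from_name = encode_word_b("Иван", "Windows-1251")
print(from_name)  # =?Windows-1251?B?yOLg7Q==?=

# Python's own header decoder recovers the original string.
raw, charset = decode_header(from_name)[0]
assert raw.decode(charset) == "Иван"
```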
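From MSDN: "Unicode is a 16-bit, fixed-width character encoding standard that
encompasses virtually all of the characters commonly used on computers today.
This includes most of the world's written languages, plus publishing characters,
mathematical and technical symbols, and punctuation marks."
(Unicode has since grown beyond 16 bits, with code points up to Hex 10FFFF, but
every character discussed in this chapter lies within the original 16-bit range.)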
From Unicode.org: "Computers ... store letters and other characters by
assigning a number for each one. Before Unicode was invented, there were
hundreds of different encoding systems for assigning these numbers.
No single encoding could contain enough characters...
Unicode provides a unique number for every character,
no matter what the platform, no matter what the program, no matter what the language."
For example, the basic Latin letter "A" has the code Hex 0041 (65), the Russian
letter "Ж" has the code Hex 0416 (1046), and the Chinese character "㊥"
has the code Hex 32A5 (12965).
UTF-8 (Unicode Transformation Format, 8-bit encoding form) is the recommended
format for sending Unicode-based data across networks, in particular the Internet.
UTF-8 represents a Unicode value as a sequence of 1 to 4 bytes. The four-byte form
is needed only for characters above Hex FFFF, none of which appear in this chapter.
Unicode characters in the range Hex 0000 to 007F are encoded simply as bytes
00 to 7F. This means that files and strings which contain only 7-bit ASCII
characters have the same encoding under both ASCII and UTF-8.
For example, Unicode 0041 ("A") is encoded in UTF-8 as the single byte Hex 41.
Unicode characters in the range Hex 0080 to 07FF are encoded as a sequence of two bytes.
For example, Unicode 0416 ("Ж")
is encoded as Hex D0 96. Unicode characters in the range Hex 0800 to FFFF are encoded
as a sequence of three bytes. For example, Unicode 32A5 ("㊥")
is encoded as Hex E3 8A A5.
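The byte layout behind these examples can be written out by hand. The following Python sketch encodes a single code point following the UTF-8 bit patterns (0xxxxxxx, 110xxxxx 10xxxxxx, and so on) and checks the result against Python's built-in UTF-8 codec. The function name `utf8_encode` is hypothetical. The sketch does not reject the surrogate range Hex D800 to DFFF, which a real encoder must refuse.

```python
def utf8_encode(code_point: int) -> bytes:
    """Hypothetical helper: encode one Unicode code point as UTF-8 by hand."""
    if code_point <= 0x7F:
        # 1 byte: 0xxxxxxx (identical to US-ASCII)
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (surrogates not checked here)
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:
        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("not a Unicode code point: %X" % code_point)


for cp in (0x0041, 0x0416, 0x32A5):
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")  # agrees with Python's codec
    print("U+%04X ->" % cp, encoded.hex(" ").upper())
# U+0041 -> 41
# U+0416 -> D0 96
# U+32A5 -> E3 8A A5
```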
AspEmail 5.0 offers full UTF-8 support in both the message body and the headers.
To send a UTF-8 encoded message, you must set the CharSet
property to the string "UTF-8" (case is immaterial)
and ContentTransferEncoding to "Quoted-Printable".
You should also pass "UTF-8" as the second argument to EncodeHeader.
The following code sample demonstrates UTF-8 usage:
<%
' change to address of your own SMTP server
strHost = "smtp.broadviewnet.net"
' Enable UTF-8 -> Unicode translation for form items
Session.CodePage = 65001 ' UTF-8 code
If Request("Send") <> "" Then
Set Mail = Server.CreateObject("Persits.MailSender")
' enter valid SMTP host
Mail.Host = strHost
Mail.From = "info@aspemail.com" ' From address
Mail.FromName = Mail.EncodeHeader(Request("FromName"), "utf-8")
Mail.AddAddress Request("To")
' message subject
Mail.Subject = Mail.EncodeHeader( Request("Subject"), "utf-8")
' message body
Mail.Body = Request("Body")
' UTF-8 parameters
Mail.CharSet = "UTF-8"
Mail.ContentTransferEncoding = "Quoted-Printable"
Mail.Send ' send message
Response.Write "Message sent to " & Server.HTMLEncode(Request("To"))
End If
%>
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" content="text/html; charset=utf-8">
<TITLE>AspEmail: Unicode.asp</TITLE>
</HEAD>
<BODY>
<FORM METHOD="POST" ACTION="Unicode.asp">
<TABLE CELLSPACING=0 CELLPADDING=0>
<TR><TD>Enter email:</TD><TD><INPUT TYPE="TEXT" NAME="To"></TD></TR>
<TR><TD>Enter your name:</TD><TD><INPUT TYPE="TEXT" NAME="FromName"></TD></TR>
<TR><TD>Enter Subject:</TD><TD><INPUT TYPE="TEXT" NAME="Subject"></TD></TR>
<TR><TD>Enter Body:</TD><TD><TEXTAREA cols="50" rows="10" NAME="Body"></TEXTAREA></TD></TR>
<TR><TD COLSPAN=2><INPUT TYPE=SUBMIT NAME="Send" VALUE="Send"></TD></TR>
</TABLE>
</FORM>
</BODY>
</HTML>
This code sample has several important elements you must not overlook:
<META HTTP-EQUIV="Content-Type" content="text/html; charset=utf-8">
This META tag specifies UTF-8 as the character set for this page.
Among other things, it instructs the browser to UTF-8-encode all form items
when the form is submitted.
Session.CodePage = 65001
This line instructs the ASP script to convert UTF-8-encoded form items
(returned by the Request.Form collection) back to regular Unicode strings. The number
65001 is the UTF-8 code page.
Mail.Subject = Mail.EncodeHeader( Request("Subject"), "utf-8")
The second optional argument is set to "UTF-8" for proper encoding of the header.
Mail.CharSet = "UTF-8"
Mail.ContentTransferEncoding = "Quoted-Printable"
These two lines ensure proper UTF-8 encoding of the message body.
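The round trip that these settings protect can be imitated outside ASP. The following Python sketch uses the standard `urllib.parse` and `quopri` modules. It is an analogy only: `urlencode` stands in for the browser submitting a UTF-8 page's form, `parse_qs(..., encoding="utf-8")` stands in for what Session.CodePage = 65001 does for Request.Form, and the last step shows what CharSet = "UTF-8" combined with Quoted-Printable produces for the body. The field names match the sample form. Everything else is illustrative.

```python
import quopri
from urllib.parse import parse_qs, urlencode

# 1. Browser side: on a charset=utf-8 page, form values are sent as UTF-8
#    bytes, percent-encoded (application/x-www-form-urlencoded).
posted = urlencode({"Subject": "Ж", "Send": "Send"})
print(posted)  # Subject=%D0%96&Send=Send

# 2. Server side: decoding as UTF-8 (the role Session.CodePage = 65001 plays
#    in the ASP script) turns the bytes back into the original Unicode string.
subject = parse_qs(posted, encoding="utf-8")["Subject"][0]
assert subject == "Ж"

# Decoding with the wrong code page silently produces garbage instead.
assert parse_qs(posted, encoding="cp1251")["Subject"][0] != "Ж"

# 3. Message body: CharSet = "UTF-8" plus Quoted-Printable means the UTF-8
#    bytes of each non-ASCII character are written as =XX escapes.
body_qp = quopri.encodestring(subject.encode("utf-8")).decode("ascii")
print(body_qp)  # =D0=96
```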
To run this code sample, open one of the following URLs:
http://localhost/aspemail/NonAscii/Unicode.asp
http://localhost/aspemail/NonAscii/Unicode.aspx
You may specify the following string values for the CharSet property,
as well as the second optional argument to the EncodeHeader method:
Value | Meaning
"UTF-8" | UTF-8
"UTF-7" | UTF-7
"Windows-1250", "cp1250" | ANSI - Central Europe
"Windows-1251", "cp1251" | ANSI - Cyrillic
"Windows-1252", "cp1252", "ascii", "us-ascii" | Latin I
"Windows-1253", "cp1253" | ANSI - Greek
"Windows-1254", "cp1254" | ANSI - Turkish
"Windows-1255", "cp1255" | ANSI - Hebrew
"Windows-1256", "cp1256" | ANSI - Arabic
"Windows-1257", "cp1257" | ANSI - Baltic
"Windows-1258", "cp1258" | ANSI - Vietnamese
"ISO-8859-1" | Latin I (default value)
"ISO-8859-2" | Central Europe
"ISO-8859-3" | Latin 3
"ISO-8859-4" | Baltic
"ISO-8859-5" | Cyrillic
"ISO-8859-6" | Arabic
"ISO-8859-7" | Greek
"ISO-8859-8" | Hebrew
"ISO-8859-9" | Latin 5
"ISO-8859-15" | Latin 9
"cp866" | Russian DOS
"koi8-r" | Russian
"koi8-u" | Ukrainian
"shift_jis" | Japanese Windows
"ks_c_5601-1987", "korean" | Korean
"EUC-KR", "korean" | EUC - Korean
"BIG5" | Traditional Chinese Windows
"GB2312", "chinese" | Simplified Chinese
"HZ-GB-2312" | Simplified Chinese HZ
"EUC-JP" | EUC - Japanese
"X-EUC-TW" | EUC - Traditional Chinese
Copyright © 1999 - 2003 Persits Software, Inc.
All Rights Reserved