Swedish characters in email: The SUNET initiative

Background

The Board of Directors for the Swedish University Network, SUNET, has started a project to deal with the problem of Swedish characters in electronic mail. The current situation, in which several different character sets are used simultaneously, is clearly unacceptable.

The main problem concerns the last three characters of the Swedish alphabet ("a with ring", "a with umlaut" and "o with umlaut"), which are often displayed incorrectly on the screen of the mail message's recipient. There are two reasons for this:

  1. The sender and the recipient use different character sets (Swedish 7-bit, Latin-1, Macintosh, PC, etc.).
  2. A program that handles transportation of electronic mail has destroyed the letter's contents or changed them into something which cannot be interpreted by the recipient's mail program.

    A typical symptom of this, in which the recipient sees "EDV" instead of the uppercase Swedish characters, arises when a mail transportation program that handled the letter on its way from the sender to the recipient clears the high bit (changes it from one to zero) in every octet/byte. This behavior is nevertheless completely in accordance with the SMTP standard which is used in SUNET and the Internet.
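
    The symptom is easy to reproduce. The following Python sketch is only an illustration and assumes that the sender wrote the text in Latin-1; the function name strip_high_bit is made up for this example and does not refer to any real mail program.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear the highest bit of every octet, as a strictly 7-bit mail relay may do."""
    return bytes(octet & 0x7F for octet in data)


# Assumption: the sender's text is encoded in Latin-1.
original = "ÅÄÖ åäö".encode("latin-1")   # octets C5 C4 D6 20 E5 E4 F6
damaged = strip_high_bit(original)       # octets 45 44 56 20 65 64 76

print(damaged.decode("ascii"))           # EDV edv
```

    The damage cannot be undone by the recipient, since an "E" that was originally "a with ring" is indistinguishable from an ordinary "E".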

New Recommendation

After a meeting of electronic mail support personnel from Swedish universities on the 28th of September, and after consultation with the SUNET Technical Advisory Group, SUNET has decided to recommend that starting on the 1st of January, 1995, all electronic mail communication outside an individual organization should conform to the MIME standard for electronic mail in the Internet. Electronic mail sent within an organization ought to conform to this standard, too.

The character set to be used for Swedish text is Latin-1. The Swedish characters are represented by high octets in this character set.

According to the SMTP standard for electronic mail, which has been in use since the beginning of the 1980s, high octets should never be used. This is still the case: under NO circumstances should high octets simply be transmitted 'as is' on the network. The preferred solution is to implement ESMTP, the extended version of SMTP. This protocol allows transmission of high octets only when the receiving system confirms that it is capable of handling letters that contain them. Alternatively, letters containing Latin-1 text can be encoded before transmission using the 'Quoted Printable' encoding described in the MIME standard.
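
The rule above amounts to a small decision that a sending mail program has to make for every letter. The Python sketch below is one possible way to express it, not a prescribed implementation; the function name choose_body_encoding and the idea of passing the receiver's advertised ESMTP keywords as a set are assumptions made for this illustration. Line-length limits and other MIME details are ignored.

```python
def choose_body_encoding(body: bytes, server_extensions: set) -> str:
    """Pick a Content-Transfer-Encoding for one letter body.

    server_extensions is assumed to hold the ESMTP keywords that the
    receiving system advertised (empty if it only speaks plain SMTP).
    """
    has_high_octets = any(octet >= 0x80 for octet in body)
    if not has_high_octets:
        return "7bit"               # nothing to protect
    if "8BITMIME" in server_extensions:
        return "8bit"               # receiver confirmed it handles high octets
    return "quoted-printable"       # never send high octets 'as is'


swedish = "Hälsningar från Umeå".encode("latin-1")
print(choose_body_encoding(swedish, {"8BITMIME"}))   # 8bit
print(choose_body_encoding(swedish, set()))          # quoted-printable
print(choose_body_encoding(b"Hello", set()))         # 7bit
```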

This means that the former SUNET recommendation to use the Swedish 7-bit character set in electronic mail will no longer be valid after January 1st, 1995.

On the other hand, we do not consider it appropriate at this time to recommend MIME for use in Internet News: an official Internet recommendation for this does not yet exist, and few programs for reading and creating articles using MIME are available. The recommendation that the Swedish 7-bit character set be used for News remains in effect until further notice. (High octets should not be sent in News articles.)

It is no doubt unsatisfactory that two fundamentally different methods for the representation of Swedish text are used in electronic mail and News, especially in consideration of the close relationship between these two services. SUNET therefore wishes to stimulate discussion on how the problem with representing Swedish characters in News can be solved. The discussion will be conducted in the News group swnet.mail.

SUNET intends to recommend the general use of the Latin-1 character set in plain text files and HTML files provided by Gopher, World Wide Web (WWW) and anonymous FTP services. Gopher menus and titles of WWW pages should also use Latin-1. A discussion on the advisability of this will be conducted in the News group swnet.mail.

Support Activities

SUNET plans to evaluate electronic mail programs in the Macintosh, MS Windows and UNIX environments in order to ease the transition to MIME. The evaluation will be confined to a program's MIME compatibility and usability. SUNET will recommend suitable programs afterwards.

SUNET will also continue development of the EMIL electronic mail conversion system in order to improve its functionality and ease of installation and configuration. Using EMIL it is possible to provide MIME support in environments where a transition to MIME cannot be accomplished within the given time frame.

Further Information

More information about MIME and in particular this project is available via the World Wide Web. The URL is:

http://www.nada.kth.se/sunet-mime/

Some of the documents can also be acquired via anonymous FTP to ftp.nada.kth.se from the directory pub/sunet-mime.

SUNET recommends that further discussion concerning this project be conducted in the News group swnet.mail. Questions and suggestions can be sent to the project members at the electronic mail address <sunet-mime-info@sunet.se>.


GLOSSARY

Character Set: A complete set of rules for how different characters are represented in a computer using different combinations of bits (quantities that are either 0 or 1).

Swedish 7-bit Character Set: The character set for Swedish text which became popular in the beginning of the 1980s. It is similar to the American ASCII character set except that the braces, brackets and some other special characters are replaced by the Swedish diacritic characters. It is a Swedish standard, known within MIME by the official name 'SEN_850200_B'. It is also informally referred to as "Swedish ASCII".
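
As an illustration of the substitution, the Python sketch below converts between ordinary text and the Swedish 7-bit representation. It covers only the six Swedish letters, which take the positions of the brackets, backslash, braces and vertical bar; any other differences between the two character sets are deliberately ignored. The function names are invented for this example.

```python
# The six letter positions: Ä Ö Å replace [ \ ] and ä ö å replace { | }.
TO_7BIT = str.maketrans("ÄÖÅäöå", "[\\]{|}")
FROM_7BIT = str.maketrans("[\\]{|}", "ÄÖÅäöå")


def to_swedish_7bit(text: str) -> bytes:
    """Encode text using the letter substitutions of the Swedish 7-bit set."""
    return text.translate(TO_7BIT).encode("ascii")


def from_swedish_7bit(data: bytes) -> str:
    """Decode octets that were written in the Swedish 7-bit set."""
    return data.decode("ascii").translate(FROM_7BIT)


print(to_swedish_7bit("Räksmörgås"))         # b'R{ksm|rg}s'
print(from_swedish_7bit(b"R{ksm|rg}s"))      # Räksmörgås
```

The example also shows the weakness of the scheme: a recipient who views the octets as ASCII sees braces and bars instead of letters, and genuine braces or brackets in a letter cannot be told apart from Swedish characters.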

Latin-1: The character set that will be recommended for use within SUNET. It is already used in Microsoft Windows and by many UNIX computers. It is twice the size of ASCII and the Swedish 7-bit character set and contains not only the entire ASCII character set but also all diacritic letters and similar characters used by western European languages. It is an international standard with the officially registered MIME name of 'ISO-8859-1'.

High Octets: Octets (bytes) in which the highest bit is a one (1). All information in a computer is stored as combinations of zeros and ones, bits, often handled in groups of eight called octets or bytes. 256 different combinations are possible with eight bits and are commonly referred to by the numbers 0 to 255, inclusive, in which the high octets have values between 128 and 255. Latin-1 contains 256 characters since every character is represented by a different octet. The Swedish diacritical letters are represented by high octets in Latin-1.

Quoted Printable: A method, defined in MIME, of temporarily representing high octets as low octets during transport. The high octet uppercase Swedish diacritic characters ("a with ring", "a with umlaut" and "o with umlaut") are represented in this system as '=C5', '=C4', and '=D6' and the lowercase as '=E5', '=E4', and '=F6'.
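
The values listed above can be checked with the quopri module from the Python standard library. This is a minimal sketch that assumes the text is first converted to Latin-1 octets; it shows only the encoding of the six letters, not a complete mail message.

```python
import quopri

# Assumption: the text is encoded as Latin-1 before Quoted Printable is applied.
upper = quopri.encodestring("ÅÄÖ".encode("latin-1"))
lower = quopri.encodestring("åäö".encode("latin-1"))

print(upper)   # b'=C5=C4=D6'
print(lower)   # b'=E5=E4=F6'

# Decoding restores the original high octets.
restored = quopri.decodestring(upper).decode("latin-1")
print(restored)   # ÅÄÖ
```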

SMTP: Simple Mail Transfer Protocol. The fundamental standard used for electronic mail in SUNET and the Internet. It is defined by the Internet document RFC 821.

MIME: Multipurpose Internet Mail Extensions. An extension of SMTP (and other electronic mail standards) which describes how multimedia information and characters not included in ASCII can be transmitted in the Internet. MIME is defined in the Internet documents RFC 1521 and RFC 1522.
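
To give an idea of what a MIME letter with Swedish text looks like, the sketch below builds one with the email package from the Python standard library. It is an illustration only: the addresses and the subject are invented, and the exact header layout produced by other mail programs may differ.

```python
from email.mime.text import MIMEText

# Assumption: hypothetical addresses and text, chosen only for this example.
msg = MIMEText("Hej på dig!\n", "plain", "iso-8859-1")
msg["From"] = "anna@example.se"
msg["To"] = "bertil@example.se"
msg["Subject"] = "Test"

# The headers announce the character set and how the body was encoded.
print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
print(msg.get_payload())
```

For Latin-1 text the package selects the Quoted Printable encoding on its own, so the body travels as low octets only and the letter "a with ring" appears as '=E5'.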

ESMTP: Extended Simple Mail Transfer Protocol. A modification of SMTP which enables transmission of high octets in electronic mail. This is accomplished by using the 'EHLO' command in combination with the 'MAIL FROM' parameter 'BODY=8BITMIME'. The general extension mechanism is defined in the Internet document RFC 1651, and the transmission of high octets in RFC 1652.
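
The negotiation can be sketched as follows. The Python code below is a simplified illustration, not a complete ESMTP client: it parses a sample reply to 'EHLO' and adds the 'BODY=8BITMIME' parameter only when the receiving system has advertised the keyword. The host names, the address and both function names are hypothetical.

```python
def parse_ehlo_reply(reply: str) -> set:
    """Collect the extension keywords from a multi-line 250 reply to EHLO."""
    extensions = set()
    lines = reply.splitlines()
    for line in lines[1:]:                 # the first line is the greeting
        if line.startswith("250"):
            words = line[4:].split()       # skip the code and the '-' or ' '
            if words:
                extensions.add(words[0].upper())
    return extensions


def mail_from_command(sender: str, extensions: set) -> str:
    """Build MAIL FROM, requesting 8-bit transport only if it was offered."""
    command = f"MAIL FROM:<{sender}>"
    if "8BITMIME" in extensions:
        command += " BODY=8BITMIME"
    return command


# Assumption: an invented server reply, used only for this example.
sample_reply = (
    "250-mail.example.se Hello client.example.se\r\n"
    "250-8BITMIME\r\n"
    "250 SIZE\r\n"
)
offered = parse_ehlo_reply(sample_reply)
print(mail_from_command("anna@example.se", offered))
# MAIL FROM:<anna@example.se> BODY=8BITMIME
```

If the receiving system does not list '8BITMIME', the command is sent without the parameter and the letter has to be encoded, for example with Quoted Printable, before transmission.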

RFC: Request For Comments. A series of technical documents written during the evolution of the Internet. Among other things, all communication protocols in the Internet are defined in different RFCs. These documents are free and are available from many computers in the Internet.

Internet News: The first world-wide, fully-open, computer conferencing system. Discussions are divided into thousands of different interest groups called News Groups. All users of the Internet can read News articles as well as post their own.

World Wide Web (WWW), Gopher, Anonymous FTP: Different methods for reading and acquiring information, graphics, programs and so forth that are available in the Internet.

HTML: HyperText Markup Language. The document format normally used for information that is provided via the World Wide Web.


The SUNET MIME Project. Last changed November 16, 1994