(aao-errors-memo.txta  E  941021  OJ)

  Olle Jarnefors                                MEMO
  Royal Institute of Technology, Stockholm      Version E
  Teknisk Service - Data                        1994-10-21
                                                SUNET-MIME
  08-790 71 26   Fax:08-10 25 10   Telex:11421 kth s


! 70 ways (or so) of (mis)representing Swedish letters
  ====================================================

  By Olle Jarnefors, TS-Data, KTH (+46-8-790 71 26)

  Summary: There is a great deal of confusion about how best to
  represent the national Swedish letters in Internet email and
  netnews (USENET). Some methods in use, and the modes of
  misrepresentation that occur, are described: 6 character sets,
! 4 transport encodings, 50 transliterations, 10 types of distorted
! text.

  NOTE: To fully appreciate the content of this file it is essential
  that it is viewed by means of a monospaced font.

: News in this version: See last section.

! Text marked by "!" in the left margin is new or changed
! compared to version D.
: Text marked by ":" in the left margin is new or changed
: compared to version C.

  Content:

  1. Introduction
  2. Direct character representation
  3. Not restored transport encoding
  4. Transliteration to ASCII, often performed manually
  5. Transport-distorted representation
  6. Occasionally observed representations
  7. Overview of the national Swedish letters in common coded
     character sets
  8. Document history


  1. Introduction
  ---------------

  This is an inventory of the many different ways of _representing_
  the national Swedish letters that occur in Internet email in Sweden
  and in articles in netnews (USENET). Those letters are:

           Å  Ä  Ö  å  ä  ö

  I have also included the most common forms of _automatic distortion_
  of these letters that one may be exposed to. I have myself observed
  most, if not all, of these in email from the Internet or in articles
  in netnews.

  To simplify the matter I have disregarded letters which are not
  included in the Swedish alphabet but are nevertheless sometimes used
  in Swedish text.

  It's risky to write about characters that computers and networks
  often fail to represent correctly in a text file made available on a
  public computer network. The _Swedish_ original of this text is
  therefore available in three different versions:

  aao-fel-pm.txt1   The text is encoded according to the standard
                    ISO 8859-1 (also called Latin-1). This should work
                    without problems in modern Unix systems,
                    MS Windows, MS Windows NT, and OS/2.

  aao-fel-pm.txts   The text is encoded in a Swedish 7-bit character
                    set according to the standard SS 63 61 27. This
                    form is recommended by SUNET for use in email.

  aao-fel-pm.txta   The text is encoded in (American) ASCII according
                    to the standard ANSI X3.4. The Swedish national
                    letters are not available and have been replaced
                    by AAO, except where a risk of misunderstanding
                    might occur.

  I have allocated unique _o-numbers_ to the different
  representation/distortion modes. The letters are accounted for in
  the order Å Ä Ö å ä ö. Either I write, directly, the sequence of
  ASCII characters by which the letter is represented, or I indicate,
  indirectly, the octet by which it is represented, by means of the
  two hexadecimal digits of its value.

  Special marks used:

  Mark     Meaning
  ----     -------
  (+)      common
           occurs now and then
  (-)      rare


  2. Direct character representation
  ----------------------------------

  o1  (+)  Swedish 7-bit character sets
           5D  5B  5C  7D  7B  7C   (hexadecimal representation)

  o2  (+)  Latin-1 (= ISO 8859-1), MS Windows character set, DEC MCS
           C5  C4  D6  E5  E4  F6   (hexadecimal representation)

  o3       IBM PC character sets (CP437 and CP850)
           8F  8E  99  86  84  94   (hexadecimal representation)

  o4       Macintosh character sets
           81  80  85  8C  8A  9A   (hexadecimal representation)

  o5  (-)  HP ROMAN-8
           D0  D8  DA  D4  CC  CE   (hexadecimal representation)

  o6  (-)  NeXT, PostScript
           86  85  96  DA  D9  F0   (hexadecimal representation)


  3. Not restored transport encoding
  ----------------------------------

  o7       Quoted-Printable with Latin-1 as coded character set
           =C5  =C4  =D6  =E5  =E4  =F6   (character sequences)

           Together with, or instead of, these may occur:
           =c5  =c4  =d6  =e5  =e4  =f6

  o8  (-)  The so-called mnemonic character sets
           &AA  &A:  &O:  &aa  &a:  &o:   (character sequences)

           Sometimes the first character of these sequences is instead
  o9       CTRL-] (1D)
  o10      or SP followed by BS (20 08).


  4. Transliteration to ASCII, often performed manually
  -----------------------------------------------------

  o11      A  A  O  a  a  o

  It is common to replace the Swedish letters by character pairs to
  decrease the risk of confusion. For Å and å, one of the following
  representations is often used:

  oa1      AA  aa
  oa2      A*  a*
  oa3      *A  *a
  oa4      A.  a.
  oa5      .A  .a
! oa6      A'  a'
! oa7      'A  'a

  For Ä, Ö, ä, and ö, often:

  oo1      AE  OE  ae  oe
  oo2      A:  O:  a:  o:
  oo3      :A  :O  :a  :o
  oo4      A"  O"  a"  o"
  oo5      "A  "O  "a  "o
  oo6      A%  O%  a%  o%
  oo7      %A  %O  %a  %o

  The o-numbers for combinations of these methods are computed by the
  formulas:

! o_no = 5*(oo_no-1) + oa_no + 11      if  oa_no <= 5
! o_no = 2*(oo_no-1) + oa_no + 51      if  6 <= oa_no <= 7

  For example, oo2 combined with oa3 gives o_no = 5*1 + 3 + 11 = 19.


  5. Transport-distorted representation
  -------------------------------------

  o47 (+)  EDV-destroyed text (Latin-1 with the 8th bit cleared)
           E  D  V  e  d  v

  o48 (-)  Simplest fall-back
           ?  ?  ?  ?  ?  ?
           Other fall-back characters occur:
           " ", "_", "#", "!", "x", "^"

  o49 (-)  All national Swedish letters have disappeared.

  o50 (-)  PXZ-destroyed text (the ISO-compatible coded character set
           ROMAN-8, when the 8th bit has been cleared)
           P  X  Z  T  L  N

  When text written with an IBM PC, Macintosh, or NeXT character set
  is mangled by bit-stripping, one gets various control characters
  within the text.

  o51 (-)  IBM PC character sets
           0F  0E  19  06  04  14
           (CTRL-O, CTRL-N, CTRL-Y, CTRL-F, CTRL-D, CTRL-T)

  o52 (-)  Macintosh character sets
           01  00  05  0C  0A  1A
           (CTRL-A, NUL, CTRL-E, FF, LF, CTRL-Z)

  o53 (-)  NeXT, PostScript
           06  05  16  5A  59  70
           (CTRL-F, CTRL-E, CTRL-V, "Z", "Y", "p")


  6. Occasionally observed representations
  ----------------------------------------

  o54 (-)  Article <1994May2.171600.5818@lin.foa.se> in swnet.general
           )  (

! o55 (-)  Email <199409082211.AAA27130@mail.swip.net>
!          60  5E  5F

: o56 (-)  What was sent from the BBS of the Prime Minister's Office
:          at the beginning of 1994
:          70  3F  20


  7. Overview of the national Swedish letters in common coded character sets
  --------------------------------------------------------------------------

       7  8  P  M  R  N
       =  =  =  =  =  =
  5B   [
  5C   \
  5D   ]
  7B   {
  7C   |
  7D   }
  80            [
  81            ]
  84         {
  85            \     [
  86         }        ]
  8A            {
  8C            }
  8E         [
  8F         ]
  94         |
  96                  \
  99         \
  9A            |
  C4      [
  C5      ]
  CC               {
  CE               |
  D0               ]
  D4               }
  D6      \
  D8               [
  D9                  {
  DA               \  }
  E4      {
  E5      }
  F0                  |
  F6      |

  7 = Swedish 7-bit character set (SS 63 61 27)
  8 = Latin-1 (ISO 8859-1), MS Windows
  P = IBM PC character sets
  M = Macintosh character sets
  R = ROMAN-8 (HP)
  N = NeXT, PostScript

  ] = Å    [ = Ä    \ = Ö
  } = å    { = ä    | = ö


: 8. Document history
: -------------------
:
: C  941014  First English version of the document.
: D  941017  The variant o56 added. Document history added.
! E  941021  New ways of transliterating discovered in
!            soc.culture.nordic, so the number of ways to write
!            Swedish letters has increased to the 70 level.

(aao-errors-memo.txta: END)
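
The bit-stripping described in section 5, the Quoted-Printable form in
section 3, and the o-number formulas in section 4 can be checked
mechanically. What follows is only a minimal sketch in Python under
stated assumptions: the helper names strip_eighth_bit and o_number are
invented for illustration, and the octet values are copied from
section 2 in the order Å Ä Ö å ä ö.

```python
import quopri

# Octets for the letters Å Ä Ö å ä ö, copied from section 2 of the memo.
CODES = {
    "Latin-1": bytes.fromhex("C5 C4 D6 E5 E4 F6"),
    "IBM PC": bytes.fromhex("8F 8E 99 86 84 94"),
    "Macintosh": bytes.fromhex("81 80 85 8C 8A 9A"),
    "ROMAN-8": bytes.fromhex("D0 D8 DA D4 CC CE"),
    "NeXT": bytes.fromhex("86 85 96 DA D9 F0"),
}


def strip_eighth_bit(data: bytes) -> bytes:
    """Clear the most significant bit of every octet (hypothetical helper)."""
    return bytes(octet & 0x7F for octet in data)


def o_number(oo_no: int, oa_no: int) -> int:
    """Apply the two section 4 formulas (hypothetical helper).

    oo_no is 1..7 (the oo1..oo7 rows), oa_no is 1..7 (the oa1..oa7 rows).
    """
    if not 1 <= oo_no <= 7:
        raise ValueError("oo_no must be in 1..7")
    if 1 <= oa_no <= 5:
        return 5 * (oo_no - 1) + oa_no + 11
    if 6 <= oa_no <= 7:
        return 2 * (oo_no - 1) + oa_no + 51
    raise ValueError("oa_no must be in 1..7")


if __name__ == "__main__":
    # o47: Latin-1 with the 8th bit cleared.
    print(strip_eighth_bit(CODES["Latin-1"]).decode("ascii"))  # EDVedv
    # o50: ROMAN-8 with the 8th bit cleared.
    print(strip_eighth_bit(CODES["ROMAN-8"]).decode("ascii"))  # PXZTLN
    # o7: restoring Quoted-Printable gives back the Latin-1 octets.
    print(quopri.decodestring(b"=C5=C4=D6=E5=E4=F6") == CODES["Latin-1"])  # True
    # Section 4: oo2 combined with oa3, and the highest combination.
    print(o_number(oo_no=2, oa_no=3))  # 19
    print(o_number(oo_no=7, oa_no=7))  # 70
```

Applying strip_eighth_bit to the IBM PC, Macintosh, and NeXT rows gives
the octets listed under o51, o52, and o53.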