Suggestion for hyphenation indications in HTML - <HYPH>
Issued in May 1996 by Peter Svanberg <psv@nada.kth.se> and Olle Järnefors <ojarnef@nada.kth.se>.
Zucker example revised 18 April 1997.
Summary
We suggest a new HTML element HYPH, in the general form
<HYPH BEF="before_linebreak_string" AFT="after_linebreak_string">no_linebreak_string</HYPH>
to indicate where and how a word can be hyphenated.
In the normal case this reduces to an empty <HYPH></HYPH>
placed at the point where hyphenation may take place.
Motivation for hyphenation indication
The need for supporting hyphenation has been
accentuated by recent developments in the WWW.
- tables
- When tables were introduced in HTML, the need for hyphenation
greatly increased, as the columns into which text should
fit became narrower
- a wider spectrum of languages
- Some languages have a special need for hyphenation, as
very long (compound) words are used quite frequently.
Examples are the languages
of the Scandinavian countries and German.
Server-side or client-side solution
In principle, hyphenation can be performed by the client
program on its own or by the client program guided by
information from the server about suitable points for
hyphenation. Generally the latter approach seems easier to
realize, because the information provider is best
qualified to select the best points of hyphenation. The provider
can use word processing programs specialized for the language
used or the subject area covered.
A client program cannot be
expected to be competent as regards hyphenation for the many
different languages that are used in the internationalized
WWW. If the information provider can use HTML markup to
indicate hyphenation points, no new functionality in server
software is needed at all, and the extra program support for
hyphenation in clients will be extremely limited.
Hyphenation is certainly a more demanding task for HTML
documents than for ordinary paper documents produced by
word processing: because word wrapping in HTML documents
is dynamic, hyphenation hints must be included
virtually everywhere in a paragraph, not only at the
ends of a few lines.
Current situation
The current situation in HTML is that the only possible
way to specify to the client where a hyphenation can be
done is by using the soft hyphen character.
RFC 1866, however, discourages its use:
NOTE - Use of the non-breaking space and soft hyphen indicator
characters is discouraged because support for them is not
widely deployed.
The more popular commercial client programs do not support the
use of soft hyphen. Worse, these implementations actively
sabotage its use (which was defined by ISO 8859-1 in 1987) by
showing a visible hyphen for every soft hyphen character. Had
they elected to show no symbol at all, it would have been
possible to include this special kind of markup in HTML
documents without such bad side effects.
The insufficiency of soft hyphen for i18n use
A more fundamental problem with soft hyphen is that
it cannot represent hyphenation behaviour in
some special cases in languages such as Swedish
and German. To give some simple examples, in
Swedish the word "tillaga", if hyphenated,
becomes "till-
laga", i.e. an extra letter suddenly appears.
In some German cases, the situation is even
more complicated. The word "Zucker", when hyphenated
properly, is transformed to "Zuk-
ker", i.e. the letter "c" is replaced by a "k".
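The limitation can be illustrated with a minimal Python sketch. It assumes a simplified model in which a renderer breaking at a soft hyphen can only append "-" to the end of the line and leaves all letters unchanged. The function name soft_hyphen_breaks is hypothetical and used only for this illustration.

```python
SOFT_HYPHEN = "\u00ad"  # SOFT HYPHEN, code 0xAD in ISO 8859-1


def soft_hyphen_breaks(marked_word):
    """Return every (line_end, next_line_start) pair that soft hyphens allow.

    Assumed model: breaking at a soft hyphen appends "-" to the first part
    and changes nothing else in the word.
    """
    parts = marked_word.split(SOFT_HYPHEN)
    breaks = []
    for i in range(1, len(parts)):
        head = "".join(parts[:i]) + "-"
        tail = "".join(parts[i:])
        breaks.append((head, tail))
    return breaks


print(soft_hyphen_breaks("til\u00adlaga"))  # [('til-', 'laga')]
print(soft_hyphen_breaks("Zuc\u00adker"))   # [('Zuc-', 'ker')]
```

Under this model neither "till-" / "laga" nor "Zuk-" / "ker" can be produced, wherever the soft hyphen is placed.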
Finally, we would like to note that the hyphenation of certain
words in these languages is dependent on the
meaning of the word in its context, which
makes an adequate
client-side solution almost impossible.
Specification of element HYPH
The soft hyphen problems show that there is a need for a more
backwards compatible and more general solution for indicating
possible hyphenation points than using the specific character
soft hyphen. It is also an advantage if that solution
is very simple to implement in browsers. We have a
proposal that meets all three of these requirements:
A new element HYPH to specify hyphenation points
should be introduced. Example:
internationa<HYPH></HYPH>lization
An old HTML browser will show the full word
"internationalization". A browser implementing this
proposal can use the indicated point to hyphenate
this word, if that would enhance the appearance of
the current paragraph.
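The two behaviours described above can be sketched in Python. This is only an illustration under stated assumptions: an old browser is modelled as simply dropping tags it does not know, and a HYPH-aware browser is modelled as measuring line width in characters. The names legacy_render and fit_first_line are hypothetical, and only the plain <HYPH></HYPH> form is handled.

```python
import re

# Matches an opening or closing HYPH tag, with or without attributes.
HYPH_TAG = re.compile(r"</?HYPH\b[^>]*>", re.IGNORECASE)


def legacy_render(markup):
    """Model an old browser: unknown tags are dropped, their content kept."""
    return HYPH_TAG.sub("", markup)


def fit_first_line(markup, width):
    """Model a HYPH-aware browser for the plain <HYPH></HYPH> case only.

    Returns (first_line, remainder). The word is broken at the last marked
    point whose prefix plus the default "-" still fits in `width` characters.
    """
    pieces = re.split(r"<HYPH></HYPH>", markup, flags=re.IGNORECASE)
    word = "".join(pieces)
    if len(word) <= width:
        return word, ""
    for i in range(len(pieces) - 1, 0, -1):
        head = "".join(pieces[:i]) + "-"
        if len(head) <= width:
            return head, "".join(pieces[i:])
    return word, ""  # no usable point: leave the word unbroken


markup = "internationa<HYPH></HYPH>lization"
print(legacy_render(markup))        # internationalization
print(fit_first_line(markup, 15))   # ('internationa-', 'lization')
```

A real browser would measure rendered width rather than character count, so the width test here is a stand-in only.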
To handle the special cases mentioned above, the
following three attributes can be used:
- BEF
- Gives the string to insert before the line break if
hyphenation is performed. Default value (if unspecified)
is "-".
- AFT
- Gives the string to insert after the line break. Default
value (if unspecified) is "".
- SUBST
- Identifies the hyphenation character in BEF, if other
than the default "-".
This makes it possible for the browser to substitute
this hyphenation character by a special hyphenation
character preferred by the user.
If there is text inside the HYPH element, it is displayed only if
hyphenation is not done.
(This functionality is inspired by the discretionary function in the text formatting and typesetting system TeX.)
Here is a more complex example to illustrate the use of
the attributes:
The correct hyphenation behaviour of the German word
"Zucker" can be specified in this way:
Zu<HYPH BEF="k-">c</HYPH>ker
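The attribute rules can be checked against this example with a small Python sketch. It is not a real HTML parser: it assumes a word containing at most one HYPH element with double-quoted attribute values, and the function name renderings is hypothetical. The two "tillaga" markups in the usage lines are our own illustrations of how the rules could apply to the Swedish case.

```python
import re

# One HYPH element: group 1 is the attribute text, group 2 the inner text.
HYPH = re.compile(r"<HYPH\b([^>]*)>(.*?)</HYPH>", re.IGNORECASE | re.DOTALL)
ATTR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def renderings(markup):
    """Return (unbroken, line_end, next_line_start) for a word with one HYPH.

    unbroken        : text shown when no hyphenation is done (inner text kept)
    line_end        : text before the element followed by BEF (default "-")
    next_line_start : AFT (default "") followed by the text after the element
    """
    m = HYPH.search(markup)
    if m is None:
        return markup, None, None
    attrs = {name.upper(): value for name, value in ATTR.findall(m.group(1))}
    bef = attrs.get("BEF", "-")
    aft = attrs.get("AFT", "")
    before, inner, after = markup[:m.start()], m.group(2), markup[m.end():]
    return before + inner + after, before + bef, aft + after


print(renderings('Zu<HYPH BEF="k-">c</HYPH>ker'))    # ('Zucker', 'Zuk-', 'ker')
print(renderings('til<HYPH BEF="l-"></HYPH>laga'))   # ('tillaga', 'till-', 'laga')
print(renderings('till<HYPH AFT="l"></HYPH>aga'))    # ('tillaga', 'till-', 'laga')
```

The last two lines suggest that the extra Swedish letter could be supplied either through BEF or through AFT. Which form is preferable is left open here.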
Note that in future usage, based on
ISO/IEC 10646 as character set, there are two characters
to consider in this context: the ASCII character
HYPHEN-MINUS (hex 002D) and HYPHEN (hex 2010, decimal 8208).
The latter should normally be used for hyphenation.
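The two characters, and one possible reading of the SUBST attribute, can be sketched in Python as follows. The helper apply_subst is hypothetical. It assumes that SUBST names a single character and that every occurrence of it in BEF is replaced by the user's preferred hyphenation character. The proposal text does not settle these details.

```python
import unicodedata

HYPHEN_MINUS = "\u002d"  # the ASCII "-" character
HYPHEN = "\u2010"        # decimal 8208


def apply_subst(bef, subst="-", preferred=HYPHEN):
    """Replace the hyphenation character named by SUBST inside BEF.

    Assumption: every occurrence of `subst` in `bef` is replaced by the
    character the user prefers for hyphenation.
    """
    return bef.replace(subst, preferred)


for ch in (HYPHEN_MINUS, HYPHEN):
    # Prints the character name with its code point in hex and decimal.
    print(unicodedata.name(ch), hex(ord(ch)), ord(ch))

print(apply_subst("k-") == "k\u2010")  # True
```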
Test file
This suggestion is tested in this test text.
Latest update April 18, 1997
<webmaster@nada.kth.se>