Penn State Computing with Foreign Symbols
Tips for Developing Non-English Web Sites
This section discusses tips and strategies for reading and developing non-English Web sites.
Developing Properly Encoded Websites
These methods are recommended for any language encoded in Latin 1 such as Spanish,
French, German and Italian as well as for other major world languages including Chinese,
Japanese, Korean, and Russian.
1. Declare the Encoding
2. Declare the Language
3. Encoding, Fonts, Recommended Browsers by Language
4. HTML Special Entity Codes (Latin-1 only; see the encoding chart to check)
5. Tips for FrontPage (Windows)
6. Tips for Dreamweaver
7. Export Text from International Word Processors
8. Text Alignment
Workarounds
Sometimes, especially when you are working with a language with relatively few speakers,
you may need to use alternate methods to deliver content.
1. PDF Files
2. Using Image Files
3. About Using the <FONT FACE> tag
4. ASCII Substitution
©Penn State University, 2001, 2002.
This Web page is maintained by Elizabeth J. Pyatt (ejp10@psu.edu) for the Center for Education Technology Services.
Last Modified: January 2, 2002.
This publication is available in alternate media upon request.
http://cac.psu.edu/ets/presentations/international/web/tips/ [3/5/2002 3:33:00 PM]
Declare the Encoding
If you create a Web site, it is good practice to declare the encoding. Properly encoded Web pages declare
the encoding to a browser through a meta tag in the header. Some examples are given below. If you are not
sure which encoding system to declare, you may want to refer to the encoding by language chart or look at
which system is declared in other Web sites written in the language.
Sample Encoding Declarations
Template
<head>
<meta http-equiv="Content-Type" content="text/html; charset=???">
...
</head>
Declare Latin-1 (English & Western Europe)
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
...
</head>
NOTE: It IS good practice to declare the encoding even for an English Web site.
One function of this tag is to "reset" the browser back to Latin-1 and ensure
proper font settings. A browser that is not reset to Latin-1 may display unusual font
effects after it leaves a non-Latin-1 site.
Declare Windows-1252 (default in FrontPage)
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
...
</head>
NOTE: FrontPage actually encodes English Web sites not in "ISO-8859-1"
(Latin-1), but in the very similar "Windows-1252". In most cases the results will
be the same, but there may be occasional differences between the characters
specified by Windows-1252 and by Latin-1.
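The difference mentioned in the NOTE above can be made concrete. The following is a minimal Python sketch, not part of the original page: it uses Python's built-in "cp1252" and "latin-1" codecs as stand-ins for the two encodings, and the helper name differing_bytes is made up for this illustration.

```python
# Bytes 0x80-0x9F are where Windows-1252 and ISO-8859-1 (Latin-1) disagree.
raw = b"\x93Hello\x94"  # curly quotes stored as Windows-1252 bytes

as_cp1252 = raw.decode("cp1252")   # printable curly quotation marks
as_latin1 = raw.decode("latin-1")  # invisible C1 control characters instead


def differing_bytes():
    """Return every byte value that decodes differently in the two encodings."""
    diffs = []
    for value in range(256):
        single = bytes([value])
        latin1 = single.decode("latin-1")
        # errors="replace" covers the few bytes Windows-1252 leaves undefined
        cp1252 = single.decode("cp1252", errors="replace")
        if latin1 != cp1252:
            diffs.append(value)
    return diffs


print(ascii(as_cp1252))
print(ascii(as_latin1))
print(len(differing_bytes()), "byte values differ")
```

In this sketch the two encodings agree everywhere except in the range 0x80 to 0x9F, which is where Windows-1252 places characters such as curly quotes.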
Declare Unicode (UTF-8 version)
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
...
</head>
If no encoding is declared, then the browser uses the default setting, which in the U.S. is
typically Latin-1.
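To see why an undeclared encoding matters, here is a small Python sketch (an illustration added for this section, assuming a page saved as UTF-8 that a browser reads with a Latin-1 default). It uses only Python's built-in codecs.

```python
text = "Esta frase es en espa\u00f1ol."  # contains the letter n-tilde
utf8_bytes = text.encode("utf-8")        # what a UTF-8 page stores on disk

# A browser falling back to Latin-1 reads each byte as one character,
# so the two-byte UTF-8 sequence for n-tilde turns into two wrong characters.
misread = utf8_bytes.decode("latin-1")

# Reading the same bytes with the declared encoding recovers the text.
correct = utf8_bytes.decode("utf-8")

print(ascii(misread))
print(ascii(correct))
```

The bytes on disk are identical in both cases; only the encoding the reader assumes is different, which is the information the meta tag supplies.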
Last Modified: March 9, 2002.
Declare the Language
The lang attribute can be used to declare the language of a Web page or a portion of a Web page. This
is meant to assist search engine spiders, page formatting and screen reader technology.
NOTE: You must declare the encoding in addition to the language. The language and its script are
independent.
Page Language
The official W3C recommendation is to declare the primary language for each Web page with a lang
attribute in the <html> tag. Codes are ISO-639 codes.
For instance:
Template
<html lang="??">
...
</html>
English (U.S.)
<html lang="en-US">
...
</html>
English (U.K./Great Britain)
<html lang="en-GB">
...
</html>
Spanish
<html lang="es">
...
</html>
Masai
<html lang="mas">
...
</html>
Switching Languages
If you switch languages within one page, you can embed the lang attribute in other tags such as <p>,
<h1>, or <span>. For example:
Text
This sentence is in English.
Esta frase es en español. (Spanish)
Mae'r frawddeg hon yn cymraeg. (Welsh)
Code
<p>This sentence is in English.</p>
<p lang="es">Esta frase es en espa&ntilde;ol.</p> (Spanish)
<p lang="cy">Mae'r frawddeg hon yn cymraeg.</p> (Welsh)
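The Spanish sample above writes ñ as the entity &ntilde; so that the markup itself stays in plain ASCII. A minimal Python sketch of converting in both directions follows; it relies on the standard html module, and the helper name to_ascii_html is hypothetical.

```python
import html


def to_ascii_html(text):
    """Escape markup characters, then replace each non-ASCII character
    with a numeric character reference such as &#241;."""
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Entity to character: what a browser does when it displays the page.
decoded = html.unescape("espa&ntilde;ol")

# Character to numeric reference: one way to keep a source file pure ASCII.
encoded = to_ascii_html("espa\u00f1ol")

print(ascii(decoded))
print(encoded)
```

Numeric references are used here rather than named entities because they exist for every character, whereas named entities such as &ntilde; cover only a limited set.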
Language Codes
Language codes are primarily taken from the list of ISO-639 language codes. This list has recently been
expanded from an older two-letter set to a three-letter set (e.g. "eng" for English). If a language has both a
three-letter code (e.g. "eng") and a two-letter code ("en"), then use the two-letter code. If there is only a
three-letter code (e.g. "mas" for Masai), then use that code. Note that language codes are in lower case.
Languages can also have an optional regional code (usually an ISO-3166 country code) if more
information about dialect is needed. Note that country codes are in all caps.
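The casing and code-length rules above can be sketched as a small normalizing function. This is an illustration only: the function name normalize_lang_tag and the three-entry PREFERRED_SHORT table are assumptions made for the example, not a complete ISO-639 mapping.

```python
import re

# Illustrative subset only: three-letter codes that also have a two-letter form.
PREFERRED_SHORT = {"eng": "en", "spa": "es", "cym": "cy"}


def normalize_lang_tag(tag):
    """Return a tag with a lower-case language code and an upper-case region."""
    parts = re.split(r"[-_]", tag.strip())
    language = parts[0].lower()
    if not re.fullmatch(r"[a-z]{2,3}", language):
        raise ValueError("language code must be two or three letters: %r" % tag)
    # Prefer the two-letter code when the language has both forms.
    language = PREFERRED_SHORT.get(language, language)
    if len(parts) == 1:
        return language
    region = parts[1].upper()
    if len(parts) > 2 or not re.fullmatch(r"[A-Z]{2}", region):
        raise ValueError("region must be a two-letter country code: %r" % tag)
    return language + "-" + region


print(normalize_lang_tag("EN-us"))
print(normalize_lang_tag("eng-gb"))
print(normalize_lang_tag("mas"))
```

The sketch handles only a language code with an optional country code, which is the form described in this section.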
XHTML
In XHTML, the language is declared in the <html> tag as follows:
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
Links
W3C Recommendations
● http://www.w3.org/International/O-HTML-tags.html
ISO-639 Language Codes
● http://www.loc.gov/standards/iso639-2/langcodes.html (Full List)
● http://babel.alis.com/langues/iso639.en.htm (2-Letter only)
ISO-3166 Country Codes
● http://www.din.de/gremien/nas/nabd/iso3166ma/codlstp1/en_listp1.html
Last Modified: January 2, 2002.
Encoding & Fonts by Language
Below is a list of common scripts with their encodings and recommended fonts.
Scripts: Roman = English Alphabet, Cyrillic = Russian Alphabet
Font Needed: N/A = no special fonts needed, <Language Kit> - generated by
Mac Language Kit
Best Browsers: Although recommendations often refer to Netscape or Internet
Explorer, other browsers such as Mozilla or Opera could be usable.
Languages Taught at Penn State
Arabic
  Font Needed: Win: Arial Unicode, Arabic Transparent, Tahoma, etc. Mac: AB Cairo, AB Geeza, AB Baghdad, AB Nadeema, etc.
  Encodings: Windows-1256, ISO-8859-6, Others
  Best Browsers: Win: Internet Explorer 5, Netscape 6. Mac: Netscape 6, iCab.
Chinese (Traditional)
  Font Needed: Win: MS Hei, Arial Unicode, etc. Mac: Taipei, Hei, etc.
  Encodings: Big5, EUC-TW, Others
  Best Browsers: Win: Recent Netscape or Internet Explorer. Mac: Most recent browsers plus Language Kits. If you cannot install Language Kits, then use Netscape 4.7.
Chinese (Simplified)
  Font Needed: Win: MS Song, Arial Unicode, etc. Mac: Beijing, Song, etc.
  Encodings: GB2312, GBK, Others
  Best Browsers: Win: Recent Netscape or Internet Explorer. Mac: Most recent browsers plus Language Kits. If you cannot install Language Kits, then use Netscape 4.7.
French
  Font Needed: N/A
  Encodings: ISO-8859-1
  Best Browsers: Most Browsers
German
  Font Needed: N/A
  Encodings: ISO-8859-1
  Best Browsers: Most Browsers
Greek (Modern)
  Font Needed: Win: Recent versions of Times New Roman, Arial, Tahoma, Comic Sans, Arial Unicode. Mac: Language Kits must be installed.
  Encodings: ISO-8859-7, Windows-1253
  Best Browsers: Win: Recent Netscape or Internet Explorer. Mac: Internet Explorer 5, Netscape 6.
  NOTES:
  1. Mac computers manually "draw" Greek characters. Not all characters may be rendered accurately.
  2. There is not wide support of certain Ancient Greek accents. Read any Ancient Greek Web site carefully for additional instruction.
Hebrew
  Font Needed: Win: Arial Unicode, David, David Transparent, Fixed Miriam Transparent. Mac: Arial Hebrew, Corsiva Hebrew, HB Arial, etc.
  Encodings: ISO-8859-8, Windows-1255
  Best Browsers: Win: Internet Explorer 5, Netscape 6. Mac: Netscape 6, iCab.
Italian
  Font Needed: N/A
  Encodings: ISO-8859-1
  Best Browsers: Most Browsers
Japanese
  Font Needed: Win: Arial Unicode, MS Gothic. Mac: Osaka and Osaka Å..., etc.
  Encodings: Shift_JIS, EUC-JP, Others
  Best Browsers: Win: Recent versions of Netscape or Internet Explorer. Mac: Most recent browsers plus Language Kits. If you cannot install Language Kits, then use Netscape 4.7.
Korean
  Font Needed: Win: GulimChe, Arial Unicode. Mac: Seoul, etc.
  Encodings: EUC-KR
  Best Browsers: Win: Recent Netscape or Internet Explorer. Mac: Most recent browsers plus Language Kits. If you cannot install Language Kits, then use Netscape 4.7.
Latin
  Font Needed: N/A
  Encodings: ISO-8859-1
  Best Browsers: All Browsers
  NOTES: There is not wide support of Latin long vowel marks. Most Web sites use Latin without long marks. If you need long marks, try encoding the page in ISO-8859-13 or Windows-1257. See the tips section on developing non-Latin-1 Web sites for more details.
Old English/Icelandic
  Font Needed: Win: N/A. Mac: Language Kit needs to be installed.
  Encodings: ISO-8859-1
  Best Browsers: Win: Recent Netscape or Internet Explorer. Mac: Internet Explorer 5, Netscape 6 (ð and þ may be displayed inconsistently).
Portuguese
  Font Needed: N/A
  Encodings: ISO-8859-1
  Best Browsers: Most Browsers
Russian
  Font Needed: Win: Recent versions of Times New Roman, Arial, Tahoma, Arial Unicode. Mac: Geneva CY, Times CY, Helvetica CY, Monaco CY, PrimaProj, Latinskij, etc.
  Encodings: Windows-1251, KOI-8
  Best Browsers: Win: Most Browsers. Mac: Most recent browsers plus Language Kits. If you cannot install Language Kits, then use Netscape 4.7.
Spanish
  Font Needed: N/A
  Encodings: ISO-8859-1
  Best Browsers: Most Browsers
Swahili
  Font Needed: N/A
  Encodings: ISO-8859-1
  Best Browsers: All Browsers
This list contains other languages not necessarily taught at Penn State. It is not
complete, by any means. If anyone has a question about a particular language,
please send an e-mail to Elizabeth Pyatt (ejp10@psu.edu).
Other Languages
Cyrillic
  Font Needed: Win: Recent versions of Times New Roman, Arial, Tahoma, Arial Unicode. Mac: Geneva CY, Times CY, Helvetica CY, Monaco CY, PrimaProj, Latinskij, etc.
  Encodings: Windows-1251, KOI-8-R, KOI-8-U (Ukrainian), Others
  Best Browsers: Win: Most Browsers. Mac: Most recent browsers plus Language Kits. If you cannot install Language Kits, then use Netscape 4.7.
Central Europe
  Font Needed: Win: Recent versions of Times New Roman, Arial, Tahoma, Arial Unicode. Mac: Geneva CE, Times CE, Helvetica CE, etc.
  Encodings: ISO-8859-2 (Latin 2), Windows-1250, Others
  Best Browsers: Win: Most Browsers. Mac: Most recent browsers plus Language Kits. If you cannot install Language Kits, then use Netscape 4.7.
  LANGUAGES: Includes Croatian (Serbo-Croatian in Roman alphabet), Czech, Hungarian, Polish, Romanian, Slovak, Slovenian
Lithuanian and Latvian (Baltic)
  Font Needed: Win: Recent versions of Times New Roman, Arial, Tahoma, Arial Unicode. Mac: Language Kits should be installed.
  Encodings: Windows-1257, ISO-8859-13 (Latin 7), ISO-8859-4 (Latin 4)
  Best Browsers: Win: Internet Explorer 5, Netscape 4.7, Netscape 6. Mac: Internet Explorer 5, Netscape 6.
Thai
  Font Needed: Win: Arial Unicode, AngsanaNew, Tahoma. Mac: Third Party font.
  Encodings: TIS-620
  Best Browsers: Win: Internet Explorer 5, Netscape 6. Mac: Netscape 6.
Turkish
  Font Needed: Win: Recent versions of Times New Roman, Arial, Tahoma, Arial Unicode. Mac: <Language Kit>
  Encodings: ISO-8859-9, Windows-1254
  Best Browsers: Most Browsers
India & Sri Lanka
  Font Needed: See each Website
  Encodings: N/A
  Best Browsers: Most Browsers
  NOTES: Many Web sites from India and Sri Lanka provide free fonts to download. Check each Web site for instructions.
Western Europe, etc.
  Font Needed: N/A
  Encodings: ISO-8859-1
  Best Browsers: Most Browsers
  LANGUAGES:
  Europe - Albanian, Basque, Catalan, Danish, Dutch, English, Faroese, Finnish, French, Gaelic (Scots), Galician, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish, Swedish. Some Baltic languages are not included.
  Elsewhere - Swahili and many Bantu languages, Hawaiian and many Polynesian languages, many native American languages, Afrikaans.
  Does NOT include Vietnamese.
Last Modified: March 9, 2002.
Using FrontPage (PC)
The Technique
Many developers use Microsoft FrontPage on a PC in conjunction with the Windows keyboard
utilities to create non-English Web sites. This is an effective tool, but care must be taken to
keep the pages compatible with computers other than PCs.
Tools Needed
Users need to have fonts compliant with the encoding.
Configuring FrontPage
Developers need a recent version of Microsoft FrontPage, the relevant fonts installed, and the
Windows keyboard for that language or script installed and activated. To configure FrontPage:
NOTE: These instructions are for the Windows version of FrontPage.
1. Follow the instructions for activating a Windows keyboard through the Regional Options
Control Panel.
2. Open a new document in FrontPage.
3. Follow the instructions for switching Windows keyboards or "Input Locales".
4. You should be able to type in the foreign script in FrontPage.
When to Use it
This is best used for extended passages of scripts such as Cyrillic, Chinese, Japanese, Korean,
Arabic, or Hebrew which are widely supported in browser preferences.
Potential Pitfalls
1. The HTML code must be inspected for extraneous or vendor-specific tags and modified
accordingly. In particular stray <FONT FACE> tags or style-sheet commands could make a file
incompatible on certain browsers and platforms. Whenever possible, avoid using any <FONT
FACE> tags or specifying fonts through a style sheet. Let the browser match the font with the
encoding.
2. The output for some scripts, such as Arabic, may not be correct. In those cases, another method is
recommended.
3. For U.S. audiences, it is best to provide instructions to users on how to configure their browsers.
4. Unfortunately, some scripts may be so undersupported that there may not be a viable encoding
system available. In these cases another option should be used.
5. FrontPage will declare the encoding with the appropriate Microsoft Windows encoding scheme
with a meta tag.
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1251"> (Cyrillic Windows)
</HEAD>
In most cases, that is a preferred encoding, but there may be occasional exceptions depending on
language or script.
NOTE: FrontPage actually encodes English Web sites not in "ISO-8859-1" (Latin-1), but in the
very similar "Windows-1252". In most cases the results will be the same, but there may be
occasional differences between the characters specified by Windows-1252 and by Latin-1.
Last Modified: March 9, 2002.
Using Dreamweaver
The Technique
If you wish to use Dreamweaver, it is suggested that you first export text from an international
word processor into an HTML file, then modify it in Dreamweaver.
Tools Needed
Users need to have fonts compliant with the encoding.
Configuring Dreamweaver
Developers need a recent version of Dreamweaver and the relevant fonts installed. You should also
configure Dreamweaver to work with a non-English HTML file.
1. Open Dreamweaver, then under the Edit menu, choose Preferences to open the Preferences
window.
2. In the Category menu to the left, select Fonts/Encoding.
3. In the Font Settings menu to the right, choose an appropriate script (e.g. "Cyrillic"). Be
careful not to choose Default Encoding.
4. Select an appropriate font which matches that script from the Proportional Font, Fixed and
HTML Inspector pull-down menus. Click OK to shut the window.
5. Open a document which is encoded in a non-English script. The characters should be in that
script, even in the HTML Source window.
When to Use it
This is best used for extended passages of scripts such as Cyrillic, Chinese, Japanese, Korean,
Arabic, or Hebrew which are widely supported in browser preferences. Dreamweaver can be
beneficial if, for some reason, you wish to avoid a Windows-encoded file.
Potential Pitfalls
1. Make sure an encoding is declared in a meta tag such as the one listed below.
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8"> (Unicode)
</HEAD>
2. The HTML code should be inspected for extraneous or vendor-specific tags and modified
accordingly. In particular stray <FONT FACE> tags or style-sheet commands could make a file
incompatible on certain browsers and platforms. Whenever possible, avoid using any <FONT
FACE> tags or specifying fonts through a style sheet. Let the browser match the font with the
encoding.
3. For U.S. audiences, it is best to provide instructions to users on how to configure their browsers.
4. For languages whose encoding systems are not widely supported by browsers, the text editor and
Dreamweaver can still be used to develop the web page, but you will need to take extra steps to
provide information on recommended browsers and fonts.
5. Unfortunately, some scripts may be so undersupported that there may not be a viable encoding
system or text editor available. In these cases another option should be used.
Last Modified: March 9, 2002.
Export HTML Text from an International Word Processor
The Technique
The ideal for developing a non-Roman site is to encode text in a standard character encoding system
for a given script. The easiest way to do that is to purchase a text editor or word processing
program designed for that script. This encoded text can then be exported as an HTML file.
Tools Needed
Users need to have fonts compliant with the encoding.
Developers need a text editor or word processor developed for a specific script which includes an
export to HTML utility. For instance, the Global Writer international word-processor allows export
into HTML for many scripts. Other script-specific word-processors such as Chinese Star may
include a similar export to HTML utility. In some cases you will need to select an appropriate
encoding for the script.
NOTE: Because of Microsoft formatting issues, export from Microsoft Word is not recommended.
When to Use it
This is best used for extended passages of scripts such as Cyrillic, Chinese, Japanese, Korean,
Arabic, or Hebrew which are widely supported in browser preferences.
Potential Pitfalls
1. Make sure the HTML file declares the encoding system at the beginning of the HTML file.
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ENCODING HERE">
</HEAD>
2. Any HTML code should be inspected for extraneous or vendor-specific tags and modified
accordingly. In particular stray <FONT FACE> tags or style-sheet commands could make a file
incompatible on certain browsers and platforms. Whenever possible, avoid using any <FONT
FACE> tags or specifying fonts through a style sheet.
Here's an example from Swarthmore of how to tweak exported Chinese text in HTML with Claris
Homepage.
3. For U.S. audiences, it is best to provide instructions to users on how to configure their browsers.
4. For languages whose encoding systems are not widely supported by browsers, the text editor can
still be used to develop the web page, but you will need to take extra steps to provide information on
recommended browsers and fonts.
5. Unfortunately, some scripts may be so undersupported that there may not be a viable encoding
system or text editor available. In these cases, other options should be used.
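Pitfall 2 above recommends removing stray <FONT FACE> tags from exported HTML. One possible way to automate that clean-up is sketched below in Python. It is a simple regular-expression pass, not a full HTML parser, and the function name strip_font_tags is made up for this example.

```python
import re

# Matches opening <font ...> tags (with any attributes) and closing </font> tags.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def strip_font_tags(markup):
    """Remove <FONT> tags but keep the text they enclose."""
    return FONT_TAG.sub("", markup)


exported = '<P><FONT FACE="Symbol">abg</FONT></P>'
print(strip_font_tags(exported))
```

The enclosed text is left untouched, so the browser can match the font to the declared encoding. Exported files with unusual markup should still be checked by hand.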
Last Modified: March 9, 2002.
Text Alignment Options
A. Right Aligned Text
Text enclosed in the tags <DIV ALIGN="RIGHT">...</DIV> will be right aligned.
HTML Code
<DIV ALIGN="RIGHT">
<P>Look for me WAAAAAY on the right.</P>
</DIV>
Result
Look for me WAAAAAY on the right.
Web sites geared towards developing Hebrew or Arabic sites discuss other strategies, but they
may not work on all browsers. One of these Web sites is
www.microsoft.com/globaldev/articles/mideast.asp.
B. Vertical Text
For purposes of parsing textual data and screen reader access, it's best to use horizontal text
whenever possible. If you absolutely need vertical text, image files or PDFs are the
best alternatives. In the near future, style sheet options, such as those discussed in the
proposed ruby system, will allow developers to generate vertical text pages for the Web.
Last Modified: March 9, 2002.
PDF Files
The Technique
PDF (Portable Document Format) files are readable and printable by all users, regardless of
what fonts may be installed on their computer. Adobe Acrobat can be used to convert files
composed in a word processor such as MS Word to PDF.
Tools Needed
Site users must have Acrobat Reader (http://www.adobe.com/products/acrobat/readstep.html), available free from Adobe.
NOTE: All Student Computing Lab machines have Acrobat Reader installed.
Developers must own Adobe Acrobat (full version), Acrobat Writer or Acrobat Distiller, all available
from Adobe. This software may be purchased from the Penn State MOC.
When to Use It
PDFs are excellent for longer documents, and can preserve a great deal of formatting and graphic
information. PDFs may be a good solution for developers with a large archive of foreign-language
documents in word-processing format. On the other hand, PDFs are not optimal for short passages.
Best of all, you can use any font in a PDF (print or Internet).
Potential Pitfalls
PDF technology is widely used and supported by the Penn State community, but here are a few
minor quirks to look out for.
1. Once files are in PDF format, they are difficult to edit. Always keep the original text file on
hand, in case you need to make changes.
2. Not all fonts are "licensed" for PDFs. In these cases you need to find another similar font
which is licensed.
3. Students with older computers may need to download Acrobat Reader in order to use your
files. Many sites warn users of the need for Acrobat Reader, then point them to
Adobe's Web site. Adobe will even let Web designers use their download graphic on a Web
site.
4. PDF files must be downloaded onto the user's machine. If they are large, they may be slow to
download over a modem connection. Many Web sites list file sizes, so users are aware of
potentially long download times.
5. Strictly speaking, PDF files do not always meet "accessibility" requirements.
Visually-impaired students with screen readers may not be able to read PDF files. For some
minority languages, this may be a moot point, but for some languages like Spanish or French,
the issue may be more critical.
Last Modified: March 9, 2002.
Using Image Files
The Technique
Use a graphic image in .gif (GIF) format of the desired text. This also allows for maximum control
over the visual appearance of a piece of text. Below are some examples of how to incorporate GIF
images.
Buttons
These buttons show "PSU" in Runic, Cyrillic and Cherokee. Buttons do NOT work.
(Author does not guarantee 100% accuracy of transliteration.)
Runic "PSU"
Cyrillic "PSU"
Cherokee "PSU"
GIF's Masquerading as Text
This list combines Latin-1 Text with Image Files. Again links are non-functional.
● PSU (This is text)
● (Image of Runic PSU)
● (Image of Cyrillic PSU)
● (Image of Cherokee PSU)
This technique is often used in sites with links to translated pages. The Non-Roman
script images have to be manually colored the same as your link color with the
underline inserted. In addition, care must be taken with the layout to preserve the
illusion of "textness", especially across platforms. See below for a not-so-good
example.
PSU |
|
|
Tools Needed
Users only need a graphical browser such as Netscape or Internet Explorer, making images nearly
universal.
Developers need a graphics program (Photoshop, Painter, file converters) with which to generate or
convert files in the .gif format.
When to Use It
This is best used for small pieces of foreign language text such as buttons, stray non-Latin-1 glyphs
or short link text. Graphic buttons are often used to point users to properly encoded Web pages or PDF
files written in the relevant language.
Potential Pitfalls
1. Remember to include the alt="text" and title="text" attributes when inserting image files into
HTML. The ALT attribute is useful for users who have turned off their graphics or who rely on
synthesized screen readers.
HTML Code
<img src="../../graphics/IPAschwa.gif">
<img src="../../graphics/IPAschwa.gif" alt="schwa" title="schwa">
Result (Graphics Enabled): (image of schwa)
Result (Graphics Disabled): -IMAGE-schwa-
2. Use .gif files instead of .jpg files, which are better suited to photographs.
3. Once files are in .gif format, the ability to edit type is lost. Always keep the original graphics file on
hand, in case you need to make changes.
4. Graphics files are not scalable. An in-text graphics file that looks fine on one platform may look out
of scale on another. Some minor adjustments may be necessary.
5. Print quality of an image file of a glyph is generally lower than a textual equivalent.
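Pitfall 1 above asks for an alt attribute on every image. A minimal Python sketch of checking a page for images that lack one follows. It uses the standard html.parser module; the class name MissingAltFinder and the function images_missing_alt are hypothetical names chosen for this illustration.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> tag that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lower-cased names
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.missing.append(attributes.get("src"))


def images_missing_alt(markup):
    finder = MissingAltFinder()
    finder.feed(markup)
    finder.close()
    return finder.missing


page = (
    '<img src="../../graphics/IPAschwa.gif">\n'
    '<img src="../../graphics/IPAschwa.gif" alt="schwa" title="schwa">'
)
print(images_missing_alt(page))
```

Only the first tag in the sample is reported, since the second carries alt text that a screen reader or a text-only browser can use.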
Last Modified: March 9, 2002.
Using the <FONT FACE> tag
The Technique
Use a print font (such as the font Symbol for Greek text) instead of an Internet font to display foreign
language material. The font is specified in HTML with the tags <FONT FACE="(font name)">
</FONT>.
This technique is not well-received because of implementation problems, the most important of
which is that if the user can't download the right font on their machine, the site is unreadable.
This technique works best for Web sites which can reasonably expect users to return. This would
include user groups, news sites and possibly class sites. For instance, some news organizations offer
free print fonts to their users so that the site is usable on all platforms.
If you like, you can view an SIL Encore phonetic transcription test file for
Macintosh or an SIL Encore phonetic transcription test file for Windows. Of course, you
will not be able to read these files until you download the SIL Encore IPA fonts
(www.sil.org/computing/fonts/encore-ipa.html) from the Summer Institute of Linguistics
(www.sil.org), available in Mac & PC formats.
Incidentally, because the SIL fonts are designed for PRINT use and not INTERNET use, the
glyph-character number mappings are different in the Mac & PC versions of the font. Therefore, I
had to create two separate files (a MAC version and a PC version), so that the right glyphs appear
in the browser.
Tools Needed
Developers MUST provide information to users on downloading and installing the print font (both
Mac & PC) in question. Even if the font is the "same" in Windows and Mac, you may have to
develop dual versions of the same text.
NOTE ON CAC LABS: It is possible for students to download the font temporarily onto a
MACINTOSH ONLY from a floppy disk. The font will be available on that single machine until the
user logs off. Users CANNOT download fonts onto PCs. Faculty can request that fonts be installed
on Student Computing Lab machines, but should be prepared to provide information about the
course and font supplier and licensing.
When to Use it
This could be a way to minimize file size or clean up images for large documents written in an
undersupported script such as Cherokee, Ogam, minority South Asian scripts, or a non-living
language.
Generally speaking, the print quality of the glyphs will be better than a .gif image file of a glyph.
Make sure your audience is willing to download the font (a free font is better). This may be used in
conjunction with larger PDF files.
Here are some Web sites which offer print fonts:
● www.info.lk/slword/swdowns.htm - Fonts for Sinhala (Sri Lanka) newsgroups
● www.perseus.tufts.edu/Help/fonthelp.html - Perseus Ancient Greek Font support
The Pitfalls
1. If the user can't download and install a font, the site is useless. Provide multiple download links, or
distribute the font on disk (to a class), if possible.
2. If you are using characters generated by keystrokes outside the ASCII range (e.g. a Windows ALT
key code or Macintosh Option key code), check to be sure the file is readable on both platforms.
You may have to develop two files.
3. Search engines and screen readers will think the Web site is Latin-1 and read it as such (resulting in
a string of nonsense characters). A separate audio file or additional text description may be needed
for visually impaired users.
4. If you provide a font that is not yours, you must read the font licensing conditions. Many can be
distributed free for non-commercial use, but there may be additional restrictions.
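Pitfall 2 can be checked before publishing. The Python sketch below is one way to do it, assuming the two platforms store text in the Mac OS Roman and Windows-1252 character sets (Python's `mac_roman` and `cp1252` codecs). The function name is hypothetical.

```python
def same_on_both_platforms(text):
    """Return True if text is stored as identical bytes on Mac and Windows.

    Assumes Mac OS Roman on the Macintosh and Windows-1252 on the PC.
    """
    try:
        return text.encode("mac_roman") == text.encode("cp1252")
    except UnicodeEncodeError:
        # The character does not exist in one of the two character sets.
        return False


print(same_on_both_platforms("cafe"))   # True: plain ASCII is safe
print(same_on_both_platforms("café"))   # False: é is a different byte on each platform

# The same byte read back on the "wrong" platform gives a different glyph.
mac_bytes = "é".encode("mac_roman")     # b'\x8e'
print(mac_bytes.decode("cp1252"))       # Ž
```

Any string for which the check returns False is a candidate for the two-file approach described above.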
Dynamic Font Technology
Some Web sites use "dynamic font" technology, in which a specific font is automatically
downloaded onto a user's computer. However, Netscape and Internet Explorer implement it
differently, and the two implementations are not cross-compatible. Typically these Web sites are
viewable in only one browser.
Here are some sites which use Dynamic fonts.
Alabama Dictionary Characters (Native American language)
Australian Phonetic Course (Netscape only)
ASCII Substitution
The Technique
Use an ASCII substitute for a non-Latin 1 glyph. For instance, Welsh Web texts replace "circumflex
w" (ŵ) with plain "w" or "w+". Similarly, many Old Irish scholars replace the "amperagus" (⁊)
symbol (the Old Irish "&" symbol) with just the number seven (7).
ASCII substitutions for phonetic symbols are very common - here's a standardized IPA phonetic
alphabet ASCII substitution key.
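A substitution scheme like this can be applied mechanically. The Python sketch below uses the two examples above as its key; the code points U+0175 (w with circumflex) and U+204A (the "and" sign) are an assumption about which glyphs are meant, and the function names are hypothetical.

```python
# Substitution key taken from the examples above; extend it for your own text.
ASCII_SUBSTITUTES = {
    "\u0175": "w+",   # Welsh w with circumflex
    "\u204a": "7",    # Old Irish "and" sign
}


def ascii_substitute(text, key=ASCII_SUBSTITUTES):
    """Replace each glyph in the key with its ASCII stand-in."""
    for glyph, substitute in key.items():
        text = text.replace(glyph, substitute)
    return text


def keymap_lines(key=ASCII_SUBSTITUTES):
    """Build a reader-facing keymap, one line per substitution."""
    return [f'"{substitute}" = "{glyph}"' for glyph, substitute in key.items()]


print(ascii_substitute("d\u0175r"))   # dw+r
print(keymap_lines())
```

Keeping the key in one table makes it easy to publish the same keymap that was used to convert the text.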
Tools Needed
Developers need a keyboard. All users will be able to see the glyph.
When to Use it
Use only with a Roman script, in cases where one or two glyphs are missing from ISO-8859-1.
Best used as a last resort when other resources fail.
Potential Pitfalls
1. Search engines and screen readers will not be able to parse it. Put a keymap (e.g. "7" =
"amperagus") on the home page explaining your substitutions.
2. You should not use this technique to replace non-ASCII glyphs (e.g. é, ¢) available through Latin 1
HTML special entity codes.
HTML - Special Entity Codes
This Web page contains lists of special entity codes needed in HTML to generate special characters such
as ñ, ¢, ÷ and other characters. Full instructions are in the "Using the Codes" section followed by lists
organized by character type. Information on
NOTE: If you are composing Web pages in an HTML editor such as Dreamweaver or FrontPage, the
programs may generate the characters based on what is typed in (check the HTML to be sure). See the
Accents section for more information on typing or inputting accents through a keyboard on a PC or
Macintosh computer.
Contents
1. Using the Codes
2. Letters with Accents - (e.g. ó, ò, ñ)
3. Other Foreign Characters - (e.g. ç, ¿, ß)
4. Currency Symbols - (e.g. ¢, £, ¥)
5. Math Symbols - (e.g. ±, °, ÷)
6. Other Punctuation - (e.g. &, ©, §)
7. Links to Other References
Using the Codes
To input non-English characters into a Web page, HTML employs a series of entity codes, each
enclosed by an & on the left side and a ; (semicolon) on the right.
HTML SPECIAL CHARACTER TEMPLATE &(code);
For example, the code for ç is "ccedil". To generate French ç in HTML, type the code
&ccedil; into your HTML document as in:
HTML - fran&ccedil;ais
Result - français
Here's another example using &cent; for ¢.
HTML - It cost 5&cent;.
Result - It cost 5¢.
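For longer passages the codes can be generated rather than typed by hand. The sketch below is one possible approach in Python, using the standard library's `html.entities.codepoint2name` table of HTML 4 entity names; the function name `to_entity_codes` is hypothetical.

```python
import html
from html.entities import codepoint2name


def to_entity_codes(text):
    """Replace each non-ASCII character with its named HTML entity code.

    Characters without a named code fall back to a numeric reference.
    """
    pieces = []
    for ch in text:
        if ord(ch) < 128:
            pieces.append(ch)
        elif ord(ch) in codepoint2name:
            pieces.append(f"&{codepoint2name[ord(ch)]};")
        else:
            pieces.append(f"&#{ord(ch)};")
    return "".join(pieces)


print(to_entity_codes("français"))        # fran&ccedil;ais
print(to_entity_codes("It cost 5¢."))     # It cost 5&cent;.
print(html.unescape("fran&ccedil;ais"))   # français
```

The last line runs the conversion in reverse, which is a quick way to confirm that a code produces the character you expect. Note that this sketch leaves ASCII characters such as & and < untouched.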
Some characters like œ (#156) are known by a number, not an entity code. For these
characters the template is:
HTML CHARACTER NUMBER TEMPLATE
&#(number);
For example, to input sœur, the French word for sister, use the following code:
HTML - s&#156;ur 'sister'
Result - sœur 'sister'
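One point worth checking when using numbers: 156 is the position of œ in the Windows-1252 character set, while its Unicode code point is 339, and HTML 4 also defines the named code &oelig;. The Python sketch below shows that the standard library's HTML parser, like current browsers, resolves all three forms to the same character. The helper `numeric_reference` is hypothetical.

```python
import html


def numeric_reference(ch):
    """Build the &#(number); form from a character's Unicode code point."""
    return f"&#{ord(ch)};"


# 156 is remapped by the parser for compatibility; 339 is the Unicode value.
print(html.unescape("s&#156;ur"))    # sœur
print(html.unescape("s&#339;ur"))    # sœur
print(html.unescape("s&oelig;ur"))   # sœur

print(numeric_reference("œ"))        # &#339;
print("œ".encode("cp1252"))          # b'\x9c' (0x9c is 156)
```

If a number in the range 128 to 159 misbehaves in a particular browser, the Unicode number or the named code may be a safer choice.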
Letters with Accents
This list is organized by accent type. To determine the appropriate entity code, match the
accent with the vowel. The general template for each accent is given in the left column. For
instance, &Vcirc; means that all the entity codes for vowels with circumflex accents contain
"circ" as part of the code.
Example 1: To input circumflex â (&acirc;) in HTML, type in &acirc;.
Example 2: To input circumflex ô (&ocirc;) in HTML, type in &ocirc;.
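The template idea, a letter followed by an accent suffix, can be expressed in a few lines of Python. This is only a sketch: the dictionary and function names are hypothetical, and the suffixes are the standard HTML entity spellings (for example "acute" and "circ").

```python
import html

# Suffix for each accent type covered in this section.
ACCENT_SUFFIX = {
    "acute": "acute",
    "circumflex": "circ",
    "grave": "grave",
    "tilde": "tilde",
    "umlaut": "uml",
}


def accent_code(letter, accent):
    """Build the entity code for a letter plus accent, e.g. ('o', 'circumflex')."""
    return f"&{letter}{ACCENT_SUFFIX[accent]};"


code = accent_code("o", "circumflex")
print(code)                                       # &ocirc;
print(html.unescape(code))                        # ô
print(html.unescape(accent_code("E", "acute")))   # É
```

Not every letter and accent combination has a code (the tilde, for instance, is listed here only for a, n and o), so it is worth decoding the result as in the last two lines before relying on it.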
Accent (template)      a/A          e/E          i/I          o/O          u/U
Acute (&Vacute;)       á &aacute;   é &eacute;   í &iacute;   ó &oacute;   ú &uacute;
                       Á &Aacute;   É &Eacute;   Í &Iacute;   Ó &Oacute;   Ú &Uacute;
Circumflex (&Vcirc;)   â &acirc;    ê &ecirc;    î &icirc;    ô &ocirc;    û &ucirc;
                       Â &Acirc;    Ê &Ecirc;    Î &Icirc;    Ô &Ocirc;    Û &Ucirc;
Grave (&Vgrave;)       à &agrave;   è &egrave;   ì &igrave;   ò &ograve;   ù &ugrave;
                       À &Agrave;   È &Egrave;   Ì &Igrave;   Ò &Ograve;   Ù &Ugrave;
Tilde (&Vtilde;)       ã &atilde;   ñ &ntilde;                õ &otilde;
                       Ã &Atilde;   Ñ &Ntilde;                Õ &Otilde;
Umlaut (&Vuml;)        ä &auml;     ë &euml;     ï &iuml;     ö &ouml;     ü &uuml;
                       Ä &Auml;     Ë &Euml;     Ï &Iuml;     Ö &Ouml;     Ü &Uuml;
If you are having problems inputting these codes, please review the instructions for using the
codes at the top of this Web page.
Other Foreign Characters
Example 1: To generate the upside-down question mark ¿, type &iquest; into the HTML
code.
Example 2: To generate French oe ligature œ, type &#156; into the HTML code.
SYMBOL    CODE                   NOTES
¡         &iexcl;
¿         &iquest;
ç, Ç      &ccedil;, &Ccedil;
œ, Œ      &#156;, &#140;
ß         &szlig;
ø, Ø      &oslash;, &Oslash;
å, Å      &aring;, &Aring;
æ, Æ      &aelig;, &AElig;
þ, Þ      &thorn;, &THORN;
ð, Ð      &eth;, &ETH;
« »       &laquo;, &raquo;       These are Spanish-style quote marks.
Currency Symbols
Example: To generate the cent sign ¢, type &cent; into the HTML code.
SYMBOL    CODE        NOTES
¢         &cent;
£         &pound;     British Pound
¥         &yen;       Japanese Yen
¤         &curren;    Generic currency symbol
Math Symbols
Example: To generate the division sign ÷, type &divide; into the HTML code.
SYMBOL    CODE        NOTES
÷         &divide;
°         &deg;       Degree symbol
¬         &not;       Not symbol
±         &plusmn;
µ         &micro;     Micro
Other Punctuation
Example 1: To generate the "and" symbol & (&amp;), type in &amp;.
Example 2: To generate the string &amp; in HTML, type &amp;amp;.
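The two examples above follow one rule: every & that should appear literally is written as &amp;. Python's standard `html.escape` function applies that rule, along with the codes for < and >, so it can serve as a quick check. The wrapper name `show_literally` is hypothetical.

```python
import html


def show_literally(text):
    """Return the HTML source needed to display text exactly as written."""
    return html.escape(text, quote=False)


print(show_literally("Fish & Chips"))   # Fish &amp; Chips
print(show_literally("&amp;"))          # &amp;amp;
print(show_literally("<b>"))            # &lt;b&gt;
print(html.unescape("&amp;amp;"))       # &amp;
```

The last line decodes the result of Example 2 once, which gives back the string a browser would display.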
SYMBOL           CODE       NOTES
(blank space)    &nbsp;     Inserts a (non-breaking) blank space
>                &gt;
<                &lt;
&                &amp;
"                &quot;     Regular quotes are fine, but avoid "Smart Quotes"
©                &copy;
®                &reg;
™                &#153;     Trademark
¶                &para;     Paragraph Symbol
•                &#149;     List Dot
§                &sect;     Section Symbol
–                &#150;     en-dash
—                &#151;     em-dash
Links to External Reference Pages
Ian S. Graham (Wiley) www.wiley.com/legacy/compbooks/graham-quin/html4ed/appa/en_test.html
Webmonkey - hotwired.lycos.com/webmonkey/reference/special_characters/
Avoid the first set of entries ("left single quote" to "trademark sign") - these are
not widely supported across browsers