Saturday 19 April 2014

# Specifying the document's character encoding

There are several ways to specify which character encoding is used in the document. First, the web server can include the character encoding or "charset" in the Hypertext Transfer Protocol (HTTP) Content-Type header, which would typically look like this:
```
Content-Type: text/html; charset=ISO-8859-1
```
This method gives the HTTP server a convenient way to alter the document's encoding according to content negotiation; certain HTTP server software can do this, for example Apache with the module mod_charset_lite.
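As a minimal sketch of the server side of this mechanism, the following uses Python's standard `http.server` to serve an ISO-8859-1 document and declare its charset in the Content-Type header, much as a module like mod_charset_lite would do on the operator's behalf. The handler name and document body are invented for illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical handler: serves a short ISO-8859-1 document and labels the
# encoding in the Content-Type header so the client need not guess.
class CharsetHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = "Caf\u00e9".encode("iso-8859-1")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=ISO-8859-1")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), CharsetHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/") as resp:
    charset = resp.headers.get_content_charset()  # parsed from the header
    text = resp.read().decode(charset)
server.shutdown()
```

A client that honors the header, as above, can decode the body without inspecting the document itself.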
For HTML it is possible to include this information inside the head element near the top of the document:
```
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
```
HTML5 also allows the following syntax to mean exactly the same:
```
<meta charset="utf-8">
```
XHTML documents have a third option: to express the character encoding via an XML declaration, as follows:
```
<?xml version="1.0" encoding="ISO-8859-1"?>
```
<meta http-equiv="Content-Type"> may be interpreted directly by a browser, like an ordinary HTML tag, or it may be used by the HTTP server to generate corresponding headers when it serves the document. Under HTTP/1.1, the Content-Type header for an HTML document must label an appropriate encoding; a missing charset= parameter results in the assumption of ISO-8859-1 (so HTTP/1.1 formally offers no such option as an unspecified character encoding), and this header supersedes any HTML (or XHTML) meta element declaration. This can pose a problem if the server generates an incorrect header and one does not have the access or the knowledge to change it.
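The strict HTTP/1.1 precedence rule just described can be sketched as a small helper. The function name is ours, and this reflects only the letter of the protocol, not what real browsers do in practice: the header's charset parameter wins, and in its absence ISO-8859-1 is assumed, regardless of any meta declaration inside the document.

```python
# Sketch of the strict HTTP/1.1 rule: the Content-Type header's charset
# parameter determines the encoding, and a missing parameter means
# ISO-8859-1. Any <meta> declaration in the document body is ignored.
def http11_encoding(content_type: str) -> str:
    for part in content_type.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        if name.lower() == "charset" and value:
            return value.strip('"')
    return "ISO-8859-1"  # HTTP/1.1 default when charset= is missing
```

For example, `http11_encoding("text/html")` yields `"ISO-8859-1"` even if the document carries `<meta charset="utf-8">`.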
As each of these methods tells the receiver how the file being sent should be interpreted, it would be inappropriate for these declarations not to match the actual character encoding used. Because a server usually does not know how a document is encoded (especially if documents are created on different platforms or in different regions), many servers simply omit the "charset" from the Content-Type header, thus avoiding making false promises. However, if the document does not specify the encoding either, the result may be equally bad: the user agent displays mojibake because it cannot determine which character encoding was used. Owing to widespread and persistent neglect of the HTTP charset= parameter on the server side, the World Wide Web Consortium has become disillusioned with HTTP/1.1's strict approach and encourages browser developers to apply workarounds in violation of RFC 2616.
If a user agent reads a document with no character encoding information, it can fall back to using some other information. For example, it can rely on the user's settings, either browser-wide or specific to a given document, or it can pick a default encoding based on the user's language. For Western European languages, it is typical and fairly safe to assume Windows-1252, which is similar to ISO-8859-1 but has printable characters in place of some control codes. The consequence of choosing incorrectly is that characters outside the printable ASCII range (32 to 126) usually appear incorrectly. This presents few problems for English-speaking users, but other languages regularly (in some cases, always) require characters outside that range. In CJK environments, where several different multi-byte encodings are in use, auto-detection is also often employed. Finally, browsers usually permit the user to override an incorrect charset label manually as well.
It is increasingly common for multilingual websites and websites in non-Western languages to use UTF-8, which allows use of the same encoding for all languages. UTF-16 or UTF-32, which can be used for all languages as well, are less widely used because they can be harder to handle in programming languages that assume a byte-oriented ASCII superset encoding, and they are less efficient for text with a high frequency of ASCII characters, which is usually the case for HTML documents.
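The efficiency point can be checked directly by encoding a short ASCII-heavy snippet in each Unicode encoding (the little-endian variants are used here so that no byte-order mark inflates the counts):

```python
markup = "<p>Hello</p>"  # 12 ASCII characters, typical of HTML markup

utf8 = markup.encode("utf-8")       # 1 byte per ASCII character
utf16 = markup.encode("utf-16-le")  # 2 bytes per character
utf32 = markup.encode("utf-32-le")  # 4 bytes per character
```

For this snippet UTF-8 needs 12 bytes, UTF-16 needs 24, and UTF-32 needs 48, which is why ASCII-heavy HTML roughly doubles or quadruples in size under the wider encodings.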
Successful viewing of a page is not necessarily an indication that its encoding is specified correctly. If the page's creator and reader are both assuming some platform-specific character encoding, and the server does not send any identifying information, then the reader will nonetheless see the page as the creator intended, but other readers on different platforms or with different native languages will not see the page as intended.


