
Operation of Windows 10



Character capture and output in Windows 10

1. Problem: character sets with diacritics 2. Latin writing system 3. Character set basics 4. Entering special letters
5. Save text with special letters 6. 8-bit character sets 7. Unicode and UTF 8. Appendix I: Hex Numbers 9. Appendix II: Links


This treatise explores word processing in the pan-European writing space and collects tips on entering and storing texts. It mainly concerns special letters with diacritics that cannot be typed directly on the keyboard; here is a selection:

à á â ä å ã ă æ ą ć č ç ď đ è é ê ë ĕ ę ğ ì í î ï ı ľ ł ń ň ñ ö ò ó ô õ ő œ ø ŕ ř ś ş ș ť ț ù ú û ü ű ů ý ÿ ź ż .

If you are just looking for a quick way to enter such characters and want to skip the background information, activate the additional on-screen keyboard described in 4.6 Virtual PC keyboards.

When you type a character string ("text") on the keyboard, the keyboard sends a corresponding chain of scancodes to the keyboard driver in the PC, which passes the associated characters on to the application. The result depends on which code page is active in Windows, because the code page determines which letters or special characters are meant. And it depends on the font, because the font determines how the glyphs of the individual characters are shaped.

This character string / text is then usually echoed on the screen and / or sent to the printer; special transmission codes are again involved. But especially when the text is saved, a special storage code comes into play; the text file consists, among other things, of its code numbers / code points. These code points are listed in code pages.

In Windows, three code pages / character sets are available by default, each with a different character repertoire! The most important is the 8-bit ANSI character set CP 1252 (also: Windows Western / Windows-1252). It can be exchanged for those of other country groups if necessary (CP 1250 - CP 1258). Most Windows character input also uses the old 8-bit character set CP 850/858 (MS-DOS Latin), and supposedly even the very old CP 437 (IBM PC) is still used in the DOS window cmd.exe. But internally, Windows always works with 16-bit Unicode (UTF-16); CP 65001 is the code page number of its 8-bit form UTF-8.

Many characters are encoded differently in each code page. So which one takes effect when? In addition, over 20 fonts are installed in Windows by default. For almost every code point of a code page, every font has a matching glyph / letter shape, but not for every one.


 
   1. Problem: character sets with diacritics

In view of the "Euro Payments Area", an 8-bit character set is really no longer sufficient for official documents even in (Western) Europe, especially since it contains not only some non-printable control characters but also many non-letters such as punctuation marks, currency symbols, etc. Thus the ANSI character set, despite its 256 characters, offers "only" approx. 120 letters.

But of these, only 29 + 29 letters are directly accessible via keys on a German QWERTZ keyboard, plus a few more with diacritics via dead keys (see below). Even the French ç and the Spanish ñ are not available in this way. And e.g. Łódź, Tomáš, Nový Špičák or Křišťanov cannot be entered / represented correctly either.

The English alphabet has only the 26 basic letters (more precisely 26 uppercase + 26 lowercase letters). It has no special letters whatsoever and thus corresponds to the Latin alphabet of the Renaissance. This forms the basis of the Latin writing system (see below).

Besides the basic letters, other alphabets of the Latin writing system have letters with diacritical marks, as well as ligatures, digraphs and genuine special forms. German, for instance, has the umlauts Ää Öö Üü and the Eszett ẞß, and in proper names / foreign words the ë (= e with trema as a mark of a diaeresis); compare e.g. Piëch (proper name) and Pietät (foreign word), both pronounced i-e, with piesacken, pronounced with a long i. Thus there are 26 + 4 capital letters in German (ë not counted); in French, for example, there are even 26 + 16.

In order to do justice to this diversity, German civil status law has used the special character set string.latin since 2012. It currently comprises over 400 glyphs (letters) and is even to be expanded to 600 glyphs.

Using string.latin, the German registration authorities can indeed write European names correctly on ID cards and passports. But there are two limitations: the length limit on the small card and the needs of border control in international exchange.

On the identity card, the surname and first names are written in the "clear text field" on the front as they appear on the (national) birth certificate, but only in one line for the surname and one for the first names, each approx. 28 characters long. In the machine-readable zone on the back (e.g. for border traffic), there is even only space for 30 characters for surname and first names together, and the OCR-B font used there only includes the English alphabet. So if necessary, two different names must be shown on the identity card!

By the way: because of the length limit, the passport authority responsible for "Karl-Theodor Maria Nikolaus Johann Jacob Philipp Franz Joseph Sylvester Buhl-Freiherr von und zu Guttenberg" is likely to have problems with his ID card; for him, the field for religious and stage names would have to be used, with "Karl-Theodor von und zu Guttenberg".


 
   2. Latin writing system

One distinguishes phonographic, pictographic and logographic scripts, or alphabetic scripts, syllabaries and word scripts, or ... (see https://de.wikipedia.org/wiki/Schrift).

The Latin writing system is the most widespread writing system in the world: the alphabets of over 60 countries are derived from it with adaptations (including e.g. Vietnamese; see below) and are called Latin alphabets or Latin scripts.

In addition to the 26 Latin basic forms / basic letters, the Latin writing system includes over 90 more special letters, each in upper and lower case (so many more than listed in the introduction!). The individual letters of this letter system represent the phonemes of the respective language; the same letter can represent different phonemes in different languages, and a letter with different diacritics is also pronounced differently within one language. See also https://de.wikipedia.org/wiki/Lateinisches_Schriftsystem.

The special letters are divided into basic letters with diacritical marks (ä ç č é è ê ö ü ữ ...), ligatures (æ œ ...), digraphs (ij nj dz…) and special forms (ð þ ß…), although the German ß is actually also a ligature, as its name Eszett reflects.

The diacritical marks are combining characters; the results of such combinations are combined characters.

Nice examples of the use of diacritics are:
Cộng hòa Xã hội chủ nghĩa Việt Nam (Socialist Republic of Vietnam) with Đà Nẵng (city in Central Vietnam),
Boğazlıyan, İmamoğlu, Ilıca and Şarkikaraağaç (Turkish cities).
Vietnamese is certainly not part of the pan-European writing space, but Turkish, in the broader sense, is.

In 2006 Bernd Kappenberg provided a complete compilation of the pan-European Latin character set (without Greek and Cyrillic); for details see http://www.medienssprache.net/networx/networx-49.pdf. It has 361 letter glyphs plus the digits, punctuation marks, some mathematical characters, etc. See also https://de.wikipedia.org/wiki/Liste_lateinischer_Alphabete.

The letters are always arranged (numbered) in a sequence that serves as the rule for sorting. But unfortunately the special forms are classified differently in each alphabet. For example, Ö is always sorted as the last letter of the alphabet in Swedish, whereas in German it is sorted after O (i.e. after Oz), or in encyclopedias as equivalent to O, or in phone books as equivalent to OE.
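The different placements of Ö can be tried out with a small sketch. It only assumes plain Python sort keys, not a real collation library; the function names, the translation tables and the word list are invented for this illustration.

```python
# Hypothetical sort keys illustrating three sorting conventions for umlauts.
# A real application would use a proper collation library instead.

ENCYCLOPEDIA = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
PHONE_BOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def encyclopedia_key(word):
    """German encyclopedia style: ö is treated like o."""
    return word.lower().translate(ENCYCLOPEDIA)


def phone_book_key(word):
    """German phone book style: ö is treated like oe."""
    return word.lower().translate(PHONE_BOOK)


def swedish_key(word):
    """Swedish style: å ä ö are separate letters after z."""
    return [SWEDISH_ALPHABET.find(ch) for ch in word.lower()]


words = ["Zebra", "Oz", "Öl", "Ofen"]
by_encyclopedia = sorted(words, key=encyclopedia_key)
by_phone_book = sorted(words, key=phone_book_key)
by_swedish = sorted(words, key=swedish_key)

print(by_encyclopedia)
print(by_phone_book)
print(by_swedish)
```

The same four words end up in three different orders, which is exactly the problem described above.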

In professional letterpress printing, every occurring letter, every punctuation mark and every space is "cut" as an individual piece of type or cast in lead. For aesthetic and technical reasons there are ligatures, e.g. besides the letters f and l also the ligature ﬂ as a single piece of type. In German, however, ﬂ is not a letter of the alphabet (so it has no place number of its own) and is therefore sorted as f l, in contrast to e.g. œ in the French alphabet, which is a letter with a place number there.


 
   3. Character set basics

"The character (English: character) is the abstract idea of a letter, the glyph is its concrete graphic representation. Electronic texts like this one are stored as abstract characters, and their appearance depends on the chosen font. In the simplest case there is a single glyph for each visible character [...] in a certain font (typeface and size)." Quoted from https://de.wikipedia.org/wiki/Glyphe.

"A character set is understood to be a stock of elements, called characters, from which strings are composed. Such elements can include the letters of an alphabet, digits, but also other symbols such as special characters, […] or control characters. A character set is less than a character code, which must additionally contain a defined numbering of the characters of the character set. [...]." Quoted from https://de.wikipedia.org/wiki/Zeichsatz.


The basis for character sets is, logically, alphabets that are arranged in code tables (character codes). In Unicode, for example, the tables consist of the letters including an arrangement / numbering and a description of the letter shape and its classification. The example shown alongside is taken from LATIN CHARACTERS IN UNICODE.

Such tables are always "numbered" through their row structure and the respective "place" in the table. Since these tables are mostly excerpts from larger tables, the actual character number, the code point, must be stated (usually in hexadecimal).

Since the character printed in these tables (the glyph) itself requires a font for its representation, the basic form must be described / paraphrased in a standardized way (column Name). Typographically, in addition to the code table, information is still needed about the font, such as Arial (without serifs) or Times New Roman (with serifs), the font style, such as upright / normal, italic or bold, and the font size, such as 12 pt or 15 pt.

In the table above you can see that the letters are arranged without gaps according to their code points, but there are many more letters derived from the Latin alphabet than those listed there. Take the 12 special letters in the table excerpt that belong to the 5 basic letters a - e, and compare them with à á â ä å ã ă æ ą ć č ç ď đ è é ê ë ĕ ę: that is already 20 special letters, and in total there are probably at least 36. As a result, many letters are widely scattered across the code table.

Therefore one depends on "aids" to enter special letters.


 
   4. Entering special letters

4.1 typewriter 4.2 Letterpress 4.3 Hardware PC keyboards 4.4 Character tables charmap.exe   4.5 Character table symbol
4.6 Virtual PC keyboards 4.7 Photo typesetting 4.8 Unicode and Diacritics 4.9 Conclusion


If you are just looking for a quick input method and want to skip the background information, activate the additional on-screen keyboard described in 4.6 Virtual PC keyboards.
4.9 Conclusion summarizes the findings of Chapter 4.

 

   4.1 typewriter

If you type ´ ` e in sequence on a normal typewriter, the carriage stops at ´ and ` and only moves on after the e, so that an ê is written. The accent keys are so-called dead keys, and the ê is then a combined letter.

Old typewriters only had ´ and `, and thus indirectly also ^. But you could give any letter an accent and thus create nonsensical combined letters.

 

   4.2 Letterpress

In letterpress printing (with lead type), there can be no combined letters (that is, ones made up of a basic letter and possibly several additional combining characters), only "very many" complete letters, which have to be picked out of type cases and set into strings.

 

   4.3 Hardware PC keyboards

QWERTY keyboards MS-Word: keyboard shortcuts MS-Word / WordPad / Notepad: three-digit alt numbers
MS-Word / WordPad / Notepad: four-digit alt numbers MS-Word: Unicode

Working with dead keys was (partly) carried over to computer keyboards, and in some cases even extended there. The range of use depends heavily on the respective keyboard driver, and even more on the writing software and the font used.

 

   QWERTY keyboards

On German QWERTZ keyboards, usually only ´ ` ^ work as dead keys, i.e. they do not advance the cursor. They work with the vowels including y and thus deliver á à â é è ê í ì î ó ò ô ú ù û ý ; but French, for example, needs à â æ ç è é ê ë î ï ô œ ù û ü ÿ.

Other characters that would actually be suitable, ° ' " ~ , ; . : / - , are ruled out because they have to advance the cursor in the normal flow of writing. Only specially programmed editors can use them and other characters, but then sometimes in an adventurous way. In Word, for example, "CTRL + ALT + SHIFT + ?" is supposed to produce the inverted Spanish question mark ¿, so four keys have to be pressed at the same time (and it actually works if you distribute your fingers "correctly"!).

The "ALT GR" key is particularly "funny": on all keyboards it delivers the third key assignments usually printed on the key caps, ² ³ { [ ] } \ @ € ~ | µ . But in Word it generates four further characters, among them ® and ©, and in practically all editors / writing programs special program functions are triggered by some keys!

See also
https://de.wikipedia.org/wiki/Tastaturbelichtung and
https://support.microsoft.com/de-de/help/17073/windows-using-keyboard.

Some foreign keyboards have a special Compose key with which combined characters can be entered.

 

   MS-Word: keyboard shortcuts

Preliminary remark:
The key with the two key-cap labels 6 & produces the character 6 and the key combination shift + 6 & produces the character &, but the N key actually produces n, and only shift + N produces N. This is common practice, but not entirely logical.
Therefore, in the following paragraphs, n is written for n and N for N.
Furthermore, + denotes pressing two or more keys simultaneously and , denotes pressing two or more keys one after the other; the shift and alt gr keys are not listed separately.
- So ctrl + & means that three keys are to be pressed at the same time, namely ctrl + shift + 6 (on the main keyboard).
- In addition, 1, 6 means that the digit keys of the numeric keypad are to be pressed one after the other.
- For example, alt + 1, 6, 9 means that the specified digits must be pressed one after the other on the numeric keypad while alt is held down
(short form: Alt number 169).


For prolific writers, MS-Word has clever keyboard shortcuts with dead keys; e.g. ctrl + &, A produces the letter Æ, ctrl + ,, C the letter Ç and ctrl + :, e the letter ë.
In this sense, the keyboard shortcut alt + ctrl + shift + ? has no dead keys, because all keys involved have to be pressed at the same time to obtain ¿.

The individual dead keys and key combinations are listed at https://support.office.com/de-de/article/tastenkombinationen-für-internationale-zeichen; but unfortunately they do not work on all PCs the way MS describes.

And unfortunately the Word keyboard shortcuts are partly inconsistent with one another, probably because they have grown over time. The keys 2 3 7 8 9 0 ß q + < carry three assignments, which are printed on the key caps: triggered 1. by the key alone, 2. by shift + key or 3. by alt gr + key, e.g. 2 " ², e E €, m M µ or + * ~ .
So ctrl + &, A results in the letter Æ, as described above, and ctrl + &, a in æ. Then ctrl + ~, N ought to yield the letter Ñ, but it doesn't!
While & and A are always pressed as the key combination shift + key, also in connection with ctrl, this does not apply to ~ ; here the key combination ctrl + alt + +, N is required to obtain Ñ, instead of ctrl + alt gr + +, N, which would be logical, and also instead of ctrl + shift + +, N as MS claims.

So it's not worth it to memorize these keyboard shortcuts. There are simpler ways to get there, as the following sections will show.


In addition to these keyboard shortcut tricks for Word, the MS editors offer three more methods for entering special characters. For historical reasons, each uses a different code page, with the result that the same code may produce three different results depending on the input method:

 

   MS-Word / WordPad / Notepad: three-digit Alt numbers

On keyboards with a numeric keypad (i.e. not notebooks!) you can hold down alt and type numbers of at most three digits, without leading zeros, on the numeric keypad. For example, alt + 1, 5, 6 produces the character £ and alt + 9, 9, 9 the character ϧ (Word, WordPad) or þ (Notepad).

The character £ represents the "pound sign" (Unicode 00A3), the character ϧ the "coptic small letter khei" (Unicode 03E7) and the character þ the "latin small letter thorn", Icelandic, (Unicode 00FE).

According to MS, the three-digit Alt numbers are interpreted as ANSI codes, i.e. they should produce the characters of the 8-bit ANSI code page. Then alt + 1, 6, 9 would have to produce the character ©, but it produces ®, and that belongs to the "old" code page MS-DOS Latin CP 850.

Therefore alt + 1 to alt + 3, 1 produce the characters ☺ ☻ ♥ ♦ ♣ ♠ • ◘ ○ ◙ ♂ ♀ ♪ ♫ ☼ ► ◄ ↕ ‼ ¶ § ▬ ↨ ↑ ↓ → ← ∟ ↔ ▲ ▼,
although the ANSI code page only has control characters at these positions. See also https://de.wikipedia.org/wiki/Codepage_850#Codepage_858

As is well known, the 8-bit code page 850 only includes the 256 code points 0 ... 255. If you enter a larger three-digit alt number, you get different results depending on the program:

In Notepad / Editor, for example, the Alt numbers 100, 356, 612 and 868 all produce the same result, namely d (code point 100), because 356 = 100 + 256, 612 = 100 + 2 × 256 and 868 = 100 + 3 × 256; counting starts all over again after each multiple of 256 (modulo 256), and only the code page CP 850 is used.

It is different with Word and WordPad. Both deliver 100 = d,  356 = Ť,  612 = ɤ and 868 = ͤ ; typed one after the other, the latter two yield the combined character ɤͤ in Word, but not in WordPad, where the combination is only "half" successful: ɤ ͤ.

d is the "latin small letter D" (Unicode 0064, dec 100), Ť is the "latin capital letter T with Caron" (Unicode 0164, dec 356), ɤ is the "latin small letter rams horn" (Unicode 0264, dec 612) and ͤ is the "combining latin small letter E" (Unicode 0364, dec 868).

Both programs use CP 850/858 "at the bottom" and continue "seamlessly" with Unicode "above", i.e. from code point 256.

Now it becomes clear why the alt number 999 delivers the two results þ (Notepad) and ϧ (Word, WordPad).
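The behavior described for three-digit Alt numbers can be reproduced with a minimal sketch. The two function names are invented here, and the mapping rules are only the ones stated in this section, not a documented Windows interface. Note that Python's cp850 codec maps the values 1 to 31 to control characters, so the symbols ☺ ☻ ♥ ... are not reproduced by this sketch.

```python
def alt3_notepad(number):
    """Notepad as described above: number modulo 256, looked up in CP 850."""
    return bytes([number % 256]).decode("cp850")


def alt3_word(number):
    """Word / WordPad as described above: CP 850 below 256, Unicode from 256 on."""
    if number < 256:
        return bytes([number]).decode("cp850")
    return chr(number)


for number in (100, 356, 612, 868, 999):
    print(number, repr(alt3_notepad(number)), repr(alt3_word(number)))
```

For 999 the remainder is 999 - 3 × 256 = 231, which is þ in CP 850, while the Unicode code point 999 (hex 03E7) is ϧ.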

 

   MS-Word / WordPad / Notepad: four-digit Alt numbers

On keyboards with a numeric keypad (i.e. not notebooks!) you can hold down alt and type numbers of exactly four digits, with leading zeros if necessary, on the numeric keypad. For example, alt + 0, 1, 5, 6 produces the character œ and alt + 0, 9, 9, 9 the character ϧ (Word, WordPad) or ç (Notepad).

The character œ represents the "latin small ligature Oe" (Unicode 0153), the character ϧ the "coptic small letter khei" (Unicode 03E7) and the character ç the "latin small letter C with cedilla" (Unicode 00E7).

All three programs use the 8-bit code page CP 1252 here; Word and WordPad add Unicode "above", but Notepad again works exclusively with CP 1252, i.e. "modulo 256".

alt + 0, 0, 0, 1 to alt + 0, 0, 3, 1 do not generate printable characters, but control characters without a glyph, such as backspace, line end, line feed or page feed, because the ANSI code page CP 1252 has kept the original ASCII meanings for code points 0 … 31.

The largest number that can be entered via the numeric keypad, alt + 9, 9, 9, 9, produces the symbol ✏ "pencil" (Unicode 270F, dec 9999) in Word and WordPad and the symbol ☼ in Notepad (Unicode 263C {= dec. 9788}, dec 15 {9999 = 39 × 256 + 15}).

 

   MS-Word: Unicode

In Word you can type exactly four hex digits, with leading zeros if necessary, on either the main keyboard or the numeric keypad and then immediately press alt + c to retrieve Unicode characters. For example, 0, 1, 5, 6, alt + c produces the character Ŗ ("latin capital letter R with cedilla", Unicode 0156, dec. 342) and 0, 9, 9, 9, alt + c the character ঙ ("bengali letter nga", Unicode 0999, dec. 2457).

2, 7, 0, F, alt + c naturally generates the symbol ✏ "pencil" (Unicode 270F, dec. 9999), just like the Alt number 9999.
The largest four-digit number made of decimal digits that can be entered this way in Word is 9, 9, 9, 9, alt + c; it returns 香, the "CJK Unified Ideograph-9999" (Unicode 9999, dec. 39321) in the script "Han Chinese (Hani)".

0, 0, 0, 1, alt + c to 0, 0, 1, 6, alt + c do not produce any printable characters.
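The difference between reading the same digits as a decimal Alt number and as a hexadecimal Unicode number can be checked with a short sketch. The function name is invented for this illustration; the code only uses Python's standard Unicode database and does not simulate Word itself.

```python
import unicodedata


def word_hex_entry(digits):
    """Interpret four typed digits as a hexadecimal Unicode code point."""
    return chr(int(digits, 16))


for digits in ("0156", "0999", "270F", "9999"):
    char = word_hex_entry(digits)
    # Print the typed digits, the character, its decimal value and its name.
    print(digits, char, int(digits, 16), unicodedata.name(char))

# The decimal number 9999 is a different code point than hexadecimal 9999.
print(chr(9999), unicodedata.name(chr(9999)))
```

Decimal 9999 is hexadecimal 270F (the pencil), whereas hexadecimal 9999 is decimal 39321 (the ideograph).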

The complete Unicode is available at https://unicode-table.com/de/. The code points from 0 0000 to E 01EF (hexadecimal; E 01EF = 917,999 decimal) can be called up there; the browser does not provide a valid glyph for every one.
See also Unicode and diacritics and Unicode and UTF.


For most special characters you have to know the respective code number; to make matters worse, the same code number gives different results depending on the input method; for example, the code number 0156 yields one of the characters £, œ or Ŗ depending on the input. That this historically grown confusion does not upset anyone is probably because nobody knows the many codes anyway and everyone resorts to the solution described below.
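The three readings of the code number 0156 can be reproduced in a few lines. This is only a sketch under the assumptions stated in the previous sections (CP 850 for three-digit Alt numbers, CP 1252 for four-digit Alt numbers, hexadecimal Unicode for alt + c); the function name and the dictionary keys are invented for the illustration.

```python
def interpretations(code):
    """Return the three readings of a code number described in this chapter."""
    value = int(code)  # decimal reading, leading zeros are ignored
    return {
        "three-digit Alt number (CP 850)": bytes([value]).decode("cp850"),
        "four-digit Alt number (CP 1252)": bytes([value]).decode("cp1252"),
        "hex digits + alt + c (Unicode)": chr(int(code, 16)),
    }


result = interpretations("0156")
for method, char in result.items():
    print(f"{method}: {char}")
```

The sketch only works for values below 256 that are defined in both code pages; it is not meant as a general converter.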

One way out, for all editors in Windows, is the tool Windows accessories > Character map (charmap.exe); another, in MS Office, is the function ... > insert > symbol. These two tools simulate letterpress printing: the letters / glyphs are picked out of a large type case and set into strings.

 

   4.4 Character tables charmap.exe in Windows

In charmap.exe (see right), first select the font and then, in the Advanced view, the 8-bit character set / code page (marked green). In addition to the character set Windows: Western, you can select, among others, Windows: Baltic, Windows: Central Europe and Windows: Turkish. This (almost) covers the pan-European writing space.

If you click on a letter in the table, it is displayed enlarged, and below it appear its Unicode code point in four-digit hexadecimal Unicode notation (16 bit), then in brackets its code point in the current Windows character set in the usual two-digit hexadecimal notation (8 bit), then the letter description and, where applicable, on the far right the keyboard code valid for Word ("Alt number") (marked blue). See https://de.wikipedia.org/wiki/Hexadezimalsystem.

You can of course also select Unicode, but then the table becomes confusing because, for example, around 3200 glyphs of the Arial font are displayed in the order of their code points.

Regardless of which character is currently selected in the table, you can move the mouse over the table and get the code information and the character description for the character under the pointer (marked red). The example clearly shows the difference between the 8-bit Windows code point and the 16-bit Unicode code point: Unicode code point U+0160 (= 352 dec.), Windows code point 0x8A (= 138 dec.).
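The two code points of the letter Š can be checked independently of charmap.exe. The following lines are a minimal sketch that assumes Python's cp1252 codec as a stand-in for the character set Windows: Western; the variable names are chosen freely.

```python
letter = "\u0160"  # Š, LATIN CAPITAL LETTER S WITH CARON

unicode_code_point = ord(letter)
windows_code_point = letter.encode("cp1252")[0]

print(f"Unicode U+{unicode_code_point:04X} = {unicode_code_point} dec.")
print(f"Windows 0x{windows_code_point:02X} = {windows_code_point} dec.")
```

The hexadecimal values agree with the charmap display; converted to decimal, 0x8A is 8 × 16 + 10 = 138.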

In the field Character selection (marked orange) you can enter normal text via the keyboard or via copy and paste. If you then position the cursor in the text and activate a character in the table with the mouse, Select inserts it at the cursor position. Once the text is complete, Copy transfers it to the clipboard.

The size of the Charmap window cannot be changed; it always shows 10 rows of 20 glyphs. If you examine the tables as a whole, you can see how, on the one hand, logical sequences were used in the 8-bit scheme and, on the other hand, gaps were simply filled. E.g. the character Š marked in red, like its neighbors Œ and Ž, has no relation to the otherwise ordered alphabet.

 

   4.5 Character table symbol in Office

And if you work with Office but don't want to memorize the dead keys there, you have to use the Office character table Symbol. It is much more powerful than charmap.exe, but it only works with single characters and only in Office. If you need text in other applications, you must first complete it in Office and then transfer it to the target application via copy and paste.

The Symbol window in Office is scalable. If you set the character table to 32 characters per row, the entries correspond best to the underlying hexadecimal order. This makes it easy to see that the table was compiled at different times. For reasons of compatibility, the beginning of the table is coded according to the 8-bit ANSI tables, which results in the upper- and lower-case letters being arranged with an offset of 2^5 = 32 (marked violet on the left). Characters added later are arranged with upper- and lower-case letters directly next to each other (marked violet on the right).
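The two arrangements of upper and lower case can be verified from the code points themselves. The sketch below assumes nothing beyond Python's built-in ord() and str.upper(); the function name and the choice of sample letters are arbitrary.

```python
def case_offset(lower):
    """Distance between a lower-case letter and its upper-case partner."""
    return ord(lower) - ord(lower.upper())


# a, ä, þ come from the old 8-bit range; ā, ł, ž were encoded later.
samples = ["a", "\u00e4", "\u00fe", "\u0101", "\u0142", "\u017e"]
offsets = {letter: case_offset(letter) for letter in samples}

for letter, offset in offsets.items():
    print(letter, letter.upper(), offset)
```

In the 8-bit range the partner is 32 places away, i.e. exactly one row above when the table shows 32 characters per row; in the later ranges the partners are direct neighbors.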

At the top, select the desired font on the left and the desired subset / excerpt from the font table on the right (marked green).

Below is a history list of the most recently used characters. If you activate a letter in the table or in this list, its Unicode name / letter description including character code is shown at the bottom (marked blue).

Via AutoCorrect ... you can define a character combination that AutoCorrect replaces with the corresponding special letter every time it occurs. And via Shortcut key ... you can define a key combination that you type instead of the special letter and that is then replaced by it.

Every double-click on a character in the table transfers the character to the history list and to the target text. A double-click on a character in the history list also transfers the character to the target text. The same is done by clicking the Insert button at the bottom.


 

   4.6 Virtual PC keyboards

The (usually well hidden) on-screen keyboard offers a little-known but highly effective substitute for the shortcomings of the hardware keyboard and of the character tables just described. It runs in parallel with the hardware keyboard without any problems. And it even exists in several versions; activate one via Settings > Ease of Access > Keyboard | Use the on-screen keyboard, the other via the taskbar context menu: Show on-screen keyboard (button).

The on-screen keyboard in the Ease of Access menu has been around at least since Windows Vista; the other one, reached via the taskbar, has only been available since Windows 8 in connection with tablet mode.

If you activate both, both are displayed in the taskbar (see right), but the right one is blocked. But that is the more important one! If necessary, activate only the right one, which is activated via the taskbar; it can be shown and hidden at any time and is a real jack of all trades ("eierlegende Wollmilchsau") with different layouts (incl. freehand text input). You can safely remove the other from the taskbar.

 

1. Desktop keyboard

Preliminary remark:
If the desktop keyboard does not look like the first picture at the bottom right after starting, you must first select the correct layout. Select the settings highlighted in blue (see right). They are only changed again by a Windows update (this is what happened when switching to W10 2004).


You simply open a document or a text entry field in any application, then activate the on-screen keyboard and move it so that the text area is visible. And then just start typing, as the following pictures show.

In the first picture the keyboard has just been opened (see right).
In the second picture, Fnkt is activated, so that the "key" Ω appears next to it instead of ☺ (see right). This is the icon known from Office for Insert symbol.
The third picture shows this insert mode (see right). The section Ç (combined letters), highlighted in light blue, is selected. But this is only one of seven sections, and it alone contains 340 characters, as you can guess from the scroll bar.

After clicking on abc you switch back to normal input. No other method is faster for entering mixed text with special letters.


2. Tablet keyboard

In the fourth picture you can see how the keyboard layout can be changed (see right); the combination highlighted in light blue is currently selected.
If you choose the layout framed in yellow, you get the fifth picture (see right). It shows the on-screen tablet keyboard. You can also operate it with the mouse, as the mouse pointer in the picture shows. In the picture, o was held down with the left mouse button until all options available for this key were displayed.

To the right of the selected option ô, the two normal key assignments can be seen (9 and o).

This mode is faster than the one described in pictures 1 to 3. But fewer characters are offered here; in the order of the keys:
q 1 w 2 e 3 êéèë r 4 t 5 z 6 u 7 ûúùü i 8 îíìï o 9 ôóòõö p 0 ü
a ãâáàä s ߧ d f g h j k l ö ä
y ÿý x c © v b n ñ m µ,; . :?! -;, _ - " ―_ ~ ¬ ·?!) '_ (@ # / ¿

For example, for a the letters å ă æ ą are missing, for c all combined letters ć č ç and for s all combined letters ś ş ș .

 

   4.7 Photo typesetting

In modern phototypesetting there are really no combined letters either; after all, only printed letter images end up on the print medium. But since the possibly "many" special letters have to be entered using normal keyboards, this can only be done efficiently via multiple key assignments with dead keys and special software. Besides, in the end it does not matter whether the output is "printed" on film, paper or monitors; the software always assembles the "letters" (somehow).

 

   4.8 Unicode and Diacritics

Therefore, dead keys are simulated in Unicode; i.e. combining characters (diacritics) are defined in the category Mark, nonspacing (see https://de.wikipedia.org/wiki/Unicodeblock_Kombinierende_diakritische_Zeichen).

A Unicode-enabled program can then assemble a combined character from several code points. In contrast to the typewriter, however, the order is: first the basic letter, then the diacritics.

For example, the character ẵ can be assembled from the three code points for a (0061), ̆ (0306) and ̃ (0303). Even modern browsers can do that: ẵ (code points 0061 0306 0303). In HTML, a&#x0306;&#x0303; has to be written.
The order matters here: ã̆ (code points 0061 0303 0306) probably does not exist!

Here too, as on a typewriter, you can combine completely nonsensical characters. E.g. the character x̭̤̃̐ (code points 0078 0303 0310 032D 0324; in HTML x&#x0303;&#x0310;&#x032D;&#x0324;) certainly does not exist. The description would be: "LATIN SMALL LETTER X, COMBINING TILDE, COMBINING CANDRABINDU, COMBINING CIRCUMFLEX ACCENT BELOW, COMBINING DIAERESIS BELOW"; the marks are combined "from the inside out".

For reasons of compatibility, the "most common" combined characters are also encoded as "precomposed characters", i.e. with code points of their own, e.g. ẵ (code point 1EB5). Such characters can also be displayed by "simpler" programs, provided they can call up Unicode characters.
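The relationship between a combined sequence and its precomposed character can be tried out with Unicode normalization. The sketch below uses Python's standard unicodedata module; the variable names are chosen for this illustration only.

```python
import unicodedata

# a + COMBINING BREVE + COMBINING TILDE, i.e. base letter first, then diacritics
combined = "a\u0306\u0303"
precomposed = unicodedata.normalize("NFC", combined)

# Same marks in the other order: tilde first, then breve
swapped = unicodedata.normalize("NFC", "a\u0303\u0306")

# Decomposing the precomposed character gives the original sequence again
decomposed = unicodedata.normalize("NFD", "\u1eb5")

print([f"{ord(ch):04X}" for ch in precomposed])
print([f"{ord(ch):04X}" for ch in swapped])
print(unicodedata.name(precomposed))
```

With the order 0061 0306 0303 the three code points collapse into the single code point 1EB5; with the order 0061 0303 0306 only ã (00E3) is precomposed and the breve stays a separate combining character.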

If you want to experiment here, copy all of the "finished" characters shown above into MS-Word. If you then put the cursor to the right of one of the characters and press alt + c, its Unicode number appears instead of the character; pressing alt + c again changes it back.

And if you also try this in Notepad and WordPad and play with different fonts in all three programs, you will quickly see that the representation of combined characters is an (in)ability of the respective font in interaction with the respective editor.

 

   4.9 Conclusion

This chapter described many more or less cumbersome ways of creating correct documents based on the Latin alphabet using standard keyboards with only around 100 keys. You either have to know many keyboard shortcuts, or know many character codes, or search in large external tables.


Professional prolific writers
do not get far with the standard keyboard T1 ( https://de.wikipedia.org/wiki/Tastaturbeletzung#Tastaturbeletzung_T1) and should therefore use special keyboards, such as one with the keyboard layout E1 (see right).

For details see https://de.wikipedia.org/wiki/E1_ (Keyboard layout)
and, with a link to the driver installation, https://www.europatastatur.de/e1
and https://www.bitkom.org/Bitkom/Publikationen/Stellungnahme-zum-Norm-Entwurf
DIN-2137-12018-04.html.


Occasional writers
find an alternative in the Character Map in Windows and the Symbol table in MS Office, which also supply the various character codes; they remain unwieldy tools, though.


Normal users,
however, are best served by the standard hardware keyboard plus the on-screen keyboard "1: desktop keyboard", which can be activated when required (see right).



Regardless of how you enter text, it is essential to pay attention to the next chapter regarding transfer and storage.


 
   5. Save text with special letters

An 8-bit application can only store 8-bit characters, i.e. the characters of one single 8-bit code page. This holds regardless of what is shown on the screen or output directly to the printer! Success or failure becomes apparent at the latest after saving and then reloading the text in question.

Windows comes with the editor Notepad and the mini word processor WordPad. MS Office / Word must be bought separately. In principle, all three can process 16-bit characters. Using these three programs, the limits of special letter processing are discussed below with a small sample text:

Adélaïde Françoise Dufrénoy was in Havlíčkův Brod and Łódź.


notepad.exe is started via Windows Accessories > Editor (see right). Apart from the line break, Notepad knows no text formatting or markup, so it saves "bare" text (plain text). The keyboard shortcuts from MS Word are not recognized, only the usual keyboard input, where applicable with the dead keys ´ ` and ^.

But Notepad knows the three-digit Alt codes and can accept 16-bit Unicode characters, e.g. via the on-screen keyboard. And when saving, you can choose the encoding.

However, the choices are worded very technically: either the 8-bit character code table ANSI or one of the Unicode mapping formats UTF. And every file is saved as a text file, i.e. with the file name extension .txt.

If you choose ANSI here by mistake, the sentence
"Adélaïde Françoise Dufrénoy was in Havlíčkův Brod and in Łódź."
becomes the sentence
"Adélaïde Françoise Dufrénoy was in Havlíckuv Brod and in Lodz."

This means that the ANSI code (CP 1252 = Windows west) contains the French special letters, but not the Polish or Czech letters, among others.
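Which letters of the sample sentence fall outside a code page can be checked with a few lines of Python. This is a minimal sketch with a made-up helper name (unencodable); it relies only on Python's built-in codecs. Note one difference: Python either reports an error or substitutes "?", whereas Notepad, as shown above, substituted similar-looking base letters.

```python
def unencodable(text: str, codec: str) -> list[str]:
    """Return the distinct characters of text that the given codec cannot store."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


sample = "Adélaïde Françoise Dufrénoy was in Havlíčkův Brod and in Łódź."

# Letters that CP 1252 ("ANSI") cannot represent:
print(unencodable(sample, "cp1252"))

# Round trip through CP 1252 with "?" as the replacement character:
print(sample.encode("cp1252", errors="replace").decode("cp1252"))

# A UTF encoding loses nothing:
print(unencodable(sample, "utf-8"))
```

The round trip is exactly the "save and reload" test mentioned at the start of this chapter.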


wordpad.exe is started via Windows Accessories > WordPad (see right). WordPad knows text formatting and text markup, but no keyboard shortcuts from MS Word, only the usual keyboard input, where applicable with the dead keys ´ ` and ^.

WordPad can also accept 16-bit Unicode characters, e.g. via the on-screen keyboard. And when saving, you can likewise choose the encoding.

But unlike in Notepad, file types are selected here, which can be identified by the respective file name extension.

In addition to the file types .docx and .odt, which are designed for word processing programs, the Rich Text Format .rtf is of particular interest. All three record text including formatting and markup. And all three are used to pass documents on for further processing in Windows systems.

Just as the .pdf format (in MS Office) actually serves only the cross-platform transfer of finished documents, the .rtf format additionally serves the cross-platform exchange of documents that are to be edited further.

"The Rich Text Format (RTF) is a proprietary file format for texts that was introduced by Microsoft in 1987. It can be used as an exchange format between word processing programs from different manufacturers on different operating systems. It is also used, for example, to display formatted text in database fields.
In contrast to plain text, which only transports the pure text characters but no formatting such as font sizes, typefaces or layouts, an RTF document also contains numerous text formatting features, including embedded graphics, without being tied to a specific piece of software. Virtually all word processing systems can write and read RTF files. However, a faithful reproduction of the layout is not guaranteed; for example, page breaks may change on the target system.
Technically speaking, RTF files are pure text files, but contain formatting instructions embedded within the actual text content. This procedure is called text markup; it works in a similar way to HTML and LaTeX. In addition, binary data, e.g. an image, can be embedded."

See https://de.wikipedia.org/wiki/Rich_Text_Format and http://formatting-and-more.de/2016/01/31/rtf-rich-text-format/ and http://www.aboutvb.de/bas/formate/pdf/rtf.pdf.

And then there are three more .txt formats for "bare" text: Text document (ANSI, CP 1252), MS-DOS format text document (CP 437) and Unicode text document. The latter is stored in UTF-16 (little-endian) encoding rather than UTF-8.


Word is the "king-size" word processor from MS; here all characters, all formatting and all markup are processed and saved as .docx. Macros cannot be stored in this format, so it is safe. If macros have to be processed, the format .docm must be used.

The .docx format is in reality a compressed .xml format. If you replace the extension docx with zip, you can unzip the document and study its structure, which consists of several nested folders (see left). The actual content is in the folder word, in the file document.xml (see right).
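The same look inside the archive is possible without renaming the file, for example with Python's standard module zipfile. The following is only a sketch: the helper names are invented for this example, and a tiny stand-in archive is built in memory so that the code runs without a real Word file. A real .docx contains many more parts.

```python
import io
import zipfile


def list_docx_parts(docx) -> list[str]:
    """Return the names of all files stored inside the .docx (ZIP) archive."""
    with zipfile.ZipFile(docx) as archive:
        return archive.namelist()


def read_document_xml(docx) -> str:
    """Return the XML text of word/document.xml, where the actual content lives."""
    with zipfile.ZipFile(docx) as archive:
        return archive.read("word/document.xml").decode("utf-8")


# Stand-in for a real file (assumption: a real document would be opened by its path,
# e.g. list_docx_parts("sample.docx")).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("word/document.xml", "<w:document>Łódź</w:document>")

print(list_docx_parts(buffer))
print(read_document_xml(buffer))
```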

Documents can also be saved in the old .doc format, which is a proprietary binary format rather than a text markup format like .rtf. .doc files can contain macros and are therefore inherently dangerous, so they should no longer be used.

.doc files are typically over twice the size of their .docx counterparts.

In addition to the many other formats belonging to Office, there is of course also .rtf possible, but since Word knows many more formatting options, its .rtf files are in principle much larger than their WordPad counterparts.

And there is also an initially unspecified .txt format. If you want to save the sample text in this format, the following warning appears (see right), which shows the many encoding options.


 
   6. 8-bit character sets

From 1963 there was the 7-bit ASCII character set with 128 character positions / code points. In its first version of 1963 it contained only the English capital letters; the English lowercase letters were added in the revision of 1967/1968. Since a byte consists of 8 bits, several 8-bit character sets emerged later which were ASCII-compatible in the lower 128 positions and also contained special letters and symbols in the upper 128 positions, but were unfortunately incompatible with each other. Well-known examples from DOS times were code page 437 (IBM PC) and code page 850. CP 437 is apparently still used in Windows today in the DOS box.

Ultimately, all 8-bit character sets must be incompatible with one another, because each can only have 256 code points, whereas the European languages require about 512 code points, which can only be accommodated in 9 bits. Typical current character sets are the ANSI code page (more precisely: Windows 1252 / CP 1252 "Western European") and the code pages 1250 "Central European", 1254 "Turkish" and 1257 "Baltic". See https://de.wikipedia.org/wiki/Kategorie: Windows Codepage
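The incompatibility can be made visible by decoding one and the same byte with several code pages. The sketch below uses Python's built-in codecs for the code pages named in the text; the helper name decode_everywhere and the choice of the byte F0 are assumptions made for this illustration.

```python
CODE_PAGES = ["cp437", "cp850", "cp1250", "cp1252", "cp1254", "cp1257"]


def decode_everywhere(raw: bytes) -> dict[str, str]:
    """Decode the same bytes with every code page in CODE_PAGES."""
    return {cp: raw.decode(cp, errors="replace") for cp in CODE_PAGES}


# Lower half: the printable ASCII range decodes identically everywhere.
ascii_half = bytes(range(0x20, 0x7F))
print(len(set(decode_everywhere(ascii_half).values())), "distinct result(s) for ASCII")

# Upper half: one byte, several different letters or symbols.
for code_page, character in decode_everywhere(b"\xF0").items():
    print(code_page, character)
```

A file that contains this byte therefore shows a different character depending on which code page the reading program assumes.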

The code page 1252 is shown in the Windows character table charmap.exe (see right). Since charmap.exe always represents 20 × 10 characters, the complete code page is never visible.

Only the code points from 33 (0x21) for which there are glyphs are listed in the tables. Therefore, every table has different gaps and a different length, as can be seen at the end of the table.

All the code pages just mentioned are shown below, each with the lower part showing the differences between the individual code pages. All letters are highlighted in yellow.

 

 

With regard to name management in master data programs, there is therefore the 80% solution using the always preset code page 1252 and, as a way out, the 100% solution Unicode, which, however, does not work in pure 8-bit programs. MS Office can use Unicode, but not all glyphs are available in all selectable fonts.

String.Latin of the German registration authorities is a small subset of Unicode that is supposed to cover the pan-European writing area. Therefore, all authorities that keep personal data must convert their databases to Unicode.


 
   7. Unicode and UTF

Unicode is synonymous with Universal Coded Character Set (UCS),
UTF means Unicode Transformation Format (= mapping format in a text file).

Unicode is divided into 17 planes of 65,536 code points each, i.e. a total of 1,114,112 code points, and can thus contain practically all current and past alphabets / languages in the world; more precisely, it can encode all known writing systems and characters. The code points are usually numbered in hexadecimal.

The first Unicode plane is called the Basic Multilingual Plane; this BMP comprises 2^16 = 65,536 code points, which can be addressed as double bytes in hexadecimal, i.e. from 00 00 to FF FF.
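The arithmetic of the planes can be checked quickly. The following sketch is an illustration only; the names PLANE_SIZE, PLANE_COUNT, plane_of and in_bmp are chosen for this example. The plane of a code point is simply its number divided by 65,536, without remainder.

```python
import sys

PLANE_SIZE = 2 ** 16   # 65,536 code points per plane
PLANE_COUNT = 17       # planes 0 to 16


def plane_of(code_point: int) -> int:
    """Return the number of the plane that contains the code point."""
    return code_point // PLANE_SIZE


def in_bmp(character: str) -> bool:
    """True if the character lies in plane 0, the Basic Multilingual Plane."""
    return plane_of(ord(character)) == 0


print(PLANE_COUNT * PLANE_SIZE)                  # total number of code points
print(hex(sys.maxunicode))                       # highest code point Python accepts
print(in_bmp("\u1EB5"), in_bmp("\U0001F600"))    # a Latin letter vs. an emoji
```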

For comparison: Joseph D. Becker, the author of the first Unicode draft, assumed in 1988 that the number of characters required in all newspapers and magazines in the world at that time was "undoubtedly far below 2^14 = 16,384"; see https://de.wikipedia.org/wiki/Unicode and https://de.wikipedia.org/wiki/Hexadezimalsystem.

"Unicode is just an abstract standard that assigns a number (the code point) to each character. These code points are given in hexadecimal (U+1F46 etc.). The encoding defines in which form the code points are saved in a file. UTF-8, UTF-16 and UTF-32 are not synonymous with Unicode, but standards for how Unicode characters are stored.
UTF-8:
Frequent characters (Latin alphabet) are stored in 1 byte, less common characters in 2 to 4 bytes. This means that for a text that only consists of Latin letters without umlauts, a UTF-8-encoded file is only half the size of a UTF-16 file. The number of bytes that belong to a character is encoded in the bit sequence itself.
UTF-16:
Each character of the BMP is saved with 2 bytes, all others with 4 bytes. Two byte orders are possible, as is usual in computer science: big-endian (higher-order byte first) and little-endian (lower-order byte first).
UTF-32:
Always requires four bytes and could directly encode over 4 billion characters, but is hardly used for storing and transmitting text.
BOM:
= Byte Order Mark, the marking (2-4 bytes) at the beginning of a file that specifies the encoding, e.g. UTF-8: EF BB BF; UTF-16 (BE): FE FF; UTF-16 (LE): FF FE."

Paraphrased from "Special characters, TEI and Unicode".
See also https://de.wikipedia.org/wiki/Unicode_Transformation_Format
and https://wiki.selfhtml.org/wiki/Zeichencodierung#UTF-8:_Die_Codierungsform_der_Wahl.
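The byte counts and byte order marks described above can be reproduced with Python's built-in codecs. This is a small sketch with a made-up helper name (encoded_sizes); the "-le" codec variants are used so that no BOM is added to the counts.

```python
import codecs


def encoded_sizes(character: str) -> dict[str, int]:
    """Return how many bytes the character needs in UTF-8, UTF-16 and UTF-32."""
    return {enc: len(character.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


# a (ASCII), ä (Latin-1 range), € (rest of the BMP), an emoji (outside the BMP)
for character in ("a", "\u00E4", "\u20AC", "\U0001F600"):
    print(f"U+{ord(character):04X}", encoded_sizes(character))

# The byte order marks as constants of the codecs module:
print("UTF-8     ", codecs.BOM_UTF8.hex(" ").upper())
print("UTF-16 BE ", codecs.BOM_UTF16_BE.hex(" ").upper())
print("UTF-16 LE ", codecs.BOM_UTF16_LE.hex(" ").upper())
```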

The complete Unicode is available at https://unicode-table.com/de/. There, the currently 917,999 defined code points from 0 0000 (hex) to E 01EF (hex) can be called up; the table is apparently incomplete, because it contains only about 120,000 actually "filled" code points, and the browser does not supply a valid glyph for every one. Fittingly, the operating instructions and explanations for the table are given only at the end of the meter-long table:

Unicode
Unicode is a character encoding standard. Simply put, it is a table that maps text characters (digits, letters, punctuation marks) to binary codes. The computer only understands sequences of zeros and ones. In order to know exactly what to show on the screen, a unique number has to be assigned to each symbol. In the 1980s, characters were encoded in one byte, that is, eight bits (each bit is 0 or 1). As a result, one table (that is, one encoding or character set) could only contain 256 characters. This may not even be enough for one language. Many different encodings therefore appeared, and confusing them often led to strange characters appearing on the screen in place of readable text. A single standard was needed. The most frequently used encoding, UTF-8, uses 1 to 4 bytes per character.
character
The characters in the Unicode tables are numbered with hexadecimal numbers. For example, the Cyrillic capital letter M is denoted by U+041C. This means that it is at the intersection of row 041 and column C. It can simply be copied and then pasted anywhere. To avoid rummaging through the kilometer-long list, you should use the search function. On a symbol's page you can see its number and how it is drawn in different fonts. You can also paste the character itself into the search box, even if a square is drawn in its place, to find out what it was. The page also offers special (or rather, random) sets of similar symbols, collected from different sections for convenience.
The Unicode standard is international. It contains the characters of almost every written language in the world, including those no longer in use: Egyptian hieroglyphs, Germanic runes, Mayan script, cuneiform script and the alphabets of ancient states. The notation of weights and measures, musical notation and mathematical concepts is also represented.
The Unicode Consortium itself does not invent any new symbols. Symbols that are already in use in society are added to the table. For example, the ruble symbol was actively used for six years before it was added to Unicode. Emoji symbols (emoticons) were likewise widely used in Japan before they were incorporated into the encoding. Company brands and logos, however, are generally not added, not even ones as common as the Apple apple or the Windows flag. To date, version 8.0 encodes around 120,000 characters.


 
   8. Appendix: Hexcode

8.1 Haptic 8.2 Historical 8.3 Mathematical


Much of the present text speaks of 8 bit and 16 bit, of bytes and hex code etc. Actually it is "only" about counting, that is, about number systems and especially about place value systems, and there in turn about the dual system, the decimal system and the hexadecimal system.

There is a lot of information about this on the net. A small selection of links is listed at the end of the section. If those web pages are too much for you, the following lines offer a short tour of the subject; once a teacher, always an (ex-)teacher:

 

   8.1 Haptic

Together we have ten fingers on two hands, so we calculate in the decimal system with the ten digits 0 1 2 3 4 5 6 7 8 9.

If we had only three fingers on each hand, we would probably count in the base-6 system with the six digits 0 1 2 3 4 5. If all people had only one arm with five fingers, then in the base-5 system with the five digits 0 1 2 3 4.

And if we had only a thumb on each hand, we would calculate in the base-2 system / dual system with the digits 0 1.

If we calculated with hands and feet, like the Maya, we would have a base-20 system / vigesimal system with the twenty "digits" 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19. To avoid confusion, 10 to 19 would have to exist as single digit symbols, because 10 to 19 are not digits but numbers composed of digits. So we would have to invent replacement characters or use letters: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I K, i.e. with the "digits" "A" (= 10) to "K" (= 19).

Fortunately, the base-20 system did not catch on. But the base-16 system / hexadecimal system with the sixteen digits 0 1 2 3 4 5 6 7 8 9 A B C D E F has proven very practical in IT. It would suit beings with four limbs and four fingers / toes on each.
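How one and the same number looks in these different systems can be tried out with a short Python sketch. It is only an illustration under stated assumptions: the function name to_base is invented, and the digit string follows the letters proposed above for the base-20 system (A to K without J).

```python
# Digit symbols as proposed in the text: 0-9, then A..I and K (J is skipped).
DIGITS = "0123456789ABCDEFGHIK"


def to_base(number: int, base: int) -> str:
    """Write a non-negative integer in the given base (2 to 20)."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 20")
    if number < 0:
        raise ValueError("number must not be negative")
    if number == 0:
        return DIGITS[0]
    symbols = []
    while number > 0:
        number, remainder = divmod(number, base)  # split off the last digit
        symbols.append(DIGITS[remainder])
    return "".join(reversed(symbols))


for base in (2, 5, 6, 10, 16, 20):
    print(base, to_base(255, base))
```

Repeated division by the base yields the digits from right to left, which is why the collected symbols are reversed at the end.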

 

   8.2 Historical

The base-20 system mentioned above was common among the Maya; they actually had digits for 10 through 19, and they already had zero! In French, 80 = quatre-vingts = 4 × 20 is reminiscent of the base-20 system.

The French numerals 1 2 3 … 15 16, un deux trois … quinze seize (without zero), are reminiscent of the base-16 system mentioned above.

In historical times there was also the base-12 system / duodecimal system, in Germany with the dozen (= 12), the Schock (= 5 dozen = 60), the Gros (= 12 dozen = 12^2 = 144) and the Maß (= 12 Gros = 12^3 = 1728).

On clock faces, the numbering Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ Ⅵ Ⅶ Ⅷ Ⅸ Ⅹ Ⅺ Ⅻ, derived from the Romans, has survived to this day. And our numerals in particular reflect the base-12 system in the form of the number words one two three four five six seven eight nine ten eleven twelve; from thirteen on, these words are repeated in compound numerals. Because zero was not known at the time, the numerals run from one to twelve and not from zero to eleven.

And we have also inherited remnants of the base-60 system / sexagesimal system, in the form of time measurement, 1 hour = 60 minutes = 60^2 seconds, and in the form of angle measurement, full angle 360° = 6 × 60 degrees of arc = 6 × 60^2 minutes of arc = 6 × 60^3 seconds of arc, where one minute of arc at the equator corresponds to the length of one nautical mile.

The ancient Romans knew the nine digits I V X L C D M ↁ ↂ, but that is no base-9 system, because these digits have the fixed values 1 5 10 50 100 500 1000 5000 10000, and no zero occurs among them! Presumably because calculating with them was very difficult, there were no major mathematical and scientific achievements in the great Roman culture.

The digit zero, however, is an extremely important digit for commercial arithmetic and for place value systems.

Great Britain offers / offered a treasure trove of ties to "non-metric" number systems with its monetary system and its system of measurement.
Until 1971: 1 Sovereign = 4 Crown = 8 Half Crown = 10 Florin = 20 shillings = 60 groats = 240 pennies = 20 × 12 pence = 960 farthing = 1 pound sterling.
And to this day: 1 foot (German: Fuß) = 12 inches (German: Zoll) = 12" = 12 × 25.4 mm (here e.g. 55" TVs and ¾" threads).
And in typography: 1 inch = 6 pica = 12 line = 72 point = 25.4 mm = 6 × 4.23 mm = 12 × 2.12 mm = 72 × 0.353 mm.

The Concorde was a joint project between France and Great Britain. In view of the French historical mania for decimalization, perhaps the real achievement was not the creation of the only civilian supersonic aircraft, but that the planning worked despite the different measurement systems.

 

   8.3 Mathematical

Place value system I: decimal system Place value system II: dual system Place value system III: Notation in place value systems
Place value system IV: From dual to byte to hexadecimal system

 

   Place value system I: decimal system

Power notation for powers of ten = basis of the decimal system with the digits 0 1 2 3 4 5 6 7 8 9 and base 10:

Ten thousand = 10,000 = 10×10×10×10 = 10^4
Thousand     = 1,000  = 10×10×10    = 10^3
Hundred      = 100    = 10×10       = 10^2    "10 to the power of the number of zeros"
Ten          = 10     = 10          = 10^1
One          = 1      = 1           = 10^0

The decimal system is therefore a place value system with base 10: the value (factor) of each digit is a power of ten and results from the position at which the digit stands:

Position (factor):   Ten thousand   Thousand   Hundred   Ten    One
Power of ten:        10^4           10^3       10^2      10^1   10^0

If the rule "10 to the power of the number of zeros" applies to the power-of-ten notation, then it is also logical that 10^0 = 1, because 1 has 0 zeros ;-).


Example 1

12,034  =  1×10^4 + 2×10^3 + 0×10^2 + 3×10^1 + 4×10^0

Zero is the sign of a power-of-ten deficit; in the example the hundreds are missing.

The zero was "invented" much later than the other digits, because it does "nothing and a lot": when zero is added, the value does not change; when multiplying by zero, the value itself becomes zero. Zero is thus the neutral element of addition and the absorbing element of multiplication.
If a zero is appended to the right of a number, its value increases tenfold! If a zero is inserted into a number, the value of the part to the left of the insertion point increases tenfold:

120,340 = 12,034 × 10                 12,034 = 12 × 10^3 + 034         120,034 = 12 × 10^3 × 10 + 034 = 12 × 10^4 + 34


Example 2

47,110,815  =  4×10^7 + 7×10^6    + 1×10^5 + 1×10^4 + 0×10^3    + 8×10^2 + 1×10^1 + 5×10^0
=  (4×10^1 + 7)×10^6    + (1×10^2 + 1×10^1 + 0)×10^3    + (8×10^2 + 1×10^1 + 5)×10^0
=  47 × 10^6  + 110 × 10^3  + 815 × 10^0
=  forty-seven million  one hundred and ten thousand  eight hundred and fifteen

Evidently, powers can be combined / grouped.
The sample number has 8 decimal places. Greatest 8-digit number = 99,999,999 = 100,000,000 - 1 = 10^8 - 1
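The decomposition used in Examples 1 and 2 can be written down as a short Python sketch. The function names expand and largest_with_digits are invented for this illustration; the base is a parameter so that the same code also fits the other place value systems.

```python
def expand(number: int, base: int = 10) -> list[tuple[int, int]]:
    """Return (digit, exponent) pairs of a non-negative integer, highest place first."""
    if number < 0:
        raise ValueError("number must not be negative")
    pairs = []
    exponent = 0
    while True:
        number, digit = divmod(number, base)  # split off the lowest place
        pairs.append((digit, exponent))
        exponent += 1
        if number == 0:
            break
    return pairs[::-1]


def largest_with_digits(count: int, base: int = 10) -> int:
    """Greatest number that fits into `count` places: base^count - 1."""
    return base ** count - 1


pairs = expand(12034)
print(" + ".join(f"{digit}×10^{exponent}" for digit, exponent in pairs))
print(sum(digit * 10 ** exponent for digit, exponent in expand(47110815)))
print(largest_with_digits(8))
```

Summing digit × base^exponent over all pairs gives back the original number, which is exactly the statement of the place value system.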

 

   Place value system II: dual system

Power notation for powers of two = basis of the dual system with the digits 0 1 and base 2:
2^8 = 2×2×2×2×2×2×2×2 = 256     2^7 = 128     2^6 = 64     2^5 = 32     2^4 = 16     2^3 = 8     2^2 = 4     2^1 = 2     2^0 = 1


Example 3

Put 
Powers of two27 26 25