Character code – ソフトウェアエンジニアの技術ブログ：Software engineer tech blog

[font-family]OSのシステムフォントとfont-familyの設定

-Windows 8.1以降
Yu Gothic
Yu Gothic UI
Segoe UI

-Mac OS EI caption以降/ iOS
Hiragino Sans
Sanfrancisco

-Android
Noto Sans CJK JP
Roboto

<style>
    	body {
            font-family: "Arial", "メイリオ";
        }
    </style>

<style>
    	body {
            font-family: sans-serif, "Helvetica Neue";
        }
    </style>

Helvetica Neue
-> mac のみ

ゴシック系

font-family: "Hiragino Sans W3", "Hiragino Kaku Gothic ProN", "ヒラギノ角ゴ ProN W3", "メイリオ", Meiryo, "ＭＳ Ｐゴシック", "MS PGothic", sans-serif;

なんで、font-familyで書体を複数設定しているか全然わからなかったけど、それぞれのOSで書式がないときに厳密に指定するためなのね。

何年Webやってんだ、って感じだが、やっと理解できた。

.editorconfigとは？

EditorConfigとは？
-> インデントや文字コード、改行コードなどコーディングスタイルに関するエディタの設定を異なるエディタ間で共有する規格

laravel5.8のdefault EditorConfig

root = true

[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
indent_style = space
indent_size = 4
trim_trailing_whitespace = true

[*.md]
trim_trailing_whitespace = false

[*.yml]
indent_size = 2

indentは4になってますね。

sublimeのeditorConfig
https://github.com/sindresorhus/editorconfig-sublime#readme
-for support
root
indent_style
indent_size
end_of_line
charset
trim_trailing_whitespace
insert_final_newline

[shift]+[command]+[p]でinstall packageを選択

EditorConfigインストール

ドキュメントルートに.editorconfigのファイルを作成し、コーディングするとタグ等が自動整形される。
すげーーーーーーーー、採用!

UTF8 BOM(byte order mark)

There are two types of character code, Unicode, which has no BOM and with a BOM. BOM stands for byte order mark, which is a few bytes of data attached to the beginning of Unicode encoded text.

When the program reads text data, it determines from the first few bytes that it is Unicode data and which kind of encoding format is adopted. In the case of UTF-8 with BOM, the first 3 bytes are BOM, and the data is <0xEF 0xBB 0xBF>

Depending on the application such as Microsoft Excel, it may not be possible to determine whether the encoding method is UTF-8, UTF-16, UTF-32, or another different character code unless BOM is added. On the other hand, for HTML files used as web pages, it is better to save / overwrite without BOM. This is because some programs such as PHP that process web pages dynamically can not process text files with BOMs correctly. It can not generally be said that which way is better depending on the situation like this.

The difference between a file with BOM and a file without BOM is the presence or absence of the first 3 bytes, and the contents of other files and the character code (encoding system) are exactly the same. This means that if the encoding method is not correct, it will be displayed normally as a simple text file, so it can be said that it is difficult to notice that the BOM is the cause if the file malfunctions in any program.

ref:

DINを見てみよう

DINはuniquloで使われているフォントだそうです。
https://www.uniqlo.com/jp/stylingbook/pc/user_styling

縦に細長いですね。Dinにもいろいろある様です。
「Din 1451 Alt」

更に細長い？
「OSP DIN」

「Bebas Neue」

“DIN” is a german font.
Originally, the font was made for industrial use, and it was said that the purpose was to unify the font such as the model number description in the industry to this “DIN”. So, “DIN” is an abbreviation of “Deutches Institut fur Normung” and Deutsches Institut fur Normung is the officcial name of DIN.

これは目的によるな～

ではユニクロのfontを見てみましょう。
https://www.uniqlo.com/jp/css/default.css
[ccs]
body{
font:13px ‘ヒラギノ角ゴPro W3′,’Hiragino Kaku Gothic Pro’,’游ゴシック’,’Yu Gothic’,’游ゴシック体’,’YuGothic’,メイリオ,Meiryo,sans-serif;
color: #000000;
background: #e6e6e6;
}
[/ccs]

なるほど、ヒラギノ角ゴPro W3か。iPhoneと一緒ですね。
ところで、webデザインの大手ってどこでしょう？　IMK？
imjのサイトを見てみましょう。あ、どーでもいいですが、知り合いimjで働いてますね。

で、imsのfont-familyはというと
https://www.imjp.co.jp/resources/css/common.css

body, input, textarea, select, option, button {
  background: #fff;
  font-family: "Gotham SSm A", "Gotham SSm B", "メイリオ", Meiryo, "ヒラギノ角ゴ Pro W3", "Hiragino Kaku Gothic Pro", Osaka, "ＭＳ Ｐゴシック", "MS PGothic", sans-serif;
  font-style: normal;
  line-height: 1.7;
  color: #000;
  font-size: 14px;
}

Gotham SSm A???????????????
なにいいいいいいいいいいいいいいいいいいいいいいいいいいいいい

San-serif Futura

Futura is a Latin-language sans-serif typeface announced by Paul Renner, who worked as a part-time lecturer at the Bauhaus in Germany in 1923. Futura is a Latin word meaning “future”.

Adobe Fonts: Futura
https://fonts.adobe.com/fonts/futura-pt

Futura is one of the most popular fonts in the world, and is used in many corporate logs.
ほう、会社のロゴはfuturaにしようかな。
A, M, V, Wが非対称で、〇が円になっているとのこと。

CCS3でfuturaは使えるようです。
CCS3で使える欧文フォント
– Futura
– Neue Helvetica
– Avenir
– Sackers Gothic
– DIN Next
– Garamond #3
– Trade Gothic
– Neue Frutiger
– VAG Rounded
– Univers

何？DINもCCS3で使えるんかい！！！

UTF8 and Shift_JIS

Shift_JIS
Merit
-The number of bytes consumed is relatively small.
-Code that can be read on domestic mobile phones.

Demerit
It is garbled by how to use it.
Since encoded data often contains control characters, they may malfunction or be garbled in environments where they are not assumed.
The character type is about 9000.

UTF-8
The character range is wide and the characters of any contry can not be garbled.
It can be read by default in almost any PC environment.

禁則文字

When it comes to the beginning or end of a line, it is a character that is inappropriate in appearance or hard to read. Typical examples include punctuation marks and parentheses. There are two types of punctuation characters: line beginning and end line punctuation characters. The former refers to characters that are inconvenient to come to the beginning of the line, and the latter refers to characters that do not allow to come to the end of line.

終わり
括弧類：」』）｝】＞≫］など
拗促音：ぁぃぅぇぉっゃゅょァィゥェォッャュョなど
中点や音引：・（中点／中黒）ー（音引き）―（ダーシ）-（ハイフン）など
句読点：、。,.など
その他約物：ゝ々！？：；／など

始め
括弧類：「『（｛【＜≪［など

なるほどねー

What is an escape sequence (escape character)?

An escape sequence is a special character string that controls character output, such as character color change, cursor movement, and character deletion, when outputting characters on the screen, rather than outputting the characters themselves.

In addition to the information that people can recognize as “character”, information other than “characters” such as control characters and character editing information on the OS is also included a s character information used in the OS. These characters are sometimes called “special characters”. An escape sequence exists for this special character to let people enter characters and tell it to the OS.

Typical escape sequences include the following characters.

\a : bell character(alert)
\b : return one character
\f : page feed(clear)
\n : line feed, return
\r : back to the beginning of the same line
\t : horizontal tab
\v : vertical tab
\\ : show \
\' : single quotation mark
\" : double quotation mark
\0 : null

What is ISO-2022-JP?

ISO-2022-jP is a character encoding system for Japanese characters used on the internet(especially e-mail). It is characterized by being a 7-bit code that switches the character set using the ISO / IEC 2022 escape sequence (the escape sequence of the announcement function is omitted). There are also sometimes called “JIS code”.

SCII letters, kanji for JIS x 2008 etc, can be used, but half size kana characters etc can not be used.

what is “sjis-win”??

sjis-win
When “sjis-win” is specified as character code in PHP, Windows-31J which is Shift_JIS extended by Microsoft.

Windows-31J is a character code based on “JIS X 0208-1990” by microsoft, incorporating some of NEC and IBM’s own extended letters, generally called SJIS “model dependent characters”.

What is difference between SJIS and SJIS-WIN?
sjis-win has more characters.
In frequently used places, the following characters are in sjis-win, but not in sjis.

-circle number（①②③…⑳）
-roman numerals（ⅠⅡⅢ…Ⅹ、ⅰⅱⅲ…ⅹ）
-stock with parentheses（㈱）
-High ladders（髙）

Characters that are in SJIS-WiN but not in SJIS
The following three types
– NEC special character
– NEC selection IBM extended character
– IBM extended letters