UTF-8 BOM (byte order mark) – Software engineer tech blog

Unicode-encoded text comes in two forms: with a BOM and without a BOM. BOM stands for byte order mark, a few bytes of data attached to the beginning of Unicode-encoded text.

When a program reads text data, it can determine from the first few bytes whether the data is Unicode and which encoding form it uses. In UTF-8 with a BOM, the first 3 bytes are the BOM: <0xEF 0xBB 0xBF>.
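As a rough illustration of this kind of detection, here is a minimal Python sketch. The function name `detect_bom` and the table `BOMS` are made up for this example; the BOM constants come from Python's standard `codecs` module. It is a heuristic only: data without a BOM returns no answer, and UTF-16 LE text whose first character is U+0000 would be mistaken for UTF-32 LE.

```python
import codecs

# Known BOM signatures. UTF-32 is checked before UTF-16 because the
# UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),        # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

For example, `detect_bom(b"\xef\xbb\xbfhello")` reports UTF-8 with a 3-byte BOM, while `detect_bom(b"hello")` reports nothing, which is why a reader that relies only on the BOM has to guess for BOM-less files.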

Some applications, such as Microsoft Excel, may be unable to tell whether a file is UTF-8, UTF-16, UTF-32, or some other character encoding unless a BOM is present. On the other hand, HTML files served as web pages are better saved without a BOM, because some programs that process web pages dynamically, such as PHP, cannot handle files with a BOM correctly. Which choice is better therefore depends on the situation, and neither can be recommended in general.
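Choosing between the two forms when saving can be sketched in Python as follows. The helper name `encode_text` and its `with_bom` parameter are hypothetical; `utf-8-sig` and `utf-8` are standard Python codec names, where `utf-8-sig` prepends the 3-byte BOM on encoding.

```python
def encode_text(text: str, with_bom: bool) -> bytes:
    """Encode text as UTF-8, optionally prefixed with the BOM (EF BB BF)."""
    # "utf-8-sig" writes the BOM first; plain "utf-8" never writes one.
    return text.encode("utf-8-sig" if with_bom else "utf-8")


csv_for_spreadsheet = encode_text("name,price\nりんご,120\n", with_bom=True)
html_for_web = encode_text("<!DOCTYPE html><title>テスト</title>", with_bom=False)
```

Here the CSV destined for a spreadsheet application gets a BOM and the HTML destined for a web server does not, matching the two cases described above. Whether a given application actually needs the BOM is something to check against that application.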

The only difference between a UTF-8 file with a BOM and one without is the presence or absence of the first 3 bytes; the rest of the file's contents and the character encoding are exactly the same. As a result, even when the BOM setting is wrong for the program that consumes the file, the file still displays normally when opened as plain text, so when a file malfunctions in some program it is difficult to notice that the BOM is the cause.
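The following Python sketch shows why the problem is easy to miss and one way to remove the BOM. The function name `strip_utf8_bom` and the sample bytes are invented for this example. Decoding with plain `utf-8` keeps the BOM as the invisible character U+FEFF at the start of the string, whereas `utf-8-sig` drops it if present.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; other bytes are untouched."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


raw = b"\xef\xbb\xbf<?php echo 'hi';"

# Plain "utf-8" keeps the BOM as U+FEFF, which most editors do not display.
decoded_plain = raw.decode("utf-8")
# "utf-8-sig" removes a leading BOM and also accepts input that has none.
decoded_sig = raw.decode("utf-8-sig")
```

With these sample bytes, `decoded_plain` and `decoded_sig` look identical on screen, but the first is one character longer. Comparing the raw bytes, or the string lengths, is a simple way to confirm whether a BOM is involved.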
