Comments on Chris Cant's developer blog: System.String hidden UTF8 BOM

Anonymous, 2016-01-25:

If you are creating the Byte[] from a String, you can ensure that no BOM is generated by creating your own Encoding instance:

    new System.Text.UTF8Encoding(false)

For example, when using a StreamWriter to produce the Byte[], you can pass that instance to its constructor:

    // false = encoderShouldEmitUTF8Identifier: do not write a BOM
    using (StreamWriter feedWriter = new StreamWriter(outputStream,
            new System.Text.UTF8Encoding(false))) {
        feedWriter.Write(inputString);
    }

William, 2010-03-03:

I find this behavior annoying. The Unicode specification states: "Where the data is typed, such as a field in a database, a BOM is unnecessary. Do not tag every string in a database or set of fields with a BOM, since it wastes space and complicates string concatenation. Moreover, it also means two data fields may have precisely the same content, but not be binary-equal (where one is prefaced by a BOM)."

I also don't like the fact that the behavior of GetString()/GetBytes() depends on whether the argument contains a BOM. If the byte array has a BOM in its first 2-3 bytes, the returned string starts with the garbage character U+FEFF (this is the BOM character itself; written out as UTF-16 big-endian it is the byte pair FE FF). If there is no BOM in the byte array, the string is well-formed.
Likewise, if you call GetBytes() with a string that starts with U+FEFF, the resulting byte array will contain a BOM, and it will be the correct BOM for whichever Unicode encoding you convert with. If the string has no such character, there is no BOM in the byte array, so if you want one you have to prepend it yourself. This behavior is a hidden mechanism that is not documented, and it is more than a little annoying because it can screw things up (for example, by producing two BOMs in one file).

William (http://www.sourcespringsoftware.com)
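The behavior both commenters describe can be checked with a small console program. This is only a sketch, not code from the post: the names BomDemo, WriteWith and Hex are hypothetical, and it assumes the standard System.Text encodings. It compares Encoding.UTF8 (which asks writers to emit a BOM) with new UTF8Encoding(false) through a StreamWriter, then shows GetString keeping a leading U+FEFF and GetBytes encoding it as the target encoding's BOM.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: writes text through a StreamWriter into memory
    // and returns the bytes that the writer produced.
    public static byte[] WriteWith(Encoding encoding, string text)
    {
        var buffer = new MemoryStream();
        using (var writer = new StreamWriter(buffer, encoding))
        {
            writer.Write(text);
        } // disposing the writer flushes it and closes the MemoryStream
        return buffer.ToArray(); // ToArray still works on a closed MemoryStream
    }

    // Hypothetical helper: formats bytes as "EF-BB-BF-..." for display.
    public static string Hex(byte[] bytes) => BitConverter.ToString(bytes);

    public static void Main()
    {
        // Encoding.UTF8 emits a BOM when used by a writer; UTF8Encoding(false) does not.
        Console.WriteLine(Hex(WriteWith(Encoding.UTF8, "abc")));          // EF-BB-BF-61-62-63
        Console.WriteLine(Hex(WriteWith(new UTF8Encoding(false), "abc"))); // 61-62-63

        // GetString does not strip a leading BOM: it decodes to the character U+FEFF.
        byte[] withBom = { 0xEF, 0xBB, 0xBF, 0x61, 0x62, 0x63 };
        string decoded = Encoding.UTF8.GetString(withBom);
        Console.WriteLine(decoded.Length);          // 4
        Console.WriteLine(decoded[0] == '\uFEFF');  // True

        // GetBytes adds no BOM of its own, but a leading U+FEFF in the string
        // is encoded as the BOM of whichever encoding is used.
        Console.WriteLine(Hex(Encoding.UTF8.GetBytes("abc")));           // 61-62-63
        Console.WriteLine(Hex(Encoding.UTF8.GetBytes("\uFEFFabc")));     // EF-BB-BF-61-62-63
        Console.WriteLine(Hex(Encoding.Unicode.GetBytes("\uFEFF")));     // FF-FE (UTF-16 little-endian)
    }
}
```

In other words, with these encodings a BOM in the output comes either from a writer emitting the encoding's preamble or from a U+FEFF that is already in the string, which is how two BOMs can end up in one file.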