How do I remove ï»¿ from the beginning of a file?

https://stackoverflow.com/questions/3255993

16-09-2020

Question

I have a CSS file that looks fine when I open it using gedit, but when it's read by PHP (to merge all the CSS files into one), this CSS has the following characters prepended to it: ï»¿

PHP removes all whitespace, so a random ï»¿ in the middle of the code messes everything up. As I mentioned, I can't see these characters when I open the file in gedit, so I can't remove them easily.

I googled the problem and there is clearly something wrong with the file's encoding, which makes sense, since I've been moving the files between different Linux/Windows servers via FTP and rsync, with a variety of text editors. I don't really know much about character encoding, though, so help would be appreciated.

If it helps, the file is being saved in UTF-8 format, and gedit won't let me save it in ISO-8859-15 format ("the document contains one or more characters that cannot be encoded using the specified character encoding"). I tried saving it with both Windows and Linux line endings, but neither helped.

Solution

Three words for you:

Byte Order Mark (BOM)

That's the representation of the UTF-8 BOM in ISO-8859-1. You have to tell your editor not to use BOMs, or use a different editor to strip them out.
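To see where the three stray characters come from, here is a small Python sketch (Python is used only for illustration; it is not what the asker is running). It decodes the three UTF-8 BOM bytes as ISO-8859-1, which maps each byte to exactly one character:

```python
import codecs

# The UTF-8 encoding of the byte order mark (U+FEFF) is three bytes.
bom = codecs.BOM_UTF8
print(bom)  # b'\xef\xbb\xbf'

# ISO-8859-1 maps each byte to exactly one character, so the three BOM
# bytes show up as three visible characters.
print(bom.decode("iso-8859-1"))  # ï»¿

# Decoded as UTF-8, the same bytes are the single invisible character U+FEFF.
print(repr(bom.decode("utf-8")))  # '\ufeff'
```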

To automate the removal of the BOM, you can use awk as shown in this question.

As another answer says, the best option would be for PHP to actually interpret the BOM correctly. For that you can use mb_internal_encoding(), like this:

 <?php
   //Storing the previous encoding in case you have some other piece 
   //of code sensitive to encoding and counting on the default value.      
   $previous_encoding = mb_internal_encoding();

   //Set the encoding to UTF-8, so when reading files it ignores the BOM       
   mb_internal_encoding('UTF-8');

   //Process the CSS files...

   //Finally, return to the previous encoding
   mb_internal_encoding($previous_encoding);

   //Rest of the code...
  ?>
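Note that mb_internal_encoding() sets the default encoding for PHP's mbstring functions; as far as I know it does not alter the bytes returned by file_get_contents(), so a BOM may still come through. When a reader that understands BOMs is available, the merge step becomes simple. Below is a minimal Python sketch of the same "merge all CSS files into one" task, assuming Python's "utf-8-sig" codec, which drops a leading BOM while decoding and otherwise behaves like UTF-8. The names merge_css and write_css are hypothetical.

```python
from pathlib import Path


def merge_css(paths):
    """Concatenate CSS files into one string, dropping a leading BOM from each.

    Hypothetical helper. The "utf-8-sig" codec removes a UTF-8 BOM at the
    start of the data if there is one and otherwise decodes plain UTF-8.
    read_text() also uses universal newlines, so CRLF endings become LF.
    """
    parts = [Path(p).read_text(encoding="utf-8-sig") for p in paths]
    return "\n".join(parts)


def write_css(text, path):
    # Plain "utf-8" never writes a BOM, so the merged file starts cleanly.
    Path(path).write_text(text, encoding="utf-8")
```

For example, merge_css(["reset.css", "layout.css"]) (hypothetical file names) returns one string with no U+FEFF at the file boundaries, so nothing odd lands in the middle of the merged stylesheet.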

Other tips

Open your file in Notepad++. From the Encoding menu, select Convert to UTF-8 without BOM, save the file, and replace the old file with this new one. And it will work, for sure.

In PHP, you can do the following to remove all ASCII control characters and every byte from 0x80 to 0xFF, which includes the bytes of the character in question.

$response = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $response);
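For comparison, here is the same character class applied to raw bytes in Python (an illustrative sketch, not the PHP code itself). It shows what the pattern removes besides the BOM: newlines, tabs, and every byte of a multibyte UTF-8 character such as "é":

```python
import re

# Same character class as the PHP pattern, applied to raw bytes:
# ASCII control bytes 0x00-0x1F and every byte from 0x80 to 0xFF.
NON_PRINTABLE = re.compile(rb"[\x00-\x1F\x80-\xFF]")


def strip_non_printable(data: bytes) -> bytes:
    return NON_PRINTABLE.sub(b"", data)


css = b"\xef\xbb\xbfbody {\n  color: red;\n}\n"
print(strip_non_printable(css))  # b'body {  color: red;}'

# Legitimate non-ASCII text is damaged too: both bytes of "é" are removed.
print(strip_non_printable("café".encode("utf-8")))  # b'caf'
```

So the PHP one-liner is fine for pure-ASCII CSS, but it is a blunt tool if your files contain any non-ASCII text.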

For those with shell access, here is a little command to find all files with the BOM set in the public_html directory. Be sure to change it to the correct path on your server.

Code:

grep -rl $'\xEF\xBB\xBF' /home/username/public_html
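grep -rl reports files that contain the BOM bytes anywhere. If you only care about files that start with a BOM, a small Python sketch along these lines does the same directory walk; files_starting_with_bom is a hypothetical name, and unreadable files are skipped silently:

```python
import codecs
import os


def files_starting_with_bom(root):
    """Yield paths under root whose first bytes are the UTF-8 BOM."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    head = fh.read(len(codecs.BOM_UTF8))
            except OSError:
                continue  # unreadable: permissions, broken symlink, etc.
            if head == codecs.BOM_UTF8:
                yield path
```

For example, run for p in files_starting_with_bom("/home/username/public_html"): print(p), again adjusting the path for your server.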

And if you are comfortable with the vi editor, open the file in vi:

vi /path-to-file-name/file.php

And type the command to remove the BOM:

:set nobomb

Save the file:

:wq

A BOM is just a sequence of bytes ($EF $BB $BF for UTF-8), so just remove them with a script, or configure the editor so that it doesn't add them.

From Removing BOM from UTF-8:

#!/usr/bin/perl
@file=<>;
$file[0] =~ s/^\xEF\xBB\xBF//;
print(@file);

I'm sure this translates easily to PHP.
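It translates just as easily to Python. Here is a hedged sketch of the same filter; filter_bom is a hypothetical name, and it works on binary streams so that no decoding or newline conversion takes place:

```python
import shutil

BOM = b"\xef\xbb\xbf"


def filter_bom(src, dst):
    """Copy binary stream src to dst, dropping a UTF-8 BOM at the very start.

    Like the Perl substitution anchored at ^ on the first line, only the
    first three bytes are checked; the rest is copied unchanged.
    """
    head = src.read(len(BOM))
    if head != BOM:
        dst.write(head)  # not a BOM: keep these bytes
    shutil.copyfileobj(src, dst)  # copy the remainder as-is
```

Calling filter_bom(sys.stdin.buffer, sys.stdout.buffer) from a script makes it behave like the Perl version when used as a pipe, e.g. python strip_bom.py < in.css > out.css (script name hypothetical).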

For me, this worked:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

If I remove this meta tag, the ï»¿ appears again. I hope this helps someone...

I don't know PHP, so I don't know whether this is possible, but the best solution would be to read the file as UTF-8 instead of some other encoding. The BOM is actually the zero-width no-break space character, U+FEFF. If the file were being read in the correct encoding (UTF-8), the BOM could be recognized and skipped at the start of each file instead of ending up as ï»¿ in the resulting CSS file.

Another advantage of reading the file in the correct encoding is that you don't have to worry about characters being misinterpreted. Your editor is telling you that the code page you want to save it in can't represent all the characters you need. If PHP is reading the file in the wrong encoding, it's very likely that characters other than the BOM are being misinterpreted as well. Use UTF-8 everywhere, and these problems go away.
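A quick Python check (illustration only) of what the BOM is once the bytes are decoded as UTF-8. Strictly speaking, Unicode classes U+FEFF as a format character (category Cf), not as whitespace, which is one reason it is safer to have the reader drop it than to rely on whitespace handling:

```python
import unicodedata

ch = "\ufeff"
print(unicodedata.name(ch))      # ZERO WIDTH NO-BREAK SPACE
print(ch.encode("utf-8"))        # b'\xef\xbb\xbf'
print(unicodedata.category(ch))  # Cf
print(ch.isspace())              # False

# A plain UTF-8 decode keeps the BOM as a character; "utf-8-sig" drops it.
data = b"\xef\xbb\xbfbody{}"
print(repr(data.decode("utf-8")))      # '\ufeffbody{}'
print(repr(data.decode("utf-8-sig")))  # 'body{}'
```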

You can use the following, passing the files to fix as arguments:

vim -e -c 'argdo set fileencoding=utf-8|set encoding=utf-8| set nobomb| wq'

Replacing with awk seems to work, but not in place. Combined with grep, the vim command fixes every matching file in place:

grep -rl $'\xEF\xBB\xBF' * | xargs vim -e -c 'argdo set fileencoding=utf-8|set encoding=utf-8| set nobomb| wq'
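If vim is not available, a minimal Python sketch of the in-place part could look like this. strip_bom_in_place is a hypothetical name; unlike the vim command it only removes a leading BOM and does not re-encode the file, and it rewrites the file directly rather than through a temporary copy:

```python
import codecs


def strip_bom_in_place(path):
    """Remove a leading UTF-8 BOM from the file at path (hypothetical helper).

    Returns True if the file was rewritten. The rest of the file is written
    back byte for byte; no re-encoding or line-ending change happens.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as fh:
        fh.write(data[len(codecs.BOM_UTF8):])
    return True
```

You would call it once per path printed by grep -rl, for example for p in paths: strip_bom_in_place(p).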

I had the same problem with the BOM appearing in some of my PHP files (ï»¿ï»¿).

If you use PhpStorm, you can set a hotkey to remove it in Settings -> IDE Settings -> Keymap -> Main Menu -> File -> Remove BOM.

In Notepad++, choose the "Encoding" menu, then "Encode in UTF-8 without BOM". Then save.

See Stack Overflow question How to make Notepad to save text in UTF-8 without BOM?.

Open the PHP file in question in Notepad++.

Click on Encoding at the top and change from "Encoding in UTF-8 without BOM" to just "Encoding in UTF-8". Save and overwrite the file on your server.

Same problem, different solution.

One line in the PHP file was printing out XML headers (which use the same begin/end tags as PHP). Looks like the code within these tags set the encoding, and was executed within PHP which resulted in the strange characters. Either way here's the solution:

# Original
$xml_string = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

# fixed
$xml_string = "<" . "?xml version=\"1.0\" encoding=\"UTF-8\"?" . ">";

If you need to be able to remove the BOM from UTF-8 encoded files, you first need to get hold of an editor that is aware of them.

I personally use E Text Editor.

In the bottom right, there are options for character encoding, including the BOM tag. Load your file, deselect Byte Order Marker if it is selected, resave, and it should be done.

(Screenshot of the encoding options: http://oth4.com/encoding.png)

E is not free, but there is a free trial, and it is an excellent editor (limited TextMate compatibility).

You can open it in PhpStorm, right-click on your file, and click Remove BOM...

Here is another good solution for the problem with BOM. These are two VBScript (.vbs) scripts.

One finds the BOM in a file and the other KILLS the damned BOM in the file. They work pretty well and are easy to use.

Just create a .vbs file, and paste the following code in it.

You can use the VBScript script simply by dragging and dropping the suspicious file onto the .vbs file. It will tell you if there is a BOM or not.

' Heiko Jendreck - personal helpdesk & webdesign
' http://www.phw-jendreck.de
' 2010.05.10 Vers 1.0
'
' find_BOM.vbs
' ====================
' Small utility meant to find the BOM
'
 Const UTF8_BOM = "ï»¿"
 Const UTF16BE_BOM = "þÿ"
 Const UTF16LE_BOM = "ÿþ"
 Const ForReading = 1
 Const ForWriting = 2
 Dim fso
 Set fso = WScript.CreateObject("Scripting.FileSystemObject")
 Dim f
 f = WScript.Arguments.Item(0)
 Dim t
 t = fso.OpenTextFile(f, ForReading).ReadAll
 If Left(t, 3) = UTF8_BOM Then
     MsgBox "UTF-8-BOM detected!"
 ElseIf Left(t, 2) = UTF16BE_BOM Then
     MsgBox "UTF-16-BOM (Big Endian) detected!"
 ElseIf Left(t, 2) = UTF16LE_BOM Then
     MsgBox "UTF-16-BOM (Little Endian) detected!"
 Else
     MsgBox "No BOM detected!"
 End If
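For those not on Windows, the same check can be sketched in Python as a hypothetical detect_bom helper. It reads raw bytes instead of text, so the result does not depend on how the file would be decoded:

```python
import codecs

# Same three signatures the VBScript checks. A UTF-32 LE file (FF FE 00 00)
# would also be reported as UTF-16 LE, just as in the VBScript version.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8-BOM"),
    (codecs.BOM_UTF16_BE, "UTF-16-BOM (Big Endian)"),
    (codecs.BOM_UTF16_LE, "UTF-16-BOM (Little Endian)"),
]


def detect_bom(path):
    """Return a label for the BOM at the start of the file, or None."""
    with open(path, "rb") as fh:
        head = fh.read(3)
    for bom, label in BOMS:
        if head.startswith(bom):
            return label
    return None
```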

If it tells you there is a BOM, create the second .vbs file with the following code and drag the suspicious file onto it.

' Heiko Jendreck - personal helpdesk & webdesign
' http://www.phw-jendreck.de
' 2010.05.10 Vers 1.0
'
' kill_BOM.vbs
' ====================
' Small utility meant to delete the BOM that was found
'
Const UTF8_BOM = "ï»¿"
Const ForReading = 1
Const ForWriting = 2
Dim fso
Set fso = WScript.CreateObject("Scripting.FileSystemObject")
Dim f
f = WScript.Arguments.Item(0)
Dim t
t = fso.OpenTextFile(f, ForReading).ReadAll
If Left(t, 3) = UTF8_BOM Then
    fso.OpenTextFile(f, ForWriting).Write (Mid(t, 4))
    MsgBox "BOM deleted!"
Else
    MsgBox "No UTF-8 BOM present!"
End If

The code is from Heiko Jendreck.

In PhpStorm, for multiple files, and for BOMs that are not necessarily at the beginning of the file, you can search for \x{FEFF} (Regular Expression) and replace it with nothing.
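The same "find \x{FEFF} anywhere and replace it with nothing" idea in Python, as a small sketch assuming the text has already been decoded as UTF-8 (remove_all_feff is a hypothetical name). This also covers the situation from the question, where BOMs end up in the middle of concatenated CSS:

```python
def remove_all_feff(text):
    # Same idea as a regex search for \x{FEFF} replaced with nothing:
    # drop every U+FEFF, wherever it appears (hypothetical helper name).
    return text.replace("\ufeff", "")


# A BOM in the middle, as happens when BOM-prefixed files are concatenated.
merged = "a{color:red}\n\ufeffb{color:blue}\n"
print(remove_all_feff(merged) == "a{color:red}\nb{color:blue}\n")  # True
```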

Same problem, but it only affected one file so I just created a blank file, copy/pasted the code from the original file to the new file, and then replaced the original file. Not fancy but it worked.

Use Total Commander to search for all BOMed files:

Elegant way to search for UTF-8 files with BOM?

Open these files in some proper editor (that recognizes BOM) like Eclipse.
Change the file's encoding to ISO (right click, properties).
Cut ï»¿ from the beginning of the file, save
Change the file's encoding back to UTF-8

...and do not even think about using n...d again!

I had the same problem. It happened because one of my PHP files was in UTF-8 (the most important one: the configuration file, which is included in all the PHP files).

In my case, I had 2 different solutions that worked for me:

First, I changed the Apache configuration by using the AddDefaultCharset directive in the configuration files (or in .htaccess). This solution forces Apache to use the correct encoding.

AddDefaultCharset ISO-8859-1

The second solution was to change the bad encoding of the php file.

Copy the text of your filename.css file.
Close your css file.
Rename it filename2.css to avoid a filename clash.
In MS Notepad or Wordpad, create a new file.
Paste the text into it.
Save it as filename.css, selecting UTF-8 from the encoding options.
Upload filename.css.

Check your index.php, find "... charset=iso-8859-1" and replace it with "... charset=utf-8".

Maybe it'll work.

Licensed under: CC-BY-SA with attribution

Not affiliated with Stack Overflow