How do I remove ï»¿ from the beginning of a file?

https://stackoverflow.com/questions/3255993

16-09-2020

Question

I have a CSS file that looks fine when I open it in gedit, but when it is read by PHP (to merge all the CSS files into one), this CSS has the following characters prepended to it: ï»¿

PHP removes all whitespace, so a random ï»¿ in the middle of the code messes up the entire thing. As I mentioned, I can't actually see these characters when I open the file in gedit, so I can't remove them very easily.

I googled the problem, and there is clearly something wrong with the file encoding, which makes sense given that I've been moving the files around between different Linux/Windows servers via FTP and rsync, with a range of text editors. I don't really know much about character encoding though, so help would be appreciated.

If it helps, the file is saved in UTF-8 format, and gedit won't let me save it in ISO-8859-15 format (the document contains one or more characters that cannot be encoded using the specified character encoding). I tried saving it with Windows and Linux line endings, but neither helped.

Solution

Three words for you:

Byte Order Mark (BOM)

That is the representation of the UTF-8 BOM in ISO-8859-1. You have to tell your editor not to use BOMs, or use a different editor to strip them out.
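To see why the BOM shows up as ï»¿, here is a minimal sketch that decodes the same three bytes under both encodings. It is written in Python purely for illustration (the question itself is about PHP), and the variable names are made up for this example.

```python
import codecs

# The UTF-8 BOM is the three bytes EF BB BF.
bom = codecs.BOM_UTF8

# Decoded as ISO-8859-1, each byte becomes its own visible character.
as_latin1 = bom.decode("iso-8859-1")

# Decoded as UTF-8, the three bytes form the single code point U+FEFF.
as_utf8 = bom.decode("utf-8")

# Python's "utf-8-sig" codec drops a leading BOM while decoding.
as_utf8_sig = bom.decode("utf-8-sig")

print(repr(as_latin1), repr(as_utf8), repr(as_utf8_sig))
```

So the three junk characters are not extra data in the file; they are one invisible code point read with the wrong encoding.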

To automate the BOM's removal you can use awk, as shown in this question.

As another answer says, the best option would be for PHP to actually interpret the BOM correctly; for that you can use mb_internal_encoding(), like this:

 <?php
   //Storing the previous encoding in case you have some other piece 
   //of code sensitive to encoding and counting on the default value.      
   $previous_encoding = mb_internal_encoding();

   //Set the encoding to UTF-8, so when reading files it ignores the BOM       
   mb_internal_encoding('UTF-8');

   //Process the CSS files...

   //Finally, return to the previous encoding
   mb_internal_encoding($previous_encoding);

   //Rest of the code...
  ?>

Other tips

Open your file in Notepad++. From the Encoding menu, select Convert to UTF-8 without BOM, save the file, and replace the old file with this new one. And it will work, damn sure.

In PHP, you can do the following to remove all control and non-ASCII bytes, including the characters in question.

$response = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $response);

For those with shell access, here is a little command to find all files with the BOM set in the public_html directory. Be sure to change the path to whatever is correct on your server.

Code:

grep -rl $'\xEF\xBB\xBF' /home/username/public_html
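If grep with $'...' quoting is not available (it is a bash feature), roughly the same scan can be sketched in Python. This is only an illustration under a couple of assumptions: the function name is hypothetical, and unlike the grep command, which matches the byte sequence anywhere in a file, this version only checks the first three bytes.

```python
import os

UTF8_BOM = b"\xef\xbb\xbf"


def files_starting_with_bom(root):
    """Return the sorted paths of files under root that begin with the UTF-8 BOM."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    if handle.read(3) == UTF8_BOM:
                        hits.append(path)
            except OSError:
                # Unreadable files are skipped rather than aborting the scan.
                continue
    return sorted(hits)
```

Calling files_starting_with_bom("/home/username/public_html") would then list the affected files, using the same placeholder path as above.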

And if you are comfortable with the vi editor, open the file in vi:

vi /path-to-file-name/file.php

and enter the command to remove the BOM:

:set nobomb

Save the file:

:wq

The BOM is just a sequence of bytes ($EF $BB $BF for UTF-8), so just remove them using scripts, or configure your editor so that it is not added.

From Removing BOM from UTF-8:

#!/usr/bin/perl
@file=<>;
$file[0] =~ s/^\xEF\xBB\xBF//;
print(@file);

I'm sure it translates easily into PHP.
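For comparison, here is the same idea sketched in Python rather than PHP. The function names are hypothetical, and the sketch assumes the file is small enough to read into memory in one go.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; all other bytes are untouched."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


def strip_bom_in_place(path: str) -> bool:
    """Rewrite the file without its leading BOM. Returns True if it was changed."""
    with open(path, "rb") as handle:
        data = handle.read()
    cleaned = strip_bom(data)
    if cleaned == data:
        return False
    with open(path, "wb") as handle:
        handle.write(cleaned)
    return True
```

Working on raw bytes means nothing else in the file is re-encoded, which matches what the Perl one-liner does to the first line.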

For me, this worked:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

If I remove this meta tag, the ï»¿ appears again. Hope this helps someone...

I don't know PHP, so I don't know if this is possible, but the best solution would be to read the file as UTF-8 rather than some other encoding. The BOM is actually a ZERO WIDTH NO-BREAK SPACE. That is whitespace, so if the file were read in the correct encoding (UTF-8), the BOM would be interpreted as whitespace and ignored in the resulting CSS file.

Also, another advantage of reading the file in the correct encoding is that you don't have to worry about characters being misinterpreted. Your editor is telling you that the code page you want to save in can't represent all the characters you need. If PHP then reads the file in the wrong encoding, it is very likely that other characters besides the BOM are being silently misinterpreted. Use UTF-8 everywhere and these problems tend to disappear.
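Whether a decoder drops the BOM or keeps it as U+FEFF depends on the tool. As an illustration of "read each file as UTF-8 before merging", here is a small Python sketch; the function name merge_css is hypothetical, and it assumes every input file really is UTF-8.

```python
def merge_css(paths):
    """Concatenate CSS files, decoding each as UTF-8 and dropping any leading BOM."""
    parts = []
    for path in paths:
        # "utf-8-sig" decodes UTF-8 and skips a BOM at the start if one is present.
        with open(path, "r", encoding="utf-8-sig", newline="") as handle:
            parts.append(handle.read())
    return "\n".join(parts)
```

Because each file is decoded separately, a BOM at the start of the second or third file cannot end up in the middle of the merged output.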

You can use

vim -e -c 'argdo set fileencoding=utf-8|set encoding=utf-8| set nobomb| wq'

Replacing with awk seems to work, but it does not edit the file in place.

grep -rl $'\xEF\xBB\xBF' * | xargs vim -e -c 'argdo set fileencoding=utf-8|set encoding=utf-8| set nobomb| wq'

I had the same problem with the BOM appearing in some of my PHP files (ï»¿ï»¿).

If you use PhpStorm you can set a hotkey to remove it in Settings -> IDE Settings -> Keymap -> Main Menu -> File -> Remove BOM.

In Notepad++, choose the "Encoding" menu, then "Encode in UTF-8 without BOM". Then save.

See Stack Overflow question How to make Notepad to save text in UTF-8 without BOM?.

Open the PHP file in question in Notepad++.

Click on Encoding at the top and change from "Encoding in UTF-8 without BOM" to just "Encoding in UTF-8". Save and overwrite the file on your server.

Same problem, different solution.

One line in the PHP file was printing out XML headers (which use the same begin/end tags as PHP). Looks like the code within these tags set the encoding, and was executed within PHP which resulted in the strange characters. Either way here's the solution:

# Original
$xml_string = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

# Fixed
$xml_string = "<" . "?xml version=\"1.0\" encoding=\"UTF-8\"?" . ">";

If you need to be able to remove the BOM from UTF-8 encoded files, you first need to get hold of an editor that is aware of them.

I personally use E Text Editor.

In the bottom right, there are options for character encoding, including the BOM tag. Load your file, deselect Byte Order Marker if it is selected, resave, and it should be done.


E is not free, but there is a free trial, and it is an excellent editor (limited TextMate compatibility).

You can open it in PhpStorm, right-click on your file, and click Remove BOM...

Here is another good solution for the problem with BOM. These are two VBScript (.vbs) scripts.

One for finding the BOM in a file and one for KILLING the damned BOM in the file. It works pretty fine and is easy to use.

Just create a .vbs file, and paste the following code in it.

You can use the VBScript script simply by dragging and dropping the suspicious file onto the .vbs file. It will tell you if there is a BOM or not.

' Heiko Jendreck - personal helpdesk & webdesign
' http://www.phw-jendreck.de
' 2010.05.10 Vers 1.0
'
' find_BOM.vbs
' ====================
' Small helper that looks for the BOM
'
 Const UTF8_BOM = "ï»¿"
 Const UTF16BE_BOM = "þÿ"
 Const UTF16LE_BOM = "ÿþ"
 Const ForReading = 1
 Const ForWriting = 2
 Dim fso
 Set fso = WScript.CreateObject("Scripting.FileSystemObject")
 Dim f
 f = WScript.Arguments.Item(0)
 Dim t
 t = fso.OpenTextFile(f, ForReading).ReadAll
 If Left(t, 3) = UTF8_BOM Then
     MsgBox "UTF-8-BOM detected!"
 ElseIf Left(t, 2) = UTF16BE_BOM Then
     MsgBox "UTF-16-BOM (Big Endian) detected!"
 ElseIf Left(t, 2) = UTF16LE_BOM Then
     MsgBox "UTF-16-BOM (Little Endian) detected!"
 Else
     MsgBox "No BOM detected!"
 End If
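For readers not on Windows, the same three checks can be sketched in Python. This is an illustration only: detect_bom is a hypothetical name, and like the script above it looks for UTF-8 and UTF-16 BOMs and nothing else.

```python
import codecs

# Same three byte order marks the VBScript checks, in the same order.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16 (Big Endian)"),
    (codecs.BOM_UTF16_LE, "UTF-16 (Little Endian)"),
]


def detect_bom(data: bytes):
    """Return the name of the BOM that data starts with, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

Reading the file in binary mode, for example open(path, "rb").read(4), and passing the result to detect_bom avoids any guessing about the text encoding.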

If it tells you there is a BOM, create the second .vbs file with the following code and drag the suspicious file onto it.

' Heiko Jendreck - personal helpdesk & webdesign
' http://www.phw-jendreck.de
' 2010.05.10 Vers 1.0
'
' kill_BOM.vbs
' ====================
' Small helper that deletes the BOM it finds
'
Const UTF8_BOM = "ï»¿"
Const ForReading = 1
Const ForWriting = 2
Dim fso
Set fso = WScript.CreateObject("Scripting.FileSystemObject")
Dim f
f = WScript.Arguments.Item(0)
Dim t
t = fso.OpenTextFile(f, ForReading).ReadAll
If Left(t, 3) = UTF8_BOM Then
    fso.OpenTextFile(f, ForWriting).Write (Mid(t, 4))
    MsgBox "BOM deleted!"
Else
    MsgBox "No UTF-8 BOM present!"
End If

The code is from Heiko Jendreck.

In PHPStorm, for multiple files and BOM not necessarily at the beginning of the file, you can search \x{FEFF} (Regular Expression) and replace with nothing.
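A BOM in the middle of a file usually means several BOM-prefixed files were concatenated at some point. Outside an IDE, the same search-and-replace can be sketched in Python on text that has already been decoded as UTF-8; the function name is hypothetical.

```python
def remove_all_boms(text: str) -> str:
    """Remove every U+FEFF code point from already-decoded text."""
    return text.replace("\ufeff", "")
```

For example, remove_all_boms("\ufeffa{}\ufeffb{}") gives "a{}b{}".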

Same problem, but it only affected one file so I just created a blank file, copy/pasted the code from the original file to the new file, and then replaced the original file. Not fancy but it worked.

Use Total Commander to search for all BOMed files:

Elegant way to search for UTF-8 files with BOM?

Open these files in some proper editor (that recognizes BOM) like Eclipse.
Change the file's encoding to ISO (right click, properties).
Cut ï»¿ from the beginning of the file, save
Change the file's encoding back to UTF-8

...and do not even think about using n...d again!

I had the same problem. It was caused by one of my PHP files being in UTF-8 (the most important one, the configuration file which is included in all PHP files).

In my case, I had two different solutions which worked for me:

First, I changed the Apache configuration by using the AddDefaultCharset directive in the configuration files (or in .htaccess). This solution forces Apache to use the correct encoding.

AddDefaultCharset ISO-8859-1

The second solution was to change the bad encoding of the php file.

Copy the text of your filename.css file.
Close your css file.
Rename it filename2.css to avoid a filename clash.
In MS Notepad or Wordpad, create a new file.
Paste the text into it.
Save it as filename.css, selecting UTF-8 from the encoding options.
Upload filename.css.

Check on your index.php, find "... charset=iso-8859-1" and replace it with "... charset=utf-8".

Maybe it'll work.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow