How do I remove ï»¿ from the beginning of a file?

https://stackoverflow.com/questions/3255993

16-09-2020

Question

I have a CSS file that looks fine when I open it in gedit, but when it is read by PHP (to merge all the CSS files into one), this CSS has the following characters prepended to it: ï»¿

PHP removes all whitespace, so a random ï»¿ in the middle of the code messes up the whole thing. As I mentioned, I can't actually see these characters when I open the file in gedit, so I can't remove them very easily.

I've googled the problem, and there is clearly something wrong with the file encoding, which makes sense, since I have been moving the files between different Linux/Windows servers via ftp and rsync, with a range of text editors. I don't really know much about character encoding though, so help would be appreciated.

If it helps, the file is saved in UTF-8 format, and gedit won't let me save it in ISO-8859-15 (the document contains one or more characters that cannot be encoded using the specified character encoding). I tried saving it with Windows and Linux line endings, but neither helped.

Solution

Three words for you:

Byte Order Mark (BOM)

That's the representation of the UTF-8 BOM in ISO-8859-1. You have to tell your editor not to use BOMs, or use a different editor to strip them out.
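To see why the BOM shows up as exactly those three characters, here is a minimal Python sketch (Python is used only for illustration, the question itself is about PHP). It decodes the same three bytes once as UTF-8 and once as ISO-8859-1; the variable names are made up for this example.

```python
# The three bytes that make up the UTF-8 byte order mark.
bom_bytes = b"\xef\xbb\xbf"

# Decoded as UTF-8, the three bytes form the single code point U+FEFF.
as_utf8 = bom_bytes.decode("utf-8")

# Decoded as ISO-8859-1, every byte becomes its own visible character:
# 0xEF is "ï", 0xBB is "»", 0xBF is "¿".
as_latin1 = bom_bytes.decode("iso-8859-1")

print(len(as_utf8))                      # one code point
print([hex(ord(c)) for c in as_latin1])  # three separate characters
```

So the file is fine as UTF-8; the stray characters only appear when something treats the bytes as a single-byte encoding.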

To automate the BOM's removal, you can use awk as shown in this question.

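As another answer says, it would be best for PHP to actually interpret the BOM correctly. The snippet below uses mb_internal_encoding() for that; note that this function only sets the encoding assumed by the mbstring functions and does not change the bytes read from a file, so a BOM that must not reach the output may still need to be stripped explicitly: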

<?php
    // Store the previous encoding in case some other piece of code is
    // sensitive to encoding and counts on the default value.
    $previous_encoding = mb_internal_encoding();

    // Set the internal encoding used by the mbstring functions to UTF-8.
    // This does not by itself remove BOM bytes from file contents.
    mb_internal_encoding('UTF-8');

    // Process the CSS files...

    // Finally, return to the previous encoding
    mb_internal_encoding($previous_encoding);

    // Rest of the code...
?>

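Other tips

Open your file in Notepad++. From the Encoding menu, select Convert to UTF-8 without BOM, save the file, and replace the old file with this new one. And it will work, damn sure.

In PHP, you can do the following to remove all non-printable characters, including the character in question. Be aware that this pattern also strips tabs, newlines, and every non-ASCII byte, so any legitimate UTF-8 text in the string is lost as well.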

$response = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $response);

For those with shell access, here is a little command to find all files with the BOM set in the public_html directory. Be sure to change the path to the correct one on your server.

Code:

grep -rl $'\xEF\xBB\xBF' /home/username/public_html
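If grep with $'...' quoting is not available (for example on a host without bash), roughly the same search can be sketched in Python. This is only an illustration under a few assumptions: the function name find_bom_files is made up, and unlike the grep command it reports only files that start with the BOM, not files that contain the bytes somewhere in the middle.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def find_bom_files(root):
    """Return the regular files under root that start with the UTF-8 BOM."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Read only the first three bytes; the rest of the file is irrelevant.
            with path.open("rb") as handle:
                if handle.read(3) == UTF8_BOM:
                    hits.append(path)
    return sorted(hits)
```

Calling find_bom_files("/home/username/public_html") would then return the affected paths for the directory used in the command above.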

And if you are comfortable with the vi editor, open the file in vi:

vi /path-to-file-name/file.php

and enter the command to remove the BOM:

:set nobomb

Save the file:

:wq

The BOM is just a sequence of bytes ($EF $BB $BF for UTF-8), so just remove them using scripts, or configure your editor so that it is not added.

From Removing BOM from UTF-8:

#!/usr/bin/perl
@file=<>;
$file[0] =~ s/^\xEF\xBB\xBF//;
print(@file);

I am sure it translates to PHP easily.
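For comparison, the same idea as the Perl one-liner can be sketched in Python (not PHP, just to show how little logic is involved). The helper name strip_utf8_bom is an assumption for this example; it works on raw bytes so that nothing else in the file is re-encoded.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; all other bytes are untouched."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data
```

Only a BOM at the very start is removed, which matches the ^ anchor in the Perl substitution.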

For me, this worked:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

If I remove this meta tag, the ï»¿ appears again. Hope this helps someone...

I don't know PHP, so I don't know if this is possible, but the best solution would be to read the file as UTF-8 rather than some other encoding. The BOM is actually a ZERO WIDTH NO-BREAK SPACE. This is whitespace, so if the file were read in the correct encoding (UTF-8), the BOM would be interpreted as whitespace and would be ignored in the resulting CSS file.

Also, another advantage of reading the file in the correct encoding is that you don't have to worry about characters being misinterpreted. Your editor is telling you that the code page you want to save it in won't handle all the characters that you need. If PHP then reads the file in the incorrect encoding, it is very likely that other characters besides the BOM are being silently misinterpreted. Use UTF-8 everywhere, and these problems tend to disappear.
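As an illustration of "read it in the correct encoding", here is a small Python sketch (Python rather than PHP, and the sample CSS is made up). Decoding as plain UTF-8 keeps the BOM as the code point U+FEFF, while Python's "utf-8-sig" codec drops a leading BOM. One caveat, at least in Python: U+FEFF is not reported as whitespace by str.isspace(), so relying on whitespace trimming alone may not be enough.

```python
raw = b"\xef\xbb\xbfbody { margin: 0; }"

# Plain UTF-8 decoding keeps the BOM as the invisible code point U+FEFF.
plain = raw.decode("utf-8")

# The "utf-8-sig" codec removes a leading BOM if there is one.
cleaned = raw.decode("utf-8-sig")

print(plain.startswith("\ufeff"))
print(cleaned)
print("\ufeff".isspace())
```

The "utf-8-sig" codec also accepts input without a BOM, so it can be used for every file regardless of how it was saved.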

You can use

vim -e -c 'argdo set fileencoding=utf-8|set encoding=utf-8| set nobomb| wq'

Replacing with awk seems to work, but it is not in place.

grep -rl $'\xEF\xBB\xBF' * | xargs vim -e -c 'argdo set fileencoding=utf-8|set encoding=utf-8| set nobomb| wq'
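If vim is not installed, an in-place removal can also be sketched in a few lines of Python. The function name remove_bom_in_place is hypothetical, and the sketch reads the whole file into memory, which is fine for CSS or PHP sources but not for very large files.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def remove_bom_in_place(path):
    """Rewrite the file without its leading UTF-8 BOM. Return True if it changed."""
    target = Path(path)
    data = target.read_bytes()
    if not data.startswith(UTF8_BOM):
        # Nothing to do; leave the file (and its modification time) alone.
        return False
    target.write_bytes(data[len(UTF8_BOM):])
    return True
```

Running it over the list of files produced by the grep command above gives roughly the same result as the xargs pipeline, without touching files that have no BOM.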

I had the same problem with the BOM appearing in some of my PHP files (ï»¿ï»¿).

If you use PhpStorm, you can set a hotkey to remove it in Settings -> IDE Settings -> Keymap -> Main Menu -> File -> Remove BOM.

In Notepad++, choose the "Encoding" menu, then "Encode in UTF-8 without BOM". Then save.

See Stack Overflow question How to make Notepad to save text in UTF-8 without BOM?.

Open the PHP file in question in Notepad++.

Click on Encoding at the top and change from "Encoding in UTF-8 without BOM" to just "Encoding in UTF-8". Save and overwrite the file on your server.

Same problem, different solution.

One line in the PHP file was printing out XML headers (which use the same begin/end tags as PHP). Looks like the code within these tags set the encoding, and was executed within PHP which resulted in the strange characters. Either way here's the solution:

# Original
$xml_string = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

# fixed
$xml_string = "<" . "?xml version=\"1.0\" encoding=\"UTF-8\"?" . ">";

If you need to be able to remove the BOM from UTF-8 encoded files, you first need to get hold of an editor that is aware of them.

I personally use E Text Editor.

In the bottom right, there are options for character encoding, including the BOM tag. Load your file, deselect Byte Order Marker if it is selected, resave, and it should be done.

E is not free, but there is a free trial, and it is an excellent editor (limited TextMate compatibility).

You can open the file in PhpStorm, right-click it, and click Remove BOM...

Here is another good solution for the problem with BOM. These are two VBScript (.vbs) scripts.

One for finding the BOM in a file and one for KILLING the damned BOM in the file. It works pretty fine and is easy to use.

Just create a .vbs file, and paste the following code in it.

You can use the VBScript script simply by dragging and dropping the suspicious file onto the .vbs file. It will tell you if there is a BOM or not.

' Heiko Jendreck - personal helpdesk & webdesign
' http://www.phw-jendreck.de
' 2010.05.10 Vers 1.0
'
' find_BOM.vbs
' ====================
' Small helper that is meant to find the BOM
'
 Const UTF8_BOM = "ï»¿"
 Const UTF16BE_BOM = "þÿ"
 Const UTF16LE_BOM = "ÿþ"
 Const ForReading = 1
 Const ForWriting = 2
 Dim fso
 Set fso = WScript.CreateObject("Scripting.FileSystemObject")
 Dim f
 f = WScript.Arguments.Item(0)
 Dim t
 t = fso.OpenTextFile(f, ForReading).ReadAll
 If Left(t, 3) = UTF8_BOM Then
     MsgBox "UTF-8-BOM detected!"
 ElseIf Left(t, 2) = UTF16BE_BOM Then
     MsgBox "UTF-16-BOM (Big Endian) detected!"
 ElseIf Left(t, 2) = UTF16LE_BOM Then
     MsgBox "UTF-16-BOM (Little Endian) detected!"
 Else
     MsgBox "No BOM detected!"
 End If
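For readers not on Windows, the same check can be sketched in Python using the BOM constants from the standard codecs module. The function name detect_bom and the label strings are made up for this example; like the VBScript above, it only covers UTF-8 and the two UTF-16 byte orders.

```python
import codecs

# (BOM bytes, human-readable label) pairs, checked in this order.
KNOWN_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16 (big endian)"),
    (codecs.BOM_UTF16_LE, "UTF-16 (little endian)"),
]


def detect_bom(data: bytes):
    """Return the label of the BOM that data starts with, or None if there is none."""
    for bom, label in KNOWN_BOMS:
        if data.startswith(bom):
            return label
    return None
```

Passing the first few bytes of a file, for example open(path, "rb").read(4), is enough for the check.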

If it tells you there is a BOM, go and create the second .vbs file with the following code and drag the suspicious file onto the .vbs file.

' Heiko Jendreck - personal helpdesk & webdesign
' http://www.phw-jendreck.de
' 2010.05.10 Vers 1.0
'
' kill_BOM.vbs
' ====================
' Small helper that deletes the BOM once it has been found
'
Const UTF8_BOM = "ï»¿"
Const ForReading = 1
Const ForWriting = 2
Dim fso
Set fso = WScript.CreateObject("Scripting.FileSystemObject")
Dim f
f = WScript.Arguments.Item(0)
Dim t
t = fso.OpenTextFile(f, ForReading).ReadAll
If Left(t, 3) = UTF8_BOM Then
    fso.OpenTextFile(f, ForWriting).Write (Mid(t, 4))
    MsgBox "BOM deleted!"
Else
    MsgBox "No UTF-8 BOM present!"
End If

The code is from Heiko Jendreck.

In PhpStorm, for multiple files and a BOM that is not necessarily at the beginning of the file, you can search for \x{FEFF} (Regular Expression) and replace it with nothing.
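A BOM in the middle of the text is exactly what happens when several files are merged, as in the original question. A minimal Python sketch of the same search-and-replace, assuming the inputs are UTF-8 bytes and using a made-up helper name join_without_boms:

```python
def join_without_boms(chunks):
    """Decode each bytes chunk as UTF-8, drop every U+FEFF, and join with newlines."""
    texts = [chunk.decode("utf-8").replace("\ufeff", "") for chunk in chunks]
    return "\n".join(texts)
```

This removes U+FEFF wherever it occurs, which is the same effect as replacing \x{FEFF} with nothing across the project.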

Same problem, but it only affected one file so I just created a blank file, copy/pasted the code from the original file to the new file, and then replaced the original file. Not fancy but it worked.

Use Total Commander to search for all BOMed files:

Elegant way to search for UTF-8 files with BOM?

Open these files in some proper editor (that recognizes BOM) like Eclipse.
Change the file's encoding to ISO (right click, properties).
Cut ï»¿ from the beginning of the file, save
Change the file's encoding back to UTF-8

...and do not even think about using n...d again!

I had the same problem. It occurred because one of my PHP files was in UTF-8 (the most important one, the configuration file that is included in all PHP files).

In my case, I had 2 different solutions which worked for me :

First, I changed the Apache configuration by using the AddDefaultCharset directive in the configuration files (or in .htaccess). This solution forces Apache to use the correct encoding.

AddDefaultCharset ISO-8859-1

The second solution was to change the bad encoding of the php file.

Copy the text of your filename.css file.
Close your css file.
Rename it filename2.css to avoid a filename clash.
In MS Notepad or Wordpad, create a new file.
Paste the text into it.
Save it as filename.css, selecting UTF-8 from the encoding options.
Upload filename.css.

Check on your index.php, find "... charset=iso-8859-1" and replace it with "... charset=utf-8".

Maybe it'll work.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow