ï»¿ characters appended to the beginning of each file

https://stackoverflow.com/questions/466653

19-08-2019
|

Question

I've downloaded an HttpHandler class that concatenates JS files into one file and it keeps appending the ï»¿ characters at the start of each file it concatenates.

Any ideas on what is causing this? Could it be that onces the files processed they are written to the cache and that's how the cache is storing/rendering it?

Any inputs would be greatly appreciated.

using System;
using System.Net;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Configuration;
using System.Web;

public class HttpCombiner : IHttpHandler {

    private const bool DO_GZIP = false;
    private readonly static TimeSpan CACHE_DURATION = TimeSpan.FromDays(30);

    public void ProcessRequest (HttpContext context) {

        HttpRequest request = context.Request;

        // Read setName, contentType and version. All are required. They are
        // used as cache key
        string setName = request["s"] ?? string.Empty;
        string contentType = request["t"] ?? string.Empty;
        string version = request["v"] ?? string.Empty;

        // Decide if browser supports compressed response
        bool isCompressed = DO_GZIP && this.CanGZip(context.Request);

        // Response is written as UTF8 encoding. If you are using languages
        // like Arabic, you should change this to proper encoding 
        UTF8Encoding encoding = new UTF8Encoding(false);

        // If the set has already been cached, write the response directly
        // from cache. Otherwise generate the response and cache it
        if (!this.WriteFromCache(context, setName, version, isCompressed,
            contentType))
        {
            using (MemoryStream memoryStream = new MemoryStream(5000))
            {
                // Decide regular stream or GZipStream based on whether the
                // response can be cached or not
                using (Stream writer = isCompressed
                    ? (Stream)(new GZipStream(memoryStream,
                        CompressionMode.Compress))
                    : memoryStream)
                {
                    // Load the files defined in <appSettings> and process
                    // each file
                    string setDefinition = System.Configuration
                        .ConfigurationManager.AppSettings[setName] ?? "";
                    string[] fileNames = setDefinition.Split(
                        new char[] { ',' }, 
                        StringSplitOptions.RemoveEmptyEntries);

                    foreach (string fileName in fileNames)
                    {
                        byte[] fileBytes = this.GetFileBytes(
                            context, fileName.Trim(), encoding);
                        writer.Write(fileBytes, 0, fileBytes.Length);
                    }

                    writer.Close();
                }

                // Cache the combined response so that it can be directly
                // written in subsequent calls 
                byte[] responseBytes = memoryStream.ToArray();
                context.Cache.Insert(
                    GetCacheKey(setName, version, isCompressed),
                    responseBytes, null,
                    System.Web.Caching.Cache.NoAbsoluteExpiration,
                    CACHE_DURATION);

                // Generate the response
                this.WriteBytes(responseBytes, context, isCompressed,
                    contentType);
            }
        }
    }

    private byte[] GetFileBytes(HttpContext context, string virtualPath,
        Encoding encoding)
    {
        if (virtualPath.StartsWith("http://",
            StringComparison.InvariantCultureIgnoreCase))
        {
            using (WebClient client = new WebClient())
            {
                return client.DownloadData(virtualPath);
            }
        }
        else
        {
            string physicalPath = context.Server.MapPath(virtualPath);
            byte[] bytes = File.ReadAllBytes(physicalPath);
            // TODO: Convert unicode files to specified encoding.
            // For now, assuming files are either ASCII or UTF8
            return bytes;
        }
    }

    private bool WriteFromCache(HttpContext context, string setName,
        string version, bool isCompressed, string contentType)
    {
        byte[] responseBytes = context.Cache[GetCacheKey(setName, version,
            isCompressed)] as byte[];

        if (null == responseBytes || 0 == responseBytes.Length) return false;

        this.WriteBytes(responseBytes, context, isCompressed, contentType);
        return true;
    }

    private void WriteBytes(byte[] bytes, HttpContext context, 
        bool isCompressed, string contentType)
    {
        HttpResponse response = context.Response;

        response.AppendHeader("Content-Length", bytes.Length.ToString());
        response.ContentType = contentType;
        if (isCompressed)
            response.AppendHeader("Content-Encoding", "gzip");

        context.Response.Cache.SetCacheability(HttpCacheability.Public);
        context.Response.Cache.SetExpires(DateTime.Now.Add(CACHE_DURATION));
        context.Response.Cache.SetMaxAge(CACHE_DURATION);
        context.Response.Cache.AppendCacheExtension(
            "must-revalidate, proxy-revalidate");

        response.OutputStream.Write(bytes, 0, bytes.Length);
        response.Flush();
    }

    private bool CanGZip(HttpRequest request)
    {
        string acceptEncoding = request.Headers["Accept-Encoding"];
        if (!string.IsNullOrEmpty(acceptEncoding) &&
             (acceptEncoding.Contains("gzip")
                 || acceptEncoding.Contains("deflate")))
            return true;
        return false;
    }

    private string GetCacheKey(string setName, string version,
        bool isCompressed)
    {
        return "HttpCombiner." + setName + "." + version + "." + isCompressed;
    }

    public bool IsReusable
    {
        get { return true; }
    }
}

Solution

OK, I've debugged your code.

BOM marks appear in the source stream when the files are being read from the disk:

byte[] bytes = File.ReadAllBytes(physicalPath);
// TODO: Convert unicode files to specified encoding. For now, assuming
// files are either ASCII or UTF8

If you read the files properly, you can get rid of the marks.

OTHER TIPS

The ï»¿ characters are the UTF BOM markers.

Its the UTF Byte Order Mark (BOM).

It will be at the start of each file, but your editor will ignore them there. When concatenated they end up in the middle, so you see them.

I think this is the Byte Order Mark (BOM) for files with UTF-8 encoding. This mark allows to determine in what encoding the file is stored.

If you have the file's contents in a string, .Trim() will lop off the "BOM" quite handily.

You may not be able to do that, or you may want the whitespace at the ends of the file, but it's certainly an option.

For .js whitespace isn't significant, so this could work.

Check how your js files are encoded and provide the same encoding in the code which does the reading and concatenation. These two characters usually point to unicode.

Those characters are UTF-8 BOM. It doesn't seem like they're coming from the gzipped stream. It's more likely they are inserted to the response stream, so I would suggest clearing the response before working with it:

context.Response.Clear();

You didn't post what the actual solution was. Here's my soulution. On the line where it reads the file into memory, I found a kind of strange way to remove the BOM:

byte[] bytes = File.ReadAllBytes(physicalPath);
String ss = new StreamReader(new MemoryStream(bytes), true).ReadToEnd();
byte[] b = StrToByteArray(ss);
return b;

And you also need this function:

public static byte[] StrToByteArray(string str)
{
    System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
    return encoding.GetBytes(str);
}

Nitech

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow