PHP Encoding Conversion to Windows-1252 whilst keeping UTF-8 Compatibility

Question 1

I think the main problem is that mb_detect_encoding() does not do exactly what you think it does. It attempts to detect the character encoding, but it only chooses from a fairly limited list of candidate encodings. By default, those candidates are the ones returned by mb_detect_order(). On my computer they are:

ASCII
UTF-8

So this function is of little use unless you compile a list of candidate encodings yourself and pass it to the function.
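To illustrate, here is a minimal sketch of passing your own candidate list. The helper name detectFrom() is made up for this example, and the third argument enables strict mode so that candidates in which the bytes are invalid get rejected. The exact winner among several valid candidates can differ between PHP versions, so treat the result as a hint only.

```php
<?php
// Hypothetical helper: detect using an explicit candidate list, in strict mode.
function detectFrom(string $bytes, array $candidates)
{
    return mb_detect_encoding($bytes, $candidates, true);
}

// "äöü" encoded as Windows-1252: three bytes, none of them valid ASCII or UTF-8.
$cp1252Bytes = "\xE4\xF6\xFC";

// With only ASCII and UTF-8 as candidates, nothing matches.
var_dump(detectFrom($cp1252Bytes, ['ASCII', 'UTF-8'])); // bool(false)

// Once Windows-1252 is a candidate, detection no longer fails.
var_dump(detectFrom($cp1252Bytes, ['ASCII', 'UTF-8', 'Windows-1252']));
```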

Additionally, there's basically no reliable way to guess the encoding of an arbitrary input string, even if you restrict yourself to a small subset of encodings. In your case, Windows-1252 is so close to ISO-8859-1 and ISO-8859-15 that you have no way to tell them apart other than visual inspection of key characters like ¤ or €.
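A small sketch of why guessing fails: the same byte is perfectly valid in all three encodings and simply means something different in each. decodeAs() is a hypothetical helper name; mb_convert_encoding() is the real mbstring function.

```php
<?php
// Hypothetical helper: interpret raw bytes as $encoding and return UTF-8.
function decodeAs(string $bytes, string $encoding): string
{
    return mb_convert_encoding($bytes, 'UTF-8', $encoding);
}

$encodings = ['ISO-8859-1', 'ISO-8859-15', 'Windows-1252'];

foreach (["\xA4", "\x80"] as $byte) {
    foreach ($encodings as $encoding) {
        // Print the byte, the assumed encoding and the resulting UTF-8 bytes in hex.
        printf("0x%s as %-12s -> %s\n",
            strtoupper(bin2hex($byte)),
            $encoding,
            bin2hex(decodeAs($byte, $encoding)));
    }
}
// 0xA4 is the currency sign in ISO-8859-1 and Windows-1252, but the euro sign in ISO-8859-15.
// 0x80 is the euro sign in Windows-1252, but a control character in the two ISO encodings.
```

None of these conversions fails, so there is nothing a detector can use to rule a candidate out.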

Question 2

You can't have a string be Windows-1252 and UTF-8 at the same time. The two encodings are byte-for-byte identical for the first 128 characters (the ASCII range, which contains e.g. the basic Latin alphabet), but beyond that (e.g. for umlauts) it's either one or the other: the same character is stored as a single byte in Windows-1252 and as a sequence of two or more bytes in UTF-8.
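A quick sketch that shows the difference at the byte level (the variable names are just for illustration):

```php
<?php
$utf8   = "\u{00E4}";                                          // "ä" encoded as UTF-8
$cp1252 = mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8'); // same character, other encoding

echo bin2hex($utf8), "\n";   // c3a4
echo bin2hex($cp1252), "\n"; // e4

// The Windows-1252 byte on its own is not valid UTF-8.
var_dump(mb_check_encoding($cp1252, 'UTF-8')); // bool(false)

// Pure ASCII text is unchanged by the conversion, so it is valid in both encodings.
$ascii = 'abc.txt';
var_dump(mb_convert_encoding($ascii, 'Windows-1252', 'UTF-8') === $ascii); // bool(true)
```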

Question 3

Keep to ASCII in the filesystem. If you need to preserve characters outside ASCII in a filename, there are schemes you can use to represent Unicode characters while keeping to ASCII.

For example, percent encoding:

äöüÄÖÜ.txt <-> %C3%A4%C3%B6%C3%BC%C3%84%C3%96%C3%9C.txt

Of course this will hit the file name length limit pretty fast, since every non-ASCII character grows to six or more bytes, so it is far from optimal.
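In PHP, the percent-encoding scheme could be sketched with rawurlencode() and rawurldecode(). The two wrapper names are hypothetical, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical wrappers around PHP's built-in percent-encoding functions.
function encodeFilename(string $utf8Name): string
{
    // Letters, digits and "-", "_", ".", "~" are kept; everything else becomes %XX.
    return rawurlencode($utf8Name);
}

function decodeFilename(string $asciiName): string
{
    return rawurldecode($asciiName);
}

$name    = "\u{E4}\u{F6}\u{FC}\u{C4}\u{D6}\u{DC}.txt"; // "äöüÄÖÜ.txt" as UTF-8
$encoded = encodeFilename($name);

echo $encoded, "\n";                                // %C3%A4%C3%B6%C3%BC%C3%84%C3%96%C3%9C.txt
echo strlen($name), ' -> ', strlen($encoded), "\n"; // 16 -> 40
var_dump(decodeFilename($encoded) === $name);       // bool(true)
```

The round trip is lossless (a literal % is itself encoded as %25), but the length comparison shows how quickly names grow.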

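How about Punycode? It is much more compact, but note that the IDNA flavour (the one with the xn-- prefix) lowercases the name before encoding, so the round trip does not preserve case:

äöüÄÖÜ.txt -> xn--4caa7cb2ac.txt -> äöüäöü.txt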
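A rough sketch of the Punycode variant, assuming the intl extension is installed (it provides idn_to_ascii() and idn_to_utf8()). The wrapper names are made up for this example. These functions are designed for domain names, so they treat each dot-separated part as a label and fold case; whether that is acceptable for your filenames is something you would have to decide.

```php
<?php
// Hypothetical wrappers; both require PHP's intl extension.
function toPunycodeName(string $utf8Name)
{
    // Returns false if the name cannot be converted.
    return idn_to_ascii($utf8Name, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
}

function fromPunycodeName(string $asciiName)
{
    return idn_to_utf8($asciiName, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
}

if (extension_loaded('intl')) {
    $name = "\u{E4}\u{F6}\u{FC}\u{C4}\u{D6}\u{DC}.txt"; // "äöüÄÖÜ.txt" as UTF-8
    $ascii = toPunycodeName($name);
    echo $ascii, "\n";
    // Decoding yields the lowercased name, not the original one.
    echo fromPunycodeName($ascii), "\n";
}
```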