Get index of two consecutive upper case characters

Question 1

I found that the PATINDEX/COLLATE option worked only intermittently. Here is what I ended up doing:

--get rid of the sparsely used commas first, so that a comma between
--two spaces cannot leave a double space behind
--then get rid of the duplicate spaces
update MyTable set
    CityStZip= 
        replace(
            replace(
                replace(CityStZip,',',''),
                '   ',' '),
            '  ',' ')

select
    --check if state and zip are there and then grab the city
    case when isNumeric(right(CityStZip,1))=1
            --stop just before the second-to-last space so the city has no trailing space
            then left(CityStZip,len(CityStZip)-charindex(' ',reverse(CityStZip),
                                        charindex(' ',reverse(CityStZip))+1))
        --no zip. check for state
        when left(right(CityStZip,3),1) = ' '
            then left(CityStZip,len(CityStZip)-charIndex(' ',reverse(CityStZip)))
        else CityStZip
        end as City,
    --check if zip is there and then grab the state
    case when isNumeric(right(CityStZip,1))=1
            then substring(CityStZip,
                    len(CityStZip)-charindex(' ',reverse(CityStZip),
                                                charindex(' ',reverse(CityStZip))+1)+2,
                    2)
        --no zip. check if 3rd to last char is a space and grab the last two chars
        when left(right(CityStZip,3),1) = ' '
            then right(CityStZip,2)
        end as [State],
    --grab everything after the last space if the last character is numeric
    case when isNumeric(right(CityStZip,1))=1
            then substring(CityStZip,
                    --start one character past the last space so the zip has no leading space
                    len(CityStZip)-charindex(' ',reverse(CityStZip))+2,
                    charindex(' ',reverse(CityStZip)))
        end as Zip
from MyTable

Question 2

PATINDEX should work for you:

PATINDEX('% [A-Z][A-Z] %', A COLLATE Latin1_general_cs_as)

So your full extract would be something like:

WITH CTE AS
(   SELECT  i = PATINDEX('% [A-Z][A-Z] %', A COLLATE Latin1_general_cs_as) + 1,
            A
    FROM    (VALUES 
                ('City ST Zip'),
                ('Another City ST Zip'),
                ('City, with comma ST Zip')
            ) t (A)
)
SELECT  City = LEFT(A, i - 2),
        State = SUBSTRING(A, i, 2),
        Zip = SUBSTRING(A, i + 3, LEN(A))
FROM    CTE;

Example on SQL Fiddle

Question 3

PATINDEX appears to work intermittently because a character range (such as A-Z) cannot perform a case-sensitive search, even under a case-sensitive collation. Character ranges follow the collation's sort order, and a case-sensitive sort still places each upper-case letter next to its lower-case equivalent, just as a dictionary would. The range order is really a,A,b,B,c,C,d,D, etc. or, depending on the collation, A,a,B,b,C,c,D,d, etc. (there are 31 collations that sort upper-case first). A case-sensitive collation merely keeps all of the A entries together and separate from the a entries, whereas a case-insensitive sort would intermix them. Either way, nearly every lower-case letter sorts between A and Z, so [A-Z] matches it.

But if you specify each of the letters individually (hence not using a range), then it will work as expected:

PATINDEX(N'%[ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]%',
     [CityStZip] COLLATE Latin1_General_100_CS_AS)

PATINDEX and LIKE (both of which allow a single-character class such as [A-Z]) work this way because the [start-end] syntax is not a Regular Expression. Many people claim that PATINDEX and LIKE support "limited" RegEx because of this syntax, but that is not true. The syntax is merely confusingly similar to RegEx, where [A-Z] would normally not match any lower-case letters.

Of course, if you are guaranteed to be searching only for the US-English letters A-Z, then a binary collation should work. Use one ending in _BIN2; don't use the ones ending in _BIN, as they have been deprecated since SQL Server 2005 was introduced, I believe.

PATINDEX(N'%[A-Z][A-Z]%', [CityStZip] COLLATE Latin1_General_100_BIN2)
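
To see the difference for yourself, a quick comparison along these lines may help. It is only a sketch: the sample characters and the column aliases (RangeCS, ListCS, RangeBIN2) are made up for illustration, and it assumes a version of SQL Server that has the Latin1_General_100 collations. A non-zero value means the character matched the pattern.

```sql
-- Run the three approaches side by side on single characters.
-- The sample values and column aliases are illustrative only.
SELECT  v.c,
        -- range under a case-sensitive (dictionary-order) collation
        RangeCS   = PATINDEX('%[A-Z]%',
                        v.c COLLATE Latin1_General_100_CS_AS),
        -- every upper-case letter listed explicitly, same collation
        ListCS    = PATINDEX('%[ABCDEFGHIJKLMNOPQRSTUVWXYZ]%',
                        v.c COLLATE Latin1_General_100_CS_AS),
        -- range under a binary collation (code-point order)
        RangeBIN2 = PATINDEX('%[A-Z]%',
                        v.c COLLATE Latin1_General_100_BIN2)
FROM    (VALUES ('a'), ('A'), ('m'), ('M'), ('z'), ('Z')) v (c);
```

If the explanation above holds, the lower-case rows are where RangeCS should disagree with the other two columns.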

For more details about case-sensitive matching, especially with regard to Unicode / NVARCHAR data, please see my related answer on DBA.StackExchange:

How to find values with multiple consecutive upper case characters

Question 4

If every string ends with a two-letter state, a single space, and a five-digit zip code, then this might work:

select right(address, 5) as zip,
       left(right(address, 8), 2) as state,
       left(address, len(address) - 9) as city

You can start by removing the commas and double spaces from the address.
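
Putting the cleanup and the fixed-width extraction together might look something like the following. This is a rough sketch: the sample rows, the `t`/`c` aliases and the `cleaned` column are invented for the example, and it still assumes every row ends in a two-letter state, one space, and a five-digit zip.

```sql
-- Hypothetical sample rows; substitute your own table and column.
SELECT  zip   = RIGHT(c.cleaned, 5),
        state = LEFT(RIGHT(c.cleaned, 8), 2),
        city  = LEFT(c.cleaned, LEN(c.cleaned) - 9)
FROM    (VALUES ('Springfield, IL  62704'),
                ('Salt Lake City UT 84101')
        ) t (address)
CROSS APPLY
        (   -- strip commas first, then collapse runs of spaces,
            -- and trim so RIGHT() is not thrown off by trailing blanks
            SELECT cleaned = LTRIM(RTRIM(
                       REPLACE(
                           REPLACE(
                               REPLACE(t.address, ',', ''),
                               '   ', ' '),
                           '  ', ' ')))
        ) c;
```

Rows shorter than nine characters would make the LEFT call fail, so real data probably needs a filter or a CASE guard in front of it.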

Question 5

If you have a table of states (which you should) with a column of the abbreviations, you can do things like this:

SELECT a.* FROM Addresses a
INNER JOIN States s ON
a.CityStateZip Like '% ' + s.UpperCaseAbbreviation + ' %' --space on either side of abbreviation

You can make it work for both commas and spaces:

SELECT a.* FROM Addresses a
INNER JOIN States s ON
Replace(a.CityStateZip, ',' , ' ') Like '% ' + s.UpperCaseAbbreviation + ' %'
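
One thing to watch: under a case-insensitive default collation, LIKE will also match lower-case words such as " or " or " in ". If that is a concern, forcing a case-sensitive collation on the comparison is one possible approach. The sketch below uses table variables with made-up rows standing in for the real Addresses and States tables, and the choice of Latin1_General_100_CS_AS is an assumption, not a requirement.

```sql
-- Stand-ins for the real tables; the rows are invented for illustration.
DECLARE @States TABLE (UpperCaseAbbreviation char(2) NOT NULL);
DECLARE @Addresses TABLE (CityStateZip varchar(100) NOT NULL);

INSERT INTO @States VALUES ('IL'), ('OR');
INSERT INTO @Addresses VALUES
    ('Springfield, IL 62704'),
    ('Bend, OR 97701'),
    ('Sink or Swim');   -- contains " or " but no state abbreviation

SELECT  a.CityStateZip,
        s.UpperCaseAbbreviation
FROM    @Addresses a
INNER JOIN @States s ON
        -- the explicit COLLATE makes the LIKE comparison case-sensitive
        REPLACE(a.CityStateZip, ',', ' ') COLLATE Latin1_General_100_CS_AS
            LIKE '% ' + s.UpperCaseAbbreviation + ' %';
```

Because the pattern spells out the exact letters rather than using a range, a case-sensitive collation is enough here and a binary one is not needed.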