Question

I need to search thousands of files for 800 numbers that may have been hard-coded into the source. I'm a novice at regular expressions, but I thought they might be useful here. I was able to find this online:

^(\+?1)?(8(00|44|55|66|77|88)[2-9]\d{6})$

Unfortunately I don't think this would take into account different formats such as:

  • 1(800) 765-4321
  • 1 877 765-4321
  • 1-855-765-4321
  • 1.800.765.4321
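
A quick way to confirm this is to run the question's pattern against the examples. The sketch below uses Java, the language used elsewhere on this page, and the class name OriginalPatternCheck is made up for illustration. Because the pattern is anchored with ^ and $ and allows no separators, only the plain-digit string should be accepted.

```java
import java.util.regex.Pattern;

// Hypothetical helper: tests which samples the question's original pattern accepts.
final class OriginalPatternCheck {
    // The question's regex, with backslashes doubled for a Java string literal.
    static final Pattern ORIGINAL =
        Pattern.compile("^(\\+?1)?(8(00|44|55|66|77|88)[2-9]\\d{6})$");

    // matches() requires the entire string to fit the pattern.
    static boolean accepts(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "18007654321",      // digits only
            "1(800) 765-4321",
            "1 877 765-4321",
            "1-855-765-4321",
            "1.800.765.4321"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```

Only the first sample prints true. To scan file contents rather than whole strings you would drop the anchors, allow separators, and use find() instead of matches().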

I don't know how varied the phone numbers can be, but I do think there should be a catch-all way to write this.


Solution 2

This matches all of your demo data, and it also matches plain digits with no formatting, such as 18001234321:

\+?1?\D*\d\d\d\D*\d\d\d\D*\d\d\d\d\b



You can also see its technical description at the bottom of this page: http://regex101.com/r/lE2fR2

Basically, it's zero or more non-digits (\D) followed by a fixed number of digits (\d), repeated for each part of the number.
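
To see this in practice, here is a hedged Java sketch (the class and method names are hypothetical) that collects every match of the pattern above with Matcher.find(). Because each \D* may also be empty, the unformatted 18001234321 is picked up as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists every phone-like match of the loose pattern.
final class LoosePhoneFinder {
    // \+?1?   optional "+" and optional leading "1"
    // \D*     any run of non-digits (possibly none) between the digit groups
    // \b      the last four digits must end at a word boundary
    static final Pattern LOOSE =
        Pattern.compile("\\+?1?\\D*\\d\\d\\d\\D*\\d\\d\\d\\D*\\d\\d\\d\\d\\b");

    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = LOOSE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String demo = "1(800) 765-4321 1 877 765-4321 1-855-765-4321 1.800.765.4321 18001234321";
        for (String match : findAll(demo)) {
            System.out.println(match);
        }
    }
}
```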

For 800 numbers, just change the first \d to an 8.
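
Here is that change as a small Java sketch (the names are hypothetical). Note that it only requires the area code to start with 8, so a code that is not toll-free, such as 812, still matches; limiting it to 800, 844, 855, 866, 877 and 888 needs an explicit alternation.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the loose pattern with the first \d replaced by 8.
final class EightXXFinder {
    static final Pattern EIGHT_XX =
        Pattern.compile("\\+?1?\\D*8\\d\\d\\D*\\d\\d\\d\\D*\\d\\d\\d\\d\\b");

    // true if the pattern is found anywhere in the text
    static boolean containsEightXX(String text) {
        return EIGHT_XX.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsEightXX("1.800.765.4321")); // true
        System.out.println(containsEightXX("1-555-765-4321")); // false: no 8XX area code
        System.out.println(containsEightXX("1-812-765-4321")); // true: 812 also starts with 8
    }
}
```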

If your goal is to strip out everything but the digits, capture each digit group and replace the whole match with the captured groups:

Find what:  \+?(1?)\D*(\d\d\d)\D*(\d\d\d)\D*(\d\d\d\d)\b
Replace with:  $1$2$3$4

Java:

import  java.util.regex.Pattern;
import  java.util.regex.Matcher;

/**
   <P>{@code java StripPhoneFormattingXmpl}</P>
 **/
public class StripPhoneFormattingXmpl  {
   public static void main(String[] args)  {

      String sToSearch = "1(800) 765-4321 1 877 765-4321 1-855-765-4321 1.800.765.4321 18001231234";

      //Optional "+", optional leading "1" (group 1), then three digit groups
      //(groups 2-4) separated by any run of non-digits.
      String sRegex = "\\+?(1?)\\D*(\\d\\d\\d)\\D*(\\d\\d\\d)\\D*(\\d\\d\\d\\d)\\b";
      //Keep only the captured digits.
      String sRplcWith = "$1$2$3$4";

      Matcher m = Pattern.compile(sRegex).matcher(sToSearch);
      StringBuffer sb = new StringBuffer();
      while(m.find())  {
         //Copy the text before this match, then the replacement.
         m.appendReplacement(sb, sRplcWith);
      }
      //Copy whatever follows the last match.
      m.appendTail(sb);

      System.out.println("Original: " + sToSearch);
      System.out.println("Stripped: " + sb);
   }
}

Output:

[C:\java_code\]java StripPhoneFormattingXmpl
Original: 1(800) 765-4321 1 877 765-4321 1-855-765-4321 1.800.765.4321 18001231234
Stripped: 18007654321 18777654321 18557654321 18007654321 18001231234
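
The find/appendReplacement/appendTail loop above produces the same result as Matcher.replaceAll, so String.replaceAll can do the stripping in one call. A minimal sketch (the class name is hypothetical):

```java
// Hypothetical one-call variant of StripPhoneFormattingXmpl.
final class StripPhoneFormattingOneLiner {
    static final String REGEX = "\\+?(1?)\\D*(\\d\\d\\d)\\D*(\\d\\d\\d)\\D*(\\d\\d\\d\\d)\\b";

    // Replace every phone-like match with just its captured digits.
    static String strip(String text) {
        return text.replaceAll(REGEX, "$1$2$3$4");
    }

    public static void main(String[] args) {
        System.out.println(strip("1(800) 765-4321 1 877 765-4321 1-855-765-4321 1.800.765.4321 18001231234"));
    }
}
```

String.replaceAll compiles the regex on every call, so when processing thousands of files it is cheaper to compile the Pattern once and call matcher(text).replaceAll(...) on each file's contents.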

OTHER TIPS

Edit: updated answer (see comments)

How about this:

(\d\D{0,2}8\d{2}\D{0,2}\d{3}\D{0,2}\d{4})


Matches:

1 (800) 765 4321
1 877 765-4321
1-855-765-4321
1.800.765.4321
1 (800) 123-4567
18007654321
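
A hedged Java check of this pattern against the listed matches (the class name is hypothetical). Each \D{0,2} allows at most two separator characters, so a longer run such as " -- " between the 1 and the 8XX is not matched.

```java
import java.util.regex.Pattern;

// Hypothetical helper for the \D{0,2} pattern.
final class BoundedSeparatorCheck {
    static final Pattern BOUNDED =
        Pattern.compile("(\\d\\D{0,2}8\\d{2}\\D{0,2}\\d{3}\\D{0,2}\\d{4})");

    // true if the pattern is found anywhere in the text
    static boolean found(String text) {
        return BOUNDED.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "1 (800) 765 4321", "1 877 765-4321", "1-855-765-4321",
            "1.800.765.4321", "1 (800) 123-4567", "18007654321",
            "1 -- 800 765 4321"   // three separator characters after the 1
        };
        for (String s : samples) {
            System.out.println(s + " -> " + found(s));
        }
    }
}
```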

This is very similar to aliteralmind's answer, but a bit shorter. The {} quantifier lets you say:

z{x}     #you want exactly x matches of z
z{x,y}   #you want between x and y matches of z

So:

a{3,6}  #you want between 3 and 6 repeats of a
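
A tiny Java illustration of the same quantifiers, using Pattern.matches, which requires the whole input to match (the class name is hypothetical):

```java
import java.util.regex.Pattern;

// Hypothetical demo of exact and bounded repetition counts.
final class QuantifierDemo {
    // Pattern.matches compiles the regex and tests the entire input.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("a{3}", "aaa"));       // true: exactly 3
        System.out.println(whole("a{3}", "aaaa"));      // false: 4 is not exactly 3
        System.out.println(whole("a{3,6}", "aaaaa"));   // true: 5 is within 3..6
        System.out.println(whole("a{3,6}", "aa"));      // false: fewer than 3
        System.out.println(whole("\\D{0,2}", ""));      // true: zero repeats are allowed
    }
}
```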

Here's a somewhat complicated regex that will filter out invalid patterns, but still keep the ones you've listed.

For example, it will NOT accept numbers like:

  • 1-801-765-1234
  • 1-800-123-4567
  • 1ABCDEFGHIJK8007654321
  • +1 877765-4321
  • 1.800-765.4321

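(+?\b1|\b)((()|([.-])|\s)?8(00|44|55|66|77|88)(?(3))|)(?(4)\4|(?(2)\s|))[2-9]\d{2}(?(4)\4|(?(2)[.-]|))\d{4}\b

As reproduced here, the pattern does not compile: the "+" right after the opening "(" has nothing to repeat, which suggests at least one escaping backslash is missing (most likely "\+").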

It will ensure the following:

  1. Only 800, 844, 855, 866, 877, and 888 numbers are accepted
  2. The NXX (the three-digit exchange code) must start with a digit from 2 to 9
  3. Only the following delimiters can be used [ .-]
  4. If our format has a delimiter in the beginning (e.g. 1-800...), only that same delimiter can be used for the rest of the number.
  5. If our format does not have a delimiter at the beginning (e.g. "1(800)" or "1 800"), then no delimiter appears between the 8XX and the NXX.
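
Java's java.util.regex does not support the (?(n)...) conditional groups used above, so that pattern cannot be used there as-is. One workaround, which is a different technique and not a translation of the regex above, is to spell out each allowed layout as its own alternative and use a backreference (\1) for the same-delimiter rule. The sketch enforces the area-code and NXX rules everywhere and the same-delimiter rule for the "." and "-" forms; for the parenthesis and space forms it simply accepts the two layouts from the question (1(800) 765-4321 and 1 877 765-4321). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical conditional-free approximation of the rules above.
final class StrictTollFree {
    // Rule 1: only these toll-free area codes.
    private static final String AREA = "8(?:00|44|55|66|77|88)";
    // Rule 2: the exchange (NXX) starts with 2-9.
    private static final String NXX = "[2-9]\\d{2}";

    static final Pattern STRICT = Pattern.compile(
        // Layout A, e.g. 1-855-765-4321 or 1.800.765.4321:
        // group 1 captures the first delimiter and \1 repeats it (rule 4).
        "(?:\\+?\\b1([.-])" + AREA + "\\1" + NXX + "\\1\\d{4}\\b)"
        // Layout B, e.g. 1(800) 765-4321 or 1 877 765-4321 (the question's examples).
        + "|(?:\\b1(?:\\(" + AREA + "\\)|\\s" + AREA + ")\\s" + NXX + "[.-]\\d{4}\\b)");

    // true if an accepted number appears anywhere in the text
    static boolean containsTollFree(String text) {
        return STRICT.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "1(800) 765-4321", "1 877 765-4321", "1-855-765-4321", "1.800.765.4321",
            "1-801-765-1234", "1-800-123-4567", "1ABCDEFGHIJK8007654321",
            "+1 877765-4321", "1.800-765.4321"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + containsTollFree(s));
        }
    }
}
```

The first four samples print true and the last five print false, matching the accepted and rejected examples in this answer.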


See Debuggex for a commented version that breaks down what each part does.

Building on your own regex, I'd use:

(?:^|\D)(?:\+?1)?\D*8([04-8])\1\D*[2-9]\d\d\D*\d{4}\b
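
This pattern works unchanged in Java, since it uses only non-capturing groups, one backreference and character classes. In 8([04-8])\1 the captured digit must be repeated, which limits the area code to 800, 844, 855, 866, 877 and 888, and the leading (?:^|\D) stops a match from starting inside a longer run of digits. A minimal sketch (the class name is hypothetical):

```java
import java.util.regex.Pattern;

// Hypothetical helper wrapping the pattern above.
final class TollFreeFinder {
    static final Pattern TOLL_FREE = Pattern.compile(
        "(?:^|\\D)(?:\\+?1)?\\D*8([04-8])\\1\\D*[2-9]\\d\\d\\D*\\d{4}\\b");

    // true if a toll-free number appears anywhere in the text
    static boolean containsTollFree(String text) {
        return TOLL_FREE.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "1(800) 765-4321", "1 877 765-4321", "1-855-765-4321", "1.800.765.4321",
            "1-812-765-4321",   // 812 is not a toll-free code
            "1-800-123-4567",   // the NXX may not start with 1
            "218007654321"      // embedded in a longer run of digits
        };
        for (String s : samples) {
            System.out.println(s + " -> " + containsTollFree(s));
        }
    }
}
```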
Licensed under: CC-BY-SA with attribution