Question

I am trying to come up with a regular expression to match Bitcoin addresses according to these specs:

A Bitcoin address, or simply address, is an identifier of 27-34 alphanumeric characters, beginning with the number 1 or 3 [...]

I figured it would look something like this

/^[13][a-zA-Z0-9]{27,34}/

Thing is, I'm not good with regular expressions and I haven't found a single source to confirm this would not create false negatives.

I've found one online that's ^1[1-9A-Za-z][^OIl]{20,40}, but I don't even know what the [^OIl] part means and it doesn't seem to match the 3 a Bitcoin address could start with.

Was it helpful?

Solution 3

[^OIl] matches any character that's not O, I or l. The problems in your regex are:

  • You don't have a $ at the end, so it'd match any string beginning with a BC address.
  • You didn't count the first character in your {27,34} - that should be {26,33}

However, as mentioned in a comment, a regex is not a good way to validate a bitcoin address.

OTHER TIPS

^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$

will match a string that starts with either 1 or 3 and, after that, 25 to 34 characters of either a-z, A-Z, or 0-9, excluding l, I, O and 0 (not valid characters in a Bitcoin address).

^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$

bitcoin address is

  • an identifier of 26-35 alphanumeric characters
  • beginning with the number 1 or 3
  • random digits
  • uppercase
  • lowercase letters
  • with the exception that the uppercase letter O, uppercase letter I, lowercase letter l, and the number 0 are never used to prevent visual ambiguity.
^(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$

Based on the new address type Bech32

Based on answer of runeks and Erhard Dinhobl I got this that accepts bech32 and legacy:

\b(bc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|[13][a-km-zA-HJ-NP-Z1-9]{25,35})\b

Including testnet address:

\b((bc|tb)(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|([13]|[mn2])[a-km-zA-HJ-NP-Z1-9]{25,39})\b

Only testnet:

\b(tb(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|[mn2][a-km-zA-HJ-NP-Z1-9]{25,39})\b

Based on the description here: https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki I would say the regex for a Bech32 bitcoin address for Version 1 and Version 0 (only for mainnet) is:

\bbc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})\b

Here are some other links where I found infos:

As the OP didn't provide a specific use case (only matching criteria) and I came across this in researching methods to detect BitCoin addresses, wanted to post back and share with the community.

These RegEx provided will find BitCoin addresses either at the start of a line and/or end of the line. My use case was to find BitCoin addresses in the body of an email given the rise of blackmail/sextortion (Reference: https://krebsonsecurity.com/2018/07/sextortion-scam-uses-recipients-hacked-passwords/) - so these weren't effective solutions (as outlined later). The proposed RegEx will catch many FPs in email, due to filenames and other identifiers within URLs. I am not knocking the solutions, as they work for certain use cases, but they simply don't work for mine. One variation caught many spam emails within a short timeframe of passive alerting (examples follow).

Here are my test cases:

--------------------------------------------------------
BitCoin blackmail formats observed (my org and online):
--------------------------------------------------------
BTC Address: 1JHwenDp9A98XdjfYkHKyiE3R99Q72K9X4 
BTC Address: 1Unoc4af6gCq3xzdDFmGLpq18jbTW1nZD
BTC Address: 1A8Ad7VbWDqwmRY6nSHtFcTqfW2XioXNmj
BTC Address: 12CZYvgNZ2ze3fGPFzgbSCELBJ6zzp2cWc
BTC Address: 17drmHLZMsCRWz48RchWfrz9Chx1osLe67

Receiving Bitcoin Address: 15LZALXitpbkK6m2QcbeQp6McqMvgeTnY8
Receiving Bitcoin Address: 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5

--------------------------------------------------------
Other possible BitCoin test cases I added:
--------------------------------------------------------
- What if text comes before and/or after on same line?  Or doesn't contain BitCoin/BTC/etc. anywhere (or anywhere close to the address)?
    Send BitCoin payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5
    1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.
    Send payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.

- Standalone address:
    1Dvd7Wb72JBTbAcfTrxSJCZZuf4tsT8V72

--------------------------------------------------------
Redacted Body content generating FPs from spam emails:
--------------------------------------------------------
src=3D"https://example.com/blah=3D2159024400&t=3DXWP9YVkAYwkmif9RgKeoPhw2b1zdMnMzXZSGRD_Oxkk"

"cursor:pointer;color:#6A6C6D;-webkit-text-size-blahutm_campaign%253Drdboards%2526e_t%253Dd5c2deeaae5c4a8b8d2bff4d0f87ecdd%2526utm_cont=blah

src=3D"https://example.com/blah/74/328e74997261d5228886aab1a2da6874.jpg" 

src=3D"https://example.com/blah-1c779f59948fc5be8a461a4da8d938aa.jpg"

href=3D"https://example.com/blah-0ff3169b28a6e17ae8a369a3161734c1?alert_=id=blah

Some RegEx samples I tested (won't list those I'd knock for greedy globbing with backtraces):

^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
    (Too narrow and misses BitCoin addresses within a paragraph)

(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$
    (Still misses text after BTC on same line and triples execution time)

\W[13][a-km-zA-HJ-NP-Z1-9]{25,34}\W
    (Too broad and catches URL formats)

The current RegEx I am evaluating which catches all my known/crafted sample cases and eliminates known FPs (specifically avoiding end of sentence period for URL filename FPs):

[13][a-km-zA-HJ-NP-Z1-9]{25,34}\s

One reference point for execution times (shows cost in steps and time): https://regex101.com/

Please feel free to weigh in or provide suggestions on improvements (I am by no means a RegEx master). As I further vet it against email detection of Body content, I will update if other FP cases are observed or more efficient RegEx is derived.

Seth

for mainnet bitcoin

/^([13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$/

if you don't want to understand the above regex you can skip the detail below

breaking it down

For regular addresses

/[13]{1}/

address will start with 1 or 3, {1} defines that only match one character in square bracket

/[13]{1}[a-km-zA-HJ-NP-Z1-9]/

cannot have l (small el), I (capital eye), O (capital O) and 0 (zero)

/[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}/

can be 27 to 34 characters long, remember we already checked the first character to be 1 or 3, so remaining address will be 26 to 33 characters long

For segwit

/bc1/

starts with bc1

/bc1[a-z0-9]/

can only contain lower case letters and numbers

/bc1[a-z0-9]{39,59}/

can be 42 to 62 characters long, we already checked first three characters to be bc1, so remaining address will be 39 to 59 characters long

I am not into complicated solutions and this regex served the purpose for the most simplest validation, when you just don't want to receive complete nonsense.

\w{25,}

For matching legacy, nested SegWit, and native SegWit addresses:

/^(?:[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$/

Source: Regex for Bitcoin Addresses.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top