Question

When I use re:run, I noticed something interesting: matching is very slow when I use the dotall option.

The source code:

main3() ->
    Sdp = 
"v=0\r\no=- 1001 11112 IN IP4 10.10.121.7\r\ns=-\r\nt=0 0\r\nm=audio 52363 RTP/AVPF 0 8\r\nc=IN IP4 10.10.121.7\r\na=rtcp:52369 IN IP4 138.85.151.208\r\na=candidate:1783138469 1 udp 2113937151 138.85.151.208 52363 typ host generation 0\r\na=candidate:4012290674 1 udp 2113937151 192.168.125.1 52364 typ host generation 0\r\na=candidate:1760259326 1 udp 2113937151 192.168.2.12 52367 typ host generation 0\r\na=candidate:2294684747 1 udp 2113937151 192.168.58.1 52368 typ host generation 0\r\na=candidate:1783138469 2 udp 2113937150 138.85.151.208 52369 typ host generation 0\r\na=candidate:4012290674 2 udp 2113937150 192.168.125.1 52370 typ host generation 0\r\na=candidate:1760259326 2 udp 2113937150 192.168.2.12 52371 typ host generation 0\r\na=candidate:2294684747 2 udp 2113937150 192.168.58.1 52372 typ host generation 0\r\na=candidate:617313365 1 tcp 1509957375 138.85.151.208 52530 typ host generation 0\r\na=candidate:2711965314 1 tcp 1509957375 192.168.125.1 52531 typ host generation 0\r\na=candidate:644386830 1 tcp 1509957375 192.168.2.12 52532 typ host generation 0\r\na=candidate:3326468283 1 tcp 1509957375 192.168.58.1 52533 typ host generation 0\r\na=candidate:617313365 2 tcp 1509957374 138.85.151.208 52534 typ host generation 0\r\na=candidate:2711965314 2 tcp 1509957374 192.168.125.1 52535 typ host generation 0\r\na=candidate:644386830 2 tcp 1509957374 192.168.2.12 52536 typ host generation 0\r\na=candidate:3326468283 2 tcp 1509957374 192.168.58.1 52537 typ host generation 0\r\na=ice-ufrag:root\r\na=ice-pwd:myreallysecretpassword\r\na=sendrecv\r\na=rtpmap:0 PCMU/8000\r\na=rtpmap:8 PCMA/8000\r\na=ssrc:1947760130 cname:OCGE4NpwFpLE/BFW\r\na=ssrc:1947760130 mslabel:oBAkRgSOpLdfl7u1JWdnMyUytcGGD4COvttP\r\na=ssrc:1947760130 label:oBAkRgSOpLdfl7u1JWdnMyUytcGGD4COvttP00\r\nm=video 52373 RTP/AVPF 126\r\nc=IN IP4 10.10.121.7\r\na=rtcp:52377 IN IP4 138.85.151.208\r\na=candidate:1783138469 1 udp 2113937151 138.85.151.208 52373 typ host generation 0\r\na=candidate:4012290674 
1 udp 2113937151 192.168.125.1 52374 typ host generation 0\r\na=candidate:1760259326 1 udp 2113937151 192.168.2.12 52375 typ host generation 0\r\na=candidate:2294684747 1 udp 2113937151 192.168.58.1 52376 typ host generation 0\r\na=candidate:1783138469 2 udp 2113937150 138.85.151.208 52377 typ host generation 0\r\na=candidate:4012290674 2 udp 2113937150 192.168.125.1 52378 typ host generation 0\r\na=candidate:1760259326 2 udp 2113937150 192.168.2.12 52379 typ host generation 0\r\na=candidate:2294684747 2 udp 2113937150 192.168.58.1 52380 typ host generation 0\r\na=candidate:617313365 1 tcp 1509957375 138.85.151.208 52538 typ host generation 0\r\na=candidate:2711965314 1 tcp 1509957375 192.168.125.1 52539 typ host generation 0\r\na=candidate:644386830 1 tcp 1509957375 192.168.2.12 52540 typ host generation 0\r\na=candidate:3326468283 1 tcp 1509957375 192.168.58.1 52541 typ host generation 0\r\na=candidate:617313365 2 tcp 1509957374 138.85.151.208 52542 typ host generation 0\r\na=candidate:2711965314 2 tcp 1509957374 192.168.125.1 52543 typ host generation 0\r\na=candidate:644386830 2 tcp 1509957374 192.168.2.12 52544 typ host generation 0\r\na=candidate:3326468283 2 tcp 1509957374 192.168.58.1 52545 typ host generation 0\r\na=ice-ufrag:root\r\na=ice-pwd:myreallysecretpassword\r\na=sendrecv\r\na=rtpmap:126 H264/90000\r\n",
    ReStr = "(.*)a=candidate.*host.*a=candidate.*host(.*)a=ice-ufrag.*a=setup:active(.*)a=mid:audio(.*)a=candidate.*host.*a=candidate.*host(.*)a=ice-ufrag.*a=setup:active(.*)a=mid:video(.*)",
    %% timer:tc/3 returns {ElapsedMicroseconds, Result}.
    %% First run: without dotall, `.` does not match line endings.
    {ok, Pattern1} = re:compile(ReStr, [{newline, crlf}]),
    {Time1, _} = timer:tc(re, run, [Sdp, Pattern1, [{capture, all_but_first, list}]]),
    io:format("not using dotall, time is ~p~n", [Time1]),

    %% Second run: with dotall, `.` matches line endings as well.
    {ok, Pattern2} = re:compile(ReStr, [{newline, crlf}, dotall]),
    {Time2, _} = timer:tc(re, run, [Sdp, Pattern2, [{capture, all_but_first, list}]]),
    io:format("using dotall, time is ~p~n", [Time2]).

The result:

101> tt:main3().
not using dotall, time is 4499
using dotall, time is 2760364
ok

As the output shows, the difference is huge: about 4.5 ms without dotall versus about 2.76 s with it (timer:tc reports microseconds).


Solution

By default, when dotall is not set, the . pattern does not match a newline, so each .* can only extend as far as the end of the current line. When dotall is set, . matches every character, so each .* can extend all the way to the end of the input string. This makes a big difference in your case because your input string contains many lines.
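
To make the difference concrete, here is a minimal sketch on a two-line input. The module name dotall_demo and the sample string are made up for illustration; the options are the same ones used in the question.

```erlang
-module(dotall_demo).
-export([demo/0]).

%% Shows how far `.*` reaches with and without the dotall option.
%% The input is a made-up two-line string using CRLF line endings.
demo() ->
    Input = "a=one\r\na=two\r\n",
    Base = [{newline, crlf}, {capture, all_but_first, list}],
    %% Without dotall, `.` stops at the line ending,
    %% so the capture holds only the rest of the first line.
    {match, [Line]} = re:run(Input, "a=(.*)", Base),
    %% With dotall, `.` also matches CR and LF,
    %% so the capture runs to the end of the input.
    {match, [Rest]} = re:run(Input, "a=(.*)", [dotall | Base]),
    {Line, Rest}.
```

Calling dotall_demo:demo() returns {"one", "one\r\na=two\r\n"}: the first capture ends at the line break, the second swallows everything that follows.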

The thing to remember is that re is based on PCRE, which follows Perl regular expressions. These are implemented with a backtracking algorithm, which means that when part of your pattern can match in many different ways, as .* can, the engine may have to try a great many combinations to find a match, or to establish that there is none. This is a property of Perl-style regular expressions and not the result of a bad implementation.
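
Note that the Sdp string in the question contains no a=setup:active or a=mid: lines, so the pattern can never match, and a backtracking engine has to exhaust every way of placing each .* before giving up. The sketch below reproduces that situation on a smaller scale. The module name backtrack_demo, the shortened pattern, and the synthetic candidate line are assumptions made for illustration, not the original data.

```erlang
-module(backtrack_demo).
-export([time_nomatch/1]).

%% Builds N identical candidate lines and times a dotall pattern that
%% cannot match, because the input has no "a=setup:active" line.
%% Returns {ElapsedMicroseconds, Result}.
time_nomatch(N) ->
    %% Synthetic line, loosely modelled on the SDP in the question.
    Line = "a=candidate:1 1 udp 1 10.0.0.1 5000 typ host generation 0\r\n",
    Input = lists:flatten(lists:duplicate(N, Line)),
    %% Shortened version of the question's pattern (an assumption).
    ReStr = "a=candidate.*host.*a=candidate.*host.*a=setup:active",
    {ok, Pattern} = re:compile(ReStr, [{newline, crlf}, dotall]),
    timer:tc(re, run, [Input, Pattern, [{capture, none}]]).
```

Calling backtrack_demo:time_nomatch(N) for a few increasing values of N should show the elapsed time growing much faster than the input length, although the exact figures depend on the machine and the OTP release. Compiling the same pattern without dotall should fail quickly, since no single line contains two candidates.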

For a longer discussion of the various ways of implementing regular expressions, see the Wikipedia article on regular expressions (the third algorithm), Implementing Regular Expressions by Russ Cox (the first paper), or Friedl's Mastering Regular Expressions (though he gives the three algorithms the wrong names).

Licensed under: CC-BY-SA with attribution