Getting parts of a URL (Regex)

https://stackoverflow.com/questions/27745

09-06-2019

Question

Given the URL (single line):
http://test.example.com/dir/subdir/file.html

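How can I extract the following parts using regular expressions?

The subdomain (test)
The domain (example.com)
The path without the file (/dir/subdir/)
The file (file.html)
The path with the file (/dir/subdir/file.html)
The URL without the path (http://test.example.com)
(add any other that you think would be useful)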

The regex should work correctly even if I enter the following URL:

http://example.example.com/example/example/example.html

Solution

A single regex to parse and break up a full URL, including query parameters and anchors, for example:

https://www.google.com/dir/1/2/search.html?arg=0-a&arg11-b&arg3-c#hash

^((http[s]?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(.*)?(#[\w\-]+)?$

RegExp positions:

url: RegExp['$&'],
protocol: RegExp.$2,
host: RegExp.$3,
path: RegExp.$4,
file: RegExp.$6,
query: RegExp.$7,
hash: RegExp.$8

You could then further parse the host ('.' delimited) quite easily.
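As a rough illustration of that last step, here is a minimal Python sketch that applies the pattern above to the sample URL and then splits the host on dots. The variable names are mine, not part of the original answer.

```python
import re

# The single-regex pattern quoted above, unchanged.
URL_RE = re.compile(
    r'^((http[s]?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(.*)?(#[\w\-]+)?$'
)

sample = 'https://www.google.com/dir/1/2/search.html?arg=0-a&arg11-b&arg3-c#hash'
m = URL_RE.match(sample)

host = m.group(3)              # host part of the URL
host_labels = host.split('.')  # dot-delimited pieces of the host

print(host)         # www.google.com
print(host_labels)  # ['www', 'google', 'com']
print(m.group(4))   # /dir/1/2/
print(m.group(6))   # search.html
```

Splitting on dots only yields the labels; deciding which labels form the subdomain and which form the domain needs extra knowledge about the top-level domain.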

What I would do is use something like this:

/*
    ^(.*:)//([A-Za-z0-9\-\.]+)(:[0-9]+)?(.*)$
*/
proto $1
host $2
port $3
the-rest $4

and then parse 'the rest' further, as specifically as needed. Doing it all in one regex is, well, a bit crazy.
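A minimal Python sketch of this two-pass idea follows. The first pattern is the one above; the second pattern, which splits 'the rest' into path, query, and fragment, is my own assumption about what "more specific" parsing might look like, and the function name is hypothetical.

```python
import re

# First pass: the coarse pattern from the answer above.
FIRST_PASS = re.compile(r'^(.*:)//([A-Za-z0-9\-\.]+)(:[0-9]+)?(.*)$')
# Second pass (an assumption, not from the answer): path, query, fragment.
REST = re.compile(r'^([^?#]*)(?:\?([^#]*))?(?:#(.*))?$')


def split_url(url):
    """Split a URL in two passes; return None if the first pass fails."""
    first = FIRST_PASS.match(url)
    if first is None:
        return None
    proto, host, port, rest = first.groups()
    path, query, fragment = REST.match(rest).groups()
    return {
        'proto': proto, 'host': host, 'port': port,
        'path': path, 'query': query, 'fragment': fragment,
    }


result = split_url('http://www.example.com:123/foo/bar.html?fox=trot#foo')
print(result)
```

Keeping the two passes separate means each pattern stays short enough to read and test on its own.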

Other tips

I realize I'm late to the party, but there is a simple way to let the browser parse a URL for you without a regex:

var a = document.createElement('a');
a.href = 'http://www.example.com:123/foo/bar.html?fox=trot#foo';

['href','protocol','host','hostname','port','pathname','search','hash'].forEach(function(k) {
    console.log(k+':', a[k]);
});

/*//Output:
href: http://www.example.com:123/foo/bar.html?fox=trot#foo
protocol: http:
host: www.example.com:123
hostname: www.example.com
port: 123
pathname: /foo/bar.html
search: ?fox=trot
hash: #foo
*/

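I'm a few years late to the party, but I'm surprised no one has mentioned that the Uniform Resource Identifier specification has a section on parsing URIs with a regular expression. The regular expression, written by Berners-Lee et al., is: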

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9
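The numbers in the second line above are only there to assist readability; they indicate the reference points for each subexpression (i.e., each paired parenthesis). We refer to the value matched for subexpression <n> as $<n>. For example, matching the above expression to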

http://www.ics.uci.edu/pub/ietf/uri/#Related

results in the following subexpression matches:
$1 = http:
$2 = http
$3 = //www.ics.uci.edu
$4 = www.ics.uci.edu
$5 = /pub/ietf/uri/
$6 = <undefined>
$7 = <undefined>
$8 = #Related
$9 = Related
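To check the table above, here is a small Python sketch that runs the same expression against the same URI. The five-way unpacking (scheme, authority, path, query, fragment) follows the groups the specification assigns to $2, $4, $5, $7, and $9; the variable names are mine.

```python
import re

# The regular expression from the URI specification, unchanged.
URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')

m = URI_RE.match('http://www.ics.uci.edu/pub/ietf/uri/#Related')

# Groups 2, 4, 5, 7 and 9 hold the five URI components.
scheme, authority, path, query, fragment = m.group(2, 4, 5, 7, 9)

print(scheme)     # http
print(authority)  # www.ics.uci.edu
print(path)       # /pub/ietf/uri/
print(query)      # None (Python's equivalent of <undefined>)
print(fragment)   # Related
```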

For what it's worth, I found that I had to escape the forward slashes in JavaScript:

^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

I found that the highest voted answer (hometoast's answer) doesn't work perfectly for me. Two problems:

It cannot handle port numbers.
The hash part is broken.

The following is a modified version:

^((http[s]?|ftp):\/)?\/?([^:\/\s]+)(:([^\/]*))?((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?$

The positions of the parts are as follows:

int SCHEMA = 2, DOMAIN = 3, PORT = 5, PATH = 6, FILE = 8, QUERYSTRING = 9, HASH = 12
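A quick way to confirm those positions is to run the modified pattern in Python, as in this sketch. The sample URL is my own, chosen to include a port, a query string, and a hash.

```python
import re

# Group positions as listed above.
SCHEMA, DOMAIN, PORT, PATH, FILE, QUERYSTRING, HASH = 2, 3, 5, 6, 8, 9, 12

# The modified pattern from the answer, unchanged.
FIXED_RE = re.compile(
    r'^((http[s]?|ftp):\/)?\/?([^:\/\s]+)(:([^\/]*))?((\/\w+)*\/)'
    r'([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?$'
)

# Hypothetical sample URL, not taken from the answer.
m = FIXED_RE.match('http://www.example.com:8080/dir/sub/file.html?a=1#top')

parts = {
    'schema': m.group(SCHEMA),
    'domain': m.group(DOMAIN),
    'port': m.group(PORT),
    'path': m.group(PATH),
    'file': m.group(FILE),
    'querystring': m.group(QUERYSTRING),  # includes the leading '?'
    'hash': m.group(HASH),                # excludes the leading '#'
}
print(parts)
```

Note that group 9 keeps the leading `?` while group 12 drops the leading `#`; groups 10 and 11 hold the other variants.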

Edit posted by another user:

function getFileName(path) {
    return path.match(/^((http[s]?|ftp):\/)?\/?([^:\/\s]+)(:([^\/]*))?((\/[\w\/-]+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?$/i)[8];
}

I needed a regular expression to match all URLs and made this one:

/(?:([^\:]*)\:\/\/)?(?:([^\:\@]*)(?:\:([^\@]*))?\@)?(?:([^\/\:]*)\.(?=[^\.\/\:]*\.[^\.\/\:]*))?([^\.\/\:]*)(?:\.([^\/\.\:]*))?(?:\:([0-9]*))?(\/[^\?#]*(?=.*?\/)\/)?([^\?#]*)?(?:\?([^#]*))?(?:#(.*))?/

It matches all URLs, with any protocol, even URLs like

ftp://user:pass@www.cs.server.com:8080/dir1/dir2/file.php?param1=value1#hashtag

The result (in JavaScript) looks like this:

["ftp", "user", "pass", "www.cs", "server", "com", "8080", "/dir1/dir2/", "file.php", "param1=value1", "hashtag"]

A URL like

mailto://admin@www.cs.server.com

looks like this:

["mailto", "admin", undefined, "www.cs", "server", "com", undefined, undefined, undefined, undefined, undefined]

I tried to solve this in JavaScript, where it should be handled by:

var url = new URL('http://a:b@example.com:890/path/wah@t/foo.js?foo=bar&bingobang=&king=kong@kong.com#foobar/bing/bo@ng?bang');

since (in Chrome, at least) it parses to:

{
  "hash": "#foobar/bing/bo@ng?bang",
  "search": "?foo=bar&bingobang=&king=kong@kong.com",
  "pathname": "/path/wah@t/foo.js",
  "port": "890",
  "hostname": "example.com",
  "host": "example.com:890",
  "password": "b",
  "username": "a",
  "protocol": "http:",
  "origin": "http://example.com:890",
  "href": "http://a:b@example.com:890/path/wah@t/foo.js?foo=bar&bingobang=&king=kong@kong.com#foobar/bing/bo@ng?bang"
}

However, this isn't cross-browser (https://developer.mozilla.org/en-us/docs/web/api/url), so I cobbled this together to pull out the same parts as above:

^(?:(?:(([^:\/#\?]+:)?(?:(?:\/\/)(?:(?:(?:([^:@\/#\?]+)(?:\:([^:@\/#\?]*))?)@)?(([^:\/#\?\]\[]+|\[[^\/\]@#?]+\])(?:\:([0-9]+))?))?)?)?((?:\/?(?:[^\/\?#]+\/+)*)(?:[^\?#]*)))?(\?[^#]+)?)(#.*)?

Credit for this regex goes to https://gist.github.com/rpflorence, who posted this jsperf http://jsperf.com/url-parsing (originally found here: https://gist.github.com/jlong/2428561#comment-310066) and who came up with the regex this was originally based on.

The parts are in this order:

var keys = [
    "href",                    // http://user:pass@host.com:81/directory/file.ext?query=1#anchor
    "origin",                  // http://user:pass@host.com:81
    "protocol",                // http:
    "username",                // user
    "password",                // pass
    "host",                    // host.com:81
    "hostname",                // host.com
    "port",                    // 81
    "pathname",                // /directory/file.ext
    "search",                  // ?query=1
    "hash"                     // #anchor
];
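To make the ordering concrete, the following Python sketch pairs that key list with the capture groups of the regex above. It assumes "href" is the whole match (group 0) and that the remaining keys map to groups 1 through 10 in order; the constant names are mine.

```python
import re

# The cross-browser regex quoted above, unchanged.
LITE_URL_RE = re.compile(
    r'^(?:(?:(([^:\/#\?]+:)?(?:(?:\/\/)(?:(?:(?:([^:@\/#\?]+)(?:\:([^:@\/#\?]*))?)@)?'
    r'(([^:\/#\?\]\[]+|\[[^\/\]@#?]+\])(?:\:([0-9]+))?))?)?)?'
    r'((?:\/?(?:[^\/\?#]+\/+)*)(?:[^\?#]*)))?(\?[^#]+)?)(#.*)?'
)

KEYS = ['href', 'origin', 'protocol', 'username', 'password',
        'host', 'hostname', 'port', 'pathname', 'search', 'hash']

m = LITE_URL_RE.match('http://user:pass@host.com:81/directory/file.ext?query=1#anchor')

# Group 0 is the full match; groups 1..10 follow the key order.
parsed = dict(zip(KEYS, [m.group(0)] + list(m.groups())))

for key in KEYS:
    print(key, '=', parsed[key])
```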

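There is also a small library that wraps it and provides query parameters:

https://github.com/sadams/lite-url (also available on Bower)

If you have an improvement, please create a pull request with more tests and I will accept and merge it with thanks.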

I propose a much more readable solution (in Python, but it applies to any regex):

import re


def url_path_to_dict(path):
    # Named groups document each URL component directly in the pattern.
    pattern = (r'^'
               r'((?P<schema>.+?)://)?'
               r'((?P<user>.+?)(:(?P<password>.*?))?@)?'
               r'(?P<host>.*?)'
               r'(:(?P<port>\d+?))?'
               r'(?P<path>/.*?)?'
               r'(?P<query>[?].*?)?'
               r'$'
               )
    regex = re.compile(pattern)
    m = regex.match(path)
    # Return the named groups as a dict, or None when nothing matched.
    d = m.groupdict() if m is not None else None

    return d


def main():
    print(url_path_to_dict('http://example.example.com/example/example/example.html'))


if __name__ == '__main__':
    main()

Prints:

{
'host': 'example.example.com', 
'user': None, 
'path': '/example/example/example.html', 
'query': None, 
'password': None, 
'port': None, 
'schema': 'http'
}

Subdomains and domains are difficult because the subdomain can have several parts, as can the top-level domain: http://sub1.sub2.domain.co.uk/

 the path without the file : http://[^/]+/((?:[^/]+/)*(?:[^/]+$)?)  
 the file : http://[^/]+/(?:[^/]+/)*((?:[^/.]+\.)+[^/.]+)$  
 the path with the file : http://[^/]+/(.*)  
 the URL without the path : (http://[^/]+/)

(Markdown isn't very friendly to regexes)

This improved version should work as reliably as a parser.

   // Applies to URI, not just URL or URN:
   //    http://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Relationship_to_URL_and_URN
   //
   // http://labs.apache.org/webarch/uri/rfc/rfc3986.html#regexp
   //
   // (?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?
   //
   // http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax
   //
   // $@ matches the entire uri
   // $1 matches scheme (ftp, http, mailto, mshelp, ymsgr, etc)
   // $2 matches authority (host, user:pwd@host, etc)
   // $3 matches path
   // $4 matches query (http GET REST api, etc)
   // $5 matches fragment (html anchor, etc)
   //
   // Match specific schemes, non-optional authority, disallow white-space so can delimit in text, and allow 'www.' w/o scheme
   // Note the schemes must match ^[^\s|:/?#]+(?:\|[^\s|:/?#]+)*$
   //
   // (?:()(www\.[^\s/?#]+\.[^\s/?#]+)|(schemes)://([^\s/?#]*))([^\s?#]*)(?:\?([^\s#]*))?(#(\S*))?
   //
   // Validate the authority with an orthogonal RegExp, so the RegExp above won’t fail to match any valid urls.
   function uriRegExp( flags, schemes/* = null*/, noSubMatches/* = false*/ )
   {
      if( !schemes )
         schemes = '[^\\s:\/?#]+'
      else if( !RegExp( /^[^\s|:\/?#]+(?:\|[^\s|:\/?#]+)*$/ ).test( schemes ) )
         throw TypeError( 'expected URI schemes' )
      return noSubMatches ? new RegExp( '(?:www\\.[^\\s/?#]+\\.[^\\s/?#]+|' + schemes + '://[^\\s/?#]*)[^\\s?#]*(?:\\?[^\\s#]*)?(?:#\\S*)?', flags ) :
         new RegExp( '(?:()(www\\.[^\\s/?#]+\\.[^\\s/?#]+)|(' + schemes + ')://([^\\s/?#]*))([^\\s?#]*)(?:\\?([^\\s#]*))?(?:#(\\S*))?', flags )
   }

   // http://en.wikipedia.org/wiki/URI_scheme#Official_IANA-registered_schemes
   function uriSchemesRegExp()
   {
      return 'about|callto|ftp|gtalk|http|https|irc|ircs|javascript|mailto|mshelp|sftp|ssh|steam|tel|view-source|ymsgr'
   }

Try the following:

^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?((/?\w+/)+|/?)(\w+\.[\w]{3,4})?((\?\w+=\w+)?(&\w+=\w+)*)?

It supports HTTP / FTP, subdomains, folders, files, etc.

I found it from a quick Google search:

http://geekswithblogs.net/casualjim/archive/2005/12/01/61722.aspx

/^((?P<scheme>https?|ftp):\/)?\/?((?P<username>.*?)(:(?P<password>.*?)|)@)?(?P<hostname>[^:\/\s]+)(?P<port>:([^\/]*))?(?P<path>(\/\w+)*\/)(?P<filename>[-\w.]+[^#?\s]*)?(?P<query>\?([^#]*))?(?P<fragment>#(.*))?$/

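From my answer on a similar question. It works better than some of the others mentioned because they had some bugs (such as not supporting username/password, not supporting single-character filenames, and broken fragment identifiers).

In .NET you can get the scheme (HTTP/HTTPS), host, port, path, and query by using the Uri object. The difficult task is breaking the host into subdomain, domain name, and TLD.

There is no standard for doing so, and plain string parsing or a regex cannot produce the correct result on its own. At first I used a regex function, but it could not parse the subdomain correctly for every URL. The practical way is to use a list of TLDs. Once the TLD of a URL is identified, the part immediately to its left is the domain and the remainder is the subdomain.

However, the list needs to be maintained, since new TLDs can appear. As far as I know at the moment, publicsuffix.org maintains the latest list, and you can use the domainname-parser tool from Google Code to parse the public suffix list and get the subdomain, domain, and TLD easily through the DomainName object, via properties such as domainName.Domain and domainName.TLD.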
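The suffix-list approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the suffix set is a tiny hand-written sample rather than the real public suffix list, and the function name and return shape are hypothetical.

```python
# Hypothetical, deliberately tiny sample of public suffixes. A real program
# would load the full, regularly updated list from publicsuffix.org instead.
SAMPLE_SUFFIXES = {"com", "org", "uk", "co.uk"}


def split_host(host, suffixes=SAMPLE_SUFFIXES):
    """Split a host name into subdomain, domain and TLD using a suffix list."""
    labels = host.lower().split(".")
    # Walk from the longest candidate suffix to the shortest, so that
    # "co.uk" is preferred over "uk". Starting at 1 keeps one label
    # available for the domain itself.
    for i in range(1, len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            return {
                "subdomain": ".".join(labels[:i - 1]),
                "domain": labels[i - 1],
                "tld": candidate,
            }
    return None  # no known suffix: the host cannot be split reliably


print(split_host("sub1.sub2.domain.co.uk"))
print(split_host("test.example.com"))
```

With the sample set, the first call reports "co.uk" as the TLD and "sub1.sub2" as the subdomain, which a purely dot-counting regex could not decide.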

This answer is also helpful: Get the subdomain from a URL

Callmelann

Here is one that is complete and doesn't rely on any protocol.

function getServerURL(url) {
        var m = url.match("(^(?:(?:.*?)?//)?[^/?#;]*)");
        console.log(m[1]) // Remove this
        return m[1];
    }

getServerURL("http://dev.test.se")
getServerURL("http://dev.test.se/")
getServerURL("//ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js")
getServerURL("//")
getServerURL("www.dev.test.se/sdas/dsads")
getServerURL("www.dev.test.se/")
getServerURL("www.dev.test.se?abc=32")
getServerURL("www.dev.test.se#abc")
getServerURL("//dev.test.se?sads")
getServerURL("http://www.dev.test.se#321")
getServerURL("http://localhost:8080/sads")
getServerURL("https://localhost:8080?sdsa")

Prints

http://dev.test.se

http://dev.test.se

//ajax.googleapis.com

//

www.dev.test.se

www.dev.test.se

www.dev.test.se

www.dev.test.se

//dev.test.se

http://www.dev.test.se

http://localhost:8080

https://localhost:8080

None of the above worked for me. Here's what I ended up using:

/^(?:((?:https?|s?ftp):)\/\/)([^:\/\s]+)(?::(\d*))?(?:\/([^\s?#]+)?([?][^?#]*)?(#.*)?)?/
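For a quick sanity check, this Python sketch runs that pattern (without the surrounding JavaScript slashes) against two sample URLs of my own choosing, one with every component and one with only a scheme and host.

```python
import re

# The pattern from the answer above, minus the JavaScript /.../ delimiters.
SIMPLE_RE = re.compile(
    r'^(?:((?:https?|s?ftp):)\/\/)([^:\/\s]+)(?::(\d*))?'
    r'(?:\/([^\s?#]+)?([?][^?#]*)?(#.*)?)?'
)

# Hypothetical sample URLs, not taken from the answer.
full = SIMPLE_RE.match('https://example.com:8080/a/b.html?x=1#frag').groups()
bare = SIMPLE_RE.match('http://example.com').groups()

print(full)  # ('https:', 'example.com', '8080', 'a/b.html', '?x=1', '#frag')
print(bare)  # ('http:', 'example.com', None, None, None, None)
```

Unlike several of the other patterns in this thread, the scheme is mandatory here, so a scheme-less string such as `www.example.com` does not match at all.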

I like the regex that was published in "JavaScript: The Good Parts". It's not too short and not too complex. This page on GitHub also has the JavaScript code that uses it, but it can be adapted for any language. https://gist.github.com/voodoogq/4057330

Java offers a URL class that will do this. Query URL Objects.

FYI, PHP offers parse_url().
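Python's standard library has a comparable built-in parser, `urllib.parse.urlsplit`. The sketch below applies it to the URL from the question; splitting the path into directory and file with `rpartition` is my own addition, not something the parser does for you.

```python
from urllib.parse import urlsplit

parts = urlsplit('http://test.example.com/dir/subdir/file.html')

# Split the path at the last slash to separate the directory from the file.
head, _, filename = parts.path.rpartition('/')
directory = head + '/'

# Rebuild "the URL without the path" from scheme and network location.
base = parts.scheme + '://' + parts.netloc

print(parts.hostname)  # test.example.com
print(parts.path)      # /dir/subdir/file.html
print(directory)       # /dir/subdir/
print(filename)        # file.html
print(base)            # http://test.example.com
```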

I would recommend not using a regex. An API call such as WinHttpCrackUrl() is less error-prone.

http://msdn.microsoft.com/en-us/library/aa384092%28vs.85%29.aspx

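I tried a few of these and they didn't cover my needs, especially the highest voted one, which didn't catch a URL without a path (http://example.com/).

Also, the lack of group names made it unusable in Ansible (or perhaps my Jinja2 skills are lacking).

So this is my version, slightly modified, with the highest voted version here as its source: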

^((?P<protocol>http[s]?|ftp):\/)?\/?(?P<host>[^:\/\s]+)(?P<path>((\/\w+)*\/)([\w\-\.]+[^#?\s]+))*(.*)?(#[\w\-]+)?$
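Python's `re` module uses the same `(?P<name>...)` syntax, so the named groups can be checked directly. This is a small sketch with sample URLs taken from the question and from the answer above; the variable names are mine.

```python
import re

# The named-group pattern from the answer above, unchanged.
NAMED_RE = re.compile(
    r'^((?P<protocol>http[s]?|ftp):\/)?\/?(?P<host>[^:\/\s]+)'
    r'(?P<path>((\/\w+)*\/)([\w\-\.]+[^#?\s]+))*(.*)?(#[\w\-]+)?$'
)

with_path = NAMED_RE.match('http://test.example.com/dir/subdir/file.html').groupdict()
without_path = NAMED_RE.match('http://example.com/').groupdict()

print(with_path)
print(without_path)
```

The second URL still matches; its `path` group is simply `None`, which is the case the answer says the highest voted regex missed.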

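Using http://www.fileformat.info/tool/regex.htm, hometoast's regex works great.

But here is the deal: I want to use different regex patterns in different situations in my program.

For example, I have this URL, and I have an enumeration that lists all the URLs supported by my program. Each object in the enumeration has a method that returns a regex pattern, which is then used to compare against a URL. If that particular regex pattern returns true, I know the URL is supported by my program. So each enumeration member has its own regex, depending on where it should look inside the URL.

Hometoast's suggestion is great, but in my case I think it wouldn't help (unless I copy and paste the same regex into every enumeration member).

That is why I wanted the answer to give the regex for each situation separately. Still, +1 for hometoast. ;)

I know you're claiming this is language-agnostic, but can you tell us what you're using, so that we know which regex capabilities you have?

If you have support for non-capturing matches, you can modify hometoast's expression so that the subexpressions you aren't interested in capturing are set up like this: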

(?:SOMESTUFF)

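You would still have to copy and paste (and slightly modify) the regex in multiple places, but this makes sense: you're not just checking whether the subexpression exists, but rather whether it exists as part of a URL. Using the non-capturing modifier for subexpressions can give you what you need and nothing more, which, if I'm reading you correctly, is what you want.

As a small, small note, hometoast's expression doesn't need brackets around the 's' in 'https', since there is only one character in there. Quantifiers quantify the one character (or character class, or subexpression) directly preceding them. So: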

https?

would match 'http' or 'https' just fine.
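The effect of the non-capturing modifier is easy to see side by side. The two patterns below are simplified stand-ins that I wrote for illustration, not hometoast's full expression; both use the plain `https?` form.

```python
import re

# Simplified illustration patterns (assumptions, not the full expression).
capturing = re.compile(r'^((https?|ftp)://)([^/:\s]+)')
non_capturing = re.compile(r'^(?:(?:https?|ftp)://)([^/:\s]+)')

url = 'https://test.example.com/dir/subdir/file.html'

all_groups = capturing.match(url).groups()
host_only = non_capturing.match(url).groups()

print(all_groups)  # ('https://', 'https', 'test.example.com')
print(host_only)   # ('test.example.com',)
```

Both patterns accept exactly the same strings; only the numbering and number of captured groups changes, so each situation can expose just the part it cares about.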

A regexp to get the URL path without the file:

url = 'http://domain/dir1/dir2/somefile'
url.scan(/^(http:\/\/[^\/]+)((?:\/[^\/]+)+(?=\/))?\/?(?:[^\/]+)?$/i).to_s

It can be useful for appending a relative path to this URL.

String s = "https://www.thomas-bayer.com/axis2/services/BLZService?wsdl";

String regex = "(^http.?://)(.*?)([/\\?]{1,})(.*)";

System.out.println("1: " + s.replaceAll(regex, "$1"));
System.out.println("2: " + s.replaceAll(regex, "$2"));
System.out.println("3: " + s.replaceAll(regex, "$3"));
System.out.println("4: " + s.replaceAll(regex, "$4"));

This gives the following output:
1: https://
2: www.thomas-bayer.com
3: /
4: axis2/services/BLZService?wsdl

If you change the URL to
String s = "https://www.thomas-bayer.com?wsdl=qwerwer&ttt=888"; the output will be the following:
1: https://
2: www.thomas-bayer.com
3: ?
4: wsdl=qwerwer&ttt=888

Enjoy..
Yossi Lev

The regex to do full parsing is quite horrendous. I've included named backreferences for legibility and broken each part onto a separate line, but it still looks like this:

^(?:(?P<protocol>\w+(?=:\/\/))(?::\/\/))?
(?:(?P<host>(?:(?:&(?:amp|apos|gt|lt|nbsp|quot|bull|hellip|[lr][ds]quo|[mn]dash|permil|\#[1-9][0-9]{1,3}|[A-Za-z][0-9A-Za-z]+);)|[^\/?#:]+)(?::(?P<port>[0-9]+))?)\/)?
(?:(?P<path>(?:(?:&(?:amp|apos|gt|lt|nbsp|quot|bull|hellip|[lr][ds]quo|[mn]dash|permil|\#[1-9][0-9]{1,3}|[A-Za-z][0-9A-Za-z]+);)|[^?#])+)\/)?
(?P<file>(?:(?:&(?:amp|apos|gt|lt|nbsp|quot|bull|hellip|[lr][ds]quo|[mn]dash|permil|\#[1-9][0-9]{1,3}|[A-Za-z][0-9A-Za-z]+);)|[^?#])+)
(?:\?(?P<querystring>(?:(?:&(?:amp|apos|gt|lt|nbsp|quot|bull|hellip|[lr][ds]quo|[mn]dash|permil|\#[1-9][0-9]{1,3}|[A-Za-z][0-9A-Za-z]+);)|[^#])+))?
(?:#(?P<fragment>.*))?$

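What makes it so verbose is that, except for the protocol and the port, any of the parts can contain HTML entities, which makes delineating the fragment quite tricky. So in the last few cases (the host, path, file, querystring, and fragment) we allow either any HTML entity or any character that isn't a ? or #. The regex for an HTML entity looks like this: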

$htmlentity = "&(?:amp|apos|gt|lt|nbsp|quot|bull|hellip|[lr][ds]quo|[mn]dash|permil|\#[1-9][0-9]{1,3}|[A-Za-z][0-9A-Za-z]+);"

When that is extracted (I used a mustache syntax to represent it), it becomes a bit more legible:

^(?:(?P<protocol>(?:ht|f)tps?|\w+(?=:\/\/))(?::\/\/))?
(?:(?P<host>(?:{{htmlentity}}|[^\/?#:])+(?::(?P<port>[0-9]+))?)\/)?
(?:(?P<path>(?:{{htmlentity}}|[^?#])+)\/)?
(?P<file>(?:{{htmlentity}}|[^?#])+)
(?:\?(?P<querystring>(?:{{htmlentity}}|[^#])+))?
(?:#(?P<fragment>.*))?$

In JavaScript engines without named-group support (anything before ES2018), you of course can't use named backreferences, so the regex becomes

^(?:(\w+(?=:\/\/))(?::\/\/))?(?:((?:(?:&(?:amp|apos|gt|lt|nbsp|quot|bull|hellip|[lr][ds]quo|[mn]dash|permil|\#[1-9][0-9]{1,3}|[A-Za-z][0-9A-Za-z]+);)|[^\/?#:]+)(?::([0-9]+))?)\/)?(?:((?:(?:&(?:amp|apos|gt|lt|nbsp|quot|bull|hellip|[lr][ds]quo|[mn]dash|permil|\#[1-9][0-9]{1,3}|[A-Za-z][0-9A-Za-z]+);)|[^?#])+)\/)?((?:(?:&(?:amp|apos|gt|lt|nbsp|quot|bull|hellip|[lr][ds]quo|[mn]dash|permil|\#[1-9][0-9]{1,3}|[A-Za-z][0-9A-Za-z]+);)|[^?#])+)(?:\?((?:(?:&(?:amp|apos|gt|lt|nbsp|quot|bull|hellip|[lr][ds]quo|[mn]dash|permil|\#[1-9][0-9]{1,3}|[A-Za-z][0-9A-Za-z]+);)|[^#])+))?(?:#(.*))?$

and in each match, the protocol is \1, the host is \2, the port is \3, the path is \4, the file is \5, the querystring is \6, and the fragment is \7.
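The mustache-style extraction can also be done mechanically. The Python sketch below substitutes the entity pattern into the more legible template and joins the lines into one regex. It assumes the `{{htmlentity}}` placeholder is used without a trailing `;` on every line (the entity pattern already ends in `;`), and the sample URL is hypothetical.

```python
import re

# The HTML entity pattern from the answer above.
HTML_ENTITY = (r'&(?:amp|apos|gt|lt|nbsp|quot|bull|hellip|[lr][ds]quo|[mn]dash'
               r'|permil|\#[1-9][0-9]{1,3}|[A-Za-z][0-9A-Za-z]+);')

# The legible template, one URL part per line.
TEMPLATE_LINES = [
    r'^(?:(?P<protocol>(?:ht|f)tps?|\w+(?=:\/\/))(?::\/\/))?',
    r'(?:(?P<host>(?:{{htmlentity}}|[^\/?#:])+(?::(?P<port>[0-9]+))?)\/)?',
    r'(?:(?P<path>(?:{{htmlentity}}|[^?#])+)\/)?',
    r'(?P<file>(?:{{htmlentity}}|[^?#])+)',
    r'(?:\?(?P<querystring>(?:{{htmlentity}}|[^#])+))?',
    r'(?:#(?P<fragment>.*))?$',
]

# Expand the placeholder and compile the joined pattern.
FULL_RE = re.compile(''.join(TEMPLATE_LINES).replace('{{htmlentity}}', HTML_ENTITY))

# Hypothetical sample URL containing an HTML entity in the query string.
parts = FULL_RE.match(
    'http://www.example.com:8080/dir/sub/file.html?a=1&amp;b=2#top').groupdict()
print(parts)
```

Note that in this pattern the `host` group contains the port (the `port` group is nested inside it), and the `path` group has no leading or trailing slash.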

//USING REGEX
/**
 * Parse URL to get information
 *
 * @param   url     the URL string to parse
 * @return  parsed  the URL parsed or null
 */
var UrlParser = function (url) {
    "use strict";

    var regx = /^(((([^:\/#\?]+:)?(?:(\/\/)((?:(([^:@\/#\?]+)(?:\:([^:@\/#\?]+))?)@)?(([^:\/#\?\]\[]+|\[[^\/\]@#?]+\])(?:\:([0-9]+))?))?)?)?((\/?(?:[^\/\?#]+\/+)*)([^\?#]*)))?(\?[^#]+)?)(#.*)?/,
        matches = regx.exec(url),
        parser = null;

    if (null !== matches) {
        parser = {
            href              : matches[0],
            withoutHash       : matches[1],
            url               : matches[2],
            origin            : matches[3],
            protocol          : matches[4],
            protocolseparator : matches[5],
            credhost          : matches[6],
            cred              : matches[7],
            user              : matches[8],
            pass              : matches[9],
            host              : matches[10],
            hostname          : matches[11],
            port              : matches[12],
            pathname          : matches[13],
            segment1          : matches[14],
            segment2          : matches[15],
            search            : matches[16],
            hash              : matches[17]
        };
    }

    return parser;
};

var parsedURL=UrlParser(url);
console.log(parsedURL);

License: CC-BY-SA with attribution

Not affiliated with StackOverflow