Question

I've got an HTML file, and I'd like to grab all the links that are in the file and save them into another file using Vim.

I know that the regex would be something like:

:g/href="\v([a-z_/]+)"/

but I don't know where to go from here.


Solution

Put your cursor in the first row/column and try this:

:redir > output.txt|while search('href="', "We")|exe 'normal yi"'|echo @"|endwhile|redir END

OTHER TIPS

Jeff Meatball Yang was almost there.

As Sasha wrote, if you use w on its own, it writes the full original file to the outfile once for every matching line.

To only write the matched line, you have to add '.' before 'w':

:g/href="\v([a-z_/]+)"/ .w >> outfile

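Note that the outfile needs to exist already; alternatively, use .w! >> outfile, where the ! forces the write even if the file does not exist yet.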

The challenge here lies in extracting all of the links when there may be several on one line; otherwise you'd be able to simply do:

" Extract all lines with href=
:g/href="[^"]\+"/.w! >> list_of_links.txt
" Open the new file
:e list_of_links.txt
" Extract the bit inside the quotation marks
:%s/.*href="\([^"]\+\)".*/\1/

The simplest approach would probably be to do this:

" Save as a new file name
:saveas list_of_links.txt
" Break up the lines wherever there is a 'href='
:%s/href=/\rhref=/g
" Get rid of any lines without href= (including the fragments left by the split)
:g!/href="\([^"]\+\)"/d
" Tidy up by removing everything but the bit we want
:%s/^.*href="\([^"]\+\)".*$/\1/

Alternatively (following a similar theme),

:g/href="[^"]\+"/.w! >> list_of_links.txt
:e list_of_links.txt
:%s/href=/\rhref=/g
:v/href="[^"]\+"/d
:%s/^.*href="\([^"]\+\)".*$/\1/

(see :help saveas, :help :vglobal, :help :s)

However, if you really wanted to do it in a more direct way, you could do something like this:

" Initialise register 'h'
:let @h = ""
" For each line containing href=..., get the line, and carry out a global search
" and replace that extracts just the URLs and a double quote (as a delimiter)
:g/href="[^"]\+"/let @h .= substitute(getline('.'), '.\{-}href="\([^"]\+\)".\{-}\ze\(href=\|$\)', '\1"', 'g')
" Create a new file
:new
" Paste the contents of register h (entered in normal mode)
"hp
" Replace all double quotes with new-lines
:s/"/\r/g
" Save
:w

Finally, you could do it in a function with a for loop, but I'll leave that for someone else to write!
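For what it's worth, here is a rough sketch of what such a function might look like. The names HrefsInLine() and HrefsInBuffer() are made up for illustration, and it assumes a Vim recent enough to have matchstrpos() and links written as href="..." with double quotes:

```vim
" Sketch only: HrefsInLine() and HrefsInBuffer() are hypothetical names.
" Assumes matchstrpos() is available and links use href="..." (double quotes).

" Return a list of every href="..." value found in one line of text.
function! HrefsInLine(text) abort
  let l:links = []
  let l:start = 0
  while 1
    " matchstrpos() returns [match, start, end]; start is -1 when nothing matches
    let [l:url, l:from, l:to] = matchstrpos(a:text, 'href="\zs[^"]\+\ze"', l:start)
    if l:from < 0
      break
    endif
    call add(l:links, l:url)
    " Continue searching after the end of this match
    let l:start = l:to
  endwhile
  return l:links
endfunction

" Collect the links from every line of the current buffer.
function! HrefsInBuffer() abort
  let l:links = []
  for l:text in getline(1, '$')
    call extend(l:links, HrefsInLine(l:text))
  endfor
  return l:links
endfunction
```

With the HTML file open, something like :call writefile(HrefsInBuffer(), 'list_of_links.txt') should then write one link per line (the file name is just the one used above). Unlike the register approach, this never modifies the buffer.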

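Clear register x (in normal mode):

qxq

Search for the regex (whatever it is) and append the match to register x. Note that matchstr() only returns the first match on each line:

:g/regex/call setreg('X', matchstr(getline('.'), 'regex') . "\n")

Open a new tab:

:tabnew outfile

Put the contents of register x (in normal mode):

"xp

Write the file:

:w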

Have you tried this?

:g/href="\v([a-z_/]+)"/w >> outfile

Licensed under: CC-BY-SA with attribution