Here is the source file I made from your description:

cat file
www.domain.com
Quite a popular domain name

www.domain.com
I should buy this domain
Whenever I happen to have enough money for this
All entries are separated by single blank lines, and sometimes the URL is in markdown format:
[domain.com](www.domain.com)
How would I scan the folder for duplicate URLs?
Using awk to extract the duplicate domain names:

awk 'BEGIN{FS="\n"; RS=""}           # paragraph mode: records separated by blank lines, one field per line
{ if ($1 ~ /\[/) {                   # markdown entry: [text](url)
    split($1, a, "[)(]")             # a[2] is the URL between the parentheses
    domain[a[2]]++
  }
  else { domain[$1]++ }              # plain entry: the first line is the URL
}
END{ for (i in domain)
       if (domain[i] > 1) print "Duplicate domain found:", i
}' file
Duplicate domain found: www.domain.com
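Since you asked about a whole folder: awk in paragraph mode (RS="") treats end-of-file as end-of-record, so the same program works across many files at once if you pass it a glob. A minimal sketch, assuming a hypothetical folder named links containing files in the format above:

```shell
# Hypothetical setup: a folder "links" with two entry files in your format.
mkdir -p links
printf 'www.domain.com\nQuite a popular domain name\n\nwww.domain.com\nI should buy this domain\n' > links/file1
printf '[domain.com](www.domain.com)\nMarkdown-style entry\n' > links/file2

# Same awk program as above, run over every file in the folder.
# Counts accumulate in one array across all inputs, so duplicates
# are detected folder-wide, not just within a single file.
awk 'BEGIN{FS="\n"; RS=""}
{ if ($1 ~ /\[/) { split($1, a, "[)(]"); domain[a[2]]++ }
  else { domain[$1]++ }
}
END{ for (i in domain)
       if (domain[i] > 1) print "Duplicate domain found:", i
}' links/*
```

With the three records above (www.domain.com appears three times), this prints the duplicate line once. If the folder has subdirectories, you would need find with xargs instead of a plain glob.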