Question

I have some HTML in which I need to fix the URL text. I need to:

1) convert the text within the URL to lowercase, 2) convert any spaces within the URL to hyphens, and 3) delete any parentheses from the URL.

I have multiple occurrences of this pattern within each file:

<div class="classname"><a href="/URL"><img src="${asset.image/url}" alt="TEXT" class="another-class-name" ></a></div>

Example:
I want to change this pattern: <div class="classname"><a href="/URL-EXAMPLE-ONE"><img src="${asset.image/url}" alt="TEXT" class="another-class-name" ></a></div>

To: <div class="classname"><a href="/url-example-one"><img src="${asset.image/url}" alt="TEXT" class="another-class-name" ></a></div>

I have a number of files and want to do an in-place substitution. The /URL-EXAMPLE-ONE part could also contain any combination of spaces and parentheses.

Following a previous suggestion, I'm using this sed script:

/sw/bin/sed -e '/<div class="mk-man-logo-mod5-m"><a href="\/[A-Z -{}&]*"></ {
   h;
   s/.*<div class="mk-man-logo-mod5-m"><a href="\/\(.*\)"><img.*/\1/;
   s/\(.*\)/\L\1/;
   s/[ &]/-/g;
   s/[()]//g;
   s/<img.*//;
   x;
   s/\(.*<div class="mk-man-logo-mod5-m"><a href="\/\)\(.*\)\(<img.*\)/\1\3/;
   G;
   s/\n//;
   }' $e

But the output I'm getting is, as an example:

Original text: <div class="classname"><a href="/ABC (D&E)"><img src="${asset.images/common/manufacturer_logos/medium/abb-m.gif}" alt="TEXT" class="another-classname" ></a></div>

Transformed text: <div class="classname"><a href="/<img src="${asset.images/url}" alt="TEXT" class="another-classname" abc-d-ediv>

Actually want: <div class="classname"><a href="/abc-d-e"><img src="${asset.images/url}" alt="TEXT" class="another-classname"></a></div>

Could anyone help further? I've burned many hours on this. I'm not a sed expert, but I think I'm close and just missing something.

Many thanks in advance, Alex


Solution

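This seems to work (it needs GNU sed, since \L and \+ are GNU extensions). It saves the tail of the line, from the closing quote of the href onward, in the hold space, rewrites the URL in the pattern space, and then joins the two back together: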

sed '
\#<div class="mk-man-logo-mod5-m"><a href="/[A-Z &()-]\+"# {
  h
  s#<div class="mk-man-logo-mod5-m"><a href="/[A-Z &()-]\+##
  x
  s#.*href="/\(.*\)"><img src.*#\1#
  s#.*#\L&#
  s#[ &]#-#g
  s#[()]##g
  s#^#<div class="mk-man-logo-mod5-m"><a href="/#
  G
  s#\n##
}'
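
As a quick check, the script can be run against the sample line from the question. This is only a sketch under a few assumptions: GNU sed is the sed on the PATH, the sample's class name is changed to mk-man-logo-mod5-m so that the address matches, and the variable names (script, line, out) are made up for the illustration. The comments inside the script are my reading of each step.

```shell
# The sed script from above, stored in a variable so it can be reused.
script='
\#<div class="mk-man-logo-mod5-m"><a href="/[A-Z &()-]\+"# {
  # Save the whole line, then cut everything up to the end of the URL.
  h
  s#<div class="mk-man-logo-mod5-m"><a href="/[A-Z &()-]\+##
  # Hold space now keeps the tail; pattern space gets the original line back.
  x
  # Reduce the line to the bare URL, then normalise it.
  s#.*href="/\(.*\)"><img src.*#\1#
  s#.*#\L&#
  s#[ &]#-#g
  s#[()]##g
  # Put the fixed prefix back and append the saved tail.
  s#^#<div class="mk-man-logo-mod5-m"><a href="/#
  G
  s#\n##
}'

# Sample input (class name adapted to the one the script matches).
line='<div class="mk-man-logo-mod5-m"><a href="/ABC (D&E)"><img src="${asset.image/url}" alt="TEXT" class="another-class-name" ></a></div>'

out=$(printf '%s\n' "$line" | sed "$script")
printf '%s\n' "$out"
# prints: <div class="mk-man-logo-mod5-m"><a href="/abc-d-e"><img src="${asset.image/url}" alt="TEXT" class="another-class-name" ></a></div>

# For an in-place edit with GNU sed, something like this should do
# (page.html is a placeholder file name):
#   sed -i "$script" page.html
```

Note that the script assumes the div starts the line. Any text before it would stay in the saved tail and end up after the rewritten prefix.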
Licensed under: CC-BY-SA with attribution