Question

I need to create a newsletter from a URL. I do the following:

  1. Create a WebClient;
  2. Use WebClient's DownloadData method to get the page source as a byte array;
  3. Convert the HTML byte array to a string and set it as the newsletter content.

But I have a problem with paths: all element sources are relative (/img/welcome.png), but I need absolute ones (http://www.mysite.com/img/welcome.png).

How can I do this?

Best regards, Alex.

Solution

One possible way to solve this is to use the HtmlAgilityPack library.

An example (fixing links):

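// Requires: using System.Net; using System.Text; using HtmlAgilityPack;

// Download the page and decode it (this assumes the page is UTF-8 encoded).
WebClient client = new WebClient();
byte[] requestHTML = client.DownloadData(sourceUrl);
string sourceHTML = Encoding.UTF8.GetString(requestHTML);

HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(sourceHTML);

// SelectNodes returns null when nothing matches, so guard before iterating.
HtmlNodeCollection links = htmlDoc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        if (!string.IsNullOrEmpty(att.Value))
        {
            // AbsoluteUrlByRelative is your own helper that resolves a relative URL.
            att.Value = this.AbsoluteUrlByRelative(att.Value);
        }
    }
}

// Repeat with "//img[@src]" for images, then read the result back.
string resultHTML = htmlDoc.DocumentNode.OuterHtml;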
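
The example above calls AbsoluteUrlByRelative without showing it. A minimal sketch of what such a helper might look like, assuming you know the address the page was downloaded from (the class name UrlHelper and the baseUrl parameter are my own, not part of the original answer):

```csharp
using System;

public static class UrlHelper
{
    // Hypothetical stand-in for the AbsoluteUrlByRelative helper.
    // baseUrl is assumed to be the address the page was downloaded from.
    public static string AbsoluteUrlByRelative(string baseUrl, string url)
    {
        Uri result;
        // Uri.TryCreate resolves url against the base; a url that is
        // already absolute is kept as it is.
        if (Uri.TryCreate(new Uri(baseUrl), url, out result))
        {
            return result.AbsoluteUri;
        }
        return url; // leave anything that cannot be parsed untouched
    }
}
```

For example, UrlHelper.AbsoluteUrlByRelative("http://www.mysite.com/news/index.html", "/img/welcome.png") should give http://www.mysite.com/img/welcome.png.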

OTHER TIPS

If the request comes in from your own site (same-domain links), you can use this:

new Uri(Request.Url, "/img/welcome.png").ToString();

If you're in a non-web app, or you want to hardcode the domain name:

new Uri(new Uri("http://www.mysite.com"), "/img/welcome.png").ToString();

You have some options:

  1. You can convert your byte array to a string and do a find and replace.
  2. You can convert the byte array to a string, load it into a DOM object, and prepend the base URL to the attributes where needed (basically you are looking for any src or href attribute that doesn't start with http: or https:).

    ' Requires: Imports System.Net, Imports System.Text
    Console.Write(ControlChars.Cr + "Please enter a URL (for example, http://www.msn.com): ")
    Dim remoteUrl As String = Console.ReadLine()
    Dim myWebClient As New WebClient()
    Console.WriteLine("Downloading " + remoteUrl)
    Dim myDataBuffer As Byte() = myWebClient.DownloadData(remoteUrl)
    Dim download As String = Encoding.ASCII.GetString(myDataBuffer)
    ' Strings are immutable: Replace returns a new string, so assign the result.
    download = download.Replace("src=""/", "src=""" & remoteUrl & "/")
    download = download.Replace("href=""/", "href=""" & remoteUrl & "/")
    Console.WriteLine(download)
    Console.WriteLine("Download successful.")

This is super contrived, and most of it is taken directly from http://msdn.microsoft.com/en-us/library/xz398a3f.aspx, but it illustrates the basic principle behind method 1. Note that it only works as written when remoteUrl is the site root without a trailing slash.

Just use this function:

    ' Converts a relative URL to an absolute URI
    Function RelativeToAbsoluteUrl(ByVal baseURI As Uri, ByVal RelativeUrl As String) As Uri
        ' Parse the URL, which may be relative or absolute
        Dim uriReturn As Uri = New Uri(RelativeUrl, UriKind.RelativeOrAbsolute)
        ' Make it absolute if it's relative
        If Not uriReturn.IsAbsoluteUri Then
            uriReturn = New Uri(baseURI, uriReturn)
        End If
        Return uriReturn
    End Function

Instead of resolving each relative path, you can try adding a base element whose href attribute is set to the original base URI of the page.

Placed as the first child of the head element, it should make the browser resolve all following relative paths against the original site, not against the location the document (the newsletter) is served from.
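
A rough sketch of how the base element could be injected on the server side before the HTML is used as newsletter content. It assumes the page has an ordinary opening head tag; the names BaseTagInserter and AddBaseHref are made up for illustration, and a regex is used only to keep the example free of extra libraries:

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class BaseTagInserter
{
    // Matches the opening <head> tag, with or without attributes
    // (it does not match <header>).
    private static readonly Regex HeadOpen =
        new Regex(@"<head(\s[^>]*)?>", RegexOptions.IgnoreCase);

    // Hypothetical helper: inserts <base href="..."> as the first child of <head>.
    // If no <head> tag is found, the HTML is returned unchanged.
    public static string AddBaseHref(string html, string baseUrl)
    {
        // Encode the URL so it is safe inside an HTML attribute value.
        string baseTag = "<base href=\"" + WebUtility.HtmlEncode(baseUrl) + "\">";
        // Replace only the first match, appending the base element to it.
        return HeadOpen.Replace(html, m => m.Value + baseTag, 1);
    }
}
```

Usage would be something like string content = BaseTagInserter.AddBaseHref(sourceHTML, "http://www.mysite.com/"); whether the recipient's mail client honours the base element still has to be tested.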

In Firefox, reading each src/href property and assigning the value straight back (a seemingly tautological round trip) results in COMPLETE paths being written into the attributes, so the serialized HTML document contains absolute URLs and can then be scripted, saved, and so on:

var d = document;
var n = d.querySelectorAll('[src]'); // do the same for [href] ...
var op = "";
for (var i = 0; i < n.length; i++) {
    var resolved = n[i].src; // the property getter returns the resolved URL
    op += resolved + "\n";
    n[i].src = resolved;     // writing it back stores the full URL in the attribute
}
alert(op);

Of course, url() references inside style elements (for example in background-image or content rules) and inside style attributes on individual nodes are NOT covered by any of the solutions above.
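
If you do want to rewrite those references yourself, something along these lines might work. It is only a sketch: the regex is a simplification that handles plain and quoted url(...) values but not every form CSS allows, and the names CssUrlRewriter and MakeCssUrlsAbsolute are invented for this example:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CssUrlRewriter
{
    // Matches url(...) with optional single or double quotes around the address.
    private static readonly Regex CssUrl = new Regex(
        @"url\(\s*(['""]?)([^'"")]+)\1\s*\)", RegexOptions.IgnoreCase);

    // Hypothetical helper: resolves every url(...) in the text against baseUri.
    public static string MakeCssUrlsAbsolute(string text, Uri baseUri)
    {
        return CssUrl.Replace(text, m =>
        {
            string quote = m.Groups[1].Value;
            string address = m.Groups[2].Value.Trim();

            // Embedded data: URLs have nothing to resolve, so keep them as they are.
            if (address.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
            {
                return m.Value;
            }

            Uri resolved;
            return Uri.TryCreate(baseUri, address, out resolved)
                ? "url(" + quote + resolved.AbsoluteUri + quote + ")"
                : m.Value; // leave anything that cannot be parsed untouched
        });
    }
}
```

It can be run over the whole HTML string, since it only touches url(...) occurrences, for example CssUrlRewriter.MakeCssUrlsAbsolute(sourceHTML, new Uri("http://www.mysite.com/")).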

Therefore, getting the base element approach to a valid, tested state (with a compatibility list) seems the more promising route to me.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow