Question

I am reading three or four news RSS feeds from different websites and merging them in Yahoo Pipes. I display one image from each news item, and I am facing two problems.

1> The images are provided in different tags in different feeds. The different tags for the images are:

<media:content medium="image" url="http://metrouk2.files.wordpress.com/2013/08/1000x67025.jpg?w=150&amp;h=150&amp;crop=1">
<media:title type="html">Liverpool v Stoke City - Premier League</media:title>
</media:content>

From another feed:

<media:thumbnail height="340" link="" url="http://www.chelseafc.com/javaImages/4a/7c/0,,10268~12155978,00.jpg" width="640"/>

And another feed:

<enclosure length="150" type="image/jpeg" url="http://u.goal.com/187200/187249_thumb.jpg"/>

2> Also, in some of the feeds I get three or four media:content elements, and some of them are not images but mp3 files. The image that relates to the news item is not in a fixed position either: sometimes it is the fourth media:content, sometimes the first.

For the first issue, I currently check which source an item came from and extract the image accordingly on the client side. I don't want to keep doing that, because I would like to add more feeds in the future and I don't want to handle every source explicitly on the client side.

For the second issue I am stuck. I just display the first media:content, which sometimes gives me the correct image and sometimes does not.

Note that Yahoo Pipes itself handles this properly and shows the relevant image for each news item in the Yahoo Pipes panel.

I am really struggling with this. Please point me in the right direction.


Solution

The first problem is easy. For each input feed, use a Rename operator to rename or copy DOM elements to your unified format. Then, after you union all your modified input feeds, you can use the unified names.
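Outside of Pipes, the same idea can be sketched in Python. This is only an illustration, assuming each feed item has already been parsed into a nested dict; the feed names, the `image` field, and the helpers `get_path`, `copy_field`, and `merge_feeds` are all hypothetical.

```python
def get_path(item, path):
    """Follow a dotted path such as 'media:content.url' through nested dicts."""
    value = item
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def copy_field(item, source_path, target="image"):
    """Copy the value at source_path into one unified field (hypothetical name)."""
    result = dict(item)
    result[target] = get_path(item, source_path)
    return result


# One source path per input feed. The feed names are made up for illustration;
# the paths follow the three tags shown in the question.
FEED_IMAGE_PATHS = {
    "metro": "media:content.url",
    "chelsea": "media:thumbnail.url",
    "goal": "enclosure.url",
}


def merge_feeds(feeds):
    """Normalize every feed's items, then union them into a single list."""
    merged = []
    for name, items in feeds.items():
        path = FEED_IMAGE_PATHS[name]
        merged.extend(copy_field(item, path) for item in items)
    return merged
```

With this shape, adding a feed means adding one entry to the mapping, and the client only ever reads the unified field.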

The second problem is tricky. I usually work around cases like that by combining the Rename, Regex, and Loop (with String Builder) operators like this:

  1. Rename the relevant fields, just so that they become easier to work with later, for example:

    • Rename item.div.div.0.div.div.a.href to link0
    • Rename item.div.div.1.div.div.a.href to link1
    • Rename item.div.div.2.div.div.a.href to link2
    • ...
  2. Using a Regex operator, make the irrelevant ones among link0, link1, ... empty, for example:

    • In item.link0 replace ^.*[^/]$ with nothing
    • In item.link1 replace ^.*[^/]$ with nothing
    • In item.link2 replace ^.*[^/]$ with nothing
    • ...

    The right regex will depend on your use case. In my example this was suitable to make the irrelevant elements empty.

  3. Using a Loop with a String Builder inside, concatenate item.link0, item.link1, ... and assign the result to item.link.

As a result, whichever of link0, link1, ... held the right link, the concatenated value will be that link, since all the others were emptied.

It's a hack, but it can work. I actively use this technique in some of my pipes. The tricky part is coming up with the right regex to make the irrelevant values blank.
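The blank-and-concatenate steps can be sketched in Python as well. This is a rough illustration rather than what Pipes does internally: it assumes the candidates have already been renamed to `link0`, `link1`, `link2` on a dict, and the helper name `pick_link` is hypothetical. The regex is the one from the example above.

```python
import re

# Regex from the example above: matches any value that does not end in "/".
IRRELEVANT = re.compile(r"^.*[^/]$")


def pick_link(item, fields=("link0", "link1", "link2")):
    """Blank every candidate that matches IRRELEVANT, then concatenate the rest."""
    cleaned = [IRRELEVANT.sub("", item.get(name) or "") for name in fields]
    result = dict(item)
    # Only the surviving candidate contributes to the concatenated value.
    result["link"] = "".join(cleaned)
    return result
```

For example, `pick_link({"link0": "http://example.com/a.mp3", "link1": "http://example.com/story/"})["link"]` yields `"http://example.com/story/"`. If more than one candidate survives the regex, the values run together, so the regex has to be strict enough to leave at most one.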

Licensed under: CC-BY-SA with attribution