Question

I'm quite new to Python and I'm trying to get the following done:

  1. Pull RSS feed from Yahoo Finance News API ->YDN
  2. Put the Title, Link,PubDate into a Database table
  3. Store only unique Titles and the latest PubDate

1 and 2 I got through no problem but when I try to do 3, I ran into some trouble. The following is how the table looks like:

enter image description here

As you can see, I have duplicated Titles with different PubDate. What I've been trying to do is to follow the steps ilustrated from Microsoft Kb -> Support Microsoft.

I was able to get through steps 1 - 3 which does:

  1. Selecting all Titles with counts > 1 into a table called holdkey
  2. Selecting all distinct records with Titles equal to Titles in the holdkey table

my code is slightly different from what was shown in the MSFT link:

import sqlite3

cur = db.cursor()
cur.execute('''create table holdkey as 
            select Title,count(*) 
            from bar group by Title
            having count(*)>1''')
cur.execute('''create table holdup as 
            select distinct bar.* 
            from bar, holdkey 
            where bar.Title = holdkey.Title''')

What I need help on is that in my holdup table (as seen in the screenshot), I still get duplicate Titles because the pubDate is different and that it is known as distinct by SQLite. Is there a way to delete the duplicates based on the PubDate?

I would ideally just keep the latest one, but getting rid of the duplicate is more important.

Let me know if more clarification is required. Thanks

Was it helpful?

Solution

Try

SELECT b.symbol, b.title, b.link, b.pubdate
  FROM bar b JOIN
(
  SELECT title, MAX(pubdate) pubdate
    FROM bar
   GROUP BY title
) q ON b.title = q.title 
   AND b.pubdate = q.pubdate

Here is SQLFiddle demo

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top