Domanda

I'm quite new to Python and I'm trying to get the following done:

  1. Pull RSS feed from Yahoo Finance News API ->YDN
  2. Put the Title, Link,PubDate into a Database table
  3. Store only unique Titles and the latest PubDate

1 and 2 I got through no problem but when I try to do 3, I ran into some trouble. The following is how the table looks like:

enter image description here

As you can see, I have duplicated Titles with different PubDate. What I've been trying to do is to follow the steps ilustrated from Microsoft Kb -> Support Microsoft.

I was able to get through steps 1 - 3 which does:

  1. Selecting all Titles with counts > 1 into a table called holdkey
  2. Selecting all distinct records with Titles equal to Titles in the holdkey table

my code is slightly different from what was shown in the MSFT link:

import sqlite3

cur = db.cursor()
cur.execute('''create table holdkey as 
            select Title,count(*) 
            from bar group by Title
            having count(*)>1''')
cur.execute('''create table holdup as 
            select distinct bar.* 
            from bar, holdkey 
            where bar.Title = holdkey.Title''')

What I need help on is that in my holdup table (as seen in the screenshot), I still get duplicate Titles because the pubDate is different and that it is known as distinct by SQLite. Is there a way to delete the duplicates based on the PubDate?

I would ideally just keep the latest one, but getting rid of the duplicate is more important.

Let me know if more clarification is required. Thanks

È stato utile?

Soluzione

Try

SELECT b.symbol, b.title, b.link, b.pubdate
  FROM bar b JOIN
(
  SELECT title, MAX(pubdate) pubdate
    FROM bar
   GROUP BY title
) q ON b.title = q.title 
   AND b.pubdate = q.pubdate

Here is SQLFiddle demo

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top