Python 2.7 : Can't figure out how to parse a tree with BeautifulSoup4

https://stackoverflow.com/questions/22889349

28-06-2023

Question

I am trying to parse this site into five lists, one per weekday, each holding one string per announcement. For example:

[in]   custom_function(page)

[out]  [[<MONDAYS    ANNOUNCEMENTS>],
        [<TUESDAYS   ANNOUNCEMENTS>],
        [<WEDNESDAYS ANNOUNCEMENTS>],
        [<THURSDAYS  ANNOUNCEMENTS>],
        [<FRIDAYS    ANNOUNCEMENTS>]]

But I can't figure out the correct way to do this.

This is what I have so far

from bs4 import BeautifulSoup
import requests
import datetime

url = 'http://mam.econoday.com/byweek.asp?day=7&month=4&year=2014&cust=mam&lid=0'

# Get the text of the webpage and parse it
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)


full_table_1 = soup.find('table', 'eventstable')

[Screenshot of the website in the browser's developer tools]

I figured out that what I want is in the highlighted tag, but I'm not sure how to reach that exact tag and then parse the times and announcements into a list. I've tried several approaches, but the code just keeps getting messier.

What do I do?

Solution

The idea is to find all td elements with the events class, then read the text of the div elements with the econoevents class inside each one:

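data = []
# Each td with class "events" is treated as one day's cell
for day in soup.find_all('td', class_='events'):
    # Collect the text of every announcement div inside that cell
    data.append([div.text for div in day.find_all('div', class_='econoevents')])

print data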

prints:

[[u'Gallup US Consumer Spending Measure8:30 AM\xa0ET',
  u'4-Week Bill Announcement11:00 AM\xa0ET',
  u'3-Month Bill Auction11:30 AM\xa0ET',
  ...
 ],
 ...
]
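
Each string in that output has the announcement name fused directly to its time (for example u'Gallup US Consumer Spending Measure8:30 AM\xa0ET'). If the name and the time are wanted separately, one possible approach is a regular expression anchored on the trailing time. This is only a sketch: split_event and EVENT_RE are hypothetical names, not part of BeautifulSoup, and the pattern assumes every time looks like "H:MM AM ET" or "H:MM PM ET" as in the sample output above. Strings that do not match are returned unchanged with None as the time.

```python
# -*- coding: utf-8 -*-
import re

# Assumption: an event string ends with a time such as "8:30 AM ET", where the
# character before "ET" may be a non-breaking space (\xa0), as in the output above.
EVENT_RE = re.compile(u'^(.*?)(\\d{1,2}:\\d{2}\\s*[AP]M)[\\s\xa0]*ET$')


def split_event(text):
    """Split 'Name8:30 AM\\xa0ET' into (name, time); time is None if no match."""
    match = EVENT_RE.match(text)
    if match is None:
        return (text.strip(), None)
    return (match.group(1).strip(), match.group(2))


# Sample strings copied from the output above (one inner list per day).
sample = [[u'Gallup US Consumer Spending Measure8:30 AM\xa0ET',
           u'4-Week Bill Announcement11:00 AM\xa0ET',
           u'3-Month Bill Auction11:30 AM\xa0ET']]

# Same nested shape as the original data, but with (name, time) tuples.
parsed = [[split_event(text) for text in day] for day in sample]

for day in parsed:
    for name, time in day:
        print(u'%s | %s' % (name, time))
```

The same list comprehension can be applied to the data list built from the page, so the result keeps one inner list per day but holds (name, time) tuples instead of fused strings.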

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow