I'm looking to scrape some simple HTML from a website. There are just four pieces of information, and the HTML is very simple. It looks like this:
<span class="price">$375</span>
<span class="description">description</span>
<a class="title" href="link" title="Title">
So we have the price, description, title, and link to be scraped.
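For reference, here is a minimal sketch of how those four fields could be pulled out using only Python's standard library. The names `ListingParser` and `parse_listings` are hypothetical, and the sketch assumes the fields of each entry appear in the order shown above, with the title link last.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collects price, description, title, and link for each listing."""

    def __init__(self):
        super().__init__()
        self.entries = []    # completed listings
        self._current = {}   # listing currently being assembled
        self._field = None   # span class whose text we are inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class")
        if tag == "span" and css_class in ("price", "description"):
            self._field = css_class
            self._current[css_class] = ""
        elif tag == "a" and css_class == "title":
            self._current["title"] = attrs.get("title", "")
            self._current["link"] = attrs.get("href", "")
            # Assumption: the title link is the last field of a listing,
            # so seeing it completes the current entry.
            self.entries.append(self._current)
            self._current = {}

    def handle_data(self, data):
        # Only keep text that sits inside a price or description span.
        if self._field:
            self._current[self._field] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def parse_listings(html):
    """Return a list of dicts with price, description, title, and link."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.entries
```

If the real page orders the fields differently, the point at which an entry is considered complete would need to change.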
There will be a file called [login to view URL] that looks like this:
"Montreal Alerts";"[login to view URL]"
"Ottawa Alerts";"[login to view URL]"
"Toronto Alerts";"[login to view URL]"
I can add as many lines to this text file as I want.
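As a rough illustration, a file in this format (quoted fields separated by semicolons) could be read with Python's `csv` module. The function name `read_alerts` is hypothetical, and the sketch assumes the first field is the e-mail subject and the second is the URL to search.

```python
import csv


def read_alerts(stream):
    """Return a list of (subject, url) pairs from a semicolon-delimited file."""
    reader = csv.reader(stream, delimiter=";", quotechar='"')
    # Blank or malformed lines (fewer than two fields) are skipped.
    return [(row[0], row[1]) for row in reader if len(row) >= 2]
```

It would be called with an open file, for example `with open(path, newline="") as fh: alerts = read_alerts(fh)`, where `path` is the location of the list file.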
There is a second file called [login to view URL]. When it is executed from cron, it starts on line 1 of [login to view URL] and searches [login to view URL] for the information to be scraped.
In a file called [login to view URL], it stores the title and description of the entries found at [login to view URL].
It also sends me an e-mail with the entries that it found:
Subject: Montreal Alerts
Body:
Price - Title - Description - Link
Price - Title - Description - Link
Price - Title - Description - Link
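A message in that layout could be assembled roughly as follows. This is only a sketch: the function names are hypothetical, the sender and recipient addresses are supplied by the caller, and sending assumes an SMTP server is reachable on the given host (`localhost` here is an assumption).

```python
import smtplib
from email.message import EmailMessage


def build_alert_email(subject, entries, sender, recipient):
    """Build a plain-text e-mail with one 'Price - Title - Description - Link' line per entry."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    lines = [
        " - ".join((e["price"], e["title"], e["description"], e["link"]))
        for e in entries
    ]
    msg.set_content("\n".join(lines))
    return msg


def send_alert_email(msg, host="localhost"):
    """Send the message through an SMTP server (the host is an assumption)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

Keeping the building and sending steps separate makes it possible to check the message text without a mail server.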
The next time that [login to view URL] is executed, it starts on line 2 of [login to view URL] and searches:
[login to view URL]
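One way to remember which line comes next between cron runs is a small state file holding a number. The sketch below is an assumption about how this could work: `next_line_index` and the state file are hypothetical, the index is zero-based (0 means line 1), and wrapping back to the first line after the last one is assumed rather than stated in the requirements.

```python
from pathlib import Path


def next_line_index(state_path, total_lines):
    """Return the line index for this run and save the one for the next run."""
    if total_lines <= 0:
        raise ValueError("the list file must contain at least one line")
    path = Path(state_path)
    try:
        current = int(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        current = 0  # first run, or unreadable state: start at the first line
    current %= total_lines  # guards against the list file having shrunk
    # Assumption: after the last line, start over from the first.
    path.write_text(str((current + 1) % total_lines))
    return current
```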
In [login to view URL] it stores the title and description of the entries again, but only if they are unique and not already in [login to view URL].
Again, if there are any unique entries, it sends me an e-mail about those entries. Entries that are already in [login to view URL] are not entered again, and I do not receive an e-mail notification about those entries.
I would like to accomplish this without the use of a database.
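To illustrate that the uniqueness check does not need a database, here is a minimal sketch that keeps the seen entries in a plain text file, one JSON-encoded title/description pair per line. The function name `filter_new_entries` and the one-pair-per-line storage format are assumptions, not requirements.

```python
import json
from pathlib import Path


def filter_new_entries(entries, store_path):
    """Return entries not yet in the store file, and append them to it."""
    path = Path(store_path)
    seen = set()
    if path.exists():
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:
                    seen.add(tuple(json.loads(line)))
    fresh = []
    with path.open("a", encoding="utf-8") as fh:
        for entry in entries:
            # An entry counts as a duplicate if title and description both match.
            key = (entry["title"], entry["description"])
            if key not in seen:
                seen.add(key)
                fresh.append(entry)
                fh.write(json.dumps(key) + "\n")
    return fresh
```

Only the entries returned by this function would go into the e-mail, so a run that finds nothing new would send nothing.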