Java application that goes through a set of URLs, retrieves HTML and PDF content, analyses it and stores it.
€500-1100 EUR
In progress
Posted: almost 9 years ago
Paid on delivery
I need a Java application developed that will be run from a Tomcat web application named Openbravo. Experience with Openbravo ERP is not required: I will provide a preconfigured Ubuntu development environment in a VirtualBox virtual machine with everything (Eclipse, pgAdmin, ...) already set up, so the developer only needs to use that virtual machine to develop, debug and run the code, plus a browser to access the Openbravo ERP instance.
This Java application will run regularly, visit several websites, retrieve their content (sometimes in HTML format, sometimes in PDF format) and store the retrieved content locally. Once the content is retrieved, one of these two things must be done:
- Compare the retrieved content with the content obtained from the same URL during the previous retrieval and highlight any change, i.e. report that a change has been detected on the website.
- Go through the content, save it in the database, and look for certain keywords in the retrieved text.
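To make the two processing options concrete, here is a minimal sketch of how they might be implemented, assuming the raw bytes of each page or PDF have already been downloaded. The class and method names (`ContentProcessor`, `isPdf`, `hasChanged`, `findKeywords`) are hypothetical, and the in-memory map only stands in for the database table that would hold the previous fingerprint of each URL. Change detection compares SHA-256 hashes of the raw content; keyword search is a simple case-insensitive substring match on text that, for PDFs, would first have to be extracted with a library such as Apache PDFBox (not shown).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; requires Java 17+ for HexFormat.
final class ContentProcessor {
    // Stand-in (assumption) for the database table storing the last hash per URL.
    private final Map<String, String> previousHashes = new HashMap<>();

    /** Distinguishes PDF from HTML by the "%PDF-" signature at the start of the content. */
    static boolean isPdf(byte[] content) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        return content.length >= magic.length
                && Arrays.equals(content, 0, magic.length, magic, 0, magic.length);
    }

    /** Returns the lowercase hex SHA-256 fingerprint of the raw bytes. */
    static String sha256(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(md.digest(content));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /**
     * Option 1: records the new fingerprint and returns true if it differs from the
     * previous retrieval of the same URL. The first retrieval returns false,
     * since there is nothing to compare against yet.
     */
    boolean hasChanged(String url, byte[] content) {
        String hash = sha256(content);
        String previous = previousHashes.put(url, hash);
        return previous != null && !previous.equals(hash);
    }

    /** Option 2: returns the keywords (matched case-insensitively) found in the text. */
    static List<String> findKeywords(String text, List<String> keywords) {
        String haystack = text.toLowerCase(Locale.ROOT);
        List<String> found = new ArrayList<>();
        for (String keyword : keywords) {
            if (haystack.contains(keyword.toLowerCase(Locale.ROOT))) {
                found.add(keyword);
            }
        }
        return found;
    }
}
```

Hashing the raw bytes flags any difference at all, including timestamps or session tokens embedded in HTML; if that produces false alarms, the HTML could be normalized (for example, reduced to its visible text) before hashing.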
The list of websites to be visited will be obtained in two different ways:
- A list of static URLs
- A list of URLs that must be computed according to certain logic.
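The actual rules for computing URLs are only described in the attached documents, so the following is purely an illustration of how the two sources might be merged into a single crawl list. It assumes, hypothetically, that computed URLs come from a template containing a `{date}` placeholder that is filled in for each of the last few days; the class name `UrlListBuilder`, the placeholder and the `yyyyMMdd` format are all inventions for this sketch.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: the real URL-computation logic is defined elsewhere.
final class UrlListBuilder {
    // Assumed date format for the placeholder.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    /**
     * Returns the static URLs followed by one computed URL per day, from today
     * back through the previous (days - 1) days. Duplicates are dropped while
     * preserving first-seen order.
     */
    static List<String> build(List<String> staticUrls, String template, LocalDate today, int days) {
        Set<String> urls = new LinkedHashSet<>(staticUrls);
        for (int i = 0; i < days; i++) {
            String day = today.minusDays(i).format(DAY);
            urls.add(template.replace("{date}", day));
        }
        return new ArrayList<>(urls);
    }
}
```

Keeping both sources behind one method means the crawler itself only ever iterates over a plain list, so the computation rules can change without touching the retrieval code.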
This is explained in depth in the attached documents.
The project can be easily divided into several