Delphi HTML parser
$495-500 USD
Paid on delivery
The goal of this project is to build an "intelligent" HTML parser that extracts data from HTML pages.
The parser should automatically extract fields such as:
companyName, address, email, fax, tel, website
The parser must be able to extract these fields N times per page, since the HTML pages will contain tabular data (N records per page).
[url removed, login to view]();
while ([url removed, login to view]()) do begin;
data:=[url removed, login to view]();
// data should be an object or type like
// [url removed, login to view], [url removed, login to view], [url removed, login to view], [url removed, login to view], [url removed, login to view], [url removed, login to view]
end;
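The identifiers in the loop above were stripped by the site, so the following is only a sketch of the kind of interface it seems to describe: a cursor-style parser that returns one record per result. The names TContactData, TContactParser, Add, First, Next and Current are hypothetical, and the record is filled by hand here instead of by real parsing. It avoids generics and other newer language features so that it should stay within Delphi 6.

```pascal
program ContactParserSketch;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

type
  // Hypothetical record holding the six fields named in the brief.
  TContactData = record
    CompanyName, Address, Email, Fax, Tel, Website: string;
  end;

  // Hypothetical cursor-style parser: First rewinds, Next advances,
  // Current returns the record under the cursor.
  TContactParser = class
  private
    FItems: array of TContactData;
    FIndex: Integer;
  public
    procedure Add(const Item: TContactData);
    procedure First;
    function Next: Boolean;
    function Current: TContactData;
  end;

procedure TContactParser.Add(const Item: TContactData);
begin
  SetLength(FItems, Length(FItems) + 1);
  FItems[High(FItems)] := Item;
end;

procedure TContactParser.First;
begin
  FIndex := -1; // positioned before the first record
end;

function TContactParser.Next: Boolean;
begin
  Inc(FIndex);
  Result := FIndex < Length(FItems);
end;

function TContactParser.Current: TContactData;
begin
  Result := FItems[FIndex];
end;

var
  Parser: TContactParser;
  Item: TContactData;
begin
  Parser := TContactParser.Create;
  try
    // Stand-in for real parsing: one hand-built record.
    Item.CompanyName := 'Example Ltd';
    Item.Email := 'info@example.com';
    Parser.Add(Item);

    Parser.First;
    while Parser.Next do
    begin
      Item := Parser.Current;
      WriteLn(Item.CompanyName, ' <', Item.Email, '>');
    end;
  finally
    Parser.Free;
  end;
end.
```

A record is used instead of a class for the result so the caller has nothing to free. A real implementation would fill FItems from the parsed page.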
I think a good knowledge of the DOM and of regular expressions is necessary.
Of course it will not work on ALL websites, but it should be universal enough.
It should work with data from:
[url removed, login to view]
[url removed, login to view]
[url removed, login to view]
[url removed, login to view]
etc.
I think a good strategy would be:
1) Find the repetitive fragment in the DOM (when a page contains 20 results, it should extract 20 HTML blocks).
2) Apply a parser to each block that contains data to be extracted.
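The two steps above can be sketched roughly as follows. This is a simplified illustration, not a full solution. Step 1 assumes the repeating tag is already known and is passed in (here '<tr'), whereas a real parser would have to detect it from the DOM. Step 2 shows only the email field. SplitBlocks, IsEmailChar and ExtractEmail are hypothetical helper names, and the sample HTML is invented. As far as I know Delphi 6 does not ship a regular-expression unit, so the sketch uses plain string scanning. A third-party regex library could replace it.

```pascal
program BlockExtractSketch;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils, Classes;

// Step 1 (simplified): cut the page into blocks, one per occurrence of
// an opening tag prefix such as '<tr'. Finding which tag repeats is not
// shown. The prefix match is naive ('<tr' would also match '<track').
procedure SplitBlocks(const Html, OpenTag: string; Blocks: TStrings);
var
  Lower, Tag: string;
  Start, I: Integer;
begin
  Blocks.Clear;
  Lower := LowerCase(Html);
  Tag := LowerCase(OpenTag);
  Start := 0;
  I := 1;
  while I <= Length(Lower) - Length(Tag) + 1 do
  begin
    if Copy(Lower, I, Length(Tag)) = Tag then
    begin
      if Start > 0 then
        Blocks.Add(Copy(Html, Start, I - Start));
      Start := I;
      Inc(I, Length(Tag));
    end
    else
      Inc(I);
  end;
  // The last block runs to the end of the page.
  if Start > 0 then
    Blocks.Add(Copy(Html, Start, Length(Html) - Start + 1));
end;

function IsEmailChar(C: Char): Boolean;
begin
  Result := C in ['A'..'Z', 'a'..'z', '0'..'9', '.', '_', '-', '+'];
end;

// Step 2 (email only): grow outwards from the first '@' in the block
// while the neighbouring characters look like part of an address.
function ExtractEmail(const Block: string): string;
var
  AtPos, L, R: Integer;
begin
  Result := '';
  AtPos := Pos('@', Block);
  if AtPos = 0 then
    Exit;
  L := AtPos;
  while (L > 1) and IsEmailChar(Block[L - 1]) do
    Dec(L);
  R := AtPos;
  while (R < Length(Block)) and IsEmailChar(Block[R + 1]) do
    Inc(R);
  // Require at least one character on each side of '@'.
  if (L < AtPos) and (R > AtPos) then
    Result := Copy(Block, L, R - L + 1);
end;

var
  Blocks: TStringList;
  I: Integer;
begin
  Blocks := TStringList.Create;
  try
    // Invented sample page with two result rows.
    SplitBlocks('<table><tr><td>Acme</td><td>info@acme.example</td></tr>' +
                '<tr><td>Beta</td><td>sales@beta.example</td></tr></table>',
                '<tr', Blocks);
    for I := 0 to Blocks.Count - 1 do
      WriteLn(ExtractEmail(Blocks[I]));
  finally
    Blocks.Free;
  end;
end.
```

Splitting first and extracting second keeps each field extractor small, since it only ever sees one record's HTML. The other fields (fax, tel, website and so on) would each need their own heuristic of the same shape.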
The code must be Delphi 6 compatible.
Project ID: #3451768