I have a highly structured Word document that I would like to parse into Excel so that we have actual data. The document lists bar locations, grouped by country and genre.
The elements in the document should be easy to define. There are some higher-level groupings that I need your advice on, but the listings themselves look like this:
Name (optional call sign)
address
email address
website
description
Sometimes the order is a little different, but I think indicators such as http, @, or a state/ZIP code will let us easily infer what should go where.
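To illustrate the kind of rule I mean, here is a rough sketch in Python (the final code would ideally be C#, and the same regular expressions should carry over). It assumes one field per line, a US-style "ST 12345" state/ZIP pattern, and that the first unclassified line is the name. The function names and the sample listing are made up for illustration and are not taken from the real document.

```python
import re

# Indicators: a leading http/www means website, an @ means email,
# and a two-letter state followed by a 5-digit ZIP means address.
URL_RE = re.compile(r"^(https?://|www\.)", re.IGNORECASE)
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
STATE_ZIP_RE = re.compile(r"\b[A-Z]{2}\s+\d{5}(-\d{4})?\b")  # assumed US format


def classify_line(line):
    """Return 'website', 'email', 'address', or 'other' for one line."""
    text = line.strip()
    if URL_RE.match(text):
        return "website"
    if EMAIL_RE.match(text):
        return "email"
    if STATE_ZIP_RE.search(text):
        return "address"
    return "other"


def parse_listing(lines):
    """Build one record from the lines of a listing, in any order."""
    record = {"name": "", "address": "", "email": "",
              "website": "", "description": ""}
    for line in lines:
        text = line.strip()
        if not text:
            continue
        kind = classify_line(text)
        if kind != "other":
            record[kind] = text
        elif not record["name"]:
            # Assumption: the first unclassified line is the bar name.
            record["name"] = text
        else:
            # Everything else is treated as description text.
            record["description"] = (record["description"] + " " + text).strip()
    return record


# Hypothetical listing with the website and address out of order.
sample = [
    "The Blue Anchor (BA1)",
    "http://www.blueanchor.example",
    "12 Harbor St, Portland, ME 04101",
    "info@blueanchor.example",
    "Live jazz every weekend.",
]
record = parse_listing(sample)
```

A street address split over two lines would not be caught by this sketch, since only the line with the state/ZIP is recognized. That is the kind of case I would expect the sample pages to expose.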
The document is about 1000 pages, so we need something that can handle a large volume of content. The document is in Word now, but I think it would be easy to paste it into HTML and build rules around the markup rather than interacting with Word directly.
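As a rough idea of the HTML route, the sketch below (again Python for illustration only) collects the text of each paragraph with the standard-library HTMLParser and splits listings on empty paragraphs. It assumes every field ends up in its own <p> element and that listings are separated by a blank paragraph. The real export may be marked up differently, so treat the class and function names as placeholders.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the plain text of every <p> element, in document order."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None means we are outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # strip() also removes the non-breaking space Word uses
            # for visually empty paragraphs.
            self.paragraphs.append("".join(self._buffer).strip())
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def split_listings(paragraphs):
    """Group consecutive non-empty paragraphs into listings."""
    listings, current = [], []
    for text in paragraphs:
        if text:
            current.append(text)
        elif current:
            listings.append(current)
            current = []
    if current:
        listings.append(current)
    return listings


# Hypothetical fragment of the pasted HTML.
html = ("<p><b>The Blue Anchor</b> (BA1)</p>"
        "<p>info@blueanchor.example</p>"
        "<p>&nbsp;</p>"
        "<p>Red Lion</p>")
collector = ParagraphCollector()
collector.feed(html)
collector.close()
listings = split_listings(collector.paragraphs)
```

Since feed() can be called repeatedly, a file of this size could be read and parsed in chunks instead of being loaded all at once.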
Let me know if you are interested, and I will send you a few pages of the document so you can bid.
I want your code when you are done so that I can run it a few times. Ideally this is C#.