Extract text from PDF files
$100-500 USD
Paid on delivery
Create C++/MFC classes with methods that extract all text in a PDF file to a list. You will need to learn the PDF file format and then write the code needed to read through the file and output each line or element of text (including its font and x, y coordinates on the page) to a list. The list should be an MFC container class, CList, holding pointers to the text objects found in the PDF file. Each text object will have members containing the actual text string, the x, y coordinates, the font ID, and, if possible, the page number. See PDFtext.h for a proposed definition of this class. There should be at least one other class with methods to specify the path name of the PDF file and to supply a pointer to the CList container described above. Sample PDF files are attached for testing, but your code should work with any PDF file that can be opened with Adobe Reader 7.
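To make the requested output concrete, here is a minimal sketch of the kind of text object and list described above. It is illustrative only: `PdfTextElement`, `PdfTextList`, and `ParseContentStream` are hypothetical names (the actual proposed definition is in PDFtext.h, which is not reproduced here), and `std::list` of `std::unique_ptr` stands in for an MFC `CList` of pointers so the sketch compiles without MFC. The parser assumes an already-decompressed page content stream and handles only the `BT`, `ET`, `Tf`, `Td`, and `Tj` operators; real PDF files also need stream decompression, transformation matrices, font encodings, and the other text operators.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <list>
#include <memory>
#include <string>
#include <vector>

// Hypothetical text object: one entry per string shown on a page.
struct PdfTextElement {
    std::string text;    // the extracted text string
    double x = 0.0;      // x coordinate on the page (text space units)
    double y = 0.0;      // y coordinate on the page (text space units)
    std::string fontId;  // font resource name, e.g. "F1"
    int page = 0;        // page number supplied by the caller
};

// Stand-in for an MFC CList of pointers (assumption: used so this compiles without MFC).
using PdfTextList = std::list<std::unique_ptr<PdfTextElement>>;

// Reads a PDF literal string starting at s[i] == '(' and advances i past the closing ')'.
// Simplification: a backslash keeps the next character as-is; \n and octal escapes are not decoded.
static std::string ReadLiteralString(const std::string& s, size_t& i) {
    assert(i < s.size() && s[i] == '(');
    std::string out;
    int depth = 0;
    for (; i < s.size(); ++i) {
        char c = s[i];
        if (c == '\\' && i + 1 < s.size()) { out += s[++i]; continue; }
        if (c == '(' && depth++ == 0) continue;        // skip the outermost '('
        if (c == ')' && --depth == 0) { ++i; break; }  // end of string
        out += c;                                      // includes balanced nested parens
    }
    return out;
}

// Scans an uncompressed content stream and appends one element per Tj operator.
void ParseContentStream(const std::string& stream, int page, PdfTextList& out) {
    std::vector<std::string> operands;  // operands seen since the last operator
    std::string font;
    double x = 0.0, y = 0.0;
    size_t i = 0;
    while (i < stream.size()) {
        unsigned char c = static_cast<unsigned char>(stream[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (c == '(') { operands.push_back(ReadLiteralString(stream, i)); continue; }
        size_t start = i;
        while (i < stream.size() && stream[i] != '(' &&
               !std::isspace(static_cast<unsigned char>(stream[i]))) ++i;
        std::string tok = stream.substr(start, i - start);
        if (tok == "BT") {                               // begin text: reset position
            x = y = 0.0;
            operands.clear();
        } else if (tok == "Tf" && operands.size() >= 2) { // "/F1 12 Tf": select font
            font = operands[operands.size() - 2];
            if (!font.empty() && font[0] == '/') font.erase(0, 1);
            operands.clear();
        } else if (tok == "Td" && operands.size() >= 2) { // "tx ty Td": move by offset
            x += std::strtod(operands[operands.size() - 2].c_str(), nullptr);
            y += std::strtod(operands[operands.size() - 1].c_str(), nullptr);
            operands.clear();
        } else if (tok == "Tj" && !operands.empty()) {    // "(text) Tj": show string
            std::unique_ptr<PdfTextElement> e(new PdfTextElement);
            e->text = operands.back();
            e->x = x;
            e->y = y;
            e->fontId = font;
            e->page = page;
            out.push_back(std::move(e));
            operands.clear();
        } else if (tok == "ET") {
            operands.clear();
        } else {
            operands.push_back(tok);                      // number, name, or unhandled token
        }
    }
}
```

Calling `ParseContentStream` with the stream `BT /F1 12 Tf 72 720 Td (Hello) Tj ET` and page 1 appends a single element with text `Hello`, font ID `F1`, and coordinates (72, 720). In an MFC build, the same fields would live in the class proposed in PDFtext.h, and the elements would be added to the caller's CList through its `AddTail` method.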
## Deliverables
1) Complete source code of all work done, using only C++. The code must compile with MS Visual C++ on a Windows platform. Use of MFC is preferred for the container classes, but not required. Use of third-party libraries is not acceptable unless we can own all rights to the source code (non-exclusively) and distribute it in executable form without any obligations to any third parties. The code must be well structured and heavily commented so that anyone proficient with C++ and VC++ can easily modify it without having to spend a lot of time studying it.
2) Coder will supply a simple test app, with source code and project files, that we can compile and run with VC++ and that demonstrates the interface to all the classes.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
Windows 2000, XP
Project ID: #3595068