A library for Windows to extract the plaintext of several file formats

In Progress Posted Sep 17, 2015 Paid on delivery

I need a library, in both .dll and .lib form, that extracts the plaintext from the file formats listed below:

• Microsoft Word Files (.doc, .docx)

• Microsoft Excel Files (.xls, .xlsx)

• Microsoft Access (.mdb, .accdb)

• Adobe Acrobat Files (.pdf)

• IBM Lotus Notes database file (.nsf)
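Before it can choose a parser, the library has to decide which of these formats a given path holds. The sketch below assumes dispatch by file extension; the names `IsSupportedFile` and `kSupportedExtensions` are hypothetical, and a more robust implementation might also sniff the file contents (a .doc is an OLE compound file while a .docx is a ZIP package).

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical table mirroring the formats listed above. */
static const wchar_t *const kSupportedExtensions[] = {
    L".doc", L".docx", L".xls", L".xlsx",
    L".mdb", L".accdb", L".pdf", L".nsf"
};

/* Case-insensitive wide-string comparison; Windows file names are
   case-insensitive, so "REPORT.DOCX" must match ".docx". */
static int EqualsIgnoreCase(const wchar_t *a, const wchar_t *b)
{
    assert(a != NULL && b != NULL);
    while (*a != L'\0' && *b != L'\0') {
        if (towlower((wint_t)*a) != towlower((wint_t)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Hypothetical helper: returns 1 if the path's extension is supported. */
int IsSupportedFile(const wchar_t *path)
{
    const wchar_t *dot;
    size_t i;

    if (path == NULL)
        return 0;
    dot = wcsrchr(path, L'.');
    if (dot == NULL)
        return 0;
    /* A dot in a directory name (e.g. "v1.2\readme") is not an extension. */
    if (wcschr(dot, L'\\') != NULL || wcschr(dot, L'/') != NULL)
        return 0;
    for (i = 0; i < sizeof kSupportedExtensions / sizeof kSupportedExtensions[0]; ++i) {
        if (EqualsIgnoreCase(dot, kSupportedExtensions[i]))
            return 1;
    }
    return 0;
}
```

The code sticks to C89-style declarations so it also builds with the Visual Studio 2010 C compiler.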

The library must expose a callable function with the following signature:

BOOL ExtractPlaintextFromFile(PCTSTR FilePath, TextCallback Callback);

The first parameter is a pointer to a Unicode string (PCTSTR, so the project must be built with UNICODE defined) containing the path of the file to extract the plaintext from (e.g. D:\[login to view URL]).

The second parameter is a plaintext-processing callback, described below.

The return value must be TRUE on success and FALSE on error.

The callback function must have the following signature; each parameter is explained below.

typedef BOOL (*TextCallback)(PCTSTR Text, SIZE_T TextLength, PCTSTR SourceFile);

Text: Pointer to a buffer that contains all or part of the extracted plaintext in Unicode. If the file is extracted in chunks, the callback may safely be called again for each new chunk.

TextLength: Length, in characters, of the buffer pointed to by Text.

SourceFile: Pointer to a buffer that contains the path of the originating source file (e.g. D:\[login to view URL]).

Return value: TRUE on success, FALSE on error.
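The two signatures above could be packaged as an exported header roughly as follows. This is a sketch under stated assumptions: `EXTRACT_API`, `EXTRACT_EXPORTS`, `DeliverInChunks` and `CountingCallback` are hypothetical names, the spec does not state a calling convention so the default is kept, and the non-Windows type stand-ins exist only so the sketch compiles elsewhere. `DeliverInChunks` shows one way to honour the chunking rule for Text: each chunk is passed with its length (chunks are not NUL-terminated in this sketch), and extraction stops with FALSE as soon as the callback reports an error.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#ifdef _WIN32
#  ifndef UNICODE
#    define UNICODE          /* makes PCTSTR a wide (UTF-16) string */
#  endif
#  ifndef _UNICODE
#    define _UNICODE
#  endif
#  include <windows.h>
#else
/* Stand-ins so the sketch also compiles off Windows (assumption). */
typedef int BOOL;
typedef const wchar_t *PCTSTR;
typedef size_t SIZE_T;
#  define TRUE  1
#  define FALSE 0
#endif

/* Hypothetical export macro: the DLL project would define EXTRACT_EXPORTS. */
#if defined(_WIN32) && defined(EXTRACT_EXPORTS)
#  define EXTRACT_API __declspec(dllexport)
#else
#  define EXTRACT_API
#endif

#ifdef __cplusplus
extern "C" {   /* avoid C++ name mangling of the exported function */
#endif

typedef BOOL (*TextCallback)(PCTSTR Text, SIZE_T TextLength, PCTSTR SourceFile);

/* Declared only; the real library would define it. */
EXTRACT_API BOOL ExtractPlaintextFromFile(PCTSTR FilePath, TextCallback Callback);

#ifdef __cplusplus
}
#endif

/* Hypothetical helper: hands `text` to the callback in pieces of at most
   `chunkChars` characters. A real implementation should also avoid
   splitting a UTF-16 surrogate pair across two chunks. */
BOOL DeliverInChunks(PCTSTR text, SIZE_T length, SIZE_T chunkChars,
                     PCTSTR sourceFile, TextCallback callback)
{
    SIZE_T offset = 0;

    assert(callback != NULL && chunkChars > 0);
    while (offset < length) {
        SIZE_T n = length - offset;
        if (n > chunkChars)
            n = chunkChars;
        if (!callback(text + offset, n, sourceFile))
            return FALSE;    /* propagate the callback's error */
        offset += n;
    }
    return TRUE;
}

/* Example callback: counts the calls and characters it receives. */
static SIZE_T g_calls = 0, g_chars = 0;

static BOOL CountingCallback(PCTSTR Text, SIZE_T TextLength, PCTSTR SourceFile)
{
    (void)Text;
    (void)SourceFile;
    ++g_calls;
    g_chars += TextLength;
    return TRUE;
}
```

With a chunk size of 5, the 12-character text "Hello, world" reaches `CountingCallback` in three calls (5, 5 and 2 characters).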

The project must be delivered as one or two .sln files (Visual Studio solution files), at the developer's choice.

If only one .sln file is provided, it must build everything from scratch, up to and including the demo application.

If two .sln files are provided, one must build all of the project's dependencies (external libraries and such) and the other must build the main library and the demo application.

The demo must be a simple application that calls the extraction function on the provided sample file for each of the supported file formats.

The demo's callback function must simply save the extracted contents to a file named after the source file with the .txt extension appended (e.g. D:\[login to view URL]).

The goal of the demo is to extract all the plaintext from all the included sample files.
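A minimal sketch of such a demo callback follows. `SaveToTxtCallback` and `OpenForAppend` are hypothetical names, and the output encoding is an assumption since the spec does not fix one: the raw wchar_t units are written after a byte-order mark, which yields UTF-16LE on Windows. Because the callback may run once per chunk, it appends and writes the byte-order mark only while the file is still empty. The non-Windows type stand-ins are there only so the sketch compiles off Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

#ifdef _WIN32
#  ifndef UNICODE
#    define UNICODE
#  endif
#  ifndef _UNICODE
#    define _UNICODE
#  endif
#  include <windows.h>
#else
/* Stand-ins so the sketch also compiles off Windows (assumption). */
typedef int BOOL;
typedef const wchar_t *PCTSTR;
typedef size_t SIZE_T;
#  define TRUE  1
#  define FALSE 0
#endif

/* Opens a wide-character path for binary appending. */
static FILE *OpenForAppend(PCTSTR path)
{
#ifdef _WIN32
    assert(path != NULL);
    return _wfopen(path, L"ab");
#else
    char narrow[4096];
    size_t n;

    assert(path != NULL);
    n = wcstombs(narrow, path, sizeof narrow);  /* locale-dependent */
    if (n == (size_t)-1 || n == sizeof narrow)
        return NULL;
    return fopen(narrow, "ab");
#endif
}

/* Hypothetical demo callback: appends each chunk to "<SourceFile>.txt". */
BOOL SaveToTxtCallback(PCTSTR Text, SIZE_T TextLength, PCTSTR SourceFile)
{
    static const wchar_t bom = 0xFEFF;
    wchar_t *outPath;
    FILE *f;
    BOOL ok = TRUE;

    if (Text == NULL || SourceFile == NULL)
        return FALSE;

    /* "report.pdf" becomes "report.pdf.txt" (4 chars for ".txt", 1 for NUL). */
    outPath = (wchar_t *)malloc((wcslen(SourceFile) + 5) * sizeof(wchar_t));
    if (outPath == NULL)
        return FALSE;
    wcscpy(outPath, SourceFile);
    wcscat(outPath, L".txt");
    f = OpenForAppend(outPath);
    free(outPath);
    if (f == NULL)
        return FALSE;

    /* Write the byte-order mark only while the file is still empty. */
    if (fseek(f, 0, SEEK_END) != 0)
        ok = FALSE;
    else if (ftell(f) == 0 && fwrite(&bom, sizeof bom, 1, f) != 1)
        ok = FALSE;

    /* Raw wchar_t units: UTF-16LE on Windows, UTF-32 with the stand-ins. */
    if (ok && TextLength > 0 &&
        fwrite(Text, sizeof(wchar_t), TextLength, f) != TextLength)
        ok = FALSE;
    if (fclose(f) != 0)
        ok = FALSE;
    return ok;
}
```

Since the file is opened in append mode, the demo's main() (not shown) would delete any stale .txt output from a previous run before calling ExtractPlaintextFromFile with SaveToTxtCallback for each sample file.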

The included sample files were uploaded as a multipart RAR archive due to upload file size limitations.

As expected, converting formats with special layout, such as PDF, to plaintext can lose text positioning or formatting. This is not a problem for my requirements: as long as all available text in the document is extracted, superfluous whitespace is acceptable.

Additionally, the library must meet the following technical specifications:

• It must be written in C or C++ (avoid C++0x/C++11 features).

• It must run on any version of Windows from Windows XP SP1 to the latest version (Windows XP SP1 to SP3, Windows Vista Retail to SP2, Windows 7 Retail and SP1, Windows 8, Windows 8.1 and Windows 10).

• The library must be self-contained. This means that it must not depend on any external libraries, installed programs, DLLs or frameworks that are not included in a clean installation of Windows XP SP1 (that is, an installation of Windows XP with SP1 and no extra programs or system updates installed).

• It must not have any graphical interface, play any sound or generate any kind of alert to the user.

• You must deliver all the source code that generates the final library; no precompiled libraries will be accepted.

• You must document all the external libraries used by the library, including the version used, direct download link and detailed notes about any changes to the original source code of such libraries.

• The final binaries should be built with Visual Studio 2010 or higher, with the Runtime Library option set to Multi-threaded (/MT).

If you are interested in the job, please answer this request with the following information:

• Estimated time of development.

• Your favorite pet animal.

Your proposal will be subject to approval.

C Programming, C++ Programming, Windows API, Windows Desktop

Project ID: #8455585

About the project

9 proposals Remote project Active Sep 23, 2015

9 freelancers are bidding on average $596 for this job

Yknox

Hello, I'm very interested in your project. I'm a good C++/C#, Java, math and algorithm expert. I understand your requirements exactly and am quite experienced in these jobs. Let's go ahead with me; I want to service…

$750 USD in 8 days (614 Reviews) 8.8

szymszteinsl

Ready to work!…

$500 USD in 5 days (93 Reviews) 7.3

samitXI

A proposal has not yet been provided

$750 USD in 10 days (94 Reviews) 6.8

hbxfnzwpf

I am very proficient in C and C++. I have 15 years of C++ development experience and have worked professionally for 5 years. My work is online game development, mainly on the server side, in C++ under Windows. I use…

$400 USD in 7 days (129 Reviews) 6.9

mahershahmeer

Hi there! I just read the proposal; it is pretty detailed. I have one offer to make: if it interests you, please contact me for a demo. I already made a simple command-line application…

$250 USD in 10 days (96 Reviews) 6.0

SUog

Hello, I have a partial solution for your problem. Some time ago I developed a C++ DLL that extracts text from *.doc files (not *.docx). The library is self-contained, works on raw .doc files and doesn't require MS W…

$250 USD in 10 days (94 Reviews) 5.9

zuiguanglong

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

$736 USD in 10 days (21 Reviews) 5.0

ddarz4u

A proposal has not yet been provided

$555 USD in 10 days (0 Reviews) 0.0