Compare Text Files for Similarity

Completed; posted Jul 17, 2003; paid on delivery

This should be pretty easy. I need an app that looks at a folder full of my .eml (Outlook Express email) files and compares them, looking for matching word strings. Here's how it works:

First, each .eml file is parsed so that all that's left is the subject line and the text written by the sender. That means all the header information other than the subject line is ignored, as are all the lines that may be replies or earlier email contents (in other words, every line beginning with ">").

Now comes the comparing. Find all the 10-word strings (if any) that occur in the body of more than one email. Then find all the 9-word strings that occur in the body of more than one email (if any), then all the 8-word strings, and so on, until you are finding single words that appear in multiple emails. (That means appearing in multiple emails, not multiple times in the same email!) Count the number of emails that each string appears in, and then calculate the percentage of emails it appears in.

Present this data in a simple text file. Here's an example of what it might look like:

Total Emails in Folder = 164

Message Body Comparison:

| String | Appearances | Percentage |
|---|---|---|
| "i would love to" | 7/164 | 4.3% |
| "i would like to" | 9/164 | 5.5% |
| "i would like" | 14/164 | 8.5% |
| "i would love" | 12/164 | 7.3% |
| "i would" | 27/164 | 16.5% |
| "to" | 33/164 | 20.1% |
| "would" | 41/164 | 25.0% |
| "i" | 113/164 | 68.9% |

[Then, I want to do the same thing for the subject lines. Like this...]

Subject Line Comparison:

| String | Appearances | Percentage |
|---|---|---|
| "help please" | 14/164 | 8.5% |
| "help" | 22/164 | 13.4% |
| "please" | 77/164 | 47.0% |

That's it. Thanks for bidding, and if you have any questions, please let me know!
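The comparison described above can be sketched in Python using only the standard library. This is a minimal, hypothetical sketch, not the awarded solution: it assumes Python 3.9+, reads only the plain-text part of each message, treats a "word" as a run of letters, digits, and apostrophes (after lowercasing), and drops a body line only when it literally starts with ">". The function names (`extract_parts`, `tokenize`, `shared_ngrams`, `format_report`) and the column widths are illustrative choices, not part of the posting.

```python
"""Sketch: count word strings (n-grams) shared across .eml files."""
import email
import email.policy
import re
import sys
from collections import Counter
from pathlib import Path

MAX_N = 10  # longest word string to look for, per the posting
WORD_RE = re.compile(r"[a-z0-9']+")  # assumption: what counts as a "word"


def extract_parts(raw: bytes) -> tuple[str, str]:
    """Return (subject, body): other headers dropped, '>'-quoted lines removed."""
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    subject = str(msg.get("subject", ""))
    part = msg.get_body(preferencelist=("plain",))  # assumption: plain text only
    body = part.get_content() if part is not None else ""
    kept = [line for line in body.splitlines() if not line.startswith(">")]
    return subject, "\n".join(kept)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words."""
    return WORD_RE.findall(text.lower())


def shared_ngrams(docs: list[list[str]], max_n: int = MAX_N) -> Counter:
    """Map each 1..max_n word string to the number of documents containing it."""
    doc_counts: Counter = Counter()
    for words in docs:
        seen = set()  # a set, so repeats inside one email count only once
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                seen.add(tuple(words[i:i + n]))
        doc_counts.update(seen)
    # Keep only strings that appear in more than one email.
    return Counter({g: c for g, c in doc_counts.items() if c > 1})


def format_report(title: str, counts: Counter, total: int) -> str:
    """Render one comparison section as fixed-width text."""
    lines = [f"{title}:", f"{'String':<40}{'Appearances':>12}{'Percentage':>12}", "-" * 64]
    # Longest strings first, then the most widespread ones.
    for gram, c in sorted(counts.items(), key=lambda kv: (-len(kv[0]), -kv[1], kv[0])):
        quoted = '"' + " ".join(gram) + '"'
        appearances = f"{c}/{total}"
        pct = f"{100 * c / total:.1f}%"
        lines.append(f"{quoted:<40}{appearances:>12}{pct:>12}")
    return "\n".join(lines)


def main(folder: str) -> None:
    parts = [extract_parts(p.read_bytes()) for p in sorted(Path(folder).glob("*.eml"))]
    total = len(parts)
    bodies = [tokenize(body) for _, body in parts]
    subjects = [tokenize(subj) for subj, _ in parts]
    print(f"Total Emails in Folder = {total}\n")
    print(format_report("Message Body Comparison", shared_ngrams(bodies), total))
    print()
    print(format_report("Subject Line Comparison", shared_ngrams(subjects), total))


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Collecting each email's word strings into a set before counting is what enforces the "not multiple times in the same email" rule: the `Counter` then holds the number of emails per string rather than the number of occurrences. Running it as `python compare_eml.py path\to\folder` (script name hypothetical) and redirecting the output to a file would produce the requested text file.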

## Deliverables

1) Complete and fully functional working program(s) in executable form, as well as complete source code of all work done.
2) Installation package that will install the software (in ready-to-run condition) on the platform(s) specified in this bid request.
3) Complete ownership and distribution copyrights to all work purchased.

## Platform

Windows

.NET, C Programming, C# Programming, Delphi, Engineering, MySQL, PHP, Software Architecture, Software Testing, Visual Basic

Project ID: #2954368

About the project

9 proposals; remote project; active Jul 18, 2003

Awarded to:

rizwanahmedvw

See private message.

$12 USD in 7 days
(10 Reviews)
2.3

9 freelancers are bidding on average $50 for this job

softservicesvw

See private message.

$68 USD in 7 days
(329 Reviews)
7.6
SelbySolutions

See private message.

$85 USD in 7 days
(22 Reviews)
5.4
viraltrivedivw

See private message.

$34 USD in 7 days
(50 Reviews)
5.1
michaeldweber

See private message.

$34 USD in 7 days
(34 Reviews)
4.6
teamvw

See private message.

$21.25 USD in 7 days
(36 Reviews)
3.8
csmbavw

See private message.

$85 USD in 7 days
(2 Reviews)
2.5
pavelgritsay

See private message.

$42.5 USD in 7 days
(0 Reviews)
0.0
clgibson

See private message.

$63.75 USD in 7 days
(0 Reviews)
0.0
kenfraser

See private message.

$42.5 USD in 7 days
(0 Reviews)
0.0