Database cleaning/merging/deduplication & fuzzy matching

Completed Posted Jan 18, 2014 Paid on delivery

I have a database I'm building in Excel that contains records from many different sources: 77k rows and 50+ columns in total.

I would like to condense it to one row per unique address while keeping all the other unique data cells from the merged rows.

This will require some type of fuzzy matching, as the duplicate addresses are not all exact matches, e.g.:

300 Water Street suite #3 | Portland | Oregon

300 Water Street | Portland | Oregon

300 Water St | Portland | Oregon

The above examples would all be the same record. Each row may have different corresponding data in the columns that needs to be condensed into one row.
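The matching described above could be sketched roughly as follows. This is only an illustration in Python, not a proposed final solution: the abbreviation table, the list of unit designators, the 0.85 threshold and the function names are all assumptions, and it handles the street field only (city and state are taken to be separate columns that are already consistent).

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; extend it to match the real data.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd"}

def normalize_address(address: str) -> str:
    """Lowercase, drop unit designators, and abbreviate street types."""
    text = address.lower()
    # Assumption: a suite/unit/apt designator and everything after it is noise.
    text = re.sub(r"\b(suite|ste|unit|apt)\b.*$", "", text)
    # Replace punctuation with spaces so stray symbols do not affect matching.
    text = re.sub(r"[^a-z0-9 ]", " ", text)
    return " ".join(ABBREVIATIONS.get(word, word) for word in text.split())

def same_address(a: str, b: str, threshold: float = 0.85) -> bool:
    """Fuzzy-compare two street addresses after normalization."""
    na, nb = normalize_address(a), normalize_address(b)
    # Numbers must match exactly: "300 Water St" and "301 Water St" differ.
    if re.findall(r"\d+", na) != re.findall(r"\d+", nb):
        return False
    return SequenceMatcher(None, na, nb).ratio() >= threshold

examples = ["300 Water Street suite #3", "300 Water Street", "300 Water St"]
print({normalize_address(a) for a in examples})  # {'300 water st'}
```

Requiring the numbers to match exactly is a deliberate choice here: a pure similarity score would treat neighbouring house numbers on the same street as near-duplicates.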

I have normalized the data as much as I can using my limited Excel skills and PowerGREP. I have made sure the states, cities and abbreviations are all consistent for easier duplicate recognition.

I estimate that there are probably 20k actual unique addresses, which is what this should be condensed to while keeping all the unique cells, making a very rich data set at the end.
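To make "condensed, but keeping all the unique cells" concrete, here is a small Python sketch. It assumes each row already carries a matching key for its address (the column name `address_key` is hypothetical), that rows are plain dictionaries, and that conflicting values in a column should be kept side by side, joined with "; ". The sample values are made up.

```python
from collections import defaultdict

def merge_rows(rows, key_column):
    """Collapse rows sharing a key into one row per key.

    Every other column keeps each distinct non-empty value once,
    in first-seen order, joined with "; ".
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        columns = grouped[row[key_column]]  # registers the key even if all cells are empty
        for column, value in row.items():
            if column == key_column or value in ("", None):
                continue
            if value not in columns[column]:
                columns[column].append(value)
    return [
        {key_column: key, **{c: "; ".join(map(str, v)) for c, v in columns.items()}}
        for key, columns in grouped.items()
    ]

# Made-up sample rows for illustration only.
rows = [
    {"address_key": "300 water st", "phone": "555-0100", "owner": ""},
    {"address_key": "300 water st", "phone": "555-0100", "owner": "A. Smith"},
    {"address_key": "12 oak ave", "phone": "", "owner": "B. Jones"},
]
merged = merge_rows(rows, "address_key")
```

With this sample, the two "300 water st" rows collapse into a single row that has both the phone number and the owner filled in.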

I'm not sure if Excel can handle this type of project. Perhaps you have a better solution using SQL, VBA, Access or some other database manipulation/deduplication tool.
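As one possible shape for an SQL-based route, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a real database server. The table name, the column names and the sample values are assumptions made for illustration; MySQL offers a comparable `GROUP_CONCAT(DISTINCT ...)` aggregate.

```python
import sqlite3

def condense(records):
    """Return one row per address_key with the distinct values of each column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (address_key TEXT, phone TEXT, owner TEXT)")
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", records)
    rows = conn.execute(
        """
        SELECT address_key,
               group_concat(DISTINCT phone) AS phones,
               group_concat(DISTINCT owner) AS owners
        FROM records
        GROUP BY address_key
        ORDER BY address_key
        """
    ).fetchall()
    conn.close()
    return rows

# Made-up sample data; None stands for an empty cell and is skipped by group_concat.
sample = [
    ("300 water st", "555-0100", None),
    ("300 water st", "555-0100", "A. Smith"),
    ("12 oak ave", None, "B. Jones"),
]
for row in condense(sample):
    print(row)
```

This only covers the condensing step for keys that are already identical; the fuzzy part (deciding which addresses share a key) would still have to happen before the `GROUP BY`.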

Let me know via PM how you would best tackle this.

Big Data Sales, Database Administration, Excel, Microsoft Access, MySQL

Project ID: #5335522

About the project

18 proposals Remote project Active Feb 8, 2014

Awarded to:

MDavidCrompton

I have been developing applications in both Access and Excel for 20 years with extensive use of VBA. I have developed several applications for Freelance clients; please see Feedback and examples in Portfolio. I am UK b …

$110 USD in 3 days
(39 Reviews)
5.6

18 freelancers are bidding on average $139 for this job

paris2785

VB, VBA and Databases expert for over a decade. Master in Information Systems. I have delivered similar projects in the past. Please check https://www.freelancer.gr/projects/Data-Processing-Excel/data-translation-e …

$78 USD in 5 days
(152 Reviews)
6.8
tzo

Hello, I can help you with this. Common tools are not really the best way to do it, so it needs some custom scripts written specifically for this project.

$147 USD in 3 days
(146 Reviews)
6.3
truongngocthanh

Dear Sir, I can import all your data into a MySQL database, process it and filter out the duplicate data. I can do it right now for you. Best regards, Thanh.

$111 USD in 2 days
(57 Reviews)
6.3
diamond247

Hello Sir, we are a well-established setup with highly skilled operators who have a lot of experience in this segment/skill and have completed more than 200 similar jobs. I have gone through your project description; it's really a …

$250 USD in 5 days
(150 Reviews)
6.6
srinichal

I would like to discuss more details about the project and deliver the tools relevant to your needs.

$252 USD in 3 days
(38 Reviews)
6.3
vikas0903

Hi. Approach regarding your project: I believe it can be done in Excel. We may have to run the data-matching code multiple times with slight variations in keywords. My background: I have worked as pricing analy …

$98 USD in 2 days
(38 Reviews)
5.7
teeares

I have done this type of work before, though maybe not at this scale. I have easily handled up to 16,000 records and 15 columns. If Excel can handle it, surely I can. I understand that duplication must be recognised only by the …

$100 USD in 10 days
(50 Reviews)
5.0
Venicebrooks

Greetings, I have taken note of your request to clean a database in your possession. I can do that for you: being a software developer, I can write a module to achieve your objective. I have been doing this kind of …

$100 USD in 3 days
(8 Reviews)
5.0
happycharle

A proposal has not yet been provided

$111 USD in 3 days
(10 Reviews)
3.5
vovo4ka

Hello, I'm very interested in this work, since I have good experience and knowledge of working with big databases. There are a few possible ways to merge and sort data: in the example you showed it might be possible to use …

$66 USD in 3 days
(3 Reviews)
3.2
TechJSolutions

Dear Palmweb, let me help you. I will use SQL, which will make querying easier. Could you send the whole data set? I will send a sample of the result. Thanks.

$56 USD in 2 days
(15 Reviews)
3.3
EfficientIrish

A proposal has not yet been provided

$277 USD in 1 day
(0 Reviews)
0.0
maranaxsl

Hi there, thank you for placing this project. We belong to the Microsoft Partner Network and we have over 15 years' experience in MS SQL Server, MS Access and Crystal Reports. We provide 30 days of free support on al …

$222 USD in 3 days
(0 Reviews)
1.5
xiddw

Hi, I've previously worked on a similar project matching similar strings and condensing them, so I have experience in this particular task. Also, I have over two years of experience using R. I've strong backgrou …

$100 USD in 4 days
(0 Reviews)
0.0
hi4ppl

A proposal has not yet been provided

$155 USD in 3 days
(0 Reviews)
0.0
qianshen

Hi, last month I did a 3-million-patent case on Hadoop with Pig to collect all citations for each patent over 10 years. I think your case is somewhat similar. I am quite interested in using an Excel file as data sourc …

$111 USD in 3 days
(0 Reviews)
0.0