Software to compare PDF files

  • Status: Closed
  • Prize: $600
  • Entries Received: 7
  • Winner: carlquist

Contest Brief

The goal of this contest is software that compares multiple PDF files based on the similarity of their bounding boxes. This is not an easy contest and requires an understanding of PDF libraries.
There are many PDF libraries available and it is not important which one is used.

Features required:
Upload multiple PDF files (many).
Convert the PDFs to PNGs with the bounding boxes drawn as squares.
Show each PNG with its bounding boxes; the user selects the bounding boxes of interest (multiple boxes can be selected).
The software then searches ALL the original PDFs to find which files have the same bounding boxes.

Matches must be based on either:
1. The approximate coordinates of the bounding boxes and their page number, allowing a 3% error in the placement of the bounding boxes.
OR
2. An image match of the bounding box area. For each match from (1), a further step must convert that bounding box to a PNG file and run an image comparison; if the images are almost identical, it is returned as a match.
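A minimal sketch of criterion (1), assuming each bounding box has already been extracted as a page number plus corner coordinates in PDF points, and reading "3% error" as 3% of the page width (for x) and of the page height (for y). The names `Box`, `boxes_match` and `files_with_all` are hypothetical, not part of any PDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical box record: page number plus corners in PDF points."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


def boxes_match(a: Box, b: Box, page_w: float, page_h: float,
                tol: float = 0.03) -> bool:
    """True if both boxes are on the same page and every edge lies within
    tol * page dimension of the corresponding edge (assumed reading of 3%)."""
    if a.page != b.page:
        return False
    dx = tol * page_w  # allowed horizontal drift
    dy = tol * page_h  # allowed vertical drift
    return (abs(a.x0 - b.x0) <= dx and abs(a.x1 - b.x1) <= dx
            and abs(a.y0 - b.y0) <= dy and abs(a.y1 - b.y1) <= dy)


def files_with_all(selected: list[Box], docs: dict[str, list[Box]],
                   page_w: float, page_h: float) -> list[str]:
    """Names of documents containing a match for every selected box."""
    hits = []
    for name, boxes in docs.items():
        if all(any(boxes_match(s, b, page_w, page_h) for b in boxes)
               for s in selected):
            hits.append(name)
    return hits
```

On a US Letter page (612 x 792 points) this allows roughly 18 points of horizontal and 24 points of vertical drift; a file is returned only if every selected box has a match in it.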

The end result is that the software shows a list of download links for the PNGs/PDFs of the files with ONLY the same bounding boxes.

The winner will be asked to add a module to:
- Enable placing another PNG image over any PDF image and rewriting the PDF image. Many GitHub libraries can do this.

- Put the bounding box through Tesseract and run an OCR text search in addition to the simple bounding-box coordinate comparison. This would provide another criterion to match on.
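Once Tesseract (or any OCR engine) has produced text for a bounding box, the extra criterion could be a fuzzy text comparison. A sketch using Python's standard `difflib`, assuming the OCR strings are already available; the 0.9 threshold and the function names are illustrative assumptions.

```python
import difflib
import re


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so OCR spacing noise is ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def text_match(ocr_a: str, ocr_b: str, threshold: float = 0.9) -> bool:
    """Treat two OCR'd regions as matching if the SequenceMatcher ratio of
    their normalized text reaches the (assumed) threshold."""
    ratio = difflib.SequenceMatcher(None, normalize(ocr_a),
                                    normalize(ocr_b)).ratio()
    return ratio >= threshold
```

The tolerance lets a single misread character (for example "l" for "I") still count as a match.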

So the winner can earn $800+ in total from this contest through the add-on module.

Good Luck.

Serious entries only, please. I have zero patience, so only submit once it is fully working! I suggest you first message me your proposed methodology so I can confirm your ideas will succeed.

Be quick!




I recommend using https://blueimp.github.io/jQuery-File-Upload/ to save time.

Another idea would be to convert the bounding boxes to SVG format and use an existing SVG comparison library.
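As a rough illustration of the SVG idea, the boxes for one page can be written out as SVG `<rect>` elements. This sketch assumes boxes in PDF points with the origin at the bottom-left (PDF's default user space), so the y coordinate is flipped for SVG, whose origin is the top-left. `boxes_to_svg` is a hypothetical helper; no particular SVG comparison library is assumed.

```python
def boxes_to_svg(boxes, page_w, page_h, stroke="red"):
    """Render (x0, y0, x1, y1) boxes, given in PDF points with a bottom-left
    origin, as an SVG document with a top-left origin."""
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{page_w}" height="{page_h}" viewBox="0 0 {page_w} {page_h}">'
    ]
    for x0, y0, x1, y1 in boxes:
        # Flip y: the box's top edge (y1) becomes the SVG rect's y.
        parts.append(
            f'<rect x="{x0}" y="{page_h - y1}" width="{x1 - x0}" '
            f'height="{y1 - y0}" fill="none" stroke="{stroke}"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```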


Public Clarification Board

  • Asianexperts
    • 5 years ago

    Hehehe, everyone thought they would get this prize and ended up disappointed.

  • sunnyguptahotels
    Contest Holder
    • 5 years ago

    Please do not enter this contest! One contestant is extremely close to winning.

    1. danielvz96
      • 5 years ago

      :( How close? I've already implemented a bounding box finder (it can find anything from the smallest detail to whole paragraphs) and the bulk compare function, and I was working on the frontend when I saw this.

  • teachartdevteam
    • 5 years ago

    Hey there! I have a slightly different idea that I would be happy to discuss with you. Basically, what do you think: does it make sense for the user to draw the bounding boxes? Rendering a box for each object over the PDF might not be 100% useful; I have seen tons of PDFs with bad structure and arrangement that contain overlapping objects, which results in overlapping bounding boxes. With the current approach, a recursive lookup must be implemented: each object must be extracted from the PDF and parsed, and each object must be parsed with a different internal parser (iTextSharp and PDFsharp work that way) just to get details like size and position.
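One common way to detect the overlapping bounding boxes described above is intersection over union (IoU): the overlap area divided by the combined area of the two boxes. A sketch, assuming boxes as (x0, y0, x1, y1) tuples; the 0.5 threshold and the function names are arbitrary assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes; 0.0 if disjoint."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def overlapping_pairs(boxes, threshold=0.5):
    """Index pairs of boxes whose IoU exceeds the (assumed) threshold."""
    return [(i, j)
            for i in range(len(boxes))
            for j in range(i + 1, len(boxes))
            if iou(boxes[i], boxes[j]) > threshold]
```

Pairs flagged this way could be merged or shown to the user before selection.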

    1. sunnyguptahotels
      Contest Holder
      • 5 years ago

      I see what you are saying. So which library do you propose to use for image comparison? And how would you extract the corresponding area from the other PDFs? Or does it need to compare the selected area in PNG against the entire full-page PNGs of every PDF?
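To extract the corresponding area from each of the other PDFs, one option is to render every page at a fixed resolution and crop the same region. PDF coordinates are in points (72 per inch) with the origin at the bottom-left, while raster images start at the top-left, so the box must be scaled and flipped. A sketch, assuming no page rotation or CropBox offset and a page already decoded into a 2D list of grayscale values; the helper names are hypothetical.

```python
def pdf_box_to_pixels(box, page_h_pt, dpi=150):
    """Map (x0, y0, x1, y1) in PDF points (origin bottom-left) to
    (left, top, right, bottom) pixel indices on a page rendered at dpi
    (origin top-left). Assumes no rotation or CropBox offset."""
    s = dpi / 72.0  # pixels per PDF point
    x0, y0, x1, y1 = box
    return (round(x0 * s), round((page_h_pt - y1) * s),
            round(x1 * s), round((page_h_pt - y0) * s))


def crop(pixels, rect):
    """Crop a row-major 2D list of grayscale values to (left, top, right, bottom)."""
    left, top, right, bottom = rect
    return [row[left:right] for row in pixels[top:bottom]]
```

Cropping the same region from every rendered page avoids comparing the selection against whole pages.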

    2. sunnyguptahotels
      Contest Holder
      • 5 years ago

      Speed is a big consideration. To do what you are describing, it may be necessary to overlay the page with a 12x16 grid and then find all the grid boxes that the hand-drawn bounding box touches, so that the comparison is done more efficiently. But that seems to add more complexity to the exercise. Adobe Acrobat Reader seems to get the bounding boxes right without much overlap.
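The grid idea can be sketched as a simple spatial index: each box is registered in every grid cell it touches, and a query only needs to be compared against boxes sharing a cell with it. This sketch assumes "12x16" means 12 columns by 16 rows, that all boxes come from the same page size and page number, and that coordinates run from 0 to the page width/height; the function names are hypothetical.

```python
from collections import defaultdict


def touched_cells(box, page_w, page_h, cols=12, rows=16):
    """Set of (col, row) grid cells that an (x0, y0, x1, y1) box touches."""
    cw, ch = page_w / cols, page_h / rows
    x0, y0, x1, y1 = box
    c0, c1 = max(0, int(x0 // cw)), min(cols - 1, int(x1 // cw))
    r0, r1 = max(0, int(y0 // ch)), min(rows - 1, int(y1 // ch))
    return {(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)}


def build_index(docs, page_w, page_h):
    """Map each grid cell to the (doc_name, box_index) pairs touching it."""
    index = defaultdict(set)
    for name, boxes in docs.items():
        for i, box in enumerate(boxes):
            for cell in touched_cells(box, page_w, page_h):
                index[cell].add((name, i))
    return index


def candidates(query_box, index, page_w, page_h):
    """Boxes sharing at least one cell with the query; only these need the
    exact coordinate or image comparison."""
    found = set()
    for cell in touched_cells(query_box, page_w, page_h):
        found |= index.get(cell, set())
    return found
```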

  • ITPyramid85
    • 5 years ago

    First, I want to check whether the PDF quality is good enough for image processing. Can you provide the PDF files you have?

    1. sunnyguptahotels
      Contest Holder
      • 5 years ago

      Assume that all the PDFs are generated by the same creation utility. The most obvious example is a bank statement. But I think image comparison is missing the point: we want comparison by bounding-box coordinates. So the first step is to find the algorithm that Adobe uses to obtain the bounding boxes. Most open-source utilities treat every character as a separate coordinate.
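Since many open-source tools report one box per character, a simple way to get larger boxes is to merge neighbouring characters. This is not Adobe's algorithm, only an illustrative heuristic: characters whose bottom edges are within `y_tol` form a line, and characters on a line are merged while the horizontal gap stays within `max_gap`. Both thresholds (in PDF points) and the name `merge_chars` are assumptions.

```python
def merge_chars(chars, max_gap=2.0, y_tol=1.0):
    """Merge per-character (x0, y0, x1, y1) boxes into horizontal runs."""
    # 1) Group characters into lines by their bottom edge (y0).
    lines = []
    for box in sorted(chars, key=lambda c: c[1]):
        if lines and abs(box[1] - lines[-1][0][1]) <= y_tol:
            lines[-1].append(box)
        else:
            lines.append([box])
    # 2) Within each line, merge left to right while the gap is small.
    merged = []
    for line in lines:
        run = None
        for x0, y0, x1, y1 in sorted(line):
            if run and x0 - run[2] <= max_gap:
                run = (run[0], min(run[1], y0), max(run[2], x1), max(run[3], y1))
            else:
                if run:
                    merged.append(run)
                run = (x0, y0, x1, y1)
        merged.append(run)
    return merged
```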

  • sunnyguptahotels
    Contest Holder
    • 5 years ago

    Hi everyone, please ask your questions here so everyone can see them. If you don't know what a bounding box is in a PDF document, then you should not attempt this contest. I don't have time to educate, sorry. There is no point explaining your experience; this is a guaranteed contest, so if you understand the concepts in the brief, you may submit an entry. It's as simple as that. If you don't understand them, do the basic work first and come back with specific questions.

  • sunnyguptahotels
    Contest Holder
    • 5 years ago

    Hi Everyone

  • Codeitsmarts
    • 5 years ago

    Hi, I have read your project description. I have a few queries before I can begin the work. Can we discuss them through chat? I shall endeavor to exceed your expectations.

    I have 5 years of experience in PHP, MySQL, CodeIgniter, WordPress, jQuery, HTML, CSS, Python and more. Please see my portfolio for artwork samples and my clients' feedback.

    1. http://www.astrologyindubai.com/
    2. http://www.sweetspace9.com/
    3. http://www.ngotiator.com/
    4. http://www.shypon.com/
    5. https://www.pixbrand.in/
    6. http://www.etfmodelsolutions.com/
    7. http://wricitieshub.org/worldtodresource/

    I'm confident that I can complete your project on time and within your budget, and that I can achieve the results you are asking for.
    Please initiate a chat for further discussion. I will do my best for you, with positive hope! Regards

  • ITPyramid85
    • 5 years ago

    Also, if you want to do image searching, the images will need to be normalized to a specific size, so image quality and the number of PDF pages will affect search speed.
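A sketch of that normalization step: resize both regions to the same fixed size (nearest-neighbour here, for simplicity) and score them by mean absolute pixel difference. It assumes the images are already decoded into 2D lists of 0-255 grayscale values; the 32x32 size, the function names and whatever score counts as "almost identical" are assumptions.

```python
def resize_nearest(pixels, out_w, out_h):
    """Nearest-neighbour resize of a row-major 2D grayscale list."""
    in_h, in_w = len(pixels), len(pixels[0])
    return [[pixels[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def similarity(a, b, size=(32, 32)):
    """1.0 for identical images, down to 0.0 for fully inverted ones:
    1 - mean absolute difference of the size-normalized images."""
    w, h = size
    ra, rb = resize_nearest(a, w, h), resize_nearest(b, w, h)
    total = sum(abs(pa - pb)
                for row_a, row_b in zip(ra, rb)
                for pa, pb in zip(row_a, row_b))
    return 1.0 - total / (w * h * 255)
```

Comparing small fixed-size thumbnails keeps the cost per comparison constant regardless of the rendering resolution.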

  • sprlabs9
    • 5 years ago

    Hi, I would like to discuss. Please drop me a message.

  • dev681999
    • 5 years ago

    I am probably wrong; feel free to correct me.

  • dev681999
    • 5 years ago

    From reading the description, this is what I have understood: you want a website where people can upload PDF files. Each PDF is then converted to a PNG showing its bounding boxes. These bounding boxes are matched against boxes from the other uploaded files. The user can then select bounding boxes to download.

  • sunnyguptahotels
    Contest Holder
    • 5 years ago

    It can be in PHP, Python, or C#. There must be a web front end to accept the file uploads, so Java/VB are not suitable.

  • a6jack
    • 5 years ago

    Dear,
    May we know which language (PHP, Python, C#, Java, ...) this software should be written in, and whether it will be a website or a desktop app?

  • sunnyguptahotels
    Contest Holder
    • 5 years ago

    Please submit a blank entry; that will allow me to message you.

  • desmondmile03
    • 5 years ago

    Hi, please message me so I can discuss my proposed methodology. Thanks

  • ahsanfaheem3
    • 5 years ago

    Dear contest holder, kindly message me so I can discuss my proposed methodology. Thanks.
