Compare two PDF files in a C# Windows application

Overview :

This project takes two PDF files as user input, compares them, and generates a PDF named create.pdf on the D drive. In the output, text that was inserted in the second PDF is shown in RED, text that was deleted from the first PDF is shown in GREEN, and unchanged (equal) text is shown in BLUE.

INPUT PDF 1 :

[Image: first input PDF]

INPUT PDF 2 :

[Image: second input PDF]

OUTPUT : 

[Image: generated comparison PDF]

"This is" is the deleted text in pdf2 so it is displayed in green color. "Updated" is the new or inserted text in pdf2 so it is displayed in output in red color. "test document" is unchanged in both input pdfs so it is displayed in blue color.

Creating the Windows Forms Application:

Let's start creating the project. I will stick to the basics so that everyone can follow along and build this project.

Step 1 :

Open Visual Studio and create a new project.

Step 2 :

Choose Windows Forms App (.NET Framework).
[Image: selecting the Windows Forms App (.NET Framework) template]


Step 3 :

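Drag and drop three buttons, two text boxes and a label from the Toolbox onto the form and arrange them like this. Name the text boxes FilePath and FilePath2, because the code in Step 5 refers to them by those names.

[Image: form layout]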


Step 4 :

Install the NuGet package Diff.Match.Patch. This library implements Myers' diff algorithm: it compares two blocks of plain text and efficiently returns a list of differences. Also install iText7 and iTextSharp, two libraries for creating and manipulating PDF files in .NET. The code below uses iText7 to read the text of the input PDFs and iTextSharp to write the output PDF.

[Image: installing the NuGet packages]
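For the curious, the core idea of Myers' algorithm fits in a few lines. The sketch below (the class name MyersSketch is hypothetical) runs only the greedy forward pass and returns D, the length of the shortest edit script, i.e. the minimum number of inserted plus deleted characters. It does not recover the edits themselves, and it is not Diff.Match.Patch's implementation, which adds further optimisations on top of this idea.

```csharp
using System;

// Hypothetical class name; a teaching sketch, not Diff.Match.Patch's code.
public static class MyersSketch
{
    // Greedy forward pass of Myers' O(ND) algorithm.
    // Returns D = the minimum number of single-character insertions plus deletions
    // needed to turn a into b (the length of the shortest edit script).
    public static int ShortestEditDistance(string a, string b)
    {
        int n = a.Length, m = b.Length, max = n + m;
        int offset = max + 1;            // lets diagonal k = x - y be stored at v[k + offset]
        var v = new int[2 * max + 3];    // v[k + offset] = furthest x reached on diagonal k

        for (int d = 0; d <= max; d++)
        {
            for (int k = -d; k <= d; k += 2)
            {
                // Come from diagonal k + 1 (a move down = insertion) or from
                // diagonal k - 1 (a move right = deletion), whichever got further.
                int x = (k == -d || (k != d && v[k - 1 + offset] < v[k + 1 + offset]))
                    ? v[k + 1 + offset]
                    : v[k - 1 + offset] + 1;
                int y = x - k;

                // Follow the "snake": diagonal moves over matching characters are free.
                while (x < n && y < m && a[x] == b[y]) { x++; y++; }

                v[k + offset] = x;
                if (x >= n && y >= m) return d;   // reached the end of both strings
            }
        }
        return max; // unreachable in practice: d == n + m always suffices
    }
}
```

For the classic example strings "ABCABBA" and "CBABAC" it returns 5: their longest common subsequence has length 4, and 7 + 6 - 2·4 = 5.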

Step 5 :

Add the code behind the "Select PDF file" and "Generate PDF" buttons. In my case, button1 is "Select a PDF file", button2 is "Select another PDF file" and button3 is "Generate PDF".

using System;
using System.Windows.Forms;

namespace PDFComparisionWindowsForm
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        // Shows a file dialog restricted to PDF files and writes the chosen path into the given text box.
        private static void BrowseForPdf(TextBox target)
        {
            using (OpenFileDialog browseFile = new OpenFileDialog
            {
                InitialDirectory = @"D:\",
                Title = "Browse Pdf Files",

                CheckFileExists = true,
                CheckPathExists = true,

                DefaultExt = "pdf",
                Filter = "pdf files (*.pdf)|*.pdf",
                FilterIndex = 1,            // 1-based; the filter above has only one entry
                RestoreDirectory = true,

                ReadOnlyChecked = true,
                ShowReadOnly = true
            })
            {
                if (browseFile.ShowDialog() == DialogResult.OK)
                {
                    target.Text = browseFile.FileName;
                }
            }
        }

        // "Select a PDF file"
        private void button1_Click(object sender, EventArgs e)
        {
            BrowseForPdf(FilePath);
        }

        // "Select another PDF file"
        private void button2_Click(object sender, EventArgs e)
        {
            BrowseForPdf(FilePath2);
        }

        // "Generate PDF"
        private void button3_Click(object sender, EventArgs e)
        {
            if (string.IsNullOrWhiteSpace(FilePath.Text) || string.IsNullOrWhiteSpace(FilePath2.Text))
            {
                MessageBox.Show("Please select both PDF files first.");
                return;
            }

            PDFFileHandler reader = new PDFFileHandler(FilePath, FilePath2);
            // string.Split(string) does not exist on .NET Framework, so use the string[] overload.
            var result = reader.ComparePdfFiles().Split(new[] { "\n" }, StringSplitOptions.None);
            reader.GeneratePdf(result, "create.pdf");
            MessageBox.Show("PDF Created Successfully \n" + "Check D:/create.pdf");
        }
    }
}
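Optionally, button3_Click could reject files that are obviously not PDFs before handing them to iText. A minimal sketch with a hypothetical helper named PdfFileCheck: it relies on the fact that PDF files begin with the ASCII header %PDF- (for example %PDF-1.7). Some PDF readers tolerate a few stray bytes before that header, so this check is stricter than they are.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original project.
public static class PdfFileCheck
{
    // Returns true when the file exists and its first five bytes are "%PDF-".
    public static bool LooksLikePdf(string path)
    {
        if (!File.Exists(path)) return false;

        var header = new byte[5];
        using (var stream = File.OpenRead(path))
        {
            int read = 0;
            while (read < header.Length)
            {
                int n = stream.Read(header, read, header.Length - read);
                if (n == 0) break;          // end of file reached early
                read += n;
            }
            if (read < header.Length) return false;   // file shorter than the header
        }
        return Encoding.ASCII.GetString(header) == "%PDF-";
    }
}
```

In button3_Click you might call PdfFileCheck.LooksLikePdf(FilePath.Text) and PdfFileCheck.LooksLikePdf(FilePath2.Text) and show a MessageBox if either returns false.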

Step 6 :

Create a class named PDFFileHandler (PDFFileHandler.cs) and add the following code:

using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;
using System.IO;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text;
using Document = iTextSharp.text.Document;
using PdfDocument = iText.Kernel.Pdf.PdfDocument;
using DiffMatchPatch;


namespace PDFComparisionWindowsForm
{
    class PDFFileHandler
    {
        private TextBox filePath;
        private TextBox filePath2;
        // A single paragraph collects every diff piece; each piece is added in its own colour.
        iTextSharp.text.Paragraph p = new iTextSharp.text.Paragraph();
        public string changed_text;
        public string equal_text;
        public string deleted_text;
        public string whole_text;
        public string text1;

        public PDFFileHandler(TextBox filePath, TextBox filePath2)
        {
            this.filePath = filePath;
            this.filePath2 = filePath2;
        }

        // Extracts the text of every page of the PDF whose path is in the given text box (iText7).
        public string ReadFile(TextBox pdfPath)
        {
            var pageText = new StringBuilder();
            using (PdfDocument pdfDocument = new PdfDocument(new iText.Kernel.Pdf.PdfReader(pdfPath.Text)))
            {
                int pageCount = pdfDocument.GetNumberOfPages();
                for (int i = 1; i <= pageCount; i++)   // iText page numbers start at 1
                {
                    LocationTextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                    PdfCanvasProcessor parser = new PdfCanvasProcessor(strategy);
                    // Process page i (GetFirstPage() would read page 1 on every iteration).
                    parser.ProcessPageContent(pdfDocument.GetPage(i));
                    pageText.Append(strategy.GetResultantText());
                    pageText.Append('\n');             // keep words from adjacent pages apart
                }
            }
            return pageText.ToString();
        }

        // Diffs the two extracted texts and adds each piece to the paragraph in its colour:
        // unchanged = blue, inserted (only in PDF 2) = red, deleted (only in PDF 1) = green.
        public string ComparePdfFiles()
        {
            var dmp = DiffMatchPatchModule.Default;
            StringBuilder compareResult = new StringBuilder();
            text1 = ReadFile(filePath);
            var text2 = ReadFile(filePath2);
            var diff = dmp.DiffMain(text1, text2);

            // Layout settings are the same for every piece, so set them once.
            p.SpacingBefore = 10;
            p.SpacingAfter = 10;
            p.Alignment = Element.ALIGN_LEFT;

            foreach (var d in diff)
            {
                whole_text = d.Text;
                if (d.Operation.IsEqual)
                {
                    compareResult.Append(d.Text + " ");
                    equal_text = d.Text;
                    // Text added to the paragraph uses the font that is current at that moment.
                    p.Font = FontFactory.GetFont(FontFactory.HELVETICA, 12f, BaseColor.BLUE);
                    p.Add(d.Text);
                }
                else if (d.Operation.IsInsert)
                {
                    compareResult.Append(d.Text + " ");
                    changed_text = d.Text;
                    p.Font = FontFactory.GetFont(FontFactory.HELVETICA, 12f, BaseColor.RED);
                    p.Add(" " + d.Text);
                }
                else if (d.Operation.IsDelete)
                {
                    compareResult.Append(d.Text + "   ");
                    deleted_text = d.Text;
                    p.Font = FontFactory.GetFont(FontFactory.HELVETICA, 12f, BaseColor.GREEN);
                    p.Add(d.Text);
                }
            }
            return compareResult.ToString();
        }

        // Writes the coloured paragraph to a new PDF with iTextSharp.
        // The output path is hardcoded to D:/create.pdf; 'paragraphs' and 'destination' are not used.
        public void GeneratePdf(string[] paragraphs, string destination)
        {
            Document document = new Document();
            iTextSharp.text.pdf.PdfWriter.GetInstance(document, new FileStream("D:/create.pdf", FileMode.Create));
            document.Open();
            document.Add(p);
            document.Close();   // closing the document also closes the FileStream
        }
    }
}
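While experimenting, it can help to keep the colour rules in one place and preview them without generating a PDF. A small sketch under these assumptions: the EditKind enum and DiffColors class are hypothetical stand-ins (Diff.Match.Patch has its own operation type), and console colours only approximate the iTextSharp BaseColor values used in the PDF.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the diff library's operation type.
public enum EditKind { Equal, Insert, Delete }

public static class DiffColors
{
    // Same scheme as ComparePdfFiles(): inserted = red, deleted = green, unchanged = blue.
    public static ConsoleColor ColorFor(EditKind kind)
    {
        switch (kind)
        {
            case EditKind.Insert: return ConsoleColor.Red;
            case EditKind.Delete: return ConsoleColor.Green;
            default: return ConsoleColor.Blue;
        }
    }

    // Prints each piece in its colour, then restores the console's default colours.
    public static void Preview(IEnumerable<(EditKind Kind, string Text)> pieces)
    {
        try
        {
            foreach (var (kind, text) in pieces)
            {
                Console.ForegroundColor = ColorFor(kind);
                Console.Write(text + " ");
            }
        }
        finally
        {
            Console.ResetColor();
            Console.WriteLine();
        }
    }
}
```

Mapping each diff item's operation onto EditKind and passing the list to DiffColors.Preview prints the same red, green and blue scheme to a console window.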


That's it, the code is done! You can now test the output with any PDF files of your choice.

CODE OVERVIEW :

Button1 lets you browse for the first PDF file and saves its path in the first text box (FilePath), and Button2 does the same for the second PDF file and the second text box (FilePath2). The PDFFileHandler class first reads both files in the ReadFile() function, which extracts their text with iText7. The ComparePdfFiles() function then finds the differences between the two texts with Diff.Match.Patch and builds a coloured paragraph with iTextSharp. Finally, the GeneratePdf() function writes that paragraph to a PDF on the D drive using iTextSharp. The output path (D:/create.pdf) is hardcoded; you can change it in the GeneratePdf() function.
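If you would rather not depend on a D drive existing, one option is to derive the output location from the first input file. A minimal sketch with a hypothetical helper named OutputPath; wiring it in would mean passing its result to GeneratePdf() and using the destination parameter there instead of the hardcoded path.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original project.
public static class OutputPath
{
    // Returns a path for the result file in the same folder as the first input PDF.
    public static string NextToInput(string firstPdfPath, string fileName = "create.pdf")
    {
        string folder = Path.GetDirectoryName(Path.GetFullPath(firstPdfPath));
        return Path.Combine(folder, fileName);
    }
}
```

On Windows, for example, OutputPath.NextToInput(@"C:\docs\a.pdf") gives C:\docs\create.pdf.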
