PDF to HTML Converter for .NET

Winnovative PDF to HTML Converter can be used in any type of .NET application to convert PDF pages to HTML documents. Integration with existing .NET applications is straightforward and requires no installation. The downloaded archive contains the .NET assembly and a demo application, and the full C# source code of the demo is available in the Samples folder. During conversion the converter produces HTML strings that you can save to HTML files or process further. You can also customize the zoom level of the resulting HTML content and the resolution of the HTML images.

The main features of the PDF to HTML Converter for .NET are:

  • Convert PDF pages to HTML documents
  • Customize the generated HTML content zoom level
  • Customize the resolution of the images in the generated HTML document
  • Convert PDF pages to HTML documents in memory or to HTML files in a folder
  • Support for password protected PDF documents
  • Convert only a range of PDF pages to HTML
  • Get the number of pages in a PDF document
  • Get the PDF document title, keywords, author and description
  • Does not require Adobe Reader or other third party tools
  • Support for .NET Framework 4.0 and later
  • Documentation and C# samples for all the features

You can read more about the features of the PDF to HTML Converter for .NET on the product website.

C# Sample Code for PDF to HTML Converter

The code below was taken from the PDF to HTML Converter demo application available for download in the PDF to HTML converter archive. In this sample an instance of the PdfToHtmlConverter class is constructed and used to convert the PDF document pages to HTML documents.

private void btnConvertToHtml_Click(object sender, EventArgs e)
{
    if (pdfFileTextBox.Text.Trim().Equals(String.Empty))
    {
        MessageBox.Show("Please choose a source PDF file", "Choose PDF file", MessageBoxButtons.OK);
        return;
    }

    // the source pdf file
    string pdfFileName = pdfFileTextBox.Text.Trim();

    // start page number
    int startPageNumber = int.Parse(textBoxStartPage.Text.Trim());
    // end page number
    // when it is 0 the conversion will continue up to the end of document
    int endPageNumber = 0;
    if (textBoxEndPage.Text.Trim() != String.Empty)
        endPageNumber = int.Parse(textBoxEndPage.Text.Trim());

    // create the converter object and set the user options
    PdfToHtmlConverter pdfToHtmlConverter = new PdfToHtmlConverter();

    // set the resolution of HTML images
    pdfToHtmlConverter.Resolution = int.Parse(textBoxResolution.Text);

    // set the zoom of HTML content
    pdfToHtmlConverter.Zoom = int.Parse(textBoxZoom.Text);

    // the demo output directory
    string outputDirectory = Path.Combine(Application.StartupPath, @"DemoFiles\Output");

    Cursor = Cursors.WaitCursor;
            
    try
    {
        // convert PDF pages to HTML files in a directory
        pdfToHtmlConverter.CreateIndexFile = true;
        pdfToHtmlConverter.ConvertPdfPagesToHtmlFile(pdfFileName, startPageNumber, endPageNumber, outputDirectory, "PdfPage");

        // uncomment the lines below to raise the PageConvertedEvent event when a PDF page is converted
        // the pdfToHtmlConverter_PageConvertedEvent handler below will be executed for each converted PDF page
        // do not forget to detach the handler when it is no longer needed
        //pdfToHtmlConverter.PageConvertedEvent += pdfToHtmlConverter_PageConvertedEvent;
        //pdfToHtmlConverter.ConvertPdfPagesToHtmlInEvent(pdfFileName, startPageNumber, endPageNumber);

        // uncomment the line below to convert PDF pages in memory to an array of PdfPageHtml objects
        //PdfPageHtml[] pdfPageHtmls = pdfToHtmlConverter.ConvertPdfPagesToHtml(pdfFileName, startPageNumber, endPageNumber);
    }
    catch (Exception ex)
    {
        // The conversion failed
        MessageBox.Show(String.Format("An error occurred. {0}", ex.Message), "Error");
        return;
    }
    finally
    {
        Cursor = Cursors.Arrow;
    }

    try
    {
        System.Diagnostics.Process.Start(outputDirectory);
    }
    catch (Exception ex)
    {
        MessageBox.Show(string.Format("Cannot open output folder. {0}", ex.Message));
        return;
    }
}

/// <summary>
/// The PageConvertedEvent event handler called after a PDF page was converted to HTML
/// The event is raised when the ConvertPdfPagesToHtmlInEvent() method is used
/// </summary>
/// <param name="args">The handler argument containing the PDF page HTML and page number</param>
void pdfToHtmlConverter_PageConvertedEvent(PageConvertedEventArgs args)
{
    // get the HTML document and page number from the event handler argument
    string pdfPageHtml = args.PdfPageHtml.Html;
    int pageNumber = args.PdfPageHtml.PageNumber;

    // save the PDF page HTML to a file
    string outputHtmlFile = Path.Combine(Application.StartupPath, @"DemoFiles\Output", "PdfPage_" + pageNumber + ".html");
    File.WriteAllText(outputHtmlFile, pdfPageHtml, Encoding.UTF8);

    args.PdfPageHtml.Dispose();
}
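
The click handler above calls int.Parse on the page number text boxes before entering the try block, so non-numeric input raises a FormatException that the sample's error handling does not catch. A minimal sketch of a more defensive parse is given below. The PageRangeInput class and its TryParsePageRange method are hypothetical helpers, not part of the Winnovative library. The sketch assumes that page numbers start at 1 and that an end page of 0 means "up to the end of the document", as in the sample.

```csharp
using System;

// Hypothetical helper, not part of the Winnovative library.
public static class PageRangeInput
{
    // Parses the start and end page text box values.
    // An empty end page maps to 0, which the sample uses to mean
    // "convert up to the end of the document".
    // Assumption: page numbers start at 1.
    public static bool TryParsePageRange(string startText, string endText,
        out int startPage, out int endPage, out string error)
    {
        startPage = 0;
        endPage = 0;
        error = string.Empty;

        if (!int.TryParse((startText ?? "").Trim(), out startPage) || startPage < 1)
        {
            error = "The start page must be a whole number greater than 0.";
            return false;
        }

        string end = (endText ?? "").Trim();
        if (end.Length > 0 && (!int.TryParse(end, out endPage) || endPage < 0))
        {
            error = "The end page must be empty or a non-negative whole number.";
            return false;
        }

        // 0 means "to the end", so only a non-zero end page is compared.
        if (endPage != 0 && endPage < startPage)
        {
            error = "The end page cannot be smaller than the start page.";
            return false;
        }

        return true;
    }
}
```

In the handler, the two int.Parse calls could then be replaced by one call to this helper, passing the returned error text to MessageBox.Show when it returns false.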

PDF To Image Converter for .NET

Winnovative PDF to Image Converter can be used in any type of .NET application to convert PDF pages to images. Integration with existing .NET applications is straightforward and requires no installation. The downloaded archive contains the .NET assembly and a demo application, and the full C# source code of the demo is available in the Samples folder. During conversion the converter produces .NET Image objects that you can save to image files or process further. You can also customize the color space and resolution used during rasterization.

The main features of the PDF to Image Converter are enumerated below:

  • Convert PDF pages to images
  • Create thumbnails of the PDF pages
  • Customize the color space and resolution of generated images
  • Convert PDF pages to images in memory or to image files in a folder
  • Save the PDF page images in various image formats
  • Support for password protected PDF documents
  • Convert only a range of PDF pages to images
  • Get the number of pages in a PDF document
  • Get the PDF document title, keywords, author and description
  • Does not require Adobe Reader or other third party tools
  • Support for .NET Framework 4.0 and later
  • Documentation and C# samples for all the features

For a full description of the software you can check the Winnovative PDF to Image Converter for .NET web page.

C# Code Sample for PDF to Image Conversion

The code below was taken from the PDF to Image Converter demo application available for download in the PDF to Image converter archive. In this sample an instance of the PdfToImageConverter class is constructed and used to rasterize the PDF document pages to images.

private void btnConvertToImages_Click(object sender, EventArgs e)
{
    if (pdfFileTextBox.Text.Trim().Equals(String.Empty))
    {
        MessageBox.Show("Please choose a source PDF file", "Choose PDF file", MessageBoxButtons.OK);
        return;
    }

    // the source pdf file
    string pdfFileName = pdfFileTextBox.Text.Trim();

    // start page number
    int startPageNumber = int.Parse(textBoxStartPage.Text.Trim());
    // end page number
    // when it is 0 the conversion will continue up to the end of document
    int endPageNumber = 0;
    if (textBoxEndPage.Text.Trim() != String.Empty)
        endPageNumber = int.Parse(textBoxEndPage.Text.Trim());

    // create the converter object and set the user options
    PdfToImageConverter pdfToImageConverter = new PdfToImageConverter();

    pdfToImageConverter.LicenseKey = "0F5PX0tPX09fSVFPX0xOUU5NUUZGRkZfTw==";

    // set the color space of the resulting images
    pdfToImageConverter.ColorSpace = SelectedColorSpace();

    // set the resolution of the resulting images
    pdfToImageConverter.Resolution = int.Parse(textBoxResolution.Text);

    // the demo output directory
    string outputDirectory = Path.Combine(Application.StartupPath, @"DemoFiles\Output");

    Cursor = Cursors.WaitCursor;

    // set the handler to be called when a page was converted
    pdfToImageConverter.PageConvertedEvent += pdfToImageConverter_PageConvertedEvent;            
            
    try
    {
        // call the converter to raise the PageConvertedEvent event when a PDF page is converted
        // the pdfToImageConverter_PageConvertedEvent handler below will be executed for each converted PDF page
        pdfToImageConverter.ConvertPdfPagesToImageInEvent(pdfFileName, startPageNumber, endPageNumber);

        // Alternatively you can use the ConvertPdfPagesToImage() and ConvertPdfPagesToImageFile() methods
        // to convert the PDF pages to images in memory or to image files in a directory

        // uncomment the line below to convert PDF pages in memory to an array of PdfPageImage objects
        //PdfPageImage[] pdfPageImages = pdfToImageConverter.ConvertPdfPagesToImage(pdfFileName, startPageNumber, endPageNumber);

        // uncomment the line below to convert PDF pages to image files
        // in the outputDirectory already defined above
        //pdfToImageConverter.ConvertPdfPagesToImageFile(pdfFileName, startPageNumber, endPageNumber, outputDirectory, "pdfpage");
    }
    catch (Exception ex)
    {
        // The conversion failed
        MessageBox.Show(String.Format("An error occurred. {0}", ex.Message), "Error");
        return;
    }
    finally
    {
        // detach the event handler
        pdfToImageConverter.PageConvertedEvent -= pdfToImageConverter_PageConvertedEvent;

        Cursor = Cursors.Arrow;
    }

    try
    {
        System.Diagnostics.Process.Start(outputDirectory);
    }
    catch (Exception ex)
    {
        MessageBox.Show(string.Format("Cannot open output folder. {0}", ex.Message));
        return;
    }
}

/// <summary>
/// The PageConvertedEvent event handler called after a PDF page was converted to image
/// The event is raised when the ConvertPdfPagesToImageInEvent() method is used
/// </summary>
/// <param name="args">The handler argument containing the PDF page image and page number</param>
void pdfToImageConverter_PageConvertedEvent(PageConvertedEventArgs args)
{
    // get the image object and page number from the event handler argument
    Image pdfPageImageObj = args.PdfPageImage.ImageObject;
    int pageNumber = args.PdfPageImage.PageNumber;

    // save the PDF page image to a PNG file
    string outputPageImage = Path.Combine(Application.StartupPath, @"DemoFiles\Output", "pdfpage_" + pageNumber + ".png");
    pdfPageImageObj.Save(outputPageImage, ImageFormat.Png);

    args.PdfPageImage.Dispose();
}
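
The sample above sets the Resolution property from a text box. Assuming this value is expressed in dots per inch (DPI) and the page size is measured in PDF points of 1/72 inch, neither of which this page states explicitly, the pixel size of a rasterized page can be estimated as points * dpi / 72. The RasterSize helper below is a hypothetical illustration of that arithmetic and does not call the converter.

```csharp
using System;

// Hypothetical helper, not part of the Winnovative library.
public static class RasterSize
{
    // Assumption: page dimensions are given in PDF points (1/72 inch).
    private const double PointsPerInch = 72.0;

    // Estimates the pixel length of a page dimension rendered at the given DPI.
    public static int PointsToPixels(double points, int dpi)
    {
        if (dpi <= 0)
            throw new ArgumentOutOfRangeException(nameof(dpi));

        return (int)Math.Round(points * dpi / PointsPerInch);
    }
}
```

For example, a US Letter page of 612 x 792 points at 150 DPI comes out at 1275 x 1650 pixels under these assumptions.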

Winnovative PDF to Text Converter for .NET

Winnovative PDF to Text Converter is a .NET library that can be used in ASP.NET and MVC websites or in Windows Forms and WPF desktop applications to extract the text from existing PDF documents or to search for text in a PDF document. The result of a PDF to Text conversion is a String object in memory.

The PDF to Text Converter does not depend on Adobe Reader or on any other third party tool. The main features of the Winnovative PDF to Text converter are:

  • Extract text from PDF documents
  • Search text in PDF documents
  • Save the extracted text using various text encodings
  • Case sensitive and whole word options for text search
  • Support for password protected PDF documents
  • Extract the text or search only a range of PDF pages
  • Extract text preserving the original PDF layout
  • Extract text in PDF reading order or PDF internal order
  • Get the number of pages in a PDF document
  • Get the PDF document title, keywords, author and description
  • Does not require Adobe Reader or other third party tools
  • Support for .NET Framework 4.0 and later
  • Documentation and C# samples for all the features

You can find a complete description of the Winnovative PDF to Text Converter for .NET on the product web page.

C# Code Sample to Convert PDF to Text

The C# code below is taken from the demo application that comes with the software package. An object of the PdfToTextConverter class is created to extract the text from an existing PDF document. The extracted text is saved to a file on disk using the text encoding selected in the demo, for example UTF-8.

private void btnConvertToText_Click(object sender, EventArgs e)
{
    if (pdfFileTextBox.Text.Trim().Equals(String.Empty))
    {
        MessageBox.Show("Please choose a PDF file to convert", "Choose PDF file", MessageBoxButtons.OK);
        return;
    }

    // the pdf file to convert
    string pdfFileName = pdfFileTextBox.Text.Trim();
            
    // start page number
    int startPageNumber = int.Parse(textBoxStartPage.Text.Trim());
    // end page number
    // when it is 0 the extraction will continue up to the end of document
    int endPageNumber = 0;
    if (textBoxEndPage.Text.Trim() != String.Empty)
        endPageNumber = int.Parse(textBoxEndPage.Text.Trim());

    // the output text layout
    TextLayout textLayout = SelectedTextLayout();

    // the output text encoding
    System.Text.Encoding textEncoding = SelectedTextEncoding();

    // page breaks
    bool markPageBreaks = cbMarkPageBreaks.Checked;

    string outputFileName = System.IO.Path.Combine(Application.StartupPath, @"DemoFiles\Output", 
            System.IO.Path.GetFileNameWithoutExtension(pdfFileName) + ".txt");

    // create the converter object and set the user options
    PdfToTextConverter pdfToTextConverter = new PdfToTextConverter();

    pdfToTextConverter.LicenseKey = "C4WUhJaRhJSEkoqUhJeVipWWip2dnZ2ElA==";

    pdfToTextConverter.Layout = textLayout;
    pdfToTextConverter.MarkPageBreaks = markPageBreaks;

    Cursor = Cursors.WaitCursor;
    try
    {
        // extract text from PDF
        string extractedText = pdfToTextConverter.ConvertToText(pdfFileName, startPageNumber, endPageNumber);

        // write the resulting string into an output file
        // in the application directory using the selected encoding
        System.IO.File.WriteAllText(outputFileName, extractedText, textEncoding);
    }
    catch (Exception ex)
    {
        MessageBox.Show(String.Format("An error occurred. {0}", ex.Message), "Error");
        return;
    }
    finally
    {
        Cursor = Cursors.Arrow;
    }


    try
    {
        System.Diagnostics.Process.Start(outputFileName);
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
        return;
    }
}
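
The feature list mentions case sensitive and whole word options for text search. The converter's own search API is not shown on this page, so the sketch below only illustrates the same two options applied to a string that was already extracted with ConvertToText(). ExtractedTextSearch and FindAll are hypothetical names, and the code relies only on System.Text.RegularExpressions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the Winnovative library.
public static class ExtractedTextSearch
{
    // Returns the zero-based character offsets of every match of 'term' in 'text'.
    public static List<int> FindAll(string text, string term, bool caseSensitive, bool wholeWord)
    {
        var offsets = new List<int>();
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(term))
            return offsets;

        // Escape the term so it is matched literally, not as a regex pattern.
        string pattern = Regex.Escape(term);

        // Whole word: the match must not be preceded or followed by a word character.
        if (wholeWord)
            pattern = @"(?<!\w)" + pattern + @"(?!\w)";

        RegexOptions options = RegexOptions.CultureInvariant;
        if (!caseSensitive)
            options |= RegexOptions.IgnoreCase;

        foreach (Match match in Regex.Matches(text, pattern, options))
            offsets.Add(match.Index);

        return offsets;
    }
}
```

Lookarounds are used instead of \b so that search terms starting or ending with punctuation still behave sensibly with the whole word option.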