PDF Integration with Edilitics

PDF (Portable Document Format) is a widely used file format for presenting documents in a manner independent of application software, hardware, and operating systems. While commonly used for textual and graphical content, PDFs often contain structured tabular data embedded within the document. Extracting that tabular data makes it available to workflows that depend on structured data.

In Edilitics, PDF files are used exclusively as data sources: tabular data is extracted from them for analysis. This guide walks through integrating a PDF file into Edilitics step by step and describes how the uploaded file and its extracted data are secured.


Before You Begin

Ensure the following prerequisites are met:

  • File Size Limit: PDF files must not exceed 100 MB.

  • Tabular Data: Ensure the PDF file contains well-structured tabular data for accurate extraction.

  • Password Protection: If the PDF is password-protected, ensure you have the correct password to allow Edilitics to process the file.

  • Usage Constraints:

    • PDF files are only supported as data sources, not destinations.

    • Workflows using PDF files:

      • Allow full loads with "Schedule as Once" in Replicate.

      • Support "Schedule as Once" in Transform.

      • Do not support auto updates or data refreshes in Visualize.

  • AI Column Insights: PDF files are not eligible for AI Column Insights.
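Some of the prerequisites above can be checked locally before uploading. The following is a minimal sketch, not part of Edilitics: the function name precheck_pdf and its report keys are hypothetical, "100 MB" is assumed to mean 100 × 1024 × 1024 bytes, and the encryption test is only a byte-level heuristic (encrypted PDFs normally carry an /Encrypt entry), so it can give false positives or negatives.

```python
import os

# Assumption: the documented "100 MB" limit is read as 100 MiB.
MAX_BYTES = 100 * 1024 * 1024


def precheck_pdf(path: str, max_bytes: int = MAX_BYTES) -> dict:
    """Run rough local checks on a file before uploading it as a PDF source."""
    size = os.path.getsize(path)
    report = {
        "size_ok": size <= max_bytes,
        "looks_like_pdf": False,
        "maybe_encrypted": False,
    }
    if not report["size_ok"]:
        # Skip reading an oversized file; it would be rejected anyway.
        return report
    with open(path, "rb") as fh:
        data = fh.read()
    # Lenient header check: look for "%PDF-" near the start of the file.
    report["looks_like_pdf"] = b"%PDF-" in data[:1024]
    # Heuristic only: an /Encrypt entry suggests password protection.
    report["maybe_encrypted"] = b"/Encrypt" in data
    return report
```

If maybe_encrypted comes back True, have the password ready before starting the integration. The check says nothing about whether the file contains extractable tables.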


File Security and Management

Edilitics implements robust security protocols for handling PDF files:

  • Security Scans: Uploaded files are validated for potential risks and data integrity.

  • Data Extraction: All tabular data from the PDF is extracted, and each table is saved with the naming convention: "PageNo_Table".

  • Encryption: Extracted data is securely encrypted during storage and decrypted only during user access or workflow execution (Replicate, Transform, Visualize).

  • Permanent Deletion: Upon deleting an integration, the file and all associated extracted data are permanently removed from Edilitics systems, ensuring compliance with data privacy standards.


Supported Data Structures

Edilitics scans all pages in the PDF file for tabular data. Any detected tables are extracted, structured, and stored as separate tables.

Data Type | Description | Example
Tabular Data | Structured rows and columns within a PDF file. | Tables with columns for Date, Product, Quantity, and Price.

Note: Non-tabular data (e.g., text and images) is not extracted or stored.
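To make "well-structured tabular data" concrete, here is a small sketch of one possible working definition: a header row with distinct, non-empty column names, followed by data rows that all have the same number of cells as the header. This is an illustrative assumption, not the rule Edilitics applies during extraction, and the function name is_well_structured is hypothetical.

```python
def is_well_structured(table: list[list[str]]) -> bool:
    """Check a table given as a list of rows, where the first row is the header."""
    if len(table) < 2:
        # Need a header plus at least one data row.
        return False
    header = [cell.strip() for cell in table[0]]
    # Column names must be non-empty and distinct.
    if not all(header) or len(set(header)) != len(header):
        return False
    # Every data row must have exactly one cell per column.
    return all(len(row) == len(header) for row in table[1:])


sales = [
    ["Date", "Product", "Quantity", "Price"],
    ["2024-01-05", "Widget", "3", "9.99"],
    ["2024-01-06", "Gadget", "1", "24.50"],
]
ragged = [
    ["Date", "Product", "Quantity", "Price"],
    ["2024-01-05", "Widget", "3"],  # missing the Price cell
]
```

Here is_well_structured(sales) returns True and is_well_structured(ragged) returns False. Tables with merged cells or rows that wrap across lines in the PDF tend to look like the ragged case.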


Steps to Integrate PDF Files

Step 1: Add the PDF Connector

  • Navigate to the Integrations module in Edilitics.

  • Click on New Integration.

Edilitics dashboard in light mode with no integrations yet and an option to create a new integration.
  • Search for and select the PDF connector.
Edilitics dashboard in light mode displaying various integration categories and a selected PDF integration.

Step 2: Configure the Integration

Enter the following details on the setup screen:

Field Name | Details
Integration Title | A unique identifier for your integration.
Integration Description | A concise summary of the tabular data being extracted.
File Upload | Upload the PDF file directly from your local storage (must be ≤ 100 MB).
Password Protection | Specify if the file is password-protected (True or False). If True, provide the password.
Edilitics interface with an active PDF integration setup, showing fields for integration name and description, and an upload button for the PDF file in light mode.
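The relationship between the setup fields, in particular that a password is expected only when Password Protection is True, can be sketched as a small data structure. This is purely illustrative: Edilitics collects these values through the setup screen, and the class PdfIntegrationForm, its attribute names, and the assumption that the title must be non-empty are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PdfIntegrationForm:
    """Hypothetical local model of the fields on the PDF setup screen."""
    title: str
    description: str
    file_path: str
    password_protected: bool = False
    password: Optional[str] = None

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the form is consistent."""
        issues = []
        if not self.title.strip():
            # Assumption: a blank title is not accepted.
            issues.append("Integration Title is required")
        if not self.file_path.lower().endswith(".pdf"):
            issues.append("File Upload should point to a .pdf file")
        if self.password_protected and not self.password:
            issues.append("Password is required when Password Protection is True")
        if not self.password_protected and self.password:
            issues.append("Password given but Password Protection is False")
        return issues
```

For example, PdfIntegrationForm("Q3 invoices", "Invoice line items", "invoices.pdf").problems() returns an empty list, while setting password_protected=True without a password reports the missing password. Title uniqueness cannot be checked locally, since it depends on the integrations that already exist in the account.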

Step 3: Validate and Save

  • Click Test & Save Connection to validate the uploaded file.

  • Edilitics scans the file, extracts any tabular data it finds, and validates schema compliance.

  • Upon successful validation:

    • Extracted tables are stored as separate tables with the naming convention: "PageNo_Table".

    • The file and data are securely encrypted and saved for use in workflows.

Need Assistance? Edilitics Support is Here for You!

Our dedicated support team is ready to assist you. If you have any questions or need help using Edilitics, please don't hesitate to contact us at support@edilitics.com. We're committed to ensuring your success!