Parquet (.parquet) Integration with Edilitics
Parquet is an open-source, columnar file format designed for efficient data storage and retrieval. Because values are stored column by column, a query that needs only a few columns can skip the rest of the file, which reduces I/O and processing time. Parquet handles large, complex datasets with varied data types and supports compression and encoding schemes that reduce storage requirements without losing data.
In Edilitics, Parquet files can be used only as data sources, where they serve as a starting point for data ingestion and downstream analysis. This guide walks you step by step through configuring a Parquet file and integrating it into Edilitics.
Before You Begin
Ensure the following prerequisites are met:
- File Size Limit: Parquet files must not exceed 100 MB.
- Data Validation: Ensure the file is well-formed and follows Parquet standards.
- Usage Constraints:
  - Parquet files are only supported as data sources, not destinations.
  - Workflows using Parquet files:
    - Allow full loads with "Schedule as Once" in Replicate.
    - Support "Schedule as Once" in Transform.
    - Do not support auto updates or data refreshes in Visualize.
- AI Column Insights: Parquet files are not eligible for AI Column Insights.
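Before uploading, you may want a quick local check against the file size and format prerequisites. The following is a minimal Python sketch, not an Edilitics tool: `precheck_parquet` is a hypothetical helper, and it assumes "100 MB" means 100,000,000 bytes (the stricter decimal reading). It compares the file size to that limit and checks for the 4-byte `PAR1` magic string that standard Parquet files carry at both the start and the end. It does not parse the footer metadata, so passing it does not guarantee that Edilitics validation will succeed.

```python
import os
import sys

# Assumption: "100 MB" is read as 100,000,000 bytes, the stricter decimal reading.
MAX_BYTES = 100_000_000
MAGIC = b"PAR1"  # standard Parquet files start and end with these 4 bytes
MIN_SIZE = 12    # header magic + 4-byte footer length + footer magic


def precheck_parquet(path: str, max_bytes: int = MAX_BYTES) -> list[str]:
    """Return a list of problems; an empty list means the quick checks passed."""
    problems: list[str] = []
    size = os.path.getsize(path)
    if size > max_bytes:
        problems.append(f"file is {size} bytes, over the {max_bytes}-byte limit")
    if size < MIN_SIZE:
        problems.append("file is too small to be a valid Parquet file")
        return problems
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)  # jump to the last 4 bytes of the file
        tail = f.read(4)
    if head != MAGIC or tail != MAGIC:
        problems.append("missing PAR1 magic bytes at the start or end of the file")
    return problems


if __name__ == "__main__":
    # Usage: python precheck.py FILE [FILE ...]
    for path in sys.argv[1:]:
        problems = precheck_parquet(path)
        print(f"{path}: " + ("; ".join(problems) if problems else "OK"))
```

Run it with one or more file paths as arguments; it prints `OK` for each file that passes both quick checks and lists the problems otherwise.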
File Security and Management
Edilitics secures and manages uploaded Parquet files as follows:
- Security Scans: All uploaded files are scanned for security risks and checked for data accuracy.
- Encryption: Files are encrypted in storage and decrypted only when accessed or used in workflows (Replicate, Transform, Visualize).
- Permanent Deletion: When you delete an integration, the associated file is permanently removed from Edilitics systems, ensuring compliance with data privacy policies.
Supported Data Structures
Edilitics supports the following data structures in Parquet files:
Data Structure | Description | Example |
---|---|---|
Nested Structures | Arrays (lists), Maps (key-value pairs), and Structs (records with named fields). | Array: {"skills": ["Python", "SQL", "Data Analysis"]} |
DataFrames | Two-dimensional tabular data with labeled axes (rows and columns). | Sales data with columns for Date, Product, Quantity, and Price. |
Note: For the best performance in Edilitics, structure your Parquet data as a DataFrame (flat rows and columns).
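If your source data is nested and you want the flat, DataFrame-style layout the note recommends, one option is to flatten it before writing the Parquet file. The following minimal Python sketch uses only the standard library; `flatten_struct` and `explode` are hypothetical helper names, not Edilitics features. It turns Struct- or Map-like fields into dotted column names and expands an Array-like column into one row per element, a pattern often called "exploding". Whether flattening suits your analysis is a judgment call, since it repeats the non-array values on every row.

```python
from typing import Any


def flatten_struct(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Turn nested dicts (Struct/Map-like values) into dotted column names."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse so {"profile": {"name": ...}} becomes "profile.name".
            flat.update(flatten_struct(value, prefix=f"{column}."))
        else:
            flat[column] = value
    return flat


def explode(rows: list[dict[str, Any]], column: str) -> list[dict[str, Any]]:
    """Emit one row per element of a list-valued (Array-like) column."""
    out: list[dict[str, Any]] = []
    for row in rows:
        values = row.get(column)
        if isinstance(values, list):
            # An empty array still yields one row, with None in that column.
            for item in values or [None]:
                out.append({**row, column: item})
        else:
            out.append(dict(row))
    return out


if __name__ == "__main__":
    # Hypothetical record modeled on the Nested Structures example in the table.
    records = [
        {"id": 1, "profile": {"name": "Ana", "team": "BI"},
         "skills": ["Python", "SQL", "Data Analysis"]},
    ]
    rows = explode([flatten_struct(r) for r in records], "skills")
    for row in rows:
        print(row)
```

The resulting rows can then be loaded into a DataFrame and written to Parquet, for example with `pandas.DataFrame(rows).to_parquet("people.parquet")`, which requires the pyarrow or fastparquet package.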
Steps to Integrate Parquet Files
Step 1: Add the Parquet Connector
- Navigate to the Integrations module in Edilitics.
- Click New Integration.
- Search for and select the Parquet connector.
Step 2: Configure the Integration
Enter the following details on the setup screen:
Field Name | Details |
---|---|
Integration Title | A unique identifier for your integration. |
Integration Description | A concise summary of the Parquet data being integrated. |
File Upload | Upload the Parquet file directly from your local storage (must be ≤ 100 MB). |
Step 3: Validate and Save
- Click Test & Save Connection to validate the uploaded file.
- Edilitics checks the file's schema and scans it for security issues.
- If validation succeeds, the file is encrypted and saved for use in workflows.