Load Types in Edilitics | Optimizing Data Replication
Edilitics offers four load types for data replication, each designed for a specific data management requirement: Full Load, Incremental Load, Change Data Capture (CDC), and Historical Load. Understanding these load types and their use cases is essential for building efficient, reliable replication workflows.
1. Full Load
Concept
A Full Load replication involves transferring the entire dataset from the source to the destination during each replication cycle. This process disregards any previous data loads and completely overwrites the destination data with the source data.
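The overwrite behaviour described above can be sketched in a few lines. This is an illustrative sketch only, not Edilitics' implementation: it uses Python's built-in `sqlite3` module with two in-memory databases, and the `customers` table and `full_load` function are hypothetical names chosen for the example.

```python
import sqlite3

def full_load(source, dest):
    """Replace every row of the destination table with the current source rows."""
    rows = source.execute("SELECT id, name FROM customers").fetchall()
    with dest:  # single transaction: the delete and the re-insert commit together
        dest.execute("DELETE FROM customers")
        dest.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", rows)
    return len(rows)

# Hypothetical source and destination, both in memory.
source = sqlite3.connect(":memory:")
dest = sqlite3.connect(":memory:")
for db in (source, dest):
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

source.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
dest.execute("INSERT INTO customers VALUES (99, 'stale row')")
dest.commit()

copied = full_load(source, dest)
```

After the call, the stale destination row is gone and the destination holds exactly the two source rows, regardless of what it contained before.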
Use Cases
- Initial Data Migration
- Complete Dataset Refresh
- Small to Moderate Datasets
When to Select
- When the dataset size permits full transfer without causing performance degradation.
- When absolute data accuracy is paramount, and incremental changes are not required.
- For initial setup before transitioning to Incremental Load or CDC for routine updates.
Performance Implications
- Resource Utilization: Full Load is resource-intensive as it requires the entire dataset to be transferred and processed during each replication cycle. This can lead to significant CPU, memory, and network bandwidth consumption, especially for large datasets.
- Processing Time: The processing time is generally longer due to the volume of data being transferred. As the dataset grows, the time required for replication increases proportionally, potentially causing delays in data availability.
- System Impact: Full Load can significantly impact system performance, particularly on the source and destination databases. Continuous full loads can lead to increased I/O operations, potentially affecting other operations on the databases.
2. Incremental Load
Concept
Incremental Load replication transfers only the records added or modified since the last replication. This process requires a Primary Key or Datetime/Timestamp column in the source table to track changes. The Edilitics UI separates tables that have such a column from those that do not, making it clear which tables are eligible for Incremental Load.
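One common way to track changes with a Datetime/Timestamp column is a "watermark": remember the latest timestamp already replicated and fetch only rows newer than it. The sketch below assumes that approach; it is not taken from Edilitics itself. The `orders` table, the `updated_at` column, and the `incremental_load` function are hypothetical, and timestamps are stored as ISO-8601 strings so they compare correctly as text.

```python
import sqlite3

def incremental_load(source, dest, last_watermark):
    """Copy rows modified after last_watermark and return the new watermark."""
    rows = source.execute(
        "SELECT id, item, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    with dest:
        # INSERT OR REPLACE overwrites an existing row that has the same primary key.
        dest.executemany(
            "INSERT OR REPLACE INTO orders (id, item, updated_at) VALUES (?, ?, ?)",
            rows,
        )
    # If nothing changed, keep the old watermark.
    return rows[-1][2] if rows else last_watermark

source = sqlite3.connect(":memory:")
dest = sqlite3.connect(":memory:")
for db in (source, dest):
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, updated_at TEXT)")

# Source: row 1 unchanged, row 2 modified, row 3 new since the last replication.
source.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "widget", "2024-01-01T10:00:00"),
    (2, "gadget v2", "2024-01-02T09:30:00"),
    (3, "gizmo", "2024-01-03T08:00:00"),
])
# Destination: state as of the previous replication.
dest.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "widget", "2024-01-01T10:00:00"),
    (2, "gadget", "2024-01-01T10:00:00"),
])
dest.commit()

watermark = incremental_load(source, dest, "2024-01-01T10:00:00")
```

Only rows 2 and 3 are transferred; row 1 is left alone. Note that a timestamp watermark like this one cannot see rows deleted at the source, which is one reason to consider CDC when deletions must be replicated.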
Use Cases
- Frequent Updates
- Data Warehousing
- Transactional Systems
When to Select
- When a Primary Key or Datetime/Timestamp column is available to track changes.
- When managing large datasets where only incremental changes need to be replicated to conserve resources.
- For maintaining near real-time updates without the overhead of Full Load replication.
Performance Implications
- Resource Utilization: Incremental Load is more resource-efficient compared to Full Load as it only replicates the changes made since the last replication. This results in lower CPU, memory, and network bandwidth usage, especially for large datasets.
- Processing Time: The processing time is significantly reduced since only a subset of the data is transferred. This allows for more frequent updates with minimal system impact.
- System Impact: Incremental Load minimizes the load on the source and destination databases, as it reduces the volume of data being processed. This ensures that the databases remain responsive and can handle other operations concurrently.
3. Change Data Capture (CDC)
Concept
Change Data Capture (CDC) captures and replicates only the data that has changed (insertions, updates, deletions) since the last replication. CDC is supported for MongoDB, MongoDB Atlas, MySQL, MySQL Google Cloud, PostgreSQL, PostgreSQL Google Cloud, MySQL Server, and MySQL Server on Google Cloud. Edilitics automatically checks log access for these databases and, during flow creation, notifies users whether CDC replication can be set up.
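Conceptually, a CDC consumer replays an ordered stream of change events against the destination. The sketch below illustrates that idea with plain Python dictionaries. The event shape (`op`, `key`, `row`) and the `apply_changes` function are assumptions made for this example and do not reflect the log format of any particular database or of Edilitics.

```python
def apply_changes(destination, events):
    """Apply insert/update/delete events, in log order, to a dict keyed by primary key."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            destination[key] = event["row"]
        elif op == "delete":
            destination.pop(key, None)  # tolerate a delete for a row we never saw
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return destination

# Hypothetical replica state and change log.
replica = {1: {"status": "new"}}
change_log = [
    {"op": "update", "key": 1, "row": {"status": "paid"}},
    {"op": "insert", "key": 2, "row": {"status": "new"}},
    {"op": "delete", "key": 1},
]
apply_changes(replica, change_log)
```

After the three events are applied, row 1 has been updated and then deleted, so the replica contains only row 2. Unlike a timestamp-based incremental load, the deletion is visible because it appears in the change log.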
Use Cases
- High-Volume Data Environments
- Compliance and Auditing
- Real-Time Analytics
When to Select
- When detailed change tracking is required.
- When utilizing databases that support CDC and log access is properly configured.
- For applications necessitating real-time data synchronization with minimal latency.
Performance Implications
- Resource Utilization: CDC is highly efficient in terms of resource utilization. It processes only the changes made since the last replication, which significantly reduces CPU, memory, and network usage, making it ideal for high-volume data environments.
- Processing Time: CDC offers near real-time data replication, with minimal latency. The processing time is typically short because only a small fraction of the data is replicated, depending on the frequency and volume of changes.
- System Impact: CDC has a minimal impact on the source and destination databases, as it leverages database logs to capture changes. This reduces the need for intensive querying and processing, allowing the databases to maintain optimal performance.
4. Historical Load
Concept
Historical Load is applicable for Incremental and CDC flows. It transfers all data that already existed in a given table before the replication flow was created. A Historical data flow executes only once. All scheduled runs for the replication are paused until the Historical data flow completes successfully; once it does, the scheduled runs resume according to their schedules.
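The run-once and pause-until-complete rules can be modelled as a small state machine. The following is a minimal sketch of that gating logic under stated assumptions: the `ReplicationFlow` class, its method names, and the `log` list are hypothetical and exist only to make the ordering visible.

```python
from dataclasses import dataclass, field

@dataclass
class ReplicationFlow:
    historical_done: bool = False
    log: list = field(default_factory=list)

    def run_historical(self, succeeded=True):
        """Run the one-time historical load; it never runs again once it has succeeded."""
        if self.historical_done:
            return False
        self.log.append("historical")
        self.historical_done = succeeded
        return succeeded

    def run_scheduled(self):
        """Run a scheduled load, or skip it while the historical load is pending."""
        if not self.historical_done:
            self.log.append("skipped")
            return False
        self.log.append("scheduled")
        return True

flow = ReplicationFlow()
flow.run_scheduled()   # paused: historical load has not completed yet
flow.run_historical()  # one-time backfill succeeds
flow.run_scheduled()   # scheduled runs resume
flow.run_historical()  # no effect: the historical load already ran
```

In this model a failed historical load (`succeeded=False`) leaves scheduled runs paused, matching the requirement that the historical flow must complete successfully first.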
Use Cases
- Data Backfilling
- Initial Setup for Incremental/CDC Flows
- Data Archive Migration
When to Select
- When establishing Incremental or CDC replication and historical data needs to be loaded initially.
- For a one-time transfer of existing data before initiating regular updates.
- When it is acceptable for scheduled runs to stay paused until the historical data has been transferred.
Performance Implications
- Resource Utilization: Historical Load is resource-intensive due to the need to transfer large volumes of existing data in a single operation. This can lead to high CPU, memory, and network consumption, especially for large tables.
- Processing Time: The processing time for Historical Load can be substantial, as it involves transferring all data in the selected table. This process must be completed before regular replication flows can resume, which may result in temporary delays.
- System Impact: Similar to Full Load, Historical Load can significantly impact system performance. The one-time nature of this load type can cause a temporary spike in resource usage, affecting other operations on the source and destination databases.
Selecting the Optimal Load Strategy
Choosing the appropriate load strategy is critical for efficient and effective data replication. Consider the following factors when selecting the best load type for your specific needs:
Data Volume
- Smaller Datasets: Full Load may be suitable due to its simplicity and comprehensive data transfer.
- Larger Datasets: Incremental Load or CDC is generally preferred to avoid the inefficiencies of transferring large volumes of data repeatedly.
Data Change Rate
- High Frequency of Changes: Incremental Load or CDC is recommended to minimize data transfer and keep the dataset up-to-date.
- Low Frequency of Changes: Full Load might be sufficient as the overhead of transferring the entire dataset is manageable.
Data Consistency
- Critical Consistency Requirements: Full Load is the safest option because every cycle overwrites the destination with the complete source dataset, so no change can be missed.
- Tolerable Delays: Incremental Load and CDC may introduce slight delays in data synchronization but are efficient for ongoing updates.
Performance
- System Resource Considerations: Full Load and Historical Load can be resource-intensive and may significantly impact system performance. These should be used cautiously, especially with large datasets.
- Efficient Resource Use: Incremental Load and CDC are generally more efficient, minimizing the impact on system resources and reducing processing time.
Data Recovery
- Enhanced Recovery Options: CDC often provides better data recovery capabilities as it captures detailed change information, allowing for more granular recovery processes.
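The factors above can be combined into a rough decision rule. The function below is a simplified sketch, not an official Edilitics recommendation engine: the parameter names are hypothetical, what counts as "large" or "frequent" is left to the caller, and preferring CDC over Incremental Load when both are possible is an assumption made for this example.

```python
def suggest_load_type(large_dataset, frequent_changes, cdc_available, has_tracking_column):
    """Suggest a load type from four yes/no answers about the source table."""
    # Small, rarely changing data: the simplicity of a full refresh usually wins.
    if not (large_dataset or frequent_changes):
        return "Full Load"
    # Assumed preference: use CDC when the database supports it and log access works.
    if cdc_available:
        return "CDC"
    # Otherwise fall back to Incremental Load if a Primary Key or timestamp column exists.
    if has_tracking_column:
        return "Incremental Load"
    # No way to track changes: Full Load is the only remaining option.
    return "Full Load"

choice = suggest_load_type(
    large_dataset=True,
    frequent_changes=True,
    cdc_available=False,
    has_tracking_column=True,
)
```

Here `choice` is `"Incremental Load"`: the table is large and changes often, CDC is not available, but a tracking column exists. For Incremental and CDC flows, a Historical Load would typically precede the first scheduled run.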
By carefully evaluating these factors, you can select the load strategy that aligns with your data replication requirements, ensuring optimal performance, consistency, and efficiency.