Hello All,
We propose to add a Compaction operation to the Carbondata data source, in which multiple segments of Carbondata files in memory are merged into a smaller number of segments based on either a size constraint or a count constraint supplied by the user. Compaction applies to both partitioned and simple non-partitioned tables.
We call this operation VACUUM. It comes in two variants:
1. MINOR Vacuum Operation: based on the count of segments to be merged
2. MAJOR Vacuum Operation: based on the size of segments to be merged
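To make the difference between the two variants concrete, here is a minimal Python sketch of how segments might be selected for each one. It is only an illustration under stated assumptions: the Segment type, the function names, the oldest-first ordering of segments, and the count_threshold / size_threshold_mb parameters are hypothetical and are not taken from the design document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    # Hypothetical segment descriptor: an id and its size in MB.
    segment_id: int
    size_mb: float


def select_minor(segments: List[Segment], count_threshold: int) -> List[Segment]:
    """MINOR vacuum (count-based), assumed policy: once at least
    `count_threshold` segments exist, merge the oldest `count_threshold`."""
    if count_threshold < 2 or len(segments) < count_threshold:
        return []  # not enough segments to be worth merging
    return segments[:count_threshold]


def select_major(segments: List[Segment], size_threshold_mb: float) -> List[Segment]:
    """MAJOR vacuum (size-based), assumed policy: take the oldest contiguous
    run of segments whose combined size stays within `size_threshold_mb`."""
    selected: List[Segment] = []
    total = 0.0
    for seg in segments:
        if total + seg.size_mb > size_threshold_mb:
            break  # adding this segment would exceed the size limit
        selected.append(seg)
        total += seg.size_mb
    # Merging fewer than two segments would not reduce the segment count.
    return selected if len(selected) >= 2 else []


def merge(segments: List[Segment], new_id: int) -> Segment:
    # Hypothetical merge: the new segment's size is the sum of its inputs.
    return Segment(new_id, sum(s.size_mb for s in segments))


if __name__ == "__main__":
    segs = [Segment(0, 100), Segment(1, 200), Segment(2, 300), Segment(3, 50)]
    print([s.segment_id for s in select_minor(segs, 3)])
    print([s.segment_id for s in select_major(segs, 350)])
```

With the four example segments, the count-based policy picks segments 0, 1 and 2, while the size-based policy with a 350 MB limit stops after segments 0 and 1 because adding segment 2 would exceed the limit. Stopping at the first oversized segment keeps the merged segments contiguous; a real implementation may choose a different policy.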
This operation will be beneficial because it compacts the multiple segments being maintained into a single segment in memory.
Attached is the initial design document. Please let me know if there are any comments/suggestions.
Thanks and Regards,
Siddharth D Jaiswal