Hello All,
Many queries spend a good percentage of their CPU time in table scan, so
improving the scan path will improve the performance of many queries.
Currently we focus on table scan optimization for tables stored in the ORC
format (though it should be possible to extend this to other data formats
as well).
The overall approach for filter pushdown is depicted in the picture below.
As can be seen, in the new approach (right side), filter evaluation is
pushed down from the engine to the HIVE connector.
[image: image.png]
Below is a list of ideas to optimize the table scan by reducing CPU time.
1. *Filter Pushdown* (only deterministic filters, e.g. id > 10, id IN
(10, 20)): This optimization pushes the filter from the engine down to the
HIVE connector so that only the filtered rows/columns are returned to the
engine layer (a minimal sketch combining ideas 1-3 follows this list).
2. *Efficient Row Skipping*: Currently, if a filter on one column matches
only a small set of rows, we still read all values of the subsequent
columns referenced by the query and immediately discard most of them. This
wastes CPU cycles unnecessarily. It can be optimized so that, for the next
columns, we read only the rows that matched on the previous column.
3. *Avoid unwanted Columns*: With this approach, the connector will return
only the columns required for projection, not the columns used solely in
the filter. E.g. for "SELECT name FROM t WHERE id > 10", the id column is
read to evaluate the filter but is never returned to the engine.
4. *Filter Re-ordering (Part-1)*: Given "Efficient Row Skipping", it is
always beneficial to first process the filter column that leaves the least
number of matching rows, so that subsequent columns need to process only a
relatively small number of rows. In this part, re-ordering will simply
process all columns with a filter before the columns without one; the
ordering among the multiple filtered columns themselves is deferred to
Part-2 (a small sketch of this rule also follows the list).
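To make ideas 1-3 concrete, here is a minimal Java sketch. All names
(ScanSketch, evaluateFilter, readSelectedPositions) are illustrative only,
not the actual connector API; real code would skip positions inside the
ORC reader so that discarded values are never even decoded:

    import java.util.ArrayList;
    import java.util.List;

    public class ScanSketch
    {
        // Idea 1: a deterministic filter (here: id > 10) evaluated inside
        // the connector; only the matching row positions are kept.
        static List<Integer> evaluateFilter(long[] idColumn)
        {
            List<Integer> matched = new ArrayList<>();
            for (int pos = 0; pos < idColumn.length; pos++) {
                if (idColumn[pos] > 10) {
                    matched.add(pos);
                }
            }
            return matched;
        }

        // Idea 2: read only the matched positions of the next column
        // instead of reading every value and discarding most of them.
        static String[] readSelectedPositions(String[] column, List<Integer> positions)
        {
            String[] result = new String[positions.size()];
            for (int i = 0; i < positions.size(); i++) {
                result[i] = column[positions.get(i)];
            }
            return result;
        }

        public static void main(String[] args)
        {
            // SELECT name FROM t WHERE id > 10
            long[] id = {5, 12, 8, 20};
            String[] name = {"a", "b", "c", "d"};
            List<Integer> matched = evaluateFilter(id);
            // Idea 3: only the projected column goes back to the engine;
            // the filter column "id" is dropped here in the connector.
            for (String value : readSelectedPositions(name, matched)) {
                System.out.println(value); // prints b, d
            }
        }
    }

The arrays above merely stand in for column readers; the point is the
order of operations: filter first, then read only the matched positions,
and return only the projected column.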
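And the Part-1 re-ordering rule itself, again only as an illustrative
sketch (ScanColumn is a hypothetical descriptor, not an existing class):

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    public class ReorderSketch
    {
        // Hypothetical column descriptor; 'filter' is null when the
        // column carries no predicate.
        static class ScanColumn
        {
            final String name;
            final String filter;

            ScanColumn(String name, String filter)
            {
                this.name = name;
                this.filter = filter;
            }

            @Override
            public String toString()
            {
                return name;
            }
        }

        public static void main(String[] args)
        {
            List<ScanColumn> columns = new ArrayList<>();
            columns.add(new ScanColumn("name", null));
            columns.add(new ScanColumn("id", "id > 10"));
            columns.add(new ScanColumn("city", "city IN ('X', 'Y')"));

            // Part-1 rule: columns with a filter first, unfiltered ones
            // after; order among the filtered columns is left to Part-2.
            columns.sort(Comparator.comparing((ScanColumn c) -> c.filter == null));
            System.out.println(columns); // [id, city, name]
        }
    }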
In this first proposal, we would like to present a design for all of the
above optimization ideas. The major changes required are in the HIVE
connector to process the filter, along with some changes in the optimizer
to push the filter down to the connector.
1. HIVE Connector (ORC format) - Rajeev Rastogi
2. Optimizer - Nitin Kashyap.
We will require a session configuration variable to enable/disable this
feature. By default it will be disabled. We propose to name the
configuration variable "*orc_predicate_pushdown_enabled*" (so it could,
e.g., be enabled per query with SET SESSION
orc_predicate_pushdown_enabled=true). But we can also discuss naming it
"*experimental_orc_predicate_pushdown_enabled*" until all TODO items are
finished and everything is verified to work fine.
Attached is the design document.
Please let us know if you have any comments/suggestions.
Once the first part is done, it will open up opportunities for further
optimizations, some of which are mentioned below:
5. *Non-deterministic Filter Pushdown (e.g. id1+id2 > 10)*: As part of
this, non-deterministic filters will also be pushed down to the connector
layer and processed in the connector itself (similar to function
processing); see the first sketch after this list.
6. *Filter Re-ordering (Part-2)*: As part of this, re-ordering among the
multiple columns with a filter will be done. We may make use of stats (or
the type of filter) to estimate how many rows each filter is likely to
return and reorder accordingly (see the second sketch after this list).
7. *Sub-field Pruning*: Hetu supports structural complex data types, e.g.
Map, Array, List. In the current approach, the HIVE connector returns the
whole map for a given column and row, which may contain many keys even
though the application is interested in only one of them. The optimization
idea here is to prune the unwanted keys and return only what the user
actually asked for. E.g. for the query "*SELECT ISDCODE('INDIA') FROM
CITIZEN*", the connector would no longer return the ISDCODE values for the
other countries stored in that row's column (see the third sketch after
this list).
8. *Multi-Column OR condition*: Handling of OR predicates that span
multiple columns.
9. In addition to these optimizations, there are some *TODO items in the
design document*, which we can discuss to decide whether they can be taken
up after the first phase.
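For idea 5, a hypothetical sketch of a row-level filter evaluated inside
the connector (RowFilter and the column arrays are illustrative, not
existing classes):

    public class ExpressionFilterSketch
    {
        // Unlike the simple per-column filters of idea 1, this filter
        // needs several columns at once, so the connector evaluates it
        // like a function over each row.
        interface RowFilter
        {
            boolean test(long id1, long id2);
        }

        public static void main(String[] args)
        {
            RowFilter filter = (id1, id2) -> id1 + id2 > 10;

            long[] id1 = {1, 7, 3};
            long[] id2 = {2, 9, 4};
            for (int pos = 0; pos < id1.length; pos++) {
                if (filter.test(id1[pos], id2[pos])) {
                    System.out.println("row " + pos + " matches"); // only row 1
                }
            }
        }
    }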
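For idea 6, one possible shape of the stats-based re-ordering (ColumnFilter
and the selectivity numbers are made up for illustration):

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    public class StatsReorderSketch
    {
        // Hypothetical filter descriptor carrying an estimated selectivity
        // (expected fraction of rows that pass), e.g. derived from column
        // stats or from the shape of the filter.
        static class ColumnFilter
        {
            final String column;
            final double estimatedSelectivity;

            ColumnFilter(String column, double estimatedSelectivity)
            {
                this.column = column;
                this.estimatedSelectivity = estimatedSelectivity;
            }

            @Override
            public String toString()
            {
                return column;
            }
        }

        public static void main(String[] args)
        {
            List<ColumnFilter> filters = new ArrayList<>();
            filters.add(new ColumnFilter("city", 0.4)); // city IN ('X', 'Y')
            filters.add(new ColumnFilter("id", 0.01));  // id = 42

            // Most selective filter first: later filters then run over
            // only the few rows that survived the earlier ones.
            filters.sort(Comparator.comparingDouble((ColumnFilter f) -> f.estimatedSelectivity));
            System.out.println(filters); // [id, city]
        }
    }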
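And for idea 7, a sketch of sub-field pruning on a map column (pruneMap is
hypothetical; the real work would happen while decoding the ORC map, so
unwanted entries are never materialized):

    import java.util.HashMap;
    import java.util.Map;

    public class SubfieldPruningSketch
    {
        // The stored row holds a whole map per row/column, but the query
        // touches only one key, so the connector can return just that
        // entry instead of the full map.
        static Map<String, String> pruneMap(Map<String, String> stored, String wantedKey)
        {
            Map<String, String> pruned = new HashMap<>();
            if (stored.containsKey(wantedKey)) {
                pruned.put(wantedKey, stored.get(wantedKey));
            }
            return pruned;
        }

        public static void main(String[] args)
        {
            // e.g. an ISDCODE map keyed by country name
            Map<String, String> isdCode = new HashMap<>();
            isdCode.put("INDIA", "+91");
            isdCode.put("FRANCE", "+33");

            // Only the 'INDIA' entry is asked for, so the other keys are
            // never shipped to the engine.
            System.out.println(pruneMap(isdCode, "INDIA")); // {INDIA=+91}
        }
    }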
--
*Thanks and Regards,
Kumar Rajeev Rastogi
Cell No - +91 8971367787*