This past December and January, the Vertica team for the Asia-Pacific region held a story contest for Chinese-speaking Vertica customers who have built creative solutions to their data analytics requirements. In addition to reaching out to and learning more about Vertica customers in the region, we wanted to gather use case details and share them with the worldwide Vertica community.
Initially, we had hoped for 10 submissions, but we received a whopping 45 stories for consideration for the Grand Prize, as well as First and Second Runner Up! We were delighted by the enthusiastic response, and here I want to share with you the top three prize winners. All stories were submitted in Chinese, so what follows are summary translations into English.
Second Runner Up: Yutong Bus
Based in Zhengzhou, Henan Province, Yutong Bus is a subsidiary of Yutong Group Co., Ltd., a large-scale industrial group which specializes in bus manufacturing, and supports other strategic businesses such as construction machinery, automotive parts and components, and real estate.
This story was submitted by Qin Chaofeng, Head of Big Data Analysis, who in an earlier life was responsible for the construction of China Mobile’s big data platform and the operation and maintenance management of a Vertica data warehouse. At Yutong Bus, he heads big data analysis for the team responsible for the company’s big data platform and data analytics application.
Yutong Bus’s professional logistics team creates customized logistics solutions.
Our judges felt that Qin Chaofeng’s story was comprehensive and informative, offering a diversified form of tutorial for novices to Vertica, and that it provided good guidance for Vertica users wanting to try some of the techniques for themselves.
Row vs Column Store
Recognizing that many organizations attempt their big data projects using traditional row-store databases, Mr. Qin begins his story with an introduction to Vertica and the benefits of columnar storage. He provides the following simple diagrams to help beginners understand the different architectures.
“For example, suppose we query table1, which has 100 columns, with an SQL statement that reads only four of them: col1, col2, col3, and col4:
SELECT col1, col2, col3, col4 FROM table1;
“In a row-based relational database such as Oracle, the engine scans all of the column data in each row and then extracts col1, col2, col3, and col4. In column-store Vertica, by contrast, only the four columns col1, col2, col3, and col4 need to be scanned, which avoids I/O on the data in the remaining 96 columns.”
Traditional row query
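To make the I/O argument concrete, here is a minimal sketch in Python. It is our own illustration, not code from Mr. Qin's story, and it rests on simplifying assumptions: a hypothetical 1,000-row in-memory table, integer values only, and a count of values read standing in for disk I/O. Both scans return the same four columns, but the row layout has to touch all 100 values in every row.

```python
NUM_ROWS = 1_000   # hypothetical table size
NUM_COLS = 100     # table1 has 100 columns, as in the example
NAMES = [f"col{i}" for i in range(1, NUM_COLS + 1)]
WANTED = ["col1", "col2", "col3", "col4"]

# Row layout: each record is stored contiguously, all 100 values together.
row_store = [tuple(r * NUM_COLS + c for c in range(NUM_COLS)) for r in range(NUM_ROWS)]

# Column layout: each column is stored contiguously, one list per column.
column_store = {name: [row[i] for row in row_store] for i, name in enumerate(NAMES)}


def scan_row_store(wanted):
    """Read every full row, then project the wanted columns."""
    idx = [NAMES.index(w) for w in wanted]
    values_read = 0
    result = []
    for row in row_store:
        values_read += len(row)  # the whole row is read, all 100 values
        result.append(tuple(row[i] for i in idx))
    return result, values_read


def scan_column_store(wanted):
    """Read only the wanted columns, then stitch them back into tuples."""
    cols = [column_store[w] for w in wanted]
    values_read = sum(len(c) for c in cols)
    return list(zip(*cols)), values_read


row_result, row_read = scan_row_store(WANTED)
col_result, col_read = scan_column_store(WANTED)
assert row_result == col_result  # same answer either way
print(row_read, col_read)        # 100000 4000
```

The 25-to-1 ratio here simply reflects reading 4 of 100 columns. Real engines read compressed blocks rather than individual values, so actual savings depend on data types and encodings.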
After noting the large improvements in query speed that a column-store database delivers, Mr. Qin adds: “In addition, another advantage of column storage is that it can apply different compression methods according to each column’s data type, reducing storage and improving I/O performance.”
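The sketch below illustrates the idea with two simple column encodings in Python: run-length encoding (RLE), which Vertica offers as a column encoding type, and a basic delta encoding. These are simplified, hypothetical implementations meant to show why per-column compression pays off, not Vertica's actual on-disk formats, and the sample columns are invented.

```python
from itertools import accumulate, groupby


def rle_encode(values):
    """Collapse each run of identical adjacent values into (value, run_length)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]


def delta_encode(values):
    """Keep the first value, then store only differences between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """A running sum of the deltas restores the original values."""
    return list(accumulate(deltas))


# Invented sample columns.
# A sorted, low-cardinality column (such as a province name) suits RLE.
province = ["Henan"] * 4 + ["Hubei"] * 3 + ["Shandong"] * 2
# A steadily increasing integer column (such as a timestamp) suits delta encoding.
event_ts = [1_600_000_000, 1_600_000_005, 1_600_000_010, 1_600_000_012]

print(rle_encode(province))    # [('Henan', 4), ('Hubei', 3), ('Shandong', 2)]
print(delta_encode(event_ts))  # [1600000000, 5, 5, 2]
```

Nine province strings shrink to three pairs, and the small deltas need far fewer bits than full timestamps. Both encodings work best when a column is sorted, which is one reason sort order matters so much in column stores.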
Comparing Vertica to open source and commercial alternatives
With more diagrams, he shows how Vertica can replace a Hive data warehouse computing engine “and the efficiency can be improved by more than 100 times”; then he shows how “Vertica can also be used as your analysis engine to replace existing Spark ad hoc queries, and the efficiency can be improved dozens of times.”
After a brief comparison of the Greenplum, ClickHouse, Kylin, and Vertica OLAP engines, Mr. Qin shows the ~25 lines of Python code “you used to have to write to create, for example, a linear regression model.”
“Now using Vertica, the same can be achieved by calling the function LINEAR_REG… from real-time data access, to data processing, to data analysis, algorithm mining, and visualization applications, Vertica offers one-stop service.”
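For contrast, here is a hypothetical example of the kind of hand-written modeling code that an in-database function such as LINEAR_REG replaces: a single-predictor ordinary least squares fit in plain Python. This is our own sketch, not the ~25 lines from Mr. Qin's story, and the sample data is invented.

```python
def fit_simple_ols(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and the co-deviation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model: tuple[float, float], x: float) -> float:
    """Apply a fitted (intercept, slope) model to a new x value."""
    intercept, slope = model
    return intercept + slope * x


# Invented sample data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
model = fit_simple_ols(xs, ys)
print(model)               # (1.0, 2.0)
print(predict(model, 10))  # 21.0
```

Even this toy version needs data loading, validation, and prediction logic handled by hand, and it covers only one predictor. With an in-database function, training runs where the data already lives instead of in a separate Python process.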
First Runner Up: Beijing Com&Lan Tech. Corp., Ltd.
Founded in 1998, Beijing Com&Lan is committed to “making IT simpler, making business safer,” and to helping IT drive rapid business development. It has since grown into a first-class IT service company in China. Its story was submitted by Wang Ke, a Technical Consultant at Com&Lan who holds Vertica ASP/CSP technical expert certification and is proficient “in GBase, Oracle, and other databases.” Wang Ke has “led and participated in the implementation of multiple PB-level MPP clusters, and has supported more than 20 sets of MPP clusters in China. Now he is committed to providing MPP solutions and technical consulting services for customers in the fields of operators / bank payment / commerce and trade.”
From the Com&Lan website
Our judges noted this story for its high degree of innovation, Wang Ke’s unique technical understanding, and good communication, also describing his current role in database expertise as “exemplary.”
Mr. Wang begins his story by describing today’s explosive growth in data volumes, the need for analytical databases, the superior performance of MPP architecture, and the value of separating compute and storage. Focusing on Vertica deployed in Eon Mode, he details an “experiment using Docker to simulate four CentOS hosts, and builds a Vertica in Eon Mode community edition cluster based on MinIO as shared storage.”
Mr. Wang then provides the code used to configure the Docker network, as well as the single-node MinIO deployment. For example:
- Pull the MinIO image
[root@docker ~]# docker search minio
[root@docker ~]# docker pull minio/minio
- Start MinIO
[root@docker ~]# mkdir -p /data/minio  # create the MinIO directory
[root@docker ~]# docker run -itd --name minio --restart=always \
  --network vnet --ip 172.xxx.xxx.15 \
  -e "MINIO_ACCESS_KEY=minio" \
  -e "MINIO_SECRET_KEY=minio123456" \
  -v /data/minio:/data \
  -v /data/minio:/root/.minio \
  minio/minio server /data
[root@docker ~]# docker ps -a
CONTAINER ID   IMAGE         COMMAND                  CREATED        STATUS       PORTS      NAMES
9f7bc1649168   minio/minio   "/usr/bin/docker-ent…"   2 months ago   Up 3 hours   9000/tcp   minio
Note: At this point, the single-node MinIO server is deployed. The access key and secret key are minio and minio123456, respectively.
Mr. Wang then steps through what the user will see on the MinIO Console, as well as:
- creating a directory
- creating a Vertica image
- customizing and building the image in Docker
- installing Vertica
- creating a database.
These simple, clear steps should help a Vertica novice understand how Vertica in Eon Mode works with communal storage and Docker. Thanks, Wang Ke!
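As a rough companion to the database-creation step, the sketch below assembles the two pieces an Eon Mode database on MinIO typically needs: an auth_params.conf file that points Vertica at the S3-compatible endpoint, and an admintools create_db command. The parameter and flag names follow our reading of Vertica's on-premises Eon Mode documentation and should be checked against the docs for your Vertica version. The host names, bucket, depot path, shard count, password, and endpoint are hypothetical placeholders (the MinIO IP address in Mr. Wang's story is elided); only the access and secret keys come from the story.

```python
import shlex


# Hypothetical helper: renders auth_params.conf for an S3-compatible store such
# as MinIO. Parameter names (awsauth, awsendpoint, awsenablehttps) are assumed
# from Vertica's Eon Mode documentation; verify them for your version.
def minio_auth_params(access_key: str, secret_key: str, endpoint: str,
                      use_https: bool = False) -> str:
    lines = [
        f"awsauth = {access_key}:{secret_key}",
        f"awsendpoint = {endpoint}",
        f"awsenablehttps = {1 if use_https else 0}",  # no TLS was set up for MinIO in the story
    ]
    return "\n".join(lines) + "\n"


# Hypothetical helper: builds the admintools create_db argument list for an
# Eon Mode database whose communal storage lives in a MinIO bucket.
def eon_create_db_args(db_name: str, hosts: list[str], bucket: str,
                       depot_path: str, shard_count: int, password: str,
                       auth_file: str = "auth_params.conf") -> list[str]:
    if shard_count < 1:
        raise ValueError("shard_count must be positive")
    return [
        "admintools", "-t", "create_db",
        "-x", auth_file,
        f"--communal-storage-location=s3://{bucket}",
        f"--depot-path={depot_path}",
        f"--shard-count={shard_count}",
        "-s", ",".join(hosts),
        "-d", db_name,
        "-p", password,
    ]


# All values except the MinIO keys are hypothetical placeholders.
conf = minio_auth_params("minio", "minio123456", "minio-host:9000")
args = eon_create_db_args(
    db_name="verticadb",
    hosts=["vnode1", "vnode2", "vnode3", "vnode4"],  # the four CentOS containers
    bucket="verticadb",
    depot_path="/home/dbadmin/depot",
    shard_count=4,
    password="ChangeMe",
)
print(conf, end="")
print(shlex.join(args))
```

Generating both pieces from one place keeps the MinIO credentials and the communal storage location consistent across the containers.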
Grand Prize Winner
The Grand Prize winner, based in eastern China, wishes to remain anonymous, but offers a great example of what our judges called a “deep deployment of Vertica for seven years, product understanding, a solid deployment experience base, with indicators that have been scientifically proven” giving this story “strong persuasive power.”
A very large Vertica cluster
“Vertica has been used for nearly 7 years,” and “with the development of the business, the Vertica cluster has gradually grown. In 2020, the MPP master data warehouse was expanded to 138 nodes, making it the largest single cluster, by node count, in the [also anonymous] industry.”
But, before all that…
“Before Vertica, the data warehouse used multiple minicomputer RAC and SAN storage architectures. The data warehouse had been continuously expanded and upgraded over years of business development, but the horizontal expansion possible under this architecture … was not enough to meet business needs. The shared RAC approach also caused exponential growth in communication traffic between nodes and simply could not be expanded further. At the same time, the timeliness of data reporting and the performance of report analysis and calculation could not meet business requirements. There was an urgent need for better performance, lower cost, and greater scalability.”
The results of using Vertica
After noting several Vertica advantages (especially its columnar design, powerful active data compression, and greater speed at lower cost compared to row-store databases), this user describes how Vertica specifically solved several of the problems that had formerly plagued the business:
“Through continuous construction and distributed database transformation, the problem of traditional architectures, minicomputers, and SAN storage being unable to provide adequate performance and scalability for petabyte-scale data sets has been solved. Today, Vertica provides data services that support professional business decision-making in three major markets. It has greatly improved system performance and data analysis capabilities, while reducing costs and increasing efficiency.”
The characters are Chinese, but the numbers don’t need translation! This is the performance improvement, in seconds, that resulted from moving from Oracle to Vertica.
“Take the user table, for example. Summary analysis took 1835 seconds on the traditional database and 299 seconds on Vertica. That cuts summary analysis time by 1536 seconds and greatly improves data analysis capability. Overall, processing on Vertica finishes 2 hours earlier than on the traditional database, so the timeliness of group assessment and reporting and of key business indicators is effectively guaranteed.”
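The two timings quoted above can be checked with a few lines of arithmetic. The speedup and percentage reduction printed below are derived from the customer's numbers; they are not figures the customer reported.

```python
# Timings quoted in the story for the user-table summary analysis (seconds).
traditional_s = 1835
vertica_s = 299

saved_s = traditional_s - vertica_s   # absolute time saved
speedup = traditional_s / vertica_s   # how many times faster Vertica ran
reduction = saved_s / traditional_s   # fraction of the original time removed

print(saved_s)             # 1536
print(f"{speedup:.1f}x")   # 6.1x
print(f"{reduction:.1%}")  # 83.7%
```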
If you can read Chinese, and you’d like to see the full stories (untranslated), check out the following pages:
- Grand Prize winner: https://www.modb.pro/db/229188
- First Runner Up: https://www.modb.pro/db/212524
- Second Runner Up: https://www.modb.pro/db/193852