Managing the Storage Database
Extended Monitoring stores your Vertica database's monitoring data in a dedicated MC storage database.
To use Extended Monitoring, you must first set up the storage database and configure it for Kafka streaming. Then, turn on Extended Monitoring for any or all monitored databases.
MC automatically configures a schema for the storage database, named dcschema, which is synced with DC tables on your monitored databases.
MC Preparation
First, verify that MC is not installed on the same host as a Vertica database. When Extended Monitoring is enabled, running MC on the same host as a production database can degrade database performance.
You must also increase the allocation of memory for the MC application server, as described in the next section. Tune the memory allocation options based on:
- The demands of your database.
- The amount of monitoring data you plan to view in MC charts at the same time.
For example, MC requires more memory to display a week of data in a chart than to display a shorter time range.
Modify Memory Allocation
To modify memory allocation:
- In Management Console, select the Configuration tab on the MC Settings page.
- Modify the following fields under Application Server JVM Settings to increase the allocation of memory for the JVM:
- Initial Heap Size: For Extended Monitoring, a minimum value of 2 GB is recommended. (The default value is 1 GB.)
- Maximum Heap Size: For Extended Monitoring, a minimum value of 4 GB is recommended. (The default value is 2 GB.)
- Click Apply at the top right of the page. A prompt appears to restart MC.
- Click OK to restart MC and save your changes.
Storage Database Requirements
To set up storage for Extended Monitoring, your system must meet the following prerequisites:
- An available host, or an available database whose Vertica version is the same as or later than the version of the database you plan to monitor.
- MC configured for Extended Monitoring. (See MC Preparation.)
- Access to a deployed Kafka server (For details on installing Kafka, see the Apache Kafka site.)
Set Up the Storage Database
To configure the storage database for Extended Monitoring, on the MC Settings page, select the MC Storage DB Setup tab. Modify the settings in each of the three areas:
1) Kafka Broker
Enter the host names or IP addresses, and the ports, for one or more of your deployed Kafka servers.
2) MC External Storage Database
Designate the storage database. You can create a new database or use an existing database.
- Create a new database: Choose this option to create a new single-node cluster on an available host, using a Community Edition license of Vertica. Doing so does not affect your normal Vertica license usage.
- Use an existing database known to MC: To designate a database you have already imported to MC, choose this option. If the schema 'dcschema' exists in the database, a dialog appears. Depending on your system needs, do one of the following:
- To keep the existing schema's data, click Append. For example, if you have already used this database for Extended Monitoring storage and are reimporting it, you can use this option to retain its historical data for continued use.
- To clear the existing schema from the database and create a fresh version of dcschema configured for Extended Monitoring storage, click Remove.
At the Database name prompt:
- Select the database you want to use from the drop-down list.
- To use that database for Extended Monitoring, click Prepare MC Storage database.
Advanced Streaming Options:
To change the value of the Scheduler Frame Duration, click Advanced Streaming Options. Management Console displays the Streaming Options window, where you can modify the default Scheduler Frame Duration that Management Console uses for Extended Monitoring.
The Scheduler Frame Duration is the amount of time given to the Kafka scheduler for each individual frame to process and run the COPY statements, after which KafkaSource terminates the COPY statements. Vertica must have enough time to complete COPY tasks within this duration.
If the frame duration is too short, the scheduler does not have enough time to process all the data, and data can be lost. Management Console's Load page may show errors or messages for microbatches that could not process their data.
Conversely, if the frame duration is too long, the scheduler finishes processing the incoming data early and then waits for the frame to expire before starting the next one. In this case, data is processed with some latency, and the charts in Management Console may lag behind real time.
You can approximate the average available time per COPY using the following equation:
TimePerCopy = (FrameDuration * Parallelism) / Microbatches
This equation provides only a rough estimate. There are many factors that impact the amount of time that each COPY statement needs to run.
Vertica requires at least 100 milliseconds per COPY to function.
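As a rough worked example of this estimate, the following query plugs hypothetical values into the equation: a frame duration of 10,000 milliseconds, a parallelism of 1, and 50 microbatches. These numbers are illustrative assumptions only, not MC defaults.

```sql
-- Hypothetical inputs, chosen only for illustration:
--   FrameDuration = 10000 ms, Parallelism = 1, Microbatches = 50
SELECT (10000 * 1) / 50.0 AS approx_ms_per_copy;
-- (10000 * 1) / 50 = 200, so each COPY gets roughly 200 ms,
-- which is above the 100 ms minimum.
```

With the same frame duration and parallelism, 200 microbatches would leave only about 50 milliseconds per COPY, which is below the 100 millisecond minimum and suggests that a longer frame duration is needed.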
The Advanced Streaming Options button is enabled only when streaming is turned off. While Kafka streaming is enabled, the button is disabled.
3) Enable Extended Monitoring
Click Select database(s) for extended monitoring.
For more information, see Managing Extended Monitoring on a Database.
Restart the Storage Database
If you stop the storage database while streaming is enabled, streaming to the storage database stops automatically. You must re-enable streaming on the MC Storage DB Setup tab after you restart the storage database.
If streaming to the MC storage database is disabled while Extended Monitoring on your database is on, the Kafka retention policy determines how long streaming can remain disabled without data loss. See Managing Streaming Services for Extended Monitoring.
Discontinue the Storage Database
- Select the Extended Monitoring tab in MC Settings.
- Set Extended Monitoring for all databases to OFF.
- Select the MC Storage DB Setup tab in MC Settings.
- Click Disable Streaming in the MC External Storage Database section to de-activate your storage database.
- Click Remove in the MC External Storage Database section to remove the MC Storage Database from MC.
- Choose whether to keep or remove the data your storage database has collected:
- Keep Data: Existing data is not removed. If you reuse this database for Extended Monitoring storage, you can choose to append newly collected monitoring data to this existing data.
- Remove Data: MC deletes its customized storage schema from the database.
Configure Storage Database Memory Usage
On the Resource Pools tab of the storage database, you can optionally increase the memory size of SYSQUERY and KAFKA_DEFAULT_POOL. For setting resource pool parameters in MC, see Configuring Resource Pools in Management Console.
- SYSQUERY: Reserved for temporary storage of intermediate results of queries against system monitoring and catalog tables. Default setting is 1G. For best performance for MC, set to 2G or higher.
- KAFKA_DEFAULT_POOL: Reserved for all queries executed by the Kafka scheduler. Default setting is 30%, which is the recommended setting. By default, queries spill to the general pool when they exceed the 30% memory size.
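If you prefer SQL to the Resource Pools tab, a minimal sketch of the equivalent change might look like the following. It assumes you are connected to the storage database as a superuser and that the pools are named sysquery and kafka_default_pool in the catalog. Check the names with the first query before changing anything.

```sql
-- Inspect the current settings of the two pools (case-insensitive name match).
SELECT name, memorysize, maxmemorysize
FROM v_catalog.resource_pools
WHERE name ILIKE 'sysquery' OR name ILIKE 'kafka_default_pool';

-- Raise SYSQUERY from its 1G default to the 2G suggested above.
ALTER RESOURCE POOL sysquery MEMORYSIZE '2G';
```

KAFKA_DEFAULT_POOL is left unchanged here because its 30% default is the recommended setting.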
Manage Disk Space
The storage database uses a customized schema, named dcschema. You can monitor the tables in this schema in MC, using the Table Utilization chart on the storage database's Activity tab. The Table Utilization chart lists all the tables in dcschema and their details, such as row counts and column properties. You can sort by row count to identify the tables that are likely to use the most disk space on your storage database. See Monitoring Table Utilization and Projections.
You should regularly drop partitions from dcschema if you have limited disk space for the MC storage database. MC does not automatically drop partitions from the storage database. For more information on dropping partitions, see Dropping Partitions.
The table dc_execution_engine_profiles is partitioned by day. Because this table typically contains the most rows, as a best practice you should drop partitions from this table more often. The following example shows how you can specify partition key 2016-08-22 to drop a partition from dc_execution_engine_profiles.
=> SELECT DROP_PARTITIONS('dcschema.dc_execution_engine_profiles', '2016-08-22', '2016-08-22');
All tables in dcschema other than dc_execution_engine_profiles are partitioned by week. The next example shows how you can drop a partition from the table dc_cpu_aggregate_by_minute, specifying the thirty-fourth week of 2016.
=> SELECT DROP_PARTITIONS('dcschema.dc_cpu_aggregate_by_minute', 201634, 201634);
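Before dropping partitions, it can help to see which partition keys exist and how much space each one uses. The query below is a sketch that assumes the v_monitor.partitions system table and its column names are available in your Vertica version. Run it on the storage database.

```sql
-- List partition keys in dcschema, largest first.
-- v_monitor.partitions reports per projection, so a table with
-- several projections appears once for each of them.
SELECT projection_name,
       partition_key,
       SUM(ros_row_count)  AS row_count,
       SUM(ros_size_bytes) AS size_bytes
FROM v_monitor.partitions
WHERE table_schema = 'dcschema'
GROUP BY projection_name, partition_key
ORDER BY size_bytes DESC;
```

The partition_key values returned by this query show the key format each table uses, which you can then pass to DROP_PARTITIONS.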
Manage Client Sessions
By default, Vertica allows 50 client sessions and an additional five administrator sessions per node. If the storage database reaches this limit, MC switches back to default monitoring and does not use Extended Monitoring data from the storage database.
You can optionally configure the maximum number of client sessions that can run on a single database cluster node on your MC storage database's Settings page:
- On the storage database dashboard, click the Settings page.
- Choose the General tab.
- Enter a value in the Maximum client sessions field. Valid values are 0–1000.
For more details about managing client connections in MC, see Managing Client Connections.
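To judge whether the limit needs raising, you can compare current session counts against the configured maximum. The following sketch assumes that the Maximum client sessions field in MC corresponds to the MaxClientSessions configuration parameter, and that your Vertica version accepts the ALTER DATABASE DEFAULT syntax. The value 100 is a hypothetical example.

```sql
-- Count current sessions on each node of the storage database.
SELECT node_name, COUNT(*) AS session_count
FROM v_monitor.sessions
GROUP BY node_name
ORDER BY node_name;

-- Check the configured limit (assumed parameter name: MaxClientSessions).
SELECT parameter_name, current_value, default_value
FROM v_monitor.configuration_parameters
WHERE parameter_name = 'MaxClientSessions';

-- Raise the per-node limit to a hypothetical value of 100.
ALTER DATABASE DEFAULT SET MaxClientSessions = 100;
```

If session counts regularly approach the limit, raising it may keep MC from falling back to default monitoring.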