Master Blog Series: Vertica Database Administrators

This blog post was authored by Soniya Shah. Are you a database administrator looking for ways to get the most from your Vertica database? If so, this post is for you. You’re already familiar with the technicalities of Vertica – the Tuple Mover, deletes, projections, and more. If you’re looking to get started, check out […]

Vertica Wins at Strata Data Conference

We all love to win and it’s even better when Vertica and our customers win! Earlier this week at the Strata Conference in New York City, our customer Cerner won the “Most Innovative Product” award – a product that is powered by Vertica! Cerner is a leading supplier of health information technology solutions and services. […]

How do you use UDx’s?

We’ve posted a new Product Management feedback survey and we’re wondering what you think about our SDK and how you use the UDx’s. We appreciate all your feedback! You can find the survey here. https://in.hotjar.com/s?siteId=438341&surveyId=109476

Master Blog Series: Getting Started with Vertica

This post was authored by Soniya Shah. Are you a new Vertica user? If so, you’re probably wondering where to start. We’re here to help you on your big data analytics journey, from understanding Vertica terminology to making the most of your resources. If you find yourself asking questions like What does the Tuple Mover […]

What are your Data Loading Preferences?

We’re back with our newest product management survey for this summer! This time we’re asking about how you load your data – everything from the ETL tools you use to how you manage your information. The answers you provide will help us fit Vertica into your infrastructure in a way that is ideal for your […]

Master Blog Series : Vertica in Eon Mode

This post was authored by Soniya Shah. In Vertica 9.0 we introduced Eon Mode. Since then, there have been many improvements in recent releases. Vertica in Eon Mode is a new architecture that separates compute and storage, allowing users to take advantage of cloud economics that enable rapid scaling and shrinking of clusters in response […]

Using Java UDX in Vertica

Michael Flower authored this post. Introduction Vertica has a highly extensible UDx framework, which allows external user-defined functions, parsers and data loaders to be installed onto the Vertica server. This means that a routine written in C++, R, Java or Python can be run in-database as a Vertica SQL function. This blog is based on […]

DataGals, Colin, Meg, and Micro Focus

Crystal North authored this post. What is Inclusion & Diversity? What is an ERG? Those questions are asked often and I am beyond excited to answer them. Inclusion & Diversity are what fuels any sort of growth in any capacity. From family, to friends, to education, to community, to company, to organization, etc etc etc, […]

GDPR READY RECIPE

This blog post was authored by Moshe Goldberg. Disclaimer This document content is for informational purposes only. The content here should not be relied upon as legal advice, or to determine how GDPR applies to your organization or you. We strongly advise working with a qualified legal GDPR professional to understand how the regulation applies […]

Introducing the Vertica ML-Python Library

This blog post was authored by Soniya Shah. One of the coolest things about working at Vertica is our amazing intern program, which often leads to full-time hires. Last year, the Vertica-ML-Python library, also known as vpython, was started as an internship project by Badr Ouali. A year later, he works for Vertica full time […]

Verticaのアップグレード方法

Verticaは、すべてのリリースで、新しい機能を追加し既存の機能を強化します。新機能あるいは改良された機能を使用するには、Verticaを最新リリースにアップグレードしてください。 前提条件 アップグレードする前に、次の手順を実行します。 データベースのフルバックアップを実行します。アップグレードが成功しなかった場合、フルバックアップで現在のバージョンにロールバックすることができます。ハードリンクでのバックアップの取得を検討可能です。ディスク障害が発生すると、バックアップが破損することに注意してください。 新しいバージョンのプラットフォーム要件を確認します。 カタログ用のストレージスペースを確認します。 HDFSコネクタをアンインストールします(Vertica 7.2.xより前のバージョンからアップグレードする場合のみ必要です)。 vertica-R-langなどのVerticaサーバーパッケージと依存関係のあるものについても、アンインストールを実施します。 最高のパフォーマンス結果を得るために、すぐに次のバージョンにアップグレードすることをご推奨します。 例:次の増分バージョンのVerticaにアップグレードします。たとえば、7.1から7.2または8.0から8.1にアップグレードします。詳細については、Vertica Documentation の Upgrade Paths を参照してください。 これらのタスクの完了後、データベースをシャットダウンし、次の手順を実行します。 ステップ タスク 結果 1 既存のデータベースの完全なローカルバックアップを取得し、データベースを停止します。 $ /opt/vertica/bin/admintools -t stop_db -d db-name バックアップが成功した場合、Step 2 に進みます。 バックアップが失敗した場合、Vertica Documentation の Creating Full Backups を参照してください。 2 各ホストにインストールした追加のパッケージをアンインストールします。 rpm -e vertica-package-nameこの手順を省略し、追加のパッケージをアンインストールしない場合、次のステップでVerticaサーバーパッケージのインストールに失敗します。 アンインストールが成功した場合、Step 3 に進みます。 アンインストールが失敗した場合、エラーを確認し、Verticaテクニカルサポートまでお問い合わせください。 3 任意のホストに新しいVerticaサーバーパッケージをrootまたはsudoでインストールします。 # rpm -Uvh pathname $ sudo […]

新規ノードをクラスターに追加する場合の対処方法

より多くのストレージを必要としますか?追加のストレージが必要な場合、データベースにノードを追加することを検討してください。新しいノードは、すべて同時に追加することをご推奨します。 このチェックリストを使用して、ノードを追加することができます。これらは基本的なステップです。Vertica Documentation には、追加のオプションが掲載されています。 ステップ タスク 結果 1 既存のデータベースの完全なローカルバックアップを取得します。 $ vbr -t backup –config $FULLBAK_CONFIGデフォルトでは、進行状況バー以外の出力は表示されません。追加の進行状況情報を含めるには、1-3の間の値が指定可能な「–debugオプション」を使用します。 バックアップユーティリティを実行している際にconfigファイルを指定します。 ファイルが存在しない場合、バックアップはエラーで失敗します。エラーを解決し、Step 2 に進みます。 2 古い、あるいは、未使用のテーブルパーティションを削除します。 => SELECT DROP_PARTITION(table-name, partition value); テーブルパーティションが削除された場合、Step 3 に進みます。 テーブルパーティションが削除されていない場合、テーブル名とパーティションの値を確認して、もう一度やり直してください。 3 ローカルセグメンテーションが無効であることを確認します。無効になっていない場合、無効にします。 => SELECT DISABLE_LOCAL_SEGMENTS(); ローカルセグメンテーションが無効になっていることを確認後、Step 4 に進みます。 4 クラスターのネットワーク帯域幅とCPUパフォーマンスを確認します。 $ /opt/vertica/bin/vnetperf $ /opt/vertica/bin/vcpuperf ノード間のネットワークのレイテンシとスループットを測定し、ハードドライブの速度を測定します。 レイテンシとスループットが初期ベンチマークよりも低い場合、システム管理者に連絡し、性能劣化の問題を解決してください。 5 リバランスを実行するのに十分なストレージ(データベースのサイズの少なくとも40%)があるかどうかを確認します。 各ノードのスナップショットを取得するには、HOST_RESOURCESシステムテーブルの次のフィールドを確認します。 => SELECT host_name, disk_space_used_mb, disk_space_total_mb disk_space_free_mb FROM host_resources; 各ノードのディスクストレージの容量がリバランスを実行するのに十分であることを確認します。十分である場合、Step […]

AHM(Ancient History Mark)が進まない場合の対処方法

AHMが進んでいない場合、次のチェックリストを使用してトラブルシューティングを行います。 ステップ タスク 結果 1 Last Good Epoch(LGE)が進んでいるかどうかを確認します。 => SELECT CURRENT_EPOCH, LAST_GOOD_EPOCH, AHM_EPOCH FROM SYSTEM; LGEが進んでいる場合、Step 2 へ。 LGEが進んでいない場合、Step 5 へ。 2 すべてのノードがUPしているかどうかを確認します。 => SELECT * FROM NODES WHERE NODE_STATE = ‘UP’; すべてのノードがUPの場合、Step 3 へ。 1つ以上のノードがDOWNの場合、下記コマンドを使用してすべてのノードをUPにします。 $ admintools -t restart_node -d <database name> -s <node_name> すべてのノードがUPになった後、Step 4 へ。 3 リフレッシュが実行されていないプロジェクションがないかどうか確認します。 => SELECT PROJECTION_NAME, NODE_NAME, IS_UP_TO_DATE FROM PROJECTIONS WHERE IS_UP_TO_DATE […]

メンテナンスのためにVerticaノード停止する場合の対処方法

メンテナンスのためにVerticaノードをシャットダウンする必要がある場合、このチェックリストが役立ちます。 ステップ タスク 結果 1 すべてのノードがUPであることを確認します。 $ /opt/vertica/bin/admintools -t view_cluster シャットダウン後の長時間にわたるノードリカバリを避けるには、1つ以上のノードが停止している場合、Restarting Vertica on a Host 内の説明にしたがって、なるべく早く検知し、再起動します。 2 ノードを再起動できない場合、 AHMが長い間進んでいないかどうか確認します。 => SELECT current_epoch, ahm_epoch, last_good_epoch FROM system; デリートベクター数を確認します。 => SELECT COUNT(*) FROM delete_vectors; AHMが長時間保持されていて、デリートベクター数が多く、また、複数のテーブルにまたがっている場合、ノードリカバリが遅延する可能性があります。 この場合、ノードリカバリのチェックリスト内の手順に従います。 次の内容があてはまるとします。 データの3%以上に影響を与える削除や更新を行っています。 ノードが長い間停止しています。 多くの更新と削除を行うステージングテーブルがある場合、ノードをリカバリする前にテーブルを削除します。 最後の選択肢は、次のコマンドでAHMを強制的に進めることです。 => SELECT MAKE_AHM_NOW(true); 続いて、停止ノードをリカバリすると、Verticaはバディーノードからデータをコピーすることによって、そのノードを初期状態からリカバリします。 3 ノードの依存関係を確認します。 => SELECT GET_NODE_DEPENDENCIES(); 想定するノード依存関係の場合、(ノードの数 + 1)行をリストします。 ノードの依存関係が正しい場合、Step 4 へ。 ノードの依存関係が正しくない場合、クラスター内のデータをリバランスします。 => […]

ノードのリカバリ処理遅延の対処方法

7.2.x以降を使用している場合、テーブルごとにリカバリを実行してください。詳細については、Vertica Documentation の Recovery By Table を参照ください。 7.1.xより前のVerticaバージョンを使用している場合、ETLジョブ等の更新処理を行う処理を停止してノードリカバリを再開してください。 ステップ タスク 結果 1 リカバリの進捗状況をモニターします。 => SELECT node_name, is_running FROM RECOVERY_STATUS;「is_running = f」 の場合、リカバリは完了しています。 このステートメントを繰り返し実行し、current_completedの値が増加しているかどうかを確認します。増加している場合、リカバリが進行中であることを意味します。 リカバリは進行中ですか? リカバリが進行していない場合、Step 3 へ。 リカバリが進行している、あるいは、終了している場合、Step 2 へ。 2 ノードリカバリは正常に完了しましたか? 正常に完了している場合、チェックリストは完了です。 正常に完了しておらず、リカバリがエラーで終了している場合、Step 6 へ。 3 リカバリは予想よりも遅いですか? 遅くない場合、Step 4 へ。 遅い場合、次の内容を実施します。 iostatを使用し、ディスクI/Oをチェックします。問題がある場合、ディスクI/Oスケジューラをdeadline(HDDの場合)あるいはnoop(SSDの場合)に変更します。 同時に実行されているクエリの数が最大数、あるいはそれに近いかどうかを確認します。=> SELECT node_name, pool_name, max_concurrency, running_query_count from RESOURCE_POOL_STATUS; 「max_concurrency = running_query_count」の場合、クエリのロードが非常に高いと言えます。 a. MAXCONCURRENCY の増加 b. リカバリの再開 c. […]

ノードがSpreadに接続できなくなった場合の対処方法

ノードがSpreadに接続されていない場合、次のチェックリストを使用してトラブルシューティングを行います。 ステップ タスク 結果 1 カタログフォルダ内のspread.confファイルが、クラスター内のすべてのノードで同一であるかどうかを確認します。 $ cat spread.conf spread.confファイルがすべてのノードで同一の場合、 Step 2 へ。 すべてのノードでspread.confファイルが同一でない場合、次の手順を実施します。 正しいspread.confファイルを確認します。ファイルには、すべてのクラスターノードのIPアドレスが含まれているはずです。 正しいspread.confファイルをscpを使用してすべてのノードにコピーします。 2 spreadのポート(デフォルトは4803)が次のいずれかを使用してリスニングしているかどうかを確認します。 $ netstat -ap | grep 4803 もしくは $ telnet <spread IP address> 4803 クラスター内の各ノードには、それぞれのspreadのポートがあります。クラスター内のspreadによって使用されるポートを確認するには、次の手順を実行します。 /database_catalog_directory/spread.conf 内で確認する。 「Spread_Segment 192.168.40.255:4803」のようなSpread_Segmentの行のIPの後の内容(この場合、4803)を確認する。 spreadがリスニングしていない場合、Step 3 へ。 spreadがリスニングしている場合、Step 4 へ。 3 ファイアウォールが有効になっていないことを確認します。 # iptables –L ファイアウォールが有効になっている場合、ファイアウォールを無効にするか、ネットワーク管理者にポートを開くように依頼してください。 4 /etc/hosts ファイルにクラスターのすべてのIPアドレスが含まれているかどうかを確認します。 ファイルがクラスターのすべてのIPアドレスを含んでいて正しい内容の場合、Step 5 へ。 ファイルが正しくない場合、クラスター内のすべてのノードを含めるように/etc/hostsファイルを修正し、Step 5 へ。 5 UDP受信パケットにエラーが含まれていないかどうかを確認します。 $ netstat […]

リバランス処理遅延の対処方法

Verticaクラスターにノードを追加したり、クラスターからノードを削除したりすると、Verticaはすべてのノード上でデータのリバランスを実施します。リバランスに長時間を要する場合、これらの手順を参照して原因を調べます。 前提条件 リバランスを開始する前に、クラスターの正常なリバランスを確実に行うために、以下のステップを実行してください。 1. ETLジョブと競合しない時間帯にリバランスをスケジューリングします。 2. データベースをバックアップします。 3. 古い、あるいは、未使用のテーブルパーティションを削除します。 4. ローカルセグメンテーションが無効であることを確認します。ローカルセグメンテーションが無効になっていない場合、このコマンドを実行して無効にします。 => SELECT DISABLE_LOCAL_SEGMENTS(); 5. vioperfとvnetperfを使用して、CPUとネットワークの帯域幅をそれぞれ確認します。 使用可能な帯域幅が初期ベンチマークの値よりも小さい場合、システム管理者に連絡して、性能が低下している原因となる問題を見つけて修正してください。 6. リバランスを実行するために、データベースのサイズの少なくとも40%のストレージが使用可能であるかどうかを確認します。ストレージの使用状況を確認するには、次のクエリを実行します。 => SELECT node_name, storage_path, disk_space_used_mb, disk_space_free_mb FROM DISK_STORAGE; Linuxファイルシステムで使用可能なディスク容量を確認します。 $ df -h HOST_RESOURCESシステムテーブルから各ノードのスナップショットを取得します。 => SELECT host_name, disk_space_used_mb, disk_space_total_mb FROM HOST_RESOURCES; ストレージが不足している場合、カタログサイズを縮小するための手順を実行してください。 不要なデータ、一時的なデータ、ステージングデータを削除する。 ログファイルをクリーンアップする。 不要なテーブルまたはパーティションを削除する。 新しいドライブを追加し、ストレージのロケーションを追加し、一部のカタログオブジェクトを新しいロケーションに移行する。 リバランスの間に使用される一時スペース用の一時格納領域を追加する。 ビルトインのREFRESHリソースプールの設定を確認します。 => SELECT name, is_internal, plannedconcurrency, maxmemorysize FROM RESOURCE_POOLS WHERE […]

ストレージにアクセスできずVerticaが起動できない場合の対処方法

データベースホストがまだ稼働中で、電源がオンの場合、Verticaストレージにアクセスできなくなることがあります。Verticaが停止している可能性があり、データやカタログのディスクボリュームが使用できません。 次のチェックリストを使用してトラブルシューティングを行います。 ステップ タスク 結果 1 ホストが再起動した際に、ファイルシステムの自動マウントに問題はありましたか? 問題があった場合、 手動でそれらをマウントします。 /etc/fstabにエントリーを追加して、次のサーバーのブート時に自動マウントするようにします。 Linuxのログで、ディスクの障害やエラーの可能性があるかどうかを確認します 問題がなかった場合、Step 2 へ。 2 ハードウェアまたはディスクの障害のために、1つあるいは複数のディスクがオフラインになっていますか? 該当する場合、RAIDセットとハードウェアコントローラーでエラーとメッセージを確認してください。複数のディスク障害がないかチェックします。 該当しない場合、Step 3 へ。 3 ハードウェアのRAID保護の設定、たとえば、RAID 0となっているなど、設定が間違っていないですか? 間違っている場合、ハードウェアRAID保護の設定、RAID 1+0またはRAID 5を使用してRAIDセットを再構築します。 障害が発生したホストのデータは、K-safetyがデータベースに設定されている場合にのみ、別のVerticaホストから復元できます。 間違っていない場合、Step 4 へ。 4 ディスクボリュームの修復が必要ですか? 修復が必要な場合、fsckを実行します。 fsck /dev/sta2修復が必要でない場合、Step 5 へ。 5 もっと情報が必要ですか? 情報が必要な場合、 追加情報とエラーについては、/var/log/messagesを確認します。 製造元のディスクユーティリティ(ハードウェアコンソール)をチェックして、構成を検証し、エラーを確認します。 情報が必要でない場合、Step 6 へ。 6 ディスクエラーのためにディスクが「読み取り専用」でマウントされていますか? 該当する場合、システム上のデータ保護が有効化されています。Verticaは、読み取り専用のファイルシステムを持つホスト上では起動せず、失敗します。 ディスクを再マウントするか、読み取り専用のファイルシステムの原因となるエラーを修正してください。 該当しない場合、Verticaテクニカルサポートまでお問い合わせください。 関連詳細情報 Vertica Documentation の Restart Vertica on a Node をご参照ください。

Verticaホストが遅延している場合の対処方法

状況: 応答時間が遅い サーバーはCPU負荷が高く、プロセスが多すぎる ホストがハングしている可能性があり、接続できない、あるいは、接続がハングしている可能性がある 考えうる原因: 利用可能なメモリが少ない 実行中の同時実行プロセスが多すぎる サーバーがスワップ領域を積極的に使用している 次のチェックリストを使用してトラブルシューティングを行います。 ステップ タスク 結果 1 Vertica以外のプロセスまたはアプリケーションが実行されているかどうかを確認します。 ホストの稼働時間を確認するか、Linuxコマンドを使用して、プロセス、メモリ使用量、およびCPU稼動のリストを表示します。 $ uptime 09:08:14 up 236 days, 8:58, 13 users, load average: 1.72, 2.09, 2.35uptimeの結果は、それぞれ直近の1分、5分、15分のシステムの平均負荷を表しています。 サーバーのアクティビティを表示するその他のコマンドは次の通りです。 top, vmstat, iostat, sar, netstat top コマンドは、詳細なリアルタイムCPUアクティビティ、メモリ使用量、および現在実行中のプロセスの詳細を表示します。 Vertica以外のプロセスまたはアプリケーションが実行されている場合、可能であれば、それらのプロセスを他のサーバーに移動してVerticaデータベースホストへの影響を減らします。 該当しない場合、Step 2 へ。 2 メモリが少なく、スワップ領域を積極的に使用しているかどうかを確認します。 2a. top コマンドを使用して、使用可能なメモリーとスワップ領域の使用状況を表示します。top コマンドの出力により、メモリの使用状況が、以下のように表示されます。 Mem: 16333388k total, 4355992k used, 11977396k free, 402180k buffers […]

Verticaホストが停止している場合の対処方法

このチェックリストでは、Verticaクラスターの各メンバーをホストと呼びます。 Verticaプロセスは、クラスター内の他のVerticaノードと通信します。ノードは、Verticaデータベースソフトウェアを参照します。 ホストが停止している場合、次のチェックリストを使用してトラブルシューティングを行います。 ステップ タスク 結果 1 Verticaホストが物理的に電源オフになっています。サーバーの前面に、ステータスのライトがありません。 電源まわりで問題はありますか? チェックすべき項目: サーバーの電源障害 UPSバッテリー サイトの停電 ラックの電源障害 問題がある場合、各コンポーネントの障害のためにホストの電源がオフになっていることが考えられます。 Step 2 に進み、停電の原因を特定し、解決します。 2 次のようなハードウェアのいずれかが原因でホストがクラッシュしましたか? ディスク、あるいは、ディスクコントローラー CPU メモリ ネットワークアダプター 該当する場合、問題を解決してください。その後、Step 6 へ。 該当しない場合、Step 3 へ。 3 次のいずれかが原因の温度過熱の問題はありますか? 十分な冷却が足りておらず、その結果、高温状態でシャットダウンが発生 データセンターの冷却問題 ラック内の通気不足 換気の遮断 問題がある場合、該当する問題を解決してください。その後、Step 6 へ。 問題がない場合、Step 4 へ。 4 既知のクラッシュまたは障害の時刻を記録します。 Step 5で記録された障害の発生時刻を使用して、クラッシュの根本原因を特定します。 5 Step 4の障害発生時刻を使用して、次のログファイルで障害とその原因を確認します。 ブートログ:/var/log/boot.log Linuxログ(通常のシャットダウン、電源オフ、ハードウェア障害):/var/log/messages ハードウェアコンソールログ。コンソールマネージャーからログを確認 これらのログファイルの情報は、障害の原因を特定するのに役立ちます。 将来の障害を防ぐために、問題解決にあたります。 問題を解決するために必要に応じてVerticaテクニカルサポートまでお問い合わせください。 6 […]

Verticaホストのネットワーク接続に問題がある場合の対処方法

Verticaホストが物理的に利用できる状態で、オペレーティングシステムが稼動していてもVerticaクラスターへのネットワークにアクセスできない場合、外部からの接続ができないことが考えられます。クライアントソフトウェアの接続に失敗し、ユーザーはホストにsshできない状態で、ハードウェアのコンソール接続を確認すると、ホストがまだ稼動しているように見えるとします。 次のような問題が存在しないかどうかを確認します。 ステップ タスク 結果 1 NETWORK_USAGEシステムテーブルから初期情報を取得します。 => SELECT * FROM NETWORK_USAGE; NETWORK_USAGEシステムテーブルに関する情報を取得したら、Step 2 に進みます。 2 デフォルトのルートIPアドレス、あるいは、ルーターアドレスにpingする際に問題がありますか? 問題がある場合、ルーターはダウンしているか、または誤って構成されています。ルーターを再設定して再起動してください。該当システムのネットワーク管理者に相談してください。 問題がない場合、Step 3 へ。 3 ファイアウォールが有効になっていますか? 有効になっている場合、ファイアウォールを無効にして設定を確認し、Step 4 へ。 有効になっていない場合、設定を確認し、Step 4 へ。 4 ファイアウォール設定がVerticaが必要とするポートをブロックしていますか? デフォルト設定を使用し、Verticaが必要とするポートをブロックすると、ファイアウォールの問題が発生することがあります。 ポートをブロックしている場合、次のコマンドを実行してファイアウォールを無効にします。 $ service iptables save $ service iptables stop $ chkconfig iptables off $ service ip6tables save $ service ip6tables stop $ chkconfig ip6tables off ポートをブロックしていない場合、Step […]

クエリの性能が突然低下した場合の対処方法

以前は高速で実行されていたクエリの実行が遅くなり始めましたか? 次のチェックリストを使用し、以前は高速に実行されていたクエリの性能が突然低下した原因を調べます。 ステップ タスク 結果 1 次のコマンドを使用して、ホストのエラーメッセージを確認します。 $ cat /var/log/messages$ dmesg クラスターの状態が良好な場合、Step 2 へ。 クラスターの状態が良好でなく、エラーメッセージが表示された場合は、システム管理者に連絡してください。 サーバー上の問題が解決したら、このチェックリストの次に進みます。 2 次のコマンドを使用して、全ノードのクラスター設定を確認します。 $ /opt/vertica/oss/python/bin/python -m vertica.local_verify クラスターが正しく設定されていない場合、Verticaのドキュメントへのリンクを含むエラーのリストを表示します。リンク先の内容を確認し、エラーを解決してください。 エラーが表示されない場合、Step 3 へ。 3 クラスターの状態を確認する前にデータベースを停止します。 $ /opt/vertica/bin/admintools -t stop_db -d db-name データベースが正常に停止した場合、Step 4 へ。 それ以外の場合、検証スクリプトを実行し、Step 4 へ。 4 クラスターのネットワーク性能を確認します。 $ /opt/vertica/bin/vnetperf ディスクI/Oの性能を確認します。 $ /opt/vertica/bin/vioperf CPUの性能を確認します。 $ /opt/vertica/bin/vcpuperf ネットワーク、ディスクI/O、およびCPUのパフォーマンスの結果を調べ、Step 5 へ。 5 EXPLAINステートメント、あるいは、EXPLAIN LOCAL VERBOSEステートメントを実行し、出力結果を確認し、異常である可能性がある点を特定します。=> EXPLAIN <query> => […]

データベースの性能が劣化した場合の対処方法

データベースの性能が劣化した場合に、次のチェックリストを使用してトラブルシューティングを行います。 次のような問題が存在しないかどうか確認します。 ステップ タスク 結果 1 特定のクエリの性能が遅いですか? 特定のクエリの性能が遅い場合、クエリの性能が突然低下した場合の対処方法チェックリストを参照します。 特定のクエリの性能が遅くない場合、 Step 2 へ。 2 データベース全体が遅いですか? データベース全体が遅い場合、 Step 3 へ。 データベース全体が遅くない場合、このチェックリストは完了です。 3 全ノードが起動しているかどうか確認します。 => SELECT node_name, node_address, node_state FROM nodes WHERE node_state != ‘UP’; 任意のノードがDOWNしている場合、 ノードが停止している理由について調査を行うために、 データベースノードが停止した場合の対処方法チェックリストを確認します。 ノードを再起動します。 $ admintools –t restart_nodes –d <database> -s <nodes_address> ノードが再起動し、性能が向上した場合、このチェックリストは完了です。 ノードが再起動したが、性能がまだ遅い場合、 Step 4 へ。 ノードが再起動しない場合、 データベースノードが停止した場合の対処方法チェックリストへ。 4 大量のデリートベクターが存在していないかどうか確認します。 => SELECT count(*) FROM delete vectors; 1000以上のデリートベクターが存在する場合、デリートベクターの対処方法チェックリストを参照します。 それほど多くのデリートベクターが存在しない場合、Step 5 へ。 5 […]

カタログサイズ肥大化の対処方法

カタログのサイズが10 GBを超えるか、あるいは、カタログが1日あたり5%を超えて増えている場合、カタログサイズが大きいといえます。 このチェックリストには、データベースカタログのサイズをモニタリング、および、削減するための提案事項と推奨事項が記載されています。 データベースカタログには、テーブル、プロジェクション、ユーザー、ノード、ROSなどのメタデータが含まれています。 カタログには2種類のオブジェクトがあります。 グローバルオブジェクトはノード固有ではありません。グローバルオブジェクトには、テーブル、ユーザー、およびノードが含まれます。 どのノードもクエリのイニシエーター(実行開始ノード)として機能するため、グローバルオブジェクトは全ノード上に完全に複製されます。 ローカルオブジェクトはノード固有です。ローカルオブジェクトには、ROS、WOS、および依存関係が含まれます。ストレージレイアウトはノードによって異なる可能性があるため、ローカルオブジェクトは複製されません。各ノード上で、独立してストレージレイアウト情報を保持します。 ノード起動時に、カタログはメモリ上にキャッシュされます。 ステップ タスク 結果 1 全ノード上のカタログサイズを確認します。 => SELECT DATE_TRUNC(‘day’,ts) date, node_name, MAX(catalog_size_in_MB)::int as END_CATLOG_SIZE_MEM_MB FROM ( SELECT node_name, TRUNC((dc_allocation_pool_statistics_by_second.”time”) ::TIMESTAMP,’SS’::VARCHAR(2)) AS ts, SUM((dc_allocation_pool_statistics_by_second. total_memory_max_value – dc_allocation_pool_statistics_by_second. free_memory_min_value))/ (1024*1024) AS catalog_size_in_MB FROM dc_allocation_pool_statistics_by_second GROUP BY 1, TRUNC((dc_allocation_pool_statistics_by_second.”time”) ::TIMESTAMP,’SS’::VARCHAR(2)) ) foo GROUP BY 1,2 ORDER BY 1 DESC, 2; […]

デリートベクターの対処方法

デリートベクターを手動で削除するか、または自動的に削除されないためにトラブルシューティングをする場合、このチェックリストが役立ちます。 ステップ タスク 結果 1 プロジェクションでデリートベクターが多すぎるかどうか(100以上を目安)を確認します。 =>SELECT node_name, schema_name, projection_name, COUNT(*) num_dv, SUM(deleted_row_count) del_cnt, SUM(used_bytes) ubytes, MIN(start_epoch) min_epoch, MAX(start_epoch) max_epoch FROM delete_vectors GROUP BY 1,2,3 ORDER BY 4 DESC; num_dvで確認できるデリートベクターの数が多すぎる(通常、100以上が目安)場合、Step 2 へ。 2 AHMやLGEが進んでいるかどうかを確認します。 =>SELECT current_epoch,ahm_epoch,last_good_epoch FROM system; AHMやLGEが進んでいる場合、Step 3 へ。 AHMやLGEが進んでいない場合、AHM(Ancient History Mark)が進まない場合の対処方法チェックリストを参照してください。 3 Step 1 で確認したエポックの番号が、Step 2 で確認したAHMよりも古いかどうかを確認します。 Step 1 のクエリの結果により、デリートされた行がAHMより新しいエポックの場合、Step 4 へ。 4 下記のコマンドでAHMを進めます。 =>SELECT make_ahm_now(); 次のいずれかを実行します。 […]

データベースプロセスが起動していない場合の対処方法

データベースプロセスが起動していない場合、次のチェックリストを使用してトラブルシューティングを行います。 ステップ タスク 結果 1 Verticaプロセスがどのノード上でも実行されていないことを確認します。 $ ps –ef | grep vertica Verticaプロセスは次のように表示されます。 /opt/vertica/bin/vertica -D <catalog directory> -C <dbname> -n <node name> -h <host IP> -p <port> 続いてのチェックを行う前に、dbadminがカタログディレクトリとデータディレクトリの所有者であり、そのディレクトリにアクセスできることを確認してください。 Verticaプロセスがどのノード上でも実行されていない場合、Step 2 へ。 2 前回のデータベースの停止が正常に実行されたかどうかを確認します。 $ cat <catalog directory of each node>/epoch.log Epoch.logが存在する場合、データベース停止が正常に実行されたことが分かります。 データベース停止が正常に実行されていた場合、Step 3 へ。 データベース停止が正常に実施されていなかった場合、Verticaデータベースをまず起動してみてください。状態によっては、Last Good Epoch(LGE)でVerticaを起動するようにプロンプトが表示されることがあります。 3 パスワードなしのSSH接続で、すべてのノードが互いに接続可能であることを確認します。 $ ssh <IP address of the node> 接続が成功すると、最後のログインの日付、曜日、タイムスタンプ等の詳細情報が表示されます。 […]

データベースノードが停止した場合の対処方法

データベースノードが停止している場合、次のチェックリストを使用してトラブルシューティングを行います。 ステップ タスク 結果 1 データベースがUPの状態であるかどうか確認します。 $ admintools -t db_status -s UP データベースがUPの状態である場合、Step 2 へ。 データベースがUPの状態でない場合、データベースを再起動します。 $ admintools -t start_db -d <Database_name> -p <Database_password> データベースが起動すれば、このチェックリストは完了です。 データベースが起動しない場合、データベースプロセスが起動していない場合の対処方法チェックリストを参照してください。 2 DOWNしている全てのノードを確認します。 => SELECT node_name, node_address, node_state FROM nodes WHERE node_state = ‘DOWN’; DOWNしている全てのノードを確認し、Step 3 へ。 3 DOWNしているノードとSSH接続可能かどうか確認します。 $ ssh dbadmin @<nodedown_ip> ノードにSSH接続できる場合、DOWNしているノード上のVerticaプロセスを再起動します。 $ admintools -t restart_node -d <database_name> -s <node_host_name […]

Query Tuning with Vertica: Dos and Don’ts

This blog post was authored by Eugenia Moreno. Query tuning in Vertica is not an exact science. Recommendations differ based on your database. This document assumes that all nodes in the cluster are UP, your Vertica configuration is ok, and that v*perf tools have been executed. The following diagram shows the query flow in Vertica: […]

Saving an Apache Spark DataFrame to a Vertica Table

Before you save an Apache Spark DataFrame to a Vertica table, make sure that you have the following setup: • Vertica cluster • Spark cluster • HDFS cluster. The Vertica Spark connector uses HDFS as an intermediate storage before it writes the DataFrame to Vertica. This checklist identifies potential problems you might encounter when using […]

Why is Vertica not Ingesting Data From Kafka?

Prerequisite: Verify that Vertica is up and running. If you want to troubleshoot why Vertica is not ingesting data from Kafka, follow this checklist. Step Task Results 1 Check whether Kafka is up and running. a. Examine the server log files for broker errors: $ BROKER_ROOT/logs/server.log If there are errors, consult the Kafka documentation. b. […]

Rebalance Taking a Long Time

After you add a node to your Vertica cluster or remove a node from your cluster, Vertica rebalances the data across all the nodes. If rebalancing is taking a long time, review these steps to find out the probable cause. Pre-Requisites To ensure a successful rebalance of your cluster, before you start the rebalance, take […]

Storage not Accessible and Vertica Fails to Start on Host

When the database host is still up and available and the power is on, Vertica storage may be inaccessible. Vertica may be down and the disk volumes /data and /catalog are unavailable. To troubleshoot, follow this checklist: Step Task Results 1 When the host rebooted, was there a problem auto-mounting the file systems? If yes, […]

Why is the Vertica Host Slow?

Symptoms: Sluggish response time Server is CPU-bound with a high load and too many processes. The host is possibly in a hung state, unable to make connections, or connections are hanging. Possible causes: Low available memory Too many concurrent active processes running Server actively using swap space To troubleshoot, follow this checklist: Step Task Results […]

What Should I do When Vertica Host is Down?

Each member of a Vertica cluster is referred to as a host throughout this checklist. The Vertica process communicates to other Vertica nodes in a cluster. Node refers to the Vertica database software. If your host is down, follow this checklist: Step Task Results 1 The Vertica host is physically powered off. No status lights […]

Debug Vertica Host Network Connectivity

If your host is physically available and the operating system is up and running, but you cannot access your network to the Vertica cluster, then external connections might not be possible. Client software connections are failing, and users are not able to ssh to the host. From observing a hardware console connection, it looks like […]

Upgrading Vertica

Vertica adds new features and enhances existing features with every release. To access the new and improved features, upgrade to the latest release of Vertica. Prerequisites Before you upgrade, perform the following: Perform a full database backup. If upgrade is unsuccessful, a full back up allows you to roll back to your current version. You […]

Expanding Your Cluster with New Nodes

Does your business need more storage? Consider adding nodes to your database if you need more storage. Vertica recommends that you add all new nodes at the same time. You can use this checklist to add nodes. These are basic steps. There are additional options discussed in the Vertica documentation. Step Task Results 1 Perform […]

What Should I do if the Ancient History Mark is not Advancing?

If the AHM is not advancing, troubleshoot using this checklist: Step Task Results 1 Check whether the Last Good Epoch (LGE) is advancing. => SELECT CURRENT_EPOCH, LAST_GOOD_EPOCH, AHM_EPOCH FROM SYSTEM; If the LGE is advancing, go to Step 2. If the LGE is not advancing, go to Step 5. 2 Check whether all nodes are […]

What Should I do to Shut Down Vertica Node for Maintenance?

If you need to shut down a Vertica node for maintenance, follow this checklist. Step Task Results 1 Verify that all cluster nodes are UP. $ /opt/vertica/bin/admintools -t view_cluster To avoid a long node recovery time after shutdown, if one or more nodes is DOWN, identify and restart them using the instructions in Restarting Vertica […]

What Should I do if my Node Recovery is Slow?

If you are running Vertica 7.2.x or later, perform recovery by table. For details, see Recovery By Table in the Vertica documentation. If you are running a Vertica version prior to 7.1.x, stop the ETL jobs and restart node recovery. Step Task Results 1 Monitor progress of recovery: => SELECT node_name, is_running FROM RECOVERY_STATUS; If […]

Why Does My Node Not Connect to Spread?

If your node is not connected to spread, follow the below steps to troubleshoot the connection issue. Step Task Results 1 Check whether the spread.conf file in the catalog folder is identical across all the nodes in the cluster. $ cat spread.conf If the spread.conf flies are identical across all nodes, go to Step 2. […]

Investigate Sudden Performance Degradation of a Query

Have you ever had a formerly fast-running query begin to run slowly? Use the following checklist to investigate the sudden slowdown of a formerly fast-running query. Step Task Results 1 Check for host error messages using the following commands: $ cat /var/log/messages $ dmesg If the cluster is in good health, go to Step 2. […]

What Should I do if the Database Performance is Slow?

Troubleshootusing the following checklist if your database performance is slow. Check if any of the following problems exist: Step Task Results 1 Is the query performance slow? If the query performance is slow, review the Query Performance checklist. If the query performance is not slow, go to Step 2. 2 Is the entire database slow? […]

Catalog Size Debugging

You have a large catalog if your catalog size is more than 10 GB or your catalog is changing more than 5% per day. This checklist gives you some suggestions and recommendations to monitor and reduce your database catalog size. The database catalog contains tractable metadata such as tables, projections, users, nodes, ROSes, and so […]

How Do I Manage Delete Vectors?

If you want to remove delete vectors manually or troubleshoot why they are not being removed automatically, follow this checklist. Step Task Results 1 Check if you have too many delete vectors (more than 100) in a projection. =>SELECT node_name, schema_name, projection_name, COUNT(*) num_dv, SUM(deleted_row_count) del_cnt, SUM(used_bytes) ubytes, MIN(start_epoch) min_epoch, MAX(start_epoch) max_epoch FROM delete_vectors GROUP […]

What Should I do When the Database Process is not Starting?

If you want to troubleshoot why the database process is not starting, follow this checklist. Step Task Results 1 Ensure Vertica is not already running on any node. $ ps –ef | grep vertica The Vertica process displays as follows: /opt/vertica/bin/vertica -D <catalog directory> -C <dbname> -n <node name> -h <host IP> -p <port> Prior […]

What Should I do When the Database Node is DOWN?

When database node is DOWN, troubleshoot using the following checklist. Step Task Results 1 Check whether your database is UP. $ admintools -t db_status -s UP If the database is UP, go to Step 2. If the database if not UP, restart your database. $ admintools -t start_db -d <Database_name> -p <Database_password> If the database […]

Understanding Vertica Query Budgets

This blog post was authored by Shrirang Kamat. The purpose of this document is to explain how the query budget of a resource pool used by the query can influence the initial memory acquisition for a query and how it impacts query performance. For more details about how we compute the query budget, see the […]

How to Code Vertica UDx

This blog post was authored by Ding-Qiang Liu. In analytic businesses supported by Vertica, complex processing logic is sometimes unavoidable. Using ANSI SQL might cause query strings to be much longer, and will slow the query with a huge volume data to query. If using Vertica SDKs, you can encapsulate that general computing logic in […]

Time Series Analysis with Vertica

This blog post was authored by Maurizio Felici. Time series analytics evaluate the values of a given set of variables over time and group those values into a window, based on a time interval for analysis and aggregation. There are a few types of functions that help perform time series analytics: • Event based functions […]

Load Balancing Options

This blog post was authored by Soniya Shah. Connection load balancing automatically spreads the overhead of client connections across the cluster by redirecting connections. Each client connection a host in your Vertica cluster requires memory and processor time. If a lot of clients connect to a single host, this can affect database performance. The initiator […]

Can you tell us about your data lake?

It’s the fourth round of Vertica product management surveys and we have really appreciated getting your feedback! In this survey, we want to know all about your data lake. We want to know what tools you use, how much data is in your lake, and the types of workloads you are running. We are hoping […]

Make data analysis easier with dimensionality reduction

This blog post was authored by Anh Le. Introduction As the number of features in your data set grows, it becomes harder to work with. Visualizing 2D or 3D data is straightforward, but for higher dimensions you can only select a subset of two or three features to plot at a time, or turn to […]

Machine Learning Key Terms

This blog post was authored by Soniya Shah. Machine learning seems to be everywhere these days – in the online recommendations you get on Netflix, the self-driving cars that hyped in the media, and in serious cases, like fraud detection. Data is a huge part of machine learning, and so are the key terms. Unless […]

What

This blog post was authored by Lisa Donaghue. In Vertica 9.1, Management Console (MC) introduces improved monitoring for external tables and HCatalog data, and the ability to revive Eon Mode databases in a new cluster. MC also includes management and monitoring capabilities for databases running in Eon Mode, a database mode previously in beta. Eon […]

Vertica Quick Tip: Replacing an Empty String

This blog post was authored by Jim Knicely. In Vertica an empty string is not treated as a NULL value, so an empty string does equal an empty string. Example: dbadmin=> SELECT ”” = ”” “Are they equal?”; Are they equal? —————– t (1 row) The function REPLACE replaces all occurrences of characters in a […]

What’s New in Vertica 9.1: Python SDK Expands

This blog post was authored by Monica Cellio. Using the Vertica SDK, you can write several types of user-defined extensions (UDxs) to add your own customizations. In a previous release the SDK added Python support for one type, scalar functions (UDSFs). In 9.1 we added Python support for transform functions (UDTFs). User-Defined Transform Functions (UDTFs) […]

Vertica Quick Tip: Empty String Vs. NULL

This blog post was authored by Jim Knicely. An empty string (”) is treated as a NULL value in Oracle, while in Vertica an empty string is not treated as a NULL value. So when using Vertica, if you want to indicate that a column value is unknown, be sure to use NULL and not […]

Vertica Quick Tip: Which Column in my Flattened Table caused the “Subquery used as an expression returned more than one row” Error?

This blog post was authored by Jim Knicely. A flattened table contains columns that are derived from query-expressions that must return only one row and column value, or none. If a column’s query-expression does return more than one row you will get a “Subquery used as an expression returned more than one row” error. If […]

Announcing Vertica 9.x Query Performance Tuning Training

This blog post was authored by Drea Brandford. You have had Vertica up and running in production for a while now; but you’re noticing that, while your analytic queries are executing far faster than they would in a traditional row-store database, they are still not performing quite as fast as you need them to. Chances […]

Reusing EBS Volumes

This blog post was authored by Mark Hayden. You can deploy a Vertica cluster running in Eon or Enterprise mode using EC2 instances that support EBS volumes. This document provides the steps you need to use EBS volumes that were preserved from a Vertica cluster that was terminated, and how to reattach them to revive […]

Vertica Quick Tip: Tuple Mover (TM) Assigned More Resources in Vertica 9.1

This blog post was authored by Jim Knicely. In version 9.1 and later, Vertica increases the default amount of memory assigned to the Tuple Mover (TM) resource pool from 200MB to 5% of available memory to improve performance merging out wide tables. If your General resource pool is defined by a percentage, the Tuple Mover […]

Vertica in Eon Mode: Revive

This blog post was authored by Soniya Shah. Overview An Eon Mode database keeps an updated version of its data and metadata in a communal storage location. After you shut down the database, the data continues to reside in communal storage. When you are ready to use the storage again, you can revive the database […]

Vertica Quick Tip: Monitoring the Ancient History Mark

This blog post was authored by Jim Knicely. Also known as AHM, the ancient history mark is the oldest epoch whose data is accessible to historical queries. Any data that precedes the AHM is eligible to be purged. If a cluster nodes is down or if a database contains projections that need to be refreshed, […]

Vertica Quick Tip: Determining Table Columns that Contain NULL Values

This blog post was authored by Jim Knicely. A client recently asked if we had a Vertica function that could return a list of columns from a table which contain NULL values. There aren’t any built-in functions for that, but there is always a fast solution available in Vertica! Here’s one! Example: dbadmin=> SELECT * […]

Why All Column Stores Are Not the Same

This blog post was authored by Steve Sarsfield. Critical Capabilities for Big Data Databases Imitation is the sincerest form of flattery, but when imitators try to match brilliant engineering against uninspiring execution in order to meet the needs of a bullet on a PowerPoint slide, the imitation tends to be more of a parody, or […]

UPDATE: Vertica Test Results with Microcode Patches for the Meltdown and Spectre Security Flaws

This blog post was authored by Michael Leuchtenburg. Introduction Vertica engineers tested database performance with and without Meltdown and Spectre mitigations on both Haswell and Skylake microprocessors. As compared to no mitigation, with the fastest settings, we found a minimal, average 1% slowdown on Haswell, and an average 25% slowdown on Skylake. These results are […]

Vertica Quick Tip: Getting Better Performance via External Table Row Counts

This blog post was authored by Jim Knicely. The Vertica optimizer uses its own internal logic to determine whether to join one table to another as an inner or outer input. When joining external tables, you can help out the optimizer immensely by letting it know the exact row counts of those external tables! For […]

Understanding Vertica Eon Mode

This blog post was authored by Soniya Shah. This blog post is updated as of July 2018. With Eon mode, Vertica is a database that takes advantage of all elastic compute and scalable storage capabilities in the cloud. In Eon mode, Vertica can take advantage of cloud economics – it can be made to scale […]

Vertica Quick Tip: Keeping DbVisualizer Alive

This blog post was authored by Jim Knicely. Clients often use DbVisualizer as their preferred GUI tool to query Vertica. Developers and DBAs tend to leave DbVisualizer open on their desktop while they are off doing some other activity, only to return to find out that DbVisualizer has disconnected from Vertica. Most likely the Database […]

Vertica Quick Tip: Remove Duplicate Values from a String

This blog post was authored by Jim Knicely. Ever wonder how to get rid of those pesky duplicate values from a string? One way of doing that is via the REGEXP_REPLACE function. Example: dbadmin=> SELECT c1, REGEXP_REPLACE(c1, ‘(?

Introducing the Eon Mode Concept

This blog post was authored by Joy King. Vertica was born at a tipping point in the world of Enterprise Data Warehouses. It was designed from the first line of code to address new levels of data volumes and analytical performance. But it was also designed to break the tightly integrated hardware and software appliances […]

Protected: Japanese Checklist Test

There is no excerpt because this is a protected post.

Vertica Quick Tip: When Modifying a SEARCH_PATH, Don’t Forget to Include PUBLIC

This blog post was authored by Jim Knicely. Each user session has a search path of schemas. Vertica uses this search path to find tables and user-defined functions (UDFs) that are unqualified by their schema name. You can change a session’s search path at any time by calling SET SEARCH_PATH. When you modify a search […]

Sizing Your Vertica Cluster for an Eon Mode Database

This blog post was authored by Shrirang Kamat. Vertica in Eon Mode is a new architecture that separates compute and storage, allowing users to take advantage of cloud economics that enable rapid scaling and shrinking of clusters in response to a variable workload. Eon Mode decouples the cluster size from the data volume and lets […]

Vertica Quick Tip: Use the Overlay Function to Easily Insert a String Into Another String

This blog post was authored by Jim Knicely. The OVERLAY function returns a VARCHAR value representing a string having had a substring replaced by another string. Example: dbadmin=> SELECT OVERLAY(‘See ya @ the Deep Dive Meetup tomorrow in St. Louis, MO!’ PLACING ‘ Vertica’ FROM 13 FOR 0) AS “Awesome Event on 05/15/2018”; Awesome Event […]

What’s New in Vertica 9.1: Precision-Recall Curve and F1-Score Machine Learning Evaluation Functions

This blog post was authored by Ginger Ni. The precision-recall curve is a measure for evaluating binary classifiers. It is a basic measure derived from the confusion matrix. In Vertica 9.1, we provide a new machine learning evaluation function PRC() for calculating precision and recall values from the results of binary classifiers. Along with the […]

Unlock Data Analytics for Dynamic Workloads with Vertica 9.1

This blog post was authored by Sanjay Baronia. Today, cloud infrastructure has made it easier for organizations to consume services and deploy business applications with a pay-as-you-go, OPEX model. This provides a number of incentives to move data to the cloud, especially for variable workloads and use cases that require heavy compute for finite periods […]

Vertica Quick Tip: Who am I?

This blog post was authored by Jim Knicely. There are several ways to figure out who is the session user. This comes in handy for logging. Example: [dbadmin@s18384357 ~]$ vsql -U jim Welcome to vsql, the Vertica Analytic Database interactive terminal. Type: \h or \? for help with vsql commands \g or terminate with semicolon […]

Vertica Quick Tip: Eon Mode

This blog post was authored by Jim Knicely. You can now deploy a Vertica cluster with an Eon Mode database instead of an Enterprise mode database. In Enterprise Mode, the original Vertica “share nothing” configuration, each database node stores a portion of the data and performs a portion of the computation. In Eon Mode, computational […]

Vertica Quick Tip: Using vsql Almost Anywhere

This blog post was authored by Jim Knicely. I’m an old school kind of guy so I love our vsql command line interface (CLI) tool. What’s cool is if I am on a remote Linux host, I can copy the vsql binary from the database host and run then run it locally. Example: [dbadmin@s18384357 ~]$ […]

Vertica Quick Tip: User Defined Functions to the Rescue

This blog post was authored by Jim Knicely. A friend recently asked me why we don’t have certain functions available in Vertica that are offered in databases like MySQL. I shrugged it off as we have bigger fish to fry. But he kept pushing so I showed him how easy it is to implement a […]

How do you use Vertica analytics and machine learning today?

One of the coolest aspects of Vertica is its in-database machine learning functionality. As part of our third round of product management surveys, we’re asking you to take the time to answer how you use Vertica analytics today. We’re hoping to get answers we can use to expand upon our growing analytics and machine learning […]

Vertica in Eon Mode Paper Accepted for ACM SIGMOD/PODS Conference

Every year, the ACM SIGMOD/PODS conference is held as a forum for database researchers, developers, and users to explore innovations in the field and exchange ideas. The conference is an opportunity for thought leaders to talk with each other and hear from one another through presentations, workshops, and tutorials. This year, Ben Vandiver, Shreya Prasad, […]

Vertica in Eon Mode: Tables, Parameters, Functions, and Useful Queries

This blog post was authored by Rakesh Bankula. The purpose of this document is to list new system tables, configuration parameters and meta-functions that were added for Vertica Eon mode. This document also has sample queries you can run on the new system tables to get up to speed with Eon mode. This document assumes […]

Vertica Quick Tip: Get Rid of all Those Q’s

This blog post was authored by Jim Knicely. If you guys are like me I prefer the CLI. So admintools is my friend. However, at many client sites I see admintools is kind of ugly. So to get it looking a little better: Run this in Linux: export NCURSES_NO_UTF8_ACS=1 And add the command to your […]

Vertica Quick Tip: Forget the Column Name of a System Table?

This blog post was authored by Jim Knicely. If you are like me I am constantly forgetting the column names from our system tables. Use this simple trick to get the column names. Example: dbadmin=> SELECT * FROM v_catalog.tables limit 0; table_schema_id | table_schema | table_id | table_name | owner_id | owner_name | is_temp_table | […]

Vertica Quick Tip: Capitalize Only the First Letter of Each Word

This blog post was authored by Jim Knicely. A few years ago when I was an Oracle DBA a client thought they’d stump me by asking how they could capitalize only the first letter of each word for a given input. I immediately said try the initcap function. She was amazed. Guess what, Vertica has […]

Constantly Inspired #StandUp #LiftUp

This blog post was authored by Crystal North. Wow. I have spent the last month reflecting every single work day as to why I love my job. I started this personal focus because I had read one too many status updates on LinkedIn that just seemed so unauthentic, so negative, or so salesy. I wanted […]

Running an Eon Mode Database: Live in Vertica 9.1

This blog post was authored by Sarah Lemaire. Before now, you could only operate your Vertica database in Enterprise Mode, the traditional Vertica architecture where your data is distributed across the local nodes. Now, Vertica 9.1, released in April, 2018, allows you to operate your database in Eon Mode, which was previously released as beta […]

What’s New in Vertica 9.1?

This blog post was authored by Soniya Shah. In Vertica 9.1 we introduce new functionality including: • Eon Mode, now available in production environments • Machine Learning Enhancements • Management Console Updates • Voltage SecureData Integration • Python UDTF • AWS Licensing Updates • Security Updates • Upgrade and Installation Changes • S3 Session Parameter […]

Understanding the Vertica Query Optimizer

This blog post was authored by Soniya Shah. The Vertica query optimizer uses statistics about the data to create a query plan, which contains a set of operations to compute the requested result. Depending on the properties of the projections defined in your database, the optimizer can choose faster and more efficient operations. This is […]

Upgrading Vertica

This blog post was authored by Soniya Shah. With each release, Vertica adds new features and enhancements. To access these new features, you must upgrade your database. Before performing an upgrade, you must shut down your database. Upgrade Path You should always upgrade only to the immediate next version. Upgrading to a version that is […]

Using Vertica Machine Learning to Analyze Smart Meter Data

This blog post was authored by Soniya Shah. Machine learning and data science have the potential to transform businesses because of their ability to deliver non-obvious, valuable insights from massive amounts of data. However, many data scientist’s workflows are hindered by computational constraints, especially when working with very large data sets. While most real-world data […]

Vertica Quick Tip: Check if a Date Range Overlaps another Date Range

This blog post was authored by Jim Knicely. The OVERLAPS Vertica built-in function evaluates two time periods and returns true when they overlap, false otherwise. Example: dbadmin=> SELECT (DATE ‘2018-04-18’, DATE ‘2018-04-21’) OVERLAPS (DATE ‘2018-04-16’, DATE ‘2018-04-19’); overlaps ———- t (1 row) dbadmin=> SELECT (DATE ‘2018-04-18’, DATE ‘2018-04-21’) OVERLAPS (DATE ‘2018-04-22’, DATE ‘2018-04-25’); overlaps ———- […]

Vertica Quick Tip: Getting the Root of a Number

This blog post was authored by Jim Knicely. You can find the square root of a number with the |/ operator and the cube root of a number with the ||/ operator. Example: dbadmin=> SELECT |/25 square_root_of_25, dbadmin-> ||/27 cube_root_of_27; square_root_of_25 | cube_root_of_27 ——————-+—————– 5 | 3 (1 row) Have Fun!

Handling Duplicate Records in Input Data Streams

This blog post was authored by Ravi Gupta. We have often found that sources or operational systems that provide data for further analysis have duplicate records and these are sent to a downstream application or EDW for processing. This post shows a few scenarios of how to handle these duplicate records using various SQL options, […]

Vertica Quick Tip: Using a Preferred Editor in VSQL

This blog post was authored by Jim Knicely. The vsql \e and \edit meta-commands let you edit the query buffer (or specified file) with an external editor. The default editor on Linux is vi and notepad.exe on Windows systems. Because vsql searches the environment variables VSQL_EDITOR, EDITOR, and VISUAL (in that order) for an editor […]

Vertica Quick Tip: Extract Time from a Date

This blog post was authored by Jim Knicely. There are multiple ways to extract just the time from date in Vertica. Which method you choose depends on your preferred result data type. Example: dbadmin=> CREATE TABLE public.test AS dbadmin-> SELECT sysdate, dbadmin-> CAST(sysdate AS TIME) just_time_as_time, dbadmin-> TO_CHAR(sysdate, ‘HH:MI:SS:US’) just_time_as_varchar dbadmin-> FROM dual; CREATE TABLE […]

Hierarchical Partitioning

This blog post was authored by Shrirang Kamat. This document demonstrates how you can use hierarchical partitions, introduced in Vertica 9.0, to solve a few example use cases. For more information, see Hierarchical Partitioning. Partitioning a table helps with managing the information life cycle. Since data for partitions is segregated into different ROS containers, predicates […]

Vertica Quick Tip: Add Days to a Date, excluding SAT and SUN

This blog post was authored by Jim Knicely. Say I want to add 12 days to today’s date April, 12, 2018. That’s easy using date arithmetic. dbadmin=> SELECT ’04-12-2018′::DATE + 12 AS today_plus_12_business_days; today_plus_12_days ——————– 2018-04-24 (1 row) But what if I want to only add 12 “business” days and exclude the “weekend” days? That’s […]

Vertica Quick Tip: Converting Intervals to Numeric

This blog post was authored by Jim Knicely. Intervals measure the difference between two points in time. Converting an interval to a numeric is easy in Vertica because we can perform operations (i.e. division) on them! Examples: dbadmin=> SELECT INTERVAL ’24 HOURS’ / INTERVAL ‘1 DAY’ days; days —— 1 (1 row) dbadmin=> SELECT INTERVAL […]

Vertica Quick Tip: Extract Just Numbers from a String of Characters

This blog post was authored by Jim Knicely. The easiest way to extract just the numbers from a string of characters is to simply remove any character that isn’t a number! Example: dbadmin=> SELECT c1, regexp_replace(c1, ‘\D’, ”) FROM bunch_of_characters; c1 | regexp_replace —————–+—————- A1B2C3D4E5 | 12345 ABCDE12345 | 12345 Phone: 555-0100 | 5550100 1!2$3%4^5 […]

Resource Management

This blog post was authored by Soniya Shah. A Vertica database runs on a cluster of hardware. All loads and queries running against the database take up system resources, such as CPU, memory, disk I/O, bandwidth, file handles, and more. Query performance depends on how many resources are allocated to it. In a single-user environment, […]

Ntile, and why you should love it

This blog post was authored by Curtis Bennett. According to a quick internet search, the hottest recorded temperature in Los Angeles was 109 degrees Fahrenheit, recorded on July 8th, 2017. If you had access to all kinds of weather data (spoiler alert: you do!) you could query the data and figure out the highs and […]

Vertica Quick Tip: Simulating DML Operations on External Tables

This blog post was authored by Jim Knicely. An external table lets us query a text file that sits outside of the database as if it were a table internal to the database. Example: dbadmin=> CREATE EXTERNAL TABLE public.mood (pk INT, mood VARCHAR(100)) AS COPY FROM ‘/home/dbadmin/mood.txt’; CREATE TABLE dbadmin=> SELECT * FROM public.mood; pk […]

Vertica Quick Tip: How old am I?

This blog post was authored by Jim Knicely. The AGE_IN_YEARS function returns the difference in years between two dates, expressed as an integer. This function is very useful when I forget how old I am or how old my wife is being that today is her birthday. Example: dbadmin=> SELECT extract(year from sysdate) “Current Year”, […]

Dynamic Row and Column Access Policies

This blog post was authored by Serge Bonte. Vertica’s row and column access policies can be used to provide extra security on data in your tables. These policies are well covered in Best Practices for Creating Access Policies in Vertica and Dynamic Row and Column Access Policies. In this blog, we will explore how dynamic […]

Vertica Quick Tip: Monitoring CPU Usage

This blog post was authored by Jim Knicely. The V_MONITOR.CPU_USAGE system table records the CPU usage history by node. Example: I can easily find the top 5 highest average CPU usage on my cluster in the previous 24 hours: dbadmin=> SELECT start_time, dbadmin-> AVG(average_cpu_usage_percent) AS avg_cpu_usage dbadmin-> FROM v_monitor.cpu_usage dbadmin-> WHERE start_time BETWEEN NOW() – […]

Vertica Quick Tip: Projection Create Types

This blog post was authored by Jim Knicely. The DDL produced by the EXPORT_OBJECTS function typically contains a create type for a projection. Example: CREATE PROJECTION public.test_super /*+basename(test),createtype(L)*/ ( c1 ) AS SELECT test.c1 FROM public.test ORDER BY test.c1 SEGMENTED BY hash(test.c1) ALL NODES OFFSET 0; In the above example the create type is L […]

Vertica Wins First Place for Edge-Based Analytics at University of Central Arkansas Raspberry Pi Bakeoff

This blog post was authored by Tim Donar. We had the opportunity to enter our Vertica flight tracking demo at the University of Central Arkansas’s university raspberry pi bakeoff. We won first place, competing against a number of other raspberry pi-based projects! For more information on how to use Vertica to track commercial aircraft in […]

Improving Performance and Memory Acquisitions for Vertica Queries

This blog post was authored by Shrirang Kamat. The following design considerations will help you improve the performance and memory of your Vertica queries. When creating table definitions, you should carefully choose the size of the lookup column based on your data. Properly sizing your column based on your data will help to improve performance. […]

Vertica Quick Tip: Summary of Cluster State

This blog post was authored by Jim Knicely. The GET_CLUSTER_STATE_SUMMARY function can be used to quickly view the current status of your Vertica cluster. Example: All nodes are up: dbadmin=> SELECT get_cluster_state_summary(); get_cluster_state_summary ———————————————————————————————————- Cluster State: test2 UP: 4 of 4 (v_test2_node0001, v_test2_node0002, v_test2_node0003, v_test2_node0004) (1 row) One node is down: dbadmin=> \! admintools -t […]

Vertica Quick Tip: What’s the Maximum Column Size?

This blog post was authored by Jim Knicely. There exists a maximum number of characters that can be stored in columns having a data type of VARCHAR or LONG VARCHAR. In case you’ve forgotten those maximum values for the version of Vertica you are using, there are two handy functions that provide this information. Example: […]

Blog Post Series: Using Vertica to Track Commercial Aircraft in near Real-Time – Part 6

This blog post was authored by Mark Whalley. I have discussed in previous blog posts the continuous stream of messages from aircraft transponders, captured and decoded using the DUMP1090 application, which we are planning on feeding into a series of Kafka topics, prior to loading into their corresponding tables in a Vertica database. This blog […]

Vertica Quick Tip: Case Insensitive Session Queries

This blog post was authored by Jim Knicely. By default Vertica is case sensitive when it comes to comparing strings. You can change this behavior by setting the session locale to LEN_S1. Example: dbadmin=> CREATE TABLE test (c1 VARCHAR(25)); CREATE TABLE dbadmin=> INSERT INTO test SELECT ‘Vertica Rocks!’; OUTPUT ——– 1 (1 row) dbadmin=> SELECT […]

Vertica Quick Tip: Automatically Close an Idle Session

This blog post was authored by Jim Knicely. The IDLESESSIONTIMEOUT parameter can be used to close a session that has been idle for a period of time. An idle session is one that has no queries running. Example: dbadmin=> CREATE USER jim IDLESESSIONTIMEOUT ’10 seconds’; CREATE USER dbadmin=> \c – jim You are now connected […]

What is it like being a girl in a room of many boys?

This blog post was authored by Crystal Farley (North). I volunteered to be the Chairperson for the virtual chapter of DataGals because I think connecting women across the world is a great way to continue to #PushforProgress. Why not bridge the gap and create a community where we encourage diversity and inclusion of everyone that […]

How Cisco and Vertica empower high performance analytics for the most demanding workloads

This blog post was authored by Steve Sarsfield. Hadoop and HDFS is capable of storing massive volumes of data, but performing analytics on Hadoop can be challenging. Despite the apparent low-cost cost of Hadoop, it is best suited for data lake and data science solutions, where the number of concurrent analytical users is low. In […]

Vertica Quick Tip: EXPLAIN Plan in JSON Format

This blog post was authored by Jim Knicely. The EXPLAIN command returns the optimizer’s query plan for executing a specified query. Example: dbadmin=> EXPLAIN SELECT the_date FROM big_date_table WHERE the_date_as_date BETWEEN ’09-19-2017′ AND ’09-19-2018′; —————————— QUERY PLAN DESCRIPTION: —————————— EXPLAIN SELECT the_date FROM big_date_table WHERE the_date_as_date BETWEEN ’09-19-2017′ AND ’09-19-2018′; Access Path: +-STORAGE ACCESS for […]

Projection Delete Concerns

This blog post was authored by Curtis Bennett. Deletes in Vertica are a complicated topic. I’ve had many people say to me that they’d heard that one should never do deletes in Vertica, or that deletes in Vertica are slow. Nothing could be further from the truth. But like anything in a database, they should […]

Vertica Quick Tip: On the Fly Date Dimension

This blog post was authored by Jim Knicely. The Vertica TIMESERIES Clause is an important component of time series analytics computation. It performs gap filling and interpolation (GFI) to generate time slices missing from the input records. I like to use it to create a dynamic date dimension for my data warehouse. Example: dbadmin=> CREATE […]

Vertica Quick Tip: Enable \timing from the vsql Command Line

This blog post was authored by Jim Knicely. The vsql \timing meta-command reports, in milliseconds, the length of time each SQL statement runs. Example: dbadmin=> \timing Timing is on. dbadmin=> SELECT COUNT(*) FROM big_varchar_table WHERE pk BETWEEN 1000000 AND 2000000; COUNT ——— 1000001 (1 row) Time: First fetch (1 row): 179.609 ms. All rows formatted: […]

Vertica Quick Tip: My SQL History

This blog post was authored by Jim Knicely. In vsql you can use the \s meta-command to view your command line history: Example: dbadmin=> CREATE TABLE some_table (c1 INT, c2 VARCHAR(10)); CREATE TABLE dbadmin=> INSERT INTO some_table SELECT 1, ‘TEST1’; OUTPUT ——– 1 (1 row) dbadmin=> ALTER TABLE some_table ADD COLUMN c3 int; ALTER TABLE […]

Introducing the Parallel Streaming Transformation Loader (PSTL) Solution

This blog post was authored by Soniya Shah. At Vertica, we understand how important it is that our customers can make decisions in near real time. Being able to do this not only requires the massive parallel processing that Vertica offers, but the ability to transform and ingest your data into Vertica as quickly as […]

Vertica Quick Tip: Repeating

This blog post was authored by Jim Knicely. The Vertica REPEAT function replicates a string the specified number of times, and concatenates the replicated values as a single string. Examples: dbadmin=> SELECT repeat(‘TesT’, 5); repeat ———————- TesTTesTTesTTesTTesT (1 row) dbadmin=> SELECT ‘1’ || repeat(‘0’, 100) AS “One Googolplex”; One Googolplex ——————————————————————————————————- 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 (1 row) This […]

Vertica Quick Tip: Splitting a String into Rows

This blog post was authored by Jim Knicely. The Vertica Text Search feature includes a handy function named StringTokenizerDelim that you can use to split a string into rows by a given delimiter. Example: dbadmin=> SELECT * FROM test; c1 | c2 —-+——- 1 | A|B|C 2 | D|E|F (2 rows) dbadmin=> SELECT c1, words […]

Monitoring Login Failures

This blog post was authored by Curtis Bennett. Security is an ever-present topic on the minds of database administrators. Certainly Vertica has a number of security features like Kerberos, and SSL. This blog isn’t about those! Instead, let’s take a look at a much more simplistic aspect of security which Vertica tracks – login failures. […]

Blog Post Series: Using Vertica to Track Commercial Aircraft in near Real-Time

This blog post was authored by Mark Whalley. As discussed in an earlier blog post, we have a continuous stream of messages from aircraft transponders captured and decoded using the DUMP1090 application. We now want to load this data into a series of Vertica tables. As with the streaming nature of the incoming data, with […]

Vertica Quick Tip: Getting Every n-th Row from a Table

This blog post was authored by Jim Knicely. You can use the Vertica MOD function to grab every nth row from a table. Example: dbadmin=> SELECT * FROM nth_row ORDER BY 1; id | some_data —-+———– 1 | TEST1 2 | TEST2 3 | TEST3 4 | TEST4 5 | TEST5 6 | TEST6 7 […]

Vertica Quick Tip: What

This blog post was authored by Jim Knicely. Planning a road trip to Disney? How far is it? Vertica has a built-in function called DISTANCE, which returns the distance (in kilometers) between two points. You specify the latitude and longitude of both the starting point and the ending point. You can also specify the radius […]

Vertica Quick Tip: Keeping Track of Data Load File Sources

This blog post was authored by Jim Knicely. If you are like most companies you will be loading data into a Vertica table from many different data files. And if you are like me, you’ll forget later which file a table record arrived in. Fortunately the CURRENT_LOAD_SOURCE function returns the file name used when executing […]

Operational Considerations for Database Administrators in Eon Mode

This blog post was authored by Shrirang Kamat. The purpose of this document is to familiarize advanced Vertica Enterprise mode users about some of the behavioral similarities and differences between Vertica Enterprise mode and Vertica Eon mode. This document assumes you have a basic understanding about Eon mode architecture. You can find more details about […]

Vertica Eon Mode: Caching

This blog post was co-authored by Pratibha Rana and Ben Vandiver. Running every query directly against the data in shared storage would result in poor performance and subject the shared storage to heavy load. Eon mode in Vertica introduces a cache called the Depot that avoids reading from the shared storage for frequently used data […]

Vertica Quick Tip: What’s the first day of the month?

This blog post was authored by Jim Knicely. In the last Vertica Quick Tip we learned about the Vertica built-in LAST_DAY function. So where’s the FIRST_DAY function? You’ll have to create your own. So how do you compute the first day of the month for a given date? There are many ways, each involving a […]

The Kafka Streaming Load Scheduler

This blog post was authored by Tom Wall. Vertica’s streaming load scheduler provides high-performance streaming data load from Kafka into your Vertica database. Whether you already use Kafka or not, it is worth considering it as a solution to your data loading challenges. Kafka complements Vertica very nicely, and the scheduler removes many complexities of […]

Understanding the APPROXIMATE_COUNT_DISTINCT Functions

This blog post was authored by Curtis Bennett. The exact computation of the number of distinct values of an expression X on a multi-node architecture requires bringing all distinct values of X (within the specified group if a GROUP BY was specified) to the same node, and then counting the number of distinct values on […]

Vertica Quick Tip: What’s the last day of the month?

This blog post was authored by Jim Knicely. The Vertica built-in LAST_DAY function returns the last day of the month for a specified date. This function comes in handy for leap years. Example: dbadmin=> SELECT last_day(’02/28/2018′) NOT_A_LEAP_YEAR, last_day(’02/28/2020′) A_LEAP_YEAR; NOT_A_LEAP_YEAR | A_LEAP_YEAR —————–+————- 2018-02-28 | 2020-02-29 (1 row) In one of my previous positions, an […]

Vertica Quick Tip: Checking User Role Membership

This blog post was authored by Jim Knicely. The HAS_ROLE function returns a Boolean value that indicates whether a role has been assigned to a user. Example: To create a read only user and role, do the following: dbadmin=> CREATE ROLE read_only_role; CREATE ROLE dbadmin=> CREATE USER read_only; CREATE USER dbadmin=> GRANT read_only_role TO read_only; […]

Aggregate Projections

This blog post was authored by Curtis Bennett. Vertica stores physical data for tables in objects known as projections. Unlike traditional RDBMS’s, Vertica does not rely on indexes for performance. Instead, Vertica stores the physical data (either all or some of the columns) in whatever sort order is required for optimal query processing. This can […]

Vertica Quick Tip: Increasing the Performance of a Rebalance

This blog post was authored by Jim Knicely. Before performing a rebalance, Vertica by default will query system tables to compute the size of all projections involved in the rebalance task. This query can add significant overhead to the rebalance operation! To disable this query, set the configuration parameter RebalanceQueryStorageContainers to 0. Example: dbadmin=> SELECT […]

Vertica Quick Tip: Expiring a User’s Password

This blog post was authored by Jim Knicely. You can expire a user’s password immediately using the ALTER USER statement’s PASSWORD EXPIRE parameter. By expiring a password, you can: • Force users to comply with a change to password policy. • Set a new password when a user forgets the old password. This feature also […]

Vertica Quick Tip: VSQL Shortcuts to Move Faster on the Command Line

This blog post was authored by Jim Knicely. vsql is Vertica’s character-based, interactive, front-end utility that lets you type SQL statements and see the results. If you’ve typed a particularly long query in vsql then realize that you’d have a typo way back at the beginning of your code (i.e. you wrote SEELECT), instead of […]

Improve the Efficiency of Mergeout on Wide Tables

This blog post was co-authored by Xiao Ling and Jim Kelley. Introduction When resource pools were first introduced to Vertica, the average computer had a lot less memory than it does today. The default memory size for the Tuple Mover resource pool, 200 MB, reflects the more limited resources of that period. As hardware and […]

Vertica Quick Tip: Renaming a View

This blog post was authored by Jim Knicely. You are probably aware that you can rename a table using the ALTER TABLE … RENAME command. Example: dbadmin=> \dt test List of tables Schema | Name | Kind | Owner | Comment ——–+——+——-+———+——— public | test | table | dbadmin | (1 row) dbadmin=> ALTER TABLE […]

Vertica Quick Tip: Viewing Query Error Information

This blog post was authored by Jim Knicely. The V_MONITOR.ERROR_MESSAGES system table tracks error and warning messages encountered while processing queries. Example: dbadmin=> CREATE TABLE 123 (c1 INT); ERROR 4856: Syntax error at or near “123” at character 14 LINE 1: CREATE TABLE 123 (c1 INT); ^ dbadmin=> SELECT event_timestamp, user_name, message FROM error_messages ORDER […]

Vertica Quick Tip: Setting a Client Connection Label

This blog post was authored by Jim Knicely. When you connect to a Vertica database you can set a client connection label to help you later identify the connection. Example: dbadmin=> SELECT set_client_label(‘Daily Load’); set_client_label ——————————– client_label set to Daily Load (1 row) dbadmin=> SELECT get_client_label(); get_client_label —————— Daily Load (1 row) dbadmin=> SELECT client_label […]

Vertica Customer Experience Survey

You are a valued Vertica customer and we are interested in your opinions. We want to hear about your experiences with our product, technical support, documentation, and community. Please take a few minutes to respond to our short survey and evaluate our database product here. We look forward to hearing from you!

Blog Post Series: Using Vertica to Track Commercial Aircraft in near Real-Time

This blog post was authored by Mark Whalley. The preceding blog post detailed the hardware requirements used in this project for tracking commercial aircraft in near real-time. In this blog post I will touch on installing the operating system on the Raspberry Pi (RPI) and the DUMP1090 software used for decoding the ADS-B signals being […]

Vertica Quick Tip: Analyzing Table Statistics by Column

This blog post was authored by Jim Knicely. The ANALYZE_STATISTICS function collects and aggregates data samples and storage information from all nodes that store projections associated with the specified table. On a very large wide table it will take a significant amount of time to gather those statistics. In many situations only a few columns […]

Vertica Tip: The System Table for System Tables

This blog post was authored by Sarah Lemaire. Most of you probably know that Vertica provides system tables that allow you to monitor • System resources • Background processes • Workload • Performance • Catalog size These tables help you to profile, diagnose, and view historical data equivalent to load streams, query profiles, Tuple Mover […]

Identifying Projection Skew

This blog post was authored by Curtis Bennett. In Vertica, projections can either be replicated (unsegmented), or segmented. A segmented projection divides the data up across all the nodes in your cluster. Segmentation works by hashing a key value, and then using some simple math, figuring out which node that piece of data will live […]

What Projections are not Being Used

This blog post was authored by Eugenia Moreno. It is common to create new projections to improve performance in Vertica. However, you might forget about the old projections. Vertica still loads data in projections that you might not be using. A projection that is loaded but not picked up by the Vertica optimizer consumes storage […]

Vertica Quick Tip: Sampling Data from the Entire Table

This blog post was authored by Jim Knicely. A development or quality assurance team often times requests access to a sub-set of production data. One way to do that would be to make use of the LIMIT clause. Example: dbadmin=> SELECT COUNT(*) FROM big_number_table; COUNT ———— 1000000000 (1 row) dbadmin=> SELECT 0.05*1000000000 “5_percent”; 5_percent ————- […]

Query Execution in Eon Mode

This blog post was authored by Ben Vandiver. How Vertica distributes query processing across the cluster in Eon mode is a complex topic that is best illustrated through a concrete example. As part of this post, we’ll start with a simple data load and walk through metadata storage and query execution. To begin, we need […]

Vertica Quick Tip: Superfast Table Copy

This blog post was authored by Jim Knicely. Very often we need to make a copy of a very large table in order to do some development or quality assurance type of duties. Typically we’ll use a CREATE TABLE AS SELECT (CTAS) statement to make that copy. Example: dbadmin=> SELECT COUNT(*) FROM big_number_table; COUNT ———— […]

Vertica Quick Tip: Add a Time Zone

This blog post was authored by Jim Knicely. Vertica recognizes many time zones. However, there might come a time (zone) when you will need to reference one that is not available by default. Luckily it’s relatively easy to add a time zone to Vertica. Example: The “Hawaii Standard Time (HST)” is a default time zone […]

What’s New in Vertica 9.0.1: Security and Authentication

This blog post was authored by Soniya Shah. In this release, we introduce some security enhancements. We’ve added the ability to grant and revoke privileges on system tables, using the same syntax as you would for granting and revoking on tables. However, there are some limitations about the types of privileges you can use with […]

Vertica Quick Tip: Default Size of the NUMBER Data Type

This blog post was authored by Jim Knicely. When creating a table where you do not define a precision for a NUMBER column data type, Vertica will use a default precision of 38 digits. Often this is larger than necessary. By specifying NUMBER(37) you will potentially get better query performance and save on storage. Why? […]

Vertica Tip: Predicting the Resources a Statement Needs

This blog post was authored by Eugenia Moreno. You may find you want to set up resource pools before running queries to know how many resources a particular query needs. One way to do this is to create a small resource pool, profile the query, and note when the query is rejected. When the query […]

Beware of Segmentation Islands

This blog post was authored by Curtis Bennett. Many clients who are new to Vertica are also new to big data. While Vertica’s reliance on industry-standard SQL can make the transition very easy, often the introduction of multiple nodes used in support of a database platform can take some getting used to. It is the […]

Vertica Quick Tip: Dynamically Split Up a String

This blog post was authored by Jim Knicely. One of my favorite functions in Vertica is named SPLIT_PART. It splits up a string into parts by a given delimiter. Example: dbadmin=> SELECT split_part(my_text, ‘,’, 1) the_first_part, dbadmin-> split_part(my_text, ‘,’, 2) the_second_part, dbadmin-> split_part(my_text, ‘,’, 3) the_third_part, dbadmin-> split_part(my_text, ‘,’, 4) the_fourth_part dbadmin-> FROM (SELECT ‘ONE,TWO,THREE,FOUR’ […]

What’s New in Vertica 9.0.1: Ranger Integration

This blog post was co-authored by Mitchell Tracy and Monica Cellio. Hadoop clusters can use authorization services to determine which users can access what data in Hive and, by extension, HDFS. In Vertica 9.0 we added support for one of the most common such services, Apache Sentry, and in 9.0.1 we now support Apache Ranger […]

Vertica Quick Tip: Proper Ordering of IP Addresses

This blog post was authored by Jim Knicely. Often times we store IP addresses in a VARCHAR column in a Vertica table. When querying the data and sorting by the IP address, we see that IP addresses are sorted by its VARCHAR value instead of its numeric value. Fortunately Vertica has the INET_ATON function which […]

Vertica at the Aeronaut Brewery: Adventures in Data Architecture

This blog post was authored by Sarah Lemaire. Vertica hosted a Meetup at the Aeronaut Brewery in Somerville for customers and prospective customers, including data scientists from Nuance, J.Jill, and True Fit. After some cold beer to warm us up on a cold night, we were lucky enough to hear JB Huang, Head of Data […]

Vertica Quick Tip: The <=> operator

This blog post was authored by Jim Knicely. The operator performs an equality comparison like the = operator, but it returns true, instead of NULL, if both operands are NULL, and false, instead of NULL, if one operand is NULL. Example: dbadmin=> SELECT 1 = 2 “Returns FALSE”, dbadmin-> 1 2 “Returns FALSE”, dbadmin-> 1 […]

Vertica Quick Tip: A Truly Unique Constraint

This blog post was authored by Jim Knicely. According to the ANSI standards SQL:92, SQL:1999, and SQL:2003, a UNIQUE constraint should disallow duplicate non-NULL values, but allow multiple NULL values. A Unique Constraint in Vertica does just that! Example: dbadmin=> CREATE TABLE test (c1 INT); CREATE TABLE dbadmin=> ALTER TABLE test ADD CONSTRAINT test_uk UNIQUE […]

What’s New in Vertica 9.0.1: S3 Backup Encryption

This blog post was authored by James Kelley. Amazon S3 offers flexibility, efficiency, and scale. But does it offer security? With the release of Vertica 9.0.1, Vertica offers users the ability to encrypt their backups to S3 with server-side encryption. Vertica supports the following forms of S3 encryption: Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3) […]

What’s New in Management Console 9.0.1

This blog post was authored by Lisa Donaghue. Vertica 9.0.1 introduces Management Console (MC) improvements to cloud monitoring. Tag AWS Instances Management Console with Provisioning, available on the AWS Marketplace, includes a Cluster Creation wizard to provision databases on AWS resources. With Vertica 9.0.1, you can tag instances as you create them through the Cluster […]

Vertica Quick Tip: Lightning Fast Text Search

This blog post was authored by Jim Knicely. Searching the contents of a sizeable CHAR, VARCHAR, LONG VARCHAR, VARBINARY, or LONG VARBINARY field within a table to locate a specific keyword can be quite time consuming. Especially when dealing in Big Data. Fortunately, Vertica includes a text indexing feature which allows you to query that […]

Vertica Quick Tip: Generating a Random String

This blog post was authored by Jim Knicely. We saw in a previous Vertica Quick Tip that we can create a SQL function that generates random dates. How about one that generates random strings? Example: dbadmin=> CREATE OR REPLACE FUNCTION randomstring (x INT) RETURN VARCHAR dbadmin-> AS dbadmin-> BEGIN dbadmin-> RETURN CASE x dbadmin-> WHEN […]

Loading in Eon Mode

This blog was co-authored by Yuanzhe Bei, Ryan Roelke, Amin Saeidi, Soniya Shah, and Natalia Stavisky. This blog was updated in July 2018. Overview As of Vertica 9.1.x, you can operate your database in Eon Mode. Eon Mode separates the computational processes from the storage layer of your database. Deployment of Eon Mode is limited […]

Vertica Quick Tip: Which Rows Will Commit?

This blog post was authored by Jim Knicely. Did you ever update a bunch of rows in a table, then forget which ones you changed? Fearing you might have updated an incorrect record, you might have to roll back and start again. Or, in Vertica you can first check which records have been modified prior […]

Vertica Quick Tip: Date Arithmetic with Intervals

This blog post was authored by Jim Knicely. In the last Vertica Quick Tip we saw how easy date arithmetic can be. Well, it can be even easier with Intervals! Example: What is today’s, yesterday’s and tomorrow’s date? dbadmin=> SELECT SYSDATE Today, dbadmin-> SYSDATE – INTERVAL ‘1 Day’ Yesterday, dbadmin-> SYSDATE + INTERVAL ‘1 Day’ […]

Virtual DataGals Kick Off

This blog post was authored by Crystal Farley (North). Last week, the virtual chapter of DataGals started with their kick off event: Come here, go anywhere! Joy King, VP of Product Management, Product Marketing, and Field Engagement for Vertica talked about her experience and journey in her career. Joy has been with the company for […]

Vertica AWS Eon Mode Beta Provisioning with Management Console

This blog post was co-authored by Michael Hua and Soniya Shah. As of Vertica 9.0.x, you can operate your database in Eon Mode Beta. Doing this separates the computational processes from the storage layer of your database, enabling rapid scaling of resources to accommodate variable workloads. This post describes how you can provision a Vertica […]

Vertica Quick Tip: Date Arithmetic

This blog post was authored by Jim Knicely. Date arithmetic in Vertica is extremely easy! Example: What is today’s, yesterday’s and tomorrow’s date? dbadmin=> SELECT SYSDATE Today, dbadmin-> SYSDATE – 1 Yesterday, dbadmin-> SYSDATE + 1 Tomorrow; Today | Yesterday | Tomorrow —————————-+—————————-+—————————- 2018-01-18 11:36:43.132482 | 2018-01-17 11:36:43.132482 | 2018-01-19 11:36:43.132482 (1 row) But you’re […]

Vertica Quick Tip: Avoid Using Functions on Very Large Data Sets

This blog post was authored by Jim Knicely. You can store billions and billions and billions (i.e. a lot) of records in your Vertica tables. When querying these large data sets, try to avoid using database functions like TO_DATE, TO_CHAR, NVL, etc. when unnecessary. Example: A table named BIG_DATE_TABLE has 1 billion rows and a […]

Eon Mode Beta Overview in 9.0.1

This blog post was authored by Soniya Shah. What is Eon Mode Beta? In Vertica 9.0.1, Eon Mode Beta, the separation of compute and storage, continues on Amazon Web Services S3. Eon Mode Beta was introduced in Vertica 9.0 to capitalize on cloud economics, while still enjoying the fast query processing for which Vertica is […]

Vertica Quick Tip: Generating a Random Date

This blog post was authored by Jim Knicely. I can easily generate a random integer value using the Vertica built-in RANDOMINT function. For example: dbadmin=> SELECT randomint(10) “Random 0-9”, dbadmin-> randomint(10) “Random 0-9”, dbadmin-> randomint(10) “Random 0-9”; Random 0-9 | Random 0-9 | Random 0-9 ————+————+———— 6 | 4 | 0 (1 row) But what […]

Vertica Quick Tip: The LIMIT Analytic Function

This blog post was authored by Jim Knicely. Vertica contains an abundance of built-in SQL analytic functions. One of the lesser known but also one of the coolest is the LIMIT analytic function. Example Say I have the following table data: dbadmin => SELECT * FROM limit_test; the_date | test_num | test_desc ————+———-+———– 2018-01-10 | […]

Vertica Quick Tip: How to Query for NaN Values

This blog post was authored by Jim Knicely. We’re introducing a new series: Vertica Quick Tips! These tips are intended to give you concise information to help you get the most out of using Vertica. NaN (Not a Number) does not equal anything, not even another NaN. You can query for them using the predicate […]

What’s New in Vertica 9.0.1: Machine Learning

This blog post was authored by Soniya Shah. Vertica 9.0.1 introduces new functionality that continues to match our goals for fast-paced development and enhancement of machine learning in Vertica. In this release, we introduce support for random forest for regression, a new statistical summary function, increased support for cross validation, and enhancements for data evaluation. […]

IMPORTANT: What you need to know about hardware security flaws

Vertica is aware of potential chip-level security flaws in recently discovered hardware bugs. Operating system vendors have developed patches to address these problems. Installing the patches could impact application performance. Vertica engineers are actively investigating how these patches might affect Vertica performance and will keep our customers informed. There is much speculation about the impact […]

Authentication Methods for dbadmin

This blog post was authored by Sumeet Keswani. In Vertica, when you create a new database, there are no configured authentication methods. In this case, Vertica assumes that all users, including the dbadmin, have an implicit password authentication. Users can use this authentication method both for authenticating over a network interface and for over a […]

DataGals Year in Review

The DataGals had an amazing 2017! Everyone on our steering committee – Lisa Donaghue, Styliani Pantela, Soniya Shah, Diem Tran, and Sharada Vesta would like to wish you a very happy new year! Check out our year in review below. We look forward to a successful 2018. Happy holidays from the DataGals!

What’s New in Vertica 9.0.1?

This blog post was authored by Soniya Shah. In Vertica 9.0.1, we introduce new functionality including: • Eon Mode Beta Updates • Supported Platform Updates • Machine Learning Enhancements • Support for Ranger Integration • Security Enhancements • Backup, Restore and Recovery Updates Eon Mode Beta Updates In Vertica 9.0, we introduced Eon Mode Beta, […]

External data is easier to manage than the Hadoop Stack

This blog post was authored by Steve Sarsfield. It has been a thrilling ride. In a few short years, Hadoop has seen astronomical rise, but recently, interest in Hadoop has peaked. Analysts like Gartner and Datanami are reporting that the hype in Hadoop is waning, with some hope that use cases for Hadoop will find […]

Vertica in the Clouds

This blog post was authored by Soniya Shah. The benefits of using cloud computing and storage are virtually endless. You can scale services up or down to fit your needs, customize applications, and access cloud services from everywhere. Using the cloud makes it easy to scale elastically and makes infrastructure both affordable and flexible. With […]

Blog Post Series: Using Vertica to Track Commercial Aircraft in near Real-Time Part 3

This blog post was authored by Mark Whalley. Picking apples, pears, blackberries or raspberries In a previous blog post, I provided a very high-level overview of ADS-B, and that with the appropriate pieces of hardware and some open-source software, it was possible to capture and decode the radio signals being broadcast from commercial aircraft, with […]

Flattened Tables

This blog post was authored by Soniya Shah. Before release 8.1., Vertica users could denormalize their data by combining all fact and dimension table columns in a single ‘fat’ table. These tables facilitated faster query execution. However, this approach required users to maintain redundant sets of normalized and denormalized data, which incurred its own overhead. […]

Phrase Search with Vertica Text Search

This blog post was authored by Serge Bonte. Vertica Text Search Vertica already provides Text Search. Text Search allows you to quickly search the contents of a single CHAR, VARCHAR, LONG VARCHAR, VARBINARY, or LONG VARBINARY field within a table to locate a specific token. Vertica implements that capability using a dedicated Text Index to […]

What’s New in Vertica 9.0: Hierarchical Partitioning

This blog post was authored by Michael Kronenberg. With Vertica 9.0, you can consolidate partitions into groups that minimize use of ROS storage. Reducing the number of ROS containers to store partitioned data helps facilitate DML operations such as DELETE and UPDATE, and avoid ROS pushback. For example, you can group date partitions by year. […]

What’s New in Vertica 9.0: The UUID Data Type

This blog post was authored by Gary Gray. Vertica version 9.0 adds Universal Unique Identifier (UUID) to its collection of data types. Accompanying this new data type are updates to the client libraries and a new function to help you use UUIDs in your database. As its name implies, computers use UUIDs to uniquely identify […]

Blog Post Series: Using Vertica to Track Commercial Aircraft in near Real Time

This blog post was authored by Mark Whalley. A Source of Streaming Data If you’re joining us a new reader, be sure to read part one of this series to get up to speed! When looking for a topic to use in the first of The Lab Series’ mini projects for the Big Data and […]

What’s New in Vertica 9.0: Sentry Integration

This blog post was authored by Mitchell Tracy and Monica Cellio. Hadoop clusters can use authorization services to determine which users can access what data in Hive and, by extension, HDFS. In Vertica 9.0 we now support one of the most common such services, Apache Sentry. Apache Sentry is a project in the Hadoop ecosystem […]

Data Pipelines: Vertica and Kafka

This blog post was authored by Tom Wall and Soniya Shah. At Vertica, we want to make it as easy as possible for your Vertica environment to coexist with other tools and technologies. We know that one size does not fit all. Sometimes you need a customized, end-to-end view of your system. Imagine you’re on […]

Transitioning Vertica Support

As part of our merge with Micro Focus, we are moving our support platform from my.vertica.com to Software Support Online. We want to make this alignment as smooth as possible for our customers and encourage you to read our Alignment Step by Step guide for Vertica Customers moving to SSO.

Estimate the Price of Diamonds Using Vertica Machine Learning

This blog post was authored by Vincent Xu. In this blog post, I’ll take you through the exercise I did to estimate the price of a diamond based on its characteristics, using the linear regression algorithm in Vertica. Besides Vertica 9.0, I used Tableau for charting and DbVisualizer as the SQL editor. From this exercise, […]

Try out the new Vertica Community Edition Virtual Machine!

This blog post was authored by Kathy Taylor. New to Vertica? Wondering where to start? Why not start with our new 8.1.1 Community Edition VM? It’s free! Just download the VM and start it up in a VM player on your PC. Open the User Guide, start the exercises, and off you go! You’ll be […]