Vertica Blog

Soniya Shah

Information Developer

Currently a first-year law student with a background in science and technology. Experienced technical writer specializing in software documentation, big data, blog development, and website development. I build user-centered content that makes complex technical information easier to understand.

I used to work for Vertica full time for about 3 years. I still work at Vertica part time while going to law school.

Update: Soniya is now doing her law internship, and no longer working at Vertica. Good luck, Soniya!


Machine Learning Mondays: Data Preparation for Machine Learning in Vertica

This blog post was authored by Vincent Xu. This post is part of our Machine Learning Mondays series. Stay tuned for more! Introduction Machine learning (ML) is an iterative process. From understanding data, preparing data, building models, testing models to deploying models, every step of the way requires careful examination and manipulation of the data....

Using Hadoop Rack Locality to Boost Vertica Performance

This blog post was authored by Monica Cellio. When database nodes are co-located on Hadoop data nodes, Vertica can take advantage of the Hadoop rack configuration to execute queries against ORC and Parquet data. Moving query execution closer to the data reduces network latency and can improve performance. Vertica automatically uses database nodes that are...

What’s New in Vertica 8.1: Machine Learning

This blog post was authored by Soniya Shah. Overall, you will notice that Machine Learning for Predictive Analytics, introduced in Vertica 7.2.2, is easier to use in Vertica 8.1, with the addition of several important functions. There are improvements to model management with access control, the ability to save and re-apply normalization parameters, missing value...

Why auto-scaling analytical databases aren’t so magical

This blog post was authored by Steve Sarsfield. There is a new feature in analytical databases that seems to be all the rage, particularly in cloud data warehouses: autoscaling. Autoscaling’s promise is that if you have a particularly hard analytical workload, autoscaling will spin up new storage and compute to get the job done....

Understanding AT TIME ZONE

TIMESTAMPTZ AT TIME ZONE and TIMESTAMP AT TIME ZONE return date input in another time zone. How Vertica executes AT TIME ZONE varies, depending on whether the input is a TIMESTAMPTZ or TIMESTAMP. At first glance, this might be confusing. More about that later. First, let's review AT TIME ZONE syntax: { TIMESTAMPTZ | TIMESTAMP...

Vertica Machine Learning Series: Logistic Regression

This blog post is based on a white paper authored by Maurizio Felici. What is Logistic Regression? Logistic regression is a popular machine learning algorithm used for binary classification. Logistic regression labels a sample with one of two possible classes, given a set of predictors in the sample. Optionally, the output can be the probability...

DataGals Hosts an International Women

This blog post was authored by Soniya Shah. This week, the DataGals hosted an event in celebration of International Women’s Day. This year’s campaign asked supporters around the world to #BeBoldForChange to encourage a more inclusive, gender equal world. You can read more about the campaign and influencers on the International Women’s Day site. International...

Spark Summit East

This blog post was authored by Myles Collins. I recently went to the Spark Summit East to take the Spark training and get current on the technology that my group (Vertica Partner Engineering) is using more and more. Conveniently, it was held here in Boston. A few weeks after I registered, marketing decided to sponsor...

Using Vertica on IoT Data: Gap Filling and Interpolation for Incomplete Sensor Data

This post was originally authored by Marco Gessner and appeared on LinkedIn. It has been reposted here with his permission. This article explains the basic gap filling and interpolation functionality in Vertica. Vertica was designed for the fast processing and analysis of huge volumes of data and is well suited to IoT applications. One of...

Vertica Machine Learning Series: k-means

The content of this blog is based on a white paper that was authored by Maurizio Felici. What is k-means Clustering? K-means clustering is an unsupervised learning algorithm that clusters data into groups based on their similarity. Using k-means, you can find k clusters of data, represented by centroids. As the user, you select the...

Machine Learning Series: Linear Regression

The content of this blog is based on a white paper that was authored by Maurizio Felici. This blog post is just one in a series of blog posts about the machine learning algorithms in Vertica. Stay tuned for more! What is Linear Regression? Let’s start with the basics. Linear regression is one of the...

Patented: A Look into Kahlil Oppenheimer

Kahlil Oppenheimer was a Vertica intern during the summer of 2014. This blog post was authored by him and reprinted with his permission. During the first week of my internship at Vertica, my mentor assigned a small bug for me to fix about a set of particular SQL queries. After writing a simple fix for...