Vertica

Enter the Flex Zone – Modernizing the Enterprise Data Warehouse

I’ve had the privilege of attending The Data Warehousing Institute (TDWI) conference this week. The Las Vegas show is usually one of their biggest gatherings. This year, about 600 of us gathered to talk about the latest and greatest in the data warehouse and business intelligence world. HP Vertica was a sponsor.
The latest buzz was around the new data discovery tools that several vendors announced. Vendors recognize that there is a significant amount of undiscovered data in most businesses. As data warehouse teams go merrily along delivering daily analytics, piles and piles of dark data that might have value build up within the business. To innovate, users are recognizing that some of this unexplored data could be quite valuable, and it’s spurring on the development of a new breed of data discovery tools that enable users to develop new views of structured, semi-structured, and unstructured data.

Of course, this is the very reason that we have developed HP Vertica Flex Zone. The ability to ingest semi-structured data and use current visualization tools is one of the key tenets of HP Vertica Flex Zone. With HP Vertica Flex Zone, you can leverage your existing business intelligence (BI) and visualization tools to visually explore and draw conclusions from data patterns across a full spectrum of structured and semi-structured data. Analysts, data scientists, and business users can now explore and visualize information without burdening or waiting for the IT organization to run the lengthy and costly ETL tools and processes typical of legacy databases and data warehouses.
Most agreed that special data discovery tools should converge with standard analytical platforms in the coming months. Discovery should be as much a part of your business as daily analytics.

There were some first-rate executive sessions led by Fern Halper and Philip Russom, who talked about the transformation of analytics over the years. Analytics has become more mainstream, more understood by the masses of business users. Therefore innovation comes when we can deliver business intelligence for this new generation of information consumers.

The panel discussions and sessions focused very much on business value and put forth a call to action for some: innovate. Feed the business users’ needs for information that will help drive revenue, improve efficiency, and achieve compliance with regulations. It was clear that the data warehouse must be modernized (and that is happening today). Data warehouse pros aren’t satisfied with the daily static analytics that they delivered in the past. They are looking for new data sources, including big data, and new-age data analytic platforms to help achieve their business goals.

Get started modernizing your enterprise data warehouse – evaluate HP Vertica 7 today.

Enter the FlexZone – Let’s talk ETL

When (and When Not) to Use Data Integration with HP Vertica

In December, HP released version 7 of the HP Vertica analytics platform which includes, among others, a great new feature called HP Vertica Flex Zone (Flex Zone). Flex Zone enables you to quickly and easily load, explore and analyze some forms of semi-structured data. It eliminates the need for coding-intensive schemas to be defined or applied before the data is loaded for exploration.

One of Flex Zone’s important values is that it can save you hours of work setting up and managing data extraction. Rather than setting up schemas and mappings in an ETL tool and later worrying about whether the structure will change, the process is simplified with Flex Zone. Data is simply pulled into Flex Zone and its structure is automatically understood. Flex Zone is powerful for the exploration of common types of data. Flex Tables can immediately leverage:

  • Delimited data – semi-structured text files. These are often referred to as flat files because the information is not stored in a relational database.
  • JSON – A readable file that is often used in social media and new online applications

For these types of files, which are very common in modern IT infrastructure, you do not need an ETL tool to extract, transform and load the data. This functionality is included with Flex Zone and can save you many hours in pre-processing data for analytics. It can also save you time in the long run by lowering the need to monitor ETL processes. Other mechanisms let Vertica ingest data from other common big data sources, such as Hive and HDFS.

Having a function in Flex Zone that automatically understands structure is powerful. This is something that normally takes time, slowing the overall process of exploration of the data. Should the structure of the data change, maintaining it is also time-consuming. By integrating these less structured data sources and supporting vanilla SQL queries against them, Vertica brings a key feature of relational databases to bear: abstracting the storage representation from the query semantics.
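To make the idea of loading first and discovering structure afterwards more concrete, here is a minimal Python sketch. It is only a conceptual illustration and not how Flex Zone is implemented: the sample records, the dotted-key flattening, and the helper names (`load_schemaless`, `discover_keys`, `select`) are all assumptions made for this example.

```python
import json
from collections import defaultdict

# Hypothetical sample: JSON records whose fields vary from line to line.
RAW_LINES = [
    '{"user": "ann", "likes": 3}',
    '{"user": "bob", "likes": 5, "city": "Chicago"}',
    '{"user": "cy", "device": {"os": "ios"}}',
]


def flatten(record, prefix=""):
    """Flatten nested objects into dotted keys, e.g. device.os."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def load_schemaless(lines):
    """Parse each line as-is; no column list is declared up front."""
    return [flatten(json.loads(line)) for line in lines]


def discover_keys(rows):
    """Collect every key seen and the Python type names observed for it."""
    seen = defaultdict(set)
    for row in rows:
        for key, value in row.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}


def select(rows, *columns):
    """Project the requested columns; keys absent from a row come back as None."""
    return [tuple(row.get(col) for col in columns) for row in rows]


rows = load_schemaless(RAW_LINES)
keys = discover_keys(rows)
result = select(rows, "user", "city")
```

In this sketch, a record that gains or loses a field does not break the load step. The new key simply shows up in `discover_keys`, and queries for a key that a row lacks return `None`, which loosely mirrors the separation of storage representation from query semantics described above.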

ETL – Extract Transform Load

However, most ETL tools offer hundreds of connectors that allow for connection into anything from Salesforce to Oracle to DB2 on the mainframe. For these types of sources, you can either use an ETL tool or export data from the application into a supported format in order to use Flex Tables.

For ETL, users take on the process of extracting data and transforming it to make it fit for purpose. The longer process may be necessary, however. During the ETL process, users can ensure that the data conforms to the schema and that data quality standards are upheld. Users can establish business rules and reject any records that don’t conform to standards. Users can recode certain values in the data to standardize them (e.g. ST, Street, strt can be recoded to ‘STREET’). Users can also extract data from sources that have proprietary formats, like SAP, MS SQL and AS/400 and hundreds of others. Therefore, in order to deliver accurate analytics and gain access to odd file formats, ETL is still necessary for certain data.
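The transform step described above can be sketched in a few lines of Python. This is a simplified illustration rather than the behavior of any particular ETL product: the field names, the required-field rule, and the street-type lookup table are hypothetical choices made for this example.

```python
# Hypothetical business rules for illustration only.
REQUIRED_FIELDS = ("customer_id", "street_type")
STREET_VARIANTS = {"st": "STREET", "street": "STREET", "strt": "STREET"}


def transform(record):
    """Return a cleaned copy of the record, or raise ValueError if it breaks a rule."""
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            raise ValueError(f"missing {field}")
    cleaned = dict(record)
    # Normalize case and punctuation before looking up the standard value.
    raw = str(record["street_type"]).strip().rstrip(".").lower()
    if raw not in STREET_VARIANTS:
        raise ValueError(f"unknown street type {record['street_type']!r}")
    cleaned["street_type"] = STREET_VARIANTS[raw]
    return cleaned


def run_etl(records):
    """Split input into rows fit to load and rejects paired with a reason."""
    loaded, rejected = [], []
    for record in records:
        try:
            loaded.append(transform(record))
        except ValueError as err:
            rejected.append((record, str(err)))
    return loaded, rejected


SOURCE = [
    {"customer_id": 1, "street_type": "ST"},
    {"customer_id": 2, "street_type": "strt"},
    {"customer_id": 3, "street_type": "Blvd"},
    {"street_type": "Street"},
]

loaded, rejected = run_etl(SOURCE)
```

Keeping the rejected records together with the reason for rejection, instead of silently dropping them, is one way to make the data quality rules auditable before anything reaches the warehouse.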

Some companies need to keep track of where data came from and what was changed in the data. The Data Lineage features of many ETL tools help you track where a change occurred. The result of the data lineage shows in a report which traces a change from the target end component of a Job up to the source end. If this is an important part of your process, you may need an ETL tool.

The good news is that Vertica has several partners who offer free open source ETL with support for Vertica, like Talend and Pentaho, as well as commercial partners like Syncsort, Informatica and others. See http://www.vertica.com/partners/data-integration-etl/ for a complete list.

Data Governance and Chicago’s CampIT event

I recently had the privilege to talk at a CampIT event in Chicago, a very well-attended event at the Stephens Convention Center near Chicago’s O’Hare airport. Analytics professionals gathered and shared ideas on technologies like Hadoop, big data analytics, columnar store databases and in-memory technologies – to name just a few of the topics.

Challenges of Modern Analytics

In my presentation, I covered some of the challenges in modern analytics. Perhaps the biggest technical challenge we’re facing is the ever-growing volume of data in our organizations. More data means that our legacy analytical solutions are slowing, making it harder and harder to deliver analytics at the right time for the right audience. Business users may lack the technical understanding of how this affects them. They only know that they can’t get answers and business intelligence as readily as they need to.

Another challenge is that IT professionals continue to be asked to do more with less funding. According to Gartner, IT spending increased only about 0.8% this year. IT is spending nearly all of its funds on keeping the wheels on the bus spinning, and few funds are left to innovate. Other budgets, like marketing and sales technology spend, are increasing, however. IT is still seen as a cost center in many organizations, while the business side is considered to be revenue-generating.

Data Governance Can Help

Data governance can help us tap into the business-focused budgets with a couple of important edicts:

    1. IT should form an alliance with business users
    Take a real interest in some of the challenges that your business users have by inviting them for coffee or giving them an opportunity to beef about their challenges.

    2. IT should focus on important business aspects of the IT initiative
    If you ask your business users, the most important aspects of IT aren’t technical. The three most important business aspects of any initiative are revenue, efficiency and compliance. IT should be trying hard to help the company make more money, be more efficient in the way that day-to-day business is done, and comply with state, local, federal or industry regulations.

    3. The data governance team should initially pick projects that can provide quick return on investment and track benefits.
    Quick wins that are profitable to the corporation form an agile approach to data governance. Initiatives shouldn’t take months or years, but days or weeks. When users see the value that IT is bringing to the organization, they will want to work with you on solving their issues.

    4. Analytics is just one of the systems of opportunity to begin your data governance initiative.
    Providing fast analytics with Vertica’s help is just one system of opportunity to move your data governance initiative forward.

Tap into Business Budgets

By understanding your business users’ needs, providing a strong ROI and talking about the business benefits of Vertica, you can sell the benefits of big data analytics into your organization. Again, it’s about revenue, efficiency and compliance in your business. It speaks to revenue when you have execution windows to run analysis that you have never had before and can now find new ways to reach your customers. It speaks to efficiency when you increase speed, typically hundreds of times faster than the old way of doing analytics, and avoid worries about a long analysis taking up too much processing time. It speaks to compliance when you can deliver analysis that’s fast and accurate, and that you don’t have to check and re-check before you deliver it to a broader audience.

Get Started With Vertica Today
