Vertica on the Road! Using Vertica to Prepare and Analyze Vehicle Telematics Data

Posted December 17, 2019 by Bryan Herger, Vertica Big Data Solution Architect at Micro Focus

Vehicles generate a lot of data from onboard electronics and there is increasing interest in monitoring vehicle handling to help assess insurance risk and premiums.

At Vertica, we’re working with a few auto manufacturers, so I built this demo to show a relatively simple example of data collection, aggregation and preparation, and analysis of the available drivetrain parameters of my 2014 Subaru Outback.

Setup

The core of my data collector is a Raspberry Pi running Raspbian (Debian-like) Linux.

I have a standard keyboard and a small HDMI monitor for debug purposes.

The setup is powered by a 2.4A USB charger (for the Raspberry Pi) and an AC inverter (for the monitor).

I’ve attached an OBD-II interface and a GPS receiver that each feed data over USB serial ports.

The Raspberry Pi runs software tools to capture data from these ports: gpsd and gpspipe to interpret and capture the GPS data as JSON, and a Python script to store the OBD-II sensor data as JSON (https://github.com/bryanherger/obd-json-logger)

The Pi is also set to connect to my home wi-fi or tether to my mobile phone.  A cron job runs periodically to upload data via SFTP to my Vertica node.

Analysis

Vertica has a feature called “flex tables” that allows Vertica to import semi-structured data and construct virtual SQL tables on the records.

We can ingest the OBD and GPS data into flex tables and prepare SQL views as follows (the procedure is same for OBD and GPS data):

CREATE FLEX TABLE subaru_obd();
CREATE TABLE
COPY subaru_obd FROM LOCAL 'obd-1576088565.2618604.txt' PARSER FJSONPARSER();
Rows Loaded
-------------
1159
(1 row)
SELECT COMPUTE_Flextable_keys_and_build_view('subaru_obd');
COMPUTE_Flextable_keys_and_build_view
----------------------------------------------------------------------------------------------------------
Please see public.subaru_obd_keys for updated keys
The view public.subaru_obd_view is ready for querying
(1 row)
=> \x
Expanded display is on.
=> SELECT * FROM subaru_obd_view LIMIT 1;
-[ RECORD 1 ]----------------------+------------------------------------------------------------------------
absolute_load_percent              | 13.3333333333333
accelerator_pos_d_percent          | 13.7254901960784
accelerator_pos_e_percent          | 14.1176470588235
ambiant_air_temp_degc              | 8
barometric_pressure_kilopascal     | 102
catalyst_temp_b1s1_degc            | 200.5
commanded_egr_percent              | 0
commanded_equiv_ratio_ratio        | 0.9914025
control_module_voltage_volt        | 14.37
coolant_temp_degc                  | 51
distance_since_dtc_clear_kilometer | 526.00
distance_w_mil_kilometer           | 0.00
egr_error_percent                  | 0.000000
engine_load_percent                | 12.9411764705882
evaporative_purge_percent          | 0
fuel_level_percent                 | 76.4705882352941
fuel_status                        | ('Closed loop, using oxygen sensor feedback to determine fuel mix', '')
fuel_type                          | Gasoline
intake_pressure_kilopascal         | 32
intake_temp_degc                   | 23
long_fuel_trim_1_percent           | 3.125000
maf_gps                            | 2.69
o2_b1s2_volt                       | 0.0600
o2_s1_wr_current_milliampere       | -0.09375
o2_s1_wr_voltage_volt              | 2.11819638361181
o2_sensors                         | ((), (False, False, False, False), (False, False, True, True))
obd_compliance                     | OBD and OBD-II
pids_a                             | 1.01111100011111e+31
pids_b                             | 1.00100000001111e+31
pids_c                             | 1.1111110110111e+31
relative_accel_pos_percent         | 0
relative_throttle_pos_percent      | 3.13725490196078
rpm_revolutions_per_minute         | 767.750
run_time_mil_minute                | 0.00
run_time_second                    | 60.00
sessionid                          | 1576088558
short_fuel_trim_1_percent          | 10.156250
speed_kph                          | 12.00
status                             | <obd.OBDResponse.Status object at 0x75c8d3d0>
status_drive_cycle                 | <obd.OBDResponse.Status object at 0x75c8d2b0>
throttle_actuator_percent          | 0
throttle_pos_b_percent             | 31.3725490196078
throttle_pos_percent               | 14.5098039215686
time_since_dtc_cleared_minute      | 804.00
timestamp                          | 1576088565
timing_advance_degree              | 22.50
warmups_since_dtc_clear_count      | 23.00

The OBD-II records are largely the same since I collect all available Mode 1 values for each timestamp.  However, the GPS device emits different message types, so we’ll want to limit just to TPV type messages that have the data fields I want, for example, time, position, speed, heading:

=> SELECT time, lat, lon, speed, track FROM subaru_gps_view WHERE lat IS NOT NULL limit 4;
-[ RECORD 1 ]---------------------
time  | 2019-12-11 17:00:31.995-05
lat   | 40.327563682
lon   | -74.643516215
speed | 0.7340
track | 151.35580
-[ RECORD 2 ]---------------------
time  | 2019-12-11 17:00:33-05
lat   | 40.32732572
lon   | -74.64340316
speed | 0.5290
track | 150.56410
-[ RECORD 3 ]---------------------
time  | 2019-12-11 17:00:34-05
lat   | 40.327306359
lon   | -74.6433727
speed | 1.8720
track | 83.87850

In order to perform a meaningful analysis, I’d like to merge the time series data, but a few things jump out.  Notably, the time fields are different – OBD records use UNIX timestamp as INTEGER, and GPS uses actual TIMESTAMP.  Also, both OBD and GPS data include vehicle speed.

Let’s work on normalizing the times and merging together the data feeds first.  Vertica is Postgres-like enough that fixing the timestamp can be done like this:

=> CREATE TABLE subaru_obd_timestamps AS SELECT *, to_timestamp(timestamp) AS corrected_timestamp FROM subaru_obd_view;
CREATE TABLE
dbadmin=> select * FROM subaru_obd_timestamps LIMIT 1;
-[ RECORD 1 ]----------------------+---------------------------------------------------------------------
...(truncated)...
timestamp                          | 1576108821
corrected_timestamp                | 2019-12-11 19:00:21

Let’s do similar for the GPS table, converting timestamp and limiting to the record type and fields that we are using:

=> CREATE TABLE subaru_gps_timestamps AS SELECT CAST(time as timestamp) as corrected_timestamp, lat, lon, speed, track FROM subaru_gps_view WHERE lat IS NOT NULL;
CREATE TABLE

So now both tables have a SQL TIMESTAMP to compare.  However, while GPS is literally clockwork – it sends a time signal with one update per second – OBD-II is not, where the script writes when it has a completed set of sensor reads, every 2-5 seconds.

This means we need to leverage Vertica’s event series join to interpolate and join the data into a new table like so:

=> create table subaru_interpolated as select obd.*, gps.corrected_timestamp as gps_timestamp, gps.lat, gps.lon, gps.speed, gps.track from subaru_obd_timestamps obd FULL OUTER JOIN subaru_gps_timestamps gps ON obd.corrected_timestamp interpolate previous value gps.corrected_timestamp;
CREATE TABLE

This gives us a few thousand rows with aligned timestamps and data.  We can now draw some graphs of vehicle data, and we can also look at GPS and consider location, speed, heading, which might give some idea why the graph looks the way it does:


Figure 1: Driving on a local road on the way to the New Jersey Turnpike. It looks like I am prone to punching the gas when in traffic (dark orange line). Then I remembered I’m being recorded and eased up!


Figure 2: Local roads (Figure 1) leading up to highway driving. You’ll notice that I reach highway speed (~110 km/h) manually, then speed and other parameters level off considerably when I activate cruise control!

So what else could we use the data for?  We could capture data to study fuel efficiency and help to predict failures and optimize maintenance schedules, for instance.  OBD-II was designed to monitor and limit emissions, so this setup doesn’t capture some possibly useful data: brake position, steering position, TPMS come to mind.  Adding an accelerometer to the Raspberry Pi would allow indirect measurement of some of these.

Thus concludes our simple example for merging and analyzing two related time series data streams.  Looking at the highway data with and without cruise control, I wonder if analytics could be applied to improve fuel economy by leveling off my manual input on the accelerator.  Maybe someday I will be invited for the short drive south to Subaru of America HQ to talk more about vehicle telematics!

See also

Flex tables: https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/FlexTables/FlexTableHandbook.htm
INTERPOLATE: https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Predicates/INTERPOLATE.htm

For more information on OBD-II Python module and available sensor data, see https://python-obd.readthedocs.io/en/latest/

Related Posts:
Time Series Analysis with Vertica
Blog Post Series: Using Vertica to Track Commercial Aircraft in near Real Time — Part 1
Extending Vertica with Python Functions: Adding NymPy FFT as a UDx