
A simple way to stream data is to write CSV rows to a network socket. This is the method used by dump1090, a software-defined radio application for monitoring ADS-B air traffic. But how do we capture and transform those emitted rows into queryable data?
Here’s a sample from dump1090:
MSG,3,1,1,ABB007,1,2019/07/23,13:45:38.487,2019/07/23,13:45:38.541,,7275,,,40.66585,-73.40684,,,0,,0,0
MSG,3,1,1,A2C5DB,1,2019/07/23,13:45:38.487,2019/07/23,13:45:38.541,,9075,,,40.70375,-73.01025,,,0,,0,0
MSG,8,1,1,AB1257,1,2019/07/23,13:45:38.489,2019/07/23,13:45:38.542,,,,,,,,,,,,0
MSG,7,1,1,A34E97,1,2019/07/23,13:45:38.493,2019/07/23,13:45:38.543,,4100,,,,,,,,,,
MSG,1,1,1,A44E24,1,2019/07/23,13:45:38.497,2019/07/23,13:45:38.543,DAL1200 ,,,,,,,,,,,0
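Each record follows the SBS-1 "BaseStation" CSV layout: field 5 is the ICAO hex ident, field 12 the altitude, and fields 15 and 16 the latitude and longitude. As a quick sketch (field positions are the common SBS-1 convention; verify them against your own feed), awk can pick the position data out of a MSG,3 record:

```shell
# Extract hex ident, altitude, lat, lon from a type-3 (airborne position) message.
# Fields assume the SBS-1 BaseStation layout emitted by dump1090.
echo 'MSG,3,1,1,ABB007,1,2019/07/23,13:45:38.487,2019/07/23,13:45:38.541,,7275,,,40.66585,-73.40684,,,0,,0,0' |
awk -F, '$2 == 3 { print $5, $12, $15, $16 }'
# prints: ABB007 7275 40.66585 -73.40684
```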
These messages are published on TCP port 30003. On the receiving end, we can use netcat (nc) to connect to port 30003 on the source machine and pipe the data to Vertica:
nc -i 60 remote 30003 | split -u --filter="/opt/vertica/bin/vsql -U dbadmin -w PASS -i -c \"COPY dump1090csv FROM LOCAL STDIN DELIMITER ','\"" --lines=2000 - 1090
That is one complicated pipeline! Let's break it down:
- netcat (nc) connects to TCP port 30003 on host "remote" and writes the stream to stdout (-i 60 times out if no data is received for 60 seconds)
- split reads the stream from stdin, divides it into 2,000-line chunks, and passes each chunk to the --filter command, vsql
- vsql reads the comma-delimited text from stdin and COPYs it into the table "dump1090csv" (we could also load CSV or JSON data into a flex table)
Helpful links:
- COPY LOCAL data to Vertica: https://www.vertica.com/docs/latest/HTML/Content/Authoring/SQLReferenceManual/Statements/COPY/COPYLOCAL.htm
- Robust streaming with Kafka: https://www.vertica.com/docs/latest/HTML/Content/Authoring/KafkaIntegrationGuide/HowKafkaAndVerticaWorkTogether.htm
- Blog series on Vertica and ADS-B air traffic monitoring: https://www.vertica.com/blog/blog-post-series-using-vertica-to-track-commercial-aircraft-in-near-real-time-part-7/
Have fun!