Smart Meters

This example uses the following datasets to predict peoples' electricity consumption. You can download the Jupyter Notebook of the study here. We'll use the following datasets:

sm_consumption

  • dateUTC: Date and time of the record
  • meterID: Smart meter ID
  • value: Electricity consumed during 30 minute interval (in kWh)

sm_weather

  • dateUTC: Date and time of the record
  • temperature: Temperature
  • humidity: Humidity

sm_meters

  • longitude: Longitude
  • latitude: Latitude
  • residenceType: 1 for Single-Family; 2 for Multi-Family; 3 for Appartement

We will follow the data science cycle (Data Exploration - Data Preparation - Data Modeling - Model Evaluation - Model Deployment) to solve this problem.

Initialization

This example uses the following version of VerticaPy:

In [1]:
import verticapy as vp
vp.__version__
Out[1]:
'0.9.0'

Create vDataFrames of the datasets:

In [3]:
sm_consumption = vp.read_csv("data/smart_meters/sm_consumption.csv")
sm_weather = vp.read_csv("data/smart_meters/sm_weather.csv")
sm_meters = vp.read_csv("data/smart_meters/sm_meters.csv")
display(sm_consumption)
display(sm_weather)
display(sm_meters)
123
meterID
Int
📅
dateUTC
Timestamp
123
value
Numeric(10,6)
102014-01-01 11:00:000.029
202014-01-01 13:45:000.277
302014-01-02 10:45:000.321
402014-01-02 11:15:000.305
502014-01-02 13:45:000.358
602014-01-02 15:30:000.115
702014-01-03 08:30:000.071
802014-01-04 23:45:000.323
902014-01-06 01:15:000.085
1002014-01-06 21:45:000.713
1102014-01-07 00:15:000.242
1202014-01-07 23:00:000.09
1302014-01-09 05:30:000.115
1402014-01-09 15:00:000.928
1502014-01-09 18:15:001.09
1602014-01-10 10:00:000.07
1702014-01-10 14:30:0010.832
1802014-01-10 15:30:003.288
1902014-01-10 18:00:000.42
2002014-01-11 10:45:001.045
2102014-01-12 07:45:000.091
2202014-01-13 19:45:000.249
2302014-01-13 20:15:000.34
2402014-01-14 13:00:000.099
2502014-01-14 14:00:000.2
2602014-01-14 14:45:000.865
2702014-01-14 22:00:000.716
2802014-01-15 03:45:000.05
2902014-01-15 11:15:000.052
3002014-01-15 11:30:000.038
3102014-01-15 23:15:000.239
3202014-01-16 19:30:001.422
3302014-01-17 15:00:000.291
3402014-01-18 00:30:000.828
3502014-01-19 03:00:000.017
3602014-01-19 05:30:000.024
3702014-01-19 06:15:000.048
3802014-01-19 15:30:0013.3
3902014-01-20 19:30:000.59
4002014-01-21 12:30:000.327
4102014-01-21 13:30:000.108
4202014-01-22 01:30:000.083
4302014-01-22 04:30:000.044
4402014-01-23 21:00:000.226
4502014-01-24 02:00:000.125
4602014-01-24 12:15:000.168
4702014-01-27 22:45:000.495
4802014-01-28 06:15:000.056
4902014-01-28 11:30:000.113
5002014-01-28 19:00:001.566
5102014-01-28 19:45:001.108
5202014-01-29 05:00:000.064
5302014-01-29 07:45:000.145
5402014-01-29 13:00:001.719
5502014-01-30 06:15:000.059
5602014-01-30 17:45:000.575
5702014-01-30 18:45:001.129
5802014-02-01 03:30:000.057
5902014-02-01 03:45:000.059
6002014-02-02 08:45:000.186
6102014-02-04 03:45:000.045
6202014-02-04 12:00:000.187
6302014-02-04 16:00:000.313
6402014-02-04 18:45:000.912
6502014-02-05 06:45:000.018
6602014-02-05 17:15:000.112
6702014-02-06 15:15:0010.146
6802014-02-07 11:00:000.868
6902014-02-07 17:30:003.812
7002014-02-07 22:15:001.262
7102014-02-08 03:45:000.043
7202014-02-08 06:45:000.106
7302014-02-09 04:15:000.449
7402014-02-09 08:30:000.007
7502014-02-11 02:30:000.087
7602014-02-11 05:30:000.039
7702014-02-11 19:00:000.094
7802014-02-12 02:30:000.102
7902014-02-13 02:45:000.097
8002014-02-13 09:30:000.161
8102014-02-13 19:00:001.147
8202014-02-14 00:15:000.084
8302014-02-14 02:00:000.174
8402014-02-14 11:15:000.05
8502014-02-14 13:45:000.033
8602014-02-15 02:00:000.181
8702014-02-15 13:45:000.031
8802014-02-15 15:00:000.483
8902014-02-15 20:15:000.368
9002014-02-16 00:00:000.195
9102014-02-17 02:45:000.094
9202014-02-17 12:45:000.049
9302014-02-17 15:15:000.008
9402014-02-18 05:00:000.041
9502014-02-18 20:15:001.202
9602014-02-19 00:15:000.148
9702014-02-19 07:00:000.095
9802014-02-20 00:15:000.039
9902014-02-20 07:30:000.213
10002014-02-20 19:00:001.208
Rows: 1-100 | Columns: 3
📅
dateUTC
Timestamp
123
temperature
Numeric(10,6)
123
humidity
Numeric(11,6)
12014-01-01 00:00:0038.095.0
22014-01-01 00:30:0037.493.0
32014-01-01 01:00:0037.4100.0
42014-01-01 01:30:0037.4100.0
52014-01-01 02:00:0039.293.0
62014-01-01 02:30:0039.293.0
72014-01-01 03:00:0039.293.0
82014-01-01 03:30:0039.293.0
92014-01-01 04:00:0039.293.0
102014-01-01 04:30:0039.293.0
112014-01-01 05:00:0039.293.0
122014-01-01 05:30:0039.287.0
132014-01-01 06:00:0038.089.0
142014-01-01 06:30:0037.493.0
152014-01-01 07:00:0037.493.0
162014-01-01 07:30:0037.493.0
172014-01-01 08:00:0037.493.0
182014-01-01 08:30:0037.487.0
192014-01-01 09:00:0037.487.0
202014-01-01 09:30:0037.487.0
212014-01-01 10:00:0037.493.0
222014-01-01 10:30:0037.493.0
232014-01-01 11:00:0037.487.0
242014-01-01 11:30:0037.493.0
252014-01-01 12:00:0038.085.0
262014-01-01 12:30:0039.287.0
272014-01-01 13:00:0039.287.0
282014-01-01 13:30:0039.287.0
292014-01-01 14:00:0039.287.0
302014-01-01 14:30:0039.287.0
312014-01-01 15:00:0039.287.0
322014-01-01 15:30:0039.287.0
332014-01-01 16:00:0037.493.0
342014-01-01 16:30:0039.287.0
352014-01-01 17:00:0039.287.0
362014-01-01 17:30:0037.487.0
372014-01-01 18:00:0038.081.0
382014-01-01 18:30:0037.487.0
392014-01-01 19:00:0035.687.0
402014-01-01 19:30:0037.493.0
412014-01-01 20:00:0039.287.0
422014-01-01 20:30:0039.287.0
432014-01-01 21:00:0039.287.0
442014-01-01 21:30:0041.081.0
452014-01-01 22:00:0039.287.0
462014-01-01 22:30:0039.287.0
472014-01-01 23:00:0039.287.0
482014-01-01 23:30:0039.281.0
492014-01-02 00:00:0038.076.0
502014-01-02 00:30:0037.481.0
512014-01-02 01:00:0037.481.0
522014-01-02 01:30:0037.481.0
532014-01-02 02:00:0037.481.0
542014-01-02 02:30:0037.481.0
552014-01-02 03:00:0037.481.0
562014-01-02 03:30:0037.481.0
572014-01-02 04:00:0037.481.0
582014-01-02 04:30:0035.687.0
592014-01-02 05:00:0035.693.0
602014-01-02 05:30:0037.481.0
612014-01-02 06:00:0037.075.0
622014-01-02 06:30:0037.481.0
632014-01-02 07:00:0037.487.0
642014-01-02 07:30:0037.481.0
652014-01-02 08:00:0035.681.0
662014-01-02 08:30:0035.681.0
672014-01-02 09:00:0037.475.0
682014-01-02 09:30:0037.481.0
692014-01-02 10:00:0039.275.0
702014-01-02 10:30:0041.070.0
712014-01-02 11:00:0039.275.0
722014-01-02 11:30:0041.070.0
732014-01-02 12:00:0041.061.0
742014-01-02 12:30:0041.070.0
752014-01-02 13:00:0041.076.0
762014-01-02 13:30:0041.076.0
772014-01-02 14:00:0041.076.0
782014-01-02 14:30:0041.076.0
792014-01-02 15:00:0041.076.0
802014-01-02 15:30:0041.076.0
812014-01-02 16:00:0041.076.0
822014-01-02 16:30:0041.076.0
832014-01-02 17:00:0041.076.0
842014-01-02 17:30:0039.281.0
852014-01-02 18:00:0039.070.0
862014-01-02 18:30:0037.481.0
872014-01-02 19:00:0039.275.0
882014-01-02 19:30:0037.481.0
892014-01-02 20:00:0037.481.0
902014-01-02 20:30:0039.275.0
912014-01-02 21:00:0039.265.0
922014-01-02 21:30:0039.265.0
932014-01-02 22:00:0039.265.0
942014-01-02 22:30:0039.261.0
952014-01-02 23:00:0039.261.0
962014-01-02 23:30:0039.265.0
972014-01-03 00:00:0039.048.0
982014-01-03 00:30:0039.261.0
992014-01-03 01:00:0039.265.0
1002014-01-03 01:30:0037.470.0
Rows: 1-100 | Columns: 3
123
meterID
Int
123
residenceType
Int
🌎
latitude
Float
🌎
longitude
Float
10153.3575915882838-6.27554509198069
21153.3130886440828-6.21174062227953
32153.322863954253-6.25885396251447
43351.895226209818-8.54531304761961
54153.6511753623331-6.69348712237822
65153.3262233045899-6.31623713317817
76353.3816401612885-6.24222877491921
87353.4410265504271-6.2749868845315
98153.2883538726462-6.31547263281076
109153.307294775666-6.171585396401
1110153.3043589304317-6.27057231678352
1211152.2168162005257-7.19089306958349
1312353.3292821225723-6.26914988094354
1413152.2330869136812-7.0801397379672
1514153.3121094759744-6.1914301584486
1615152.6815374444767-8.71513846618087
1716253.2598894923594-8.98501096632971
1817153.3386461784187-6.19326209020707
1918153.2930399359755-6.29464811435931
2019153.2904090932662-6.20219801883112
2120153.3140843911614-6.27801544356096
2221253.3368019275034-6.30227460284837
2322153.3606245053304-6.25253082496436
2423353.7334769562786-6.40192964510675
2524151.8904635261463-8.4583653371337
2625153.2943781673251-6.27264309456447
2726153.4252605572445-6.24346657284835
2827253.3445341131241-6.21965121543182
2928153.358289570405-6.22902191315319
3029151.9025419059901-8.45522362503022
3130353.4226787570522-6.2597908055893
3231353.379068564415-6.25362478692152
3332354.0078974821771-6.36373099514602
3433253.3786128724389-6.26875699530572
3534153.3276679292134-6.25674968108046
3635153.3775061858458-6.2658589998757
3736153.422378212805-6.30085964886839
3837253.3163882602178-6.15806917577308
3938353.37096469415-6.27913717729599
4039153.4132892637185-6.23844413381297
4140153.4228804497345-6.26418479308435
4241153.353488205755-6.30562807207814
4342153.3054854916401-6.24275285029992
4443153.3376504362861-6.25709734131062
4544153.3748180529522-6.21375987196951
4645153.7635143297958-6.25548923165312
4746153.2618719521814-6.25472544269271
4847351.8616597388643-8.49342918637484
4948353.2044437294531-6.185303194865
5049153.3679368733445-6.23992423549581
5150153.2659672891683-6.26641539940328
5251153.3662651408726-6.14679963936097
5352153.3393859597162-6.30342142910585
5453351.9040058992909-8.39981371864713
5554153.3343246471021-6.25061672172339
5655251.8643744508624-8.56908207755333
5756153.3530817973677-6.30734386455951
5857153.3414541477597-6.25186365095182
5958353.3338737360076-6.22063504106583
6059152.2124956730345-7.11448302907311
6160153.3363487476997-6.31363533408071
6261153.3587038304275-6.21556299304468
6362353.3172551197574-6.24415298656193
6463153.2271603237907-6.14839645868847
6564253.2890269752152-6.23614626114708
6665153.3963744886512-6.34171292270289
6766151.8895373160002-8.48780907327119
6867153.344893347275-6.23974905608669
6968153.3712600623191-6.24875843222256
7069153.3504808496605-6.34801942486053
7170351.9154011937038-8.48836529664276
7271352.1952892522058-7.11063882871
7372153.303992111683-6.25315326364841
7473153.2753986962432-9.07589204146141
7574151.91819680651-8.43446154154656
7675152.6027837572115-8.6160224579197
7776153.4035657645851-6.25428225078493
7877153.3709894952545-6.37512943696882
7978253.3358314852994-6.39381681908009
8079153.2716167370656-6.23869625794234
8180153.3591211533429-6.21909400592063
8281153.372120236396-6.20827879919494
8382153.3647332578122-6.28451041581688
8483153.2632427747372-6.26140047474471
8584151.9123757703805-8.46795169870626
8685153.3549789863896-6.2372798346737
8786353.2266733295124-6.2667286257736
8887353.307434324047-6.17627526417622
8988353.2391085904911-6.15017286838827
9089353.9521467669519-6.43670239554822
9190152.2406296264404-7.09240230508834
9291153.425341733642-6.32118808599842
9392351.8614117711696-8.50296011084579
9493153.7029164125936-6.42959758250948
9594153.378597106695-6.17601148907721
9695353.3405488207761-6.31607804590097
9796153.4212269471156-6.21357678538708
9897153.3123208667246-6.26734792518767
9998153.2186994970707-9.04100949651601
10099153.3065251415844-6.21247769416289
Rows: 1-100 | Columns: 4

Data Exploration and Preparation

Predicting energy consumption in households is very important. Surges in electricity use could cause serious power outages. In our case, we'll be using data on general household energy consumption in Ireland to predict consumption at various times.

In order to join the different data sources, we need to assume that the weather will be approximately the same across the entirety of Ireland. We'll use the date and time as the key to join 'sm_weather' and 'sm_consumption'.

Joining different datasets with interpolation

In VerticaPy, you can interpolate joins; Vertica will find the closest timestamp to the key and join the result.

In [4]:
sm_consumption_weather = sm_consumption.join(
                            sm_weather,
                            how = "left",
                            on_interpolate = {"dateUTC": "dateUTC"},
                            expr1 = ["dateUTC", "meterID", "value"],
                            expr2 = ["humidity", "temperature"])
display(sm_consumption_weather)
📅
dateUTC
Datetime
123
meterID
Integer
123
value
Numeric(10,6)
123
humidity
Numeric(11,6)
123
temperature
Numeric(10,6)
12014-01-01 00:00:009130.09795.038.0
22014-01-01 00:00:008950.18495.038.0
32014-01-01 00:00:007950.09395.038.0
42014-01-01 00:00:007800.11695.038.0
52014-01-01 00:00:007470.55995.038.0
62014-01-01 00:00:006600.83695.038.0
72014-01-01 00:00:006401.79695.038.0
82014-01-01 00:00:006050.03895.038.0
92014-01-01 00:00:004840.21195.038.0
102014-01-01 00:00:004740.31395.038.0
112014-01-01 00:00:004570.02295.038.0
122014-01-01 00:00:003710.81595.038.0
132014-01-01 00:00:003550.44895.038.0
142014-01-01 00:00:003480.07195.038.0
152014-01-01 00:00:003071.42395.038.0
162014-01-01 00:00:001880.15295.038.0
172014-01-01 00:00:001810.13895.038.0
182014-01-01 00:00:001790.03395.038.0
192014-01-01 00:00:001290.05495.038.0
202014-01-01 00:00:001160.23295.038.0
212014-01-01 00:15:009860.30195.038.0
222014-01-01 00:15:008480.18695.038.0
232014-01-01 00:15:008290.14595.038.0
242014-01-01 00:15:007180.09195.038.0
252014-01-01 00:15:006370.31995.038.0
262014-01-01 00:15:006250.14195.038.0
272014-01-01 00:15:006190.10395.038.0
282014-01-01 00:15:005834.08195.038.0
292014-01-01 00:15:004440.13195.038.0
302014-01-01 00:15:002210.39995.038.0
312014-01-01 00:15:00670.15795.038.0
322014-01-01 00:15:00520.15795.038.0
332014-01-01 00:30:009330.06193.037.4
342014-01-01 00:30:009300.20293.037.4
352014-01-01 00:30:009040.15993.037.4
362014-01-01 00:30:008290.08293.037.4
372014-01-01 00:30:007960.23793.037.4
382014-01-01 00:30:006630.0893.037.4
392014-01-01 00:30:006281.24693.037.4
402014-01-01 00:30:006260.10693.037.4
412014-01-01 00:30:006180.06793.037.4
422014-01-01 00:30:005060.10893.037.4
432014-01-01 00:30:004970.14493.037.4
442014-01-01 00:30:004600.09493.037.4
452014-01-01 00:30:003760.12993.037.4
462014-01-01 00:30:003320.04493.037.4
472014-01-01 00:30:003080.04593.037.4
482014-01-01 00:30:002860.20993.037.4
492014-01-01 00:30:002770.28793.037.4
502014-01-01 00:30:002640.16293.037.4
512014-01-01 00:30:001382.27693.037.4
522014-01-01 00:30:00900.18393.037.4
532014-01-01 00:45:009770.17693.037.4
542014-01-01 00:45:009370.20193.037.4
552014-01-01 00:45:009270.55193.037.4
562014-01-01 00:45:008670.84493.037.4
572014-01-01 00:45:007620.1393.037.4
582014-01-01 00:45:005280.06793.037.4
592014-01-01 00:45:004470.59993.037.4
602014-01-01 00:45:003750.51993.037.4
612014-01-01 00:45:002970.60993.037.4
622014-01-01 00:45:002500.42893.037.4
632014-01-01 00:45:002350.0693.037.4
642014-01-01 01:00:009800.16100.037.4
652014-01-01 01:00:007970.114100.037.4
662014-01-01 01:00:007830.025100.037.4
672014-01-01 01:00:007090.058100.037.4
682014-01-01 01:00:006970.184100.037.4
692014-01-01 01:00:006360.139100.037.4
702014-01-01 01:00:005480.309100.037.4
712014-01-01 01:00:003740.104100.037.4
722014-01-01 01:00:003580.471100.037.4
732014-01-01 01:00:003040.109100.037.4
742014-01-01 01:00:002910.412100.037.4
752014-01-01 01:00:002630.172100.037.4
762014-01-01 01:00:002560.124100.037.4
772014-01-01 01:00:001230.309100.037.4
782014-01-01 01:00:00790.243100.037.4
792014-01-01 01:00:00750.274100.037.4
802014-01-01 01:00:00610.1100.037.4
812014-01-01 01:00:00350.134100.037.4
822014-01-01 01:15:009500.127100.037.4
832014-01-01 01:15:008440.539100.037.4
842014-01-01 01:15:008320.358100.037.4
852014-01-01 01:15:008070.095100.037.4
862014-01-01 01:15:006560.104100.037.4
872014-01-01 01:15:005980.18100.037.4
882014-01-01 01:15:005880.064100.037.4
892014-01-01 01:15:005700.018100.037.4
902014-01-01 01:15:004760.299100.037.4
912014-01-01 01:15:004670.292100.037.4
922014-01-01 01:15:003860.156100.037.4
932014-01-01 01:15:003060.16100.037.4
942014-01-01 01:15:002900.092100.037.4
952014-01-01 01:15:001930.101100.037.4
962014-01-01 01:15:001340.429100.037.4
972014-01-01 01:15:001230.446100.037.4
982014-01-01 01:15:001190.268100.037.4
992014-01-01 01:15:00890.107100.037.4
1002014-01-01 01:15:0020.037100.037.4
Rows: 1-100 | Columns: 5

Segmenting Latitude & Longitude using Clustering

The dataset 'sm_meters' is pretty important. In particular, the type of residence is probably a good predictor for electricity usage. We can create clusters of the different regions with k-means clustering based on longitude and latitude. Let's find the most suitable 'k' using an elbow curve and scatter plot.

In [5]:
sm_meters.agg(["min", "max"])
Out[5]:
min
max
"meterID"0.0999.0
"residenceType"1.03.0
"latitude"51.796460077021254.0270361317983
"longitude"-9.16352332036362-6.07134572494937
Rows: 1-4 | Columns: 3
In [6]:
%matplotlib inline
from verticapy.learn.model_selection import elbow
from verticapy.datasets import load_world

world = load_world()
df = world.to_geopandas(geometry = "geometry")
df = df[df["country"].isin(["Ireland", "United Kingdom"])]
ax = df.plot(edgecolor = "black",
             color = "white",
             figsize = (10, 9))
ax = sm_meters.scatter(["longitude", "latitude"], ax = ax)

Based on the scatter plot, five seems like the optimal number of clusters. Let's verify this hypothesis using an elbow curve.

In [7]:
elbow(sm_meters, ["longitude", "latitude"], n_cluster = (3, 8))

Out[7]:
Within-Cluster SS
30.9311388771341764
40.975099232332047
50.9844977831351884
60.993627480208735
70.9947041356087959
Rows: 1-5 | Columns: 2

The elbow curve seems to confirm that five is the optimal number of clusters, so let's create a k-means model with that in mind.

In [8]:
from verticapy.learn.cluster import KMeans
model = KMeans("kmeans_sm_regions", 
               n_cluster = 5,
               init = [(-6.26980, 53.38127),
                       (-9.06178, 53.25998),
                       (-8.48641, 51.90216),
                       (-7.12408, 52.24610),
                       (-8.63985, 52.65945),])
model.drop()
model.fit(sm_meters, 
          ["longitude", "latitude"])
Out[8]:

=======
centers
=======
longitude|latitude
---------+--------
-9.06178 |53.25998
-8.63985 |52.65945
-8.48641 |51.90216
-7.12408 |52.24610
-6.26980 |53.38127


=======
metrics
=======
Evaluation metrics:
     Total Sum of Squares: 1209.2077
     Within-Cluster Sum of Squares: 
         Cluster 0: 0.099754154
         Cluster 1: 0.2779225
         Cluster 2: 0.53464463
         Cluster 3: 0.2657853
         Cluster 4: 17.892423
     Total Within-Cluster Sum of Squares: 19.07053
     Between-Cluster Sum of Squares: 1190.1372
     Between-Cluster SS / Total SS: 98.42%
 Number of iterations performed: 1
 Converged: True
 Call:
kmeans('public.kmeans_sm_regions', '"public"."_verticapy_tmp_view_dbadmin_239726_1781523806_"', '"longitude", "latitude"', 5
USING PARAMETERS max_iterations=300, epsilon=0.0001, initial_centers_table='"public"."_verticapy_tmp_kmeans_init_dbadmin_239726_3848320059_"', distance_method='euclidean')

Let's add our clusters to the vDataFrame.

In [9]:
sm_meters = model.predict(sm_meters, name = "region")

Let's draw a scatter plot of the different regions.

In [10]:
ax = df.plot(edgecolor = "black",
             color = "white",
             figsize = (10, 9))
sm_meters.scatter(["longitude", "latitude"], 
                  catcol = "region",
                  max_cardinality = 10,
                  ax = ax)
Out[10]:
<AxesSubplot:xlabel='"longitude"', ylabel='"latitude"'>

Dataset Enrichment

Let's join 'sm_meters' with 'sm_consumption_weather'.

In [11]:
sm_consumption_weather_region = sm_consumption_weather.join(
                                    sm_meters,
                                    how = "natural",
                                    expr1 = ["*"],
                                    expr2 = ["residenceType", 
                                             "region"])
display(sm_consumption_weather_region)