Smart Meters

This example uses the following datasets to predict peoples' electricity consumption. You can download the Jupyter Notebook of the study here. We'll use the following datasets:

sm_consumption

  • dateUTC: Date and time of the record
  • meterID: Smart meter ID
  • value: Electricity consumed during 30 minute interval (in kWh)

sm_weather

  • dateUTC: Date and time of the record
  • temperature: Temperature
  • humidity: Humidity

sm_meters

  • longitude: Longitude
  • latitude: Latitude
  • residenceType: 1 for Single-Family; 2 for Multi-Family; 3 for Appartement

We will follow the data science cycle (Data Exploration - Data Preparation - Data Modeling - Model Evaluation - Model Deployment) to solve this problem.

Initialization

This example uses the following version of VerticaPy:

In [1]:
import verticapy as vp
vp.__version__
Out[1]:
'0.9.0'

Create vDataFrames of the datasets:

In [3]:
sm_consumption = vp.read_csv("data/smart_meters/sm_consumption.csv")
sm_weather = vp.read_csv("data/smart_meters/sm_weather.csv")
sm_meters = vp.read_csv("data/smart_meters/sm_meters.csv")
display(sm_consumption)
display(sm_weather)
display(sm_meters)
123
meterID
Int
📅
dateUTC
Timestamp
123
value
Numeric(10,6)
102014-01-01 11:00:000.029
202014-01-01 13:45:000.277
302014-01-02 10:45:000.321
402014-01-02 11:15:000.305
502014-01-02 13:45:000.358
602014-01-02 15:30:000.115
702014-01-03 08:30:000.071
802014-01-04 23:45:000.323
902014-01-06 01:15:000.085
1002014-01-06 21:45:000.713
1102014-01-07 00:15:000.242
1202014-01-07 23:00:000.09
1302014-01-09 05:30:000.115
1402014-01-09 15:00:000.928
1502014-01-09 18:15:001.09
1602014-01-10 10:00:000.07
1702014-01-10 14:30:0010.832
1802014-01-10 15:30:003.288
1902014-01-10 18:00:000.42
2002014-01-11 10:45:001.045
2102014-01-12 07:45:000.091
2202014-01-13 19:45:000.249
2302014-01-13 20:15:000.34
2402014-01-14 13:00:000.099
2502014-01-14 14:00:000.2
2602014-01-14 14:45:000.865
2702014-01-14 22:00:000.716
2802014-01-15 03:45:000.05
2902014-01-15 11:15:000.052
3002014-01-15 11:30:000.038
3102014-01-15 23:15:000.239
3202014-01-16 19:30:001.422
3302014-01-17 15:00:000.291
3402014-01-18 00:30:000.828
3502014-01-19 03:00:000.017
3602014-01-19 05:30:000.024
3702014-01-19 06:15:000.048
3802014-01-19 15:30:0013.3
3902014-01-20 19:30:000.59
4002014-01-21 12:30:000.327
4102014-01-21 13:30:000.108
4202014-01-22 01:30:000.083
4302014-01-22 04:30:000.044
4402014-01-23 21:00:000.226
4502014-01-24 02:00:000.125
4602014-01-24 12:15:000.168
4702014-01-27 22:45:000.495
4802014-01-28 06:15:000.056
4902014-01-28 11:30:000.113
5002014-01-28 19:00:001.566
5102014-01-28 19:45:001.108
5202014-01-29 05:00:000.064
5302014-01-29 07:45:000.145
5402014-01-29 13:00:001.719
5502014-01-30 06:15:000.059
5602014-01-30 17:45:000.575
5702014-01-30 18:45:001.129
5802014-02-01 03:30:000.057
5902014-02-01 03:45:000.059
6002014-02-02 08:45:000.186
6102014-02-04 03:45:000.045
6202014-02-04 12:00:000.187
6302014-02-04 16:00:000.313
6402014-02-04 18:45:000.912
6502014-02-05 06:45:000.018
6602014-02-05 17:15:000.112
6702014-02-06 15:15:0010.146
6802014-02-07 11:00:000.868
6902014-02-07 17:30:003.812
7002014-02-07 22:15:001.262
7102014-02-08 03:45:000.043
7202014-02-08 06:45:000.106
7302014-02-09 04:15:000.449
7402014-02-09 08:30:000.007
7502014-02-11 02:30:000.087
7602014-02-11 05:30:000.039
7702014-02-11 19:00:000.094
7802014-02-12 02:30:000.102
7902014-02-13 02:45:000.097
8002014-02-13 09:30:000.161
8102014-02-13 19:00:001.147
8202014-02-14 00:15:000.084
8302014-02-14 02:00:000.174
8402014-02-14 11:15:000.05
8502014-02-14 13:45:000.033
8602014-02-15 02:00:000.181
8702014-02-15 13:45:000.031
8802014-02-15 15:00:000.483
8902014-02-15 20:15:000.368
9002014-02-16 00:00:000.195
9102014-02-17 02:45:000.094
9202014-02-17 12:45:000.049
9302014-02-17 15:15:000.008
9402014-02-18 05:00:000.041
9502014-02-18 20:15:001.202
9602014-02-19 00:15:000.148
9702014-02-19 07:00:000.095
9802014-02-20 00:15:000.039
9902014-02-20 07:30:000.213
10002014-02-20 19:00:001.208
Rows: 1-100 | Columns: 3
📅
dateUTC
Timestamp
123
temperature
Numeric(10,6)
123
humidity
Numeric(11,6)
12014-01-01 00:00:0038.095.0
22014-01-01 00:30:0037.493.0
32014-01-01 01:00:0037.4100.0
42014-01-01 01:30:0037.4100.0
52014-01-01 02:00:0039.293.0
62014-01-01 02:30:0039.293.0
72014-01-01 03:00:0039.293.0
82014-01-01 03:30:0039.293.0
92014-01-01 04:00:0039.293.0
102014-01-01 04:30:0039.293.0
112014-01-01 05:00:0039.293.0
122014-01-01 05:30:0039.287.0
132014-01-01 06:00:0038.089.0
142014-01-01 06:30:0037.493.0
152014-01-01 07:00:0037.493.0
162014-01-01 07:30:0037.493.0
172014-01-01 08:00:0037.493.0
182014-01-01 08:30:0037.487.0
192014-01-01 09:00:0037.487.0
202014-01-01 09:30:0037.487.0
212014-01-01 10:00:0037.493.0
222014-01-01 10:30:0037.493.0
232014-01-01 11:00:0037.487.0
242014-01-01 11:30:0037.493.0
252014-01-01 12:00:0038.085.0
262014-01-01 12:30:0039.287.0
272014-01-01 13:00:0039.287.0
282014-01-01 13:30:0039.287.0
292014-01-01 14:00:0039.287.0
302014-01-01 14:30:0039.287.0
312014-01-01 15:00:0039.287.0
322014-01-01 15:30:0039.287.0
332014-01-01 16:00:0037.493.0
342014-01-01 16:30:0039.287.0
352014-01-01 17:00:0039.287.0
362014-01-01 17:30:0037.487.0
372014-01-01 18:00:0038.081.0
382014-01-01 18:30:0037.487.0
392014-01-01 19:00:0035.687.0
402014-01-01 19:30:0037.493.0
412014-01-01 20:00:0039.287.0
422014-01-01 20:30:0039.287.0
432014-01-01 21:00:0039.287.0
442014-01-01 21:30:0041.081.0
452014-01-01 22:00:0039.287.0
462014-01-01 22:30:0039.287.0
472014-01-01 23:00:0039.287.0
482014-01-01 23:30:0039.281.0
492014-01-02 00:00:0038.076.0
502014-01-02 00:30:0037.481.0
512014-01-02 01:00:0037.481.0
522014-01-02 01:30:0037.481.0
532014-01-02 02:00:0037.481.0
542014-01-02 02:30:0037.481.0
552014-01-02 03:00:0037.481.0
562014-01-02 03:30:0037.481.0
572014-01-02 04:00:0037.481.0
582014-01-02 04:30:0035.687.0
592014-01-02 05:00:0035.693.0
602014-01-02 05:30:0037.481.0
612014-01-02 06:00:0037.075.0
622014-01-02 06:30:0037.481.0
632014-01-02 07:00:0037.487.0
642014-01-02 07:30:0037.481.0
652014-01-02 08:00:0035.681.0
662014-01-02 08:30:0035.681.0
672014-01-02 09:00:0037.475.0
682014-01-02 09:30:0037.481.0
692014-01-02 10:00:0039.275.0
702014-01-02 10:30:0041.070.0
712014-01-02 11:00:0039.275.0
722014-01-02 11:30:0041.070.0
732014-01-02 12:00:0041.061.0
742014-01-02 12:30:0041.070.0
752014-01-02 13:00:0041.076.0
762014-01-02 13:30:0041.076.0
772014-01-02 14:00:0041.076.0
782014-01-02 14:30:0041.076.0
792014-01-02 15:00:0041.076.0
802014-01-02 15:30:0041.076.0
812014-01-02 16:00:0041.076.0
822014-01-02 16:30:0041.076.0
832014-01-02 17:00:0041.076.0
842014-01-02 17:30:0039.281.0
852014-01-02 18:00:0039.070.0
862014-01-02 18:30:0037.481.0
872014-01-02 19:00:0039.275.0
882014-01-02 19:30:0037.481.0
892014-01-02 20:00:0037.481.0
902014-01-02 20:30:0039.275.0
912014-01-02 21:00:0039.265.0
922014-01-02 21:30:0039.265.0
932014-01-02 22:00:0039.265.0
942014-01-02 22:30:0039.261.0
952014-01-02 23:00:0039.261.0
962014-01-02 23:30:0039.265.0
972014-01-03 00:00:0039.048.0
982014-01-03 00:30:0039.261.0
992014-01-03 01:00:0039.265.0
1002014-01-03 01:30:0037.470.0
Rows: 1-100 | Columns: 3
123
meterID
Int
123
residenceType
Int
🌎
latitude
Float
🌎
longitude
Float
10153.3575915882838-6.27554509198069
21153.3130886440828-6.21174062227953
32153.322863954253-6.25885396251447
43351.895226209818-8.54531304761961
54153.6511753623331-6.69348712237822
65153.3262233045899-6.31623713317817
76353.3816401612885-6.24222877491921
87353.4410265504271-6.2749868845315
98153.2883538726462-6.31547263281076
109153.307294775666-6.171585396401
1110153.3043589304317-6.27057231678352
1211152.2168162005257-7.19089306958349
1312353.3292821225723-6.26914988094354
1413152.2330869136812-7.0801397379672
1514153.3121094759744-6.1914301584486
1615152.6815374444767-8.71513846618087
1716253.2598894923594-8.98501096632971
1817153.3386461784187-6.19326209020707
1918153.2930399359755-6.29464811435931
2019153.2904090932662-6.20219801883112
2120153.3140843911614-6.27801544356096
2221253.3368019275034-6.30227460284837
2322153.3606245053304-6.25253082496436
2423353.7334769562786-6.40192964510675
2524151.8904635261463-8.4583653371337
2625153.2943781673251-6.27264309456447
2726153.4252605572445-6.24346657284835
2827253.3445341131241-6.21965121543182
2928153.358289570405-6.22902191315319
3029151.9025419059901-8.45522362503022
3130353.4226787570522-6.2597908055893
3231353.379068564415-6.25362478692152
3332354.0078974821771-6.36373099514602
3433253.3786128724389-6.26875699530572
3534153.3276679292134-6.25674968108046
3635153.3775061858458-6.2658589998757
3736153.422378212805-6.30085964886839
3837253.3163882602178-6.15806917577308
3938353.37096469415-6.27913717729599
4039153.4132892637185-6.23844413381297
4140153.4228804497345-6.26418479308435
4241153.353488205755-6.30562807207814
4342153.3054854916401-6.24275285029992
4443153.3376504362861-6.25709734131062
4544153.3748180529522-6.21375987196951
4645153.7635143297958-6.25548923165312
4746153.2618719521814-6.25472544269271
4847351.8616597388643-8.49342918637484
4948353.2044437294531-6.185303194865
5049153.3679368733445-6.23992423549581
5150153.2659672891683-6.26641539940328
5251153.3662651408726-6.14679963936097
5352153.3393859597162-6.30342142910585
5453351.9040058992909-8.39981371864713
5554153.3343246471021-6.25061672172339
5655251.8643744508624-8.56908207755333
5756153.3530817973677-6.30734386455951
5857153.3414541477597-6.25186365095182
5958353.3338737360076-6.22063504106583
6059152.2124956730345-7.11448302907311
6160153.3363487476997-6.31363533408071
6261153.3587038304275-6.21556299304468
6362353.3172551197574-6.24415298656193
6463153.2271603237907-6.14839645868847
6564253.2890269752152-6.23614626114708
6665153.3963744886512-6.34171292270289
6766151.8895373160002-8.48780907327119
6867153.344893347275-6.23974905608669
6968153.3712600623191-6.24875843222256
7069153.3504808496605-6.34801942486053
7170351.9154011937038-8.48836529664276
7271352.1952892522058-7.11063882871
7372153.303992111683-6.25315326364841
7473153.2753986962432-9.07589204146141
7574151.91819680651-8.43446154154656
7675152.6027837572115-8.6160224579197
7776153.4035657645851-6.25428225078493
7877153.3709894952545-6.37512943696882
7978253.3358314852994-6.39381681908009
8079153.2716167370656-6.23869625794234
8180153.3591211533429-6.21909400592063
8281153.372120236396-6.20827879919494
8382153.3647332578122-6.28451041581688
8483153.2632427747372-6.26140047474471
8584151.9123757703805-8.46795169870626
8685153.3549789863896-6.2372798346737
8786353.2266733295124-6.2667286257736
8887353.307434324047-6.17627526417622
8988353.2391085904911-6.15017286838827
9089353.9521467669519-6.43670239554822
9190152.2406296264404-7.09240230508834
9291153.425341733642-6.32118808599842
9392351.8614117711696-8.50296011084579
9493153.7029164125936-6.42959758250948
9594153.378597106695-6.17601148907721
9695353.3405488207761-6.31607804590097
9796153.4212269471156-6.21357678538708
9897153.3123208667246-6.26734792518767
9998153.2186994970707-9.04100949651601
10099153.3065251415844-6.21247769416289
Rows: 1-100 | Columns: 4

Data Exploration and Preparation

Predicting energy consumption in households is very important. Surges in electricity use could cause serious power outages. In our case, we'll be using data on general household energy consumption in Ireland to predict consumption at various times.

In order to join the different data sources, we need to assume that the weather will be approximately the same across the entirety of Ireland. We'll use the date and time as the key to join 'sm_weather' and 'sm_consumption'.

Joining different datasets with interpolation

In VerticaPy, you can interpolate joins; Vertica will find the closest timestamp to the key and join the result.

In [4]:
sm_consumption_weather = sm_consumption.join(
                            sm_weather,
                            how = "left",
                            on_interpolate = {"dateUTC": "dateUTC"},
                            expr1 = ["dateUTC", "meterID", "value"],
                            expr2 = ["humidity", "temperature"])
display(sm_consumption_weather)
📅
dateUTC
Datetime
123
meterID
Integer
123
value
Numeric(10,6)
123
humidity
Numeric(11,6)
123
temperature
Numeric(10,6)
12014-01-01 00:00:009130.09795.038.0
22014-01-01 00:00:008950.18495.038.0
32014-01-01 00:00:007950.09395.038.0
42014-01-01 00:00:007800.11695.038.0
52014-01-01 00:00:007470.55995.038.0
62014-01-01 00:00:006600.83695.038.0
72014-01-01 00:00:006401.79695.038.0
82014-01-01 00:00:006050.03895.038.0
92014-01-01 00:00:004840.21195.038.0
102014-01-01 00:00:004740.31395.038.0
112014-01-01 00:00:004570.02295.038.0
122014-01-01 00:00:003710.81595.038.0
132014-01-01 00:00:003550.44895.038.0
142014-01-01 00:00:003480.07195.038.0
152014-01-01 00:00:003071.42395.038.0
162014-01-01 00:00:001880.15295.038.0
172014-01-01 00:00:001810.13895.038.0
182014-01-01 00:00:001790.03395.038.0
192014-01-01 00:00:001290.05495.038.0
202014-01-01 00:00:001160.23295.038.0
212014-01-01 00:15:009860.30195.038.0
222014-01-01 00:15:008480.18695.038.0
232014-01-01 00:15:008290.14595.038.0
242014-01-01 00:15:007180.09195.038.0
252014-01-01 00:15:006370.31995.038.0
262014-01-01 00:15:006250.14195.038.0
272014-01-01 00:15:006190.10395.038.0
282014-01-01 00:15:005834.08195.038.0
292014-01-01 00:15:004440.13195.038.0
302014-01-01 00:15:002210.39995.038.0
312014-01-01 00:15:00670.15795.038.0
322014-01-01 00:15:00520.15795.038.0
332014-01-01 00:30:009330.06193.037.4
342014-01-01 00:30:009300.20293.037.4
352014-01-01 00:30:009040.15993.037.4
362014-01-01 00:30:008290.08293.037.4
372014-01-01 00:30:007960.23793.037.4
382014-01-01 00:30:006630.0893.037.4
392014-01-01 00:30:006281.24693.037.4
402014-01-01 00:30:006260.10693.037.4
412014-01-01 00:30:006180.06793.037.4
422014-01-01 00:30:005060.10893.037.4
432014-01-01 00:30:004970.14493.037.4
442014-01-01 00:30:004600.09493.037.4
452014-01-01 00:30:003760.12993.037.4
462014-01-01 00:30:003320.04493.037.4
472014-01-01 00:30:003080.04593.037.4
482014-01-01 00:30:002860.20993.037.4
492014-01-01 00:30:002770.28793.037.4
502014-01-01 00:30:002640.16293.037.4
512014-01-01 00:30:001382.27693.037.4
522014-01-01 00:30:00900.18393.037.4
532014-01-01 00:45:009770.17693.037.4
542014-01-01 00:45:009370.20193.037.4
552014-01-01 00:45:009270.55193.037.4
562014-01-01 00:45:008670.84493.037.4
572014-01-01 00:45:007620.1393.037.4
582014-01-01 00:45:005280.06793.037.4
592014-01-01 00:45:004470.59993.037.4
602014-01-01 00:45:003750.51993.037.4
612014-01-01 00:45:002970.60993.037.4
622014-01-01 00:45:002500.42893.037.4
632014-01-01 00:45:002350.0693.037.4
642014-01-01 01:00:009800.16100.037.4
652014-01-01 01:00:007970.114100.037.4
662014-01-01 01:00:007830.025100.037.4
672014-01-01 01:00:007090.058100.037.4
682014-01-01 01:00:006970.184100.037.4
692014-01-01 01:00:006360.139100.037.4
702014-01-01 01:00:005480.309100.037.4
712014-01-01 01:00:003740.104100.037.4
722014-01-01 01:00:003580.471100.037.4
732014-01-01 01:00:003040.109100.037.4
742014-01-01 01:00:002910.412100.037.4
752014-01-01 01:00:002630.172100.037.4
762014-01-01 01:00:002560.124100.037.4
772014-01-01 01:00:001230.309100.037.4
782014-01-01 01:00:00790.243100.037.4
792014-01-01 01:00:00750.274100.037.4
802014-01-01 01:00:00610.1100.037.4
812014-01-01 01:00:00350.134100.037.4
822014-01-01 01:15:009500.127100.037.4
832014-01-01 01:15:008440.539100.037.4
842014-01-01 01:15:008320.358100.037.4
852014-01-01 01:15:008070.095100.037.4
862014-01-01 01:15:006560.104100.037.4
872014-01-01 01:15:005980.18100.037.4
882014-01-01 01:15:005880.064100.037.4
892014-01-01 01:15:005700.018100.037.4
902014-01-01 01:15:004760.299100.037.4
912014-01-01 01:15:004670.292100.037.4
922014-01-01 01:15:003860.156100.037.4
932014-01-01 01:15:003060.16100.037.4
942014-01-01 01:15:002900.092100.037.4
952014-01-01 01:15:001930.101100.037.4
962014-01-01 01:15:001340.429100.037.4
972014-01-01 01:15:001230.446100.037.4
982014-01-01 01:15:001190.268100.037.4
992014-01-01 01:15:00890.107100.037.4
1002014-01-01 01:15:0020.037100.037.4
Rows: 1-100 | Columns: 5

Segmenting Latitude & Longitude using Clustering

The dataset 'sm_meters' is pretty important. In particular, the type of residence is probably a good predictor for electricity usage. We can create clusters of the different regions with k-means clustering based on longitude and latitude. Let's find the most suitable 'k' using an elbow curve and scatter plot.

In [5]:
sm_meters.agg(["min", "max"])
Out[5]:
min
max
"meterID"0.0999.0
"residenceType"1.03.0
"latitude"51.796460077021254.0270361317983
"longitude"-9.16352332036362-6.07134572494937
Rows: 1-4 | Columns: 3
In [6]:
%matplotlib inline
from verticapy.learn.model_selection import elbow
from verticapy.datasets import load_world

world = load_world()
df = world.to_geopandas(geometry = "geometry")
df = df[df["country"].isin(["Ireland", "United Kingdom"])]
ax = df.plot(edgecolor = "black",
             color = "white",
             figsize = (10, 9))
ax = sm_meters.scatter(["longitude", "latitude"], ax = ax)

Based on the scatter plot, five seems like the optimal number of clusters. Let's verify this hypothesis using an elbow curve.

In [7]:
elbow(sm_meters, ["longitude", "latitude"], n_cluster = (3, 8))

Out[7]:
Within-Cluster SS
30.9311388771341764
40.975099232332047
50.9844977831351884
60.993627480208735
70.9947041356087959
Rows: 1-5 | Columns: 2

The elbow curve seems to confirm that five is the optimal number of clusters, so let's create a k-means model with that in mind.

In [8]:
from verticapy.learn.cluster import KMeans
model = KMeans("kmeans_sm_regions", 
               n_cluster = 5,
               init = [(-6.26980, 53.38127),
                       (-9.06178, 53.25998),
                       (-8.48641, 51.90216),
                       (-7.12408, 52.24610),
                       (-8.63985, 52.65945),])
model.drop()
model.fit(sm_meters, 
          ["longitude", "latitude"])
Out[8]:

=======
centers
=======
longitude|latitude
---------+--------
-9.06178 |53.25998
-8.63985 |52.65945
-8.48641 |51.90216
-7.12408 |52.24610
-6.26980 |53.38127


=======
metrics
=======
Evaluation metrics:
     Total Sum of Squares: 1209.2077
     Within-Cluster Sum of Squares: 
         Cluster 0: 0.099754154
         Cluster 1: 0.2779225
         Cluster 2: 0.53464463
         Cluster 3: 0.2657853
         Cluster 4: 17.892423
     Total Within-Cluster Sum of Squares: 19.07053
     Between-Cluster Sum of Squares: 1190.1372
     Between-Cluster SS / Total SS: 98.42%
 Number of iterations performed: 1
 Converged: True
 Call:
kmeans('public.kmeans_sm_regions', '"public"."_verticapy_tmp_view_dbadmin_239726_1781523806_"', '"longitude", "latitude"', 5
USING PARAMETERS max_iterations=300, epsilon=0.0001, initial_centers_table='"public"."_verticapy_tmp_kmeans_init_dbadmin_239726_3848320059_"', distance_method='euclidean')

Let's add our clusters to the vDataFrame.

In [9]:
sm_meters = model.predict(sm_meters, name = "region")

Let's draw a scatter plot of the different regions.

In [10]:
ax = df.plot(edgecolor = "black",
             color = "white",
             figsize = (10, 9))
sm_meters.scatter(["longitude", "latitude"], 
                  catcol = "region",
                  max_cardinality = 10,
                  ax = ax)
Out[10]:
<AxesSubplot:xlabel='"longitude"', ylabel='"latitude"'>

Dataset Enrichment

Let's join 'sm_meters' with 'sm_consumption_weather'.

In [11]:
sm_consumption_weather_region = sm_consumption_weather.join(
                                    sm_meters,
                                    how = "natural",
                                    expr1 = ["*"],
                                    expr2 = ["residenceType", 
                                             "region"])
display(sm_consumption_weather_region)
📅
dateUTC
Datetime
123
meterID
Integer
123
value
Numeric(10,6)
123
humidity
Numeric(11,6)
123
temperature
Numeric(10,6)
123
residenceType
Integer
123
region
Integer
12014-01-01 00:00:009130.09795.038.034
22014-01-01 00:00:008950.18495.038.014
32014-01-01 00:00:007950.09395.038.014
42014-01-01 00:00:007800.11695.038.014
52014-01-01 00:00:007470.55995.038.024
62014-01-01 00:00:006600.83695.038.014
72014-01-01 00:00:006401.79695.038.011
82014-01-01 00:00:006050.03895.038.031
92014-01-01 00:00:004840.21195.038.014
102014-01-01 00:00:004740.31395.038.022
112014-01-01 00:00:004570.02295.038.033
122014-01-01 00:00:003710.81595.038.034
132014-01-01 00:00:003550.44895.038.014
142014-01-01 00:00:003480.07195.038.014
152014-01-01 00:00:003071.42395.038.014
162014-01-01 00:00:001880.15295.038.012
172014-01-01 00:00:001810.13895.038.014
182014-01-01 00:00:001790.03395.038.032
192014-01-01 00:00:001290.05495.038.034
202014-01-01 00:00:001160.23295.038.014
212014-01-01 00:15:009860.30195.038.032
222014-01-01 00:15:008480.18695.038.014
232014-01-01 00:15:008290.14595.038.034
242014-01-01 00:15:007180.09195.038.032
252014-01-01 00:15:006370.31995.038.034
262014-01-01 00:15:006250.14195.038.014
272014-01-01 00:15:006190.10395.038.032
282014-01-01 00:15:005834.08195.038.030
292014-01-01 00:15:004440.13195.038.014
302014-01-01 00:15:002210.39995.038.034
312014-01-01 00:15:00670.15795.038.014
322014-01-01 00:15:00520.15795.038.014
332014-01-01 00:30:009330.06193.037.434
342014-01-01 00:30:009300.20293.037.434
352014-01-01 00:30:009040.15993.037.424
362014-01-01 00:30:008290.08293.037.434
372014-01-01 00:30:007960.23793.037.430
382014-01-01 00:30:006630.0893.037.434
392014-01-01 00:30:006281.24693.037.433
402014-01-01 00:30:006260.10693.037.414
412014-01-01 00:30:006180.06793.037.434
422014-01-01 00:30:005060.10893.037.434
432014-01-01 00:30:004970.14493.037.411
442014-01-01 00:30:004600.09493.037.414
452014-01-01 00:30:003760.12993.037.431
462014-01-01 00:30:003320.04493.037.434
472014-01-01 00:30:003080.04593.037.414
482014-01-01 00:30:002860.20993.037.414
492014-01-01 00:30:002770.28793.037.412
502014-01-01 00:30:002640.16293.037.434
512014-01-01 00:30:001382.27693.037.414
522014-01-01 00:30:00900.18393.037.413
532014-01-01 00:45:009770.17693.037.434
542014-01-01 00:45:009370.20193.037.414
552014-01-01 00:45:009270.55193.037.434
562014-01-01 00:45:008670.84493.037.414
572014-01-01 00:45:007620.1393.037.414
582014-01-01 00:45:005280.06793.037.414
592014-01-01 00:45:004470.59993.037.414
602014-01-01 00:45:003750.51993.037.414
612014-01-01 00:45:002970.60993.037.434
622014-01-01 00:45:002500.42893.037.414
632014-01-01 00:45:002350.0693.037.434
642014-01-01 01:00:009800.16100.037.414
652014-01-01 01:00:007970.114100.037.434
662014-01-01 01:00:007830.025100.037.424
672014-01-01 01:00:007090.058100.037.434
682014-01-01 01:00:006970.184100.037.434
692014-01-01 01:00:006360.139100.037.414
702014-01-01 01:00:005480.309100.037.424
712014-01-01 01:00:003740.104100.037.414
722014-01-01 01:00:003580.471100.037.412
732014-01-01 01:00:003040.109100.037.412
742014-01-01 01:00:002910.412100.037.414
752014-01-01 01:00:002630.172100.037.424
762014-01-01 01:00:002560.124100.037.434
772014-01-01 01:00:001230.309100.037.414
782014-01-01 01:00:00790.243100.037.414
792014-01-01 01:00:00750.274100.037.411
802014-01-01 01:00:00610.1100.037.414
812014-01-01 01:00:00350.134100.037.414
822014-01-01 01:15:009500.127100.037.434
832014-01-01 01:15:008440.539100.037.424
842014-01-01 01:15:008320.358100.037.413
852014-01-01 01:15:008070.095100.037.414
862014-01-01 01:15:006560.104100.037.414
872014-01-01 01:15:005980.18100.037.414
882014-01-01 01:15:005880.064100.037.434
892014-01-01 01:15:005700.018100.037.434
902014-01-01 01:15:004760.299100.037.412
912014-01-01 01:15:004670.292100.037.410
922014-01-01 01:15:003860.156100.037.412
932014-01-01 01:15:003060.16100.037.414
942014-01-01 01:15:002900.092100.037.424
952014-01-01 01:15:001930.101100.037.414
962014-01-01 01:15:001340.429100.037.414
972014-01-01 01:15:001230.446100.037.414
982014-01-01 01:15:001190.268100.037.412
992014-01-01 01:15:00890.107100.037.434
1002014-01-01 01:15:0020.037100.037.414
Rows: 1-100 | Columns: 7

Handling Missing Values

Let's take care of our missing values.

In [12]:
sm_consumption_weather_region.count_percent()
Out[12]:
count
percent
"dateUTC"1188432.0100.0
"meterID"1188432.0100.0
"humidity"1188432.0100.0
"temperature"1188432.0100.0
"residenceType"1188432.0100.0
"region"1188432.0100.0
"value"1188412.099.998
Rows: 1-7 | Columns: 3

The variable 'value' has a few missing values that we can drop.

In [13]:
sm_consumption_weather_region["value"].dropna()
sm_consumption_weather_region.count()
20 elements were filtered.
Out[13]:
count
"dateUTC"1188412.0
"meterID"1188412.0
"value"1188412.0
"humidity"1188412.0
"temperature"1188412.0
"residenceType"1188412.0
"region"1188412.0
Rows: 1-7 | Columns: 2

Interpolation & Aggregations

Since power outages seem relatively common in each area, and the "value" represents the electricity consumed during 30 minute intervals (in kWh), it'd be a good idea to interpolate and aggregate the data to get a monthly average in electricity consumption per region.

Let's save our new dataset in the Vertica database.

In [15]:
vp.drop("sm_consumption_weather_region", method = "table")
sm_consumption_weather_region.to_db("sm_consumption_weather_region",
                                    relation_type = "table")
sm_consumption_weather_region_clean = vp.vDataFrame("sm_consumption_weather_region")

To get an equally-sliced dataset, we can then interpolate to fill any gaps. This operation is essential for creating correct time series models.

In [16]:
sm_consumption_weather_region_clean = sm_consumption_weather_region_clean.interpolate(
                                                                        ts = "dateUTC",
                                                                        rule = "30 minutes",
                                                                        method = {"value": "linear",
                                                                                  "humidity": "linear",
                                                                                  "temperature": "linear",
                                                                                  "residenceType": "ffill",
                                                                                  "region": "ffill"},
                                                                        by = ["meterID"])
display(sm_consumption_weather_region_clean)
📅
dateUTC
Datetime
123
meterID
Integer
123
value
Float
123
humidity
Float
123
temperature
Float
123
residenceType
Integer
123
region
Integer
12014-01-01 11:00:0000.02987.037.414
22014-01-01 11:30:0000.074090909090909187.037.727272727272714
32014-01-01 12:00:0000.11918181818181887.038.054545454545514
42014-01-01 12:30:0000.16427272727272787.038.381818181818214
52014-01-01 13:00:0000.20936363636363687.038.709090909090914
62014-01-01 13:30:0000.25445454545454587.039.036363636363614
72014-01-01 14:00:0000.2775238095238186.797619047619139.221428571428614
82014-01-01 14:30:0000.27857142857142986.392857142857139.264285714285714
92014-01-01 15:00:0000.27961904761904885.988095238095239.307142857142914
102014-01-01 15:30:0000.28066666666666785.583333333333339.3514
112014-01-01 16:00:0000.28171428571428685.178571428571439.392857142857114
122014-01-01 16:30:0000.28276190476190584.773809523809539.435714285714314
132014-01-01 17:00:0000.28380952380952484.369047619047639.478571428571414
142014-01-01 17:30:0000.28485714285714383.964285714285739.521428571428614
152014-01-01 18:00:0000.28590476190476283.559523809523839.564285714285714
162014-01-01 18:30:0000.28695238095238183.154761904761939.607142857142914
172014-01-01 19:00:0000.28882.7539.6514
182014-01-01 19:30:0000.28904761904761982.345238095238139.692857142857114
192014-01-01 20:00:0000.29009523809523881.940476190476239.735714285714314
202014-01-01 20:30:0000.29114285714285781.535714285714339.778571428571414
212014-01-01 21:00:0000.29219047619047681.130952380952439.821428571428614
222014-01-01 21:30:0000.29323809523809580.726190476190539.864285714285714
232014-01-01 22:00:0000.29428571428571480.321428571428639.907142857142914
242014-01-01 22:30:0000.29533333333333379.916666666666739.9514
252014-01-01 23:00:0000.29638095238095279.511904761904839.992857142857114
262014-01-01 23:30:0000.29742857142857179.107142857142940.035714285714314
272014-01-02 00:00:0000.29847619047619178.702380952380940.078571428571414
282014-01-02 00:30:0000.2995238095238178.297619047619140.121428571428614
292014-01-02 01:00:0000.30057142857142977.892857142857140.164285714285714
302014-01-02 01:30:0000.30161904761904877.488095238095240.207142857142914
312014-01-02 02:00:0000.30266666666666777.083333333333340.2514
322014-01-02 02:30:0000.30371428571428676.678571428571440.292857142857114
332014-01-02 03:00:0000.30476190476190576.273809523809540.335714285714314
342014-01-02 03:30:0000.30580952380952475.869047619047640.378571428571414
352014-01-02 04:00:0000.30685714285714375.464285714285740.421428571428614
362014-01-02 04:30:0000.30790476190476275.059523809523840.464285714285714
372014-01-02 05:00:0000.30895238095238174.654761904761940.507142857142914
382014-01-02 05:30:0000.3174.2540.5514
392014-01-02 06:00:0000.31104761904761973.845238095238140.592857142857114
402014-01-02 06:30:0000.31209523809523873.440476190476240.635714285714314
412014-01-02 07:00:0000.31314285714285773.035714285714340.678571428571414
422014-01-02 07:30:0000.31419047619047672.630952380952440.721428571428614
432014-01-02 08:00:0000.31523809523809572.226190476190540.764285714285714
442014-01-02 08:30:0000.31628571428571471.821428571428640.807142857142914
452014-01-02 09:00:0000.31733333333333371.416666666666740.8514
462014-01-02 09:30:0000.31838095238095271.011904761904840.892857142857114
472014-01-02 10:00:0000.31942857142857170.607142857142940.935714285714314
482014-01-02 10:30:0000.3204761904761970.202380952380940.978571428571414
492014-01-02 11:00:0000.31372.540.114
502014-01-02 11:30:0000.310375.139.3814
512014-01-02 12:00:0000.320975.339.7414
522014-01-02 12:30:0000.331575.540.114
532014-01-02 13:00:0000.342175.740.4614
542014-01-02 13:30:0000.352775.940.8214
552014-01-02 14:00:0000.32328571428571476.041.014
562014-01-02 14:30:0000.25385714285714376.041.014
572014-01-02 15:00:0000.18442857142857176.041.014
582014-01-02 15:30:0000.11576.041.014
592014-01-02 16:00:0000.11370588235294175.676470588235340.894117647058814
602014-01-02 16:30:0000.11241176470588275.352941176470640.788235294117614
612014-01-02 17:00:0000.11111764705882475.029411764705940.682352941176514
622014-01-02 17:30:0000.10982352941176574.705882352941240.576470588235314
632014-01-02 18:00:0000.10852941176470674.382352941176540.470588235294114
642014-01-02 18:30:0000.10723529411764774.058823529411840.364705882352914
652014-01-02 19:00:0000.10594117647058873.735294117647140.258823529411814
662014-01-02 19:30:0000.10464705882352973.411764705882340.152941176470614
672014-01-02 20:00:0000.10335294117647173.088235294117740.047058823529414
682014-01-02 20:30:0000.10205882352941272.764705882352939.941176470588214
692014-01-02 21:00:0000.10076470588235372.441176470588239.835294117647114
702014-01-02 21:30:0000.099470588235294172.117647058823539.729411764705914
712014-01-02 22:00:0000.098176470588235371.794117647058839.623529411764714
722014-01-02 22:30:0000.096882352941176571.470588235294139.517647058823514
732014-01-02 23:00:0000.095588235294117671.147058823529439.411764705882414
742014-01-02 23:30:0000.094294117647058870.823529411764739.305882352941214
752014-01-03 00:00:0000.09370.539.214
762014-01-03 00:30:0000.091705882352941270.176470588235339.094117647058814
772014-01-03 01:00:0000.090411764705882469.852941176470638.988235294117614
782014-01-03 01:30:0000.089117647058823569.529411764705938.882352941176514
792014-01-03 02:00:0000.087823529411764769.205882352941238.776470588235314
802014-01-03 02:30:0000.086529411764705968.882352941176538.670588235294114
812014-01-03 03:00:0000.085235294117647168.558823529411838.564705882352914
822014-01-03 03:30:0000.083941176470588268.235294117647138.458823529411814
832014-01-03 04:00:0000.082647058823529467.911764705882338.352941176470614
842014-01-03 04:30:0000.081352941176470667.588235294117738.247058823529414
852014-01-03 05:00:0000.080058823529411867.264705882352938.141176470588214
862014-01-03 05:30:0000.078764705882352966.941176470588238.035294117647114
872014-01-03 06:00:0000.077470588235294166.617647058823537.929411764705914
882014-01-03 06:30:0000.076176470588235366.294117647058837.823529411764714
892014-01-03 07:00:0000.074882352941176565.970588235294137.717647058823514
902014-01-03 07:30:0000.073588235294117665.647058823529437.611764705882414
912014-01-03 08:00:0000.072294117647058865.323529411764737.505882352941214
922014-01-03 08:30:0000.07165.037.414
932014-01-03 09:00:0000.074210191082802565.445859872611537.285350318471314
942014-01-03 09:30:0000.077420382165605165.891719745222937.170700636942714
952014-01-03 10:00:0000.080630573248407666.337579617834437.05605095541414
962014-01-03 10:30:0000.083840764331210266.783439490445936.941401273885314
972014-01-03 11:00:0000.087050955414012767.229299363057336.826751592356714
982014-01-03 11:30:0000.090261146496815367.675159235668836.71210191082814
992014-01-03 12:00:0000.093471337579617868.121019108280336.597452229299414
1002014-01-03 12:30:0000.096681528662420468.566878980891736.482802547770714
Rows: 1-100 | Columns: 7

Let's aggregate the data to figure out the monthly energy consumption for each smart meter. We can then save the result in the Vertica database.

In [18]:
import verticapy.stats as st
sm_consumption_month = sm_consumption_weather_region_clean.groupby(
                    columns = ["meterID",
                               "region", 
                               "residenceType",
                               "MONTH(dateUTC) AS month",
                               "DATE_TRUNC('MONTH', dateUTC::date) AS date_month"],
                    expr = [st.sum(sm_consumption_weather_region["value"])._as("value"),
                            st.avg(sm_consumption_weather_region["temperature"])._as("avg_temperature"),
                            st.avg(sm_consumption_weather_region["humidity"])._as("avg_humidity")]).filter(
                                                                                    "date_month < '2015-09-01'")
vp.drop("sm_consumption_month", method = "table")
sm_consumption_month.to_db("sm_consumption_month",
                           relation_type = "table",
                           inplace = True)
Out[18]:
123
meterID
Int
123
region
Int
123
residenceType
Int
123
month
Int
📅
date_month
Timestamp
123
value
Float
123
avg_temperature
Float
123
avg_humidity
Float
104112014-01-01 00:00:001094.4697677931840.11959141374385.3862434750278
204112015-01-01 00:00:00461.09891432908235.86999617387491.9545988846073
304122014-02-01 00:00:00859.3116225434741.146796543701287.3618817946322
404122015-02-01 00:00:00519.57540467649636.67847431885691.6393096571925
504132014-03-01 00:00:00385.58936655751644.612861142744383.7845727673415
604132015-03-01 00:00:00402.45733976113540.487651728516385.0905843339206
704142014-04-01 00:00:00392.3535621869347.709800632414784.8403939698354
804142015-04-01 00:00:00293.3173375838145.07640655930880.6242208952574
904152014-05-01 00:00:00294.37002834571651.338578358491283.9470163357567
1004152015-05-01 00:00:00429.37956331647647.988747843346183.8925132071737
1104162014-06-01 00:00:00416.72039678126856.282090812916280.0076357451401
1204162015-06-01 00:00:00375.48209925429757.832393657906580.5494105121803
1304172014-07-01 00:00:00517.7265983019458.237294241842282.5882473588889
1404172015-07-01 00:00:00608.34873200815660.842987036775178.4408499785012
1504182014-08-01 00:00:00351.44588708935860.44588055973180.7871798502449
1604182015-08-01 00:00:00663.94737645276757.194370500730982.1793258688277
1704192014-09-01 00:00:00540.26117416157256.002922204216781.0310786590966
18041102014-10-01 00:00:00387.21743307143851.910036354534989.3897158191042
19041112014-11-01 00:00:00654.68316281426745.506228391120187.4771741861059
20041122014-12-01 00:00:00467.37954202749838.694551322208691.3160798419345
2114112014-01-01 00:00:00582.93064549864939.120333758187186.2281047089881
2214112015-01-01 00:00:00733.17339312119435.982555840039892.9931707977183
2314122014-02-01 00:00:00328.5441011229940.74063074095288.6208589271437
2414122015-02-01 00:00:00664.90807313226536.10779674338792.0228801611064
2514132014-03-01 00:00:00750.14179253608644.841985642244281.4965569252717
2614132015-03-01 00:00:00747.69683846383642.381938848060477.8892704917467
2714142014-04-01 00:00:00303.62665981066648.431981720484883.1294834473743
2814142015-04-01 00:00:00333.36403485956846.522231399399179.1965773965361
2914152014-05-01 00:00:00571.63539225868551.00436005099581.7882238039068
3014152015-05-01 00:00:00384.70375597733550.764821715846278.4548447843953
3114162014-06-01 00:00:00923.5026666519758.189982887661377.4562080205899
3214162015-06-01 00:00:00497.58823965477858.058235505198480.2845965071549
3314172014-07-01 00:00:00922.01150887194458.858182600724783.6376663297095
3414172015-07-01 00:00:00764.12213811383959.764173162807979.953929990897
3514182014-08-01 00:00:00433.43797351890859.914983187036481.4356536158466
3614182015-08-01 00:00:00296.33582722388856.566716563918483.0268955395646
3714192014-09-01 00:00:00605.07458579661655.323752070133584.3041796784501
38141102014-10-01 00:00:00438.54779370015752.715731463812488.2888190729199
39141112014-11-01 00:00:00565.16518660484346.006196545229988.1022602608813
40141122014-12-01 00:00:00598.06366819038639.772169284992589.9018765295354
412[null][null]12014-01-01 00:00:00[null][null][null]
4224112014-01-01 00:00:00482.38519441539.903977916269284.9491344603949
4324112015-01-01 00:00:00849.90136117488135.920566811858294.0992998737685
4424122014-02-01 00:00:00379.59520761391441.762541510108886.0871432645725
4524122015-02-01 00:00:00657.70712983448137.2715497737690.4227418530055
4624132014-03-01 00:00:00599.6844797938544.79218919521680.8285802271696
4724132015-03-01 00:00:00581.38082108850841.099886851808682.2095048876506
4824142014-04-01 00:00:00470.58886129274748.808310307889785.081830737512
4924142015-04-01 00:00:00489.65798807076747.156023576936577.0781891119151
5024152014-05-01 00:00:00471.78162241893951.197344612016884.3597425444918
5124152015-05-01 00:00:00757.21049172069350.525609599187678.7060994156184
5224162014-06-01 00:00:00327.68333984474255.989260597365682.169875343245
5324162015-06-01 00:00:00381.0854229146857.327554294839380.3458444814076
5424172014-07-01 00:00:00473.42127026307558.594801721678983.3348880025485
5524172015-07-01 00:00:00285.3279952675760.531345184762479.4181890210453
5624182014-08-01 00:00:00341.38607744306459.058691488281584.0634813017688
5724182015-08-01 00:00:00320.78048831327857.568738324322181.8807510916972
5824192014-09-01 00:00:00257.28582472260153.870409606191984.3442042309829
59241102014-10-01 00:00:00434.15174974270452.768752957078686.9737490248736
60241112014-11-01 00:00:00500.06564343947244.591094869858787.2311202524699
61241122014-12-01 00:00:00721.39856458988938.296975867652290.2032059995848
6232312014-01-01 00:00:00442.9948052896140.604665668918583.9113654361121
6332312015-01-01 00:00:00971.73792480659235.655152754511692.535877588346
6432322014-02-01 00:00:00525.15874679573140.442697945974288.9358369295487
6532322015-02-01 00:00:00382.67812806042935.5428194126892.6688186376604
6632332014-03-01 00:00:00318.44232707930343.505920844723686.1035963115818
6732332015-03-01 00:00:00491.43889605389240.546432499659484.1240029140989
6832342014-04-01 00:00:00565.05019889391247.52203807969885.9595189590546
6932342015-04-01 00:00:00285.70537845442746.106918500859378.6967913955865
7032352014-05-01 00:00:00345.64371454294251.46031466471980.5660289985841
7132352015-05-01 00:00:00434.30929627286549.748869753596579.7188663290957
7232362014-06-01 00:00:00311.65691815873655.605584550219482.0744127371383
7332362015-06-01 00:00:00375.75714742935457.234202462633482.44084305558
7432372014-07-01 00:00:00340.69934799980659.475073850017380.5535345355014
7532372015-07-01 00:00:00250.66010243987460.134012939497981.2173259905666
7632382014-08-01 00:00:00423.70015433349860.282217624641577.9895358313569
7732382015-08-01 00:00:00345.54931392796157.353059166617580.0326119273861
7832392014-09-01 00:00:00368.3565642230455.098125643689683.2219195158294
79323102014-10-01 00:00:00419.68803877218352.911372794287187.0375524785344
80323112014-11-01 00:00:00650.70905304071145.093085131760388.229242580969
81323122014-12-01 00:00:00590.57550921817138.295804051427989.2197799286013
824[null][null]12014-01-01 00:00:00[null][null][null]
8344112014-01-01 00:00:00490.95325344723440.048414096161587.6757310001256
8444112015-01-01 00:00:00529.47043111159436.091953671207793.5570243176922
8544122014-02-01 00:00:00466.9571923414841.148786491814887.9046866890208
8644122015-02-01 00:00:00463.27740847097336.134077067150790.5861334174828
8744132014-03-01 00:00:00233.39243661467143.598085413857885.9600088369246
8844132015-03-01 00:00:00434.87319006004840.95547865269282.517165280247
8944142014-04-01 00:00:00578.12758529852447.564873078781485.4902693533969
9044142015-04-01 00:00:00549.91641497029546.076236638059581.0147476646827
9144152014-05-01 00:00:00383.53247166470652.218764801876380.6567902387458
9244152015-05-01 00:00:00389.75332399275951.689014492110176.7850317191888
9344162014-06-01 00:00:00387.78519478018557.58568445524276.4935770006317
9444162015-06-01 00:00:00482.10825743423857.450312563110280.8217663185125
9544172014-07-01 00:00:00360.34572546996957.676335237308787.0106795700421
9644172015-07-01 00:00:00395.68649035254860.265345398896381.6996901183243
9744182014-08-01 00:00:00405.07375950910160.379158194767780.0265350900232
9844182015-08-01 00:00:00659.94161097606156.812280360328482.112426029538
9944192014-09-01 00:00:00269.2147359590355.801998028246181.6661321023242
100441102014-10-01 00:00:00427.23030274778952.795609322019488.2320107845067
Rows: 1-100 | Columns: 8

Understanding the Data & Detecting Outliers

Looking at three different smart meters, we can see a clear decrease in energy consumption during the summer followed by a sharp increase in the winter.

In [19]:
for i in [10, 12, 14]:
    sm_consumption_month[sm_consumption_month["meterID"] == i]["value"].plot(ts = "date_month")

This behavior seems to be seasonal, but we don't have enough data to prove this.

Let's find outliers in the distribution by computing the ZSCORE per meterID.

In [20]:
std = st.std(sm_consumption_month["value"])._over(by = [sm_consumption_month["meterID"]])
avg = st.avg(sm_consumption_month["value"])._over(by = [sm_consumption_month["meterID"]])
sm_consumption_month["value_zscore"] = (sm_consumption_month["value"] - avg) / std
sm_consumption_month.search("value_zscore > 4")
Out[20]:
123
meterID
Integer
123
region
Integer
123
residenceType
Integer
123
month
Integer
📅
date_month
Datetime
123
value
Float
123
avg_temperature
Float
123
avg_humidity
Float
123
value_zscore
Float
136421122014-12-01 00:00:001852.0458063652939.230533354975289.96525230282624.00855200430863
239941102014-10-01 00:00:001589.2377046345652.706855811044188.77303609147824.07298404322647
380941112014-11-01 00:00:001300.5573170824145.570995575523186.47158029841174.0151606376986
49514142015-04-01 00:00:002128.4546619857448.1621177591473.63134617681084.01829822269677
Rows: 1-4 | Columns: 9

Four smart meters are outliers in energy consumption. We'll need to investigate to get more information.

In [21]:
for i in [364, 399, 809, 951]:
    sm_consumption_month[sm_consumption_month["meterID"] == i]["value"].plot(ts = "date_month")

Data Encoding & Bivariate Analysis

Since most of our data is categorical, let's encode them with One-hot encoding. We can then examine the correlations between the various categories.

In [22]:
sm_consumption_month = sm_consumption_month.one_hot_encode(["region", "residenceType", "month"], 
                                                           drop_first = False,
                                                           max_cardinality = 20)
display(sm_consumption_month)
123
meterID
Int
123
region
Int
123
residenceType
Int
123
month
Int
📅
date_month
Timestamp
123
value
Float
123
avg_temperature
Float
123
avg_humidity
Float
123
value_zscore
Float
123
region_0
Bool
123
region_1
Bool
123
region_2
Bool
123
region_3
Bool
123
region_4
Bool
123
residenceType_1
Bool
123
residenceType_2
Bool
123
residenceType_3
Bool
123
month_1
Bool
123
month_2
Bool
123
month_3
Bool
123
month_4
Bool
123
month_5
Bool
123
month_6
Bool
123
month_7
Bool
123
month_8
Bool
123
month_9
Bool
123
month_10
Bool
123
month_11
Bool
123
month_12
Bool
104112014-01-01 00:00:001094.4697677931840.11959141374385.38624347502783.0020292366434800001100100000000000
204112015-01-01 00:00:00461.09891432908235.86999617387491.9545988846073-0.2277241613250800001100100000000000
304122014-02-01 00:00:00859.3116225434741.146796543701287.36188179463221.8028852998388800001100010000000000
404122015-02-01 00:00:00519.57540467649636.67847431885691.63930965719250.070465504915289800001100010000000000
504132014-03-01 00:00:00385.58936655751644.612861142744383.7845727673415-0.61277064264080200001100001000000000
604132015-03-01 00:00:00402.45733976113540.487651728516385.0905843339206-0.52675564293997200001100001000000000
704142014-04-01 00:00:00392.3535621869347.709800632414784.8403939698354-0.5782779221451600001100000100000000
804142015-04-01 00:00:00293.3173375838145.07640655930880.6242208952574-1.08329418739400001100000100000000
904152014-05-01 00:00:00294.37002834571651.338578358491283.9470163357567-1.0779261924086300001100000010000000
1004152015-05-01 00:00:00429.37956331647647.988747843346183.8925132071737-0.38947091862219600001100000010000000
1104162014-06-01 00:00:00416.72039678126856.282090812916280.0076357451401-0.45402391456275600001100000001000000
1204162015-06-01 00:00:00375.48209925429757.832393657906580.5494105121803-0.66431071705093800001100000001000000
1304172014-07-01 00:00:00517.7265983019458.237294241842282.58824735888890.061037870792643200001100000000100000
1404172015-07-01 00:00:00608.34873200815660.842987036775178.44084997850120.52314809052886400001100000000100000
1504182014-08-01 00:00:00351.44588708935860.44588055973180.7871798502449-0.78687877885772200001100000000010000
1604182015-08-01 00:00:00663.94737645276757.194370500730982.17932586882770.80666273259884200001100000000010000
1704192014-09-01 00:00:00540.26117416157256.002922204216781.03107865909660.17594862577349500001100000000001000
18041102014-10-01 00:00:00387.21743307143851.910036354534989.3897158191042-0.60446862917318100001100000000000100
19041112014-11-01 00:00:00654.68316281426745.506228391120187.47717418610590.7594216489354500001100000000000010
20041122014-12-01 00:00:00467.37954202749838.694551322208691.3160798419345-0.19569730290652500001100000000000001
2114112014-01-01 00:00:00582.93064549864939.120333758187186.22810470898810.054824511485317400001100100000000000
2214112015-01-01 00:00:00733.17339312119435.982555840039892.99317079771830.82449726218630200001100100000000000
2314122014-02-01 00:00:00328.5441011229940.74063074095288.6208589271437-1.2483624605928800001100010000000000
2414122015-02-01 00:00:00664.90807313226536.10779674338792.02288016110640.47478349932725900001100010000000000
2514132014-03-01 00:00:00750.14179253608644.841985642244281.49655692527170.91142401811507200001100001000000000
2614132015-03-01 00:00:00747.69683846383642.381938848060477.88927049174670.89889885762710100001100001000000000
2714142014-04-01 00:00:00303.62665981066648.431981720484883.1294834473743-1.3760110552755900001100000100000000
2814142015-04-01 00:00:00333.36403485956846.522231399399179.1965773965361-1.2236706088001400001100000100000000
2914152014-05-01 00:00:00571.63539225868551.00436005099581.7882238039068-0.003039503707676400001100000010000000
3014152015-05-01 00:00:00384.70375597733550.764821715846278.4548447843953-0.9606643406226900001100000010000000
3114162014-06-01 00:00:00923.5026666519758.189982887661377.45620802058991.7995277233263200001100000001000000
3214162015-06-01 00:00:00497.58823965477858.058235505198480.2845965071549-0.38237279285522400001100000001000000
3314172014-07-01 00:00:00922.01150887194458.858182600724783.63766632970951.7918887289093300001100000000100000
3414172015-07-01 00:00:00764.12213811383959.764173162807979.9539299908970.98304338881250700001100000000100000
3514182014-08-01 00:00:00433.43797351890859.914983187036481.4356536158466-0.71100570576653200001100000000010000
3614182015-08-01 00:00:00296.33582722388856.566716563918483.0268955395646-1.4133609790540300001100000000010000
3714192014-09-01 00:00:00605.07458579661655.323752070133584.30417967845010.16826484527975200001100000000001000
38141102014-10-01 00:00:00438.54779370015752.715731463812488.2888190729199-0.68482880593792700001100000000000100
39141112014-11-01 00:00:00565.16518660484346.006196545229988.1022602608813-0.036185469566238600001100000000000010
40141122014-12-01 00:00:00598.06366819038639.772169284992589.90187652953540.13234888710996400001100000000000001
412[null][null]12014-01-01 00:00:00[null][null][null][null]00000000100000000000
4224112014-01-01 00:00:00482.38519441539.903977916269284.9491344603949-0.041359252553416500001100100000000000
4324112015-01-01 00:00:00849.90136117488135.920566811858294.09929987376852.2142699278524800001100100000000000
4424122014-02-01 00:00:00379.59520761391441.762541510108886.0871432645725-0.67223243488487500001100010000000000
4524122015-02-01 00:00:00657.70712983448137.2715497737690.42274185300551.0346785092242700001100010000000000
4624132014-03-01 00:00:00599.6844797938544.79218919521680.82858022716960.67856469874900600001100001000000000
4724132015-03-01 00:00:00581.38082108850841.099886851808682.20950488765060.56622605784525600001100001000000000
4824142014-04-01 00:00:00470.58886129274748.808310307889785.081830737512-0.11375920558657200001100000100000000
4924142015-04-01 00:00:00489.65798807076747.156023576936577.07818911191510.0032774929199838800001100000100000000
5024152014-05-01 00:00:00471.78162241893951.197344612016884.3597425444918-0.10643863837122500001100000010000000
5124152015-05-01 00:00:00757.21049172069350.525609599187678.70609941561841.6453800427044800001100000010000000
5224162014-06-01 00:00:00327.68333984474255.989260597365682.169875343245-0.99084134066947300001100000001000000
5324162015-06-01 00:00:00381.0854229146857.327554294839380.3458444814076-0.66308624371965500001100000001000000
5424172014-07-01 00:00:00473.42127026307558.594801721678983.3348880025485-0.096375305496839600001100000000100000
5524172015-07-01 00:00:00285.3279952675760.531345184762479.4181890210453-1.2507971189872100001100000000100000
5624182014-08-01 00:00:00341.38607744306459.058691488281584.0634813017688-0.90674083687200700001100000000010000
5724182015-08-01 00:00:00320.78048831327857.568738324322181.8807510916972-1.0332075676565800001100000000010000
5824192014-09-01 00:00:00257.28582472260153.870409606191984.3442042309829-1.4229058420423200001100000000001000
59241102014-10-01 00:00:00434.15174974270452.768752957078686.9737490248736-0.33739184952324400001100000000000100
60241112014-11-01 00:00:00500.06564343947244.591094869858787.23112025246990.067154441126717500001100000000000010
61241122014-12-01 00:00:00721.39856458988938.296975867652290.20320599958481.4255844659412200001100000000000001
6232312014-01-01 00:00:00442.9948052896140.604665668918583.91136543611210.0059325985271563900100001100000000000
6332312015-01-01 00:00:00971.73792480659235.655152754511692.5358775883463.2423474448163100100001100000000000
6432322014-02-01 00:00:00525.15874679573140.442697945974288.93583692954870.50885469870117400100001010000000000
6532322015-02-01 00:00:00382.67812806042935.5428194126892.6688186376604-0.36326329728267100100001010000000000
6632332014-03-01 00:00:00318.44232707930343.505920844723686.1035963115818-0.75644798808017300100001001000000000
6732332015-03-01 00:00:00491.43889605389240.546432499659484.12400291409890.30245688182700600100001001000000000
6832342014-04-01 00:00:00565.05019889391247.52203807969885.95951895905460.75302863316017300100001000100000000
6932342015-04-01 00:00:00285.70537845442746.106918500859378.6967913955865-0.95682950159952200100001000100000000
7032352014-05-01 00:00:00345.64371454294251.46031466471980.5660289985841-0.58994941632115800100001000010000000
7132352015-05-01 00:00:00434.30929627286549.748869753596579.7188663290957-0.047231044254715400100001000010000000
7232362014-06-01 00:00:00311.65691815873655.605584550219482.0744127371383-0.79798119647204100100001000001000000
7332362015-06-01 00:00:00375.75714742935457.234202462633482.44084305558-0.40562633452796100100001000001000000
7432372014-07-01 00:00:00340.69934799980659.475073850017380.5535345355014-0.62021368018584500100001000000100000
7532372015-07-01 00:00:00250.66010243987460.134012939497981.2173259905666-1.1713401919502800100001000000100000
7632382014-08-01 00:00:00423.70015433349860.282217624641577.9895358313569-0.11216916488751300100001000000010000
7732382015-08-01 00:00:00345.54931392796157.353059166617580.0326119273861-0.59052723859542300100001000000010000
7832392014-09-01 00:00:00368.3565642230455.098125643689683.2219195158294-0.45092499937299200100001000000001000
79323102014-10-01 00:00:00419.68803877218352.911372794287187.0375524785344-0.13672715890837100100001000000000100
80323112014-11-01 00:00:00650.70905304071145.093085131760388.2292425809691.2773429495905600100001000000000010
81323122014-12-01 00:00:00590.57550921817138.295804051427989.21977992860130.90926800581628700100001000000000001
824[null][null]12014-01-01 00:00:00[null][null][null][null]00000000100000000000
8344112014-01-01 00:00:00490.95325344723440.048414096161587.67573100012560.52410607043641900001100100000000000
8444112015-01-01 00:00:00529.47043111159436.091953671207793.55702431769220.89949517635003900001100100000000000
8544122014-02-01 00:00:00466.9571923414841.148786491814887.90468668902080.29024002753908200001100010000000000
8644122015-02-01 00:00:00463.27740847097336.134077067150790.58613341748280.25437678778818100001100010000000000
8744132014-03-01 00:00:00233.39243661467143.598085413857885.9600088369246-1.9860862804070800001100001000000000
8844132015-03-01 00:00:00434.87319006004840.95547865269282.517165280247-0.02245123544775800001100001000000000
8944142014-04-01 00:00:00578.12758529852447.564873078781485.49026935339691.3737086757319900001100000100000000
9044142015-04-01 00:00:00549.91641497029546.076236638059581.01474766468271.098762102580300001100000100000000
9144152014-05-01 00:00:00383.53247166470652.218764801876380.6567902387458-0.52281879984547800001100000010000000
9244152015-05-01 00:00:00389.75332399275951.689014492110176.7850317191888-0.46219026122829400001100000010000000
9344162014-06-01 00:00:00387.78519478018557.58568445524276.4935770006317-0.48137168391374400001100000001000000
9444162015-06-01 00:00:00482.10825743423857.450312563110280.82176631851250.43790258029336300001100000001000000
9544172014-07-01 00:00:00360.34572546996957.676335237308787.0106795700421-0.74879724507154300001100000000100000
9644172015-07-01 00:00:00395.68649035254860.265345398896381.6996901183243-0.40436551522935300001100000000100000
9744182014-08-01 00:00:00405.07375950910160.379158194767780.0265350900232-0.31287702128111400001100000000010000
9844182015-08-01 00:00:00659.94161097606156.812280360328482.1124260295382.1710696406996400001100000000010000
9944192014-09-01 00:00:00269.2147359590355.801998028246181.6661321023242-1.6369615070441600001100000000001000
100441102014-10-01 00:00:00427.23030274778952.795609322019488.2320107845067-0.096938952583949600001100000000000100
Rows: 1-100 of 20462 | Columns: 29

Let's compute the Pearson correlation matrix.

In [23]:
sm_consumption_month.corr()
Out[23]:
"meterID"
"region"
"residenceType"
"month"
"value"
"avg_temperature"
"avg_humidity"
"value_zscore"
"region_0"
"region_1"
"region_2"
"region_3"
"region_4"
"residenceType_1"
"residenceType_2"
"residenceType_3"
"month_1"
"month_2"
"month_3"
"month_4"
"month_5"
"month_6"
"month_7"
"month_8"
"month_9"
"month_10"
"month_11"
"month_12"
"meterID"1.0-0.03831541192690160.0980146907774023-0.0009582392237039140.000429585019171308-0.000388408483331499-0.00036832764708373-6.06712073807718e-180.0136513091370.04662658695575770.00328987359897663-0.013700995176311-0.0265760464510187-0.1023657682662180.04675775675574620.08167378797860410.00182472281671094-0.000222116232717449-0.000222116232717449-0.000222116232717449-0.000222116232717449-0.000222116232717449-0.000222116232717449-0.000222116232717449-0.000152971646153471-0.000152971646153471-0.000152971646153471-0.000152971646153471
"region"-0.03831541192690161.00.01669755612485090.00.003918380858492980.000771742974758161-0.00217981408380471-3.74313875208588e-18-0.564373811092644-0.539485038229266-0.474356114920355-0.08507961526952570.907172644025799-0.03100621980528580.05749446117131340.0003246584153362870.00.00.00.00.00.00.00.00.00.00.00.0
"residenceType"0.09801469077740230.01669755612485091.00.0-0.00058821325675922-0.0002148000190307680.00183889966163093-6.57284327067527e-18-0.004355617228730740.00168175653142086-0.00460956649743866-0.06236014496587430.0350513980015978-0.9640669703786230.09878733441742060.9600404140142560.00.00.00.00.00.00.00.00.00.00.00.0
"month"-0.0009582392237039140.00.01.0-0.06310503876953570.469805916060245-0.0101794307863282-0.06734724244934250.005893828582188760.00796723516551850.01195649678328390.007356606351977680.05329086618670170.04109842409671330.009024733931590950.021276792503283-0.525142347609341-0.365619082725405-0.263885316907362-0.162151551089319-0.06041778527127630.04131598054676650.1430497463648090.2447835121828520.2386467561956480.3087108907984270.3787750254012070.448839160003987
"value"0.0004295850191713080.00391838085849298-0.00058821325675922-0.06310503876953571.0-0.2860021198287140.2268403662907840.934346221805581-0.00416924243520411-0.003822411219675910.00259294684655743-0.001846440467362840.002687708723824020.00102728747461828-0.00178196776030956-7.98952664875218e-050.2194803428813770.06334331938748190.0503407640346705-0.0661901237431247-0.091208055644822-0.152884648437902-0.0821649426287596-0.0682407716347868-0.04500407103144860.002667798947185760.04766907318515290.170203430554323
"avg_temperature"-0.0003884084833314990.000771742974758161-0.0002148000190307680.469805916060245-0.2860021198287141.0-0.658481080773544-0.309447511959112-0.000725793447764298-0.000488976526300037-7.42642531323949e-050.0002363303536273530.000489612481223581-3.04846404635027e-050.000868657788855458-0.000456483170841854-0.453302104104287-0.4197274849156-0.251537709844356-0.07081958354615510.09163631057259050.361257771451110.435866278784460.3955412049912710.1671777482266520.101654334296474-0.0997624596518854-0.291460186920634
"avg_humidity"-0.00036832764708373-0.002179814083804710.00183889966163093-0.01017943078632820.226840366290784-0.6584810807735441.00.2474145405774160.002873171066408490.000539563796871106-0.0004119119717266020.00123123327393878-0.00174317956967894-0.00141710572205898-0.001150786138182130.002140151215629130.3684427012673110.392388381005602-0.120244449684894-0.190363603253431-0.302626151421451-0.357900077169892-0.146593598441957-0.1860844326001670.009465469490184510.2118063889464520.2085237038494750.317615062275282
"value_zscore"-6.06712073807718e-18-3.74313875208588e-18-6.57284327067527e-18-0.06734724244934250.934346221805581-0.3094475119591120.2474145405774161.04.12522668562028e-19-2.16549570827437e-183.44914590748451e-197.29594933790264e-18-3.08905986829715e-184.5174568540405e-182.04486013789287e-18-6.02067785713798e-180.2369419881134970.06780264610298180.0557949986970486-0.073370313163134-0.0974217129848312-0.168049217626795-0.0884954028425124-0.0739300152307806-0.04435526306641070.004062291097133270.05085975402862460.183143185651282
"region_0"0.013651309137-0.564373811092644-0.004355617228730740.00589382858218876-0.00416924243520411-0.0007257934477642980.002873171066408494.12522668562028e-191.0-0.0436458180521819-0.0654996460256671-0.0403006934839292-0.2919360858700830.0167180597630204-0.02748540618040490.00643135281586266-0.01122329709081710.001366167203945190.001366167203945190.001366167203945190.001366167203945190.001366167203945190.001366167203945190.001366167203945190.0009408805630799070.0009408805630799070.0009408805630799070.000940880563079907
"region_1"0.0466265869557577-0.5394850382292660.001681756531420860.0079672351655185-0.00382241121967591-0.0004889765263000370.000539563796871106-2.16549570827437e-18-0.04364581805218191.0-0.0885419512745503-0.0544781881322893-0.3946371050656150.0190485475998429-0.05017065768937380.0197775757586913-0.01517157243514750.001846775018558160.001846775018558160.001846775018558160.001846775018558160.001846775018558160.001846775018558160.001846775018558160.001271875590575670.001271875590575670.001271875590575670.00127187559057567
"region_2"0.00328987359897663-0.474356114920355-0.004609566497438660.01195649678328390.00259294684655743-7.42642531323949e-05-0.0004119119717266023.44914590748451e-19-0.0654996460256671-0.08854195127455031.0-0.0817558748588118-0.5922352207830740.0167811428537608-0.005284018995288030.00306688529055391-0.02276810628149630.002771470793833750.002771470793833750.002771470793833750.002771470793833750.002771470793833750.002771470793833750.002771470793833750.001908714389813760.001908714389813760.001908714389813760.00190871438981376
"region_3"-0.013700995176311-0.0850796152695257-0.06236014496587430.00735660635197768-0.001846440467362840.0002363303536273530.001231233273938787.29594933790264e-18-0.0403006934839292-0.0544781881322893-0.08175587485881181.0-0.3643911311186770.0663413595304377-0.00802323641415758-0.0552576313558247-0.01400878520932070.001705233565967480.001705233565967480.001705233565967480.001705233565967480.001705233565967480.001705233565967480.001705233565967480.00117439586684340.00117439586684340.00117439586684340.0011743958668434
"region_4"-0.02657604645101870.9071726440257990.03505139800159780.05329086618670170.002687708723824020.000489612481223581-0.00174317956967894-3.08905986829715e-18-0.291936085870083-0.394637105065615-0.592235220783074-0.3643911311186771.00.003236940081829130.05806025169591560.0443629121886827-0.1014788969682280.0123526214987180.0123526214987180.0123526214987180.0123526214987180.0123526214987180.0123526214987180.0123526214987180.008507261364248820.008507261364248820.008507261364248820.00850726136424882
"residenceType_1"-0.102365768266218-0.0310062198052858-0.9640669703786230.04109842409671330.00102728747461828-3.04846404635027e-05-0.001417105722058984.5174568540405e-180.01671805976302040.01904854759984290.01678114285376080.06634135953043770.003236940081829131.0-0.3447442864841-0.812772177641524-0.07826149287675980.009526459473972270.009526459473972270.009526459473972270.009526459473972270.009526459473972270.009526459473972270.009526459473972270.00656088107527760.00656088107527760.00656088107527760.0065608810752776
"residenceType_2"0.04675775675574620.05749446117131340.09878733441742060.00902473393159095-0.001781967760309560.000868657788855458-0.001150786138182132.04486013789287e-18-0.0274854061804049-0.0501706576893738-0.00528401899528803-0.008023236414157580.0580602516959156-0.34474428648411.0-0.178475277615356-0.01718530979776280.002091899238286340.002091899238286340.002091899238286340.002091899238286340.002091899238286340.002091899238286340.002091899238286340.00144069285775670.00144069285775670.00144069285775670.0014406928577567
"residenceType_3"0.08167378797860410.0003246584153362870.9600404140142560.021276792503283-7.98952664875218e-05-0.0004564831708418540.00214015121562913-6.02067785713798e-180.006431352815862660.01977757575869130.00306688529055391-0.05525763135582470.0443629121886827-0.812772177641524-0.1784752776153561.0-0.04051623831165680.004931880138315370.004931880138315370.004931880138315370.004931880138315370.004931880138315370.004931880138315370.004931880138315370.003396590218371930.003396590218371930.003396590218371930.00339659021837193
"month_1"0.001824722816710940.00.0-0.5251423476093410.219480342881377-0.4533021041042870.3684427012673110.236941988113497-0.0112232970908171-0.0151715724351475-0.0227681062814963-0.0140087852093207-0.101478896968228-0.0782614928767598-0.0171853097977628-0.04051623831165681.0-0.121726012676168-0.121726012676168-0.121726012676168-0.121726012676168-0.121726012676168-0.121726012676168-0.121726012676168-0.0838328127168386-0.0838328127168386-0.0838328127168386-0.0838328127168386
"month_2"-0.0002221162327174490.00.0-0.3656190827254050.0633433193874819-0.41972748491560.3923883810056020.06780264610298180.001366167203945190.001846775018558160.002771470793833750.001705233565967480.0123526214987180.009526459473972270.002091899238286340.00493188013831537-0.1217260126761681.0-0.108330625067707-0.108330625067707-0.108330625067707-0.108330625067707-0.108330625067707-0.108330625067707-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543
"month_3"-0.0002221162327174490.00.0-0.2638853169073620.0503407640346705-0.251537709844356-0.1202444496848940.05579499869704860.001366167203945190.001846775018558160.002771470793833750.001705233565967480.0123526214987180.009526459473972270.002091899238286340.00493188013831537-0.121726012676168-0.1083306250677071.0-0.108330625067707-0.108330625067707-0.108330625067707-0.108330625067707-0.108330625067707-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543
"month_4"-0.0002221162327174490.00.0-0.162151551089319-0.0661901237431247-0.0708195835461551-0.190363603253431-0.0733703131631340.001366167203945190.001846775018558160.002771470793833750.001705233565967480.0123526214987180.009526459473972270.002091899238286340.00493188013831537-0.121726012676168-0.108330625067707-0.1083306250677071.0-0.108330625067707-0.108330625067707-0.108330625067707-0.108330625067707-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543
"month_5"-0.0002221162327174490.00.0-0.0604177852712763-0.0912080556448220.0916363105725905-0.302626151421451-0.09742171298483120.001366167203945190.001846775018558160.002771470793833750.001705233565967480.0123526214987180.009526459473972270.002091899238286340.00493188013831537-0.121726012676168-0.108330625067707-0.108330625067707-0.1083306250677071.0-0.108330625067707-0.108330625067707-0.108330625067707-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543
"month_6"-0.0002221162327174490.00.00.0413159805467665-0.1528846484379020.36125777145111-0.357900077169892-0.1680492176267950.001366167203945190.001846775018558160.002771470793833750.001705233565967480.0123526214987180.009526459473972270.002091899238286340.00493188013831537-0.121726012676168-0.108330625067707-0.108330625067707-0.108330625067707-0.1083306250677071.0-0.108330625067707-0.108330625067707-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543
"month_7"-0.0002221162327174490.00.00.143049746364809-0.08216494262875960.43586627878446-0.146593598441957-0.08849540284251240.001366167203945190.001846775018558160.002771470793833750.001705233565967480.0123526214987180.009526459473972270.002091899238286340.00493188013831537-0.121726012676168-0.108330625067707-0.108330625067707-0.108330625067707-0.108330625067707-0.1083306250677071.0-0.108330625067707-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543
"month_8"-0.0002221162327174490.00.00.244783512182852-0.06824077163478680.395541204991271-0.186084432600167-0.07393001523078060.001366167203945190.001846775018558160.002771470793833750.001705233565967480.0123526214987180.009526459473972270.002091899238286340.00493188013831537-0.121726012676168-0.108330625067707-0.108330625067707-0.108330625067707-0.108330625067707-0.108330625067707-0.1083306250677071.0-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543
"month_9"-0.0001529716461534710.00.00.238646756195648-0.04500407103144860.1671777482266520.00946546949018451-0.04435526306641070.0009408805630799070.001271875590575670.001908714389813760.00117439586684340.008507261364248820.00656088107527760.00144069285775670.00339659021837193-0.0838328127168386-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.07460739740945431.0-0.0513821806597472-0.0513821806597472-0.0513821806597472
"month_10"-0.0001529716461534710.00.00.3087108907984270.002667798947185760.1016543342964740.2118063889464520.004062291097133270.0009408805630799070.001271875590575670.001908714389813760.00117439586684340.008507261364248820.00656088107527760.00144069285775670.00339659021837193-0.0838328127168386-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.05138218065974721.0-0.0513821806597472-0.0513821806597472
"month_11"-0.0001529716461534710.00.00.3787750254012070.0476690731851529-0.09976245965188540.2085237038494750.05085975402862460.0009408805630799070.001271875590575670.001908714389813760.00117439586684340.008507261364248820.00656088107527760.00144069285775670.00339659021837193-0.0838328127168386-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0513821806597472-0.05138218065974721.0-0.0513821806597472
"month_12"-0.0001529716461534710.00.00.4488391600039870.170203430554323-0.2914601869206340.3176150622752820.1831431856512820.0009408805630799070.001271875590575670.001908714389813760.00117439586684340.008507261364248820.00656088107527760.00144069285775670.00339659021837193-0.0838328127168386-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0746073974094543-0.0513821806597472-0.0513821806597472-0.05138218065974721.0
Rows: 1-28 | Columns: 29

There's a clear correlation between the month and energy consumption, but this isn't causal. Instead, we can think of the weather as having the direct influence on energy consumption. To accomodate for this view, we'll use the temperature as a predictor (rather than the month).

In [24]:
sm_consumption_month.corr(focus = "value")
Out[24]:
"value"
"value"1.0
"value_zscore"0.934346221805581
"avg_temperature"-0.286002119828714
"avg_humidity"0.226840366290784
"month_1"0.219480342881377
"month_12"0.170203430554323
"month_6"-0.152884648437902
"month_5"-0.091208055644822
"month_7"-0.0821649426287596
"month_8"-0.0682407716347868
"month_4"-0.0661901237431247
"month_2"0.0633433193874819
"month"-0.0631050387695357
"month_3"0.0503407640346705
"month_11"0.0476690731851529
"month_9"-0.0450040710314486
"region_0"-0.00416924243520411
"region"0.00391838085849298
"region_1"-0.00382241121967591
"region_4"0.00268770872382402
"month_10"0.00266779894718576
"region_2"0.00259294684655743
"region_3"-0.00184644046736284
"residenceType_2"-0.00178196776030956
"residenceType_1"0.00102728747461828
"residenceType"-0.00058821325675922
"meterID"0.000429585019171308
"residenceType_3"-7.98952664875218e-05
Rows: 1-28 | Columns: 2

Global Behavior

Let's look at this globally.

In [25]:
sm_consumption_final = sm_consumption_month.groupby(
                            ["date_month"], 
                            [st.avg(sm_consumption_month["avg_temperature"])._as("avg_temperature"),
                             st.avg(sm_consumption_month["avg_humidity"])._as("avg_humidity"),
                             st.avg(sm_consumption_month["value"])._as("avg_value")])
sm_consumption_final.plot(ts = "date_month", columns = ["avg_value"])
Out[25]:
<AxesSubplot:xlabel='"date_month"', ylabel='"avg_value"'>

We expect to see a fall in energy consumption during summer and then an increase during the winter. A simple prediction could use the average value a year before.

In [26]:
sm_consumption_final["prediction"] = st.case_when(sm_consumption_final["date_month"] < '2015-01-01', 
                                                  sm_consumption_final["avg_value"],
                                                  st.lag(sm_consumption_final["avg_value"], 12)._over(
                                                                                order_by = ["date_month"]))
sm_consumption_final.plot(ts = "date_month", columns = ["prediction", "avg_value"])
sm_consumption_final.score("avg_value", "prediction", "r2")
Out[26]:
0.987990336935642

As expected, our model's score is excellent.

Let's use machine learning to understand the influence of the weather and the humidity on energy consumption.

Machine Learning

Let's create our model.

In [27]:
from verticapy.learn.linear_model import LinearRegression

predictors = ["avg_temperature",
              "avg_humidity"]
model = LinearRegression("sm_lr", solver = "BFGS")
model.fit(sm_consumption_final, 
          predictors,
          "avg_value")
model.report("details")
Out[27]:
value
Dep. Variable"avg_value"
ModelLinearRegression
No. Observations20.0
No. Predictors2
R-squared0.79911369421353
Adj. R-squared0.7754800111798276
F-statistic33.81249096905816
Prob (F-statistic)3.837253952408389e-07
Kurtosis-0.0457129692472424
Skewness0.893296166786593
Jarque-Bera (JB)3.65140760780652
Rows: 1-11 | Columns: 2

The model seems to be good with an adjusted R2 of 77.5%, and the F-Statistic indicates that at least one of the two predictors is useful. Let's look at the residual plot.

In [28]:
sm_consumption_final = model.predict(sm_consumption_final, 
                                     name = "value_prediction")
sm_consumption_final["residual"] = sm_consumption_final["avg_value"] - sm_consumption_final["value_prediction"]
sm_consumption_final.scatter(["avg_value", "residual"])
Out[28]:
<AxesSubplot:xlabel='"avg_value"', ylabel='"residual"'>

Looking at the residual plot, we can see that the error variance varies by quite a bit. A possible suspect might be heteroscedasticity. Let's verify our hypothesis using a Breusch-Pagan test.

In [29]:
from verticapy.stats import het_breuschpagan

het_breuschpagan(sm_consumption_final, "residual", predictors)
Out[29]:
value
Lagrange Multiplier Statistic6.066154825831241
lm_p_value0.04816717950866987
F Value3.700508752254135
f_p_value0.04632851414387972
Rows: 1-4 | Columns: 2

The p-value is 4.81% and sits around the 5% threshold, so we can't really draw any conclusions.

Let's look at the entire regression report.

In [30]:
model.report()
Out[30]:
value
explained_variance0.799113694213527
max_error64.4503700445748
median_absolute_error13.1054316247273
mean_absolute_error20.9034976459432
mean_squared_error734.33259460088
root_mean_squared_error27.09857181847191
r20.79911369421353
r2_adj0.7754800111798276
aic139.47924104286193
bic140.9664378635239
Rows: 1-10 | Columns: 2

Our model is very good; its median absolute error is around 13kWh.

With this model, we can make predictions about the energy consumption of households per region. If the usage exceeds what the model predicts, we can raise an alert and respond, for example, by regulating the electricity distributed to the region.

Conclusion

We've solved our problem in a Pandas-like way, all without ever loading data into memory!


VerticaPy

Python API for Vertica Data Science at Scale

About the Author


Badr Ouali
Head of Data Science


Badr Ouali works as a Lead Data Scientist for Vertica worldwide. He can embrace data projects end to end through a clear understanding of the “big picture” as well as attention to details, resulting in achieving great business outcomes – a distinctive differentiator in his role. Badr enjoys sharing knowledge and insights related to data analytics with colleagues & peers and has a sweet spot for Python. He loves helping customers finding the best value from their data and empower them to solve their use-cases.