Estimating Lithium-ion Battery Health¶
Introduction to Lithium-based batteries, their cycle characteristics and aging¶
Lithium-ion (or Li-ion) batteries are rechargeable batteries used in a wide variety of electronic devices, from smartphones and electric vehicles to satellites.
However, despite their wide adoption, research isn't mature enough to avoid problems with battery health and safety. Given the ubiquity of consumer electronics that use the technology, this has led to outcomes that range from poor user experience to public safety concerns (see, for example, the Samsung Galaxy Note 7 explosions in 2016).
Dataset¶
In this example of predictive maintenance, we propose a data-driven method to estimate the health of a battery using the Li-ion battery dataset released by NASA (csv).
This dataset includes information on Li-ion batteries over several charge and discharge cycles at room temperature. Charging was at a constant current (CC) at 1.5A until the battery voltage reached 4.2V and then continued in a constant voltage (CV) mode until the charge current dropped to 20mA. Discharge was at a constant current (CC) level of 2A until the battery voltage fell to 2.7V.
You can download the Jupyter notebook of this study here.
The dataset includes the following:
- Voltage_measured: Battery's terminal voltage (Volts) for charging and discharging cycles
- Current_measured: Battery's output current (Amps) for charging and discharging cycles
- Temperature_measured: Battery temperature (degree Celsius)
- Current_charge: Current measured at charger for charging cycles and at load for discharging cycles (Amps)
- Voltage_charge: Voltage measured at charger for charging cycles and at load for discharging ones (Volts)
- Start_time: Starting time of the cycle
- Time: Time in seconds after the starting time for the cycle (seconds)
- Capacity: Battery capacity (Ahr) for discharging until 2.7V. Battery capacity is the product of the current drawn from the battery and the time for which the battery can supply that load before its voltage drops below a certain value for each cell.
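To make the capacity definition concrete, here is a minimal sketch of how a discharge capacity in ampere-hours could be estimated from sampled current and time values by integrating current over time. The function name, the sample values, and the use of the absolute current (to be independent of the sign convention) are assumptions for illustration. This is not how the dataset's Capacity column was produced.

```python
def discharge_capacity_ah(times_s, currents_a):
    """Integrate |current| over time with the trapezoidal rule; return Ah."""
    if len(times_s) != len(currents_a):
        raise ValueError("times_s and currents_a must have the same length")
    amp_seconds = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        # Average of two neighbouring current samples times the elapsed time.
        amp_seconds += 0.5 * (abs(currents_a[i]) + abs(currents_a[i - 1])) * dt
    return amp_seconds / 3600.0  # 1 Ah = 3600 ampere-seconds


# Hypothetical cycle: a constant 2 A load held for one hour.
times = [0, 1800, 3600]
currents = [-2.0, -2.0, -2.0]
capacity = discharge_capacity_ah(times, currents)
print(capacity)  # 2.0
```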
Initialization¶
This example uses the following version of VerticaPy:
import verticapy as vp
vp.__version__
Connect to Vertica. This example uses an existing connection called "VerticaDSN." For details on how to create a connection, see the connection tutorial.
vp.connect("VerticaDSN")
Before we import the data, we'll drop any existing schemas of the same name.
vp.drop("battery_data", method="schema")
vp.create_schema("battery_data", True)
Since our data is in a .csv file, we'll ingest it with read_csv().
battery5 = vp.read_csv("data/battery5_data.csv")
Understanding the Data¶
Let's examine our data. Here, we use vDataFrame.head() to retrieve the first five rows of the dataset.
display(battery5.head(5))
Let's perform a few aggregations with vDataFrame.describe() to get a high-level overview of the dataset.
battery5.describe()
To get a better idea of the changes between cycles, we look at aggregations of their start time, duration, and voltage at the beginning and end of each cycle.
battery5['start_time'].describe()
To see how the voltage changes during the cycle, we extract the initial and final voltage measurements for each cycle.
battery5.analytic(func="first_value",
columns="Voltage_measured",
by="start_time",
order_by={"Time":"asc"},
name="first_voltage_measured")
battery5.analytic(func="first_value",
columns="Voltage_measured",
by="start_time",
order_by={"Time":"desc"},
name="last_voltage_measured")
cycling_info = battery5.groupby(columns = ['start_time',
'type',
'first_voltage_measured',
'last_voltage_measured'],
expr = ["COUNT(*) AS nr_of_measurements",
"MAX(Time) AS cycle_duration"]).sort('start_time')
cycling_info['cycle_id'] = "ROW_NUMBER() OVER(ORDER BY start_time)"
cycling_info
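To show what the derived columns hold, the per-cycle summary above can be mirrored in plain Python. This is only an illustrative sketch on hypothetical rows of the form (start_time, time, voltage). The function name and sample data are assumptions, and the real computation runs in the database.

```python
from collections import defaultdict


def summarize_cycles(rows):
    """Group (start_time, time_s, voltage) rows by cycle and summarize each."""
    by_cycle = defaultdict(list)
    for start_time, time_s, voltage in rows:
        by_cycle[start_time].append((time_s, voltage))

    summary = {}
    for start_time, samples in by_cycle.items():
        samples.sort(key=lambda s: s[0])  # order measurements by time
        summary[start_time] = {
            "first_voltage_measured": samples[0][1],
            "last_voltage_measured": samples[-1][1],
            "nr_of_measurements": len(samples),
            "cycle_duration": samples[-1][0],  # largest Time in the cycle
        }
    return summary


# Hypothetical measurements for two cycles.
rows = [
    ("2008-04-02 13:08:17", 0.0, 3.87),
    ("2008-04-02 13:08:17", 2.5, 4.00),
    ("2008-04-02 13:08:17", 5.0, 4.19),
    ("2008-04-02 15:25:41", 16.8, 4.17),
    ("2008-04-02 15:25:41", 0.0, 4.19),
]
cycles = summarize_cycles(rows)
print(cycles["2008-04-02 13:08:17"])
```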
We can see from the "cycle_duration" column that charging seems to take longer than discharging. Let's visualize this trend with an animated graph.
import warnings
warnings.filterwarnings('ignore')
cycling_info.animated(ts="start_time",
columns= ["type","cycle_duration"],
by="type",
kind="bar",)
The animated graph above shows how the cycles change over time. Another way we can verify that charging cycles are longer than discharging cycles is by looking at the average duration of each type of cycle.
cycling_info.bar(["type"],
method = "avg",
of = "cycle_duration")
In general, charging cycles are longer than discharging cycles. Let's examine how voltage changes between cycles and their transitions.
cycling_info.groupby('type',['MIN(first_voltage_measured) AS min_first_voltage',
'AVG(first_voltage_measured) AS avg_first_voltage',
'MAX(first_voltage_measured) AS max_first_voltage',
'MIN(last_voltage_measured) AS min_last_voltage',
'AVG(last_voltage_measured) AS avg_last_voltage',
'MAX(last_voltage_measured) AS max_last_voltage'])
From this table, it looks like batteries are charged until they are almost full (4.2V) and discharging doesn't begin until they are fully charged.
The end-of-life (EOL) criterion for a battery is usually defined as its capacity falling below 70%-80% of its rated capacity. Since the manufacturer's rated capacity for this battery is 2Ah, we consider it EOL when its capacity drops to 2Ah x 70% = 1.4Ah.
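The threshold arithmetic, and the search for the first cycle that crosses it, can be sketched as follows. The capacity values are hypothetical and the helper names are assumptions chosen for illustration.

```python
RATED_CAPACITY_AH = 2.0  # rated capacity stated for this battery
EOL_FRACTION = 0.70      # lower end of the usual 70%-80% range


def eol_threshold(rated_capacity_ah, fraction):
    """Capacity (Ah) below which the battery is considered end-of-life."""
    return rated_capacity_ah * fraction


def first_eol_cycle(capacities_ah, threshold_ah):
    """Return the 1-based index of the first cycle below the threshold."""
    for cycle_id, capacity in enumerate(capacities_ah, start=1):
        if capacity < threshold_ah:
            return cycle_id
    return None  # the battery never reached the EOL criterion


threshold = eol_threshold(RATED_CAPACITY_AH, EOL_FRACTION)

# Hypothetical capacities per discharge cycle, not taken from the dataset.
capacities = [1.86, 1.60, 1.45, 1.39, 1.41, 1.35]
print(threshold, first_eol_cycle(capacities, threshold))
```

Note that a single crossing may be followed by a temporary recovery (as in the fifth hypothetical value), which is one reason to also look at a smoothed curve.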
Let's plot the capacity curve of the battery with its smoothed version and observe when it reaches the degradation criteria.
# Visualize the capacity degradation curve along with its smoothed version
discharging_data = battery5[battery5['type'] == 'discharge']
d_cap = discharging_data[['start_time', 'Capacity']].groupby(['start_time', 'Capacity'])
d_cap["discharge_id"] = "ROW_NUMBER() OVER(ORDER BY start_time, Capacity)"
d_cap.rolling(func = 'mean',
columns = 'capacity',
window = (-100, -1),
name = 'smooth_capacity')
import matplotlib.pyplot as plt
fig = plt.figure()
ax = d_cap.plot(ts = 'discharge_id', columns = ['Capacity', 'smooth_capacity'])
ax.axhline(y=1.4, label='End-of-life criteria')
ax.set_title('Capacity degradation curve of the battery, its smoothed version and its end-of-life threshold')
ax.legend()
plt.show()
The sudden increases in battery capacity come from the capacity-regeneration (self-recharging) behavior of Li-ion batteries, which can recover some capacity during rest periods between cycles. The smoothed graph makes the downward trend in the battery's capacity very clear.
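The smoothing used for the plot is a trailing moving average. Assuming the window (-100, -1) means "from 100 rows before to 1 row before the current row", the idea can be sketched in plain Python as below. The function and parameter names are hypothetical.

```python
def trailing_mean(values, preceding_start=100, preceding_end=1):
    """Mean of the rows between `preceding_start` and `preceding_end`
    positions before each row; the current row itself is excluded."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - preceding_start)
        hi = i - preceding_end + 1  # exclusive upper bound of the slice
        window = values[lo:hi] if hi > 0 else []
        # An empty window (e.g. the very first row) has no mean.
        smoothed.append(sum(window) / len(window) if window else None)
    return smoothed


# Hypothetical capacities, smoothed over the two preceding rows.
print(trailing_mean([1.0, 2.0, 3.0, 4.0], preceding_start=2, preceding_end=1))
# [None, 1.0, 1.5, 2.5]
```

Because the window excludes the current row, the smoothed curve lags slightly behind the raw one, which is visible when both are plotted together.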
An important observation here is that the battery meets the EOL criteria around the 125th cycle.
Goal and Problem Modeling¶
Understanding battery health is important, but at the time of writing, there's no direct way to measure it. In our case, we'll create a degradation model to find the relationship between a battery's overall health and the other properties in the dataset, which include charge and discharge cycle duration, average voltage and current, etc.
One possible definition of the battery's overall health ("state of health" or "SoH") is the following:
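Let $C_{\text{rated}}$ be the rated capacity of the battery when it's new (2Ah in our case), and $C_{\text{actual}}$ be the actual capacity of the battery at a specific time. The state of health of the battery is defined as:

$$\mathrm{SoH} = \frac{C_{\text{actual}}}{C_{\text{rated}}}$$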
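Taking SoH as the ratio of actual to rated capacity (which matches the `Capacity * 0.5` computation used later for this 2Ah battery), a minimal sketch looks like this. The function name and the example capacity are assumptions for illustration.

```python
def state_of_health(actual_capacity_ah, rated_capacity_ah=2.0):
    """Return SoH as a fraction: actual capacity over rated capacity."""
    if rated_capacity_ah <= 0:
        raise ValueError("rated capacity must be positive")
    return actual_capacity_ah / rated_capacity_ah


# A battery that can still deliver 1.4 Ah out of a rated 2 Ah.
soh = state_of_health(1.4)
print(soh)  # 0.7
```

With this definition, the 70% end-of-life criterion corresponds to an SoH of 0.7.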

In order to find this relationship, we'll clean and prepare our data by adding some extra features. These extra features will help us understand how the battery behaves during and between each cycle and pinpoint what might be the primary causes for battery degradation.
Data Preparation¶
Outlier detection¶
Let's start by finding and removing the global outliers from our dataset.
battery5.outliers(columns = ["Voltage_measured","Current_measured","Temperature_measured","Capacity"],
name = "global_outlier",
threshold = 4.0)
battery5.filter("global_outlier = 0").drop('global_outlier')
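For intuition, global outlier flagging of this kind is commonly based on z-scores: a value is flagged when it lies more than a threshold number of standard deviations from the column mean. The sketch below assumes that rule and uses the sample standard deviation; both are assumptions for illustration, and the data is hypothetical.

```python
import statistics


def zscore_outlier_flags(values, threshold=4.0):
    """Return 1 for values whose |z-score| exceeds the threshold, else 0."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumption)
    if std == 0:
        return [0 for _ in values]  # a constant column has no outliers
    return [1 if abs(v - mean) / std > threshold else 0 for v in values]


# Hypothetical voltage readings with one implausible spike at the end.
voltages = [3.6, 3.7, 3.8] * 10 + [9.0]
flags = zscore_outlier_flags(voltages, threshold=4.0)
print(sum(flags), flags[-1])
```

A threshold as high as 4.0 only removes extreme values, so it needs a reasonably large sample before any single point can exceed it.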
Feature engineering¶
Since measurements like voltage and temperature tend to differ within the different cycles, we'll create some features that can describe those cycles.
sample_cycle = battery5[battery5['Capacity'] == '1.83514614292266']
sample_cycle["Voltage_measured"].plot(ts = "Time")
sample_cycle["Temperature_measured"].plot(ts = "Time")
We'll define new features that describe the minimum and maximum temperature during one cycle, the minimum voltage, and the time needed to reach the minimum voltage and the maximum temperature.
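To make these feature definitions concrete, here is a minimal plain-Python sketch that computes them for a single discharge cycle. The samples are hypothetical tuples of (time, voltage, temperature) and the function name is an assumption. The notebook computes the same quantities in the database.

```python
def discharge_features(samples):
    """Compute per-cycle features from (time_s, voltage, temperature) tuples."""
    min_volt_sample = min(samples, key=lambda s: s[1])
    max_temp_sample = max(samples, key=lambda s: s[2])
    return {
        "min_temp": min(s[2] for s in samples),
        "max_temp": max_temp_sample[2],
        "min_volt": min_volt_sample[1],
        # Time stamps of the samples where the extremes occur.
        "time_to_reach_minvolt": min_volt_sample[0],
        "time_to_reach_maxtemp": max_temp_sample[0],
    }


# Hypothetical discharge cycle: voltage falls while temperature rises,
# then the cell relaxes after the load is removed.
cycle = [
    (0.0, 4.19, 24.3),
    (1200.0, 3.60, 31.0),
    (3300.0, 2.70, 38.9),
    (3600.0, 3.20, 37.5),
]
features = discharge_features(cycle)
print(features)
```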
# filter for discharge cycles
discharging_data = battery5[battery5['type'] == 'discharge']
# define new features
discharge_cycle_metrics = discharging_data.groupby(columns = ['start_time'],
expr = ['MIN(Temperature_measured) AS min_temp',
'MAX(Temperature_measured) AS max_temp',
'MIN(Voltage_measured) AS min_volt']).join(
discharging_data,
how = "left",
on = {"min_volt":"voltage_measured"},
expr1 = ["*"],
expr2 = ["Time AS time_to_reach_minvolt"]).join(
discharging_data,
how = "left",
on = {"max_temp":"temperature_measured"},
expr1 = ["*"],
expr2 = ["Time AS time_to_reach_maxtemp"])
# calculate values of SOH (capacity divided by the 2Ah rated capacity)
discharging_data = discharging_data.groupby(['start_time','Capacity'])
discharging_data['SOH'] = discharging_data['Capacity'] * 0.5
# define the final dataset
final_df = discharge_cycle_metrics.join(discharging_data,
on_interpolate = {"start_time":"start_time"},
how = "left",
expr1 = ["*"],
expr2 = ["SOH AS SOH"])
# normalize the features
final_df.normalize(method = "minmax",
columns = ["min_temp",
"max_temp",
"min_volt",
"time_to_reach_minvolt",
"time_to_reach_maxtemp"])
# save it to db
final_df.to_db(name = "battery_data.finaldata_battery_5")
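The min-max normalization applied to the features above rescales each column to the range [0, 1]. A minimal sketch of that formula, with hypothetical values and an assumed convention for constant columns, is:

```python
def minmax_scale(values):
    """Rescale values to [0, 1] using (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: map a constant column to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical maximum temperatures (degrees Celsius) of three cycles.
scaled = minmax_scale([24.0, 30.0, 36.0])
print(scaled)  # [0.0, 0.5, 1.0]
```

Putting all features on the same scale keeps columns with large numeric ranges, such as times in seconds, from dominating those with small ranges, such as voltages.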
Machine Learning¶
AutoML tests several models and returns scores for each. We can use this to find the best model for our dataset.
from verticapy.learn.delphi import AutoML
model = AutoML("battery_data.battery_autoML",
estimator = "native")
model.fit("battery_data.finaldata_battery_5",
X = ["min_temp",
"max_temp",
"min_volt",
"time_to_reach_minvolt",
"time_to_reach_maxtemp"],
y = "SOH")
We can visualize the performance and efficiency differences of each model with a plot.
model.plot()
Let's check the type and hyperparameters of the most performant model.
# take the best model and its parameters
best_model = model.best_model_
params = best_model.get_params()
print(best_model.type)
Here, the best model is a random forest regressor, so we can now define one using those hyperparameters and train it.
from verticapy.learn.ensemble import RandomForestRegressor
# define a regression model based on the selected parameters
model_rf = RandomForestRegressor(name = "btr_rf1", **params)
model_rf.fit(final_df,
X = ["min_temp",
"max_temp",
"min_volt",
"time_to_reach_minvolt",
"time_to_reach_maxtemp"],
y = "SOH")
model_rf.regression_report()
The predictive power of our model looks pretty good. Let's use our model to predict the SoH of the battery. We can visualize our prediction with a plot against the true values.
# take the predicted values and plot them alongside the true ones
result = model_rf.predict(final_df,
name = "SOH_estimates")
result.plot(ts = 'start_time',
columns = ['SOH', 'SOH_estimates'])
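To quantify how closely the estimates track the true values, metrics such as the root mean squared error (RMSE) and the coefficient of determination (R²) are commonly used. The sketch below computes both in plain Python on hypothetical values. The function names are assumptions, and the numbers are not results from this model.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical true values and predictions.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
print(rmse(y_true, y_pred), r2_score(y_true, y_pred))
```

An RMSE close to 0 and an R² close to 1 indicate that the estimates follow the true SoH closely.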
