PG&E Electricity Data Analysis and Forecasting
PG&E is one of the largest energy providers in California, serving millions of customers across the state. The company collects large amounts of data on electricity usage, which can be analyzed to identify trends, patterns, and seasonality. By using time series methods, we can analyze this data to make predictions about future electricity usage and demand.
For this project, I downloaded the data from PG&E's public datasets (https://pge-energydatarequest.com/public_datasets). Although the public datasets contain only a limited set of columns, they are enough to show overall electricity consumption across PG&E's California service area for the past ten years.
Exploring Datasets
The public datasets shared by PG&E have 8 columns, as shown. In this project, we will explore the 6 columns that provide the most insight into electricity consumption and trends.
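A minimal sketch of loading one of these files with pandas could look like the following. The column names and the choice of six columns to keep are assumptions about the file layout, so they may need adjusting to match the real header row, and the inline rows are invented placeholders rather than PG&E figures.

```python
import io
import pandas as pd

# Placeholder rows in the ASSUMED layout of a PG&E public usage file.
# The values are invented for illustration; they are not real PG&E figures.
SAMPLE_CSV = """ZipCode,Month,Year,CustomerClass,Combined,TotalCustomers,TotalkWh,AveragekWh
93701,1,2022,Elec- Residential,N,100,50000,500
93701,1,2022,Elec- Commercial,N,20,80000,4000
93701,2,2022,Elec- Residential,N,101,45450,450
"""

# Assumed subset of columns to keep; adjust to match the real header row.
KEEP = ["ZipCode", "Month", "Year", "CustomerClass", "TotalCustomers", "TotalkWh"]

def load_usage(source) -> pd.DataFrame:
    """Read one usage file (path or file-like object) and keep the analysis columns."""
    df = pd.read_csv(source, usecols=KEEP, dtype={"ZipCode": str})
    # Normalise labels such as "Elec- Residential" to "Residential".
    df["CustomerClass"] = df["CustomerClass"].str.replace(r"^Elec-\s*", "", regex=True)
    return df

usage = load_usage(io.StringIO(SAMPLE_CSV))
print(usage)
```

Passing a file path instead of the `io.StringIO` object reads a downloaded file the same way.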

Electricity Consumption in 2022
By the end of December 2022, PG&E had 4.973 million active customer accounts and had supplied 82 billion kWh of electricity to its customers. Santa Clara and Alameda counties topped the chart for both customer count and consumption. Surprisingly, Fresno and Bakersfield consumed more electricity than expected, even though they have far fewer customers than counties such as Contra Costa. This can be explained by the heavy agricultural and industrial activity in both areas.


July and August had the highest electricity consumption, and November and February had the lowest. In summer, both residential and commercial customers run their air conditioners to cope with the heat, while the months just before and after winter are relatively cool, so customers use less electricity.
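The monthly ranking above comes down to a group-and-sum over the month column. Here is a small sketch of that step, assuming `Month` and `TotalkWh` column names and using invented placeholder rows instead of the real data:

```python
import pandas as pd

# Invented placeholder rows (two ZIP codes, four months); not PG&E data.
usage = pd.DataFrame({
    "ZipCode":  ["93701", "95110"] * 4,
    "Month":    [2, 2, 7, 7, 8, 8, 11, 11],
    "TotalkWh": [300, 240, 450, 460, 470, 455, 290, 270],
})

# Sum consumption across all ZIP codes for each month, highest first.
by_month = usage.groupby("Month")["TotalkWh"].sum().sort_values(ascending=False)

print(by_month)
print("peak month:", by_month.idxmax(), "lowest month:", by_month.idxmin())
```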

Commercial customers consumed the most electricity, followed by Residential customers.

Residential and Commercial customers follow almost the same curve. Agricultural customers use electricity mostly from February to November; their curve bulges in summer, though without a steep slope.
Industrial electricity usage is clearly not affected by weather or season and stays almost flat throughout the year.
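One way to compare how seasonal each customer class is, sketched here under assumed column names and with invented placeholder numbers, is to pivot the data into a month-by-class table and compute a peak-to-mean ratio per class. The ratio is my own illustrative measure, not something taken from the notebook.

```python
import pandas as pd

# Invented placeholder monthly totals per customer class; not PG&E data.
usage = pd.DataFrame({
    "Month": [1, 7, 12] * 3,
    "CustomerClass": ["Residential"] * 3 + ["Agricultural"] * 3 + ["Industrial"] * 3,
    "TotalkWh": [200, 320, 210, 10, 150, 5, 100, 102, 98],
})

# Rows are months, columns are customer classes.
profile = usage.pivot_table(index="Month", columns="CustomerClass",
                            values="TotalkWh", aggfunc="sum")

# Peak-to-mean ratio: close to 1.0 means flat usage, larger means more seasonal.
seasonality = (profile.max() / profile.mean()).sort_values()
print(seasonality.round(2))
```

A class with a nearly flat line through the year, as described for Industrial customers, would show a ratio close to 1.0.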


Historical Trends
For the past 10 years, electricity consumption has followed the same steady trend, apart from large outliers in 2013, 2014, and 2017.
From September 2013 to July 2014, electricity consumption was roughly double the average for the same months in other years. The same spike can be observed in September 2017, but consumption fell back to normal in October.

When we compare the number of customers for each year, we see that the number of customers doubled in September 2013 and fell back to normal in July 2014. That is what caused the abnormal energy consumption trend in the figure above. The same pattern can be seen in September 2017.
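The check described above can be sketched by comparing each month against the median for the same calendar month in other years, for both customer count and consumption. The column names, the 1.5 cut-off, and the numbers below are assumptions made for illustration; the rows are invented placeholders, not PG&E data.

```python
import pandas as pd

# Invented placeholder totals for one calendar month across years; not PG&E data.
sept = pd.DataFrame({
    "Year":           [2012, 2013, 2014, 2015, 2016],
    "Month":          [9, 9, 9, 9, 9],
    "TotalCustomers": [100, 205, 102, 103, 104],
    "TotalkWh":       [5000, 10300, 5100, 5150, 5200],
})

# Ratio of each value to the median for the same calendar month across years.
for col in ["TotalCustomers", "TotalkWh"]:
    baseline = sept.groupby("Month")[col].transform("median")
    sept[col + "Ratio"] = sept[col] / baseline

# Usage per customer stays flat when a spike is driven by the customer count.
sept["kWhPerCustomer"] = sept["TotalkWh"] / sept["TotalCustomers"]

# The 1.5 threshold is an arbitrary illustrative cut-off.
flagged = sept[(sept["TotalkWhRatio"] > 1.5) & (sept["TotalCustomersRatio"] > 1.5)]
print(flagged[["Year", "Month", "TotalCustomersRatio", "TotalkWhRatio", "kWhPerCustomer"]])
```

If consumption per customer stays roughly constant while both ratios jump together, the spike is coming from the customer count rather than from a change in how much each customer uses.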

Forecasting Future Trends
To forecast future electricity demand in California, I used XGBoost, a gradient-boosted decision tree algorithm. The model made its predictions from the 10 years of historical data.
I trained the model to predict what energy consumption would look like in 5 years, that is, in 2027. This is a supervised learning problem, so I used the month and year columns as features and total energy consumption as the target.
There isn't much difference between the current trend and the predicted trend for 2027, as shown below.


GitHub link for this project:
https://github.com/nletcher/PG-E-Electric-Usage-Analysis/blob/main/Historical%20Trends.ipynb