This is an example of how the oxfordrivers.ceh.ac.uk API can be used to delve further into the statistics of storm overflows in the River Thames. This case study looks at the 2023 FFT data for seven sewage treatment works (STWs) in Oxfordshire to see whether unpermitted discharges of untreated water occurred, and for how long.
The data
Flow-to-full treatment (FFT)
The FFT data, provided at 15-minute resolution, is a measure of the flow in litres per second (L/s) of untreated water entering the STWs. Each STW has a permit which stipulates the minimum flow it is expected to handle before it resorts to diverting untreated water to storm overflow tanks. Once the storm overflow tanks are full, the STW will then divert untreated water directly into the river. We acquired this data from Thames Water through an Environmental Information Regulations request. Data was provided for the following seven STWs in Oxfordshire (a short sketch of retrieving one site's FFT series via the API follows the list):
Oxford STW
Cassington STW
Stanton Harcourt STW
Appleton STW
Horton-cum-Studley STW
Islip STW
Forest Hill STW (flowmeter was installed on 23/03/2023, no data prior to this).
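As a quick illustration of what this data looks like, the 15-minute FFT series for a single STW can be pulled directly from the API. The sketch below is illustrative only and is not part of the original data request; it uses the getTimeseries endpoint and the 'Oxford' site ID that appear later in this case study.

# minimal sketch: retrieve the 15-minute FFT series for one STW via the API
import pandas as pd
import requests

response = requests.get(
    "https://oxfordrivers.ceh.ac.uk/getTimeseries",
    params={"siteID": "Oxford", "date": "2023-01-01", "datasetID": "fft"})
fft_example = pd.json_normalize(response.json(), record_path='data')
fft_example['datetime'] = pd.to_datetime(fft_example['datetime'])
fft_example['value'] = pd.to_numeric(fft_example['value'])  # flow to full treatment in L/s
fft_example.head()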
Event duration monitoring (EDM)
The event duration monitoring (EDM) data is provided for all storm overflow sites managed by Thames Water through an API. It describes when and where spills of untreated sewage into the river occur; these are recorded as ‘Start’ and ‘Stop’ events. An important caveat to this data is provided on the Thames Water website:
“Although EDM gives us valuable insight to how our storm overflows are working, it isn’t always accurate. The monitors are sensitive and can play up. Even the movement of a weed growing in front of the monitor could set it off, indicating that the overflow is active when it isn’t. So when a monitor triggers, we can’t say that a certain section of the watercourse definitely contains sewage, or promise that it’s safe to swim in. EDM doesn’t confirm discharges, it only indicates them because the data received might not be accurate. We’re being open and sharing our data exactly as we receive it, so you can make more informed choices.”
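For context, the EDM series for a single storm overflow can be retrieved from the same API. The sketch below is illustrative only; it uses the 'Woodstock Road Oxford' site ID from the EDM site listing in step 2 and pairs 'Start' and 'Stop' events into durations in the same way as the analysis function further down.

# illustrative sketch: retrieve the EDM 'Start'/'Stop' events for one overflow site
import pandas as pd
import requests

response = requests.get(
    "https://oxfordrivers.ceh.ac.uk/getTimeseries",
    params={"siteID": "Woodstock Road Oxford", "date": "2023-01-01", "datasetID": "edm"})
events = pd.json_normalize(response.json(), record_path='data')
events['dti'] = pd.to_datetime(events['datetime'])

# pair each 'Start' with the following 'Stop' to get spill durations
# (assumes the events alternate, as the analysis function below also does)
starts = events[events['value'] == 'Start'].reset_index(drop=True)
stops = events[events['value'] == 'Stop'].reset_index(drop=True)
(stops['dti'] - starts['dti']).head()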
Method
To determine whether unpermitted spills occurred, and for how long, the Oxford Rivers API will be used to compare the FFT and EDM data for the seven STWs listed above.
Specifically, spills will be classified as ‘permitted’ or ‘unpermitted’ based on whether the FFT data indicated a high-flow period exceeding the minimum permitted threshold before the spill occurred. In this analysis we consider a spill to be permitted if any of the FFT measurements (taken at 15-minute intervals) exceeded the minimum threshold in the four hours preceding the spill (Scenario A), the twenty-four hours preceding the spill (Scenario B) or the seventy-two hours preceding the spill (Scenario C).
Below is Python code to perform this analysis.
Code
1. Import Libraries
Code
# libraries
import requests
import pandas as pd
pd.options.mode.chained_assignment = None
import datetime
import seaborn as sns
import numpy as np
import geopandas as gpd
import contextily as cx
from matplotlib import pyplot as plt
from shapely.geometry import Polygon
2. Download EDM sites using the API
Code
# the data is returned as JSON
response = requests.get("https://oxfordrivers.ceh.ac.uk/getSites?datasetID=edm")
# we need to flatten the JSON to read it into pandas; this is done with json_normalize,
# setting the record path to 'features' to extract the data
sof_sites = pd.json_normalize(response.json(), record_path='features')
# convert to a geopandas dataframe for spatial processing
sof_sites = gpd.GeoDataFrame(sof_sites)
# create a geometry column
sof_sites['geometry.coordinates'] = gpd.points_from_xy(
    pd.DataFrame(sof_sites["geometry.coordinates"].to_list())[0],
    pd.DataFrame(sof_sites["geometry.coordinates"].to_list())[1])
sof_sites.set_geometry('geometry.coordinates', inplace=True)
sof_sites
|     | type    | geometry.type | geometry.coordinates      | properties.id                         | properties.name                       | properties.river                   |
|-----|---------|---------------|---------------------------|---------------------------------------|---------------------------------------|------------------------------------|
| 0   | Feature | Point         | POINT (-0.22923 51.48892) | (Northern) Low Level No 1 Brook Green | (Northern) Low Level No 1 Brook Green | River Thames                       |
| 1   | Feature | Point         | POINT (-0.35523 51.64569) | 15 Coldharbour Lane, Bushey           | 15 Coldharbour Lane, Bushey           | Colne Brook                        |
| 2   | Feature | Point         | POINT (0.28270 51.77630)  | Abbess Roding                         | Abbess Roding                         | Coopers Brook                      |
| 3   | Feature | Point         | POINT (0.00031 51.53010)  | Abbey Mills                           | Abbey Mills                           | River Lee (Channelsea River South) |
| 4   | Feature | Point         | POINT (-0.28394 51.53792) | Abbeydale Road                        | Abbeydale Road                        | River Brent                        |
| ... | ...     | ...           | ...                       | ...                                   | ...                                   | ...                                |
| 557 | Feature | Point         | POINT (-1.27168 51.77214) | Woodstock Road Oxford                 | Woodstock Road Oxford                 | River Thames                       |
| 558 | Feature | Point         | POINT (0.04819 51.61066)  | Worcester Crescent                    | Worcester Crescent                    | River Roding                       |
| 559 | Feature | Point         | POINT (-1.05927 51.77680) | Worminghall                           | Worminghall                           | Worminghall Brook                  |
| 560 | Feature | Point         | POINT (-0.24963 51.41495) | Worple Road, SW19                     | Worple Road, SW19                     | Beverley Brook                     |
| 561 | Feature | Point         | POINT (0.15557 51.74911)  | Wynters Brook, Hastingwood            | Wynters Brook, Hastingwood            | Shonks Brook                       |

562 rows × 6 columns
3. Download FFT sites using the API
Note that this data includes a ‘properties.threshold’ field, which states the minimum permitted FFT value above which water can be diverted to storm tanks; ‘properties.popserved’ is the approximate population served by each STW.
Code
# the data is returned as JSON
response = requests.get("https://oxfordrivers.ceh.ac.uk/getSites?datasetID=fft")
# we need to flatten the JSON to read it into pandas; this is done with json_normalize,
# setting the record path to 'features' to extract the data
fft_sites = pd.json_normalize(response.json(), record_path='features')
# convert to a geopandas dataframe for spatial processing
fft_sites = gpd.GeoDataFrame(fft_sites)
# create a geometry column
fft_sites['geometry.coordinates'] = gpd.points_from_xy(
    pd.DataFrame(fft_sites["geometry.coordinates"].to_list())[0],
    pd.DataFrame(fft_sites["geometry.coordinates"].to_list())[1])
fft_sites.set_geometry('geometry.coordinates', inplace=True)
fft_sites
|   | type    | geometry.type | geometry.coordinates      | properties.id      | properties.name    | properties.threshold | properties.popserved |
|---|---------|---------------|---------------------------|--------------------|--------------------|----------------------|----------------------|
| 0 | Feature | Point         | POINT (-1.33307 51.80664) | Cassington         | Cassington         | 98.0                 | 17200                |
| 1 | Feature | Point         | POINT (-1.21313 51.71428) | Oxford             | Oxford             | 1040.0               | 228500               |
| 2 | Feature | Point         | POINT (-1.39009 51.75111) | Stanton Harcourt   | Stanton Harcourt   | 17.5                 | 1290                 |
| 3 | Feature | Point         | POINT (-1.14786 51.76759) | Forest Hill        | Forest Hill        | 2.9                  | 500                  |
| 4 | Feature | Point         | POINT (-1.24264 51.81949) | Islip              | Islip              | 4.4                  | 774                  |
| 5 | Feature | Point         | POINT (-1.12397 51.80428) | Horton cum Studley | Horton cum Studley | 4.1                  | 430                  |
| 6 | Feature | Point         | POINT (-1.37636 51.69619) | Appleton           | Appleton           | 47.5                 | 6850                 |
4. Merge the two listings using a nearest spatial join
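The code cell for this step is not reproduced here. Below is a minimal sketch of how the join could be done with GeoPandas' sjoin_nearest, assuming the two GeoDataFrames built above hold WGS84 coordinates and reprojecting to British National Grid (EPSG:27700) so that nearest-neighbour distances are measured in metres. The default '_left'/'_right' column suffixes it produces match the 'properties.id_left' and 'properties.id_right' names used by the function in step 5.

# sketch of the nearest spatial join (assumed approach; the original cell is not shown)
fft_proj = fft_sites.set_crs(epsg=4326).to_crs(epsg=27700)
sof_proj = sof_sites.set_crs(epsg=4326).to_crs(epsg=27700)

# attach each STW (FFT site) to its nearest EDM storm overflow site;
# overlapping columns such as 'properties.id' get '_left'/'_right' suffixes
fft_merged_sites = gpd.sjoin_nearest(fft_proj, sof_proj, how='left')
fft_merged_sites[['properties.id_left', 'properties.id_right', 'properties.threshold']].head()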
5. Calculate unpermitted vs permitted spills at each site based on scenario A (4 hour), scenario B (24 hour) and scenario C (72 hour) preceding time periods.
Code
def calculate_unpermitted_spills(fft_merged_sites, time_hrs):
    """
    This function takes a merged fft/edm pandas dataframe and calculates how many
    spills occurred, and how many unpermitted spills occurred. Spills are classified
    as unpermitted if there is no breach of the minimum permit in any of the readings
    in the time_hrs preceding the spill event. time_hrs should be given in hours.
    """
    # only reset the index if it has already been set
    if 'Cassington' in fft_merged_sites.index:
        fft_merged_sites.reset_index(inplace=True)
    fft_merged_sites.set_index('properties.id_right', inplace=True)

    # spill stats
    fft_merged_sites['spill_events'] = 0
    fft_merged_sites['spill_days'] = 0
    fft_merged_sites['spill_duration'] = 0
    fft_merged_sites['mean_spill_duration'] = 0

    # unpermitted spill stats
    fft_merged_sites['unpermitted_spill_events'] = 0
    fft_merged_sites['unpermitted_spill_days'] = 0
    fft_merged_sites['first_spill'] = ""
    fft_merged_sites['last_spill'] = ""
    fft_merged_sites['first_unpermitted_spill'] = ""
    fft_merged_sites['last_unpermitted_spill'] = ""
    fft_merged_sites['unpermitted_spill_duration'] = 0
    fft_merged_sites['mean_unpermitted_spill_duration'] = 0

    # get the spill data
    for id in fft_merged_sites.index:
        response = requests.get(f"https://oxfordrivers.ceh.ac.uk/getTimeseries?siteID={id}&date=2023-01-01&datasetID=edm")
        spills = pd.json_normalize(response.json(), record_path='data')
        # if there is data
        if len(spills) > 0:
            spills['dti'] = pd.DatetimeIndex(spills['datetime'])
            spills['date'] = spills['dti'].dt.date
            # filter spills to 2023
            spills = spills[spills.dti.dt.year == 2023]
            # calculate the duration of each spill
            start_spills = spills[spills['value'] == 'Start']
            stop_spills = spills[spills['value'] == 'Stop']
            stop_spills.index = stop_spills.index - 1
            spills['spill_length'] = stop_spills['dti'] - start_spills['dti']
            # summarise all spill stats in fft_merged_sites
            fft_merged_sites.loc[id, 'spill_events'] = np.sum(spills['value'] == 'Start')
            fft_merged_sites.loc[id, 'spill_days'] = len(spills['date'].unique())
            fft_merged_sites.loc[id, 'first_spill'] = spills['dti'].min()
            fft_merged_sites.loc[id, 'last_spill'] = spills['dti'].max()
            fft_merged_sites.loc[id, 'spill_duration'] = spills['spill_length'].sum()
            fft_merged_sites.loc[id, 'mean_spill_duration'] = spills['spill_length'].mean()

            # now figure out spill days where fft was under a certain threshold
            response = requests.get(f"https://oxfordrivers.ceh.ac.uk/getTimeseries?siteID={fft_merged_sites.loc[id, 'properties.id_left']}&date=2023-01-01&datasetID=fft")
            fft_threshold = fft_merged_sites.loc[id, 'properties.threshold']
            fft_timeseries = pd.json_normalize(response.json(), record_path='data')
            fft_timeseries['dti'] = pd.DatetimeIndex(fft_timeseries['datetime'])
            fft_timeseries['date'] = fft_timeseries['dti'].dt.date

            # filter spills to between the minimum fft dti plus time_hrs hours and the maximum fft dti
            spills = spills[(spills['dti'] >= fft_timeseries['dti'].min() + datetime.timedelta(hours=time_hrs)) &
                            (spills['dti'] <= fft_timeseries['dti'].max())]
            # calculate the time time_hrs hours before each spill
            spills['time_period_hr_prev'] = spills['dti'] - datetime.timedelta(hours=time_hrs)
            spills['unpermitted_spill'] = False
            # figure out if the fft values ever breached the threshold in this period
            for idx in spills.index:
                temp_fft = fft_timeseries[(fft_timeseries['dti'] >= spills.loc[idx, 'time_period_hr_prev']) &
                                          (fft_timeseries['dti'] <= spills.loc[idx, 'dti'])]
                temp_fft['value'] = pd.to_numeric(temp_fft['value'])
                tp_sum = sum(temp_fft['value'] >= fft_threshold)
                if tp_sum == 0:
                    spills.loc[idx, 'unpermitted_spill'] = True

            # summarise the unpermitted spill stats
            unpermitted_spills = spills[spills['unpermitted_spill'] == True]
            fft_merged_sites.loc[id, 'unpermitted_spill_events'] = len(unpermitted_spills[unpermitted_spills['value'] == 'Start'])
            fft_merged_sites.loc[id, 'unpermitted_spill_days'] = len(unpermitted_spills['date'].unique())
            fft_merged_sites.loc[id, 'first_unpermitted_spill'] = unpermitted_spills['dti'].min()
            fft_merged_sites.loc[id, 'last_unpermitted_spill'] = unpermitted_spills['dti'].max()
            fft_merged_sites.loc[id, 'unpermitted_spill_duration'] = unpermitted_spills['spill_length'].sum()
            fft_merged_sites.loc[id, 'mean_unpermitted_spill_duration'] = unpermitted_spills['spill_length'].mean()
        else:
            raise ValueError('The Thames EDM API did not return any data.')

    # add some extra stats, like the total spill duration in hours and the percentage of spill hours that were unpermitted
    fft_merged_sites['unpermitted_spill_duration'] = pd.to_timedelta(fft_merged_sites['unpermitted_spill_duration'])
    fft_merged_sites['spill_duration'] = pd.to_timedelta(fft_merged_sites['spill_duration'])
    fft_merged_sites['unpermitted_spill_duration_hours'] = fft_merged_sites['unpermitted_spill_duration'] / pd.Timedelta(hours=1)
    fft_merged_sites['spill_duration_hours'] = fft_merged_sites['spill_duration'] / pd.Timedelta(hours=1)
    fft_merged_sites['percent_unpermitted_spill_hours'] = fft_merged_sites['unpermitted_spill_duration'] / fft_merged_sites['spill_duration'] * 100

    # reformat the table and output
    formatted_data = fft_merged_sites.filter(['properties.id_right', 'geometry.coordinates', 'properties.popserved',
                                              'spill_days', 'unpermitted_spill_days', 'spill_duration_hours',
                                              'unpermitted_spill_duration_hours', 'percent_unpermitted_spill_hours'])
    formatted_data.rename(columns={'properties.popserved': 'Population served',
                                   'properties.id_right': 'STW Name',
                                   'spill_days': 'No. days that spills occurred',
                                   'unpermitted_spill_days': 'No. days that unpermitted spills occurred',
                                   'spill_duration_hours': 'Total spill duration (hours)',
                                   'unpermitted_spill_duration_hours': 'Total unpermitted spill duration (hours)',
                                   'percent_unpermitted_spill_hours': 'Percentage unpermitted spill hours (%)'},
                          inplace=True)
    return formatted_data
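The invocation cell is not reproduced above, but as a usage sketch, the function is simply called once per scenario with the length of the preceding window in hours:

# run the analysis for each scenario (window length in hours)
scenario_a = calculate_unpermitted_spills(fft_merged_sites, 4)   # Scenario A: 4 hours
scenario_b = calculate_unpermitted_spills(fft_merged_sites, 24)  # Scenario B: 24 hours
scenario_c = calculate_unpermitted_spills(fft_merged_sites, 72)  # Scenario C: 72 hours
scenario_c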
Across all scenarios, unpermitted spills (i.e. spills that began while flow was below the minimum permitted threshold) occurred at five of the seven STWs. These spills may have been genuine, caused by malfunctions in the STWs that prevented low-flow water from being adequately treated, or artefacts due to false positives recorded by the sensors that detect sewage overflow events.
Oxford and Appleton recorded the longest total durations of unpermitted spills (721.75 and 1619 hours respectively under Scenario C). At Appleton this represented 95.67% of all spill hours, indicating repeated malfunctioning either in the STW itself or in the sensors that detect spills; this is worth further investigation.
Cassington and Islip recorded no unpermitted spills under any scenario, which suggests that these STWs are functioning well.