This is an example of how the oxfordrivers.ceh.ac.uk API can be used to delve further into the statistics of storm overflows in the River Thames. This case study looks at the 2023 FFT data for seven sewage treatment works (STWs) in Oxfordshire to see whether unpermitted discharges of untreated water occurred, and for how long.
The data
Flow-to-full treatment (FFT)
The FFT data, provided at 15-minute resolution, is a measure of the flow in litres per second (L/s) of untreated water entering the STWs. Each STW has a permit which stipulates the minimum flow it is expected to handle before it resorts to diverting untreated water to storm overflow tanks. Once the storm overflow tanks are full, the STW will then divert untreated water directly into the river. We acquired this data from Thames Water through an Environmental Information Regulations request. Data was provided for the following seven STWs in Oxfordshire (a short sketch of retrieving one site's FFT series via the API follows the list):
Oxford STW
Cassington STW
Stanton Harcourt STW
Appleton STW
Horton-cum-Studley STW
Islip STW
Forest Hill STW (flowmeter was installed on 23/03/2023, no data prior to this).
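As a quick illustration of what this data looks like, the 15-minute FFT series for a single STW can be pulled directly from the API. The sketch below is illustrative only and is not part of the original data request; it uses the getTimeseries endpoint and the 'Oxford' site ID that appear later in this case study.

# minimal sketch: retrieve the 15-minute FFT series for one STW via the API
import pandas as pd
import requests

response = requests.get(
    "https://oxfordrivers.ceh.ac.uk/getTimeseries",
    params={"siteID": "Oxford", "date": "2023-01-01", "datasetID": "fft"})
fft_example = pd.json_normalize(response.json(), record_path='data')
fft_example['datetime'] = pd.to_datetime(fft_example['datetime'])
fft_example['value'] = pd.to_numeric(fft_example['value'])  # flow to full treatment in L/s
fft_example.head()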
Event duration monitoring (EDM)
The event duration monitoring (EDM) data is provided for all storm overflow sites managed by Thames Water through an API. It describes when and where spills of untreated sewage into the river occur; these are recorded as ‘Start’ and ‘Stop’ events. An important caveat to this data is provided on the Thames Water website:
“Although EDM gives us valuable insight to how our storm overflows are working, it isn’t always accurate. The monitors are sensitive and can play up. Even the movement of a weed growing in front of the monitor could set it off, indicating that the overflow is active when it isn’t. So when a monitor triggers, we can’t say that a certain section of the watercourse definitely contains sewage, or promise that it’s safe to swim in. EDM doesn’t confirm discharges, it only indicates them because the data received might not be accurate. We’re being open and sharing our data exactly as we receive it, so you can make more informed choices.”
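For context, the EDM series for a single storm overflow can be retrieved from the same API. The sketch below is illustrative only; it uses the 'Woodstock Road Oxford' site ID from the EDM site listing in step 2 and pairs 'Start' and 'Stop' events into durations in the same way as the analysis function further down.

# illustrative sketch: retrieve the EDM 'Start'/'Stop' events for one overflow site
import pandas as pd
import requests

response = requests.get(
    "https://oxfordrivers.ceh.ac.uk/getTimeseries",
    params={"siteID": "Woodstock Road Oxford", "date": "2023-01-01", "datasetID": "edm"})
events = pd.json_normalize(response.json(), record_path='data')
events['dti'] = pd.to_datetime(events['datetime'])

# pair each 'Start' with the following 'Stop' to get spill durations
# (assumes the events alternate, as the analysis function below also does)
starts = events[events['value'] == 'Start'].reset_index(drop=True)
stops = events[events['value'] == 'Stop'].reset_index(drop=True)
(stops['dti'] - starts['dti']).head()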
Method
To determine whether unpermitted spills occurred, and for how long, the Oxford Rivers API will be used to compare the FFT and EDM data for the seven STWs listed above.
Specifically, spills will be classified as ‘permitted’ or ‘unpermitted’ based on whether the FFT data indicated a high-flow period exceeding the minimum permitted threshold before the spill occurred. In this analysis we consider a spill to be permitted if any of the FFT measurements (taken at 15-minute intervals) exceeded the minimum threshold in the four hours preceding the spill (Scenario A), the twenty-four hours preceding the spill (Scenario B) or the seventy-two hours preceding the spill (Scenario C).
Below is Python code to perform this analysis.
Code
1. Import Libraries
Code
# libraries
import requests
import pandas as pd
pd.options.mode.chained_assignment = None
import datetime
import seaborn as sns
import numpy as np
import geopandas as gpd
import contextily as cx
from matplotlib import pyplot as plt
from shapely.geometry import Polygon
2. Download EDM sites using the API
Code
# the data is returned as JSON
response = requests.get("https://oxfordrivers.ceh.ac.uk/getSites?datasetID=edm")
# we need to flatten the JSON to read it into pandas; this is done with json_normalize,
# setting the record path to 'features' to extract the data
sof_sites = pd.json_normalize(response.json(), record_path='features')
# convert to a geopandas dataframe for spatial processing
sof_sites = gpd.GeoDataFrame(sof_sites)
# create a geometry column
sof_sites['geometry.coordinates'] = gpd.points_from_xy(
    pd.DataFrame(sof_sites["geometry.coordinates"].to_list())[0],
    pd.DataFrame(sof_sites["geometry.coordinates"].to_list())[1])
sof_sites.set_geometry('geometry.coordinates', inplace=True)
sof_sites
|     | type    | geometry.type | geometry.coordinates      | properties.id                         | properties.name                       | properties.river                   |
|-----|---------|---------------|---------------------------|---------------------------------------|---------------------------------------|------------------------------------|
| 0   | Feature | Point         | POINT (-0.22923 51.48892) | (Northern) Low Level No 1 Brook Green | (Northern) Low Level No 1 Brook Green | River Thames                       |
| 1   | Feature | Point         | POINT (-0.35523 51.64569) | 15 Coldharbour Lane, Bushey           | 15 Coldharbour Lane, Bushey           | Colne Brook                        |
| 2   | Feature | Point         | POINT (0.28270 51.77630)  | Abbess Roding                         | Abbess Roding                         | Coopers Brook                      |
| 3   | Feature | Point         | POINT (0.00031 51.53010)  | Abbey Mills                           | Abbey Mills                           | River Lee (Channelsea River South) |
| 4   | Feature | Point         | POINT (-0.28394 51.53792) | Abbeydale Road                        | Abbeydale Road                        | River Brent                        |
| ... | ...     | ...           | ...                       | ...                                   | ...                                   | ...                                |
| 557 | Feature | Point         | POINT (-1.27168 51.77214) | Woodstock Road Oxford                 | Woodstock Road Oxford                 | River Thames                       |
| 558 | Feature | Point         | POINT (0.04819 51.61066)  | Worcester Crescent                    | Worcester Crescent                    | River Roding                       |
| 559 | Feature | Point         | POINT (-1.05927 51.77680) | Worminghall                           | Worminghall                           | Worminghall Brook                  |
| 560 | Feature | Point         | POINT (-0.24963 51.41495) | Worple Road, SW19                     | Worple Road, SW19                     | Beverley Brook                     |
| 561 | Feature | Point         | POINT (0.15557 51.74911)  | Wynters Brook, Hastingwood            | Wynters Brook, Hastingwood            | Shonks Brook                       |

562 rows × 6 columns
3. Download FFT sites using the API
Note that this data includes a ‘properties.threshold’ field, which states the minimum permitted FFT value above which water can be diverted to storm tanks; ‘properties.popserved’ is the approximate population served by each STW.
Code
# the data is returned as JSON
response = requests.get("https://oxfordrivers.ceh.ac.uk/getSites?datasetID=fft")
# we need to flatten the JSON to read it into pandas; this is done with json_normalize,
# setting the record path to 'features' to extract the data
fft_sites = pd.json_normalize(response.json(), record_path='features')
# convert to a geopandas dataframe for spatial processing
fft_sites = gpd.GeoDataFrame(fft_sites)
# create a geometry column
fft_sites['geometry.coordinates'] = gpd.points_from_xy(
    pd.DataFrame(fft_sites["geometry.coordinates"].to_list())[0],
    pd.DataFrame(fft_sites["geometry.coordinates"].to_list())[1])
fft_sites.set_geometry('geometry.coordinates', inplace=True)
fft_sites
|   | type    | geometry.type | geometry.coordinates      | properties.id      | properties.name    | properties.threshold | properties.popserved |
|---|---------|---------------|---------------------------|--------------------|--------------------|----------------------|----------------------|
| 0 | Feature | Point         | POINT (-1.33307 51.80664) | Cassington         | Cassington         | 98.0                 | 17200                |
| 1 | Feature | Point         | POINT (-1.21313 51.71428) | Oxford             | Oxford             | 1040.0               | 228500               |
| 2 | Feature | Point         | POINT (-1.39009 51.75111) | Stanton Harcourt   | Stanton Harcourt   | 17.5                 | 1290                 |
| 3 | Feature | Point         | POINT (-1.14786 51.76759) | Forest Hill        | Forest Hill        | 2.9                  | 500                  |
| 4 | Feature | Point         | POINT (-1.24264 51.81949) | Islip              | Islip              | 4.4                  | 774                  |
| 5 | Feature | Point         | POINT (-1.12397 51.80428) | Horton cum Studley | Horton cum Studley | 4.1                  | 430                  |
| 6 | Feature | Point         | POINT (-1.37636 51.69619) | Appleton           | Appleton           | 47.5                 | 6850                 |
4. Merge the two listings using a nearest spatial join
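The code cell for this step is not reproduced here. Below is a minimal sketch of how the join could be done with GeoPandas' sjoin_nearest, assuming the two GeoDataFrames built above hold WGS84 coordinates and reprojecting to British National Grid (EPSG:27700) so that nearest-neighbour distances are measured in metres. The default '_left'/'_right' column suffixes it produces match the 'properties.id_left' and 'properties.id_right' names used by the function in step 5.

# sketch of the nearest spatial join (assumed approach; the original cell is not shown)
fft_proj = fft_sites.set_crs(epsg=4326).to_crs(epsg=27700)
sof_proj = sof_sites.set_crs(epsg=4326).to_crs(epsg=27700)

# attach each STW (FFT site) to its nearest EDM storm overflow site;
# overlapping columns such as 'properties.id' get '_left'/'_right' suffixes
fft_merged_sites = gpd.sjoin_nearest(fft_proj, sof_proj, how='left')
fft_merged_sites[['properties.id_left', 'properties.id_right', 'properties.threshold']].head()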
5. Calculate unpermitted vs permitted spills at each site based on scenario A (4 hour), scenario B (24 hour) and scenario C (72 hour) preceding time periods.
Code
def calculate_unpermitted_spills(fft_merged_sites, time_hrs):
    """
    This function takes a merged fft/edm pandas dataframe and calculates how many
    spills occurred, and how many unpermitted spills occurred. Spills are classified
    as unpermitted if there is no breach of the minimum permit in any of the readings
    in the time_hrs preceding the spill event. time_hrs should be given in hours.
    """
    # only reset the index if it has already been set
    if 'Cassington' in fft_merged_sites.index:
        fft_merged_sites.reset_index(inplace=True)
    fft_merged_sites.set_index('properties.id_right', inplace=True)

    # spill stats
    fft_merged_sites['spill_events'] = 0
    fft_merged_sites['spill_days'] = 0
    fft_merged_sites['spill_duration'] = 0
    fft_merged_sites['mean_spill_duration'] = 0

    # unpermitted spill stats
    fft_merged_sites['unpermitted_spill_events'] = 0
    fft_merged_sites['unpermitted_spill_days'] = 0
    fft_merged_sites['first_spill'] = ""
    fft_merged_sites['last_spill'] = ""
    fft_merged_sites['first_unpermitted_spill'] = ""
    fft_merged_sites['last_unpermitted_spill'] = ""
    fft_merged_sites['unpermitted_spill_duration'] = 0
    fft_merged_sites['mean_unpermitted_spill_duration'] = 0

    # get the spill data
    for id in fft_merged_sites.index:
        response = requests.get(f"https://oxfordrivers.ceh.ac.uk/getTimeseries?siteID={id}&date=2023-01-01&datasetID=edm")
        spills = pd.json_normalize(response.json(), record_path='data')
        # if there is data
        if len(spills) > 0:
            spills['dti'] = pd.DatetimeIndex(spills['datetime'])
            spills['date'] = spills['dti'].dt.date
            # filter spills to 2023
            spills = spills[spills.dti.dt.year == 2023]
            # calculate the duration of each spill
            start_spills = spills[spills['value'] == 'Start']
            stop_spills = spills[spills['value'] == 'Stop']
            stop_spills.index = stop_spills.index - 1
            spills['spill_length'] = stop_spills['dti'] - start_spills['dti']
            # summarise all spill stats in fft_merged_sites
            fft_merged_sites.loc[id, 'spill_events'] = np.sum(spills['value'] == 'Start')
            fft_merged_sites.loc[id, 'spill_days'] = len(spills['date'].unique())
            fft_merged_sites.loc[id, 'first_spill'] = spills['dti'].min()
            fft_merged_sites.loc[id, 'last_spill'] = spills['dti'].max()
            fft_merged_sites.loc[id, 'spill_duration'] = spills['spill_length'].sum()
            fft_merged_sites.loc[id, 'mean_spill_duration'] = spills['spill_length'].mean()

            # now figure out spill days where fft was under a certain threshold
            response = requests.get(f"https://oxfordrivers.ceh.ac.uk/getTimeseries?siteID={fft_merged_sites.loc[id, 'properties.id_left']}&date=2023-01-01&datasetID=fft")
            fft_threshold = fft_merged_sites.loc[id, 'properties.threshold']
            fft_timeseries = pd.json_normalize(response.json(), record_path='data')
            fft_timeseries['dti'] = pd.DatetimeIndex(fft_timeseries['datetime'])
            fft_timeseries['date'] = fft_timeseries['dti'].dt.date

            # filter spills to between the minimum fft dti plus time_hrs hours and the maximum fft dti
            spills = spills[(spills['dti'] >= fft_timeseries['dti'].min() + datetime.timedelta(hours=time_hrs)) &
                            (spills['dti'] <= fft_timeseries['dti'].max())]
            # calculate the time time_hrs hours before each spill
            spills['time_period_hr_prev'] = spills['dti'] - datetime.timedelta(hours=time_hrs)
            spills['unpermitted_spill'] = False
            # figure out if the fft values ever breached the threshold in this period
            for idx in spills.index:
                temp_fft = fft_timeseries[(fft_timeseries['dti'] >= spills.loc[idx, 'time_period_hr_prev']) &
                                          (fft_timeseries['dti'] <= spills.loc[idx, 'dti'])]
                temp_fft['value'] = pd.to_numeric(temp_fft['value'])
                tp_sum = sum(temp_fft['value'] >= fft_threshold)
                if tp_sum == 0:
                    spills.loc[idx, 'unpermitted_spill'] = True

            # summarise the unpermitted spill stats
            unpermitted_spills = spills[spills['unpermitted_spill'] == True]
            fft_merged_sites.loc[id, 'unpermitted_spill_events'] = len(unpermitted_spills[unpermitted_spills['value'] == 'Start'])
            fft_merged_sites.loc[id, 'unpermitted_spill_days'] = len(unpermitted_spills['date'].unique())
            fft_merged_sites.loc[id, 'first_unpermitted_spill'] = unpermitted_spills['dti'].min()
            fft_merged_sites.loc[id, 'last_unpermitted_spill'] = unpermitted_spills['dti'].max()
            fft_merged_sites.loc[id, 'unpermitted_spill_duration'] = unpermitted_spills['spill_length'].sum()
            fft_merged_sites.loc[id, 'mean_unpermitted_spill_duration'] = unpermitted_spills['spill_length'].mean()
        else:
            raise ValueError('The Thames EDM API did not return any data.')

    # add some extra stats, like the total spill duration in hours and the percentage of spill hours that were unpermitted
    fft_merged_sites['unpermitted_spill_duration'] = pd.to_timedelta(fft_merged_sites['unpermitted_spill_duration'])
    fft_merged_sites['spill_duration'] = pd.to_timedelta(fft_merged_sites['spill_duration'])
    fft_merged_sites['unpermitted_spill_duration_hours'] = fft_merged_sites['unpermitted_spill_duration'] / pd.Timedelta(hours=1)
    fft_merged_sites['spill_duration_hours'] = fft_merged_sites['spill_duration'] / pd.Timedelta(hours=1)
    fft_merged_sites['percent_unpermitted_spill_hours'] = fft_merged_sites['unpermitted_spill_duration'] / fft_merged_sites['spill_duration'] * 100

    # reformat the table and output
    formatted_data = fft_merged_sites.filter(['properties.id_right', 'geometry.coordinates', 'properties.popserved',
                                              'spill_days', 'unpermitted_spill_days', 'spill_duration_hours',
                                              'unpermitted_spill_duration_hours', 'percent_unpermitted_spill_hours'])
    formatted_data.rename(columns={'properties.popserved': 'Population served',
                                   'properties.id_right': 'STW Name',
                                   'spill_days': 'No. days that spills occurred',
                                   'unpermitted_spill_days': 'No. days that unpermitted spills occurred',
                                   'spill_duration_hours': 'Total spill duration (hours)',
                                   'unpermitted_spill_duration_hours': 'Total unpermitted spill duration (hours)',
                                   'percent_unpermitted_spill_hours': 'Percentage unpermitted spill hours (%)'},
                          inplace=True)
    return formatted_data
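The invocation cell is not reproduced above, but as a usage sketch, the function is simply called once per scenario with the length of the preceding window in hours:

# run the analysis for each scenario (window length in hours)
scenario_a = calculate_unpermitted_spills(fft_merged_sites, 4)   # Scenario A: 4 hours
scenario_b = calculate_unpermitted_spills(fft_merged_sites, 24)  # Scenario B: 24 hours
scenario_c = calculate_unpermitted_spills(fft_merged_sites, 72)  # Scenario C: 72 hours
scenario_c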
Across all scenarios, unpermitted spills (i.e. spills that began while flow was below the minimum permitted threshold) occurred at five of the seven STWs. These spills may have been genuine, caused by malfunctions in the STWs that prevented low-flow water from being adequately treated, or artefacts due to false positives recorded by the sensors that detect sewage overflow events.
Oxford and Appleton recorded the longest total durations of unpermitted spills (721.75 and 1619 hours respectively under Scenario C). At Appleton this represented 95.67% of all spill hours, indicating repeated malfunctioning either in the STW itself or in the sensors that detect spills; this is worth further investigation.
Cassington and Islip recorded no unpermitted spills under any scenario, which suggests that these STWs are functioning well.