
# Dataset Slicing

## `annual_mean(ds, n_year_days=360, first_day=1)`

Returns dataset `ds` with each variable averaged over all years, i.e. the time dimension of `ds` will run from 0.5 to 359.5 if a year has 360 days.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `ds` | `Dataset` | Dataset for particular experiment. | required |
| `n_year_days` | `int` | Number of days in a year used for the simulation. This depends on the `calendar` option in the `main_nml` namelist. | `360` |
| `first_day` | `int` | Day used in starting date for the simulation. It is equal to the third number in the `current_date` option in the `main_nml` namelist. `1` refers to January 1st. | `1` |

Returns:

| Type | Description |
|------|-------------|
| `Dataset` | Dataset containing the annual average of each variable. |

Source code in isca_tools/utils/ds_slicing.py
```python
def annual_mean(ds: Dataset, n_year_days: int = 360, first_day: int = 1) -> Dataset:
    """
    Returns dataset `ds` with variables being the average over all years i.e. time dimension of `ds` will now be from
    0.5 to 359.5 if a year has 360 days.

    Args:
        ds: Dataset for particular experiment.
        n_year_days: Number of days in a year used for the simulation.
            This depends on the `calendar` option in the `main_nml` namelist.
        first_day: Day used in starting date for the simulation.
            It is equal to the third number in the `current_date` option in the `main_nml` namelist.
            `1` refers to January 1st.

    Returns:
        Dataset containing the annual average of each variable.
    """
    ds_days = (first_day - 1 + ds.time) % n_year_days  # day in a given year that each value in ds.time refers to
    return ds.groupby(ds_days).mean(dim='time')
```
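The grouping key here is just modular day-of-year arithmetic. A rough sketch of that step with plain NumPy, using synthetic daily `time` values (the actual per-variable grouping is handled by `xarray`):

```python
import numpy as np

n_year_days = 360
first_day = 1
# Two years of daily output: Isca records each day at its midpoint (0.5, 1.5, ...).
time = np.arange(2 * n_year_days) + 0.5
values = (np.sin(2 * np.pi * time / n_year_days)
          + 0.1 * np.random.default_rng(0).standard_normal(time.size))

# Same key as annual_mean: day within the year that each time value refers to.
day_of_year = (first_day - 1 + time) % n_year_days

# Group-by-key mean, i.e. what ds.groupby(ds_days).mean(dim='time') computes per variable.
unique_days = np.unique(day_of_year)
annual_cycle = np.array([values[day_of_year == d].mean() for d in unique_days])
```

With 360-day years the key runs over 0.5, 1.5, ..., 359.5, so `annual_cycle` has length 360 regardless of how many years the simulation covers.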

## `annual_time_slice(ds, include_months=None, include_days=None, month_days=30, year_months=12, first_day=1)`

Slices dataset `ds` so that it only contains data corresponding to given months or days in each year.

Examples:

`ds_summer = annual_time_slice(ds, [6, 7, 8])`
This returns a dataset containing data corresponding to June, July and August in each year, i.e. only northern hemisphere summer.

`ds_day = annual_time_slice(ds, include_days=[36])`
This returns a dataset containing data corresponding to the 36th day of the year for each year of the simulation. The length of the time dimension will be the number of years of the simulation.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `ds` | `Dataset` | Dataset for particular experiment. | required |
| `include_months` | `Optional[List[int]]` | `int [n_months]`. Months to keep (`1` refers to January). | `None` |
| `include_days` | `Optional[List[int]]` | `int [n_days]`. Days to keep (`1` refers to 1st January). If `include_months` is provided, this will be ignored. | `None` |
| `month_days` | `int` | Number of days in each month used for the simulation. This depends on the `calendar` option in the `main_nml` namelist. | `30` |
| `year_months` | `int` | Number of months in a year. I think this is always likely to be `12`. | `12` |
| `first_day` | `int` | Day used in starting date for the simulation. It is equal to the third number in the `current_date` option in the `main_nml` namelist. `1` refers to January 1st. | `1` |

Returns:

| Type | Description |
|------|-------------|
| `Dataset` | Dataset only including certain months/days for each year. |

Source code in isca_tools/utils/ds_slicing.py
```python
def annual_time_slice(ds: Dataset, include_months: Optional[List[int]] = None, include_days: Optional[List[int]] = None,
                      month_days: int = 30, year_months: int = 12, first_day: int = 1) -> Dataset:
    """
    Slices dataset `ds` so that it only contains data corresponding to given months or days in each year.

    Examples:
        `ds_summer = annual_time_slice(ds, [6, 7, 8])`</br>
            This returns a dataset containing data corresponding to
            June, July and August in each year i.e. only northern hemisphere summer.

        `ds_day = annual_time_slice(ds, include_days=[36])`</br>
            This returns a dataset containing data corresponding to the 36th day of the year for each year
            of the simulation. The length of the time dimension will be the number of years of the simulation.

    Args:
        ds: Dataset for particular experiment.
        include_months: `int [n_months]`</br>
            Months to keep (`1` refers to January).
        include_days: `int [n_days]`</br>
            Days to keep (`1` refers to 1st January).
            If `include_months` is provided, this will be ignored.
        month_days: Number of days in each month used for the simulation.
            This depends on the `calendar` option in the `main_nml` namelist.
        year_months: Number of months in a year. I think this is always likely to be `12`.
        first_day: Day used in starting date for the simulation.
            It is equal to the third number in the `current_date` option in the `main_nml` namelist.
            `1` refers to January 1st.

    Returns:
        Dataset only including certain months/days for each year.
    """
    year_days = year_months * month_days  # number of days in a year
    # ceil to deal with daily output data when 1st day is 0.5, 2nd day is 1.5 etc
    ds_days = (first_day - 1 + np.ceil(ds.time)) % year_days  # day in a given year that each value in ds.time refers to
    ds_days_step = float(ds_days[1] - ds_days[0])
    ds_days[ds_days == 0] = year_days  # correction so last day of year has index 360 not 0
    if include_months is not None:
        include_days = [np.arange(1, month_days + 1) + month_days * (month - 1) for month in include_months]
        include_days = np.concatenate(include_days)
    elif include_days is None:
        raise ValueError("Either include_months or include_days need to be specified but both are None.")

    # account for ds data being monthly or daily
    include_days = include_days[(include_days - float(ds_days.min())) % ds_days_step == 0]

    if not np.isin(include_days, ds_days).all():
        raise ValueError("Not all months / days provided in include_months / include_days are valid")
    return ds.where(ds_days.isin(include_days), drop=True)
```
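The month selection reduces to expanding each month number into its run of day-of-year indices, exactly as in the `include_days` construction above. A minimal NumPy sketch of that expansion, assuming the default 30-day months:

```python
import numpy as np

month_days = 30
include_months = [6, 7, 8]  # June, July, August

# Each month m covers days month_days*(m-1)+1 .. month_days*m of the year.
include_days = np.concatenate(
    [np.arange(1, month_days + 1) + month_days * (month - 1) for month in include_months]
)
```

For a 360-day calendar this gives days 151 through 240, i.e. the 90 days of northern hemisphere summer.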

## `anom_from_annual_mean(ds, combine_lon=False, n_year_days=360, first_day=1)`

For each lat, lon and pressure, this computes the annual mean of each variable and subtracts it from the initial dataset, giving the anomaly relative to the annual mean value.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `ds` | `Dataset` | Dataset for particular experiment. | required |
| `combine_lon` | `bool` | If `True`, the anomaly is with respect to the zonal annual mean; otherwise it is with respect to the annual mean. | `False` |
| `n_year_days` | `int` | Number of days in a year used for the simulation. This depends on the `calendar` option in the `main_nml` namelist. | `360` |
| `first_day` | `int` | Day used in starting date for the simulation. It is equal to the third number in the `current_date` option in the `main_nml` namelist. `1` refers to January 1st. | `1` |

Returns:

| Type | Description |
|------|-------------|
| `Dataset` | Dataset with same `time` variable as `ds`, containing the anomaly relative to the annual average. |

Source code in isca_tools/utils/ds_slicing.py
```python
def anom_from_annual_mean(ds: Dataset, combine_lon: bool = False, n_year_days: int = 360,
                          first_day: int = 1) -> Dataset:
    """
    For each lat, lon and pressure; this computes the annual mean of each variable. It then subtracts it
    from the initial dataset, to give the anomaly relative to the annual mean value.

    Args:
        ds: Dataset for particular experiment.
        combine_lon: If `True` will be anomaly with respect to zonal annual mean, otherwise will just
            be with respect to annual mean.
        n_year_days: Number of days in a year used for the simulation.
            This depends on the `calendar` option in the `main_nml` namelist.
        first_day: Day used in starting date for the simulation.
            It is equal to the third number in the `current_date` option in the `main_nml` namelist.
            `1` refers to January 1st.

    Returns:
        Dataset with same `time` variable as `ds`, containing the anomaly relative to the
        annual average.
    """
    if combine_lon:
        ds_annual_mean = annual_mean(ds.mean(dim='lon'), n_year_days, first_day)
    else:
        ds_annual_mean = annual_mean(ds, n_year_days, first_day)
    ds_annual_mean = ds_annual_mean.rename({'time': 'day_of_year'})  # change coordinate to day of year
    # make it integer starting at 0
    ds_annual_mean = ds_annual_mean.assign_coords(day_of_year=(ds_annual_mean.day_of_year -
                                                               ds_annual_mean.day_of_year.min()).astype(int))
    ds['day_of_year'] = (ds.time % n_year_days - (ds.time % n_year_days).min()).astype(int)
    return ds.groupby('day_of_year') - ds_annual_mean
```
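The operation amounts to: build the mean annual cycle, then subtract from every time step the entry matching its day of year. A rough NumPy sketch with two years of synthetic daily data (the real function does this per variable via `groupby`):

```python
import numpy as np

n_year_days = 360
n_years = 2
rng = np.random.default_rng(0)

# Synthetic daily series: a repeating seasonal cycle plus noise.
day_of_year = np.tile(np.arange(n_year_days), n_years)
values = (np.sin(2 * np.pi * day_of_year / n_year_days)
          + 0.1 * rng.standard_normal(day_of_year.size))

# Mean annual cycle: average the years day by day.
annual_cycle = values.reshape(n_years, n_year_days).mean(axis=0)

# Anomaly: subtract the annual-cycle value matching each time step's day of year.
anomaly = values - annual_cycle[day_of_year]
```

By construction, averaging the anomaly over the years for any given day of year gives zero.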

## `area_weight_mean_lat(ds)`

For all variables in ds, an area weighted mean is taken over all latitudes in the dataset.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `ds` | `Dataset` | Dataset for particular experiment. | required |

Returns:

| Type | Description |
|------|-------------|
| `Dataset` | Dataset containing averaged variables with no latitude dependence. |

Source code in isca_tools/utils/ds_slicing.py
```python
def area_weight_mean_lat(ds: Dataset) -> Dataset:
    """
    For all variables in `ds`, an area weighted mean is taken over all latitudes in the dataset.

    Args:
        ds: Dataset for particular experiment.

    Returns:
        Dataset containing averaged variables with no latitude dependence.
    """
    var_averaged = []
    for var in ds.keys():
        if 'lat' in list(ds[var].coords):
            ds[var] = area_weighting(ds[var]).mean(dim='lat')
            var_averaged += [var]
    print(f"Variables Averaged: {var_averaged}")
    return ds
```
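The `area_weighting` helper is defined elsewhere in `isca_tools`; its exact form is not shown here, but on a regular latitude-longitude grid area weighting typically amounts to weighting each latitude by the cosine of latitude (an assumption for this sketch). A NumPy sketch of a cosine-weighted latitude mean:

```python
import numpy as np

lat = np.linspace(-87.5, 87.5, 36)              # latitude band centres in degrees
field = 273.0 + 30.0 * np.cos(np.deg2rad(lat))  # some zonal-mean field, e.g. surface temperature

# Grid cells shrink towards the poles, so weight each latitude by cos(lat).
weights = np.cos(np.deg2rad(lat))
area_weighted_mean = np.sum(field * weights) / np.sum(weights)

# An unweighted mean over-counts the polar rows.
unweighted_mean = field.mean()
```

Since this field peaks at the equator, the weighted mean sits above the unweighted one; omitting the weighting biases any global mean towards polar values.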

## `lat_lon_coord_slice(ds, lat, lon)`

Returns dataset `ds`, keeping only data at the coordinate indicated by `(lat[i], lon[i])` for all `i`.

If `ds` contained `t_surf`, then the returned dataset would contain `t_surf` as a function of the variables `time` and `location`, with each value of `location` corresponding to a specific `(lat, lon)` combination. For the original `ds`, it would be a function of `time`, `lat` and `lon`.

This is inspired from a [stack overflow post](https://stackoverflow.com/questions/72179103/xarray-select-the-data-at-specific-x-and-y-coordinates).

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `ds` | `Dataset` | Dataset for particular experiment. | required |
| `lat` | `ndarray` | `float [n_coords]`. Latitude coordinates to keep. | required |
| `lon` | `ndarray` | `float [n_coords]`. Longitude coordinates to keep. | required |

Returns:

| Type | Description |
|------|-------------|
| `Dataset` | Dataset only including the desired coordinates. |

Source code in isca_tools/utils/ds_slicing.py
```python
def lat_lon_coord_slice(ds: Dataset, lat: np.ndarray, lon: np.ndarray) -> Dataset:
    """
    Returns dataset, `ds`, keeping only data at the coordinate indicated by `(lat[i], lon[i])` for all `i`.

    If `ds` contained `t_surf` then the returned dataset would contain `t_surf` as a function of the variables
    `time` and `location` with each value of `location` corresponding to a specific `(lat, lon)` combination.
    For the original `ds`, it would be a function of `time`, `lat` and `lon`.

    This is inspired from a
    [stack overflow post](https://stackoverflow.com/questions/72179103/xarray-select-the-data-at-specific-x-and-y-coordinates).

    Args:
        ds: Dataset for particular experiment.
        lat: `float [n_coords]`
            Latitude coordinates to keep.
        lon: `float [n_coords]`
            Longitude coordinates to keep.

    Returns:
        Dataset only including the desired coordinates.
    """
    # To get dataset at specific coordinates, not all permutations, turn to xarray first
    lat_xr = xr.DataArray(lat, dims=['location'])
    lon_xr = xr.DataArray(lon, dims=['location'])
    return ds.sel(lat=lat_xr, lon=lon_xr, method="nearest")
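The trick is that passing indexers which share a new dimension (`location`) makes `sel` pick coordinate *pairs* rather than the full cross product. The NumPy analogue of this distinction is advanced indexing with paired index arrays:

```python
import numpy as np

# Field on a (lat, lon) grid: 4 latitudes x 8 longitudes.
field = np.arange(32).reshape(4, 8)

lat_idx = np.array([0, 2, 3])   # paired indices: points (0, 1), (2, 5), (3, 7)
lon_idx = np.array([1, 5, 7])

# Paired (pointwise) indexing: one value per location,
# like ds.sel with DataArrays sharing the 'location' dim.
points = field[lat_idx, lon_idx]

# Cross-product indexing: every lat with every lon, a 3x3 block,
# which is what plain ds.sel(lat=lat, lon=lon) would give.
block = field[np.ix_(lat_idx, lon_idx)]
```

`points` has one entry per `(lat, lon)` pair, while `block` contains all nine combinations.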

## `lat_lon_range_slice(ds, lat_min=None, lat_max=None, lon_min=None, lon_max=None)`

Slices `ds` to keep only latitudes in `[lat_min, lat_max]` and longitudes in `[lon_min, lon_max]`.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `ds` | `Union[Dataset, DataArray]` | Dataset or DataArray to slice. | required |
| `lat_min` | `Optional[float]` | Minimum latitude to keep. | `None` |
| `lat_max` | `Optional[float]` | Maximum latitude to keep. | `None` |
| `lon_min` | `Optional[float]` | Minimum longitude to keep. | `None` |
| `lon_max` | `Optional[float]` | Maximum longitude to keep. | `None` |

Returns:

Dataset or DataArray restricted to the given latitude/longitude range.

Source code in isca_tools/utils/ds_slicing.py
```python
def lat_lon_range_slice(ds: Union[Dataset, DataArray], lat_min: Optional[float] = None,
                        lat_max: Optional[float] = None, lon_min: Optional[float] = None,
                        lon_max: Optional[float] = None):
    """
    Slices `ds` to keep only latitudes in `[lat_min, lat_max]` and longitudes in `[lon_min, lon_max]`.
    If both bounds for a dimension are `None`, that dimension is left unsliced; if only one of the
    two bounds is given, a `ValueError` is raised.

    Args:
        ds: Dataset or DataArray to slice.
        lat_min: Minimum latitude to keep.
        lat_max: Maximum latitude to keep.
        lon_min: Minimum longitude to keep.
        lon_max: Maximum longitude to keep.

    Returns:
        Dataset or DataArray restricted to the given latitude/longitude range.
    """
    if (lon_min is None) and lon_max is None:
        lon_range = None
    else:
        if lon_max is None:
            raise ValueError('lon_max is required')
        if lon_min is None:
            raise ValueError('lon_min is required')
        lon_range = slice(lon_min, lon_max)

    if (lat_min is None) and (lat_max is None):
        lat_range = None
    else:
        if lat_max is None:
            raise ValueError('lat_max is required')
        if lat_min is None:
            raise ValueError('lat_min is required')
        lat_range = slice(lat_min, lat_max)

    if lat_range is not None:
        ds = ds.sel(lat=lat_range)
    if lon_range is not None:
        ds = ds.sel(lon=lon_range)
    return ds
```

## `lat_lon_rolling(ds, window_lat, window_lon)`

This creates a rolling averaged version of the dataset or data-array in the spatial dimensions. The first `np.ceil((window_lat-1)/2)` and last `np.floor((window_lat-1)/2)` values in the latitude dimension of the returned data will be `nan`. The averaging also does not account for area weighting in the latitude dimension.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `ds` | `Union[Dataset, DataArray]` | Dataset or DataArray to find rolling mean of. | required |
| `window_lat` | `int` | Size of window for rolling average in latitude dimension [number of grid points]. | required |
| `window_lon` | `int` | Size of window for rolling average in longitude dimension [number of grid points]. | required |

Returns:

| Type | Description |
|------|-------------|
| `Union[Dataset, DataArray]` | Rolling averaged Dataset or DataArray. |

Source code in isca_tools/utils/ds_slicing.py
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
def lat_lon_rolling(ds: Union[Dataset, DataArray], window_lat: int, window_lon: int) -> Union[Dataset, DataArray]:
    """
    This creates a rolling averaged version of the dataset or data-array in the spatial dimension.
    Returned data will have first `np.ceil((window_lat-1)/2)` and last `np.floor((window_lat-1)/2)`
    values as `nan` in latitude dimension.
    The averaging also does not take account of area weighting in latitude dimension.

    Args:
        ds: Dataset or DataArray to find rolling mean of.
        window_lat: Size of window for rolling average in latitude dimension [number of grid points]
        window_lon: Size of window for rolling average in longitude dimension [number of grid points].

    Returns:
        Rolling averaged dataset or DataArray.

    """
    ds_roll = ds.pad(lon=window_lon, mode='wrap')       # first pad in lon so wraps around when doing rolling mean
    ds_roll = ds_roll.rolling({'lon': window_lon, 'lat': window_lat}, center=True).mean()
    return ds_roll.isel(lon=slice(window_lon, -window_lon))     # remove the padded longitude values
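The pad-then-trim pattern used here (and in `time_rolling` below) can be sketched with NumPy: pad the periodic dimension with `mode='wrap'`, take a centred moving average, then drop the padding. The convolution below is a stand-in for the xarray rolling call:

```python
import numpy as np

window = 3
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])  # values around a periodic (longitude-like) axis

# Pad by the window size so the moving average wraps around the boundary.
x_pad = np.pad(x, window, mode='wrap')

# Centred moving average ('same' keeps the padded length), then remove the padding.
smoothed = np.convolve(x_pad, np.ones(window) / window, mode='same')[window:-window]
```

The first smoothed value averages `60, 10, 20`, i.e. the window correctly wraps across the boundary instead of producing `nan` there.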

## `time_rolling(ds, window_time, wrap=True)`

This creates a rolling averaged version of the dataset or data-array in the time dimension. Useful for when you have an annual average dataset.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `ds` | `Union[Dataset, DataArray]` | Dataset or DataArray to find rolling mean of. | required |
| `window_time` | `int` | Size of window for rolling average in time dimension [number of time units e.g. days]. | required |
| `wrap` | `bool` | If `True`, the first time is treated as coming immediately after the last time, i.e. for annual mean data. | `True` |

Returns:

| Type | Description |
|------|-------------|
| `Union[Dataset, DataArray]` | Rolling averaged Dataset or DataArray. |

Source code in isca_tools/utils/ds_slicing.py
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
def time_rolling(ds: Union[Dataset, DataArray], window_time: int, wrap: bool = True) -> Union[Dataset, DataArray]:
    """
    This creates a rolling averaged version of the dataset or data-array in the time dimension. Useful for when
    have annual average dataset.

    Args:
        ds: Dataset or DataArray to find rolling mean of.
        window_time: Size of window for rolling average in time dimension [number of time units e.g. days]
        wrap: If first time comes immediately after last time i.e. for annual mean data

    Returns:
        Rolling averaged dataset or DataArray.
    """
    if wrap:
        ds_roll = ds.pad(time=window_time, mode='wrap')  # first pad in time so wraps around when doing rolling mean
        ds_roll = ds_roll.rolling(time=window_time, center=True).mean()
        return ds_roll.isel(time=slice(window_time, -window_time))  # remove the padded time values
    else:
        return ds.rolling(time=window_time, center=True).mean()