Setting up the Config File

A config (.ini) file needs to be created for each experiment to run the pipeline. All parameters not specified in this file will inherit the default values. The parameters with Default = MUST BE SPECIFIED are the bare minimum parameters which need to be set in the experiment config file. If any section, or parameter within a section, is added to the config file which is not included in the default file, an error will be raised when it is loaded in. Some example config files for typical experiments are listed below.

Example Config Files

3D2D.npy Raw DataNo AnchorQuadCamSeparate Round

[file_names]
input_dir = /Users/.../experiment1/raw
output_dir = /Users/.../experiment1/output
tile_dir = /Users/.../experiment1/tiles
round = Exp1_r0, Exp1_r1, Exp1_r2, Exp1_r3, Exp1_r4, Exp1_r5, Exp1_r6
anchor = Exp1_anchor
code_book = /Users/.../experiment1/codebook.txt

[basic_info]
is_3d = True
anchor_channel = 4
dapi_channel = 0

[file_names]
input_dir = /Users/.../experiment1/raw
output_dir = /Users/.../experiment1/output
tile_dir = /Users/.../experiment1/tiles
round = Exp1_r0, Exp1_r1, Exp1_r2, Exp1_r3, Exp1_r4, Exp1_r5, Exp1_r6
anchor = Exp1_anchor
code_book = /Users/.../experiment1/codebook.txt

[basic_info]
is_3d = False
anchor_channel = 4
dapi_channel = 0

[file_names]
input_dir = /Users/.../experiment1/raw
output_dir = /Users/.../experiment1/output
tile_dir = /Users/.../experiment1/tiles
round = Exp1_r0, Exp1_r1, Exp1_r2, Exp1_r3, Exp1_r4, Exp1_r5, Exp1_r6
anchor = Exp1_anchor
code_book = /Users/.../experiment1/codebook.txt
raw_extension = .npy
raw_metadata = metadata

[basic_info]
is_3d = True
anchor_channel = 4
dapi_channel = 0

[file_names]
input_dir = /Users/.../experiment1/raw
output_dir = /Users/.../experiment1/output
tile_dir = /Users/.../experiment1/tiles
round = Exp1_r0, Exp1_r1, Exp1_r2, Exp1_r3, Exp1_r4, Exp1_r5, Exp1_r6
code_book = /Users/.../experiment1/codebook.txt

[basic_info]
is_3d = True
ref_round = 2
ref_channel = 4

[file_names]
input_dir = /Users/.../experiment1/raw
output_dir = /Users/.../experiment1/output
tile_dir = /Users/.../experiment1/tiles
round = Exp1_r0, Exp1_r1, Exp1_r2, Exp1_r3, Exp1_r4, Exp1_r5, Exp1_r6, Exp1_r7, Exp1_r8
anchor = Exp1_anchor
code_book = /Users/.../experiment1/codebook.txt

[basic_info]
is_3d = True
anchor_channel = 4
dapi_channel = 0
dye_names = DY405, CF405L, AF488, DY520XL, AF532, AF594, ATTO425, AF647, AF750
channel_camera = 405, 555, 470, 470, 555, 640, 555, 640, 640
channel_laser = 405, 405, 445, 470, 520, 520, 555, 640, 730

[file_names]
notebook_name = sep_round_notebook
input_dir = /Users/.../experiment1/raw
output_dir = /Users/.../experiment1/output
tile_dir = /Users/.../experiment1/tiles
anchor = Exp1_sep_round
code_book = /Users/.../experiment1/codebook.txt

[basic_info]
is_3d = True
anchor_channel = 4

Note on Separate Round Config File

The Separate Round config file above is used for registering an additional round to an experiment already run to completion (using a config file like the 3D one indicated above). The pipeline for the Separate Round case cannot be run further that the stitching section.

The run_sep_reg function (script is here) runs the pipeline for the Separate Round case and then registers the anchor_round/anchor_channel to the anchor_round/anchor_channel of the full experiment.

These parameters are explained below and here.

file_names

If the names of the files change during the pipeline or if they are being accessed from another computer, the file_names section of the configuration file can also be changed as explained here.

input_dir

The input directory is the path to the folder which contains the raw data. Examples for the two possible cases of raw_extension are given below (i.e. these are respectively what the input directory looks like for the config files 3D and .npy Raw Data listed above).

.nd2.npy

Differences with raw_extension = .npy

It is assumed that when raw_extension = .npy, there were initial .nd2 files which contained excess information (e.g. extra channels). These were then converted to .npy files to get rid of this.

For the .npy case, input_dir must also contain a metadata .json file. This contains the metadata extracted from the initial .nd2 files using the function save_metadata. An example metadata file is given here for an experiment with 3 tiles and 7 channels.

Also, each name listed in the round parameter indicates a folder not a file. It is assumed these folders were produced using dask.array.to_npy_stack so the contents of each folder should contain a file named info and a .npy file for each tile, with the name being the index of the tile in the initial .nd2 file. An example showing the folder for the first round of a three tile experiment is given below:

output_dir

The output directory is the path to the folder that you would like the notebook.npz file containing the experiment results to be saved. The image below shows what the output directory typically looks like at the end of the experiment.

The names of the files produced can be changed by changing the parameters big_anchor_image, big_dapi_image, notebook_name, omp_spot_coef, omp_spot_info and omp_spot_shape in the config file.

tile_dir

The tile directory is the path to the folder that you would like the filtered images for each tile, round and colour channel to be saved to.

If is_3d == True, a .npy file will be produced for each round, tile and channel with the name for round r, tile T, channel C being config['file_names']['round'][r]_tTcC.npy with axis in the order z-y-x (the name for the anchor round, tile T, channel C will be config['file_names']['anchor']_tTcC.npy). If is_3d == False, a .npy file will be produced for each round and tile called config['file_names']['round'][r]_tT.npy with axis in the order c-y-x.

An example of what the tile directory looks like at the end of the experiment is shown below for a 3D and 2D experiment with 3 tiles and 7 channels:

3D2D

code_book

This is the path to the file containing the code for each gene. The file should be a text file containing two columns, the first being the gene name. The second is the code specifying which dye that gene should appear in each round. Thus it is of length n_rounds, containing numbers in the range from 0 to n_dyes-1 inclusive.

Example

An example is given here so that if

config['basic_info'][dye_names] = DY405, CF405L, AF488, DY520XL, AF532, AF594, ATTO425

the gene Sst with the code 6200265 will be expected to appear with the following dyes in each round:

Round 0: ATTO425
Round 1: AF488
Round 2: DY405
Round 3: DY405
Round 4: AF488
Round 5: ATTO425
Round 6: AF594

basic_info

anchor_channel

The anchor_channel is the channel in the anchor_round which contains spots corresponding to all genes. These spots are used for registration to the imaging rounds and to determine the expected bled_code for each gene.

ref_round

If there is no anchor_round, ref_round must be specified instead and it should be a round which contains a lot of spots in each channel. Spots in ref_round / ref_channel will then be used as reference spots for registration and to determine the expected bled_code for each gene.

If the anchor_round is used and ref_round is specified, ref_round will be set to anchor_round (last round) in the notebook.

Problem with not using anchor

With no anchor, the registration is likely to be worse because an imaging round is used as a reference. Thus, not all genes will appear in ref_round / ref_channel, but only those which appear with a dye in the ref_round which have high intensity in the ref_channel. Also, we would expect the final spots saved in nb.ref_spots to only correspond to genes appearing in ref_round / ref_channel and thus lots of genes will be missing.

With an anchor though, we expect all genes to show up in anchor_round / anchor_channel.

ref_channel

If there is no anchor_round, ref_channel must be specified instead and it should be the channel in ref_round which contains the most spots. If the anchor_round is used and both anchor_channel and ref_channel are specified, ref_channel will be set to anchor_channel in the notebook.

dapi_channel

This is the channel in the anchor_round that contains the DAPI images. The tiles of this channel will be stitched together and saved in the config['file_names']['output_dir'] with a name config['file_names']['big_dapi_image'].

dapi_channel does not have to be included in config['basic_info']['use_channels'] as the anchor round is dealt with separately.

To tophat filter the raw DAPI images first, either config['extract']['r_dapi'] or config['extract']['r_dapi_auto_microns'] must be specified.

Specifying Dyes

It is expected that each gene will appear with a single dye in a given round as indicated by config['file_names']['code_book']. If dye_names is not specified, it is assumed as a starting point for the bleed_matrix calculation that the number of dyes is equal to the number of channels and dye 0 will only appear in channel 0, dye 1 will only appear in channel 1 etc.

If dye_names is specified, both channel_camera and channel_laser must also be specified. This is so that a starting point for the bleed_matrix calculation can be obtained by reading off the expected intensity of each dye in each channel using the file config['file_names']['dye_camera_laser'].

The default config['file_names']['dye_camera_laser'] is given here but if a dye, camera or laser not indicated in this file are used in an experiment, a new version must be made.

Common Additional Parameters

There are a few other parameters that may often need to be different to those given in the default config file.

`extract[r_smooth]`

The parameter r_smooth in the extract section specifies whether to smooth with an averaging kernel after the raw images have been convolved with a difference of hanning kernel. This will make the extract section of the pipeline slower but will reduce the influence of anomalously high or low intensity pixels. It may be particularly appropriate to 3D data because the difference of hanning convolution is done independently on each z-plane but the smoothing can incorporate information between z-planes.

Time for smoothing

The size of r_smooth has big influence on time taken for smoothing. For a 2048 x 2048 x 50 image:

r_smooth = 1, 1, 2: 2.8 seconds
r_smooth = 2, 2, 2: 8.5 seconds

The convolution with the difference of hanning kernel takes 4.1 seconds on the same image so smoothing will make the extract section of the pipeline significantly longer.

By default, this is not specified meaning no smoothing is done. If smoothing is needed, typical values are:

2D: r_smooth = 2, 2
3D: r_smooth = 1, 1, 2

The kernel which the image is correlated with is then

np.ones(2 * r_smooth - 1) / np.sum(np.ones(2 * r_smooth - 1))

so for r_smooth = 2, 2 it will be:

array([[0.11111111, 0.11111111, 0.11111111],
       [0.11111111, 0.11111111, 0.11111111],
       [0.11111111, 0.11111111, 0.11111111]])

The effect of smoothing can be seen using view_filter.

`extract[r_dapi]`

By default, no filtering will be applied to the dapi_channel image of the anchor_round and thus no .npy file will be saved to the tile_dir. This can be changed by specifying r_dapi which should be approximately the radius of a feature in the DAPI image (typical r_dapi is 48). In this case, a 2D tophat filtering will be performed using a kernel of radius r_dapi.

Alternatively, r_dapi_auto_microns can be specified to be the radius of the kernel in units of microns and r_dapi will be computed automatically by converting this into units of yx-pixels (typical r_dapi_auto_microns is 8).

Time for DAPI filtering

The size of r_dapi has big influence on time taken for tophat filtering. For a 2048 x 2048 x 50 image:

r_dapi = 48: 142.4 seconds
r_smooth = 12: 3.9 seconds

The tophat filtering is only done on one channel for each tile but it is quite slow so it may be best to avoid it, especially for experiments with lots of tiles.

The effect of DAPI filtering can be seen using view_filter.

`stitch[expected_overlap]`

This is the expected fractional overlap between neighbouring tiles. By default, it is 0.1 meaning a 10% overlap is expected.

thresholds

The parameters in the thresholds section of the config file contains the thresholds used to determine which spots pass a quality thresholding process such that we consider their gene assignments legitimate.

The default values are based on an experiment run with ground truth data, but they will likely need adjusting after investigating the effect of the thresholds using the Viewer.

Using a subset of the raw data

To run the pipeline with a subset of tiles, imaging rounds, channels or z-planes the following parameters can be set in the basic_info section of the configuration file:

use_tiles
ignore_tiles
use_rounds
use_channels
use_z

If midway through the pipeline, it is decided that a particular tile, round or channel is not worth using, it can be removed without re-running all the steps of the pipeline completed so far.