Usage¶
Prerequisites¶
Drug Sniffer is implemented as a Nextflow workflow, so users need to install Nextflow before they can use it. Installation is generally straightforward; see the Nextflow website for details.
Note that when Nextflow is installed using the default method, a file called nextflow is created. This can be moved to a location on the user’s PATH or invoked as ./nextflow, per standard Unix practices.
Docker must also be installed in the execution environment and configured so that the user launching the Nextflow workflow has permission to run Docker containers. In the future, we intend to support Singularity containers as well, since that container runtime is more commonly available in HPC environments.
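The prerequisites above can be checked before launching a run. The following is a minimal sketch and not part of Drug Sniffer itself; the missing_tools function name is hypothetical, and it only checks that the named commands are on the PATH.

```shell
# Hypothetical helper (not shipped with Drug Sniffer): report which of the
# named commands cannot be found on PATH.
# Prints one line per missing command and returns non-zero if any are missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, missing_tools nextflow docker reports whichever of the two is not installed. To confirm that the current user can also reach the Docker daemon, docker info can be run afterwards; it fails if the daemon is unreachable or permission is denied.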
Containers¶
The first step is to build the Docker images. This can be done by running the build-images.sh script in the tool/ directory.
./tool/build-images.sh
This will build the necessary Docker images. If the workflow is to be run on a cluster or cloud environment, it may be necessary to push the images to a registry. In this case, set the IMAGE_NAMESPACE environment variable to a valid registry and namespace when running the script above.
IMAGE_NAMESPACE=fancyregistry.io/mylab ./tool/build-images.sh
If the IMAGE_PUSH environment variable is set to anything other than 0 (the default), the images will also be pushed to the specified registry, which defaults to docker.io if otherwise unspecified.
IMAGE_NAMESPACE=mylab IMAGE_PUSH=1 ./tool/build-images.sh
Running¶
The simplest way to learn how to use Drug Sniffer is to experiment with the examples in the examples/ directory of the project repository (see the Examples section below for more information). First, clone the repository (https://github.com/TravisWheelerLab/drug-sniffer). Then, from the project root directory, run one of the examples with the command below.
nextflow run -profile local -params-file examples/3vri_params.yaml .
There are three things going on here. First, we select the environment the workflow will run in with -profile local. The available environments are described in nextflow.config in the same directory. An example is shown below:
profiles {
    local {
        process.executor = 'local'
        docker.enabled = true
    }
    aws_batch {
        process.executor = 'awsbatch'
        process.queue = 'drug-sniffer-queue'
        aws.region = 'us-east-1'
    }
}
The config above describes two environments. The first, local, runs the workflow on the local machine with Docker enabled (Drug Sniffer requires Docker or another container runtime). The second, aws_batch, runs the workflow in the AWS cloud using the AWS Batch service, which must already be configured with a queue called drug-sniffer-queue.
See the Nextflow documentation for information about other environments, including SLURM, and for a description of the configuration file format.
Next, we specify a set of parameters for the workflow run with -params-file examples/3vri_params.yaml. This tells Nextflow to load workflow parameters from the specified YAML file. An example file is shown below:
molecule_db: '${projectDir}/../examples/small-db'
tanimoto_cutoff: 0.5
receptor_pdb: '${projectDir}/../examples/3vri_aligned.pdb'
receptor_center_x: 14.641000
receptor_center_y: -11.026000
receptor_center_z: 43.231998
receptor_size_x: 10.0
receptor_size_y: 10.0
receptor_size_z: 10.0
admet_checks: '1 2 3'
output_dir: '${launchDir}/drug-sniffer-output'
The parameters in this file are explained on the Parameters page. Of interest, however, is the ${launchDir} variable, which is set to the directory from which the nextflow command is run (running a Nextflow workflow is often called “launching” it). There is also a variable called projectDir, which is set to the location of the workflow itself (the directory containing the .nf file).
Finally, we tell Nextflow to run the workflow configured for the current directory (using .). It is also possible to run the workflow without cloning the Git repository by referencing the repo on the command line:
nextflow run -profile local -params-file my-params.yaml \
-r main TravisWheelerLab/drug-sniffer
The -r option tells Nextflow which branch to use. Our primary branch is called “main”, so that is usually the one you want to execute.
We also suggest using the -with-report option to the Nextflow “run” command, as it produces a useful report after the workflow has finished. See the example report for details.
Output¶
There are two output files. The first, all_errors.txt, contains errors produced during the workflow run. The second, all_results.txt, contains the actual output. This results file is tab-separated and includes the fields listed below:
Pose - the ID of the Autodock Vina pose
Chemical name - the name of the chemical from the molecule database
Chemical database - the name of the database that contains the chemical
Chemical SMILES string - the raw SMILES string
dock2bind score - the score assigned by the dock2bind model
Three columns per ADMET check - predicted, confidence, and credibility (see the FPADMET documentation for more details)
The calculated logp value
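Because the results file is tab-separated, standard shell tools are enough to rank it. The sketch below sorts rows by the dock2bind score and keeps the best few. It rests on several assumptions the text above does not confirm: that the columns appear in the order listed (so the score is column 5), that the file has no header row, and that a higher score is better. The top_hits name is hypothetical.

```shell
# Hypothetical helper: print the N highest-scoring rows of a results file.
# Assumptions (not confirmed by the documentation): tab-separated input with
# no header row, dock2bind score in column 5, higher score = better.
top_hits() {
    results_file=$1
    count=${2:-10}                 # default to the top 10 rows
    tab=$(printf '\t')             # literal tab for sort's field separator
    # Sort numerically on column 5 only, in descending order.
    sort -t "$tab" -k5,5nr "$results_file" | head -n "$count"
}
```

Calling top_hits all_results.txt 20 would print twenty rows under those assumptions. If lower scores turn out to be better, drop the r from the sort key; if the file has a header row, strip it first (for example with tail -n +2).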
Examples¶
There are two examples, both found in the examples/ directory within the repository: 3vri and 5l2s. The first, when run, tests a pre-computed set of ligands, effectively skipping Stage 3 of the pipeline and going straight to Stage 4. This has two benefits. First, Autogrow4 takes a long time to run, so if the goal is simply to see the pipeline in action, or to verify some change, the 3vri example is the way to go. Second, some users may want to create ligands to test using some other method, and the 3vri example demonstrates how to do this. The 5l2s example runs the entire pipeline.
Molecule Database¶
Drug Sniffer requires a database of potential molecules in order to function. We provide a large, curated database for use by the public. The database is an aggregation of a number of existing databases intended for drug research, and each molecule includes a reference back to its original source for convenience.
The database is about 165 GB compressed, so it requires a large filesystem. Further, when running Drug Sniffer on a cluster, we recommend making the database accessible through NFS or similar means to avoid downloading it onto each node.
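Given the size of the download, it may be worth confirming that the target filesystem has room before starting. The following is a minimal sketch using POSIX df; the has_free_space name is hypothetical, and the threshold is left to the caller because the extracted size of the database is not stated here.

```shell
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# REQUIRED_KB kibibytes available to the current user.
has_free_space() {
    dir=$1
    required_kb=$2
    # df -P prints a fixed POSIX table; -k reports sizes in 1024-byte units.
    # Column 4 of the data row is the available space.
    available_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$available_kb" -ge "$required_kb" ]
}
```

For instance, has_free_space . $((165 * 1024 * 1024)) checks for roughly 165 GB in the current directory, which covers only the compressed files; the merge and unzip steps need additional space on top of that.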
The full database can be downloaded with the following steps in a shell environment:
# Download the collection of files that make up the complete database. The wget
# call creates a molecule-files/ directory inside the new molecule_db/ directory.
$ mkdir molecule_db
$ cd molecule_db
$ wget --accept-regex "ds_" -nH -np -r https://data.drugsniffer.org/molecule-files/
# merge all those files into a single zip file, then unzip it
$ zip -F molecule-files/ds_molecules.zip --out molecules.zip
$ unzip molecules.zip
# clean up
$ rm -rf molecules.zip molecule-files/
Once extracted, you can point Drug Sniffer at the database using the molecule_db parameter. Set it to the path to (and including) the molecule_db directory created above.