We have developed a software toolkit to process the environmental DNA (eDNA) sequence data generated by CALeDNA easily and with the highest possible precision. We named the software Anacapa, after an iconic island off the Southern California coast that has significant cultural and biodiversity importance. Anacapa Island's name is derived from the Chumash word Ennepah or Anyapakh, which means "mirage island" (Gudde and Bright, 2004). Using eDNA to track biodiversity may once have seemed like an illusion on the horizon, but like the island, it's real. The Anacapa toolkit processes eDNA reads and assigns taxonomy. It also includes ranacapa, an easy-to-use R package for exploring differences in biodiversity across samples. Below are links to download and try Anacapa. Once published CALeDNA sequences are available, we'll add tutorials so you too can help us explore what lives in California.
eDNA reads need to be matched to reference sequences with taxonomic assignments. Anacapa includes a tool called CRUX to create custom reference databases for any barcoding marker of your choosing. Once you've chosen a reference database, Anacapa processes reads from either a HiSeq or MiSeq platform. It retains all reads that pass quality filtering, removes problematic sequences such as chimeras, and then assigns each unique read, or "amplicon sequence variant" (ASV), to its best match in the reference database. For this last step, we use the Bayesian Least Common Ancestor algorithm (Gao et al., 2017) to generate both a taxonomic assignment and a confidence estimate for each taxonomic level, all the way down to species, based on an analysis of the top 100 matching sequences.
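To make the idea of "a confidence estimate for each taxonomic level" concrete, here is a minimal sketch of a weighted, rank-by-rank common-ancestor assignment over a set of top hits. This is a deliberately simplified stand-in, not the BLCA algorithm itself: BLCA derives its confidence from a Bayesian procedure, whereas this sketch simply uses the share of hit weight (for example, a similarity score) that supports the chosen taxon. The rank labels, the `assign_with_confidence` function, its `min_confidence` threshold, and the example lineages are all hypothetical illustrations.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical rank labels; each lineage is ordered from domain down to species.
RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")

# A hit is (lineage, weight), where weight could be, e.g., a similarity score.
Hit = Tuple[Sequence[str], float]


def assign_with_confidence(
    hits: Sequence[Hit], min_confidence: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Walk down the ranks, keeping the best-supported taxon at each level.

    Confidence at a rank is the weight of hits supporting the chosen taxon
    divided by the total weight of all hits, so it can only stay the same or
    drop as the walk moves toward species. The walk stops at the first rank
    whose confidence falls below min_confidence.
    """
    total = sum(weight for _, weight in hits)
    if total <= 0:
        return []
    assignment: List[Tuple[str, str, float]] = []
    consistent = list(hits)  # hits that agree with every rank assigned so far
    for depth, rank in enumerate(RANKS):
        # Sum the weight behind each candidate taxon at this rank.
        support: Dict[str, float] = defaultdict(float)
        for lineage, weight in consistent:
            if depth < len(lineage):
                support[lineage[depth]] += weight
        if not support:
            break
        taxon = max(support, key=support.get)
        confidence = support[taxon] / total
        if confidence < min_confidence:
            break
        assignment.append((rank, taxon, confidence))
        # Only hits matching the chosen taxon can support deeper ranks.
        consistent = [
            (lineage, weight)
            for lineage, weight in consistent
            if depth < len(lineage) and lineage[depth] == taxon
        ]
    return assignment


if __name__ == "__main__":
    # Made-up example: ten equally weighted hits for one ASV.
    fish = ("Eukaryota", "Chordata", "Actinopteri", "Perciformes")
    hits = (
        [(fish + ("Serranidae", "Paralabrax", "Paralabrax clathratus"), 1.0)] * 8
        + [(fish + ("Serranidae", "Paralabrax", "Paralabrax nebulifer"), 1.0)]
        + [(fish + ("Sebastidae", "Sebastes", "Sebastes mystinus"), 1.0)]
    )
    for rank, taxon, conf in assign_with_confidence(hits, min_confidence=0.85):
        print(rank, taxon, round(conf, 2))
```

With a threshold of 0.85, this example assigns down to genus (Paralabrax, confidence 0.9) but stops before species, because only 8 of the 10 hits agree on the species. Lowering the threshold trades confidence for taxonomic resolution, which is the same tension a per-rank confidence score is meant to expose.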
To run CRUX or the other Anacapa tools, a specific set of software must be installed on the analysis machine. To simplify installing all of the necessary software dependencies, a containerized version of the pipeline is available. The container is an Ubuntu Linux disk image with all of the dependencies pre-installed at known working versions. It is also compatible with compute clusters through Singularity, a tool that allows containerized execution without requiring root privileges. More instructions are available in the project README in the container GitHub repository.
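As a rough illustration of how containerized execution on a cluster looks, the sketch below builds the argument list for Singularity's `singularity exec` subcommand, using its `--bind` option to make a host directory visible inside the container. The image filename, the script name, and the `singularity_exec_command` helper are hypothetical placeholders, not names taken from the Anacapa container; consult the container README for the actual image and entry points.

```python
import shlex
from typing import List, Sequence


def singularity_exec_command(
    image: str,
    command: Sequence[str],
    bind_paths: Sequence[str] = (),
) -> List[str]:
    """Build an argv list that runs `command` inside a Singularity image.

    Each entry in bind_paths is passed to Singularity's --bind option
    (e.g. "/scratch/me/run1:/data") so the container can read and write
    data that lives on the host file system.
    """
    argv = ["singularity", "exec"]
    for path in bind_paths:
        argv += ["--bind", path]
    argv.append(image)
    argv.extend(command)
    return argv


if __name__ == "__main__":
    # Hypothetical image and script names, for illustration only.
    argv = singularity_exec_command(
        "anacapa-container.sif",
        ["bash", "run_pipeline.sh"],
        bind_paths=["/scratch/me/run1:/data"],
    )
    # Print a shell-ready version of the command; on a node with Singularity
    # installed it could be run with subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Because Singularity runs the container as the invoking user, a command like this can typically be submitted as an ordinary cluster job without administrator access.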
Anacapa comes in a few installation flavors. Visit our GitHub pages to download and install the software on a high-performance computing cluster. If you are a UC student, staff member, or faculty member, get in touch and we can show you where it is already maintained on campus clusters.