A new software tool lets researchers query datasets generated by single-cell sequencing and determine in which cell types any given combination of genes is active. The open-access ‘scfind’ software, published in Nature Methods on March 1, 2021, allows a wide range of users to quickly analyze multiple datasets containing millions of cells on a standard computer.
Such datasets can be processed in a matter of seconds, saving both time and money. The tool, created by researchers at the Wellcome Sanger Institute, works similarly to a search engine in that users can enter free text as well as gene names.
Over the last decade, techniques for sequencing the genetic material of a single cell have advanced rapidly. Single-cell RNA sequencing (scRNAseq), which is used to determine which genes are active in individual cells, can be performed on millions of cells simultaneously and generates massive amounts of data (2.2 GB for the Human Kidney Atlas). Projects such as the Human Cell Atlas and the Malaria Cell Atlas use these techniques to discover and characterize all of the cell types present in an organism or population. To get the most value out of these data, a wide range of researchers must be able to access and query them easily.
Scfind uses a two-step strategy to compress data 100-fold, and efficient decompression allows the compressed data to be queried quickly. Developed by Wellcome Sanger Institute researchers, scfind can perform large-scale analysis of datasets containing millions of cells on a standard computer, without specialized hardware. Queries that used to take days to complete now take seconds.
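To illustrate why a compressed index can answer "in which cell types are these genes active together?" quickly, here is a minimal Python sketch. It is not scfind's implementation: the article does not describe the two compression steps, so the sketch substitutes a generic technique (gap encoding of sorted cell IDs packed as variable-length integers). All function names, gene lists and cell-type labels below are hypothetical toy values.

```python
from collections import defaultdict


def encode_gaps(sorted_ids):
    """Store sorted cell IDs as gaps between neighbours, packed as varints."""
    out = bytearray()
    prev = 0
    for cell_id in sorted_ids:
        gap = cell_id - prev
        prev = cell_id
        # Emit 7 bits per byte; the high bit marks "more bytes follow".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_gaps(blob):
    """Invert encode_gaps: rebuild the sorted list of cell IDs."""
    ids, current, gap, shift = [], 0, 0, 0
    for byte in blob:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += gap
            ids.append(current)
            gap, shift = 0, 0
    return ids


def build_index(expressing_cells):
    """Map each gene to a compressed list of the cells in which it is active."""
    return {gene: encode_gaps(sorted(cells)) for gene, cells in expressing_cells.items()}


def query(index, cell_types, genes):
    """Count, per cell type, the cells in which ALL queried genes are active."""
    postings = [set(decode_gaps(index.get(gene, b""))) for gene in genes]
    hits = set.intersection(*postings) if postings else set()
    counts = defaultdict(int)
    for cell_id in hits:
        counts[cell_types[cell_id]] += 1
    return dict(counts)


# Hypothetical toy data: eight cells, three genes.
toy_cell_types = ["T cell", "T cell", "T cell", "B cell",
                  "B cell", "T cell", "NK cell", "NK cell"]
toy_expression = {"CD3E": {0, 1, 2, 5}, "CD8A": {1, 2, 7}, "MS4A1": {3, 4}}
toy_index = build_index(toy_expression)

print(query(toy_index, toy_cell_types, ["CD3E", "CD8A"]))  # {'T cell': 2}
```

Because only the lists for the queried genes are decompressed, the cost of a query depends on how many cells express those genes rather than on the size of the whole dataset.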
The new tool can also be used for multi-omics data analyses, such as combining single-cell ATAC-seq data, which measures epigenetic activity, with scRNAseq data.
Dr. Jimmy Lee, Postdoctoral Fellow at the Wellcome Sanger Institute and the study’s lead author, stated: “The advancement of multi-omics methods has created a once-in-a-lifetime opportunity to understand the landscape and dynamics of gene regulatory networks. Scfind will assist us in identifying genomic regions that regulate gene activity, even if they are far from their targets.”
Scfind can also be used to discover new genetic markers that are linked to or define a cell type. The researchers demonstrate that scfind is more accurate and precise than manually curated databases or other available computational methods.
To make scfind more user-friendly, its developers incorporated natural language processing techniques that allow arbitrary queries.
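One common way to judge whether a gene is a good marker for a cell type is to treat "the gene is active" as a prediction of "the cell belongs to this type" and score it with precision, recall and F1. The article does not say how scfind scores markers, so the Python sketch below is an assumption for illustration only. The function name and the toy cell IDs are hypothetical.

```python
def marker_scores(expressing_cells, cells_of_type):
    """Score a candidate marker gene for one cell type.

    expressing_cells: IDs of cells in which the gene is active.
    cells_of_type:    IDs of cells annotated with the cell type.
    Returns (precision, recall, f1).
    """
    expressing = set(expressing_cells)
    target = set(cells_of_type)
    true_positives = len(expressing & target)
    # Precision: fraction of expressing cells that belong to the type.
    precision = true_positives / len(expressing) if expressing else 0.0
    # Recall: fraction of the type's cells that express the gene.
    recall = true_positives / len(target) if target else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: cells 0, 1, 2 and 5 are annotated as T cells.
t_cells = {0, 1, 2, 5}
print(marker_scores({0, 1, 2, 5}, t_cells))  # active in exactly the T cells
print(marker_scores({1, 2, 7}, t_cells))     # active in two T cells and one other cell
```

A gene with high precision but low recall marks only a subset of the cell type, while the reverse pattern suggests the gene is also active in other cell types.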
Dr. Martin Hemberg, former Group Leader at the Wellcome Sanger Institute and current professor at Harvard Medical School and Brigham and Women’s Hospital, stated: “Single-cell dataset analysis typically necessitates basic programming skills as well as knowledge of genetics and genomics. To ensure that large single-cell datasets are accessible to a wide range of users, we created a tool that works like a search engine, allowing users to enter any query and find relevant cell types.”
According to Dr. Jonah Cool, Science Program Officer at the Chan Zuckerberg Initiative: “New, faster analysis methods are critical for uncovering promising insights in single-cell data, such as that contained in the Human Cell Atlas. User-friendly tools like scfind are hastening the pace of science and researchers’ ability to build on each other’s work, and the Chan Zuckerberg Initiative is proud to support the team that created this technology.”