scfetch - Access and Format Single-cell RNA-seq Datasets from Public Resources

Introduction

scfetch is designed to help users download and prepare single-cell datasets from public resources. It can be used to:

  • Download fastq files from GEO/SRA and format them in the standard naming style recognized by 10x software (e.g. CellRanger).
  • Download bam files from GEO/SRA, including original 10x-generated bam files (with custom tags) and normal bam files, and convert bam files to fastq files.
  • Download scRNA-seq matrices and annotation information (e.g. cell type) from GEO, PanglaoDB and UCSC Cell Browser, and load the downloaded matrices into Seurat.
  • Download processed objects from Zenodo, CELLxGENE and Human Cell Atlas.
  • Convert between widely used single-cell object formats (SeuratObject, AnnData, SingleCellExperiment, CellDataSet/cell_data_set and loom).
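
To make the "standard style" for 10x fastq files concrete, here is a minimal sketch of the file naming convention that Cell Ranger expects (`<sample>_S<n>_L<lane>_<read>_001.fastq.gz`). The helper `tenx_fastq_name` and the sample name `sampleA` are hypothetical and are not part of scfetch; how scfetch itself renames files is not shown here.

```r
# Hypothetical helper (not part of scfetch): build fastq file names in the
# Cell Ranger style <sample>_S<n>_L<lane>_<read>_001.fastq.gz.
tenx_fastq_name <- function(sample, read = c("R1", "R2"), lane = 1, sample_idx = 1) {
  # sprintf() is vectorized, so one name is returned per read type
  sprintf("%s_S%d_L%03d_%s_001.fastq.gz",
          sample, as.integer(sample_idx), as.integer(lane), read)
}

# Names for the two reads of one sample on lane 1
print(tenx_fastq_name("sampleA"))
```

Passing `read = "I1"` would give the name of an index read file in the same style.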

Installation

You can install the development version of scfetch from GitHub with:

# install.packages("devtools")
devtools::install_github("showteeth/scfetch")

For installation issues, please refer to INSTALL.md.

For data structure conversion, scfetch requires several Python packages, which you can install with:

# install python packages
conda install -c bioconda loompy anndata
# or
pip install anndata loompy
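
After installing, it can help to confirm that the modules are importable. The sketch below checks this from R using only base functions. The helper `python_has_module` is hypothetical (not part of scfetch), and it assumes the relevant interpreter is the `python3` found on the PATH, which may differ from the Python that scfetch uses at run time.

```r
# Hypothetical helper (not part of scfetch): TRUE if `module` can be imported
# by the given Python interpreter, FALSE otherwise or if no interpreter is found.
python_has_module <- function(module, python = Sys.which("python3")) {
  python <- unname(python)
  if (!nzchar(python)) return(FALSE)
  # Run `python -c "import <module>"` quietly; exit status 0 means success
  status <- system2(python, c("-c", shQuote(paste("import", module))),
                    stdout = FALSE, stderr = FALSE)
  identical(as.integer(status), 0L)
}

modules <- c("anndata", "loompy")
available <- vapply(modules, python_has_module, logical(1))
print(available)
```

A `FALSE` entry suggests the corresponding package is missing from that interpreter's environment.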

Usage

Vignette

Detailed usage is available on the package website.

Contact

For any questions, feature requests or bug reports, please write an email to songyb0519@gmail.com.

Code of Conduct

Please note that the scfetch project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Install

install.packages('scfetch')

Version

0.5.0

License

GPL (>= 3)

Maintainer

Yabing Song

Last Published

November 21st, 2023

Functions in scfetch (0.5.0)

  • ExtractGEOExpSuppAll: Extract Raw Count Matrix from Supplementary Files or Format Supplementary Files to 10x.
  • ImportSeurat: Convert Other Formats to SeuratObject.
  • ParsePanglaoDB: Parse PanglaoDB Data.
  • ParseZenodo: Download Data with Zenodo DOI.
  • ExtractGEOSubMeta: Extract Sample Metadata.
  • ExtractGEOInfo: Extract GEO Study Information.
  • ParseCELLxGENE: Download CELLxGENE Datasets.
  • ExtractPanglaoDBComposition: Extract Cell Type Composition of PanglaoDB Datasets.
  • ParseCBDatasets: Download UCSC Cell Browser Datasets.
  • ExtractRun: Extract Runs with GEO Accession Number or GSM Number.
  • ParseGEO: Download Matrix from GEO and Load to Seurat.
  • SCEAnnData: Data Format Conversion between SingleCellExperiment and AnnData.
  • SCELoom: Data Format Conversion between SingleCellExperiment and loom.
  • StatDBAttribute: Stat Database Attributes.
  • PanglaoDBMeta: All Sample Metadata of PanglaoDB Datasets.
  • PanglaoDBComposition: All Sample Composition of PanglaoDB Datasets.
  • ParseHCA: Download Human Cell Atlas Datasets.
  • ShowCBDatasets: Show All Available Datasets in UCSC Cell Browser.
  • ShowCELLxGENEDatasets: Show All Available Datasets in CELLxGENE.
  • ShowHCAProjects: Show All Available Projects in Human Cell Atlas.
  • SplitSRA: Split SRA to fastq Files and Format to 10x Standard Style.
  • DownloadSRA: Download SRA.
  • ExtractCBComposition: Extract Cell Type Composition of UCSC Cell Browser Datasets.
  • ExportSeurat: Export SeuratObject to Other Formats.
  • ExtractGEOExpSupp: Extract Raw Count Matrix from Supplementary Files.
  • ExtractCBDatasets: Extract UCSC Cell Browser Datasets with Attributes.
  • ExtractPanglaoDBMeta: Extract Metadata of scRNA-seq Datasets in PanglaoDB.
  • ExtractGEOMeta: Extract Sample Metadata from GEO.
  • ExtractGEOExpSupp10x: Format Supplementary Files to 10x.
  • ExtractGEOExp: Extract Raw Count Matrix or Format Supplementary Files to 10x.
  • ExtractHCAMeta: Extract Metadata of Human Cell Atlas Projects with Attributes.
  • ExtractCELLxGENEMeta: Extract Metadata of CELLxGENE Datasets with Attributes.
  • DownloadBam: Download bam.
  • Bam2Fastq: Convert bam files to fastq files.
  • ExtractZenodoMeta: Prepare Dataframe with Zenodo DOIs.
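
Since the list above gives only one-line descriptions, a quick way to see what arguments each function takes is to inspect it in an R session. The sketch below uses only base R. The helper `arg_names` is hypothetical (not part of scfetch), and it makes no assumption about the actual argument names, which should be checked with `?ParseGEO` and the other help pages.

```r
# Hypothetical helper (not part of scfetch): return the argument names of the
# requested exported functions of a package, or an empty list if the package
# is not installed.
arg_names <- function(pkg, fns) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(list())
  # Keep only names that the package really exports
  fns <- intersect(fns, getNamespaceExports(pkg))
  lapply(stats::setNames(fns, fns),
         function(fn) names(formals(getExportedValue(pkg, fn))))
}

# Inspect a few of the functions listed above (empty list if scfetch is absent)
print(arg_names("scfetch", c("ParseGEO", "ExportSeurat", "ImportSeurat")))
```

The same helper works for any installed package, for example `arg_names("stats", "median")`.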