The CESCG Stem Cell Hub is a data warehouse for stem cell genomics files produced through the CIRM Genomics Initiative. It houses primary data files such as DNA sequencing reads in fastq format, as well as many other file types derived from read mapping and analysis of the primary data, and PDF and other document files describing protocols. It also includes a small but flexible system, tag storm, for associating metadata with a file.
Any CIRM Genomics Initiative associated lab can submit data to the SC Hub. Once submitted, data is treated as prepublication data, with access only allowed to authorized users (data privacy is described in more detail in the Privacy section below). Contact us if you are a Genomics Initiative lab and would like to submit data.
Note that there are some key differences between the video and the current process:
An account is not needed to access much of the data available through the SC Hub public site.
An account is needed to access data stored on our development server, which is intended for CESCG contributing labs to access their prepublication data. If you are associated with a contributing lab, please contact us for an account to access your data.
Once data is submitted to the SC Hub, access is only allowed to members of that lab. If you are part of a CIRM Genomics Initiative lab and would like access to your data, please contact us.
Once the lab notifies us, the data will be released to the public, meaning that anyone can download and access it, even without an account. While this will be true for nearly all data, some files will still require an approved account. If there are files that you are interested in but don't have access to, you can request access to them.
The primary method for finding your data is through the File Search page, which can be found through "Browse > Files" in the menu at the top of the page.
The "Files" page by default displays a list of all available files in the SC Hub.
This list of files can be filtered using the boxes at the top of each column. To see the filters available for a column, click into its filter cell and press the down arrow key; you can then select a filter from the list.
Filters also support UNIX-style wildcard syntax: an asterisk ("*") stands for any text, so a filter matches every value that contains the text you typed, with anything in place of each *.
For example, if you wanted to find all datasets with "Cardio" in the name, you might filter the "data_set_id" column by "*Cardio*".
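To get a feel for how a wildcard filter such as "*Cardio*" behaves, here is a small Python sketch using the standard library's fnmatch module. It only illustrates UNIX-style wildcard matching and is not the code the SC Hub runs; the data_set_id values are made up for illustration, and whether the site's filters are case-sensitive is an assumption here.

```python
from fnmatch import fnmatchcase


def filter_ids(ids, pattern):
    """Return the ids that match a UNIX-style wildcard pattern (case-sensitive)."""
    return [i for i in ids if fnmatchcase(i, pattern)]


# Hypothetical data_set_id values, invented for this example.
sample_ids = ["labA_CardioDiff", "Cardio_scRNA", "neuron_atac", "heart_cardio"]

# "*Cardio*" matches "Cardio" anywhere in the value.
anywhere = filter_ids(sample_ids, "*Cardio*")

# "Cardio*" matches only values that start with "Cardio".
prefix_only = filter_ids(sample_ids, "Cardio*")

print(anywhere)
print(prefix_only)
```

Note that "heart_cardio" is not matched in this sketch because the comparison is case-sensitive; the filter on the Files page may behave differently.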
Once you've filtered the list down to the files that you are interested in, you can download them in one of two ways:
If you have many files to download, or a few large files that may take hours to download, you can use a variety of methods to download the files.
First, click the "Download All" link at the top of the page. You will be taken to a page that lists the total number of files and their combined size, as well as a few different download options:
Follow the instructions on that page to download the files. There are options for downloading your files using the command-line or web browser extensions. The URLs provided to you are valid for a week.
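If you prefer to script the command-line route, the following Python sketch shows one possible approach. It assumes you have saved the provided URLs to a plain text file, one URL per line; the file name urls.txt, the directory name sc_hub_files, and the helper functions are hypothetical and not part of the SC Hub itself. The instructions on the download page remain the authoritative reference.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def read_urls(text):
    """Return the URLs in a block of text, skipping blank lines and '#' comments."""
    lines = (ln.strip() for ln in text.splitlines())
    return [ln for ln in lines if ln and not ln.startswith("#")]


def local_name(url):
    """Derive a local file name from the last path segment of a URL."""
    return posixpath.basename(urlparse(url).path) or "download"


def download_all(urls, dest_dir="."):
    """Download each URL into dest_dir, skipping files that already exist."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        target = os.path.join(dest_dir, local_name(url))
        if os.path.exists(target):
            continue  # already downloaded in an earlier run
        partial = target + ".part"
        urllib.request.urlretrieve(url, partial)
        os.replace(partial, target)  # rename only after a complete download


# Example usage (urls.txt is a hypothetical file you create yourself):
#     with open("urls.txt") as fh:
#         download_all(read_urls(fh.read()), dest_dir="sc_hub_files")
```

Writing to a ".part" file first means an interrupted transfer is not mistaken for a finished file on the next run. Because the URLs are valid for only a week, request fresh ones if a long download has to be resumed after that.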
Questions? Comments? Feel free to contact our support team.