Title: | Common Computational Operations Working with RefSeq Entries (GenBank) |
---|---|
Description: | Fetches NCBI data (RefSeq <https://www.ncbi.nlm.nih.gov/refseq/> database) and provides an environment to extract information at the level of gene, mRNA or protein accessions. |
Authors: | Jose V. Die [aut, cre] |
Maintainer: | Jose V. Die <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.6 |
Built: | 2025-03-25 05:36:30 UTC |
Source: | https://github.com/jdieramon/refseqr |
refseq_AAlen()
Returns the amino acid length from a single protein accession.
Depending on the function, available accessions in refseqR include RefSeq models with the prefixes XM_ (mRNA), XR_ (non-coding RNA), and XP_ (protein), as well as subsequently curated RefSeq records with NM_, NR_, or NP_ accession prefixes.
refseq_AAlen(protein)
protein: A character string of the protein id.
A numeric value representing the amino acid (aa) length of the protein.
Jose V. Die
See also: refseq_RNA2protein, to obtain the protein ids encoded by a set of transcript ids.
# Get the amino acid lengths from a set of protein accessions
protein <- c("XP_004487758", "XP_004488550")
sapply(protein, function(x) refseq_AAlen(x), USE.NAMES = FALSE)
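For context, a GenPept (protein) flat-file record states the sequence length on its LOCUS line, followed by the unit "aa". The sketch below is a hypothetical base-R helper, `locus_aa_length()`, which is not part of refseqR and may differ from how the package computes the length. It assumes the record has already been fetched as text (for example with `rentrez::entrez_fetch(db = "protein", id = protein, rettype = "gp")`) and split into lines with `strsplit(x, "\n")[[1]]`.

```r
# Hypothetical helper (not part of refseqR): read the sequence length from the
# LOCUS line of a GenPept record supplied as a character vector of lines.
locus_aa_length <- function(record_lines) {
  locus <- grep("^LOCUS", record_lines, value = TRUE)[1]
  if (is.na(locus)) stop("no LOCUS line found")
  # The length is the integer that directly precedes the unit "aa"
  hit <- regmatches(locus, regexpr("[0-9]+(?= aa\\b)", locus, perl = TRUE))
  if (length(hit) == 0) return(NA_real_)
  as.numeric(hit)
}

# Usage with a hand-written toy LOCUS line (not a real record)
toy <- c("LOCUS       XP_TOY1                  120 aa            linear   PLN 01-JAN-2020",
         "DEFINITION  toy protein.")
locus_aa_length(toy)
```

A nucleotide record reports its length in "bp" instead, so the helper returns NA for it.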
refseq_AAlen_action()
Returns the amino acid length from a single protein accession, retrying the request when internet errors occur.
Depending on the function, available accessions in refseqR include RefSeq models with the prefixes XM_ (mRNA), XR_ (non-coding RNA), and XP_ (protein), as well as subsequently curated RefSeq records with NM_, NR_, or NP_ accession prefixes.
refseq_AAlen_action(protein, retries)
protein: A character string of the protein id.
retries: A numeric value controlling the number of retry attempts made to handle internet errors.
A numeric value representing the amino acid (aa) length of the protein.
Jose V. Die
See also: refseq_RNA2protein, to obtain the protein ids encoded by a set of transcript ids.
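The `retries` argument implies a retry loop around the NCBI request. The following is a minimal, generic sketch of that pattern in base R, assuming the request is wrapped in a zero-argument function. The helper name `with_retries()`, its `wait` argument and the linear back-off are illustrative choices, not refseqR's implementation.

```r
# Hypothetical helper (not part of refseqR): call `fetch()` up to `retries`
# times, pausing between failed attempts, and re-raise the last error if
# every attempt fails.
with_retries <- function(fetch, retries = 3, wait = 1) {
  stopifnot(retries >= 1)
  last_error <- NULL
  for (attempt in seq_len(retries)) {
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_error <- result
    # Linear back-off: wait a little longer after each failure
    if (attempt < retries) Sys.sleep(wait * attempt)
  }
  stop(last_error)
}

# Usage: a stand-in request that fails twice before succeeding
calls <- 0
flaky_request <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("temporary network error")
  "ok"
}
with_retries(flaky_request, retries = 3, wait = 0)
```

Re-raising the original condition object keeps its message intact, so the caller sees the real network error rather than a generic "all retries failed".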
refseq_AAmol_wt()
Parses the fetched record of a protein accession (RefSeq format) and extracts the molecular weight (in Daltons).
Depending on the function, available accessions in refseqR include RefSeq models with the prefixes XM_ (mRNA), XR_ (non-coding RNA), and XP_ (protein), as well as subsequently curated RefSeq records with NM_, NR_, or NP_ accession prefixes.
refseq_AAmol_wt(protein)
protein: A character string of the protein id.
The function first fetches the record as a character vector, then parses it and returns the molecular weight.
A numeric vector representing the molecular weight of the protein.
Jose V. Die
See also: refseq_RNA2protein, to obtain the protein ids encoded by a set of transcript ids.
# Get the molecular weight from a single protein accession
protein <- "XP_020244413"
refseq_AAmol_wt(protein)
# Get the molecular weights from a set of protein accessions
protein <- c("XP_004487758", "XP_004488550")
sapply(protein, function(x) refseq_AAmol_wt(x), USE.NAMES = TRUE)
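In a GenPept flat-file record, NCBI reports the computed molecular weight as the `/calculated_mol_wt` qualifier of the Protein feature. The hypothetical helper below (not refseqR's code) shows how such a value could be parsed once the record is available as a character vector of lines. The toy fragment is hand-written for illustration.

```r
# Hypothetical helper (not part of refseqR): return the value of the first
# /calculated_mol_wt qualifier in a GenPept record, or NA if there is none.
gp_mol_wt <- function(record_lines) {
  hit <- grep("/calculated_mol_wt=", record_lines, value = TRUE, fixed = TRUE)
  if (length(hit) == 0) return(NA_real_)
  as.numeric(sub(".*/calculated_mol_wt=([0-9]+).*", "\\1", hit[1]))
}

# Usage with a hand-written toy feature table (not a real record)
toy <- c("     Protein         1..120",
         "                     /product=\"toy protein\"",
         "                     /calculated_mol_wt=13250")
gp_mol_wt(toy)
```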
refseq_AAseq()
Parses one or more protein accessions (RefSeq format) and extracts the amino acid sequence(s) into an AAStringSet object.
Depending on the function, available accessions in refseqR include RefSeq models with the prefixes XM_ (mRNA), XR_ (non-coding RNA), and XP_ (protein), as well as subsequently curated RefSeq records with NM_, NR_, or NP_ accession prefixes.
refseq_AAseq(accession)
accession: A character vector containing one or more accession ids.
An object of class AAStringSet.
Jose V. Die
accession <- c("XP_004487758", "XP_004488550", "XP_004501961")
my_aa <- refseq_AAseq(accession)
# The resulting AAStringSet can easily be used to make a FASTA file (Biostrings):
# Biostrings::writeXStringSet(x = my_aa, filepath = "aa_result")
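Sequences in GenBank and GenPept flat files are stored in the ORIGIN block: lines of position numbers followed by residues in groups of ten, ending at the `//` terminator. As an illustration only, the hypothetical helper below rebuilds the sequence from such a block in base R. refseqR may obtain the sequence differently (for example in FASTA format); the resulting string could then be wrapped with `Biostrings::AAStringSet()`.

```r
# Hypothetical helper (not part of refseqR): rebuild the sequence stored in
# the ORIGIN block of a GenBank/GenPept record (character vector of lines).
origin_sequence <- function(record_lines) {
  start <- grep("^ORIGIN", record_lines)[1]
  end <- grep("^//", record_lines)
  end <- end[end > start][1]
  if (is.na(start) || is.na(end)) stop("no ORIGIN block found")
  body <- record_lines[seq.int(start + 1, length.out = end - start - 1)]
  # Drop position numbers and spaces, keep residues in upper case
  toupper(paste(gsub("[^A-Za-z]", "", body), collapse = ""))
}

# Usage with a hand-written toy record (not a real sequence)
toy <- c("LOCUS       XP_TOY1   22 aa   linear",
         "ORIGIN      ",
         "        1 mktayiakqr qisfvkshfs rq",
         "//")
origin_sequence(toy)
```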
refseq_CDScoords()
Parses a transcript accession (RefSeq format) and extracts the CDS coordinates.
The CDS coordinates refer to the mRNA molecule.
Depending on the function, available accessions in refseqR include RefSeq models with the prefixes XM_ (mRNA), XR_ (non-coding RNA), and XP_ (protein), as well as subsequently curated RefSeq records with NM_, NR_, or NP_ accession prefixes.
refseq_CDScoords(transcript)
transcript: A character vector of one or more transcript ids.
An IRanges object with the start and end positions of the CDS of each putative mRNA.
Jose V. Die
# CDS coordinates of a single transcript
transcript <- c("XM_004487701")
refseq_CDScoords(transcript)
# CDS coordinates of several transcripts
transcript <- c("XM_004487701", "XM_004488493")
refseq_CDScoords(transcript)
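In a GenBank mRNA record, the CDS feature line gives the coding region as `start..end`, counted from the first base of that mRNA, which is why the coordinates refer to the mRNA molecule. The hypothetical helper below reads a simple `start..end` location. It is a sketch: partial markers (`<`, `>`) are ignored, `join()` locations are not handled, and it returns plain integers rather than the IRanges object refseqR returns (`IRanges::IRanges(start, end)` could build one).

```r
# Hypothetical helper (not part of refseqR): read the first CDS location of a
# GenBank mRNA record (character vector of lines) as 1-based start/end.
cds_coords <- function(record_lines) {
  none <- c(start = NA_integer_, end = NA_integer_)
  cds <- grep("^ {5}CDS +", record_lines, value = TRUE)[1]
  if (is.na(cds)) return(none)
  loc <- regmatches(cds, regexpr("[0-9]+\\.\\.[0-9]+", cds))
  if (length(loc) == 0) return(none)
  bounds <- as.integer(strsplit(loc, "..", fixed = TRUE)[[1]])
  c(start = bounds[1], end = bounds[2])
}

# Usage with a hand-written toy feature line (not a real record)
toy <- c("     CDS             64..1185",
         "                     /gene=\"toy\"")
cds_coords(toy)
```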
refseq_CDSseq()
Parses one or more transcript accessions (RefSeq format) and extracts the CDS nucleotide sequences into a DNAStringSet object.
Depending on the function, available accessions in refseqR include RefSeq models with the prefixes XM_ (mRNA), XR_ (non-coding RNA), and XP_ (protein), as well as subsequently curated RefSeq records with NM_, NR_, or NP_ accession prefixes.
refseq_CDSseq(transcript)
transcript: A character vector of one or more transcript ids.
An object of class DNAStringSet.
Jose V. Die
transcript <- c("XM_004487701", "XM_004488493", "XM_004501904")
my_cds <- refseq_CDSseq(transcript)
# The resulting DNAStringSet can easily be used to make a FASTA file (Biostrings):
# Biostrings::writeXStringSet(x = my_cds, filepath = "cds_result")
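Because the CDS coordinates are positions on the mRNA, the CDS nucleotide sequence is simply the corresponding substring of the mRNA sequence. A minimal sketch of that relationship, assuming 1-based inclusive coordinates and a plain character sequence (refseqR returns a DNAStringSet instead); `cds_from_mrna()` is a hypothetical name.

```r
# Hypothetical helper (not part of refseqR): cut the CDS out of an mRNA
# sequence using 1-based, inclusive coordinates measured on that mRNA.
cds_from_mrna <- function(mrna, start, end) {
  stopifnot(start >= 1, start <= end, end <= nchar(mrna))
  substr(mrna, start, end)
}

# Usage with a toy mRNA: 2-nt 5' UTR, a 9-nt CDS (ATG ... TAG), 2-nt 3' UTR
cds <- cds_from_mrna("GGATGAAATAGCC", start = 3, end = 11)
cds
nchar(cds) %% 3 == 0  # a complete CDS spans whole codons
```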
refseq_description()
Returns the sequence description from a single transcript, protein, or GeneID accession.
Depending on the function, available accessions in refseqR include RefSeq models with the prefixes XM_ (mRNA), XR_ (non-coding RNA), and XP_ (protein), as well as subsequently curated RefSeq records with NM_, NR_, or NP_ accession prefixes.
refseq_description(id)
id: A character string of the transcript, protein, or GeneID accession.
A character vector containing the sequence description corresponding to the sequence specified as id.
Jose V. Die
See also: refseq_protein2RNA, to obtain the transcript ids that encode a set of protein ids.
refseq_RNA2protein, to obtain the protein ids encoded by a set of transcript ids.
## Not run:
# Get the sequence descriptions from a set of transcript accessions
transcript <- c("XM_004487701")
sapply(transcript, function(x) refseq_description(x), USE.NAMES = FALSE)
# Get the sequence descriptions from a set of protein accessions
protein <- c("XP_004487758")
sapply(protein, function(x) refseq_description(x), USE.NAMES = FALSE)
# Get the sequence descriptions from a set of gene accessions
locs <- c("LOC101512347", "LOC101506901")
sapply(locs, function(x) refseq_description(x), USE.NAMES = FALSE)
## End(Not run)
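For transcript and protein accessions, the human-readable description of a GenBank or GenPept record sits on its DEFINITION line, which may wrap onto continuation lines indented by 12 spaces. The hypothetical helper below joins those lines. It is an illustration only: refseqR may instead take the description from an NCBI summary, and the sketch does not cover GeneID inputs.

```r
# Hypothetical helper (not part of refseqR): return the DEFINITION text of a
# GenBank/GenPept record, joining any wrapped continuation lines.
definition_text <- function(record_lines) {
  i <- grep("^DEFINITION", record_lines)[1]
  if (is.na(i)) return(NA_character_)
  text <- sub("^DEFINITION +", "", record_lines[i])
  j <- i + 1
  # Continuation lines are indented by 12 spaces
  while (j <= length(record_lines) && grepl("^ {12}", record_lines[j])) {
    text <- paste(text, trimws(record_lines[j]))
    j <- j + 1
  }
  text
}

# Usage with a hand-written toy header (not a real record)
toy <- c("LOCUS       XM_TOY1   900 bp    mRNA    linear   PLN 01-JAN-2020",
         "DEFINITION  PREDICTED: toy protein-like (LOC0),",
         "            mRNA.",
         "ACCESSION   XM_TOY1")
definition_text(toy)
```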
refseq_fromGene()
Returns the mRNA or protein accession from a single GeneID.
refseq_fromGene(GeneID, sequence)
GeneID: A character string of the GeneID.
sequence: A character string, either "transcript" or "protein", selecting whether the mRNA or the protein accession is fetched (from the mRNA or protein database, respectively).
A character vector containing the mRNA or protein accession corresponding to the specified GeneID.
Jose V. Die
See also: refseq_protein2RNA, to obtain the transcript accessions that encode a set of protein accessions.
refseq_RNA2protein, to obtain the protein accessions encoded by a set of transcript accessions.
# Get the XM accessions from a set of gene ids
locs <- c("LOC101512347")
sapply(locs, function(x) refseq_fromGene(x, sequence = "transcript"), USE.NAMES = FALSE)
# Get the XP accessions from a set of gene ids
locs <- c("LOC101512347")
sapply(locs, function(x) refseq_fromGene(x, sequence = "protein"), USE.NAMES = FALSE)
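The `sequence` argument takes one of two values, "transcript" or "protein", as in the examples above. A common R idiom for such an argument is `match.arg()`, which rejects unsupported values. The hypothetical helper below also maps each choice onto an NCBI Entrez database name ("nuccore" for nucleotide records, "protein" for proteins); whether refseqR validates or maps the argument this way is an assumption.

```r
# Hypothetical helper (not part of refseqR): validate the `sequence` choice
# and map it onto the NCBI Entrez database that holds such records.
sequence_db <- function(sequence = c("transcript", "protein")) {
  sequence <- match.arg(sequence)
  switch(sequence, transcript = "nuccore", protein = "protein")
}

sequence_db("transcript")  # "nuccore"
sequence_db("protein")     # "protein"
```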
refseq_fromGene_action()
Returns the mRNA or protein accession from a single GeneID, retrying the request when HTTP 502 errors occur.
refseq_fromGene_action(GeneID, sequence, retries)
GeneID: A character string of the GeneID.
sequence: A character string, either "transcript" or "protein", selecting whether the mRNA or the protein accession is fetched (from the mRNA or protein database, respectively).
retries: A numeric value controlling the number of retry attempts made to handle HTTP 502 (Bad Gateway) errors.
A character vector containing the mRNA or protein accession corresponding to the specified GeneID.
Jose V. Die
refseq_GeneID()
Returns the GeneID from a single transcript or protein accession.
Depending on the function, available accessions in refseqR include RefSeq models with the prefixes XM_ (mRNA), XR_ (non-coding RNA), and XP_ (protein), as well as subsequently curated RefSeq records with NM_, NR_, or NP_ accession prefixes.
refseq_GeneID(accession, db, retries)
accession: A character string of the transcript or protein accession.
db: A character string naming the database to query, either "nuccore" or "protein".
retries: A numeric value controlling the number of retry attempts made to handle internet errors.
A character vector containing the GeneID corresponding to the accession specified as accession.
Jose V. Die
See also: refseq_protein2RNA, to obtain the transcript accessions that encode a set of protein accessions.
refseq_RNA2protein, to obtain the protein accessions encoded by a set of transcript accessions.
## Not run:
# Get the GeneID from a set of transcript accessions
transcript <- c("XM_004487701")
sapply(transcript, function(x) refseq_GeneID(x, db = "nuccore", retries = 4), USE.NAMES = FALSE)
# Get the GeneID from a set of protein accessions
protein <- c("XP_004487758")
sapply(protein, function(x) refseq_GeneID(x, db = "protein", retries = 4), USE.NAMES = FALSE)
## End(Not run)
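In GenBank and GenPept records, the link to the NCBI Gene database appears as a `/db_xref="GeneID:..."` qualifier on features such as gene and CDS. The sketch below is a hypothetical base-R helper that returns the first such GeneID from a record held as a character vector of lines; it is not refseqR's documented implementation.

```r
# Hypothetical helper (not part of refseqR): return the first GeneID found in
# a /db_xref qualifier, or NA if the record has none.
xref_gene_id <- function(record_lines) {
  hit <- grep("/db_xref=\"GeneID:", record_lines, value = TRUE, fixed = TRUE)
  if (length(hit) == 0) return(NA_character_)
  sub(".*/db_xref=\"GeneID:([0-9]+)\".*", "\\1", hit[1])
}

# Usage with a hand-written toy feature table (not a real record)
toy <- c("     gene            1..900",
         "                     /gene=\"TOY1\"",
         "                     /db_xref=\"GeneID:12345\"")
xref_gene_id(toy)
```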
refseq_geneSymbol()
Returns the gene symbol from a single transcript or protein accession.
refseq_geneSymbol(id, db)
id: A character string of the transcript or protein id.
db: A character string naming the database to query, either "nuccore" or "protein".
A character vector containing the gene symbol corresponding to the accession specified as id.
Jose V. Die
See also: refseq_protein2RNA, to obtain the transcript ids that encode a set of protein ids.
refseq_RNA2protein, to obtain the protein ids encoded by a set of transcript ids.
# Get the gene symbols from a set of transcript accessions
id <- c("XM_004487701", "XM_004488493")
sapply(id, function(x) refseq_geneSymbol(x, db = "nuccore"), USE.NAMES = FALSE)
# Get the gene symbol from a set of XP accessions
id <- c("XP_004487758")
sapply(id, function(x) refseq_geneSymbol(x, db = "protein"), USE.NAMES = FALSE)
refseq_geneSymbol_action()
Returns the gene symbol from a single transcript or protein accession, retrying the request when internet errors occur.
refseq_geneSymbol_action(id, db, retries)
id: A character string of the transcript or protein id.
db: A character string naming the database to query, either "nuccore" or "protein".
retries: A numeric value controlling the number of retry attempts made to handle internet errors.
A character vector containing the gene symbol corresponding to the accession specified as id.
Jose V. Die
See also: refseq_protein2RNA, to obtain the XM ids that encode a set of XP ids.
refseq_RNA2protein, to obtain the XP ids encoded by a set of XM ids.
refseq_mRNAfeat()
Returns selected features from one or more mRNA accessions.
Depending on the function, available accessions in refseqR include RefSeq models with the prefixes XM_ (mRNA), XR_ (non-coding RNA), and XP_ (protein), as well as subsequently curated RefSeq records with NM_, NR_, or NP_ accession prefixes.
refseq_mRNAfeat(transcript, feat)
transcript: A character vector of one or more transcript ids.
feat: A character vector of the selected features. Allowed features: 'caption', 'moltype', 'sourcedb', 'updatedate', 'slen', 'organism', 'title'.
A tibble of summarized results, which can include the following columns:
caption: mRNA accession
moltype: type of molecule
sourcedb: source database (GenBank)
updatedate: date the record was last updated
slen: molecule length (in bp)
organism: source organism
title: sequence description
Jose V. Die
See also: refseq_fromGene, to obtain the transcript or protein accession from a single GeneID accession.
refseq_RNA2protein, to obtain the protein accessions encoded by a set of transcript ids.
# Get several molecular features from a set of mRNA accessions
transcript <- c("XM_004487701", "XM_004488493", "XM_004501904")
feat <- c("caption", "moltype", "sourcedb", "slen")
refseq_mRNAfeat(transcript, feat)
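The `feat` argument is restricted to seven summary fields. The hypothetical helper below sketches that contract in base R: it rejects unsupported names and keeps only the requested columns of a table of document summaries. How refseqR builds that table (for example from NCBI esummary records retrieved with `rentrez::entrez_summary()`) is an assumption here, and refseqR returns a tibble rather than a data frame.

```r
# Feature names allowed by refseq_mRNAfeat(), as listed above
allowed_feat <- c("caption", "moltype", "sourcedb", "updatedate",
                  "slen", "organism", "title")

# Hypothetical helper (not part of refseqR): check `feat` and keep only the
# requested columns of a summary table.
select_feat <- function(summary_df, feat) {
  bad <- setdiff(feat, allowed_feat)
  if (length(bad) > 0) {
    stop("unsupported feature(s): ", paste(bad, collapse = ", "))
  }
  summary_df[, feat, drop = FALSE]
}

# Usage with a toy summary table (values are made up)
toy <- data.frame(caption = c("XM_TOY1", "XM_TOY2"),
                  moltype = c("rna", "rna"),
                  slen = c(900L, 1200L),
                  title = c("toy 1", "toy 2"))
select_feat(toy, c("caption", "slen"))
```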
refseq_protein2RNA()
Returns the transcript accession from a single protein accession.
Depending on the function, available accessions in refseqR include RefSeq models with the prefixes XM_ (mRNA), XR_ (non-coding RNA), and XP_ (protein), as well as subsequently curated RefSeq records with NM_, NR_, or NP_ accession prefixes.
refseq_protein2RNA(protein)
protein: A character string of the protein id.
A character vector containing the transcript ids that encode the protein.
Jose V. Die
See also: refseq_RNA2protein, to obtain the protein ids encoded by a set of transcript ids.
## Not run:
# Get the transcript id from a single protein accession
protein <- "XP_020244413"
refseq_protein2RNA(protein)
# Get the transcript ids from a set of protein accessions
protein <- c("XP_004487758", "XP_004488550")
sapply(protein, function(x) refseq_protein2RNA(x), USE.NAMES = FALSE)
## End(Not run)
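A GenPept record names the transcript that encodes a protein in the `/coded_by` qualifier of its CDS feature, written as an accession with version followed by a colon and a location on that transcript. The hypothetical helper below keeps only the accession part. It is a sketch, not refseqR's implementation, and it does not handle `complement()` or `join()` forms of the qualifier.

```r
# Hypothetical helper (not part of refseqR): return the transcript accession
# (with version) named in the /coded_by qualifier of a GenPept record.
coded_by_accession <- function(record_lines) {
  hit <- grep("/coded_by=\"", record_lines, value = TRUE, fixed = TRUE)
  if (length(hit) == 0) return(NA_character_)
  value <- sub(".*/coded_by=\"([^\"]+)\".*", "\\1", hit[1])
  # Drop the location that follows the colon, e.g. "ACC.1:64..426" -> "ACC.1"
  sub(":.*$", "", value)
}

# Usage with a hand-written toy CDS feature (not a real record)
toy <- c("     CDS             1..120",
         "                     /coded_by=\"XM_TOY1.1:64..426\"")
coded_by_accession(toy)
```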
refseq_RNA2protein()
Returns the protein accession from a single transcript accession.
Depending on the function, available accessions in refseqR include RefSeq models with the prefixes XM_ (mRNA), XR_ (non-coding RNA), and XP_ (protein), as well as subsequently curated RefSeq records with NM_, NR_, or NP_ accession prefixes.
refseq_RNA2protein(transcript)
transcript: A character string of the transcript accession.
A character vector containing the protein id encoded by the mRNA specified as transcript.
Jose V. Die
See also: refseq_protein2RNA, to obtain the transcript ids that encode a set of protein ids.
## Not run:
# Get the protein id from a single transcript accession
transcript <- "XM_004487701"
refseq_RNA2protein(transcript)
# Get the protein ids from a set of transcript accessions
transcript <- c("XM_004487701", "XM_004488493")
sapply(transcript, function(x) refseq_RNA2protein(x), USE.NAMES = FALSE)
## End(Not run)
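A GenBank mRNA record names the encoded protein in the `/protein_id` qualifier of its CDS feature, including a version suffix such as `.1`. The hypothetical helper below extracts that accession and, by default, strips the version so it matches the unversioned ids used in the examples above. It is a sketch only, not refseqR's implementation.

```r
# Hypothetical helper (not part of refseqR): return the protein accession in
# the /protein_id qualifier of a GenBank mRNA record, optionally unversioned.
protein_id_of <- function(record_lines, keep_version = FALSE) {
  hit <- grep("/protein_id=\"", record_lines, value = TRUE, fixed = TRUE)
  if (length(hit) == 0) return(NA_character_)
  acc <- sub(".*/protein_id=\"([^\"]+)\".*", "\\1", hit[1])
  if (keep_version) acc else sub("\\.[0-9]+$", "", acc)
}

# Usage with a hand-written toy CDS feature (not a real record)
toy <- c("     CDS             64..426",
         "                     /protein_id=\"XP_TOY1.1\"")
protein_id_of(toy)
protein_id_of(toy, keep_version = TRUE)
```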
refseqR is a framework of common computational operations working with RefSeq entries (GenBank)
Jose V. Die [email protected]