Fred2.IO Module¶
IO.ADBAdapter¶
IO.EnsemblAdapter¶
IO.FileReader¶
-
Fred2.IO.FileReader.
read_annovar_exonic
(annovar_file, gene_filter=None, experimentalDesig=None)¶ Reads a gene-based ANNOVAR output file and generates
Variant
objects containing all annotated Transcript
ids, and outputs a list of Variant
objects.
Parameters: - annovar_file (str) – The path to the ANNOVAR file
- gene_filter (list(str)) – A list of gene names of interest (only variants associated with these genes are generated)
Returns: List of fully annotated Variant objects
Return type: list(
Variant
)
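A minimal sketch of how a gene_filter could be applied to a single line of a gene-based ANNOVAR file. It uses only the standard library and is not Fred2's implementation. The column layout (annotation in the third tab-separated column, entries of the form GENE:TRANSCRIPT:...), the helper names genes_in_annotation and passes_gene_filter, and the example line are all assumptions made for illustration.

```python
def genes_in_annotation(annotation):
    """Return the gene names found in a comma-separated annotation column.

    Assumption: each entry looks like GENE:TRANSCRIPT:exon:c.change:p.change.
    """
    entries = [entry for entry in annotation.split(",") if entry]
    return [entry.split(":")[0] for entry in entries]


def passes_gene_filter(line, gene_filter=None):
    """Keep a line if no filter is given or one of its genes is of interest."""
    columns = line.rstrip("\n").split("\t")
    genes = genes_in_annotation(columns[2])
    return gene_filter is None or any(gene in gene_filter for gene in genes)


# Hypothetical example line, made up for illustration only.
line = "line1\tnonsynonymous SNV\tGENE1:NM_000001:exon2:c.A10G:p.K4E,\t1\t100\t100\tA\tG"

print(passes_gene_filter(line, gene_filter=["GENE1"]))
print(passes_gene_filter(line, gene_filter=["GENE2"]))
```

With no filter (gene_filter=None) every line is kept, which mirrors the default in the signature above.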
-
Fred2.IO.FileReader.
read_fasta
(files, in_type=<class 'Fred2.Core.Peptide.Peptide'>, id_position=1)¶ Generator function:
Reads one or more peptide, protein, or RNA sequences from one or more FASTA files. The user needs to specify the correct type of the underlying sequences: Peptide, Protein, or Transcript (for RNA).
Parameters: - files (list(str) or str) – A file name or a list of file names to read in
- in_type (
Peptide
or Transcript
or Protein
) – The type to read in
- id_position (int) – The position of the id in the FASTA header, counted in fields separated by |
Returns: A list of objects of the specified sequence type, built from the FASTA file sequences
Return type: list(
in_type
)
Raises: ValueError – If a file is not readable
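The role of id_position can be sketched with a small standalone parser. This is not Fred2's code. It assumes id_position is a zero-based index into the fields of the header line split on |, and the function names header_id and parse_fasta as well as the example records are hypothetical.

```python
def header_id(header, id_position=1):
    """Pick one |-separated field of a FASTA header as the record id.

    Assumption: id_position is a zero-based index into the split header.
    """
    fields = header.lstrip(">").strip().split("|")
    return fields[id_position]


def parse_fasta(text, id_position=1):
    """Return a dict mapping record ids to their concatenated sequences."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = header_id(line, id_position)
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


# Made-up FASTA content for illustration.
example = ">sp|P00001|EXAMPLE_ONE\nMKV\nLAA\n>sp|P00002|EXAMPLE_TWO\nGGSW\n"
records = parse_fasta(example)
print(records)
```

Under this assumption, the default id_position=1 selects the accession-like middle field of a three-part header.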
-
Fred2.IO.FileReader.
read_lines
(files, in_type=<class 'Fred2.Core.Peptide.Peptide'>)¶ Generator function:
Reads sequences directly from the lines of the given files. The user needs to specify the correct type of the underlying data: Peptide, Protein, Transcript, or Allele.
Parameters: - files (list(str) or str) – An absolute file name or a list of absolute file names to read
- in_type (
Peptide
or Protein
or Transcript
or Allele
) – The type to read in; possible values are Peptide, Protein, Transcript, and Allele
Returns: A list of the specified objects
Return type: list(
in_type
)
Raises: IOError – If a file is not readable
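The pattern of passing a class as in_type can be sketched as follows. This is a standalone illustration and not Fred2's implementation: Seq is a hypothetical stand-in for a type such as Peptide, read_lines_sketch is a made-up name, and the peptide strings are example data.

```python
import os
import tempfile


class Seq(str):
    """Hypothetical stand-in for a sequence type such as Peptide."""


def read_lines_sketch(files, in_type=Seq):
    """Wrap every non-empty line of the given files in in_type."""
    if isinstance(files, str):
        files = [files]
    result = []
    for name in files:
        # open() raises IOError (an alias of OSError in Python 3) if unreadable
        with open(name) as handle:
            for line in handle:
                line = line.strip()
                if line:
                    result.append(in_type(line))
    return result


# Write two made-up peptide strings to a temporary file and read them back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("SYFPEITHI\n\nAAAAAAAAA\n")
peptides = read_lines_sketch(tmp.name)
os.remove(tmp.name)
print(peptides)
```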
-
Fred2.IO.FileReader.
read_vcf
(vcf_file, gene_filter=None, experimentalDesig=None)¶ Reads a VCF v4.0 or v4.1 file and generates
Variant
objects containing all annotated Transcript
ids, and outputs a list of Variant
objects. The reader considers only the following variant types; variants labeled as synonymous are not integrated into any variant:
filter_variants = ['missense_variant', 'frameshift_variant', 'stop_gained', 'missense_variant&splice_region_variant', 'synonymous_variant', 'inframe_deletion', 'inframe_insertion']
Parameters: - vcf_file (str) – The path to the VCF file
- gene_filter (list(str)) – A list of gene names of interest (only variants associated with these genes are generated)
Returns: List of fully annotated Variant objects
Return type: Tuple of (list(
Variant
), list(transcript_ids))
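One reading of the filter described above, sketched as standalone Python: a consequence label must appear in filter_variants to be considered at all, and a synonymous label is recognized but never integrated. The helper names is_considered and is_integrated are hypothetical, and the logic is an interpretation of the description, not Fred2's code.

```python
# The list is copied from the description above.
FILTER_VARIANTS = [
    "missense_variant",
    "frameshift_variant",
    "stop_gained",
    "missense_variant&splice_region_variant",
    "synonymous_variant",
    "inframe_deletion",
    "inframe_insertion",
]


def is_considered(consequence):
    """True if the reader would look at a variant with this label."""
    return consequence in FILTER_VARIANTS


def is_integrated(consequence):
    """True if the label is considered and is not the synonymous label."""
    return is_considered(consequence) and consequence != "synonymous_variant"


for label in ["missense_variant", "synonymous_variant", "intron_variant"]:
    print(label, is_considered(label), is_integrated(label))
```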
IO.MartsAdapter¶
-
class
Fred2.IO.MartsAdapter.
MartsAdapter
(usr=None, host=None, pwd=None, db=None, biomart=None)¶ Bases:
Fred2.IO.ADBAdapter.ADBAdapter
-
get_all_variant_gene
(locations, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')¶ Fetches the important db ids and names for given chromosomal location
Parameters: - chrom (int) – Integer value of the chromosome in question
- start (int) – Integer value of the variation start position on given chromosome
- stop (int) – Integer value of the variation stop position on given chromosome
Returns: The respective gene name, i.e. the first one reported
-
get_all_variant_ids
(**kwargs)¶ Fetches the important db ids and names for a given gene or chromosomal location. The former is recommended. The result is a list of dicts with one of the three combinations:
- ‘Ensembl Gene ID’, ‘Ensembl Transcript ID’, ‘Ensembl Protein ID’
- ‘RefSeq Protein ID [e.g. NP_001005353]’, ‘RefSeq mRNA [e.g. NM_001195597]’, first triplet
- ‘RefSeq Predicted Protein ID [e.g. XP_001720922]’, ‘RefSeq mRNA predicted [e.g. XM_001125684]’, first triplet
Parameters: - 'locations' – list of locations as triplets of integer values representing (chrom, start, stop)
- 'genes' – list of genes as string value of the genes of variation
Returns: The list of dicts of entries with transcript and protein ids (either NM+NP or XM+XP)
-
get_ensembl_ids_from_id
(gene_id, **kwargs)¶ Returns a list of gene-transcript-protein ids from some sort of id
Parameters: - gene_id (str) – The id to be queried
- type (
EIdentifierTypes()
) – Assumes the given ID is of a type found in the list of EIdentifierTypes()
, default is gene name
- _db (str) – Can override the MartsAdapter default db (“hsapiens_gene_ensembl”)
- _dataset (str) – Specifies the query db's dataset if the default is not wanted (“gene_ensembl_config”)
Returns: A list of dicts containing information about the corresponding (linked) entries
Return type: list(dict)
-
get_gene_by_position
(chrom, start, stop, **kwargs)¶ Fetches the gene name for given chromosomal location
Parameters: - chrom (int) – Integer value of the chromosome in question
- start (int) – Integer value of the variation start position on given chromosome
- stop (int) – Integer value of the variation stop position on given chromosome
- _db (str) – Can override the MartsAdapter default db (“hsapiens_gene_ensembl”)
- _dataset (str) – Specifies the query db's dataset if the default is not wanted (“gene_ensembl_config”)
Returns: The respective gene name, i.e. the first one reported
Return type: str
-
get_product_sequence
(product_id, **kwargs)¶ Fetches product (i.e. protein) sequence for the given id
Parameters: - product_id (str) – The id to be queried
- type (
EIdentifierTypes()
) – Assumes the given ID is of a type found in EIdentifierTypes()
, default is ensembl_peptide_id
- _db (str) – Can override the MartsAdapter default db (“hsapiens_gene_ensembl”)
- _dataset (str) – Specifies the query db's dataset if the default is not wanted (“gene_ensembl_config”)
Returns: The requested sequence
Return type: str
-
get_transcript_information
(transcript_id, **kwargs)¶ Fetches transcript sequence, gene name and strand information for the given id
Parameters: - transcript_id (str) – The id to be queried
- type (
EIdentifierTypes()
) – Assumes the given ID is of a type found in EIdentifierTypes()
, default is ensembl_transcript_id
- _db (str) – Can override the MartsAdapter default db (“hsapiens_gene_ensembl”)
- _dataset (str) – Specifies the query db's dataset if the default is not wanted (“gene_ensembl_config”)
Returns: Dictionary of the requested keys as in EAdapterFields.ENUM
Return type: dict
-
get_transcript_information_from_protein_id
(product_id, **kwargs)¶ Fetches transcript sequence for the given id
Parameters: - product_id (str) – The id to be queried
- type (
EIdentifierTypes()
) – Assumes the given ID is of a type found in EIdentifierTypes()
, default is ensembl_peptide_id
- _db (str) – Can override the MartsAdapter default db (“hsapiens_gene_ensembl”)
- _dataset (str) – Specifies the query db's dataset if the default is not wanted (“gene_ensembl_config”)
Returns: List of dictionaries with the requested sequence, the respective strand, and the associated gene name
Return type: list(dict)
-
get_transcript_position
(transcript_id, start, stop, **kwargs)¶ If no transcript position is available for a variant, it can be retrieved, provided the mart links the transcript to its CDS and exon positions
Parameters: - transcript_id (str) – The id to be queried
- start (int) – First genomic position to be mapped
- stop (int) – Last genomic position to be mapped
- type (
EIdentifierTypes()
) – Assumes the given ID is of a type found in EIdentifierTypes()
, default is ensembl_transcript_id
- _db (str) – Can override the MartsAdapter default db (“hsapiens_gene_ensembl”)
- _dataset (str) – Specifies the query db's dataset if the default is not wanted (“gene_ensembl_config”)
Returns: A tuple of the mapped positions (start, stop)
Return type: tuple(int, int)
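The idea of mapping genomic positions onto a transcript via exon positions can be sketched as follows. This is a simplified standalone illustration, not the query MartsAdapter performs. It assumes a forward-strand transcript, 1-based inclusive coordinates, and exons sorted by genomic start; the function names and the exon coordinates are made up.

```python
def genomic_to_transcript(pos, exons):
    """Map a genomic position to a 1-based position in the spliced transcript.

    exons is a list of (start, end) genomic intervals, inclusive and sorted.
    Returns None if the position falls outside every exon.
    """
    offset = 0  # number of transcript bases contributed by earlier exons
    for start, end in exons:
        if start <= pos <= end:
            return offset + (pos - start) + 1
        offset += end - start + 1
    return None


def map_range(start, stop, exons):
    """Map a genomic start/stop pair to transcript coordinates."""
    return genomic_to_transcript(start, exons), genomic_to_transcript(stop, exons)


# Two hypothetical exons of lengths 10 and 5.
exons = [(100, 109), (200, 204)]
print(map_range(105, 201, exons))  # prints (6, 12)
```

Positions that fall into an intron have no transcript coordinate in this sketch and map to None; a reverse-strand transcript would need the exon order and offsets mirrored.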
-
get_transcript_sequence
(transcript_id, **kwargs)¶ Fetches transcript sequence for the given id
Parameters: - transcript_id (str) – The id to be queried
- type (
EIdentifierTypes()
) – Assumes the given ID is of a type found in EIdentifierTypes()
, default is ensembl_transcript_id
- _db (str) – Can override the MartsAdapter default db (“hsapiens_gene_ensembl”)
- _dataset (str) – Specifies the query db's dataset if the default is not wanted (“gene_ensembl_config”)
Returns: The requested sequence
Return type: str
-
get_variant_id_from_protein_id
(transcript_id, **kwargs)¶ Returns all information needed to instantiate a variation
Parameters: - transcript_id (str) – The id to be queried
- type (
EIdentifierTypes()
) – Assumes the given ID is of a type found in EIdentifierTypes()
, default is ensembl_transcript_id
- _db (str) – Can override the MartsAdapter default db (“hsapiens_gene_ensembl”)
- _dataset (str) – Specifies the query db's dataset if the default is not wanted (“gene_ensembl_config”)
Returns: A list of dicts containing all information needed to initialize a variant
Return type: list(dict)
-
get_variant_ids
(**kwargs)¶ Fetches the important db ids and names for a given gene or chromosomal location. The former is recommended. The result is a list of dicts with one of the three combinations:
- ‘Ensembl Gene ID’, ‘Ensembl Transcript ID’, ‘Ensembl Protein ID’
- ‘RefSeq Protein ID [e.g. NP_001005353]’, ‘RefSeq mRNA [e.g. NM_001195597]’, first triplet
- ‘RefSeq Predicted Protein ID [e.g. XP_001720922]’, ‘RefSeq mRNA predicted [e.g. XM_001125684]’, first triplet
Parameters: - 'chrom' – integer value of the chromosome in question
- 'start' – integer value of the variation start position on given chromosome
- 'stop' – integer value of the variation stop position on given chromosome
- 'gene' – string value of the gene of variation
- 'transcript_id' – string value of the transcript id of the variation
Returns: The list of dicts of entries with transcript and protein ids (either NM+NP or XM+XP)
-
IO.RefSeqAdapter¶
Deprecated since version 1.0.
-
class
Fred2.IO.RefSeqAdapter.
RefSeqAdapter
(**kwargs)¶ Bases:
Fred2.IO.ADBAdapter.ADBAdapter
-
get_product_sequence
(product_refseq, **kwargs)¶ Fetches product sequence for the given id
Parameters: product_refseq (str) – Given RefSeq id
Returns: List of dictionaries of the requested sequence, the respective strand and the associated gene name
Return type: list(dict)
-
get_transcript_information
(transcript_refseq, **kwargs)¶ Fetches transcript sequence for the given id
Parameters: - transcript_id (str) – The transcript ID as string
- type – The given id is of this type, found in
EIdentifierTypes()
. An ADBAdapter implementation that overrides these types should document it.
Returns: List of dictionaries of the requested sequence, the respective strand and the associated gene name
Return type: list(dict)
-
get_transcript_sequence
(transcript_refseq, **kwargs)¶ Fetches transcript sequence for the given id
Parameters: transcript_refseq
Returns: List of dictionaries of the requested sequence, the respective strand and the associated gene name
-
load
(filename)¶
-
IO.UniProtAdapter¶
Deprecated since version 1.0.
-
class
Fred2.IO.UniProtAdapter.
UniProtDB
(**kwargs)¶ -
exists
(seq)¶ Fast check whether the given sequence exists (as a subsequence) in the UniProtDB object's collection of sequences.
Parameters: seq – The subsequence to be searched for
Returns: True if it is found somewhere, False otherwise
-
read_seqs
(sequence_file)¶ Reads sequences from UniProt files (.dat or .fasta) or from lists or dicts of BioPython SeqRecords and makes them available for fast search. This function can also be used to append sequences.
Parameters: sequence_file – UniProt files (.dat or .fasta)
-
search
(seq)¶ Searches for the first occurrence of the given sequence(s) in the UniProtDB object's collection, returning for each the front part of the FASTA header of the first occurrence.
Parameters: seq – A string interpreted as a single sequence, or a list (of str) interpreted as a collection of sequences
Returns: A dictionary of sequences to lists (of ids, ‘null’ if n/a)
-
search_all
(seq)¶ Searches for all occurrences of the given sequence(s) in the UniProtDB object's collection, returning for each the front part of the FASTA header of all occurrences.
Parameters: seq – A string interpreted as a single sequence, or a list (of str) interpreted as a collection of sequences
Returns: A dictionary of the given sequences to lists (of ids, ‘null’ if n/a)
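The behavior described for exists, search, and search_all can be sketched with a plain dictionary of ids to sequences. This is not how UniProtDB is implemented. The naive substring scan, the free-function form with an explicit db argument, and the choice to represent a miss as a list holding 'null' are assumptions made for illustration.

```python
def exists(db, seq):
    """True if seq occurs as a subsequence (substring) of any stored sequence."""
    return any(seq in full for full in db.values())


def search_all(db, seqs):
    """Map each query sequence to the ids of all sequences containing it."""
    if isinstance(seqs, str):
        seqs = [seqs]
    result = {}
    for query in seqs:
        hits = [seq_id for seq_id, full in db.items() if query in full]
        # Assumption: a miss is reported as a list holding the string 'null'.
        result[query] = hits if hits else ["null"]
    return result


def search(db, seqs):
    """Like search_all, but keep only the first hit per query sequence."""
    return {query: hits[:1] for query, hits in search_all(db, seqs).items()}


# Made-up ids and sequences for illustration.
db = {"ID1": "MKVLAAGIV", "ID2": "GGLAAW"}
print(exists(db, "GIV"))
print(search(db, "LAA"))
print(search_all(db, ["LAA", "XYZ"]))
```

A linear scan like this is only practical for small collections; the "fast search" mentioned above suggests the real class uses an index, which this sketch does not attempt to reproduce.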
-
write_seqs
(name)¶ Writes all FASTA entries in the current object into one FASTA file.
Parameters: name – The complete path, including the file name, where the FASTA file is to be written
-