Fred2.IO Module

IO.ADBAdapter

IO.EnsemblAdapter

IO.FileReader

Fred2.IO.FileReader.read_annovar_exonic(annovar_file, gene_filter=None, experimentalDesig=None)

Reads a gene-based ANNOVAR output file, generates Variant objects containing all annotated Transcript ids, and outputs a list of Variant objects.

Parameters:	annovar_file (str) – The path to the ANNOVAR file
	gene_filter (list(str)) – A list of gene names of interest (only variants associated with these genes are generated)
Returns:	List of fully annotated `Variant` objects
Return type:	list(`Variant`)
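The effect of gene_filter can be pictured with a minimal, self-contained sketch. It does not call Fred2: the record layout (dicts with a "gene" key), the placeholder gene names, and the helper name filter_by_gene are assumptions made purely for illustration.

```python
def filter_by_gene(records, gene_filter=None):
    """Keep only records whose gene is listed in gene_filter.

    Hypothetical helper: `records` is a list of dicts with a "gene" key,
    which is NOT the structure Fred2 uses internally.
    """
    if not gene_filter:
        # No filter given: every record is kept.
        return list(records)
    wanted = set(gene_filter)
    return [rec for rec in records if rec["gene"] in wanted]


# Placeholder records standing in for parsed annotation rows.
records = [
    {"gene": "GENE_A", "variant": "v1"},
    {"gene": "GENE_B", "variant": "v2"},
    {"gene": "GENE_C", "variant": "v3"},
]

kept = filter_by_gene(records, gene_filter=["GENE_A", "GENE_C"])
print([rec["gene"] for rec in kept])
```

Only the records for the listed genes survive; passing no filter keeps everything.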

Fred2.IO.FileReader.read_fasta(files, in_type=<class 'Fred2.Core.Peptide.Peptide'>, id_position=1)

Generator function:

Reads one or more peptide, protein, or RNA sequences from a FASTA file. The user needs to specify the correct type of the underlying sequences: Peptide, Protein, or Transcript (for RNA).

Parameters:	files (list(str) or str) – A file name or a list of file names to read
	in_type (`Peptide` or `Transcript` or `Protein`) – The type to read in
	id_position (int) – The position of the ID in the FASTA header, counted by | separators
Returns:	A list of the specified sequence type derived from the FASTA file sequences
Return type:	list(`in_type`)
Raises:	ValueError – if a file is not readable
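How id_position selects an ID from a |-separated FASTA header can be sketched as follows. This is a standalone illustration, not Fred2 code: the helper name extract_fasta_id, the zero-based counting, and the fallback to the full header are all assumptions, and the header itself is made up.

```python
def extract_fasta_id(header, id_position=1):
    """Pick one field from a '|'-separated FASTA header.

    Hypothetical helper. Assumes zero-based counting of the fields and
    falls back to the whole header if the requested field is missing.
    """
    text = header.lstrip(">").strip()
    fields = text.split("|")
    try:
        return fields[id_position]
    except IndexError:
        # Header has fewer fields than expected: use it unchanged.
        return text


# Made-up header with three '|'-separated fields.
header = ">db|ID0001|EXAMPLE_ENTRY"
print(extract_fasta_id(header))                 # second field
print(extract_fasta_id(header, id_position=0))  # first field
print(extract_fasta_id(">plain_header"))        # no separator present
```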

Fred2.IO.FileReader.read_lines(files, in_type=<class 'Fred2.Core.Peptide.Peptide'>)

Generator function:

Reads sequences directly from the lines of a file. The user needs to manually specify the correct type of the underlying data: Peptide, Protein, Transcript, or Allele.

Parameters:	files (list(str) or str) – An absolute file name or a list of absolute file names to read
	in_type (`Peptide` or `Protein` or `Transcript` or `Allele`) – The type to read in
Returns:	A list of the specified objects
Return type:	list(`in_type`)
Raises:	IOError – if a file is not readable
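The line-based reading described above can be approximated with a short standalone sketch. It does not import Fred2: the Sequence class is a hypothetical stand-in for types such as Peptide, the function name read_lines_sketch is invented, and skipping blank lines is an assumption rather than documented behaviour.

```python
import os
import tempfile


class Sequence(object):
    """Hypothetical stand-in for a sequence type such as Peptide."""

    def __init__(self, seq):
        self.seq = seq


def read_lines_sketch(files, in_type=Sequence):
    """Build one in_type object per non-empty line of each file."""
    if isinstance(files, str):
        files = [files]  # accept a single file name as well as a list
    for name in files:
        if not os.path.isfile(name):
            raise IOError("File is not readable: %s" % name)
    result = []
    for name in files:
        with open(name) as handle:
            for line in handle:
                line = line.strip()
                if line:  # assumption: blank lines are ignored
                    result.append(in_type(line))
    return result


# Write two sequences (and one blank line) to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("SYFPEITHI\n\nIHTIEPFYS\n")
    path = tmp.name

peptides = read_lines_sketch(path)
print([p.seq for p in peptides])
os.remove(path)
```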

Fred2.IO.FileReader.read_vcf(vcf_file, gene_filter=None, experimentalDesig=None)

Reads a VCF v4.0 or v4.1 file, generates Variant objects containing all annotated Transcript ids, and outputs a list of Variant objects. The reader considers only the variant types listed below; variants labeled synonymous will not be integrated into any variant:

filter_variants = ['missense_variant', 'frameshift_variant', 'stop_gained', 'missense_variant&splice_region_variant', 'synonymous_variant', 'inframe_deletion', 'inframe_insertion']

Parameters:	vcf_file (str) – The path to the VCF file
	gene_filter (list(str)) – A list of gene names of interest (only variants associated with these genes are generated)
Returns:	List of fully annotated `Variant` objects
Return type:	Tuple of (list(`Variant`), list(transcript_ids))
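One way to read the rule above is as a two-step check: a consequence term must appear in filter_variants, and synonymous terms are then skipped. The sketch below encodes that reading. It is an interpretation of the description, not the reader's actual code, and the helper name is_integrated is hypothetical.

```python
# Consequence terms copied from the description above.
FILTER_VARIANTS = [
    "missense_variant",
    "frameshift_variant",
    "stop_gained",
    "missense_variant&splice_region_variant",
    "synonymous_variant",
    "inframe_deletion",
    "inframe_insertion",
]


def is_integrated(consequence):
    """Return True if a consequence term would be kept (assumed logic)."""
    if consequence not in FILTER_VARIANTS:
        # Term is not considered by the reader at all.
        return False
    # Considered, but synonymous changes are not integrated.
    return "synonymous" not in consequence


for term in ["missense_variant", "synonymous_variant", "intron_variant"]:
    print(term, is_integrated(term))
```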

IO.MartsAdapter

class Fred2.IO.MartsAdapter.MartsAdapter(usr=None, host=None, pwd=None, db=None, biomart=None)

Bases: Fred2.IO.ADBAdapter.ADBAdapter

get_all_variant_gene(locations, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')

Fetches the important database IDs and names for the given chromosomal location

Parameters:	chrom (int) – Integer value of the chromosome in question
	start (int) – Integer value of the variation start position on the given chromosome
	stop (int) – Integer value of the variation stop position on the given chromosome
Returns:	The respective gene name, i.e. the first one reported

get_all_variant_ids(**kwargs)

Fetches the important database IDs and names for a given gene or chromosomal location. The former is recommended. The result is a list of dicts with one of the three combinations:

‘Ensembl Gene ID’, ‘Ensembl Transcript ID’, ‘Ensembl Protein ID’

‘RefSeq Protein ID [e.g. NP_001005353]’, ‘RefSeq mRNA [e.g. NM_001195597]’, first triplet

‘RefSeq Predicted Protein ID [e.g. XP_001720922]’, ‘RefSeq mRNA predicted [e.g. XM_001125684]’, first triplet

Parameters:	'locations' – List of locations as triplets of integer values representing (chrom, start, stop)
	'genes' – List of gene names (strings) of the genes of variation
Returns:	The list of dicts of entries with transcript and protein ids (either NM+NP or XM+XP)

get_ensembl_ids_from_id(gene_id, **kwargs)

Returns a list of gene, transcript, and protein IDs for the given ID

Parameters:	gene_id (str) – The id to be queried
	type (`EIdentifierTypes()`) – The type of the given ID, as found in `EIdentifierTypes()`; default is gene name
	_db (str) – Can override MartsAdapter default db (“hsapiens_gene_ensembl”)
	_dataset (str) – Specifies the query db's dataset if the default is not wanted (“gene_ensembl_config”)
Returns:	A list of dicts containing information about the corresponding (linked) entries
Return type:	list(dict)
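Several MartsAdapter methods accept the optional _db and _dataset keyword arguments with the defaults quoted above. The following standalone sketch shows one common way such keyword overrides are resolved. The helper resolve_mart_options is hypothetical and is not part of MartsAdapter, and the override value is only an example.

```python
DEFAULT_DB = "hsapiens_gene_ensembl"
DEFAULT_DATASET = "gene_ensembl_config"


def resolve_mart_options(**kwargs):
    """Return the db and dataset to query, falling back to the defaults.

    Hypothetical helper illustrating the **kwargs override pattern.
    """
    return {
        "_db": kwargs.get("_db", DEFAULT_DB),
        "_dataset": kwargs.get("_dataset", DEFAULT_DATASET),
    }


# No overrides: both defaults are used.
print(resolve_mart_options())

# Override only the db (example value); the dataset keeps its default.
print(resolve_mart_options(_db="mmusculus_gene_ensembl"))
```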

get_gene_by_position(chrom, start, stop, **kwargs)

Fetches the gene name for the given chromosomal location

Parameters:	chrom (int) – Integer value of the chromosome in question
	start (int) – Integer value of the variation start position on the given chromosome
	stop (int) – Integer value of the variation stop position on the given chromosome
	_db (str) – Can override MartsAdapter default db (“hsapiens_gene_ensembl”)
	_dataset (str) – Specifies the query db's dataset if the default is not wanted (“gene_ensembl_config”)
Returns:	The respective gene name, i.e. the first one reported
Return type:	str

get_product_sequence(product_id, **kwargs)

Fetches the product (i.e. protein) sequence for the given id

Parameters:	product_id (str) – The id to be queried
	type (`EIdentifierTypes()`) – The type of the given ID, as found in `EIdentifierTypes()`; default is ensembl_peptide_id
	_db (str) – Can override MartsAdapter default db (“hsapiens_gene_ensembl”)
	_dataset (str) – Specifies the query db's dataset if the default is not wanted (“gene_ensembl_config”)
Returns:	The requested sequence
Return type:	str

get_transcript_information(transcript_id, **kwargs)

Fetches transcript sequence, gene name, and strand information for the given id

Parameters:	transcript_id (str) – The id to be queried
	type (`EIdentifierTypes()`) – The type of the given ID, as found in `EIdentifierTypes()`; default is ensembl_transcript_id
	_db (str) – Can override MartsAdapter default db (“hsapiens_gene_ensembl”)
	_dataset (str) – Specifies the query db's dataset if the default is not wanted (“gene_ensembl_config”)
Returns:	Dictionary of the requested keys as in EAdapterFields.ENUM
Return type:	dict

get_transcript_information_from_protein_id(product_id, **kwargs)

Fetches transcript sequence for the given id

Parameters:	product_id (str) – The id to be queried
	type (`EIdentifierTypes()`) – The type of the given ID, as found in `EIdentifierTypes()`; default is ensembl_peptide_id
	_db (str) – Can override MartsAdapter default db (“hsapiens_gene_ensembl”)
	_dataset (str) – Specifies the query db's dataset if the default is not wanted (“gene_ensembl_config”)
Returns:	List of dictionaries containing the requested sequence, the respective strand, and the associated gene name
Return type:	list(dict)

get_transcript_position(transcript_id, start, stop, **kwargs)

If no transcript position is available for a variant, it can be retrieved, provided the mart links the transcript to its CDS and exon positions

Parameters:	transcript_id (str) – The id to be queried
	start (int) – First genomic position to be mapped
	stop (int) – Last genomic position to be mapped
	type (`EIdentifierTypes()`) – The type of the given ID, as found in `EIdentifierTypes()`; default is ensembl_transcript_id
	_db (str) – Can override MartsAdapter default db (“hsapiens_gene_ensembl”)
	_dataset (str) – Specifies the query db's dataset if the default is not wanted (“gene_ensembl_config”)
Returns:	A tuple of the mapped positions (start, stop)
Return type:	tuple(int, int)
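The idea of mapping genomic positions onto a transcript from exon coordinates can be illustrated with a small standalone sketch. It is not the MartsAdapter implementation: it assumes a forward-strand transcript, 1-based inclusive exon coordinates, and made-up exon intervals, and the function names are hypothetical.

```python
def genomic_to_transcript(exons, pos):
    """Map a 1-based genomic position to a 1-based transcript position.

    `exons` is a list of (start, end) pairs, inclusive, on the forward
    strand. Returns None if `pos` does not fall inside any exon.
    """
    offset = 0  # number of transcript bases in the exons already passed
    for exon_start, exon_end in sorted(exons):
        if exon_start <= pos <= exon_end:
            return offset + (pos - exon_start) + 1
        offset += exon_end - exon_start + 1
    return None


def map_range(exons, start, stop):
    """Map a genomic (start, stop) pair to transcript coordinates."""
    return genomic_to_transcript(exons, start), genomic_to_transcript(exons, stop)


# Two made-up exons of 100 bases each, separated by an intron.
exons = [(100, 199), (300, 399)]
print(map_range(exons, 150, 300))
print(genomic_to_transcript(exons, 250))  # intronic position
```

A reverse-strand transcript would need the exons walked in the opposite order, which this sketch deliberately leaves out.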

get_transcript_sequence(transcript_id, **kwargs)

Fetches transcript sequence for the given id

Parameters:	transcript_id (str) – The id to be queried
	type (`EIdentifierTypes()`) – The type of the given ID, as found in `EIdentifierTypes()`; default is ensembl_transcript_id
	_db (str) – Can override MartsAdapter default db (“hsapiens_gene_ensembl”)
	_dataset (str) – Specifies the query db's dataset if the default is not wanted (“gene_ensembl_config”)
Returns:	The requested sequence
Return type:	str

get_variant_id_from_protein_id(transcript_id, **kwargs)

Returns all information needed to instantiate a variation

Parameters:	transcript_id (str) – The id to be queried
	type (`EIdentifierTypes()`) – The type of the given ID, as found in `EIdentifierTypes()`; default is ensembl_transcript_id
	_db (str) – Can override MartsAdapter default db (“hsapiens_gene_ensembl”)
	_dataset (str) – Specifies the query db's dataset if the default is not wanted (“gene_ensembl_config”)
Returns:	A list of dicts containing all information needed for a variant initialization
Return type:	list(dict)

get_variant_ids(**kwargs)

Fetches the important database IDs and names for a given gene or chromosomal location. The former is recommended. The result is a list of dicts with one of the three combinations:

‘Ensembl Gene ID’, ‘Ensembl Transcript ID’, ‘Ensembl Protein ID’

‘RefSeq Protein ID [e.g. NP_001005353]’, ‘RefSeq mRNA [e.g. NM_001195597]’, first triplet

‘RefSeq Predicted Protein ID [e.g. XP_001720922]’, ‘RefSeq mRNA predicted [e.g. XM_001125684]’, first triplet

Parameters:	'chrom' – Integer value of the chromosome in question
	'start' – Integer value of the variation start position on the given chromosome
	'stop' – Integer value of the variation stop position on the given chromosome
	'gene' – String value of the gene of variation
	'transcript_id' – String value of the gene of variation
Returns:	The list of dicts of entries with transcript and protein ids (either NM+NP or XM+XP)

IO.RefSeqAdapter

Deprecated since version 1.0.

class Fred2.IO.RefSeqAdapter.RefSeqAdapter(**kwargs)

Bases: Fred2.IO.ADBAdapter.ADBAdapter

get_product_sequence(product_refseq, **kwargs)

Fetches product sequence for the given id

Parameters:	product_refseq (str) – Given refseq id
Returns:	List of dictionaries containing the requested sequence, the respective strand, and the associated gene name
Return type:	list(dict)

get_transcript_information(transcript_refseq, **kwargs)

Fetches transcript sequence for the given id

Parameters:	transcript_id (str) – The transcript ID as string
	type – The type of the given id, as found in `EIdentifierTypes()`. An ADBAdapter implementation that overrides these types should document that it does so.
Returns:	List of dictionaries containing the requested sequence, the respective strand, and the associated gene name
Return type:	list(dict)

get_transcript_sequence(transcript_refseq, **kwargs)

Fetches transcript sequence for the given id

Parameters:	transcript_refseq
Returns:	List of dictionaries containing the requested sequence, the respective strand, and the associated gene name

load(filename)

IO.UniProtAdapter

Deprecated since version 1.0.

class Fred2.IO.UniProtAdapter.UniProtDB(**kwargs)

exists(seq)

Fast check whether the given sequence exists (as a subsequence) in the UniProtDB object's collection of sequences.

Parameters:	seq – The subsequence to be searched for
Returns:	True if it is found somewhere, False otherwise

read_seqs(sequence_file)

Reads sequences from UniProt files (.dat or .fasta) or from lists or dicts of BioPython SeqRecords and makes them available for fast search. This function can also be used to append sequences.

Parameters:	sequence_file – UniProt files (.dat or .fasta)
Returns:

search(seq)

Searches for the first occurrence of the given sequence(s) in the UniProtDB object's collection, returning for each the front part of the FASTA header of the first occurrence.

Parameters:	seq – A string interpreted as a single sequence, or a list (of str) interpreted as a collection of sequences
Returns:	A dictionary of the given sequences to lists (of ids, ‘null’ if n/a)

search_all(seq)

Searches for all occurrences of the given sequence(s) in the UniProtDB object's collection, returning for each the front part of the FASTA header of all occurrences.

Parameters:	seq – A string interpreted as a single sequence, or a list (of str) interpreted as a collection of sequences
Returns:	A dictionary of the given sequences to lists (of ids, ‘null’ if n/a)
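The difference between exists, search, and search_all can be seen in a small standalone sketch. It does not use UniProtDB: the SequenceIndex class is hypothetical, the entries are made up, and representing a miss as the list ['null'] is one possible reading of the return description.

```python
class SequenceIndex(object):
    """Hypothetical in-memory stand-in for a searchable sequence collection."""

    def __init__(self, entries):
        # entries: list of (id, sequence) pairs, kept in insertion order
        self.entries = list(entries)

    def _hits(self, query):
        # IDs of all entries that contain the query as a substring.
        return [ident for ident, seq in self.entries if query in seq]

    def exists(self, seq):
        return any(seq in s for _, s in self.entries)

    def search(self, seq):
        queries = [seq] if isinstance(seq, str) else list(seq)
        # First occurrence only; ['null'] marks a miss (assumed convention).
        return {q: (self._hits(q)[:1] or ["null"]) for q in queries}

    def search_all(self, seq):
        queries = [seq] if isinstance(seq, str) else list(seq)
        # Every occurrence; ['null'] marks a miss (assumed convention).
        return {q: (self._hits(q) or ["null"]) for q in queries}


index = SequenceIndex([
    ("ID1", "MKTAYIAK"),
    ("ID2", "AYIAKQRQ"),
    ("ID3", "GGGG"),
])
print(index.exists("KQR"))
print(index.search("AYIAK"))
print(index.search_all("AYIAK"))
print(index.search(["GGGG", "WWW"]))
```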

write_seqs(name)

Writes all FASTA entries in the current object into one FASTA file

Parameters:	name – The complete path, including the file name, where the FASTA file is to be written