Fred2.IO Module




Fred2.IO.FileReader.read_annovar_exonic(annovar_file, gene_filter=None, experimentalDesig=None)

Reads a gene-based ANNOVAR output file, generates Variant objects containing all annotated transcript ids, and outputs a list of Variant objects.

  • annovar_file (str) – The path to the ANNOVAR file
  • gene_filter (list(str)) – A list of gene names of interest (only variants associated with these genes are generated)

List of fully annotated Fred2.Core.Variant.Variant objects

Return type:


Fred2.IO.FileReader.read_fasta(files, in_type=<class 'Fred2.Core.Peptide.Peptide'>, id_position=1)

Generator function:

Reads one or more peptide, protein or RNA sequences from a FASTA file. The user needs to specify the correct type of the underlying sequences: Peptide, Protein or Transcript (for RNA).

  • files (list(str) or str) – A file name or a list of file names to read in
  • in_type (Peptide or Transcript or Protein) – The type to read in
  • id_position (int) – The position of the id in the FASTA header when the header is split by |


A list of the specified sequence type derived from the FASTA file sequences.

Return type:



ValueError – if a file is not readable
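
The generator behaviour and the role of id_position can be illustrated with a minimal standalone sketch. It does not use Fred2: iter_fasta is a hypothetical helper, the zero-based interpretation of id_position is an assumption made for this example, and plain (id, sequence) tuples are yielded instead of Peptide, Protein or Transcript objects.

```python
import os
import tempfile


def iter_fasta(files, id_position=1):
    """Yield (id, sequence) tuples from one FASTA file or a list of files.

    Hypothetical helper, not part of Fred2. The header is split on '|' and
    the field at index id_position (assumed zero-based) is used as the id.
    """
    if isinstance(files, str):
        files = [files]
    for path in files:
        seq_id, chunks = None, []
        with open(path) as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                if line.startswith(">"):
                    # A new header closes the previous record, if any.
                    if seq_id is not None:
                        yield seq_id, "".join(chunks)
                    fields = line[1:].split("|")
                    # Fall back to the whole header if it has too few fields.
                    if id_position < len(fields):
                        seq_id = fields[id_position]
                    else:
                        seq_id = line[1:]
                    chunks = []
                else:
                    chunks.append(line)
        if seq_id is not None:
            yield seq_id, "".join(chunks)


# Write a small FASTA file with made-up entries and read it back.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">sp|P01234|EXAMPLE_ONE\nMKTAYIAK\nQRQISFVK\n")
    tmp.write(">sp|P05678|EXAMPLE_TWO\nSYFPEITHI\n")
records = list(iter_fasta(tmp.name))
os.remove(tmp.name)
print(records)
```

Because the function is a generator, records are produced lazily; wrapping the call in list() as above materialises them all at once.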

Fred2.IO.FileReader.read_lines(files, in_type=<class 'Fred2.Core.Peptide.Peptide'>)

Generator function:

Reads sequences directly from the lines of a file, one per line. The user needs to manually specify the correct type of the underlying data: Peptide, Protein, Transcript or Allele.

  • files (list(str) or str) – A file name or a list of file names to read in


A list of the specified objects

Return type:



IOError – if a file is not readable
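
A line-based reader of this kind can be sketched as follows. This is an illustration under stated assumptions, not Fred2 code: iter_lines and ToyPeptide are hypothetical names, and the factory argument stands in for in_type.

```python
import os
import tempfile


def iter_lines(files, factory=str):
    """Yield one object per non-empty line of the given file(s).

    Hypothetical helper, not part of Fred2. `factory` plays the role of
    in_type: it is called with the stripped content of each line.
    """
    if isinstance(files, str):
        files = [files]
    for path in files:
        with open(path) as handle:
            for line in handle:
                entry = line.strip()
                if entry:
                    yield factory(entry)


class ToyPeptide(object):
    """Tiny stand-in for a sequence type; it only stores the upper-cased string."""

    def __init__(self, seq):
        self.seq = seq.upper()


# Write two made-up peptide lines (with an empty line in between) and read them.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("syfpeithi\n\nGILGFVFTL\n")
peptides = list(iter_lines(tmp.name, factory=ToyPeptide))
os.remove(tmp.name)
print([p.seq for p in peptides])
```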

Fred2.IO.FileReader.read_vcf(vcf_file, gene_filter=None, experimentalDesig=None)

Reads a VCF v4.0 or v4.1 file, generates Variant objects containing all annotated transcript ids, and outputs a list of Variant objects. The reader considers only the variant types listed below; variants labeled as synonymous are not integrated into any variant: filter_variants = ['missense_variant', 'frameshift_variant', 'stop_gained', 'missense_variant&splice_region_variant', 'synonymous_variant', 'inframe_deletion', 'inframe_insertion']

  • vcf_file (str) – The path to the VCF file
  • gene_filter (list(str)) – A list of gene names of interest (only variants associated with these genes are generated)

List of fully annotated Fred2.Core.Variant.Variant objects

Return type:

Tuple of (list(Variant), list(transcript_ids))
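
The filtering described above can be sketched in isolation. This is not the Fred2 implementation: select_annotations is a hypothetical helper that works on already parsed (gene, consequence) pairs, and flagging synonymous entries with a boolean is only one possible reading of how they are excluded from integration.

```python
FILTER_VARIANTS = [
    "missense_variant",
    "frameshift_variant",
    "stop_gained",
    "missense_variant&splice_region_variant",
    "synonymous_variant",
    "inframe_deletion",
    "inframe_insertion",
]


def select_annotations(annotations, gene_filter=None):
    """Return (gene, consequence, is_synonymous) triples that pass the filters.

    Hypothetical helper, not Fred2 code. `annotations` is a list of
    (gene_name, consequence) tuples parsed from a VCF record beforehand.
    """
    kept = []
    for gene, consequence in annotations:
        if consequence not in FILTER_VARIANTS:
            continue  # variant type is not considered at all
        if gene_filter and gene not in gene_filter:
            continue  # gene is not of interest
        # Assumption: synonymous entries are kept but flagged, so that later
        # steps can leave them out when building variant sequences.
        kept.append((gene, consequence, consequence == "synonymous_variant"))
    return kept


# Made-up annotations for illustration only.
annotations = [
    ("TP53", "missense_variant"),
    ("TP53", "synonymous_variant"),
    ("BRCA1", "stop_gained"),
    ("TP53", "intron_variant"),
]
selected = select_annotations(annotations, gene_filter=["TP53"])
print(selected)
```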


class Fred2.IO.MartsAdapter.MartsAdapter(usr=None, host=None, pwd=None, db=None, biomart=None)

Bases: Fred2.IO.ADBAdapter.ADBAdapter

get_all_variant_gene(locations, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')

Fetches the important db ids and names for given chromosomal location

  • chrom (int) – Integer value of the chromosome in question
  • start (int) – Integer value of the variation start position on given chromosome
  • stop (int) – Integer value of the variation stop position on given chromosome

The respective gene name, i.e. the first one reported


Fetches the important db ids and names for a given gene or chromosomal location. The former is recommended. The result is a list of dicts with one of the three combinations:

  • ‘Ensembl Gene ID’, ‘Ensembl Transcript ID’, ‘Ensembl Protein ID’
  • ‘RefSeq Protein ID [e.g. NP_001005353]’, ‘RefSeq mRNA [e.g. NM_001195597]’, first triplet
  • ‘RefSeq Predicted Protein ID [e.g. XP_001720922]’, ‘RefSeq mRNA predicted [e.g. XM_001125684]’, first triplet
  • 'locations' – list of locations as triplets of integer values representing (chrom, start, stop)
  • 'genes' – list of genes as string value of the genes of variation

The list of dicts of entries with transcript and protein ids (either NM+NP or XM+XP)
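
To make the three key combinations concrete, the following sketch classifies a result dict by the keys it contains. classify_entry and the category names are hypothetical, and the ids in the example entry are placeholders rather than real database records.

```python
ENSEMBL_KEYS = ("Ensembl Gene ID", "Ensembl Transcript ID", "Ensembl Protein ID")
REFSEQ_KEYS = (
    "RefSeq Protein ID [e.g. NP_001005353]",
    "RefSeq mRNA [e.g. NM_001195597]",
)
REFSEQ_PREDICTED_KEYS = (
    "RefSeq Predicted Protein ID [e.g. XP_001720922]",
    "RefSeq mRNA predicted [e.g. XM_001125684]",
)


def classify_entry(entry):
    """Name the key combination found in a result dict (hypothetical helper)."""
    # The RefSeq combinations also carry the Ensembl triplet, so they are
    # checked before the plain Ensembl case.
    if all(key in entry for key in REFSEQ_PREDICTED_KEYS):
        return "refseq_predicted"
    if all(key in entry for key in REFSEQ_KEYS):
        return "refseq"
    if all(key in entry for key in ENSEMBL_KEYS):
        return "ensembl"
    return "unknown"


# Placeholder values; only the keys matter for the classification.
entry = {
    "Ensembl Gene ID": "GENE_PLACEHOLDER",
    "Ensembl Transcript ID": "TRANSCRIPT_PLACEHOLDER",
    "Ensembl Protein ID": "PROTEIN_PLACEHOLDER",
    "RefSeq Protein ID [e.g. NP_001005353]": "NP_PLACEHOLDER",
    "RefSeq mRNA [e.g. NM_001195597]": "NM_PLACEHOLDER",
}
print(classify_entry(entry))
```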

get_ensembl_ids_from_id(gene_id, **kwargs)

Returns a list of gene-transcript-protein ids from some sort of id

  • gene_id (str) – The id to be queried
  • type (EIdentifierTypes()) – Assumes the given ID is of a type found in EIdentifierTypes(); default is gene name
  • _db (str) – Can override MartsAdapter default db (“hsapiens_gene_ensembl”)
  • _dataset (str) – Specifies the query dbs dataset if default is not wanted (“gene_ensembl_config”)

Containing information about the corresponding (linked) entries.

Return type:
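
The optional keyword arguments listed above follow a common pattern: each has a default that the caller may override. A minimal sketch of how such defaults could be resolved is given here. resolve_query_options is a hypothetical helper, and the string "gene_name" is only a placeholder for the corresponding EIdentifierTypes member.

```python
def resolve_query_options(**kwargs):
    """Fill in defaults for optional query arguments (hypothetical helper)."""
    return {
        # Placeholder string; the real default is an EIdentifierTypes member.
        "type": kwargs.get("type", "gene_name"),
        "_db": kwargs.get("_db", "hsapiens_gene_ensembl"),
        "_dataset": kwargs.get("_dataset", "gene_ensembl_config"),
    }


defaults = resolve_query_options()
custom = resolve_query_options(_dataset="my_dataset_config")
print(defaults)
print(custom)
```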


get_gene_by_position(chrom, start, stop, **kwargs)

Fetches the gene name for given chromosomal location

  • chrom (int) – Integer value of the chromosome in question
  • start (int) – Integer value of the variation start position on given chromosome
  • stop (int) – Integer value of the variation stop position on given chromosome
  • _db (str) – Can override MartsAdapter default db (“hsapiens_gene_ensembl”)
  • _dataset (str) – Specifies the query dbs dataset if default is not wanted (“gene_ensembl_config”)

The respective gene name, i.e. the first one reported

Return type:


get_product_sequence(product_id, **kwargs)

Fetches product (i.e. protein) sequence for the given id

  • product_id (str) – The id to be queried
  • type (EIdentifierTypes()) – Assumes the given ID is of a type found in EIdentifierTypes(); default is ensembl_peptide_id
  • _db (str) – Can override MartsAdapter default db (“hsapiens_gene_ensembl”)
  • _dataset (str) – Specifies the query dbs dataset if default is not wanted (“gene_ensembl_config”)

The requested sequence

Return type:


get_transcript_information(transcript_id, **kwargs)

Fetches transcript sequence, gene name and strand information for the given id

  • transcript_id (str) – The id to be queried
  • type (EIdentifierTypes()) – Assumes the given ID is of a type found in EIdentifierTypes(); default is ensembl_transcript_id
  • _db (str) – Can override MartsAdapter default db (“hsapiens_gene_ensembl”)
  • _dataset (str) – Specifies the query dbs dataset if default is not wanted (“gene_ensembl_config”)

Dictionary of the requested keys as in EAdapterFields.ENUM

Return type:


get_transcript_information_from_protein_id(product_id, **kwargs)

Fetches transcript sequence for the given id

  • product_id (str) – The id to be queried
  • type (EIdentifierTypes()) – Assumes the given ID is of a type found in EIdentifierTypes(); default is ensembl_peptide_id
  • _db (str) – Can override MartsAdapter default db (“hsapiens_gene_ensembl”)
  • _dataset (str) – Specifies the query dbs dataset if default is not wanted (“gene_ensembl_config”)

List of dictionaries of the requested sequence, the respective strand and the associated gene name

Return type:


get_transcript_position(transcript_id, start, stop, **kwargs)

If no transcript position is available for a variant, it can be retrieved, provided the mart has the transcripts connected to the CDS and the exon positions.

  • transcript_id (str) – The id to be queried
  • start (int) – First genomic position to be mapped
  • stop (int) – Last genomic position to be mapped
  • type (EIdentifierTypes()) – Assumes the given ID is of a type found in EIdentifierTypes(); default is ensembl_transcript_id
  • _db (str) – Can override MartsAdapter default db (“hsapiens_gene_ensembl”)
  • _dataset (str) – Specifies the query dbs dataset if default is not wanted (“gene_ensembl_config”)

A tuple of the mapped positions start, stop

Return type:
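
The idea of mapping genomic positions onto a transcript via exon coordinates can be sketched without a BioMart query. The function below is a simplified illustration, not the method's implementation: genomic_to_transcript is a hypothetical name, exons are assumed to lie on the forward strand with inclusive coordinates, and transcript positions are counted from 1.

```python
def genomic_to_transcript(exons, start, stop):
    """Map genomic start/stop to transcript positions (hypothetical helper).

    exons: list of (exon_start, exon_end) genomic intervals, inclusive,
    in transcription order on the forward strand.
    Returns (t_start, t_stop), or None if either position is not exonic.
    """
    def map_one(pos):
        offset = 0  # number of transcript bases in the exons already passed
        for exon_start, exon_end in exons:
            if exon_start <= pos <= exon_end:
                return offset + (pos - exon_start) + 1
            offset += exon_end - exon_start + 1
        return None

    t_start, t_stop = map_one(start), map_one(stop)
    if t_start is None or t_stop is None:
        return None
    return t_start, t_stop


# Two made-up exons of length 100 and 50.
exons = [(100, 199), (300, 349)]
mapped = genomic_to_transcript(exons, 150, 310)
intronic = genomic_to_transcript(exons, 150, 250)
print(mapped, intronic)
```

Reverse-strand transcripts would need the exon order and the offset within each exon to be mirrored; that case is left out to keep the sketch short.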


get_transcript_sequence(transcript_id, **kwargs)

Fetches transcript sequence for the given id

  • transcript_id (str) – The id to be queried
  • type (EIdentifierTypes()) – Assumes the given ID is of a type found in EIdentifierTypes(); default is ensembl_transcript_id
  • _db (str) – Can override MartsAdapter default db (“hsapiens_gene_ensembl”)
  • _dataset (str) – Specifies the query dbs dataset if default is not wanted (“gene_ensembl_config”)

The requested sequence

Return type:


get_variant_id_from_protein_id(transcript_id, **kwargs)

Returns all information needed to instantiate a variation

  • transcript_id (str) – The id to be queried
  • type (EIdentifierTypes()) – Assumes the given ID is of a type found in EIdentifierTypes(); default is ensembl_transcript_id
  • _db (str) – Can override MartsAdapter default db (“hsapiens_gene_ensembl”)
  • _dataset (str) – Specifies the query dbs dataset if default is not wanted (“gene_ensembl_config”)

Containing all information needed for a variant initialization

Return type:



Fetches the important db ids and names for a given gene or chromosomal location. The former is recommended. The result is a list of dicts with one of the three combinations:

  • ‘Ensembl Gene ID’, ‘Ensembl Transcript ID’, ‘Ensembl Protein ID’
  • ‘RefSeq Protein ID [e.g. NP_001005353]’, ‘RefSeq mRNA [e.g. NM_001195597]’, first triplet
  • ‘RefSeq Predicted Protein ID [e.g. XP_001720922]’, ‘RefSeq mRNA predicted [e.g. XM_001125684]’, first triplet
  • 'chrom' – integer value of the chromosome in question
  • 'start' – integer value of the variation start position on given chromosome
  • 'stop' – integer value of the variation stop position on given chromosome
  • 'gene' – string value of the gene of variation
  • 'transcript_id' – string value of the transcript of variation

The list of dicts of entries with transcript and protein ids (either NM+NP or XM+XP)


Deprecated since version 1.0.

class Fred2.IO.RefSeqAdapter.RefSeqAdapter(**kwargs)

Bases: Fred2.IO.ADBAdapter.ADBAdapter

get_product_sequence(product_refseq, **kwargs)

Fetches product sequence for the given id

Parameters:product_refseq (str) – Given refseq id
Returns:List of dictionaries of the requested sequence, the respective strand and the associated gene name
Return type:list(dict)
get_transcript_information(transcript_refseq, **kwargs)

Fetches transcript sequence for the given id

  • transcript_refseq (str) – The transcript ID as string
  • type – The type of the given id, as found in EIdentifierTypes(). An ADBAdapter implementation that overrides these types should document this.

List of dictionaries of the requested sequence, the respective strand and the associated gene name

Return type:


get_transcript_sequence(transcript_refseq, **kwargs)

Fetches transcript sequence for the given id

Parameters:transcript_refseq – Given refseq id
Returns:List of dictionaries of the requested sequence, the respective strand and the associated gene name



Deprecated since version 1.0.

class Fred2.IO.UniProtAdapter.UniProtDB(**kwargs)

Fast check whether the given sequence exists (as a subsequence) in one of the sequences of the UniProtDB object's collection.

Parameters:seq – The subsequence to be searched for
Returns:True if it is found somewhere, False otherwise

Reads sequences from UniProt files (.dat or .fasta) or from lists or dicts of BioPython SeqRecords and makes them available for fast search. This function can also be used to append sequences.

Parameters:sequence_file – UniProt files (.dat or .fasta)

Searches for the first occurrence of the given sequence(s) in the UniProtDB object's collection and returns, for each, the front part of the FASTA header of the first occurrence.

Parameters:seq – A string interpreted as a single sequence, or a list (of str) interpreted as a collection of sequences
Returns:A dictionary of the given sequences to lists (of ids, ‘null’ if n/a)

Searches for all occurrences of the given sequence(s) in the UniProtDB object's collection and returns, for each, the front part of the FASTA header of all occurrences.

Parameters:seq – A string interpreted as a single sequence, or a list (of str) interpreted as a collection of sequences
Returns:A dictionary of the given sequences to lists (of ids, ‘null’ if n/a)
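
The search behaviour described above can be approximated with a small in-memory class. This is a sketch under stated assumptions rather than the UniProtDB implementation: ToySequenceDB and its method names are hypothetical, ids are stored directly instead of being taken from FASTA headers, and a missing hit is represented as a list containing the string 'null'.

```python
class ToySequenceDB(object):
    """Minimal in-memory sequence collection (hypothetical stand-in)."""

    def __init__(self):
        self.collection = {}  # maps id -> sequence

    def add(self, seq_id, sequence):
        self.collection[seq_id] = sequence

    def exists(self, seq):
        """Return True if seq is a subsequence of any stored sequence."""
        return any(seq in stored for stored in self.collection.values())

    def search_all(self, seq):
        """Map each query sequence to the sorted ids of all entries containing it."""
        queries = [seq] if isinstance(seq, str) else list(seq)
        result = {}
        for query in queries:
            hits = sorted(
                seq_id for seq_id, stored in self.collection.items() if query in stored
            )
            # Assumption: no hit is reported as a list holding 'null'.
            result[query] = hits if hits else ["null"]
        return result


# Made-up entries for illustration only.
db = ToySequenceDB()
db.add("P1", "MKTAYIAKQR")
db.add("P2", "AYIAKSYFPEITHI")
found = db.search_all(["AYIAK", "SYFPEITHI", "WWW"])
print(db.exists("AYIAK"), found)
```

A linear scan like this is fine for small collections; a real database of this kind would typically use an index or a concatenated search string to make lookups fast.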

Writes all FASTA entries in the current object into one FASTA file.

Parameters:name – The complete path, including the file name, where the FASTA file is going to be written