Design module

Cloning module

Module used for cloning of microbial strains.

teemi.design.cloning.CAS9_cutting(gRNA_record, background_record)

Simulates a double-strand break (DSB) by Cas9 given a gRNA.

Parameters
  • gRNA_record (pydna.dseqrecord.Dseqrecord) – A 20 bp DNA sequence

  • background_record (pydna.dseqrecord.Dseqrecord) – The sequence of interest for the CRISPR-mediated DSB

Returns

  • 1. up (pydna.dseqrecord.Dseqrecord) – Sequence upstream of the DSB

  • 2. dw (pydna.dseqrecord.Dseqrecord) – Sequence downstream of the DSB
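As a rough illustration of what a Cas9 double-strand break does to a sequence, the plain-string sketch below splits a background sequence at the cut site. It is not teemi's implementation: the helpers `cas9_cut_sketch` and `reverse_complement` are hypothetical, and the sketch assumes SpCas9 with an NGG PAM directly 3' of the protospacer, a blunt cut 3 bp upstream of the PAM, and only the first match on each strand.

```python
from typing import Tuple

# Maps each base to its Watson-Crick complement (upper- and lowercase).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def cas9_cut_sketch(grna: str, background: str) -> Tuple[str, str]:
    """Split `background` at the blunt SpCas9 cut site targeted by `grna`.

    Assumes a protospacer followed by an NGG PAM on either strand, with the
    cut 3 bp upstream (5') of the PAM. Returns (upstream, downstream) in the
    orientation of `background`.
    """
    grna, background = grna.upper(), background.upper()
    n = len(grna)
    # Target on the given strand: protospacer, then N, then GG.
    i = background.find(grna)
    if i != -1 and background[i + n + 1 : i + n + 3] == "GG":
        cut = i + n - 3
        return background[:cut], background[cut:]
    # Target on the opposite strand: reads CCN + revcomp(protospacer) here.
    j = background.find(reverse_complement(grna))
    if j >= 3 and background[j - 3 : j - 1] == "CC":
        cut = j + 3
        return background[:cut], background[cut:]
    raise ValueError("No protospacer with an adjacent NGG PAM was found")
```

Working on strings keeps the example free of pydna; CAS9_cutting itself takes and returns Dseqrecord objects.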

teemi.design.cloning.USER_enzyme(amplicon)

Simulates digestion with the USER enzyme.

Parameters

amplicon (pydna.amplicon.Amplicon) – A pydna.amplicon.Amplicon with incorporated uracil

Returns

USER-digested Dseqrecord with USER tails

Return type

Dseqrecord
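The effect of USER digestion on one end of an amplicon can be sketched with plain strings. This is a simplified model under stated assumptions, not teemi's code: `user_digest_sketch` is a hypothetical helper that assumes a single uracil near the 5' end of the top strand and ignores the other end of the amplicon.

```python
from typing import Tuple


def user_digest_sketch(top_strand: str) -> Tuple[str, str]:
    """Simulate USER treatment of one end of an amplicon.

    Assumes `top_strand` carries exactly one uracil near its 5' end, as
    introduced by a uracil-containing primer. The uracil is excised, the
    short 5' fragment dissociates, and the complementary strand is left
    with a single-stranded 3' overhang.

    Returns (remaining_top_strand, overhang), with the overhang written in
    top-strand orientation and the uracil position read as T.
    """
    seq = top_strand.upper()
    if seq.count("U") != 1:
        raise ValueError("Expected exactly one uracil in the top strand")
    u = seq.index("U")
    # Bases 0..u leave the top strand; their partners on the bottom strand
    # become the single-stranded overhang.
    overhang = seq[: u + 1].replace("U", "T")
    return seq[u + 1 :], overhang
```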

teemi.design.cloning.add_feature_annotation_to_seqrecord(sequence: Bio.SeqRecord, label='', type_name='misc_feature', strand=0) → None

Adds a feature with a label and a type to a Bio.SeqRecord sequence.

Parameters
  • sequence (Bio.SeqRecord) – The sequence to annotate

  • label (str, optional) – Label of the feature

  • type_name (str, default: “misc_feature”) – Type of the feature

  • strand (int, default: 0) – Strand of the feature

Return type

None

teemi.design.cloning.casembler(bg_strain, site_names=None, gRNAs=None, parts=None, assembly_limits=None, assembly_names=None, verbose=False, to_benchling=False)

Simulates in vivo assembly and integration, with the option of writing the results to GenBank files or sending them directly to Benchling.

Parameters
  • bg_strain (GenBank) – background strain of choice, e.g. a GenBank file

  • site_names (list) – list of integration site names, e.g. [X-3, XI-3]

  • gRNAs (list of SeqRecord) – list of 20 bp SeqRecords, e.g. [ATF1_gRNA, CroCPR_gRNA]

  • parts (list) – list of lists of parts, e.g. [[ATF1_repair_template], [CPR_repair_template]]

  • assembly_limits (list) – list of assembly limits in bp, e.g. [200, 400]

  • assembly_names (list) – list of names for the assembled DNA, e.g. [“X_3_tADH1_P2_pPGK1”, “XI_3_UP_DW”]

  • verbose (bool) – if True, writes the assembled DNA to GenBank files (default: False)

  • to_benchling (bool) – if True, uploads the assembled DNA to Benchling (default: False)

Returns

The assembled contig

Return type

Dseqrecord

teemi.design.cloning.crispr_db_break_location(start_location, end_location, strand)

Determine the CRISPR cut location in the genome.

Parameters
  • start_location (int) – Start position of the sgRNA sequence in the chromosome.

  • end_location (int) – End position of the sgRNA sequence in the chromosome.

  • strand (int) – Strand of the sgRNA sequence in the chromosome, +1 for positive strand, -1 for negative strand.

Returns

crispr_db_break – CRISPR cut location in the genome.

Return type

int
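A minimal sketch of the cut-site arithmetic, assuming SpCas9 (blunt cut 3 bp from the PAM-proximal end of the spacer) and 0-based, half-open top-strand coordinates. teemi may use a different convention (for example 1-based or inclusive ends), so `crispr_cut_position_sketch` is a hypothetical helper, not the library function.

```python
def crispr_cut_position_sketch(start: int, end: int, strand: int) -> int:
    """Return the SpCas9 blunt-cut position as a 0-based index between bases.

    Assumes [start, end) are 0-based, half-open top-strand coordinates of
    the spacer and that the cut lies 3 bp from its PAM-proximal end.
    """
    if strand == 1:
        # PAM lies just 3' of `end` on the top strand.
        return end - 3
    if strand == -1:
        # Spacer is on the bottom strand; its PAM lies just 5' of `start`
        # when read in top-strand coordinates.
        return start + 3
    raise ValueError("strand must be +1 or -1")
```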

teemi.design.cloning.extract_gRNAs(template, name)

Extracts gRNAs from a template.

Parameters
  • template (pydna.dseqrecord or pydna.amplicon.Amplicon) – a plasmid or piece of DNA

  • name (str) – a string contained in the names of the features to extract, e.g. gRNA

Returns

list of the matching features with their sequences

Return type

list of pydna.dseqrecord or pydna.amplicon.Amplicon

teemi.design.cloning.extract_sites(annotations, templates, names)

Extracts sequences from annotated templates based on their feature names.

Parameters
  • annotations (list) – list of annotations for sequences that will be extracted

  • templates (list of Bio.SeqRecord.SeqRecord) – A list of Bio.SeqRecord.SeqRecord with SeqFeatures

  • names (str) – name of the sequence that will be extracted

Returns

record – list of extracted sites

Return type

list of Bio.SeqRecord.SeqRecord

teemi.design.cloning.extract_template_amplification_sites(templates, names, terminator)

Extracts amplification sites from the features of the templates.

Parameters
  • templates (list of Bio.SeqRecord.SeqRecord) – list of Bio.SeqRecord.SeqRecord objects with SeqFeatures

  • names (list of strings) – names of the features to be extracted

  • terminator (str) – the name of the upstream terminator

Returns

record – list of extracted elements

Return type

list of Bio.SeqRecord.SeqRecord

teemi.design.cloning.find_all_occurrences_of_a_sequence(sequence: Bio.SeqRecord, sequence_to_search_in: Bio.SeqRecord) → tuple

Searches for all occurrences of a given sequence in another sequence.

Parameters
  • sequence (Bio.SeqRecord) – Sequence to search for.

  • sequence_to_search_in (Bio.SeqRecord) – Sequence to search in.

Returns

Number of occurrences of sequence in sequence_to_search_in.

Return type

tuple
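A string-based sketch of occurrence counting with the hypothetical helper `count_occurrences_sketch`. It assumes overlapping matches should be counted and searches only the given strand; whether teemi also searches the reverse complement is not stated here. It returns a plain count, as in the Returns description.

```python
def count_occurrences_sketch(query: str, target: str) -> int:
    """Count possibly overlapping occurrences of `query` in `target` (case-insensitive)."""
    if not query:
        raise ValueError("query must not be empty")
    query, target = query.upper(), target.upper()
    count = 0
    pos = target.find(query)
    while pos != -1:
        count += 1
        # Restart one base later so that overlapping matches are also counted.
        pos = target.find(query, pos + 1)
    return count
```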

teemi.design.cloning.find_sequence_location(sequence: Bio.SeqRecord, sequence_to_search_in: Bio.SeqRecord) → tuple

Finds the start and end location of a matched sequence.

Parameters
  • sequence (str) – The sequence to locate

  • sequence_to_search_in (Bio.SeqRecord) – The sequence to search in

Returns

(start_index, end_index)

Return type

tuple
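A minimal sketch of locating a match with plain strings. The hypothetical helper `find_location_sketch` reports only the first match and returns 0-based indices with an exclusive end (Python slice convention); teemi's own convention for end_index may differ.

```python
from typing import Tuple


def find_location_sketch(query: str, target: str) -> Tuple[int, int]:
    """Return (start_index, end_index) of the first match; end is exclusive."""
    start = target.upper().find(query.upper())
    if start == -1:
        raise ValueError("query not found in target")
    return start, start + len(query)
```

With this convention, target[start:end] gives back the matched sequence.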

teemi.design.cloning.nicking_enzyme(vector)

Simulates Nt.BbvCI (a nicking enzyme) nicking a vector at the sequence ‘CGCGTG’ on the Watson strand and ‘CGCACG’ on the Crick strand.

Parameters

vector (Dseq) – digested Dseqrecord, usually with an AsiSI or similar overhang

Returns

Dseq with a nick, ready for USER cloning

Return type

Dseq

teemi.design.cloning.plate_plot(df, value)

Displays a column of a pandas DataFrame in a 96-well plate layout.

Parameters
  • df (pd.DataFrame) – A pandas DataFrame.

  • value (str) – The name of the DataFrame column to display.

Returns

The chosen column in 96-well plate format.

Return type

pd.DataFrame

Example

# Initialize:
import pandas as pd

amplicon_df = pd.DataFrame({
    'name': ['PCR_G8H_01', 'PCR_G8H_05', ...],
    'location': ['l5_A03', 'l5_A07', ...],
    # ...
})

# Call the function:
plate_plot(amplicon_df, 'name')
name
pcol    1       2       3       4       5       6       7       8       9       10      11      12
prow
A       PCR_G8H_01      PCR_G8H_05      PCR_G8H_09      PCR_G8H_13      PCR_G8H_17      ...
...
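The arrangement into rows A to H and columns 1 to 12 can be sketched without pandas. This hypothetical helper assumes each record already carries 'prow' and 'pcol' fields (the axis names shown in the output above); how plate_plot derives them from the location column is not described here.

```python
from typing import Dict, List, Optional

PLATE_ROWS = "ABCDEFGH"  # 8 rows x 12 columns = 96 wells


def plate_layout_sketch(records: List[dict], value: str) -> Dict[str, List[Optional[str]]]:
    """Place `value` from each record into an 8 x 12 grid keyed by row letter.

    Assumes each record has a 'prow' (row letter A-H) and a 'pcol'
    (column number 1-12). Empty wells stay None.
    """
    plate: Dict[str, List[Optional[str]]] = {row: [None] * 12 for row in PLATE_ROWS}
    for rec in records:
        plate[rec["prow"]][int(rec["pcol"]) - 1] = rec[value]
    return plate
```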
teemi.design.cloning.remove_features_with_negative_loc(record)

Removes SeqFeatures with negative locations.

Parameters

record (pydna.amplicon.Amplicon) – An amplicon with SeqFeatures and locations

Returns

record – The amplicon with the negative features deleted

Return type

pydna.amplicon.Amplicon

teemi.design.cloning.seq_to_annotation(seq_record_from: Bio.SeqRecord, seq_record_onto: Bio.SeqRecord, type_name: str)

Annotates a Bio.SeqRecord object with the sequence of another Bio.SeqRecord object.

Parameters
  • seq_record_from (Bio.SeqRecord) – annotation sequence that will be extracted

  • seq_record_onto (Bio.SeqRecord) – record onto which the annotation is added

  • type_name (str) – feature type of the annotation

Return type

None

Combinatorial library module

This part of the design module is used for making combinatorial libraries from DNA fragments.

class teemi.design.combinatorial_design.DesignAssembly(list_of_seqs: List[List[pydna.dseqrecord.Dseqrecord]], list_of_pads: List[List[pydna.dseqrecord.Dseqrecord]], positions_of_pads: List[int], target_tm=55.0, limit=13, overlap=35, tm_func: Callable = <function tm_default>)

Bases: object

Make a combinatorial library from DNA fragments.

Parameters
  • list_of_seqs (List[List[Dseqrecord]]) – A list of lists of DNA parts; each inner list holds the variants for one position in the construct.

  • list_of_pads (List[List[Dseqrecord]]) – Nucleotide sequences (pads) to be incorporated into the primers (max. 40 bp)

  • positions_of_pads (List[int]) – the positions in list_of_seqs where the pads are incorporated (zero-indexed)

Returns

A DesignAssembly object from which the design details can be retrieved, for example all the amplicons and primers needed to construct the combinatorial library via pcr_list_to_dataframe() or primer_list_to_dataframe().

Return type

DesignAssembly object

pcr_list_to_dataframe()

Converts the PCR list into a pandas DataFrame.

primer_list()

Returns the list of primers.

primer_list_to_dataframe()

Returns a pandas DataFrame with the list of primers.

show_contigs()

Returns a string of the contigs generated by the assembly.

show_variants_lib_df()

Returns a DataFrame of all the variants.

teemi.design.combinatorial_design.assembly_maker(combinatorial_list_of_amplicons: list, overlap=35)

Assembles amplicons with pads and makes new overlapping primers.

Parameters
  • combinatorial_list_of_amplicons (list[[pydna.amplicon.Amplicon]]) – the list of pydna.amplicon.Amplicon objects for which to generate overlapping primers.

  • overlap (int (default set to 35)) – Length of the overlap in bp

Returns

List_of_assemblies – amplicons that overlap each other by the specified overlap value.

Return type

list[[pydna.amplicon.Amplicon]]
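One simple way to make adjacent amplicons overlap is to give each forward primer a 5' tail copied from the end of the preceding fragment. The sketch below shows that idea with strings; `add_overlap_tails_sketch` is hypothetical, and assembly_maker may split the overlap between the forward and reverse primers instead.

```python
from typing import List


def add_overlap_tails_sketch(fragments: List[str], fwd_primers: List[str], overlap: int = 35) -> List[str]:
    """Tail each forward primer (except the first) with the end of the previous fragment.

    The tail is the last `overlap` bp of the preceding fragment, so adjacent
    amplicons share `overlap` bp after PCR.
    """
    if len(fragments) != len(fwd_primers):
        raise ValueError("need one forward primer per fragment")
    tailed = [fwd_primers[0]]
    # Pair fragment i with the forward primer of fragment i + 1.
    for prev, primer in zip(fragments, fwd_primers[1:]):
        tailed.append(prev[-overlap:] + primer)
    return tailed
```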

teemi.design.combinatorial_design.count_unique_parts(predictions_df, max_combinations)

Iterates through the DataFrame of predictions and saves newly encountered parts.

Parameters
  • predictions_df (pd.DataFrame) – DataFrame containing predictions.

  • max_combinations (int) – The maximum number of combinations to consider.

Returns

encountered_parts – A dictionary containing the unique parts encountered in the ‘G8H’, ‘pG8H’, ‘pCPR’ and ‘CPR’ columns, the total number of unique combinations encountered (‘Sum of parts’) and the total number of predictions encountered (‘Predictions’).

Return type

dict

teemi.design.combinatorial_design.get_assembly_figure(assembly_list, limit=15)

Generates a figure for the specified assembly in the assembly list.

Parameters
  • assembly_list (list) – The list of assemblies.

  • limit (int, optional) – The limit for the assembly, by default 15.

Returns

The figure for the specified assembly.

Return type

contig

teemi.design.combinatorial_design.get_combinatorial_list(input_list)

Generates all possible combinations from a list of lists.

Parameters

input_list (list of lists) – The input list of lists for which all possible combinations are to be generated.

Returns

combinations – A list of tuples representing all possible combinations of the elements in the input list of lists.

Return type

list of tuples

Example

>>> input_list = [[1, 2], ['a', 'b']]
>>> combinations = get_combinatorial_list(input_list)
>>> print(combinations)
[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
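The behavior in the example is a Cartesian product, so a sketch with itertools.product reproduces it. This is an illustration with a hypothetical name, not necessarily how get_combinatorial_list is implemented.

```python
from itertools import product
from typing import Any, List, Tuple


def combinatorial_list_sketch(input_list: List[List[Any]]) -> List[Tuple[Any, ...]]:
    """All combinations that take one element from each inner list, in order."""
    return list(product(*input_list))
```

The number of combinations is the product of the inner list lengths, so libraries grow quickly as positions or variants are added.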
teemi.design.combinatorial_design.get_primers(assemblies: list, names: list, primer_temps: list)

Returns a list of all primers from the combinatorial library and updates their names and what they anneal to.

Parameters
  • assemblies (list[list[pydna.amplicon.Amplicon]]) –

  • names (list[(str)]) –

  • primer_temps (list[(float, float), ...]) –

Returns

primers – All primers that have been made for all assemblies

Return type

list[list[[pydna.primer.Primer, pydna.primer.Primer]]]

teemi.design.combinatorial_design.get_systematic_names(parts_list: list) → list

Returns a list of tuples with systematic names, e.g. (1, 1, 1), (1, 2, 1), …

Parameters

parts_list (list of list) – can hold any type, i.e. list[list[any_type]]

Returns

systematic_names – list of tuples with the systematic names, e.g. [(1,1,1),(1,2,1)]

Return type

list of tuples
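Systematic names can be read as the 1-based index of the chosen variant at each position. Assuming that interpretation (consistent with the [(1,1,1),(1,2,1)] example), the hypothetical helper below enumerates them in the same order as itertools.product, so each name lines up with one combination.

```python
from itertools import product
from typing import Any, List, Tuple


def systematic_names_sketch(parts_list: List[List[Any]]) -> List[Tuple[int, ...]]:
    """1-based index of the chosen variant at each position, for every combination."""
    return list(product(*(range(1, len(parts) + 1) for parts in parts_list)))
```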

teemi.design.combinatorial_design.simple_amplicon_maker(list_of_seqs: list, list_of_names: list, target_tm=56.0, limit=13, primer_tm_func=<function tm_default>)

Creates amplicons and updates their names.

Parameters
  • list_of_seqs (list[list[pydna.dseqrecord.Dseqrecord]]) – List of the pydna.dseqrecord.Dseqrecord elements to be made into amplicons

  • list_of_names (list[list[str]]) – names for the sequences, since pydna otherwise renames them to amplicon

Returns

  • list_of_amplicons (list[pydna.amplicon.Amplicon]) – list with the pydna.amplicon.Amplicon objects that have been made

  • list_of_amplicon_primers (list[list[(pydna.seq.Seq, pydna.seq.Seq)]]) – a list of all the generated primers as tuples, where index 0 = forward primer and index 1 = reverse primer. Both are pydna.seq.Seq objects

  • list_of_amplicon_primer_temps (list[list[(float, float)]]) – a list of melting temperatures as tuples, where index 0 = forward primer melting temperature and index 1 = reverse primer melting temperature.

teemi.design.combinatorial_design.unique_amplicons(list_of_assemblies: list)

Finds unique amplicons from a list of assemblies.

Parameters

list_of_assemblies (list[[pydna.amplicon.Amplicon]]) – list of the combinatorial library with overlapping ends

Returns

unique_amplicons – a list of unique amplicons to which relevant metrics have been added.

Return type

list[pydna.amplicon.Amplicon]

teemi.design.combinatorial_design.unique_primers(primers: list, list_of_assemblies)

Finds unique primers from a list of assemblies.

Parameters
  • primers (list[list[list[pydna.primer.Primer]]]) – a list of all the primers made for the combinatorial library

  • list_of_assemblies (list[[pydna.amplicon.Amplicon]]) – used here to update the names of the primers

Returns

unique_primers – Relevant metrics for the unique primers of the combinatorial library.

Return type

list[list(ID,Anneals_to,Sequence,Annealing_temp,Length,Price(DKK))]
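Deduplication of primers by sequence can be sketched as follows. The hypothetical helper keeps the first ID seen for each sequence (case-insensitive) and reports its length; annealing temperature and price are left out because their calculation is not described here.

```python
from typing import Dict, List, Tuple


def unique_primers_sketch(primers: List[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """Keep the first (ID, sequence) seen per sequence; return (ID, sequence, length)."""
    first_id: Dict[str, str] = {}  # sequence -> first ID; dicts keep insertion order
    for primer_id, seq in primers:
        seq = seq.upper()
        if seq not in first_id:
            first_id[seq] = primer_id
    return [(primer_id, seq, len(seq)) for seq, primer_id in first_id.items()]
```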

Fetch sequences module

This part of the design module is used for fetching sequences.

teemi.design.fetch_sequences.fetch_multiple_promoters(List_of_promoter_names: list)

Retrieves yeast promoter sequences from InterMine.

Parameters

List_of_promoter_names (list) – list of promoter names as strings, e.g. [‘YAR035C-A’, ‘YGR067C’, ‘JEN1’, ‘YNR034W-A’, ‘ACH1’]

Return type

list of Bio.SeqRecord.SeqRecord

teemi.design.fetch_sequences.fetch_promoter(promoter_name: str)

teemi.design.fetch_sequences.read_fasta_files(path)

Reads FASTA files.

Parameters

path (str) – path to the FASTA file you want to read.

Return type

list of Bio.SeqRecord.SeqRecord
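The FASTA format itself is simple enough to parse in a few lines. This hypothetical sketch returns (header, sequence) pairs rather than SeqRecord objects; Biopython's `SeqIO.parse(path, "fasta")` is the usual way to obtain SeqRecords.

```python
from typing import Iterable, List, Optional, Tuple


def parse_fasta_sketch(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; blank lines are skipped."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Because it accepts any iterable of lines, an open file handle works directly: `with open(path) as handle: records = parse_fasta_sketch(handle)`.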

teemi.design.fetch_sequences.read_genbank_files(path)

Reads single GenBank files.

Parameters

path (str) – path to the GenBank file you want to read.

Return type

list of Bio.SeqRecord.SeqRecord

teemi.design.fetch_sequences.retrieve_sequences_from_PDB(query: list)

Retrieves sequences from PDB.

Parameters

query (list) – list of accession numbers as strings

Return type

list of Bio.SeqRecord.SeqRecord

teemi.design.fetch_sequences.retrieve_sequences_from_ncbi(list_of_acc_numbers: list, out_file: str, db='protein')

Retrieves sequences from NCBI.

Parameters

list_of_acc_numbers (list) – list of accession numbers such as [‘Q05001’, ‘Q1PQK4’, ‘Q9SB48’, ‘AFX82679’]

Returns

A FASTA file with your sequences

Retrieve gene homologs module

This part of the design module is used for fetching gene homologs.

teemi.design.retrieve_gene_homologs.alignment_identity(query: list, reference: str) → list

Calculates the percent identity between a reference and one or more query sequences.

Parameters
  • query (list) – list of Biopython SeqRecord objects

  • reference (str) –

Return type

list of percent identities as floats
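Percent identity has several definitions. A minimal sketch, assuming two already-aligned sequences of equal length with '-' as the gap character and identity counted over all alignment columns; alignment_identity may align the sequences itself and normalize differently.

```python
def percent_identity_sketch(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns where both sequences have the same residue."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(
        a == b and a != "-"  # gap-gap columns do not count as matches
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    )
    return 100.0 * matches / len(aligned_a)
```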

teemi.design.retrieve_gene_homologs.all_orfs(seq)

Return all ORFs of a sequence. This function was made by Justin Bois : http://justinbois.github.io/.

teemi.design.retrieve_gene_homologs.codon_optimize_with_dnachisel(sequences: List[Bio.SeqRecord.SeqRecord], lower_GC: float = 0.3, upper_GC: float = 0.7, species: Optional[str] = None, codon_usage_table=None, window: int = 50) → List[Bio.SeqRecord.SeqRecord]

Codon-optimizes sequences with DnaChisel.

Parameters
  • sequences (list) – list of Bio.SeqRecord objects

  • lower_GC (float) – the lowest allowed GC content within a window of window bp (default 50)

  • upper_GC (float) – the highest allowed GC content within a window of window bp (default 50)

  • species (str) – name of the species for which to optimize the sequence, e.g. ‘e_coli’, ‘s_cerevisiae’, ‘h_sapiens’, ‘c_elegans’, ‘b_subtilis’, ‘d_melanogaster’; see python_codon_tables for more info.

  • codon_usage_table – a codon table following the structure of: {‘*’: {‘TAA’: 0.0, ‘TAG’: 0.0, ‘TGA’: 1.0},…

Returns

Return type

list of codon optimized sequences for yeast
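The lower_GC/upper_GC/window constraint can be checked independently of DnaChisel with a sliding window. The helper below is hypothetical and only verifies the constraint; it does not optimize anything. DnaChisel expresses this kind of constraint with its EnforceGCContent specification (mini, maxi, window).

```python
def gc_windows_ok_sketch(seq: str, lower_gc: float = 0.3, upper_gc: float = 0.7, window: int = 50) -> bool:
    """Return True if the GC fraction of every `window`-bp stretch lies within bounds.

    Sequences shorter than `window` are checked as a single stretch.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("seq must not be empty")
    size = min(window, len(seq))
    for i in range(len(seq) - size + 1):
        chunk = seq[i : i + size]
        gc = (chunk.count("G") + chunk.count("C")) / size
        if not lower_gc <= gc <= upper_gc:
            return False
    return True
```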

teemi.design.retrieve_gene_homologs.filter_blast_results(blast_record, E_VALUE_THRESH=0.4, LOWER_PROTEIN_IDENTITY_THRESH=0.1, UPPER__PROTEIN_IDENTITY_THRESH=1, show_alignment=False)

teemi.design.retrieve_gene_homologs.find_all_starts(seq)

Find the starting index of all start codons in a lowercase seq. This function was made by Justin Bois : http://justinbois.github.io/.

teemi.design.retrieve_gene_homologs.find_first_in_register_stop(seq)

Find first stop codon on lowercase seq that starts at an index that is divisible by three. This function was made by Justin Bois : http://justinbois.github.io/.

teemi.design.retrieve_gene_homologs.longest_orf(seq, n=1)

Longest ORF of a sequence. This function was made by Justin Bois : http://justinbois.github.io/.
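The ORF helpers in this module can be approximated in a few lines. The sketch below is a hypothetical reimplementation, not Justin Bois's code: it expects lowercase DNA, uses ATG as the only start codon and TAA/TAG/TGA as stops, reports nested in-frame ORFs as separate entries, and returns only the single longest ORF (no n parameter).

```python
import re
from typing import List, Tuple

STOP_CODONS = {"taa", "tag", "tga"}


def find_all_starts_sketch(seq: str) -> List[int]:
    """Indices of every 'atg' in a lowercase sequence, overlaps included."""
    return [m.start() for m in re.finditer(r"(?=atg)", seq)]


def first_in_register_stop_sketch(seq: str) -> int:
    """Index of the first stop codon at a multiple of 3, or -1 if none."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i : i + 3] in STOP_CODONS:
            return i
    return -1


def all_orfs_sketch(seq: str) -> List[Tuple[int, int]]:
    """(start, end) of every ORF; end is exclusive and includes the stop codon."""
    orfs = []
    for start in find_all_starts_sketch(seq):
        stop = first_in_register_stop_sketch(seq[start:])
        if stop != -1:
            orfs.append((start, start + stop + 3))
    return orfs


def longest_orf_sketch(seq: str) -> str:
    """Sequence of the longest ORF, or '' if there is none."""
    orfs = all_orfs_sketch(seq)
    if not orfs:
        return ""
    start, end = max(orfs, key=lambda orf: orf[1] - orf[0])
    return seq[start:end]
```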