Design module

Cloning module

Module used for cloning of microbial strains.

teemi.design.cloning.CAS9_cutting(gRNA_record, background_record)

Simulates a double-strand break (DSB) made by Cas9, given a gRNA.

Parameters

gRNA_record (pydna.dseqrecord.) – A 20 bp DNA sequence
background_record (pydna.dseqrecord.) – The sequence of interest for CRISPR mediated DSB

Returns

1. up (pydna.dseqrecord.) – Sequence upstream of the DSB.
2. dw (pydna.dseqrecord.) – Sequence downstream of the DSB.
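
The cut described above can be illustrated with a minimal sketch. This is not teemi's implementation: it works on plain strings rather than pydna records, searches only the top strand, and assumes the blunt cut falls 3 bp upstream of the 3' end of the protospacer (the usual SpCas9 behaviour). The function name cas9_cut_sketch and the example sequences are hypothetical.

```python
from typing import Tuple


def cas9_cut_sketch(protospacer: str, background: str) -> Tuple[str, str]:
    """Split `background` at the assumed Cas9 cut site inside `protospacer`."""
    protospacer = protospacer.upper()
    background = background.upper()
    start = background.find(protospacer)
    if start == -1:
        raise ValueError("protospacer not found on the top strand")
    # Assumption: blunt cut 3 bp before the 3' end of the protospacer,
    # i.e. 3 bp upstream of the PAM.
    cut = start + len(protospacer) - 3
    return background[:cut], background[cut:]


# Hypothetical 20 nt protospacer followed by an AGG PAM.
protospacer = "ACGTACGTACGTACGTACGT"
background = "TTTT" + protospacer + "AGG" + "CCCC"
up, dw = cas9_cut_sketch(protospacer, background)
```

Joining up and dw restores the input, which is a quick sanity check for any cut simulation of this kind.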

teemi.design.cloning.USER_enzyme(amplicon)

Simulates digestion with USER enzyme.

Parameters: amplicon (pydna.amplicon.Amplicon) – A pydna.amplicon.Amplicon with uracil integrated
Returns: USER-digested Dseqrecord with USER tails
Return type: Dseqrecord

teemi.design.cloning.add_feature_annotation_to_seqrecord(sequence: Bio.SeqRecord, label='', type_name='misc_feature', strand=0) → None

Adds a feature, label and name to a Bio.SeqRecord sequence.

Parameters

sequence (Bio.SeqRecord) –
label (str, optional) –
type_name (str) – default: “misc_feature”
strand (int) – default: 0

Return type

None

teemi.design.cloning.casembler(bg_strain, site_names=None, gRNAs=None, parts=None, assembly_limits=None, assembly_names=None, verbose=False, to_benchling=False)

Simulates in vivo assembly and integration, with the option of writing the result to GenBank (gb) files or sending it directly to Benchling.

Parameters

bg_strain (GenBank) – strain of choice eg. genbank file
site_names (list) – list of names e.g. [X-3, XI-3]
gRNAs (Seqrecords) – list of 20 bp seqrecords e.g. [ATF1_gRNA, CroCPR_gRNA]
parts (list) – list of list of parts e.g. [[ATF1_repair_template],[CPR_repair_template]]
assembly_limits (list) – list of numbers of bp assembly limits e.g. [200,400]
assembly_names (list) – list of names of DNA post assembly e.g.[“X_3_tADH1_P2_pPGK1”, “XI_3_UP_DW”]
verbose (bool) – write DNA e.g. False
to_benchling (bool) – upload DNA to benchling e.g. False

Returns

The assembled contig.

Return type

A single Dseqrecord

teemi.design.cloning.crispr_db_break_location(start_location, end_location, strand)

Determine the CRISPR cut location in the genome.

Parameters

start_location (int) – Start position of the sgRNA sequence in the chromosome.
end_location (int) – End position of the sgRNA sequence in the chromosome.
strand (int) – Strand of the sgRNA sequence in the chromosome, +1 for positive strand, -1 for negative strand.

Returns

crispr_db_break – CRISPR cut location in the genome.

Return type

int
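
How the strand changes the cut position can be sketched as follows. This is an illustration under stated assumptions, not the library's code: coordinates are taken as 0-based and half-open on the top strand, the PAM is assumed to sit just after end_location for a +1 sgRNA and just before start_location for a -1 sgRNA, and the cut is placed 3 bp from the PAM-proximal end. The name break_location_sketch is hypothetical.

```python
def break_location_sketch(start_location: int, end_location: int, strand: int) -> int:
    """Return the assumed Cas9 cut position for an sgRNA match."""
    if strand == 1:
        # PAM is assumed to follow the match, so cut 3 bp before its end.
        return end_location - 3
    if strand == -1:
        # PAM is assumed to precede the match on the top strand.
        return start_location + 3
    raise ValueError("strand must be +1 or -1")


plus_cut = break_location_sketch(100, 120, 1)
minus_cut = break_location_sketch(100, 120, -1)
```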

teemi.design.cloning.extract_gRNAs(template, name)

Extracts gRNAs from a template.

Parameters

template (pydna.dseqrecord or pydna.amplicon.Amplicon) – a plasmid or piece of DNA
name (str) – a string that would include the feature name for example: gRNA

Returns

List of the found features and their sequences

Return type

list of pydna.dseqrecord or pydna.amplicon.Amplicon

teemi.design.cloning.extract_sites(annotations, templates, names)

Extracts sequences from annotated templates based on their names.

Parameters

annotations (list) – list of annotations for sequences that will be extracted
templates (list of Bio.SeqRecord.SeqRecord) – A list of Bio.SeqRecord.SeqRecord with SeqFeatures
names (str) – name of the sequence that will be extracted

Returns

record – list of extracted sites

Return type

list of Bio.SeqRecord.SeqRecord

teemi.design.cloning.extract_template_amplification_sites(templates, names, terminator)

Extracts amplification sites from the features of the templates.

Parameters

templates (list of Bio.SeqRecord.SeqRecord) – list of Bio.SeqRecord.SeqRecord objects with SeqFeatures
names (list of strings) – list of strings to be extracted
terminator (str) – a string with the name of upstream terminator

Returns

record – list of extracted elements

Return type

list of Bio.SeqRecord.SeqRecord

teemi.design.cloning.find_all_occurrences_of_a_sequence(sequence: Bio.SeqRecord, sequence_to_search_in: Bio.SeqRecord) → tuple

Searches for all occurrences of a given sequence in a given string.

Parameters

sequence (Bio.SeqRecord) – Sequence to search for.
sequence_to_search_in (Bio.SeqRecord) – Sequence to search in.

Returns

Number of occurrences of sequence in sequence_to_search_in.

Return type

tuple
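
A minimal sketch of the search itself, using plain strings instead of Bio.SeqRecord objects. It is an assumption about the general technique, not the library's code: overlapping matches are counted, and the tuple holds the start positions, so its length gives the number of occurrences. The name find_all_occurrences_sketch is hypothetical.

```python
def find_all_occurrences_sketch(sequence: str, sequence_to_search_in: str) -> tuple:
    """Return the start index of every (possibly overlapping) match."""
    positions = []
    index = sequence_to_search_in.find(sequence)
    while index != -1:
        positions.append(index)
        # Move one base forward so that overlapping matches are found too.
        index = sequence_to_search_in.find(sequence, index + 1)
    return tuple(positions)


hits = find_all_occurrences_sketch("ATA", "ATATATGC")
number_of_hits = len(hits)
```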

teemi.design.cloning.find_sequence_location(sequence: Bio.SeqRecord, sequence_to_search_in: Bio.SeqRecord) → tuple

Finds the start and end location of a matched sequence.

Parameters

sequence (str) –
sequence_to_search_in (Bio.SeqRecord) –

Returns

(start_index,end_index)

Return type

tuple
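
The (start_index, end_index) pair can be sketched with plain strings. Whether the library reports the end index as inclusive or exclusive is not stated above; this sketch assumes Python's half-open convention, and the name find_sequence_location_sketch is hypothetical.

```python
def find_sequence_location_sketch(sequence: str, sequence_to_search_in: str) -> tuple:
    """Return (start_index, end_index) of the first match, end exclusive."""
    start_index = sequence_to_search_in.find(sequence)
    if start_index == -1:
        raise ValueError("sequence not found")
    return start_index, start_index + len(sequence)


location = find_sequence_location_sketch("GATTACA", "CCGATTACATT")
```

With a half-open end index, slicing the searched string with the returned pair gives back the query sequence.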

teemi.design.cloning.nicking_enzyme(vector)

Simulates the nicking enzyme Nt.Bbc.CI, which nicks a vector with the sequence ‘CGCGTG’ on the Watson strand and ‘CGCACG’ on the Crick strand.

Parameters: vector (Dseq) – digested Dseqrecord, usually with an AsiSI or similar overhang
Returns: Dseq with a nick, ready for USER cloning
Return type: Dseq

teemi.design.cloning.plate_plot(df, value)

Displays a pandas dataframe column in a 96-well plate layout.

Parameters

df (pd.DataFrame) – A pandas dataframe.
value (str) – The name of the pandas dataframe column that you want to display.

Returns

The chosen column in a 96-well plate format.

Return type

pd.DataFrame

Example

# Initialize:
import pandas as pd
from teemi.design.cloning import plate_plot

amplicon_df = pd.DataFrame({
    'name': ['PCR_G8H_01', 'PCR_G8H_05', ...],
    'location': ['l5_A03', 'l5_A07', ...],
    ...
})

# Call the function:
plate_plot(amplicon_df, 'name')

name
pcol    1       2       3       4       5       6       7       8       9       10      11      12
prow
A       PCR_G8H_01      PCR_G8H_05      PCR_G8H_09      PCR_G8H_13      PCR_G8H_17      ...
...

teemi.design.cloning.remove_features_with_negative_loc(record)

Removes SeqFeatures with negative locations.

Parameters: record (pydna.amplicon.Amplicon) – An amplicon with SeqFeatures and locations
Returns: record – With the negative features deleted
Return type: pydna.amplicon.Amplicon

teemi.design.cloning.seq_to_annotation(seq_record_from: Bio.SeqRecord, seq_record_onto: Bio.SeqRecord, type_name: str)

Annotates a Bio.SeqRecord object from another Bio.SeqRecord object.

Parameters

seq_record_from (Bio.SeqRecord) – annotation sequence that will be extracted
seq_record_onto (Bio.SeqRecord) –
type_name (str) – name of the sequence that will be extracted

Return type

None

Combinatorial library module

This part of the design module is used for making combinatorial libraries from DNA fragments.

class teemi.design.combinatorial_design.DesignAssembly(list_of_seqs: List[List[pydna.dseqrecord.Dseqrecord]], list_of_pads: List[List[pydna.dseqrecord.Dseqrecord]], positions_of_pads: List[int], target_tm=55.0, limit=13, overlap=35, tm_func: Callable = <function tm_default>)

Bases: object

Make a combinatorial library from DNA fragments.

Parameters

list_of_seqs (List[List[Dseqrecord]]) – A list of constructs of choice.
list_of_pads (List[Dseqrecord]) – A nucleotide sequence to be incorporated into the primers (Max is 40 bp)
positions_of_pads (List[int]) – the position in the list of seqs where the pad is incorporated (zero indexed)

Returns

An object from which a lot of information can be retrieved. For example, the methods pcr_list_to_dataframe and primer_list_to_dataframe show all the amplicons and primers needed to construct a combinatorial library.

Return type

DesignAssembly object

pcr_list_to_dataframe(): Returns PCR_list as a pandas dataframe.

primer_list(): Returns the list of primers.

primer_list_to_dataframe(): Returns a pandas dataframe with the list of primers.

show_contigs(): Returns a string of the contigs generated by the assembly.

show_variants_lib_df(): Returns a dataframe of all the variants.

teemi.design.combinatorial_design.assembly_maker(combinatorial_list_of_amplicons: list, overlap=35)

Assembles Amplicons with pad and makes new overlapping primers.

Parameters

combinatorial_list_of_amplicons (list[[pydna.amplicon.Amplicon]]) – the list of pydna.amplicon.Amplicon objects that you want to generate overlapping primers for.
overlap (int, default 35) – Number of base pairs of overlap

Returns

List_of_assemblies – amplicons that overlap each other with the specified overlap value.

Return type

list[[pydna.amplicon.Amplicon]]

teemi.design.combinatorial_design.count_unique_parts(predictions_df, max_combinations)

Iterates through the DataFrame of predictions and saves newly encountered parts.

Parameters

predictions_df (pd.DataFrame) – DataFrame containing predictions.
max_combinations (int) – The maximum number of combinations to consider.

Returns

encountered_parts – A dictionary containing the unique parts encountered in the ‘G8H’, ‘pG8H’, ‘pCPR’ and ‘CPR’ columns, the total number of unique combinations encountered in ‘Sum of parts’, and the total predictions encountered in ‘Predictions’.

Return type

dict

teemi.design.combinatorial_design.get_assembly_figure(assembly_list, limit=15)

Generates a figure for the specified assembly in the assembly list.

Parameters

assembly_list (list) – The list of assemblies.
limit (int, optional) – The limit for the assembly, by default 15.

Returns

The figure for the specified assembly.

Return type

contig

teemi.design.combinatorial_design.get_combinatorial_list(input_list)

Generates all possible combinations from a list of lists.

Parameters: input_list (list of lists) – The input list of lists for which all possible combinations are to be generated.
Returns: combinations – A list of tuples representing all possible combinations of the elements in the input list of lists.
Return type: list of tuples

Example

>>> input_list = [[1, 2], ['a', 'b']]
>>> combinations = get_combinatorial_list(input_list)
>>> print(combinations)
[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
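
The same result can be reproduced with the standard library, which may help when reasoning about the order of the combinations. This is a sketch of the technique (a Cartesian product), not necessarily how the function is implemented; the name combinatorial_list_sketch is hypothetical.

```python
from itertools import product


def combinatorial_list_sketch(input_list):
    """Return every combination that takes one element from each sub-list."""
    # product(*lists) varies the last list fastest, like nested for-loops.
    return list(product(*input_list))


combinations = combinatorial_list_sketch([[1, 2], ["a", "b"]])
print(combinations)
```

The number of combinations is the product of the sub-list lengths, so libraries grow quickly as parts are added.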

teemi.design.combinatorial_design.get_primers(assemblies: list, names: list, primer_temps: list)

Returns a list of ALL primers from the combinatorial library, updates names and what they anneal to.

Parameters

assemblies (list[list[pydna.amplicon.Amplicon]]) –
names (list[(str)]) –
primer_temps (list[(float, float), ...]) –

Returns

primers – All primers that have been made for all assemblies

Return type

list[list[[pydna.primer.Primer, pydna.primer.Primer]]]

teemi.design.combinatorial_design.get_systematic_names(parts_list: list) → list

Returns a list of systematic names, e.g. (1,1,1), (1,2,1), etc.

Parameters: parts_list (list of list) – can have any type within the list[list[any_type]]
Returns: systematic_names – list of tuples with the systematic names, e.g. [(1,1,1),(1,2,1)]
Return type: list
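
One way such names could be generated is to number the options at each position from 1 and take the Cartesian product of those numbers. This is a sketch under that assumption; the ordering produced by the library may differ, and the function name and part names below are hypothetical.

```python
from itertools import product


def systematic_names_sketch(parts_list):
    """Return 1-based index tuples, one per combination of parts."""
    # One range per position: 1..number of alternatives at that position.
    index_ranges = [range(1, len(parts) + 1) for parts in parts_list]
    return list(product(*index_ranges))


# Two promoters, one gene, two terminators (hypothetical part names).
names = systematic_names_sketch([["pA", "pB"], ["geneX"], ["tT1", "tT2"]])
```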

teemi.design.combinatorial_design.simple_amplicon_maker(list_of_seqs: list, list_of_names: list, target_tm=56.0, limit=13, primer_tm_func=<function tm_default>)

Creates amplicons and updates their names.

Parameters

list_of_seqs (list[list[pydna.dseqrecord.Dseqrecord]]) – List of the pydna.dseqrecord.Dseqrecord elements you want to make into amplicons
list_of_names (list[list[str]]) – provide names for the sequences since pydna changes their names to amplicon

Returns

list_of_amplicons (list[pydna.amplicon.Amplicon]) – list with the pydna.amplicon.Amplicon objects that have been made
list_of_amplicon_primers (list[list[(pydna.seq.Seq, pydna.seq.Seq)]]) – a list of all the generated primers in tuples where index0 = forward primer and index1=reverse primer. Both are pydna.seq.Seq objects
list_of_amplicon_primer_temps (list[list[(float, float)]]) – a list of melting temperatures in tuples where index0 = forward primer melting temp and index1=reverse primer melting temp.

teemi.design.combinatorial_design.unique_amplicons(list_of_assemblies: list)

Finds unique amplicons from a list of assemblies.

Parameters: list_of_assemblies (list[[pydna.amplicon.Amplicon]]) – list of the combinatorial library with overlapping ends
Returns: unique_amplicons – a list of unique amplicons where relevant metrics are added to the objects.
Return type: list[pydna.amplicon.Amplicon]

teemi.design.combinatorial_design.unique_primers(primers: list, list_of_assemblies)

Finds unique primers from a list of assemblies.

Parameters

primers (list[list[list[pydna.primer.Primer]]]) – a list of all the primers made for the combinatorial library
list_of_assemblies (list[[pydna.amplicon.Amplicon]]) – used here to update the names of the primers

Returns: unique_primers – Relevant metrics for the unique primers of the combinatorial library.
Return type: list[list(ID,Anneals_to,Sequence,Annealing_temp,Length,Price(DKK))]

Fetch sequences module

This part of the design module is used for fetching sequences.

teemi.design.fetch_sequences.fetch_multiple_promoters(List_of_promoter_names: list)

Retrieves yeast promoter sequences from InterMine.

Parameters: List_of_promoter_names (list) – list of strings of promoter names, e.g. [‘YAR035C-A’, ‘YGR067C’, ‘JEN1’, ‘YNR034W-A’, ‘ACH1’]
Return type: list of Bio.SeqRecord.SeqRecord

teemi.design.fetch_sequences.fetch_promoter(promoter_name: str)

teemi.design.fetch_sequences.read_fasta_files(path)

Reads FASTA files.

Parameters: path (str) – path to the FASTA file you want to read.
Return type: list of Bio.SeqRecord.SeqRecord

teemi.design.fetch_sequences.read_genbank_files(path)

Reads single GenBank files.

Parameters: path (str) – path to the GenBank file you want to read.
Return type: list of Bio.SeqRecord.SeqRecord

teemi.design.fetch_sequences.retrieve_sequences_from_PDB(query: list)

Retrieves sequences from PDB.

Parameters: query (list) – list of accession numbers in the form of strings
Return type: list of Bio.SeqRecord.SeqRecord

teemi.design.fetch_sequences.retrieve_sequences_from_ncbi(list_of_acc_numbers: list, out_file: str, db='protein')

Retrieves sequences from NCBI.

Parameters: list_of_acc_numbers (list) – list of accession numbers such as [‘Q05001’, ‘Q1PQK4’, ‘Q9SB48’, ‘AFX82679’]
Returns: A FASTA file with your sequences

Retrieve gene homologs module

This part of the design module is used for fetching gene homologs.

teemi.design.retrieve_gene_homologs.alignment_identity(query: list, reference: str) → list

Calculates percent identity between a reference and one or more queries.

Parameters

query (list) – list of Biopython SeqRecord objects
reference (str) –

Return type

list of percent identities as floats
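
What "percent identity" means can be shown with a small sketch. It assumes the sequences are already aligned to the same length (with "-" for gaps) and divides the number of matching positions by the alignment length; the library may align the sequences first and may normalise differently. The function name and the peptide strings are hypothetical.

```python
def percent_identity_sketch(query: str, reference: str) -> float:
    """Percent of aligned positions where query and reference agree."""
    if len(query) != len(reference) or not reference:
        raise ValueError("sequences must be aligned, non-empty and equally long")
    # A gap never counts as a match, even if both sequences have one.
    matches = sum(q == r and q != "-" for q, r in zip(query, reference))
    return 100.0 * matches / len(reference)


reference = "MKTAYIAR"
queries = ["MKTAYIAK", "MKTAYIAR"]
identities = [percent_identity_sketch(q, reference) for q in queries]
```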

teemi.design.retrieve_gene_homologs.all_orfs(seq): Return all ORFs of a sequence. This function was made by Justin Bois: http://justinbois.github.io/.

teemi.design.retrieve_gene_homologs.codon_optimize_with_dnachisel(sequences: List[Bio.SeqRecord.SeqRecord], lower_GC: float = 0.3, upper_GC: float = 0.7, species: Optional[str] = None, codon_usage_table=None, window: int = 50) → List[Bio.SeqRecord.SeqRecord]

Codon-optimizes sequences with DNA Chisel.

Parameters

sequences (list) – list of Bio.SeqRecord objects
lower_GC (float) – the lowest GC content in the region of 50 bp
upper_GC (float) – the highest GC content in the region of 50 bp
species (str) – name of the species for which to optimize the sequence. Examples: e_coli, s_cerevisiae, h_sapiens, c_elegans, b_subtilis, d_melanogaster. Check python_codon_tables for more info.
codon_usage_table – a codon table following the structure of: {‘*’: {‘TAA’: 0.0, ‘TAG’: 0.0, ‘TGA’: 1.0},…

Returns

List of codon-optimized sequences for yeast.

Return type

List[Bio.SeqRecord.SeqRecord]
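
The lower_GC, upper_GC and window parameters describe a sliding-window GC constraint. The sketch below only checks such a constraint on a plain string and does not perform any optimization or call DNA Chisel; the function name and the test sequence are hypothetical.

```python
def gc_windows_outside_bounds(seq, lower_gc=0.3, upper_gc=0.7, window=50):
    """Return (start, gc_fraction) for every window outside the GC bounds."""
    seq = seq.upper()
    if not seq:
        return []
    flagged = []
    # Sequences shorter than the window are checked as a single chunk.
    for start in range(max(len(seq) - window + 1, 1)):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / len(chunk)
        if gc < lower_gc or gc > upper_gc:
            flagged.append((start, gc))
    return flagged


# 50 nt without any G/C followed by 50 nt of pure G/C.
flagged = gc_windows_outside_bounds("AT" * 25 + "GC" * 25)
flagged_starts = [start for start, _ in flagged]
```

Windows near either end of this test sequence are flagged, while the window starting at position 25 (half AT, half GC) passes.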

teemi.design.retrieve_gene_homologs.filter_blast_results(blast_record, E_VALUE_THRESH=0.4, LOWER_PROTEIN_IDENTITY_THRESH=0.1, UPPER__PROTEIN_IDENTITY_THRESH=1, show_alignment=False)

teemi.design.retrieve_gene_homologs.find_all_starts(seq): Find the starting index of all start codons in a lowercase seq. This function was made by Justin Bois: http://justinbois.github.io/.

teemi.design.retrieve_gene_homologs.find_first_in_register_stop(seq): Find the first stop codon on a lowercase seq that starts at an index that is divisible by three. This function was made by Justin Bois: http://justinbois.github.io/.

teemi.design.retrieve_gene_homologs.longest_orf(seq, n=1): Longest ORF of a sequence. This function was made by Justin Bois: http://justinbois.github.io/.
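
How the ORF helpers above fit together can be sketched as follows. This is an independent illustration of the approach their descriptions suggest (find every start codon, then the first in-frame stop after it), not the library's or the original author's code. It scans only the given strand, and all function names ending in _sketch are hypothetical.

```python
import re

STOP_CODONS = ("taa", "tag", "tga")


def find_all_starts_sketch(seq):
    """Return the index of every 'atg' in a lowercase sequence."""
    return tuple(match.start() for match in re.finditer("atg", seq))


def find_first_in_register_stop_sketch(seq):
    """Return the index just past the first in-frame stop codon, or -1."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3
    return -1


def longest_orf_sketch(seq):
    """Return the longest start-to-stop ORF on the given strand."""
    seq = seq.lower()
    best = ""
    for start in find_all_starts_sketch(seq):
        stop = find_first_in_register_stop_sketch(seq[start:])
        if stop != -1 and stop > len(best):
            best = seq[start:start + stop]
    return best


seq = "ccatgaaaccctagggatgtaa"
starts = find_all_starts_sketch(seq)
orf = longest_orf_sketch(seq)
```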