Design module
Cloning module
Module used for cloning of microbial strains.
- teemi.design.cloning.CAS9_cutting(gRNA_record, background_record)[source]
Simulates double-stranded-break by CAS9 given a gRNA.
- Parameters
gRNA_record (pydna.dseqrecord.) – A 20 bp DNA sequence
background_record (pydna.dseqrecord.) – The sequence of interest for CRISPR mediated DSB
- Returns
1. up (pydna.dseqrecord.) – Sequence upstream of the DSB: pydna.dseqrecord.
2. dw (pydna.dseqrecord.) – Sequence downstream of the DSB: pydna.dseqrecord.
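The split into an upstream and a downstream fragment can be illustrated on plain strings. This is a minimal sketch and not the teemi implementation, which works on pydna objects. It assumes an SpCas9-style blunt cut 3 bp upstream of the PAM (17 nt into a 20 nt protospacer) and searches the forward strand only. The function name `cas9_cut_sketch` and the sequences are hypothetical.

```python
def cas9_cut_sketch(grna: str, background: str, cut_offset: int = 17):
    """Split `background` where a 20-nt protospacer would direct the cut.

    Assumption: the cut falls `cut_offset` nt after the protospacer start,
    i.e. 3 bp upstream of the PAM. Only the forward strand is searched.
    """
    grna, background = grna.upper(), background.upper()
    start = background.find(grna)
    if start == -1:
        raise ValueError("gRNA not found on the forward strand")
    cut = start + cut_offset
    return background[:cut], background[cut:]


protospacer = "GATTACAGATTACAGATTAC"                  # 20 nt, hypothetical
background = "TTTT" + protospacer + "TGG" + "AAAA"    # TGG plays the PAM
up, dw = cas9_cut_sketch(protospacer, background)
print(up)   # sequence upstream of the break
print(dw)   # sequence downstream of the break
```

Joining the two fragments restores the input, which is a convenient sanity check for any cut simulation.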
- teemi.design.cloning.USER_enzyme(amplicon)[source]
Simulates digestion with USER enzyme.
- Parameters
amplicon (pydna.amplicon.Amplicon) – A pydna.amplicon.Amplicon with uracil integrated
- Returns
USER digested Dseqrecord with USER tails
- Return type
Dseqrecord
- teemi.design.cloning.add_feature_annotation_to_seqrecord(sequence: Bio.SeqRecord, label='', type_name='misc_feature', strand=0) → None[source]
Adds feature, label and name to a Bio.SeqRecord sequence.
- Parameters
sequence (Bio.SeqRecord) –
label (str, optional) –
type_name (str, default “misc_feature”) –
strand (int, default 0) –
- Returns
- Return type
None
- teemi.design.cloning.casembler(bg_strain, site_names=None, gRNAs=None, parts=None, assembly_limits=None, assembly_names=None, verbose=False, to_benchling=False)[source]
Simulates in vivo assembly and integration, with the option of writing the result to GenBank (gb) files or sending it directly to Benchling.
- Parameters
bg_strain (GenBank) – strain of choice, e.g. a GenBank file
site_names (list) – list of names e.g. [X-3, XI-3]
gRNAs (Seqrecords) – list of 20 bp seqrecords e.g. [ATF1_gRNA, CroCPR_gRNA]
parts (list) – list of lists of parts e.g. [[ATF1_repair_template],[CPR_repair_template]]
assembly_limits (list) – list of assembly limits in bp e.g. [200,400]
assembly_names (list) – list of names of the DNA post assembly e.g. [“X_3_tADH1_P2_pPGK1”, “XI_3_UP_DW”]
verbose (bool) – write DNA e.g. False
to_benchling (bool) – upload DNA to Benchling e.g. False
- Returns
One dseqrecord of the assembled contig
- Return type
dseqrecord
- teemi.design.cloning.crispr_db_break_location(start_location, end_location, strand)[source]
Determine the CRISPR cut location in the genome.
- Parameters
start_location (int) – Start position of the sgRNA sequence in the chromosome.
end_location (int) – End position of the sgRNA sequence in the chromosome.
strand (int) – Strand of the sgRNA sequence in the chromosome, +1 for positive strand, -1 for negative strand.
- Returns
crispr_db_break – CRISPR cut location in the genome.
- Return type
int
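The strand dependence of the cut position comes down to a small piece of arithmetic. The sketch below is an illustration, not the teemi source. It assumes 0-based, half-open coordinates for the sgRNA match and a cut 3 bp from the PAM-proximal end of the protospacer; the name `break_location_sketch` and the `offset` parameter are hypothetical.

```python
def break_location_sketch(start: int, end: int, strand: int, offset: int = 3) -> int:
    """Return the assumed cut coordinate for an sgRNA match.

    On the +1 strand the PAM follows `end`, so the cut lies `offset` bp
    before `end`. On the -1 strand the PAM precedes `start` in chromosome
    coordinates, so the cut lies `offset` bp after `start`.
    """
    if strand == 1:
        return end - offset
    if strand == -1:
        return start + offset
    raise ValueError("strand must be +1 or -1")


print(break_location_sketch(100, 120, 1))    # 117
print(break_location_sketch(100, 120, -1))   # 103
```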
- teemi.design.cloning.extract_gRNAs(template, name)[source]
Extracts gRNAs from a template.
- Parameters
template (pydna.dseqrecord or pydna.amplicon.Amplicon) – a plasmid or piece of DNA
name (str) – a string that would include the feature name for example: gRNA
- Returns
List of the found features and their sequences
- Return type
list of pydna.dseqrecord or pydna.amplicon.Amplicon
- teemi.design.cloning.extract_sites(annotations, templates, names)[source]
This function extracts the sequences from annotated sequences based on their names
- Parameters
annotations (list) – list of annotations for sequences that will be extracted
templates (list of Bio.SeqRecord.SeqRecord) – A list of Bio.SeqRecord.SeqRecord with SeqFeatures
names (str) – name of the sequence that will be extracted
- Returns
record – list of extracted sites
- Return type
list of Bio.SeqRecord.SeqRecord
- teemi.design.cloning.extract_template_amplification_sites(templates, names, terminator)[source]
Extracts amplification sites from the features of the templates.
- Parameters
templates (list of Bio.SeqRecord.SeqRecord) – list of Bio.SeqRecord.SeqRecord objects with SeqFeatures
names (list of strings) – list of strings to be extracted
terminator (str) – a string with the name of upstream terminator
- Returns
record – list of extracted elements
- Return type
list of Bio.SeqRecord.SeqRecord
- teemi.design.cloning.find_all_occurrences_of_a_sequence(sequence: Bio.SeqRecord, sequence_to_search_in: Bio.SeqRecord) → tuple[source]
Searches for all occurrences of a given sequence in a given string.
- Parameters
sequence (Bio.SeqRecord) – Sequence to search for.
sequence_to_search_in (Bio.SeqRecord) – Sequence to search in.
- Returns
Number of occurrences of sequence in sequence_to_search_in.
- Return type
tuple
- teemi.design.cloning.find_sequence_location(sequence: Bio.SeqRecord, sequence_to_search_in: Bio.SeqRecord) → tuple[source]
Finds the start and end location of a matched sequence.
- Parameters
sequence (str) –
sequence_to_search_in (Bio.SeqRecord) –
- Returns
(start_index,end_index)
- Return type
tuple
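Counting occurrences and reporting their coordinates, as find_all_occurrences_of_a_sequence and find_sequence_location do, can be sketched with plain strings. This is an assumption-laden illustration rather than the library code: it searches both strands, reports 0-based half-open coordinates, and uses the hypothetical names `reverse_complement` and `find_locations_sketch`.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_locations_sketch(query: str, target: str):
    """Return sorted (start, end, strand) tuples for every match of `query`.

    Matches on the reverse strand are reported in forward-strand
    coordinates with strand -1. Overlapping matches are included, and a
    palindromic query is reported once per strand.
    """
    query, target = query.upper(), target.upper()
    hits = []
    for strand, pattern in ((1, query), (-1, reverse_complement(query))):
        pos = target.find(pattern)
        while pos != -1:
            hits.append((pos, pos + len(pattern), strand))
            pos = target.find(pattern, pos + 1)
    return sorted(hits)


hits = find_locations_sketch("GATTACA", "AAGATTACATTTGTAATCGG")
print(len(hits))   # number of occurrences
print(hits)        # [(2, 9, 1), (11, 18, -1)]
```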
- teemi.design.cloning.nicking_enzyme(vector)[source]
Nicks a vector with Nt.Bbc.CI (a nicking enzyme) at the sequence ‘CGCGTG’ on the Watson strand and ‘CGCACG’ on the Crick strand.
- Parameters
vector (Dseq) – digested Dseqrecord, usually with an AsiSI or similar overhang
- Returns
Vector with a nick, ready for USER cloning
- Return type
Dseq
- teemi.design.cloning.plate_plot(df, value)[source]
Plots a 96 well plate as a pandas df.
- Parameters
df (pd.DataFrame) – A pandas dataframe.
value (str) – The name of the pandas dataframe column that you want to display.
- Returns
The chosen column laid out in a 96 well plate format.
- Return type
pd.DataFrame
Example
# Initialize:
amplicon_df = {
    'name': ['PCR_G8H_01', 'PCR_G8H_05', ...],
    'location': ['l5_A03', 'l5_A07', ...],
    ...
}
# Call the function:
plate_plot(amplicon_df, 'name')

name
pcol           1           2           3           4           5  6  7  8  9  10  11  12
prow
A     PCR_G8H_01  PCR_G8H_05  PCR_G8H_09  PCR_G8H_13  PCR_G8H_17  ...  ...
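The plate layout itself is a pivot from a long list of records to an 8 x 12 grid. The dependency-free sketch below shows the idea only; it is not how plate_plot is implemented. The axis names `prow` and `pcol` are taken from the example output above, but that each record carries them as fields is an assumption, and `plate_layout_sketch` is a hypothetical name.

```python
import string


def plate_layout_sketch(records, value, n_rows=8, n_cols=12):
    """Arrange `value` from each record on a rows x columns grid.

    Assumption: every record has 'prow' (a letter, A-H) and 'pcol'
    (an integer, 1-12). Wells without a record stay None.
    """
    grid = {row: [None] * n_cols for row in string.ascii_uppercase[:n_rows]}
    for record in records:
        grid[record["prow"]][record["pcol"] - 1] = record[value]
    return grid


records = [
    {"name": "PCR_G8H_01", "prow": "A", "pcol": 1},
    {"name": "PCR_G8H_05", "prow": "A", "pcol": 2},
]
grid = plate_layout_sketch(records, "name")
print(grid["A"][:3])   # ['PCR_G8H_01', 'PCR_G8H_05', None]
```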
- teemi.design.cloning.remove_features_with_negative_loc(record)[source]
Removes SeqFeatures that have a negative location.
- Parameters
record (pydna.amplicon.Amplicon) – An amplicon with SeqFeatures and locations
- Returns
record – With the negative features deleted
- Return type
pydna.amplicon.Amplicon
- teemi.design.cloning.seq_to_annotation(seq_record_from: Bio.SeqRecord, seq_record_onto: Bio.SeqRecord, type_name: str)[source]
Annotates a Bio.SeqRecord object from another Bio.SeqRecord object.
- Parameters
seq_record_from (Bio.SeqRecord) – annotation sequence that will be extracted
seq_record_onto (Bio.SeqRecord) –
type_name (str) – name of the sequence that will be extracted
- Returns
- Return type
None
Combinatorial library module
This part of the design module is used for making combinatorial libraries from DNA fragments.
- class teemi.design.combinatorial_design.DesignAssembly(list_of_seqs: List[List[pydna.dseqrecord.Dseqrecord]], list_of_pads: List[List[pydna.dseqrecord.Dseqrecord]], positions_of_pads: List[int], target_tm=55.0, limit=13, overlap=35, tm_func: Callable = <function tm_default>)[source]
Bases: object
Make a combinatorial library from DNA fragments.
- Parameters
list_of_seqs (List[List[Dseqrecord]]) – A list of lists of the constructs of choice.
list_of_pads (List[Dseqrecord]) – Nucleotide sequences to be incorporated into the primers (max 40 bp)
positions_of_pads (List[int]) – the positions in the list of seqs where the pads are incorporated (zero indexed)
- Returns
An object from which a lot of information can be retrieved, for example all the amplicons and primers needed to construct a combinatorial library, via the methods pcr_list_to_dataframe and primer_list_to_dataframe.
- Return type
DesignAssembly object
- pcr_list_to_dataframe()[source]
Prints PCR_list into a pandas dataframe
- primer_list()[source]
Return the list of primers
- primer_list_to_dataframe()[source]
Return a pandas dataframe with list of primers.
- show_contigs()[source]
Returns a string of the contigs generated by the assembly
- show_variants_lib_df()[source]
Returns a dataframe of all the variants
- teemi.design.combinatorial_design.assembly_maker(combinatorial_list_of_amplicons: list, overlap=35)[source]
Assembles Amplicons with pad and makes new overlapping primers.
- Parameters
combinatorial_list_of_amplicons (list[[pydna.amplicon.Amplicon]]) – the list of pydna.amplicon.Amplicon that you want to generate overlapping primers for.
overlap (int (default set to 35)) – Number of base pairs of overlap
- Returns
List_of_assemblies – amplicons that overlap each other with the specified overlap value.
- Return type
list[[pydna.amplicon.Amplicon]]
- teemi.design.combinatorial_design.count_unique_parts(predictions_df, max_combinations)[source]
Iterates through the DataFrame of predictions and saves newly encountered parts.
- Parameters
predictions_df (pd.DataFrame) – DataFrame containing predictions.
max_combinations (int) – The maximum number of combinations to consider.
- Returns
encountered_parts – A dictionary containing the unique parts encountered in ‘G8H’,’pG8H’, ‘pCPR’, ‘CPR’ columns, total number of unique combinations encountered in ‘Sum of parts’ and total predictions encountered in ‘Predictions’.
- Return type
dict
- teemi.design.combinatorial_design.get_assembly_figure(assembly_list, limit=15)[source]
Generates a figure for the specified assembly in the assembly list.
- Parameters
assembly_list (list) – The list of assemblies.
limit (int, optional) – The limit for the assembly, by default 15.
- Returns
The figure for the specified assembly.
- Return type
contig
- teemi.design.combinatorial_design.get_combinatorial_list(input_list)[source]
Generates all possible combinations from a list of lists.
- Parameters
input_list (list of lists) – The input list of lists for which all possible combinations are to be generated.
- Returns
combinations – A list of tuples representing all possible combinations of the elements in the input list of lists.
- Return type
list of tuples
Example
>>> input_list = [[1, 2], ['a', 'b']]
>>> combinations = get_combinatorial_list(input_list)
>>> print(combinations)
[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
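The behaviour in this example is a Cartesian product, which the standard library provides directly. The following is a minimal sketch of that idea, not necessarily how get_combinatorial_list is written; `combinatorial_list_sketch` is a hypothetical name.

```python
from itertools import product


def combinatorial_list_sketch(input_list):
    """Return every combination that picks one element from each sublist."""
    return list(product(*input_list))


print(combinatorial_list_sketch([[1, 2], ["a", "b"]]))
# [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
```

The library size is the product of the sublist lengths, so three positions with 4, 4 and 2 candidate parts give 32 combinations.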
- teemi.design.combinatorial_design.get_primers(assemblies: list, names: list, primer_temps: list)[source]
Returns a list of ALL primers from the combinatorial library, updates names and what they anneal to.
- Parameters
assemblies (list[list[pydna.amplicon.Amplicon]]) –
names (list[(str)]) –
primer_temps (list[(float, float), ...]) –
- Returns
primers – All primers that have been made for all assemblies
- Return type
list[list[[pydna.primer.Primer, pydna.primer.Primer]]]
- teemi.design.combinatorial_design.get_systematic_names(parts_list: list) → list[source]
Returns a list of systematic names, e.g. (1,1,1), (1,2,1), etc.
- Parameters
parts_list (list of list) – can have any type within the list[list[any_type]]
- Returns
systematic_names – list of tuples with the systematic names e.g. [(1,1,1),(1,2,1)]
- Return type
list of tuples
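Systematic names of this kind can be produced by numbering the candidates at each position from 1 and taking the product of those numberings. The sketch below is one way to do that under the assumption that names follow the same order as the combinations; `systematic_names_sketch` and the part labels are hypothetical.

```python
from itertools import product


def systematic_names_sketch(parts_list):
    """Return 1-based index tuples, one per combination of parts."""
    numberings = (range(1, len(parts) + 1) for parts in parts_list)
    return list(product(*numberings))


# One promoter, two genes, one terminator (labels are placeholders)
print(systematic_names_sketch([["pA"], ["g1", "g2"], ["tX"]]))
# [(1, 1, 1), (1, 2, 1)]
```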
- teemi.design.combinatorial_design.simple_amplicon_maker(list_of_seqs: list, list_of_names: list, target_tm=56.0, limit=13, primer_tm_func=<function tm_default>)[source]
Creates amplicons, updates their names
- Parameters
list_of_seqs (list[list[pydna.dseqrecord.Dseqrecord]]) – List of the pydna.dseqrecord.Dseqrecord elements you want to make into amplicons
list_of_names (list[list[str]]) – provide names for the sequences since pydna changes their names to amplicon
- Returns
list_of_amplicons (list[pydna.amplicon.Amplicon]) – list with the pydna.amplicon.Amplicon objects that have been made
list_of_amplicon_primers (list[list[(pydna.seq.Seq, pydna.seq.Seq)]]) – a list of all the generated primers in tuples where index0 = forward primer and index1=reverse primer. Both are pydna.seq.Seq objects
list_of_amplicon_primer_temps (list[list[(float, float)]]) – a list of melting temperatures in tuples where index0 = forward primer melting temp and index1=reverse primer melting temp.
- teemi.design.combinatorial_design.unique_amplicons(list_of_assemblies: list)[source]
Finds unique amplicons from a list of assemblies.
- Parameters
list_of_assemblies (list[[pydna.amplicon.Amplicon]]) – list of the combinatorial library with overlapping ends
- Returns
unique_amplicons – returns a list of unique amplicons where relevant metrics are added to the objects.
- Return type
list[pydna.amplicon.Amplicon]
- teemi.design.combinatorial_design.unique_primers(primers: list, list_of_assemblies)[source]
Finds unique primers from a list of assemblies.
- Parameters
primers (list[list[list[pydna.primer.Primer]]]) – a list of all the primers made for the combinatorial library
list_of_assemblies (list[[pydna.amplicon.Amplicon]]) – used here to update the names of the primers
- Returns
unique_primers – Relevant metrics for the unique primers of the combinatorial library.
- Return type
list[list(ID,Anneals_to,Sequence,Annealing_temp,Length,Price(DKK))]
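In a combinatorial library the same primer is designed many times, so finding the unique ones is essentially order-preserving de-duplication by sequence. The sketch below shows that step on plain `(name, sequence)` tuples. It is an illustration under that assumption, not the teemi implementation, and it does not compute the metrics listed in the return type; `unique_by_sequence_sketch` is a hypothetical name.

```python
def unique_by_sequence_sketch(primers):
    """Keep the first primer seen for each distinct sequence.

    `primers` is an iterable of (name, sequence) tuples. Comparison is
    case-insensitive and sequences are returned in upper case.
    """
    first_seen = {}
    for name, sequence in primers:
        # dicts keep insertion order, so the original order is preserved
        first_seen.setdefault(sequence.upper(), name)
    return [(name, sequence) for sequence, name in first_seen.items()]


primers = [("P1", "ACGTACGT"), ("P2", "acgtacgt"), ("P3", "TTTTGGGG")]
print(unique_by_sequence_sketch(primers))
# [('P1', 'ACGTACGT'), ('P3', 'TTTTGGGG')]
```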
Fetch sequences module
This part of the design module is used for fetching sequences.
- teemi.design.fetch_sequences.fetch_multiple_promoters(List_of_promoter_names: list)[source]
Retrieves yeast promoter sequences from InterMine.
- Parameters
List_of_promoter_names (list) – list of strings of promoter names, e.g. [‘YAR035C-A’, ‘YGR067C’, ‘JEN1’, ‘YNR034W-A’, ‘ACH1’]
- Returns
- Return type
list of Bio.SeqRecord.SeqRecord
- teemi.design.fetch_sequences.fetch_promoter(promoter_name: str)[source]
- teemi.design.fetch_sequences.read_fasta_files(path)[source]
Reads FASTA files.
- Parameters
path (str) – path to the fasta file you want to read.
- Returns
- Return type
list of Bio.SeqRecord.SeqRecord
- teemi.design.fetch_sequences.read_genbank_files(path)[source]
Reads single GenBank files.
- Parameters
path (str) – path to the genbank file you want to read.
- Returns
- Return type
list of Bio.SeqRecord.SeqRecord
- teemi.design.fetch_sequences.retrieve_sequences_from_PDB(query: list)[source]
Retrieves sequences from PDB.
- Parameters
query (list) – list of accession numbers in the form of strings
- Returns
- Return type
list of Bio.SeqRecord.SeqRecord
- teemi.design.fetch_sequences.retrieve_sequences_from_ncbi(list_of_acc_numbers: list, out_file: str, db='protein')[source]
Retrieves sequences from NCBI.
- Parameters
list_of_acc_numbers (list) – list of accession numbers such as: [‘Q05001’, ‘Q1PQK4’, ‘Q9SB48’, ‘AFX82679’]
- Returns
- Return type
A fasta file with your sequences
Retrieve gene homologs module
This part of the design module is used for fetching gene homologs.
- teemi.design.retrieve_gene_homologs.alignment_identity(query: list, reference: str) → list[source]
Calculates percent identity between a reference and one or more queries.
- Parameters
query (list) – list of Biopython SeqRecord objects
reference (str) –
- Returns
- Return type
list of percent identities as floats
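Percent identity is the share of alignment columns in which query and reference carry the same residue. The sketch below computes it for two sequences that are already aligned to equal length, with `-` marking gaps; the alignment step is left out. Using the alignment length as the denominator is one common convention and an assumption here, as is the name `percent_identity_sketch`.

```python
def percent_identity_sketch(aligned_query: str, aligned_reference: str) -> float:
    """Percent of alignment columns with identical, non-gap residues."""
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_reference:
        raise ValueError("empty alignment")
    matches = sum(
        q == r and q != "-"
        for q, r in zip(aligned_query.upper(), aligned_reference.upper())
    )
    return 100.0 * matches / len(aligned_reference)


reference = "MKTALV"
queries = ["MKTALV", "MKT-LV", "MKTALI"]
print([round(percent_identity_sketch(q, reference), 1) for q in queries])
# [100.0, 83.3, 83.3]
```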
- teemi.design.retrieve_gene_homologs.all_orfs(seq)[source]
Return all ORFs of a sequence. This function was made by Justin Bois : http://justinbois.github.io/.
- teemi.design.retrieve_gene_homologs.codon_optimize_with_dnachisel(sequences: List[Bio.SeqRecord.SeqRecord], lower_GC: float = 0.3, upper_GC: float = 0.7, species: Optional[str] = None, codon_usage_table=None, window: int = 50) → List[Bio.SeqRecord.SeqRecord][source]
Codon-optimizes sequences with dnachisel.
- Parameters
sequences (list) – list of Bio.SeqRecord objects
lower_GC (float) – the lowest GC content allowed within the window (50 bp by default)
upper_GC (float) – the highest GC content allowed within the window (50 bp by default)
species (str) – name of the species for which to optimize the sequence, e.g. ‘e_coli’, ‘s_cerevisiae’, ‘h_sapiens’, ‘c_elegans’, ‘b_subtilis’, ‘d_melanogaster’. Check python_codon_tables for more info.
codon_usage_table – a codon table following the structure of: {‘*’: {‘TAA’: 0.0, ‘TAG’: 0.0, ‘TGA’: 1.0},…
- Returns
- Return type
list of codon optimized sequences for yeast
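The lower_GC, upper_GC and window parameters describe a sliding-window constraint on GC content. The sketch below only checks a sequence against such a constraint; it does not optimize anything and is not part of teemi or dnachisel. The name `gc_window_violations_sketch` is hypothetical, and the defaults mirror the signature above.

```python
def gc_window_violations_sketch(seq, lower_gc=0.3, upper_gc=0.7, window=50):
    """Return start indices of windows whose GC fraction is out of bounds.

    Sequences shorter than `window` are checked as a single window.
    """
    seq = seq.upper()
    if not seq:
        return []
    violations = []
    for start in range(max(len(seq) - window + 1, 1)):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / len(chunk)
        if not lower_gc <= gc <= upper_gc:
            violations.append(start)
    return violations


# 50 nt without G/C followed by 50 nt of pure G/C: only the windows
# spanning the junction stay within 30-70 % GC.
seq = "AT" * 25 + "GC" * 25
bad = gc_window_violations_sketch(seq)
print(len(bad), bad[0], bad[-1])   # 30 0 50
```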
- teemi.design.retrieve_gene_homologs.filter_blast_results(blast_record, E_VALUE_THRESH=0.4, LOWER_PROTEIN_IDENTITY_THRESH=0.1, UPPER__PROTEIN_IDENTITY_THRESH=1, show_alignment=False)[source]
- teemi.design.retrieve_gene_homologs.find_all_starts(seq)[source]
Find the starting index of all start codons in a lowercase seq. This function was made by Justin Bois : http://justinbois.github.io/.
- teemi.design.retrieve_gene_homologs.find_first_in_register_stop(seq)[source]
Find first stop codon on lowercase seq that starts at an index that is divisible by three. This function was made by Justin Bois : http://justinbois.github.io/.
- teemi.design.retrieve_gene_homologs.longest_orf(seq, n=1)[source]
Longest ORF of a sequence. This function was made by Justin Bois : http://justinbois.github.io/.
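The ORF helpers above fit together as: locate every start codon, scan in frame from each start to the first stop codon, and keep the longest hit. The sketch below was written for illustration and is not copied from the referenced implementation. The return convention (index just past the stop codon, -1 when none is found), the forward-strand-only search and all `_sketch` names are assumptions.

```python
STOP_CODONS = ("taa", "tag", "tga")


def find_all_starts_sketch(seq):
    """Indices of every 'atg' in a lowercase sequence."""
    starts = []
    i = seq.find("atg")
    while i != -1:
        starts.append(i)
        i = seq.find("atg", i + 1)
    return starts


def first_in_register_stop_sketch(seq):
    """Index just past the first stop codon in frame 0, or -1 if none."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3
    return -1


def longest_orf_sketch(seq):
    """Longest start-to-stop ORF on the forward strand ('' if none)."""
    seq = seq.lower()
    best = ""
    for start in find_all_starts_sketch(seq):
        end = first_in_register_stop_sketch(seq[start:])
        if end > len(best):
            best = seq[start:start + end]
    return best


seq = "ccatgaaatgcccgggtaattatgtag"
print(find_all_starts_sketch(seq))   # [2, 7, 21]
print(longest_orf_sketch(seq))       # atgcccgggtaa
```

The first start codon at index 2 has no in-frame stop, so it contributes no ORF even though it lies furthest upstream.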