WO2016193846A2 - Degenerate primer sets - Google Patents

Publication number
WO2016193846A2
Authority
WO
WIPO (PCT)
Prior art keywords
primer
pathogens
pathogen
sequence
amplicon
Application number
PCT/IB2016/052852
Other languages
French (fr)
Other versions
WO2016193846A3 (en)
Inventor
Sonia CHOTHANI
Sitharthan Kamalakaran
Original Assignee
Koninklijke Philips N.V.
Application filed by Koninklijke Philips N.V.
Publication of WO2016193846A2
Publication of WO2016193846A3

Classifications

    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 - Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888 - Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C12Q1/689 - Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 - ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 - Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 - Oligonucleotides characterized by their use
    • C12Q2600/16 - Primer sets for multiplex assays
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 - ICT specially adapted for sequence analysis involving nucleotides or amino acids

Definitions

  • the present invention generally relates to primer sets for amplifying pathogenic bacteria, and more specifically to a degenerate primer set and the design of the same.
  • Embodiments of the present invention provide a set of broad ranging primers to allow rapid amplification of any pathogenic bacteria as well as a methodology for the design of these primers.
  • One approach to find such a set in accord with the present invention is to design primers for regions/genes that are conserved enough for universal primer binding and at the same time have enough variability to give distinguishing capacity.
  • the set may be selected based on optimizing for degeneracy and coverage.
  • embodiments of the present invention relate to a method of amplifying nucleic acid in a sample.
  • the method includes selecting at least one primer from Table 1 for the amplification of nucleic acid in a sample and producing an amplicon using the selected at least one primer.
  • the method further includes sequencing the amplicon and obtaining an amplicon sequence; comparing the amplicon sequence to a list of nucleic acids; and identifying the presence of at least one pathogen based on the comparison, wherein the list of nucleic acids corresponds to nucleic acid sequences found in at least one pathogen.
  • the identification of the at least one pathogen is performed in fewer than 10 hours.
  • the pathogen is bacterial.
  • the method further includes determining treatment based on the presence of the at least one identified pathogen; the sample may be obtained from a human and treatment may comprise administering an antibiotic to decrease the presence of the at least one identified pathogen.
  • the at least one primer from Table 1 comprises a primer with the sequence ACTCCTACGGGAGGCAGC. In one embodiment, the at least one primer from Table 1 comprises a primer set comprising a primer with the sequence ACTCCTACGGGAGGCAGC; a primer with the sequence CCAGCAGYYGCGGTAATA; and a primer with the sequence TCCTAAGGTAGCGAAATTCCT.
  • the at least one primer is optimized for the highest coverage of pathogens with the minimum number of primers. In one embodiment, the at least one primer is optimized for melting temperature and guanine-cytosine content. In one embodiment, the at least one primer is designed using phylogenetic conservation analysis. In one embodiment, the at least one primer is optimized based on at least one of degeneracy to cover a plurality of species of pathogens; and variability to distinguish among said plurality of species of pathogens.
  • embodiments of the present invention relate to a system for amplifying nucleic acid in a sample, the system comprising a primer set comprising at least one primer from Table 1.
  • embodiments of the present invention relate to a system for diagnosing the presence of at least one pathogen.
  • the system includes a primer set and a database of nucleic acids.
  • the primer set includes at least one primer from Table 1, the primer set optimized to produce an amplicon of nucleic acid in a sample.
  • the nucleic acids in the database correspond to the at least one pathogen, wherein the presence of the at least one pathogen in a sample is identifiable based on a comparison of the sequence of the amplicon to the database of nucleic acids corresponding to the at least one pathogen.
  • the primer set comprises a primer with the sequence ACTCCTACGGGAGGCAGC, a primer with the sequence CCAGCAGYYGCGGTAATA, and a primer with the sequence TCCTAAGGTAGCGAAATTCCT.
  • the at least one primer is further optimized based on at least one of: degeneracy to cover a plurality of species of pathogens; and variability to distinguish among said plurality of species of pathogens.
  • the system further includes a bioinformatics engine configured to sequence and compare the amplicon with the database of nucleic acids corresponding to the at least one pathogen; the bioinformatics engine may be further configured to produce a report including at least one of species ID, substrain ID, antibiotic for treatment, resistance value, and actionable treatment.
  • embodiments of the present invention relate to a method of obtaining a primer set.
  • the method includes identifying a plurality of pathogens of interest; identifying genes of said plurality of pathogens of interest; generating at least one primer for at least one of said identified genes; and repeating the foregoing steps to generate at least one additional primer for at least one other of said identified genes of said plurality of pathogens of interest until a primer is generated that corresponds to at least a majority of said plurality of pathogens of interest.
  • the at least one primer is generated based on an increased level of degeneracy among the identified genes of said plurality of pathogens of interest and variability among said identified genes of said plurality of pathogens of interest wherein said variability is sufficient to enable differentiation among said plurality of pathogens of interest.
  • FIG. 1 depicts an example of one embodiment of a method for primer design in accord with the present invention.
  • FIG. 2 illustrates a block diagram of an exemplary system for primer design according to the present invention.
  • like reference characters generally refer to corresponding parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed on the principles and concepts of operation.
  • Certain aspects of the present invention include process steps and instructions that could be embodied in software, firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by a variety of operating systems.
  • the present invention also relates to an apparatus for performing the operations herein.
  • This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer.
  • a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, solid state memory, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus or enterprise service bus.
  • the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability in a distributed manner.
  • Embodiments of the present invention concern a "universal" primer set to further reduce the turn-around time required for infectious disease diagnostics and thereby provide timely and accurate treatment to an infected patient.
  • the primer set encompasses most (i.e., > 90%) of these pathogens. Once the pathogen of interest is amplified using the primer set, it can be sequenced and mapped to a reference sequence to quickly identify the species in the infection.
  • primers are appended with adaptors to comply with Illumina indexing for sequencing.
  • Using multiplex PCR and NextGen sequencing, the amplicon sequence is obtained.
  • a local BLAST may then be run on the amplicon sequence to identify the pathogen associated with the infection.
  • a primer is a short nucleic acid sequence used as a starting point for DNA synthesis. In PCR, primers are used to determine the DNA fragment to be amplified by the PCR process.
  • a primer may be designed to be specific to a particular organism's gene sequence or it may be degenerate, i.e., designed for a gene sequence that is independent of a particular organism.
  • embodiments of the present invention focus on regions of genes (Step 100) that are conserved enough across pathogens to permit the selection of a common set of primers to enable the sequencing of the maximum number of pathogens, while having enough variability to differentiate species based on amplicons.
  • ribosomal genes are the target gene regions for the primer design process. Particular ribosomal genes of interest are the 16S and 23S genes, but there are also some other genes like rpsB, rpsM, rplP, etc., that have such properties. In other embodiments, some protein coding marker genes are considered conserved enough to determine phylogeny and are used as the target gene regions; such coding marker genes may be identified using a tool for phylogenetic analysis of genomes and metagenomes such as PHYLOSIFT, available from https://phylosift.wordpress.com/.
  • HYDEN available from http://acgt.cs.tau.ac.il/hyden/
  • PRIMUX available from http://sourceforge.net/projects/primux/
  • the suggested primer set can then be evaluated to ensure that it is the minimum primer set (Step 108).
  • a table consisting of the primers and their percentage degeneracy with respect to each gene is generated and sorted by degeneracy percentage.
  • the minimum primer set (Step 108) may be adjusted to cover additional species (Step 112). In one embodiment, this may consist of taking the additional species and repeating Steps 100-108 for the additional species to obtain one or more additional primers to add to the primer set.
  • the resulting primer set may be reviewed to identify the minimum number of primers that cover the maximum number of species, and additional primers may be added to the minimal set until the point of diminishing returns is reached.
  • the result is a primer set combination that amplifies at least some region of all known pathogenic species, allowing amplification of a region given any infectious sample which can be further sequenced to identify the species in the infection.
  • the maximum number of primers allowed in the PCR cycle may serve as a natural check on the number of primers that can be added to the set.
  • Table 1 presents a list of several primers that have been identified for use in various embodiments of the Universal Primer Set (UPS).
  • UPS Universal Primer Set
  • One example of a UPS in accord with the present invention is developed by selecting 16S, 23S, rplB, and rplP as the genes of interest (Step 100).
  • the primers for the genes of interest are, with reference to Table 1, Primer 1, Primer 4, Primer 6 and Primer 7 (Step 104). This combination covers 172 out of 181 pathogenic species, leaving only nine species uncovered.
  • These four primers can be used as a UPS, amplifying 95% of pathogenic species.
  • Overhang adapter sequences may be added to the primers (e.g., CAGTA on the 5' end, while ensuring that no primer dimers form) to promote compatibility with commercially available sequencing technologies (e.g., Illumina sequencing).
  • the genes of interest may be selected using a clustering method such as multiple sequence alignment (MSA) to identify regions of conservation in one or more subsets of the species of interest (Step 100').
  • MSA multiple sequence alignment
  • the genes of interest have regions that have a short conserved section, followed by a high information content section.
  • the least degenerate sequence may be located with a sliding window approach. Primers may then be chosen for each subset separately (Step 104). The minimal primer set may be confirmed by, e.g., generating a table listing the primers and their percentage degeneracy with respect to each gene, sorted by degeneracy percentage.
  • FIG. 2 is a block diagram of an exemplary system for universal primer design in accord with the present invention.
  • a computing unit 200 is in communication with a source of pathogen genomic data 208 and a source of sequencing data 204.
  • the computing unit 200 may take a variety of forms in various embodiments. Exemplary computing units suitable for use with the present invention include desktop computers, laptop computers, virtual computers, server computers, smartphones, tablets, phablets, etc.
  • Data sources 204, 208 may also take a variety of forms, including but not limited to structured databases (e.g., SQL databases), unstructured databases (e.g., Hadoop clusters, NoSQL databases), or other data sources running on a variety of computing units (e.g., desktop computers, laptop computers, virtual computers, server computers, smartphones, tablets, phablets, etc.).
  • the computing units may be heterogeneous or homogeneous in various embodiments of the present invention.
  • the data source 204 may be a piece of sequencing equipment that sequences the genome of at least one microorganism in a sample.
  • the data source 208 may be a publicly or privately accessible database of pathogenic genomic data.
  • the components of the systems may be interconnected using a variety of network technologies, which may be heterogeneous or homogeneous in various embodiments.
  • Suitable network technologies include but are not limited to wired network connections (e.g., Ethernet, gigabit Ethernet, token ring, etc.) and wireless network connections (e.g., Bluetooth, 802.11x, 3G/4G wireless technologies, etc.).
  • the computing unit 200 queries the sequencing data source 204 for sequencing data for one or more microorganisms from a pathogen sample.
  • the sequencing data source 204 may have such information because it has performed such a test on the sample, or it may have received such information directly or indirectly (i.e., through data entry or transmission) from a piece of equipment that performed such testing.
  • the computing unit 200 queries the pathogen genomic data source 208 for information concerning the genomes for one or more pathogens identified by the sequencing data source 204.
  • the pathogen genomic data source 208 may have such information stored locally, or it may contact other computing units to obtain the relevant genomic information as necessary.
  • the computing unit 200 may receive sequencing data from the sequencing data source 204 for one or more amplicons generated using the UPS and query the pathogen genomic data source 208 to identify the pathogen associated with the sequenced amplicon.
  • the computing unit 200 may access either data source 204, 208 first or access both data sources contemporaneously.
  • computing unit 200 is local to an operator, i.e., being located on a local area network accessed by the operator.
  • computing unit 200 is accessed by an operator over yet another network connection (not shown), such as a wide area network or the Internet, and the graphical presentation is delivered to the operator over such network connection.
  • the computing unit 200 includes security and web server functionality customary to such remotely-accessed devices.
  • Embodiments of the present disclosure are described above with reference to block diagrams and/or operational illustrations of methods, systems, and computer program products according to embodiments of the present disclosure.
  • the functions/acts noted in the blocks may occur out of the order as shown in any flowchart.
  • two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
  • not all of the blocks shown in any flowchart need to be performed and/or executed. For example, if a given flowchart has five blocks containing functions/acts, it may be the case that only three of the five blocks are performed and/or executed. In this example, any three of the five blocks may be performed and/or executed.

Abstract

A set of broad ranging primers to allow rapid amplification of any pathogenic bacteria as well as a methodology for the design of these primers. One approach to providing such a set in accord with the present invention is to design primers for regions/genes that are conserved enough for universal primer binding and at the same time have enough variability to give distinguishing capacity. The set may be selected based on optimizing for degeneracy and coverage.

Description

DEGENERATE PRIMER SETS
FIELD
[0001] The present invention generally relates to primer sets for amplifying pathogenic bacteria, and more specifically to a degenerate primer set and the design of the same.
BACKGROUND
[0002] There is an urgent clinical need to rapidly and accurately identify infection-causing pathogens to avoid unwanted antibiotic usage and hence the spread of antibiotic resistance. Currently, a symptom-based elimination strategy is used to identify the infection, which can take a considerable amount of time.
[0003] Next generation sequencing technologies permit infectious disease diagnostics to be performed in a matter of hours rather than days or weeks, provided that the proper primers are utilized in the sequencing process. There are, however, a large number of pathogenic species, and some primers are better suited than others for the amplification of particular species.
[0004] Accordingly, there is a need for an improved primer set that is able to amplify some part of any pathogenic bacterial genome and hence identify the bacteria causing infection in an accurate, timely and cost-effective manner.
SUMMARY
[0005] This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description section. This summary is not intended to identify or exclude key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
[0006] Embodiments of the present invention provide a set of broad ranging primers to allow rapid amplification of any pathogenic bacteria as well as a methodology for the design of these primers. One approach to find such a set in accord with the present invention is to design primers for regions/genes that are conserved enough for universal primer binding and at the same time have enough variability to give distinguishing capacity. The set may be selected based on optimizing for degeneracy and coverage.
[0007] In one aspect, embodiments of the present invention relate to a method of amplifying nucleic acid in a sample. The method includes selecting at least one primer from Table 1 for the amplification of nucleic acid in a sample and producing an amplicon using the selected at least one primer.
[0008] In one embodiment, the method further includes sequencing the amplicon and obtaining an amplicon sequence; comparing the amplicon sequence to a list of nucleic acids; and identifying the presence of at least one pathogen based on the comparison, wherein the list of nucleic acids corresponds to nucleic acid sequences found in at least one pathogen.
[0009] In one embodiment, the identification of the at least one pathogen is performed in fewer than 10 hours. In one embodiment, the pathogen is bacterial. In one embodiment, the method further includes determining treatment based on the presence of the at least one identified pathogen; the sample may be obtained from a human and treatment may comprise administering an antibiotic to decrease the presence of the at least one identified pathogen.
[0010] In one embodiment, the at least one primer from Table 1 comprises a primer with the sequence ACTCCTACGGGAGGCAGC. In one embodiment, the at least one primer from Table 1 comprises a primer set comprising a primer with the sequence ACTCCTACGGGAGGCAGC; a primer with the sequence CCAGCAGYYGCGGTAATA; and a primer with the sequence TCCTAAGGTAGCGAAATTCCT.
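The second primer above contains the IUPAC ambiguity code Y (C or T) at two positions, so it stands for a small pool of concrete oligonucleotides. The following is a minimal sketch, not part of the disclosure, of how that degeneracy can be counted and expanded; the function names `degeneracy` and `expand` are illustrative assumptions.

```python
from itertools import product
from typing import List

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Number of distinct concrete sequences a degenerate primer represents."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count

def expand(primer: str) -> List[str]:
    """Enumerate every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

primer = "CCAGCAGYYGCGGTAATA"
print(degeneracy(primer))  # 4
for variant in expand(primer):
    print(variant)
```

The two Y positions give 2 x 2 = 4 variants, while the other two primers listed in this paragraph contain no ambiguity codes and therefore have a degeneracy of 1.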
[0011] In one embodiment, the at least one primer is optimized for the highest coverage of pathogens with the minimum number of primers. In one embodiment, the at least one primer is optimized for melting temperature and guanine-cytosine content. In one embodiment, the at least one primer is designed using phylogenetic conservation analysis. In one embodiment, the at least one primer is optimized based on at least one of degeneracy to cover a plurality of species of pathogens; and variability to distinguish among said plurality of species of pathogens.
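The specification does not state how melting temperature or guanine-cytosine content is computed. As a rough illustration only, the sketch below computes the GC fraction and applies the Wallace rule (2 °C per A/T, 4 °C per G/C), a common rule of thumb for short oligonucleotides that is swapped in here as an assumption; the function names are hypothetical.

```python
def _validate(primer: str) -> str:
    """Upper-case the primer and reject empty or degenerate input."""
    seq = primer.upper()
    if not seq or any(base not in "ACGT" for base in seq):
        raise ValueError("expected a non-empty, non-degenerate A/C/G/T sequence")
    return seq

def gc_content(primer: str) -> float:
    """Fraction of bases in the primer that are G or C."""
    seq = _validate(primer)
    return (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(primer: str) -> int:
    """Approximate melting temperature in degrees C by the Wallace rule."""
    seq = _validate(primer)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

for primer in ("ACTCCTACGGGAGGCAGC", "TCCTAAGGTAGCGAAATTCCT"):
    print(primer, round(100 * gc_content(primer), 1), wallace_tm(primer))
# ACTCCTACGGGAGGCAGC 66.7 60
# TCCTAAGGTAGCGAAATTCCT 42.9 60
```

A degenerate primer such as CCAGCAGYYGCGGTAATA would first have to be expanded into its concrete variants, each of which can then be scored separately.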
[0012] In another aspect, embodiments of the present invention relate to a system for amplifying nucleic acid in a sample, the system comprising a primer set comprising at least one primer from Table 1.
[0013] In yet another aspect, embodiments of the present invention relate to a system for diagnosing the presence of at least one pathogen. The system includes a primer set and a database of nucleic acids. The primer set includes at least one primer from Table 1, the primer set optimized to produce an amplicon of nucleic acid in a sample. The nucleic acids in the database correspond to the at least one pathogen, wherein the presence of the at least one pathogen in a sample is identifiable based on a comparison of the sequence of the amplicon to the database of nucleic acids corresponding to the at least one pathogen.
[0014] In one embodiment, the primer set comprises a primer with the sequence ACTCCTACGGGAGGCAGC, a primer with the sequence CCAGCAGYYGCGGTAATA, and a primer with the sequence TCCTAAGGTAGCGAAATTCCT. In one embodiment, the at least one primer is further optimized based on at least one of: degeneracy to cover a plurality of species of pathogens; and variability to distinguish among said plurality of species of pathogens. In one embodiment, the system further includes a bioinformatics engine configured to sequence and compare the amplicon with the database of nucleic acids corresponding to the at least one pathogen; the bioinformatics engine may be further configured to produce a report including at least one of species ID, substrain ID, antibiotic for treatment, resistance value, and actionable treatment.
[0015] In still yet another aspect, embodiments of the present invention relate to a method of obtaining a primer set. The method includes identifying a plurality of pathogens of interest; identifying genes of said plurality of pathogens of interest; generating at least one primer for at least one of said identified genes; and repeating the foregoing steps to generate at least one additional primer for at least one other of said identified genes of said plurality of pathogens of interest until a primer is generated that corresponds to at least a majority of said plurality of pathogens of interest. The at least one primer is generated based on an increased level of degeneracy among the identified genes of said plurality of pathogens of interest and variability among said identified genes of said plurality of pathogens of interest wherein said variability is sufficient to enable differentiation among said plurality of pathogens of interest.
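One way to picture the iterative selection described above is as a set-cover problem: keep adding the primer that covers the most still-uncovered pathogens until a target fraction is reached. The greedy heuristic below is an assumption chosen for illustration, not the procedure claimed here, and the primer names, species labels, and coverage map are hypothetical toy data.

```python
def greedy_primer_set(coverage, species, target_fraction=0.9):
    """Pick primers greedily until target_fraction of species is covered.

    coverage maps a primer name to the set of species it amplifies.
    Returns the chosen primers (in selection order) and the covered species.
    """
    all_species = set(species)
    covered = set()
    chosen = []
    while len(covered) < target_fraction * len(all_species):
        # Choose the primer that adds the most species not yet covered.
        best = max(coverage, key=lambda p: len((coverage[p] & all_species) - covered))
        gain = (coverage[best] & all_species) - covered
        if not gain:
            break  # no remaining primer improves coverage
        chosen.append(best)
        covered |= gain
    return chosen, covered

# Hypothetical toy data for illustration only.
species = ["sp1", "sp2", "sp3", "sp4", "sp5", "sp6"]
coverage = {
    "primer_A": {"sp1", "sp2", "sp3"},
    "primer_B": {"sp3", "sp4"},
    "primer_C": {"sp4", "sp5", "sp6"},
    "primer_D": {"sp1", "sp6"},
}

chosen, covered = greedy_primer_set(coverage, species)
print(chosen)                       # ['primer_A', 'primer_C']
print(len(covered) / len(species))  # 1.0
```

Greedy selection does not guarantee the smallest possible set, but it naturally exposes the point of diminishing returns: the gain of each added primer shrinks as coverage grows.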
[0016] These and other features and advantages, which characterize the present non-limiting embodiments, will be apparent from a reading of the following detailed description and a review of the associated drawings. It is to be understood that both the foregoing general description and the following detailed description are explanatory only and are not restrictive of the non-limiting embodiments as claimed.
BRIEF DESCRIPTION OF DRAWINGS
[0017] Non-limiting and non-exhaustive embodiments are described with reference to the following figures in which:
[0018] FIG. 1 depicts an example of one embodiment of a method for primer design in accord with the present invention; and
[0019] FIG. 2 illustrates a block diagram of an exemplary system for primer design according to the present invention.
[0020] In the drawings, like reference characters generally refer to corresponding parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed on the principles and concepts of operation.
DETAILED DESCRIPTION
[0021] Various embodiments are described more fully below with reference to the accompanying drawings, which form a part hereof, and which show specific exemplary embodiments. However, embodiments may be implemented in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the embodiments to those skilled in the art. Embodiments may be practiced as methods, systems or devices. Accordingly, embodiments may take the form of a hardware implementation, an entirely software implementation or an implementation combining software and hardware aspects. The following detailed description is, therefore, not to be taken in a limiting sense.
[0022] Reference in the specification to "one embodiment" or to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment.
[0023] Some portions of the description that follow are presented in terms of symbolic representations of operations on non-transient signals stored within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. Such operations typically require physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical, magnetic or optical signals capable of being stored, transferred, combined, compared and otherwise manipulated. It is convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. Furthermore, it is also convenient at times, to refer to certain arrangements of steps requiring physical manipulations of physical quantities as modules or code devices, without loss of generality.
[0024] However, all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.
[0025] Certain aspects of the present invention include process steps and instructions that could be embodied in software, firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by a variety of operating systems.
[0026] The present invention also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, solid state memory, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus or enterprise service bus. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability in a distributed manner.
[0027] The processes and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any references below to specific languages are provided for disclosure of enablement and best mode of the present invention.
[0028] In addition, the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, the disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the claims.
Overview
[0029] Embodiments of the present invention concern a "universal" primer set to further reduce the turn-around time required for infectious disease diagnostics and thereby provide timely and accurate treatment to an infected patient. The primer set encompasses most (i.e., > 90%) of these pathogens. Once the pathogen of interest is amplified using the primer set, it can be sequenced and mapped to a reference sequence to quickly identify the species in the infection.
[0030] In ordinary application these primers are appended with adaptors to comply with Illumina index for sequencing. Using multiplex PCR and NextGen sequencing the amplicon sequence is obtained. A local BLAST may then be run on the amplicon sequence to identify the pathogen associated with the infection.
[0031] A primer is a short nucleic acid sequence used as a starting point for DNA synthesis. In PCR, primers are used to determine the DNA fragment to be amplified by the PCR process. A primer may be designed to be specific to a particular organism's gene sequence or it may be degenerate, i.e., designed for a gene sequence that is independent of a particular organism.
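As an illustration of degeneracy, the following minimal Python sketch counts and lists the concrete sequences encoded by a degenerate primer. It assumes that degenerate positions are written with the standard IUPAC nucleotide codes (for example, Y for C or T), which appears consistent with the primer CCAGCAGYYGCGGTAATA recited in the claims; the function names `degeneracy` and `expand` are hypothetical and are not taken from the specification.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Return the number of concrete sequences a degenerate primer encodes."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list[str]:
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


# Primer sequence taken from the claims; Y is assumed to mean "C or T".
primer = "CCAGCAGYYGCGGTAATA"
print(degeneracy(primer))
for sequence in expand(primer):
    print(sequence)
```

Under this assumption, the two Y positions give a degeneracy of four, so the primer stands for four concrete oligonucleotides.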
[0032] With reference to Figure 1, embodiments of the present invention focus on regions of genes (Step 100) that are conserved enough across pathogens to permit the selection of a common set of primers to enable the sequencing of the maximum number of pathogens, while having enough variability to differentiate species based on amplicons.
[0033] In some embodiments, ribosomal genes are the target gene regions for the primer design process. Particular ribosomal genes of interest are the 16S and 23S genes, but other genes, such as rpsB, rpsM, and rplP, also have such properties. In other embodiments, some protein coding marker genes are considered conserved enough to determine phylogeny and are used as the target gene regions; such coding marker genes may be identified using a tool for phylogenetic analysis of genomes and metagenomes such as PHYLOSIFT, available from https://phylosift.wordpress.com/.
[0034] While conserved behavior is important when selecting a target gene region, it is important to balance the variability in the gene region against the conserved behavior in choosing the final primer set. For example, it may be theoretically possible to have one primer that covers all pathogenic species, but the variability among amplicons may not be enough to differentiate among species.
[0035] Once a particular target gene region has been selected, commercially available software tools such as HYDEN (available from http://acgt.cs.tau.ac.il/hyden/) and PRIMUX (available from http://sourceforge.net/projects/primux/) can be applied to the bacterial genome multiple sequence alignment data to output a set of primers (Step 104) that cover the maximum number of bacteria while satisfying specified criteria such as degeneracy, expected amplicon size, melting temperature, etc.
[0036] The suggested primer set (Step 104) can then be evaluated to ensure that it is the minimum primer set (Step 108). In one embodiment, a table consisting of the primers and their percentage degeneracy with respect to each gene is generated and sorted by degeneracy percentage.
[0037] The minimum primer set (Step 108) may be adjusted to cover additional species (Step 112). In one embodiment, this may consist of taking the additional species and repeating Steps 100-108 for the additional species to obtain one or more additional primers to add to the primer set.
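The screening of candidate primers against criteria such as melting temperature and guanine-cytosine content may be sketched as follows. This is a minimal illustration only: it assumes non-degenerate primers, uses the Wallace rule (2 degrees C per A or T, 4 degrees C per G or C) as a rough melting temperature estimate in place of whatever model HYDEN or PRIMUX applies, and the acceptance ranges and the function names are hypothetical.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in a non-degenerate primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (degrees C) by the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def passes_criteria(primer: str,
                    gc_range=(0.40, 0.70),   # hypothetical GC window
                    tm_range=(50, 65)) -> bool:  # hypothetical Tm window
    """Check a primer against illustrative GC-content and Tm ranges."""
    gc_ok = gc_range[0] <= gc_content(primer) <= gc_range[1]
    tm_ok = tm_range[0] <= wallace_tm(primer) <= tm_range[1]
    return gc_ok and tm_ok


# Two non-degenerate primer sequences taken from the claims.
for candidate in ("ACTCCTACGGGAGGCAGC", "TCCTAAGGTAGCGAAATTCCT"):
    print(candidate,
          round(gc_content(candidate), 3),
          wallace_tm(candidate),
          passes_criteria(candidate))
```

The Wallace rule is a coarse approximation; a nearest-neighbor thermodynamic model would normally be preferred when designing primers for actual use.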
[0038] The resulting primer set may be reviewed to identify the minimum number of primers that cover the maximum number of species, and additional primers may be added to the minimal set until the point of diminishing returns is reached. The result is a primer set combination that amplifies at least some region of all known pathogenic species, allowing amplification of a region given any infectious sample which can be further sequenced to identify the species in the infection. The maximum number of primers allowed in the PCR cycle may serve as a natural check on the number of primers that can be added to the set. Table 1 presents a list of several primers that have been identified for use in various embodiments of the Universal Primer Set (UPS).
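One way to approximate the selection of a small primer set with high species coverage is a greedy set-cover heuristic, sketched below in Python. The specification does not state which selection procedure is used, so the greedy strategy, the `min_gain` stopping threshold (standing in for the point of diminishing returns), the `max_primers` limit (standing in for the number of primers allowed in the PCR cycle), and the toy coverage data are all assumptions made for illustration.

```python
def greedy_primer_set(coverage, max_primers, min_gain=1):
    """Greedily pick primers that each add the most uncovered species.

    coverage    -- dict mapping primer name to the set of species it amplifies
    max_primers -- upper limit on the size of the primer set
    min_gain    -- stop once the best remaining primer adds fewer new species
    """
    chosen, covered = [], set()
    if not coverage:
        return chosen, covered
    while len(chosen) < max_primers:
        # Primer that adds the largest number of not-yet-covered species.
        best = max(coverage, key=lambda name: len(coverage[name] - covered))
        gain = len(coverage[best] - covered)
        if gain < min_gain:
            break  # diminishing returns: no primer adds enough new species
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered


# Hypothetical coverage data; primer and species names are placeholders.
toy_coverage = {
    "primer_A": {"sp1", "sp2", "sp3", "sp4"},
    "primer_B": {"sp3", "sp4", "sp5"},
    "primer_C": {"sp5", "sp6"},
    "primer_D": {"sp1", "sp6"},
}
chosen, covered = greedy_primer_set(toy_coverage, max_primers=4)
print(chosen, len(covered))
```

A greedy heuristic does not guarantee the true minimum set, but it is simple to evaluate and makes the stopping condition explicit.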
[0039] One example of a UPS in accord with the present invention is developed by selecting 16S, 23S, rplB, and rplP as the genes of interest (Step 100). The primers for the genes of interest are, with reference to Table 1, Primer 1, Primer 4, Primer 6 and Primer 7 (Step 104). This combination covers 172 out of 181 pathogenic species, leaving only nine species uncovered.
[0040] These four primers can be used as a UPS, amplifying 95% of pathogenic species. Overhang adapter sequences may be added to the primers (e.g., CAGTA on the 5' end and making sure there is no primer dimer formation) to promote compatibility with commercially available sequencing technologies (e.g., Illumina sequencing).
[0041] In another embodiment, the genes of interest may be selected using a clustering method such as multiple sequence alignment (MSA) to identify regions of conservation in one or more subsets of the species of interest (Step 100'). In some embodiments the genes of interest have regions that have a short conserved section, followed by a high information content section.
[0042] The least degenerate sequence may be located with a sliding window approach. Primers may then be chosen for each subset separately (Step 104). The minimal primer set may be confirmed by, e.g., generating a table listing the primers and their percentage degeneracy with respect to each gene and sorting by degeneracy percentage.
[0043] Once the universal primer set (UPS) is designed and a multiplex PCR experiment is carried out, the result is an amplicon to be sequenced. Once the amplicon is sequenced, it is mapped back to its source genome using, e.g., a local BLAST search on the pathogen genome database.
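The sliding window search for the least degenerate sequence in a multiple sequence alignment may be sketched as follows. The scoring used here (the product of the number of distinct characters observed in each alignment column of the window) is an assumption chosen for illustration, as are the function names and the toy alignment; gap characters, if present, would simply be counted as an additional variant.

```python
from math import prod


def column_variants(alignment):
    """Number of distinct characters in each column of an alignment."""
    return [len(set(column)) for column in zip(*alignment)]


def least_degenerate_window(alignment, width):
    """Return (start, score) of the window needing the least degeneracy.

    The score is the product of per-column variant counts, i.e. the
    degeneracy a primer would need to match every aligned sequence.
    The first window with the lowest score is returned.
    """
    variants = column_variants(alignment)
    if width > len(variants):
        raise ValueError("window is wider than the alignment")
    best_start, best_score = 0, None
    for start in range(len(variants) - width + 1):
        score = prod(variants[start:start + width])
        if best_score is None or score < best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Hypothetical toy alignment of equal-length sequences.
toy_alignment = [
    "ACGTACGGTA",
    "ACGTTCGGTA",
    "TCGTACGGCA",
]
print(least_degenerate_window(toy_alignment, width=3))
```

In practice the window width would correspond to the desired primer length, and the search would be run separately for each subset of species.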
Table 1 - Primers for Universal Primer Set (UPS) [table supplied as an image in the original publication; contents not reproduced]
[0044] Figure 2 is a block diagram of an exemplary system for universal primer design in accord with the present invention. In this embodiment, a computing unit 200 is in communication with a source of pathogen genomic data 208 and a source of sequencing data 204.
[0045] The computing unit 200 may take a variety of forms in various embodiments. Exemplary computing units suitable for use with the present invention include desktop computers, laptop computers, virtual computers, server computers, smartphones, tablets, phablets, etc. Data sources 204, 208 may also take a variety of forms, including but not limited to structured databases (e.g., SQL databases), unstructured databases (e.g., Hadoop clusters, NoSQL databases), or other data sources running on a variety of computing units (e.g., desktop computers, laptop computers, virtual computers, server computers, smartphones, tablets, phablets, etc.). The computing units may be heterogeneous or homogeneous in various embodiments of the present invention. In some embodiments, the data source 204 may be a piece of sequencing equipment that sequences the genome of at least one microorganism in a sample. In some embodiments, the data source 208 may be a publicly or privately accessible database of pathogenic genomic data.
[0046] The components of the systems may be interconnected using a variety of network technologies, which may be heterogeneous or homogeneous in various embodiments. Suitable network technologies include but are not limited to wired network connections (e.g., Ethernet, gigabit Ethernet, token ring, etc.) and wireless network connections (e.g., Bluetooth, 802.11x, 3G/4G wireless technologies, etc.).
[0047] In operation, the computing unit 200 queries the sequencing data source 204 for sequencing data for one or more microorganisms from a pathogen sample. The sequencing data source 204 may have such information because it has performed such a test on the sample, or it may have received such information directly or indirectly (i.e., through data entry or transmission) from a piece of equipment that performed such testing.
[0048] In operation, the computing unit 200 queries the pathogen genomic data source 208 for information concerning the genomes for one or more pathogens identified by the sequencing data source 204. The pathogen genomic data source 208 may have such information stored locally, or it may contact other computing units to obtain the relevant genomic information as necessary.
[0049] As discussed above, the computing unit 200 may receive sequencing data from the sequencing data source 204 for one or more amplicons generated using the UPS and query the pathogen genomic data source 208 to identify the pathogen associated with the sequenced amplicon.
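The identification of a pathogen from a sequenced amplicon may be illustrated with the following simplified Python sketch. It does not run BLAST; a naive count of shared k-mers is used here as a stand-in so that the example is self-contained. The function names, the value of k, and the reference sequences are hypothetical placeholders and do not represent real genomic data.

```python
def kmers(sequence, k):
    """Set of all substrings of length k in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def best_match(amplicon, references, k=8):
    """Rank reference sequences by the number of k-mers shared with the amplicon.

    references -- dict mapping a pathogen name to a reference sequence
    Returns the best-scoring name and the full score table.
    """
    query = kmers(amplicon.upper(), k)
    scores = {
        name: len(query & kmers(reference.upper(), k))
        for name, reference in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical reference fragments standing in for the pathogen database 208.
toy_references = {
    "species_X": "ACTCCTACGGGAGGCAGCAGTGGGGAATATTG",
    "species_Y": "ACTCCTACGGGAGGCAGCAGTAGGGAATCTTC",
}
# Hypothetical amplicon read standing in for data from source 204.
toy_amplicon = "GGCAGCAGTGGGGAATATT"

best, scores = best_match(toy_amplicon, toy_references)
print(best, scores)
```

A production system would instead use an alignment tool such as a local BLAST search, as described above, which tolerates mismatches and reports alignment statistics.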
[0050] The computing unit 200 may access either data source 204, 208 first or access both data sources contemporaneously. In some embodiments, computing unit 200 is local to an operator, i.e., being located on a local area network accessed by the operator. In other embodiments, computing unit 200 is accessed by an operator over yet another network connection (not shown), such as a wide area network or the Internet, and the graphical presentation is delivered to the operator over such network connection. In these embodiments, the computing unit 200 includes security and web server functionality customary to such remotely-accessed devices.
[0051] Embodiments of the present disclosure, for example, are described above with reference to block diagrams and/or operational illustrations of methods, systems, and computer program products according to embodiments of the present disclosure. The functions/acts noted in the blocks may occur out of the order as shown in any flowchart. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Additionally, not all of the blocks shown in any flowchart need to be performed and/or executed. For example, if a given flowchart has five blocks containing functions/acts, it may be the case that only three of the five blocks are performed and/or executed. In this example, any three of the five blocks may be performed and/or executed.
[0052] The description and illustration of one or more embodiments provided in this application are not intended to limit or restrict the scope of the present disclosure as claimed in any way. The embodiments, examples, and details provided in this application are considered sufficient to convey possession and enable others to make and use the best mode of the claimed embodiments. The claimed embodiments should not be construed as being limited to any embodiment, example, or detail provided in this application. Regardless of whether shown and described in combination or separately, the various features (both structural and methodological) are intended to be selectively included or omitted to produce an embodiment with a particular set of features. Having been provided with the description and illustration of the present application, one skilled in the art may envision variations, modifications, and alternate embodiments falling within the spirit of the broader aspects of the general inventive concept embodied in this application that do not depart from the broader scope of the claimed embodiments.

Claims

What is claimed is:
1. A method of amplifying nucleic acid in a sample, the method comprising:
selecting at least one primer from Table 1 for the amplification of nucleic acid in a sample; and
producing an amplicon using the selected at least one primer.
2. The method of claim 1 further comprising:
sequencing said amplicon and obtaining an amplicon sequence;
comparing said amplicon sequence to a list of nucleic acids, wherein said list of nucleic acids corresponds to nucleic acid sequences found in at least one pathogen; and
identifying the presence of at least one pathogen based on said comparison.
3. The method of claim 2 wherein identifying the at least one pathogen is performed in fewer than 10 hours.
4. The method of claim 2 wherein said pathogen is bacterial.
5. The method of claim 2 further comprising determining treatment based on said presence of said at least one identified pathogen.
6. The method of claim 5 wherein the sample is obtained from a human and treatment comprises administration of an antibiotic to decrease the presence of the at least one identified pathogen.
7. The method of claim 1 wherein said at least one primer from Table 1 comprises a primer with the sequence ACTCCTACGGGAGGCAGC.
8. The method of claim 1 wherein said at least one primer from Table 1 comprises a primer set comprising:
a primer with the sequence ACTCCTACGGGAGGCAGC;
a primer with the sequence CCAGCAGYYGCGGTAATA; and
a primer with the sequence TCCTAAGGTAGCGAAATTCCT.
9. The method of claim 1 wherein said at least one primer is optimized for the highest coverage of pathogens with the minimum number of primers.
10. The method of claim 1 wherein said at least one primer is optimized for melting temperature and guanine-cytosine content.
11. The method of claim 1 wherein the at least one primer is designed using phylogenetic conservation analysis.
12. The method of claim 1 wherein the at least one primer is optimized based on at least one of:
degeneracy to cover a plurality of species of pathogens; and
variability to distinguish among said plurality of species of pathogens.
13. A system for amplifying nucleic acid in a sample, the system comprising:
a primer set comprising at least one primer from Table 1.
14. A system for diagnosing the presence of at least one pathogen, the system comprising:
a primer set comprising at least one primer from Table 1, said primer set optimized to produce an amplicon of nucleic acid in a sample; and
a database of nucleic acids corresponding to the at least one pathogen, wherein the presence of said at least one pathogen in a sample is identifiable based on a comparison of the sequence of the amplicon to the database of nucleic acids corresponding to the at least one pathogen.
15. The system of claim 14 wherein said primer set comprises:
a primer with the sequence ACTCCTACGGGAGGCAGC;
a primer with the sequence CCAGCAGYYGCGGTAATA; and
a primer with the sequence TCCTAAGGTAGCGAAATTCCT.
16. The system of claim 14 wherein the at least one primer is further optimized based on at least one of:
degeneracy to cover a plurality of species of pathogens; and
variability to distinguish among said plurality of species of pathogens.
17. The system of claim 14 further comprising:
a bioinformatics engine configured to sequence and compare said amplicon with said database of nucleic acids corresponding to the at least one pathogen.
18. The system of claim 17 wherein the bioinformatics engine is further configured to produce a report comprising at least one of species ID, substrain ID, antibiotic for treatment; resistance value; and actionable treatment.
19. A method of obtaining a primer set comprising:
(a) identifying a plurality of pathogens of interest;
(b) identifying genes of said plurality of pathogens of interest;
(c) generating at least one primer for at least one of said identified genes wherein said at least one primer is generated based on:
an increased level of degeneracy among said identified genes of said plurality of pathogens of interest; and
variability among said identified genes of said plurality of pathogens of interest wherein said variability is sufficient to enable differentiation among said plurality of pathogens of interest; and
(d) repeating steps (a)-(c) to generate at least one additional primer for at least one other of said identified genes of said plurality of pathogens of interest until a primer is generated that corresponds to at least a majority of said plurality of pathogens of interest.
PCT/IB2016/052852 2015-05-29 2016-05-17 Degenerate primer sets WO2016193846A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201562167996P 2015-05-29 2015-05-29
US62/167,996 2015-05-29

Publications (2)

Publication Number Publication Date
WO2016193846A2 (en) 2016-12-08
WO2016193846A3 WO2016193846A3 (en) 2017-02-02

Family

ID=56119712

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2016/052852 WO2016193846A2 (en) 2015-05-29 2016-05-17 Degenerate primer sets

Country Status (1)

Country Link
WO (1) WO2016193846A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180002109A (en) * 2016-06-28 2018-01-08 재단법인대구경북과학기술원 Method for rapid design of valid high-quality primers and probes for multiple target genes in qPCR experiments
KR20180015690A (en) * 2018-01-30 2018-02-13 재단법인대구경북과학기술원 Method for rapid design of valid high-quality primers and probes for multiple target genes in qPCR experiments

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180002109A (en) * 2016-06-28 2018-01-08 재단법인대구경북과학기술원 Method for rapid design of valid high-quality primers and probes for multiple target genes in qPCR experiments
KR101889146B1 (en) 2016-06-28 2018-08-17 재단법인대구경북과학기술원 Method for rapid design of valid high-quality primers and probes for multiple target genes in qPCR experiments
KR20180015690A (en) * 2018-01-30 2018-02-13 재단법인대구경북과학기술원 Method for rapid design of valid high-quality primers and probes for multiple target genes in qPCR experiments
KR101912555B1 (en) 2018-01-30 2018-10-26 재단법인대구경북과학기술원 Method for rapid design of valid high-quality primers and probes for multiple target genes in qPCR experiments

Also Published As

Publication number Publication date
WO2016193846A3 (en) 2017-02-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16728734

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16728734

Country of ref document: EP

Kind code of ref document: A2