
Tree Graph Definition ▼

Our RNA-As-Graphs (RAG) approach represents RNA secondary (2D) structures as undirected tree and dual graphs [5]. These coarse-grained representations reduce the complexity of the RNA 2D structure and offer an efficient, alternative method to study RNA structure.

Undirected tree graphs are used to represent the 2D structure of an RNA molecule [6, 7]. Each unpaired single-stranded region, such as a hairpin loop, an internal loop or bulge (with at least two nucleotides in either strand), a junction, or a dangling end, is represented by a vertex (see images below). Dangling ends refer to exterior loop nucleotides adjacent to stems at the 5' and/or 3' end of an RNA sequence. An RNA stem with at least two canonical base pairs (AU, GC, and GU wobble) is represented by an edge connecting the vertices/loops. Single, isolated base pairs are ignored.
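The stem rule above can be illustrated with a minimal Python sketch. It is not the RAG implementation: the function name `find_stems` and the input layout (a sequence string plus a list of 1-based pairing partners, 0 for unpaired) are assumptions made for illustration. As a simplification, a stem here is a run of directly stacked canonical pairs, so the treatment of small bulges and internal loops is not reproduced.

```python
# Canonical pairs named in the text: AU, GC, and the GU wobble.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def find_stems(sequence, partners, min_pairs=2):
    """Group directly stacked canonical base pairs into stems.

    partners[i - 1] is the 1-based partner of nucleotide i (0 if unpaired).
    Runs shorter than min_pairs (isolated base pairs) are ignored.
    """
    pairs = sorted(
        (i, j)
        for i, j in enumerate(partners, start=1)
        if j > i and (sequence[i - 1], sequence[j - 1]) in CANONICAL
    )
    pair_set = set(pairs)
    stems = []
    for i, j in pairs:
        if (i - 1, j + 1) in pair_set:
            continue  # not the outermost pair of its stack
        stem = [(i, j)]
        while (stem[-1][0] + 1, stem[-1][1] - 1) in pair_set:
            stem.append((stem[-1][0] + 1, stem[-1][1] - 1))
        if len(stem) >= min_pairs:
            stems.append(stem)
    return stems

# A 19-nucleotide hairpin: six stacked pairs closing a seven-nucleotide loop.
hairpin_seq = "GGCCGUAACUAUAACGGUC"
hairpin_partners = [19, 18, 17, 16, 15, 14, 0, 0, 0, 0, 0, 0, 0, 6, 5, 4, 3, 2, 1]
hairpin_stems = find_stems(hairpin_seq, hairpin_partners)

# A single isolated G-C pair does not form a stem.
isolated_stems = find_stems("GAAAC", [5, 0, 0, 0, 1])
```

In this sketch each returned stem would correspond to one edge of the tree graph; assigning the loop vertices at either end of the edge is left out.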


By defining additional vertices and edges, we convert the 2D tree graph into a 3D tree graph [1]. Besides adding two vertices to illustrate the 5' and 3' ends of each helix, vertices are also added to represent internal loops and bulges that contain fewer than two nucleotides in their strands. In contrast to a 2D tree graph, the edges of a 3D tree graph link two vertices representing each helix, or the loop vertices to the proximal end helical vertices. The length of an edge is scaled by the number of nucleotides in the corresponding helices and loops. A 3D graph can also be constructed from an RNA 3D structure. This allows us to score RNA 3D structures using our knowledge-based potential. The figure below shows the 2D and 3D tree graphs for a 57-nucleotide fragment of rRNA (PDB ID: 1DK1).

Representing RNA structure as 2D graphs allows us to use graph theory methods, such as graph isomorphism, partitioning, and enumeration, to analyze RNA structure [8]. We have used graph enumeration methods to generate tree graph topologies with up to 13 vertices and classified them into existing, RNA-like, and non-RNA-like topologies by clustering techniques [6, 9, 10].

Learn more about RNA tree graphs on the RAG Website.

Citation Information ▼

Please include the following citations when using the software provided by this server.

For RAGTOP's Monte Carlo Refinement:

For 3D Model Building:

For Sequence Design:

Usage and Sample Results ▼

RAG Sampler

RAG Sampler starts with 3D tree graphs from the given 2D structure and uses hierarchical graph-sampling by Monte Carlo and/or Simulated Annealing to generate candidate 3D graph topologies [1, 2].

Usage:

The user is asked to provide an RNA 2D structure file in .bpseq format (see File Formats). The user can also download a sample .bpseq file or run our graph sampling protocol with the sample input. The sampling protocol can be run with default parameters, or the user can choose different options (explained on the file upload page).

After the input file is successfully uploaded, the user is redirected to the results page, which is automatically refreshed until processing is finished and results are available. The link to the results page will also be emailed to the address provided by the user (optional). The link and the result files will be active and available to download for 15 days. The user also has the option to cancel or delete the job and associated data from the results page.

See the full description of our RAGTOP protocol run by RAG Sampler: "Graph-based sampling for approximating global helical topologies of RNA" and "Using sequence signatures and kink-turn motifs in knowledge-based statistical potentials for RNA structure prediction".

Sample results:

Results produced by RAG Sampler for the sample input provided on the RAG Sampler main page (29-nucleotide RNA Aptamer, 2D structure derived from PDB ID: 1OOA) are shown below.

[Figure: example Monte Carlo results]

After RAGTOP has finished running for 50,000 MC/SA steps, a visual rendering of the RNA secondary structure is displayed and two plots similar to the ones shown above are produced. The plot titled "Graph Scores" displays the score of every candidate graph generated, with graph numbers on the x-axis and scores on the y-axis. The lowest-scoring candidate graph (e.g., graph 1075 with score –1.58144 shown above) is marked on the plot. This plot also indicates the convergence of the RAGTOP sampling run. The plot titled "Score Distribution" shows the number of graphs generated in each score range, from which the mode range of scores can be read directly. For example, the mode range for the above run is –2 to 0, with more than 800 graphs scoring in this range. Other relevant information about the run, including the number of nucleotides in the input 2D structure, the total number of candidate graphs generated, and the run time, is also shown. The plots (PNG file), all candidate graphs (PDB file), and the score file (TXT file) are available for download.

For convenience, the lowest-scoring graph can be uploaded directly to RAG Builder through the provided link to build 3D atomic models.

Full sample results page and files available here

RAG Builder

RAG Builder produces atomic models for a target RNA 2D structure and a 3D tree graph topology [3].

Usage:

The user is asked to upload an RNA 2D structure file in .bpseq format and the target 3D tree graph file in .pdb format (see File Formats). The user can also download sample input or run RAG Builder with the sample input. The lowest-scoring graph from a RAG Sampler run can also be uploaded directly as input from the RAG Sampler results page. After the input files are successfully uploaded, the target graph is partitioned into subgraphs. The user can either run RAG Builder with the default selection of subgraphs or choose their own subgraphs from the options provided in the drop-down menu.

After RAG Builder successfully starts running, the user is redirected to the results page, which is automatically refreshed until processing is finished and results are available. The link to the results page will also be emailed to the address provided by the user (optional). The link and the result files will be active and available to download for 15 days. The user also has the option to cancel or delete the job and associated data from the results page.

See the full description of our fragment assembly F-RAG protocol run by RAG Builder: F-RAG: Generating Atomic Coordinates from RNA Graphs by Fragment Assembly.

Sample results:

Results produced by RAG Builder for the sample input provided on the RAG Builder main page (29-nucleotide RNA Aptamer, 2D structure derived from PDB ID: 1OOA) are shown below.

[Figure: example 3D model building results]

Once F-RAG has finished running, a visual rendering of the RNA secondary structure is displayed and two plots similar to the ones shown above are produced. The plot titled "Model Nucleotide Distribution" shows the distribution of the number of nucleotides across all 3D models generated. The plot titled "Ranked Model Scores" shows the scores of all 3D models generated, plotted in increasing order. The top 20 (or fewer) lowest-scoring unique models that contain all nucleotides are highlighted in blue. If no models have the required number of nucleotides, there are no blue points on the second plot. Other relevant information about the run, including the number of nucleotides in the input 2D structure, the total number of models generated, and the run time, is also shown. The plots (PNG file), generated atomic models and 3D graphs (PDB file), model score files, and list of lowest-scoring unique models (TXT file) are available for download. The model score file contains the score and number of nucleotides in each model. The lowest-scoring model file lists the model number and the score for the 20 (or fewer) lowest-scoring unique models.

Note that the atomic models produced by F-RAG may contain chain breaks or missing nucleotides, as this webserver does not perform geometry optimization or energy minimization on the generated atomic models. For best results, we recommend running F-RAG for multiple subgraph decompositions of the target 3D tree graph, selecting atomic models with the highest number of nucleotides and the lowest scores (lower scores are better), and then applying geometry optimization and/or energy minimization before further use [3].
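The selection rule recommended above (most nucleotides first, then lowest score) could be applied to downloaded results with a short Python sketch like the following. The `Model` record and the values in `models` are hypothetical and are not taken from any real run. The exact column layout of the score files is not assumed here, so the records are built by hand.

```python
from typing import NamedTuple

class Model(NamedTuple):
    """Hypothetical in-memory record for one generated atomic model."""
    number: int
    nucleotides: int
    score: float

def rank_models(models):
    """Order models by most nucleotides first, then by lowest score."""
    return sorted(models, key=lambda m: (-m.nucleotides, m.score))

# Illustrative values only.
models = [
    Model(number=1, nucleotides=29, score=-3.0),
    Model(number=2, nucleotides=27, score=-5.0),
    Model(number=3, nucleotides=29, score=-4.5),
    Model(number=4, nucleotides=29, score=-1.0),
]
ranked = rank_models(models)
ranked_numbers = [m.number for m in ranked]
```

Model 2 has the lowest score but is missing nucleotides, so it ranks last under this ordering. Pooling models from several subgraph decompositions before ranking would follow the same pattern.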

Full sample results page and files available here

RAG Designer

RAG Designer designs sequences and corresponding atomic models that fold onto a target RNA-like tree graph topology [4].

Usage:

The user is asked to upload a target tree graph topology in the form of an adjacency matrix (see File Formats). The user can also download a sample adjacency matrix or run RAG Designer with the sample input. After the input file is successfully uploaded, the target graph is partitioned into subgraphs. The user can either run RAG Designer with the default selection of subgraphs or choose their own subgraphs from the options provided in the drop-down menu.

After RAG Designer successfully starts running, the user is redirected to the results page, which is automatically refreshed until processing is finished and results are available. The link to the results page will also be emailed to the address provided by the user (optional). The link and the result files will be active and available to download for 15 days. The user also has the option to cancel or delete the job and associated data from the results page.

See the full description of our design pipeline: A pipeline for computational design of novel RNA-like topologies.

Sample results:

Results produced by RAG Designer for the sample input provided on the RAG Designer main page (RNA-like topology 7_4) are shown below.

[Figure: example design results]

Once F-RAG has finished running, the 2D structures of the 200 (or fewer) lowest-scoring unique sequences are predicted by two programs, RNAfold [15] and NUPACK [16, 17, 18], and sequences that are predicted to fold onto the target topology by both programs are selected. The two plots show the number of nucleotides and the score for the top 200 sequences, with the sequences that fold onto the target topology marked as blue diamonds. If none of the top 200 sequences are predicted to fold onto the target topology by both RNAfold and NUPACK, no sequences will be highlighted. Other relevant information about the run, including the target topology, the total number of sequences generated, and the run time, is also shown. The user is provided with a drop-down menu to visualize the designed 2D structures of sequences that fold onto the correct topology. The plots (PNG file), generated atomic models and 3D graphs (PDB file), model score files, list of lowest-scoring unique sequences, and list of sequences that fold onto the target topology using RNAfold and NUPACK (TXT file) are available for download. The model score file contains the score and number of nucleotides in each model/sequence. The lowest-scoring model file lists the sequence, model number, score, and topology as predicted by RNAfold and NUPACK for the 200 (or fewer) lowest-scoring unique sequences. The list of sequences that fold onto the target topology with RNAfold and NUPACK also contains their model number, number of nucleotides, and score.

Note that the atomic models generated by F-RAG are not optimized for energy or geometry and may contain chain breaks. Sequences corresponding to the 2D structures and 3D atomic models generated by F-RAG are not guaranteed to fold onto the target RNA-like topology when using other RNA 2D structure prediction programs. For best results, we recommend running F-RAG for multiple orientations and subgraph decompositions of the target RNA-like topology and selecting atomic models without chain breaks and with the lowest scores (lower scores are better). We highly recommend screening the lowest-scoring unique sequences with at least two different 2D structure prediction programs (as done here with RNAfold and NUPACK) [4].

Full sample results page and files available here

File Formats ▼

Input File Formats

BPSEQ

A bpseq file contains information regarding the secondary structure of RNA molecules. There are three columns within a bpseq file. The first column lists the nucleotide number for each base present in the 5' to 3' direction. The second column lists the nucleotide type for each base (A, U, G, C) and the third column lists the nucleotide number of the base that is paired to that base (0 if the base is unpaired).

Below is an example RNA secondary structure (left) and its corresponding bpseq file (right). For RAG Sampler and RAG Builder, the file needs to have the extension .bpseq.

1 G 19
2 G 18
3 C 17
4 C 16
5 G 15
6 U 14
7 A 0
8 A 0
9 C 0
10 U 0
11 A 0
12 U 0
13 A 0
14 A 6
15 C 5
16 G 4
17 G 3
18 U 2
19 C 1
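The three-column layout described above can be read with a few lines of Python. This is a minimal sketch rather than the parser used by the webserver: the function name `parse_bpseq` is an assumption, and lines that do not consist of three whitespace-separated fields starting with a number are skipped on the assumption that they are headers or comments.

```python
EXAMPLE_BPSEQ = """\
1 G 19
2 G 18
3 C 17
4 C 16
5 G 15
6 U 14
7 A 0
8 A 0
9 C 0
10 U 0
11 A 0
12 U 0
13 A 0
14 A 6
15 C 5
16 G 4
17 G 3
18 U 2
19 C 1
"""

def parse_bpseq(text):
    """Return (sequence, partners) from BPSEQ text.

    partners[i - 1] is the 1-based partner of nucleotide i, or 0 if unpaired.
    """
    sequence, partners = [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # assumed header, comment, or blank line
        index, base, partner = int(fields[0]), fields[1].upper(), int(fields[2])
        if index != len(sequence) + 1:
            raise ValueError(f"nucleotide numbers must be consecutive, got {index}")
        sequence.append(base)
        partners.append(partner)
    # Every pairing must be reciprocal: if i pairs with j, j must pair with i.
    for i, j in enumerate(partners, start=1):
        if j != 0 and (j > len(partners) or partners[j - 1] != i):
            raise ValueError(f"inconsistent pairing for nucleotide {i}")
    return "".join(sequence), partners

sequence, partners = parse_bpseq(EXAMPLE_BPSEQ)
num_pairs = sum(1 for p in partners if p != 0) // 2
```

For the example file, the sketch yields a 19-nucleotide sequence with six base pairs and a seven-nucleotide hairpin loop (nucleotides 7 to 13).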

Adjacency Matrices

The first step in design is to upload the adjacency matrix of the target RNA tree graph. The adjacency matrix describes the connectivity of the vertices in the tree graph. In other words, an adjacency matrix explicitly states which vertices are connected to one another. Below is an example of an RNA tree graph (left) and its corresponding adjacency matrix (middle). The first row in the table represents the first vertex and the second row represents the second vertex and so on.

For RAG Designer, the vertices should be numbered in increasing order in the 5' to 3' direction, with the first vertex being the 5'/3' end vertex with one connection. This specifies the order of loops in the designed RNA sequence. An example designed sequence and its 2D structure corresponding to the tree graph topology are also shown (right). Note that more than one loop order, each with multiple 2D structures, can correspond to the same tree graph topology.

0 1 0 0 0
1 0 1 1 1
0 1 0 0 0
0 1 0 0 0
0 1 0 0 0
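Before uploading, an adjacency matrix can be checked locally for the properties described above. The Python sketch below is illustrative only: the function names are assumptions, and the checks (square, symmetric 0/1 matrix with a zero diagonal, connected, exactly n - 1 edges, first vertex with one connection) follow from the definition of a tree graph and the numbering rule in the text, not from the webserver's own validation code.

```python
EXAMPLE_MATRIX = """\
0 1 0 0 0
1 0 1 1 1
0 1 0 0 0
0 1 0 0 0
0 1 0 0 0
"""

def parse_adjacency(text):
    """Parse whitespace-separated rows of integers into a list of lists."""
    return [[int(x) for x in line.split()] for line in text.splitlines() if line.strip()]

def is_tree(matrix):
    """Check that the matrix describes an undirected tree graph."""
    n = len(matrix)
    if n == 0 or any(len(row) != n for row in matrix):
        return False  # must be a non-empty square matrix
    for i in range(n):
        if matrix[i][i] != 0:
            return False  # no self-loops
        for j in range(n):
            if matrix[i][j] not in (0, 1) or matrix[i][j] != matrix[j][i]:
                return False  # entries must be 0/1 and symmetric
    if sum(map(sum, matrix)) // 2 != n - 1:
        return False  # a tree on n vertices has exactly n - 1 edges
    # Depth-first search from the first vertex to confirm connectivity.
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in range(n):
            if matrix[v][w] and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def first_vertex_is_end(matrix):
    """The first vertex should have exactly one connection."""
    return sum(matrix[0]) == 1

matrix = parse_adjacency(EXAMPLE_MATRIX)
degrees = [sum(row) for row in matrix]
valid = is_tree(matrix) and first_vertex_is_end(matrix)
```

For the example matrix, the second vertex has four connections and every other vertex has one, so the graph is a five-vertex tree whose first vertex is an end vertex.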

Output File Formats

RAG Sampler score file

The score file is provided as a TXT file. It contains a single column listing the score of each generated candidate 3D graph.
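A one-column score file can be summarized with a short Python sketch such as the one below, which finds the lowest-scoring graph and the most populated score range. The scores in `EXAMPLE_SCORES` are invented for illustration. The sketch also assumes that graphs are numbered from 1 in file order and that a bin width of 2 is wanted. Neither assumption is confirmed by the server documentation.

```python
import math
from collections import Counter

# Invented scores, one per line, as in a one-column score file.
EXAMPLE_SCORES = """\
0.75
-1.25
3.5
-0.5
-1.75
2.25
"""

def read_scores(text):
    """Read one score per non-empty line."""
    return [float(line) for line in text.splitlines() if line.strip()]

def lowest_scoring(scores):
    """Return (graph number, score) of the minimum, numbering graphs from 1."""
    idx = min(range(len(scores)), key=scores.__getitem__)
    return idx + 1, scores[idx]

def mode_range(scores, width=2.0):
    """Return (low, high, count) for the most populated bin of the given width."""
    bins = Counter(math.floor(s / width) for s in scores)
    b, count = bins.most_common(1)[0]
    return b * width, (b + 1) * width, count

scores = read_scores(EXAMPLE_SCORES)
best = lowest_scoring(scores)
mode = mode_range(scores)
```

With the invented scores, the lowest-scoring graph is number 5 (score -1.75) and the mode range is -2 to 0, which contains three of the six scores.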

RAG Builder score files

The score files are provided as TXT files. The 'Lowest Scores' file lists the model number of each selected best model, along with its number of nucleotides and score. The 'Model Scores' file lists, for every generated model, the model number, the graph RMSD of its 3D graph from the target graph, the score, steric-clash information, and the number of nucleotides.

RAG Designer sequence and score files

The sequence and score files are provided as TXT files. The 'Correct Sequences' file contains the sequences that fold onto the target topology, along with their number of nucleotides and the score and model number of the corresponding atomic model. The '200 Lowest Scores' file contains the same information for the top 200 (or fewer) unique sequences, together with their graph topology as predicted by RNAfold and NUPACK (which may or may not be the target topology). The 'Model Scores' file contains the model number, score, steric-clash information, and number of nucleotides for all generated models/sequences.

2D structures

Each 2D structure file is generated in BPSEQ format (see above). All 2D structure files are compressed and provided as a .zip file for ease of download.

3D Graphs and Atomic Models

Each 3D graph and 3D atomic model is generated in PDB format (see PDB file format). All graph and model files are compressed and provided as a .zip file for ease of download.

Browser Compatibility ▼