Contents of this Article

Sequence File Formats
Design Files
Bioinformatic File Types

Sequence File Formats

To add a new sequence to TeselaGen’s library, you can create it from scratch or upload several sequences in bulk. This article focuses on uploading many sequences at once. To do this, go to the DNA Sequences library, click “Upload”, and then select “DNA Sequences”.

The formats supported are:

.zip: includes the formats listed below, compressed
.json: TeselaGen’s JSON format
.fasta, .fas, .fa, .fna, .ffn, .txt: Fasta format
.csv, .txt, .xlsx: CSV Files
.geneious: Geneious format
.ab1: Sequence Trace format
.dna: SnapGene DNA format
.gb, .gbk, .txt: Genbank format
.xml, .rdf: SBOL XML format
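
The list above can also be read as a lookup table. The following Python sketch is illustrative only: SUPPORTED_FORMATS and candidate_formats are hypothetical names, not part of TeselaGen, and the platform may detect formats differently. It shows that an extension such as .txt does not identify a single format on its own.

```python
from __future__ import annotations

from pathlib import Path

# Extension table copied from the list of supported formats above.
# ".txt" appears under three formats, so it is ambiguous by itself.
SUPPORTED_FORMATS = {
    "Zip archive": {".zip"},
    "TeselaGen JSON": {".json"},
    "Fasta": {".fasta", ".fas", ".fa", ".fna", ".ffn", ".txt"},
    "CSV": {".csv", ".txt", ".xlsx"},
    "Geneious": {".geneious"},
    "Sequence Trace": {".ab1"},
    "SnapGene DNA": {".dna"},
    "Genbank": {".gb", ".gbk", ".txt"},
    "SBOL XML": {".xml", ".rdf"},
}


def candidate_formats(filename: str) -> list[str]:
    """Return every listed format whose extensions include this file's extension."""
    extension = Path(filename).suffix.lower()
    return [name for name, exts in SUPPORTED_FORMATS.items() if extension in exts]


print(candidate_formats("plasmid.gb"))  # a single candidate
print(candidate_formats("notes.txt"))   # several candidates
```

For an ambiguous extension, the file content has to be inspected to decide which format applies.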

Note that, by clicking on any of the formats shown in the pop-up window, you can download example files. Besides uploading input data, you can also obtain output files by exporting your sequences. To do so, click the “Export” icon at the bottom of the library.

Before the file is exported, a pop-up window appears where you can select the columns to export. Then, click on “submit” to obtain a CSV file.

You can also export a single sequence: select it, right-click, and choose the option “Export Sequences”.

This opens a new window where you fill in the required information before obtaining the file in one of the available formats.

Another way to export files is from the Open Vector Editor: click the “File” option in the upper menu, select “Export”, and fill in the required information.

Design Files

As with sequences, you can upload a design to the library.

The formats supported are:

.csv
.xlsx
.xml: DIVA Design file
.json
.zip

For design file formats, you can also download example files, including a simple example of the TeselaGen JSON schema.

To export a design, right-click on it and select the option “Export”. There you can choose between a JSON or CSV + Genbank format.

Alternatively, in the Design Editor, open the "File" menu and select "Export".

Bioinformatic File Types

Becoming familiar with bioinformatic file types is useful for scientific work. A sequence in plain format contains only IUPAC characters. Although this is valuable biological knowledge, it can be considered raw material. Scientists usually need to store more information about a given sequence (RNA, DNA, or protein): information that allows them to relate the sequence to its function.

This is particularly relevant to synthetic biology. For that reason, several file types have been developed that can hold more information about a sequence, from storing more than one sequence in the same file to storing lines of annotations and indicators such as quality values, ID or LOCUS, and length. Here we’ll cover the most common file types supported by our platform.

GenBank (.gb, .gbk)

The GenBank file format is commonly used because it can store extra information in addition to the DNA or protein sequence. If you want to take a deeper look at its structure (data elements or fields), you can see an example in this article from NCBI.

A GenBank file typically contains this information:

Locus (Locus name, Sequence length, Molecule type, GenBank division, Modification date)
Definition
Accession
Version (GI)
Keywords
Source (Organism)
Reference (Authors, Title, Journal, Pubmed)
Features (Source, Taxon, CDS, GI or Translation, Gene)
Origin
Sequence
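
To make the layout of these fields concrete, here is a minimal Python sketch that reads the top-level keywords and the sequence from a small, made-up record. SAMPLE_RECORD and parse_genbank are hypothetical names used for illustration; the sketch ignores feature details and multi-line values, so it is not a complete GenBank parser. For real work, an established library such as Biopython is a safer choice.

```python
# A tiny, invented record that follows the general GenBank flat-file layout.
SAMPLE_RECORD = """\
LOCUS       EXAMPLE1                  24 bp    DNA     linear   SYN 01-JAN-2020
DEFINITION  Hypothetical example record.
ACCESSION   EXAMPLE1
FEATURES             Location/Qualifiers
     source          1..24
ORIGIN
        1 atgaccatga ttacgccaag cttg
//
"""


def parse_genbank(text):
    """Collect top-level keyword lines and the sequence that follows ORIGIN."""
    fields = {}
    sequence_parts = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):  # "//" marks the end of the record
            break
        if in_origin:
            # Sequence lines mix position numbers, spaces, and bases; keep letters.
            sequence_parts.append("".join(ch for ch in line if ch.isalpha()))
        elif line[:1].isalpha():
            # Top-level keywords start in the first column; indented lines are skipped.
            keyword, _, value = line.partition(" ")
            if keyword == "ORIGIN":
                in_origin = True
            else:
                fields[keyword] = value.strip()
    fields["SEQUENCE"] = "".join(sequence_parts).upper()
    return fields


record = parse_genbank(SAMPLE_RECORD)
print(record["LOCUS"].split()[0], len(record["SEQUENCE"]))
```

The first item of the LOCUS line is the locus name, and the letters after ORIGIN form the sequence itself.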

FASTA (.fasta)

FASTA is a text file format for representing raw biological sequences. Each record starts with a single definition line (defline) that holds a name and an optional description, and that is distinguished from the sequence by a greater-than (">") symbol at the beginning. This line is followed by one or more lines containing the letters of the sequence (IUPAC/IUB codes). For BLAST (Basic Local Alignment Search Tool), NCBI recommends that all lines of text be shorter than 80 characters.
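
The structure described above can be sketched in a few lines of Python. The function names parse_fasta and format_fasta are hypothetical, and the 70-character line width is simply one choice that stays under the 80-character recommendation. This is a minimal illustration, not a validating parser.

```python
def parse_fasta(text):
    """Return a list of (name, description, sequence) tuples from FASTA text."""
    records = []
    name, description, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new defline closes the previous record, if there was one.
            if name is not None:
                records.append((name, description, "".join(chunks)))
            name, _, description = line[1:].strip().partition(" ")
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence lines are joined back together
    if name is not None:
        records.append((name, description, "".join(chunks)))
    return records


def format_fasta(name, sequence, description="", width=70):
    """Write one record, wrapping the sequence so every line stays short."""
    header = f">{name} {description}".rstrip()
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *lines]) + "\n"


text = format_fasta("seq1", "ATGC" * 40, description="example insert")
print(text)
print(parse_fasta(text)[0][0])
```

Because the defline always begins with ">", several records can be stored one after another in the same file.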

ZIP (.zip)

The ZIP format is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories, which may have been compressed. In this context, it allows you to download and upload your data in bulk.
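
As an illustration, several sequence files can be bundled into one archive with Python's standard zipfile module. The helper names bundle_sequences and read_bundle and the file names below are hypothetical. The sketch only shows that the contents come back unchanged after compression.

```python
import io
import zipfile


def bundle_sequences(files):
    """Pack a {filename: text} mapping into ZIP archive bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for filename, content in files.items():
            archive.writestr(filename, content)
    return buffer.getvalue()


def read_bundle(zip_bytes):
    """Unpack ZIP archive bytes back into a {filename: text} mapping."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {name: archive.read(name).decode("utf-8") for name in archive.namelist()}


files = {
    "seq1.fasta": ">seq1\nATGCATGC\n",
    "seq2.fasta": ">seq2\nTTAATTAA\n",
}
archive_bytes = bundle_sequences(files)
print(read_bundle(archive_bytes) == files)  # compression is lossless
```

To write the archive to disk instead of memory, pass a file path to zipfile.ZipFile in place of the in-memory buffer.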

Tabular information (.csv and .xlsx)

Both formats typically store tabular data, but some differences may matter to you. CSV stands for Comma-Separated Values. It is plain text that stores your data but not the operations on it. CSV is a common data exchange format (to a certain extent, it makes the data “raw” again) and is widely used even though it is not fully standardized. XLSX, on the other hand, is a Microsoft Excel format (a compressed package of XML files rather than plain text) that holds the same data and also the operations on it, such as formulas. Exporting your data in this format creates a spreadsheet that is viewable and editable in Excel. This makes the data easy to re-group, combine, and re-format.

CSV files can also be opened and edited in Microsoft Excel and even in plain text editors, which is not possible for XLSX files. So, in this context, the choice is mostly a matter of which format you are used to working with.
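
The plain-text nature of CSV is easy to see with Python's standard csv module. In this sketch the column names (name, size, description) and the helper names are hypothetical and do not reflect the exact columns TeselaGen exports.

```python
import csv
import io


def sequences_to_csv(rows, columns):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_sequences(text):
    """Read CSV text back into a list of dicts; every value comes back as text."""
    return list(csv.DictReader(io.StringIO(text)))


rows = [
    {"name": "pExample1", "size": 24, "description": "test, with comma"},
    {"name": "pExample2", "size": 1200, "description": "backbone"},
]
text = sequences_to_csv(rows, ["name", "size", "description"])
print(text)
print(csv_to_sequences(text)[0]["size"])
```

Values that contain a comma are quoted on output, and numbers are read back as strings, since CSV stores only text and no types or formulas.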

JSON (.json)

A JSON file is a text file written in JavaScript Object Notation, a syntax for storing data. This format allows software to store computational objects as text, which makes it well suited to storing complex biological designs in a format that is compatible with almost any operating system.
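
As a small illustration of storing an object as text, the Python sketch below converts a nested structure to JSON and back. The field names (name, parts, sequence, circular) and the function names are invented for this example and are not TeselaGen's actual JSON schema; download the example files mentioned above to see the real format.

```python
import json


def design_to_json(design):
    """Serialize a nested Python object to human-readable JSON text."""
    return json.dumps(design, indent=2, sort_keys=True)


def design_from_json(text):
    """Rebuild the Python object from JSON text."""
    return json.loads(text)


# Hypothetical design: a name plus a list of parts, each with its own fields.
design = {
    "name": "example_design",
    "parts": [
        {"name": "promoter_1", "sequence": "TTGACA", "circular": False},
        {"name": "cds_1", "sequence": "ATGGCTTAA", "circular": False},
    ],
}

text = design_to_json(design)
print(text)
print(design_from_json(text) == design)  # the structure survives the round trip
```

Nested lists and objects are preserved, which is what makes the format convenient for designs made of many parts.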
