ANCAC: Amino acid, nucleotide, and codon analysis of COGs

Purpose of the Tool

ANCAC calculates the frequencies of amino acids, nucleotides or codons in sequences of orthologs.
By using complete sets of orthologs from multiple organisms, it connects overall sequence composition to biological function.
You can address general questions, for instance about the abundance of positively charged amino acids or GC content, by selecting specific sets of amino acids, nucleotides or codons.
Normalization against the database allows you to identify orthologs with a frequency bias, that is, sequences whose composition differs from the average in the genome or proteome.
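
The kind of frequency described above can be sketched in a few lines of Python. This is only an illustration, not the server's code: the function names are hypothetical, codons are assumed to be counted as non-overlapping in-frame triplets, and the set of positively charged residues is taken to be K, R and H.

```python
def split_features(sequence, kind):
    """Split a sequence into countable units.

    kind is "amino_acid" or "nucleotide" (single letters) or "codon"
    (assumption: non-overlapping in-frame triplets, a trailing partial
    codon is dropped).
    """
    seq = sequence.upper()
    if kind == "codon":
        usable = len(seq) - len(seq) % 3
        return [seq[i:i + 3] for i in range(0, usable, 3)]
    return list(seq)


def feature_frequency(sequence, selected, kind="amino_acid"):
    """Percentage of units in `sequence` that belong to `selected`."""
    units = split_features(sequence, kind)
    if not units:
        return 0.0
    hits = sum(1 for unit in units if unit in selected)
    return 100.0 * hits / len(units)


# GC content of a short nucleotide sequence
print(feature_frequency("ATGCGC", {"G", "C"}, kind="nucleotide"))
# Share of positively charged residues (assumed here: K, R, H) in a peptide
print(feature_frequency("MKRHLA", {"K", "R", "H"}))
# Share of AAA codons in a short coding sequence
print(feature_frequency("ATGAAATAA", {"AAA"}, kind="codon"))
```

Selecting several features at once simply means passing a larger set, which mirrors choosing more than one amino acid, nucleotide or codon in the interface.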

By including particular organisms or taxa, you can add taxonomic resolution to your queries.
Using a text-input mode, you can calculate frequencies for user-defined sets of COGs and sets of amino acids, nucleotides or codons.
By combining COGs according to your own criteria, you can analyse composition or sequence bias in nearly any biological context, such as biological pathways, cellular localisation or catalytic activity.
Currently the server holds the amino acid and nucleotide sequences of all orthologs in the COG and archaeal COG databases.

For more information, or if you use our tool in your research, please have a look at the related publication:

ANCAC: amino acid, nucleotide, and codon analysis of COGs - a tool for sequence bias analysis in microbial orthologs
BMC Bioinformatics 2012, 13:223


In a basic query the server will calculate frequencies in orthologs from a set of organisms.
For each cluster of orthologs it will calculate the frequency in the corresponding sequences of the selected organisms.
Please go through the following steps:

  • In the tab "Database/Taxonomy" choose a database

  • Depending on the database you now see a selection of organisms:
    On the left side you can choose organisms by taxonomic level. The tree contains all taxonomic terms for which there is at least one organism in the database.
    Below the tree you can choose organisms by a number of traits from the dropdown menu, or by the occurrence of a specific ortholog in an organism. Finally, you can select single organisms on the right-hand side, where your current selection is also displayed. Selections made in the tree override previous ones, so start with the tree if you want to use this functionality. Selections made in the other parts of the tab are added to the current selection.

  • In the tab "Sequence features" choose amino acid, nucleotide or codon, depending on what you are interested in. Below that you can select one or several features of that kind.

  • Finally, in the "Options"-tab you can choose between frequencies and normalized frequencies. Here, frequency means the percentage of the whole sequence length that is made up of the selected features. If you choose normalization, these frequencies are divided by the frequency of the same features in all sequences of the database, either over all organisms or over your organism selection only.

  • To submit your query and see the results, go to the "Results"-tab.
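
The normalization option from the steps above can be illustrated with a minimal sketch. It assumes that the frequency of a COG is computed from pooled counts over all of its sequences and that the background is pooled in the same way; the server may aggregate differently. The function names and the toy sequences are hypothetical.

```python
def pooled_frequency(sequences, selected):
    """Percentage of selected letters over the pooled length of all sequences."""
    total = sum(len(s) for s in sequences)
    if total == 0:
        return 0.0
    hits = sum(1 for s in sequences for letter in s.upper() if letter in selected)
    return 100.0 * hits / total


def normalized_frequency(cog_sequences, background_sequences, selected):
    """Frequency in one COG divided by the frequency in the background set."""
    background = pooled_frequency(background_sequences, selected)
    if background == 0:
        raise ValueError("selected features do not occur in the background")
    return pooled_frequency(cog_sequences, selected) / background


# Hypothetical toy data: two sequences of one COG and a larger background.
# The background stands for either all organisms or the organism selection.
cog = ["MKKRLA", "MKRKHA"]
background = cog + ["MAAALLGG", "MSTVLLAA", "MKDEAGGL"]
score = normalized_frequency(cog, background, {"K", "R"})
print(round(score, 2))
```

Under this sketch a normalized value above 1 means that the selected features are more frequent in the COG than in the background, and a value below 1 means that they are less frequent.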


In the "Results"-tab you see a plot showing the distribution of frequency scores.
Below the plot you find a table showing the frequency score for each COG in your organism selection. You can change the number of entries shown on each page and navigate through the pages. By entering text in the search field in the upper right corner, you can filter the displayed entries by COG number, score range (e.g. "0.5") or any keyword in the description (e.g. "ribosomal").
If you click on an entry, a second table displays the sequences from which the frequency score of this ortholog was calculated, together with their respective organisms.

Taxonomic subgrouping

To investigate which taxonomic branches contribute to a bias you observe, you can use taxonomic subgrouping from the "Options"-tab.
Choosing a rank there subdivides your organism selection according to that rank and calculates one score for each of the resulting groups separately.

Batch processing

If you need to calculate scores for sets of COGs, e.g. COGs that might be involved in a particular pathway or biological context, you can use batch processing.
Here you can calculate scores for every combination of sequence features and COGs. Normalization and taxonomic subgrouping are also available in this mode. Please have a look at the examples in the "Options"-tab for the correct syntax.


ANCAC is open source. Below you can download the server-side scripts for local setup or code reuse.
You can also download MySQL dumps of the underlying database, which give you full flexibility to query or extend a locally installed ANCAC MySQL database.
Separate downloads are available for the databases containing the COG dataset, the arCOG dataset, and a non-redundant combination of both COG and arCOG.

Batch processing by groups of COGs and sequence-features
Perform batch processing:

Here you can calculate scores for groups of COGs and sequence-features.
Multiple query rows can be sent at once.
Format your input according to the example below:


The general syntax is:
list1 > list2 ;
where list1 is a comma-separated list of amino acids, nucleotides or codons
and list2 is a comma-separated list of valid COG numbers.

Choose the sequence feature type of your input, a database and the
organisms of interest in the other tabs.
However, a selection of particular amino acids, nucleotides or codons
in the third tab is not necessary and would be ignored.
The server will calculate one score for each dataset, or one for each
dataset and rank if taxonomic subgrouping is selected.
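
To check your input before submitting it, the batch syntax described above can be parsed with a small script. This is a minimal sketch, not the server's parser: the function name is hypothetical, whitespace around the separators is assumed to be insignificant, and the COG numbers in the example rows are placeholders chosen for illustration.

```python
def parse_batch(text):
    """Parse rows of the form 'list1 > list2 ;' into (features, cogs) pairs."""
    queries = []
    for row in text.split(";"):
        row = row.strip()
        if not row:
            continue  # skip the empty piece after the final ';'
        if row.count(">") != 1:
            raise ValueError(f"expected exactly one '>' in row: {row!r}")
        left, right = row.split(">")
        features = [item.strip() for item in left.split(",") if item.strip()]
        cogs = [item.strip() for item in right.split(",") if item.strip()]
        if not features or not cogs:
            raise ValueError(f"empty list in row: {row!r}")
        queries.append((features, cogs))
    return queries


# Illustrative rows only; the COG numbers are placeholders
example = """
K, R > COG0001, COG0002 ;
D, E > COG0003 ;
"""
for features, cogs in parse_batch(example):
    print(features, "->", cogs)
```

Each parsed row corresponds to one dataset in the sense used above, so the number of returned pairs is the number of scores to expect when no taxonomic subgrouping is selected.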