DNA COMPUTER CODE BASED ON EXPANDED GENETIC ALPHABET (Published)
Because DNA can store large amounts of information in a very small physical volume, it has become an attractive molecule for storing information in future molecular computers. This information resides in the order of nucleotides (nt), and several proposals to map the 256 ASCII computer symbols onto the same number of different nt groupings have been described. Although a DNA molecule of any size can be synthesised, its use has several limitations, the most important of which relate to stability and biosecurity. To circumvent these limitations, to increase the capacity to store information, and to diminish the probability of errors, I have considered the use of a DNA molecule made with the two standard nt-pairs and two non-standard (ns) synthetic nt-pairs. With eight different nt, this ns-DNA would generate 8^3 = 512 distinct triplets and would therefore permit the encoding of the 256 computer symbols, together with a start and an end signal, each coded by three nt. Printable symbols can be distributed into four groups (upper case letters, lower case letters, numbers and mathematical symbols). A common first nt was assigned to all triplets from the same group. The excess of 256 triplets was used to add high redundancy in the third nt according to the frequency of use in English writing. The main advantages of this DNA encoding relative to those previously published are a smaller size, a lower error probability, inability to contaminate any living cell, and an explicit non-biological origin.
Keywords: DNA computing; expanded genetic alphabet; molecular computing
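
The counting argument in the abstract can be checked with a minimal sketch. The abstract does not name the non-standard nucleotides, so the letters W, X, Y and Z below are hypothetical placeholders, as are all function and variable names; the sketch illustrates only the arithmetic, not the actual triplet assignment proposed in the paper.

```python
from itertools import product

# Standard nucleotides, forming the two standard pairs A-T and C-G.
STANDARD = "ACGT"
# Placeholder letters for the two synthetic pairs (assumption: the
# abstract does not name them; W-X and Y-Z are purely illustrative).
NON_STANDARD = "WXYZ"
ALPHABET = STANDARD + NON_STANDARD  # eight nucleotides in total


def all_triplets(alphabet):
    """Return every ordered triplet over the given alphabet."""
    return ["".join(t) for t in product(alphabet, repeat=3)]


def min_codeword_length(alphabet_size, n_codes):
    """Smallest k such that alphabet_size ** k >= n_codes."""
    k = 1
    while alphabet_size ** k < n_codes:
        k += 1
    return k


triplets = all_triplets(ALPHABET)
n_symbols = 256                      # computer symbols to encode
spare = len(triplets) - n_symbols    # triplets left over for redundancy

# Triplets sharing a first nucleotide, as when one first nt marks a group.
by_first = {nt: [t for t in triplets if t[0] == nt] for nt in ALPHABET}

print(len(triplets))                 # 8 ** 3 = 512
print(spare)                         # 512 - 256 = 256
print(len(by_first["A"]))            # 8 ** 2 = 64 triplets per first nt
# Codeword length needed for 256 symbols plus start and end signals:
print(min_codeword_length(4, n_symbols + 2))  # four-letter alphabet
print(min_codeword_length(8, n_symbols + 2))  # eight-letter alphabet
```

Under these assumptions, a four-letter alphabet needs five nt per codeword to cover 256 symbols plus two signals (exactly four if the signals are left out), whereas the eight-letter alphabet needs three, which is consistent with the smaller size claimed for the ns-DNA encoding.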