Codon optimization by 0-1 linear programming

Authors: Arbib, C.; Pınar, Mustafa Ç.; Rossi, F.; Tessitore, A.
Type: Article
Date issued: 2020-02
Date available: 2021-02-20
ISSN: 0305-0548
DOI: 10.1016/j.cor.2020.104932
Handle: http://hdl.handle.net/11693/75522
Language: English
Keywords: Protein design; Codon optimization; Motif engineering; Integer linear programming

Abstract: The problem of choosing an optimal codon sequence arises when synthetic protein-coding genes are added to cloning vectors for expression within a non-native host organism: to maximize yield, the chosen codons should have a high frequency in the host genome, but particular nucleotide base sequences (called “motifs”) should be avoided or, instead, included. Dynamic programming (DP) has been used successfully in previous approaches to this problem. However, DP has computational limits, especially when long motifs are forbidden, and does not allow control of motif positioning and combination. We reformulate the problem as an integer linear program (IP) and show that, with the same computational resources, one can easily solve problems with many more nucleotide bases and much longer forbidden/desired motifs than with DP. Moreover, IP (i) offers more flexibility than DP in handling constraints and objectives of different kinds, and (ii) can efficiently deal with newly discovered critical motifs by dynamically adding variables and constraints and re-optimizing.
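To make the stated DP limitation concrete, the following is a minimal sketch of a generic DP for the forbidden-motif case only. It is not the authors' IP model, nor the specific DP from earlier work. It assumes a toy, hypothetical codon table (`CODON_FREQ`, with made-up frequencies), scores a sequence by the sum of log codon frequencies (an assumed objective), and ignores desired motifs and positioning. The function name `optimize_codons` is hypothetical.

```python
from math import log

# Hypothetical codon-usage frequencies for a toy host organism
# (illustrative numbers, not real genome statistics).
CODON_FREQ = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.7, "AAG": 0.3},
    "F": {"TTT": 0.6, "TTC": 0.4},
    "G": {"GGA": 0.5, "GGC": 0.3, "GGT": 0.2},
}


def optimize_codons(protein, forbidden, codon_freq=CODON_FREQ):
    """Return (score, dna) maximizing the sum of log codon frequencies
    subject to no forbidden motif appearing in the DNA, or None if infeasible.

    DP state = the last L-1 nucleotides emitted, where L is the longest
    forbidden motif, so the number of states can grow like 4**(L-1).
    """
    keep = max((len(m) for m in forbidden), default=1) - 1
    best = {"": (0.0, "")}  # state suffix -> (best score, best DNA so far)
    for aa in protein:
        nxt = {}
        for suffix, (score, dna) in best.items():
            for codon, freq in codon_freq[aa].items():
                window = suffix + codon
                # Any motif occurrence ending inside the new codon lies in
                # `window`; occurrences wholly inside `suffix` were excluded
                # at an earlier step.
                if any(m in window for m in forbidden):
                    continue
                # Keep only the last `keep` nucleotides as the next state.
                state = window[max(0, len(window) - keep):]
                cand = (score + log(freq), dna + codon)
                if state not in nxt or cand[0] > nxt[state][0]:
                    nxt[state] = cand
        best = nxt
        if not best:
            return None  # every codon choice creates a forbidden motif
    return max(best.values())


print(optimize_codons("MKFG", []))         # unconstrained: ATGAAATTTGGA
print(optimize_codons("MKFG", ["AAATT"]))  # forces AAG for K: ATGAAGTTTGGA
```

The DP must remember the last L-1 nucleotides so that it can detect a forbidden motif spanning codon boundaries. This suffix state is what makes long forbidden motifs expensive. The 0-1 IP formulation described in the abstract avoids carrying such a state, but its exact variables and constraints are not reproduced here.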