"Fossies" - the Fresh Open Source Software Archive

Member "augustus-3.3.3/config/species/maize5/maize5_parameters.cfg" (22 May 2019, 7190 Bytes) of package /linux/misc/augustus-3.3.3.tar.gz:


#
# maize5 parameters.
#
# date : 19.12.2006
#

#
# Properties for augustus
#------------------------------------
/augustus/verbosity 3     # 0-3, 0: only print what is necessary
maxDNAPieceSize    200000 # maximum segment that is predicted in one piece
stopCodonExcludedFromCDS false # make this 'true' if the CDS excludes the stop codon (training and prediction)

# gff output options:
protein             on    # output predicted protein sequence
codingseq           off   # output the coding sequence
cds                 on    # output 'cds' as feature for exons
start               on    # output start codons (translation start)
stop                on    # output stop codons  (translation stop)
introns             on    # output introns
tss                 on    # output transcription start site
tts                 on    # output transcription termination site
print_utr           off   # output 5'UTR and 3'UTR lines in addition to exon lines
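# Example (a sketch, not part of the original parameter set): AUGUSTUS accepts
# --parametername=value options on the command line, so the gff output switches
# above can usually be changed for a single run without editing this file, e.g.
#   augustus --species=maize5 --codingseq=on genome.fa
# Here genome.fa is a placeholder input file name.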

checkExAcc          off   # internal parameter for extrinsic accuracy

# alternative transcripts and posterior probabilities
sample                      100   # the number of sampling iterations
alternatives-from-sampling  false # output alternative suboptimal transcripts
alternatives-from-evidence  false # output alternative transcripts based on explicit evidence from hints
minexonintronprob           0.08  # minimal posterior probability of all (coding) exons
minmeanexonintronprob       0.4   # minimal geometric mean of the posterior probs of introns and exons
maxtracks                   -1    # maximum number of reported transcripts per gene (-1: no limit)
keep_viterbi                true  # set to true if all Viterbi transcripts should be reported
uniqueCDS                   true  # don't report transcripts that differ only in the UTR
UTR                         off   # predict untranslated regions

#
#
# The rest of the file contains mainly meta parameters used for training.
#

# global constants
# ----------------------------

/Constant/trans_init_window           25
/Constant/ass_upwindow_size           50
/Constant/ass_start                   3
/Constant/ass_end                     2
/Constant/dss_start                   2
/Constant/dss_end                     3
/Constant/init_coding_len             17
/Constant/intterm_coding_len          2
/Constant/tss_upwindow_size           45
/Constant/decomp_num_at               1
/Constant/decomp_num_gc               1
/Constant/gc_range_min                0.32   # This range has an effect only when decomp_num_steps>1.
/Constant/gc_range_max                0.73   # States the minimal and maximal fraction of c or g
/Constant/decomp_num_steps            3      # I recommend keeping this to 1 for most species.
/Constant/min_coding_len              201    # no gene with a coding sequence shorter than this is predicted
/Constant/probNinCoding               0.23   # divide this by .25 to get a penalty factor for making one masked letter part of the coding sequence
/Constant/amberprob                   0.304  # Prob(stop codon = tag), if 0 tag is assumed to code for amino acid
/Constant/ochreprob                   0.202  # Prob(stop codon = taa), if 0 taa is assumed to code for amino acid
/Constant/opalprob                    0.494  # Prob(stop codon = tga), if 0 tga is assumed to code for amino acid
/Constant/subopt_transcript_threshold 0.7
/Constant/almost_identical_maxdiff    10
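# Worked check of the constants above (derived only from the values listed here):
#   amberprob + ochreprob + opalprob = 0.304 + 0.202 + 0.494 = 1.000
#   probNinCoding / 0.25             = 0.23 / 0.25           = 0.92
# So the three stop codon probabilities sum to 1, and the factor for one masked
# letter in coding sequence is 0.92. Assumption: the .25 in the comment on
# probNinCoding is the uniform probability of a single base; a factor below 1
# then acts as a mild penalty.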

# type of weighting, one of  1 = equalWeights, 2 = gcContentClasses, 3 = multiNormalKernel
/BaseCount/weighingType    3
# file with the weight matrix (only for multiNormalKernel type weighting)
/BaseCount/weightMatrixFile   maize5_weightmatrix.txt # change this to your species if at all necessary

# Properties for IGenicModel
# ----------------------------
/IGenicModel/verbosity      0
/IGenicModel/infile         maize5_igenic_probs.pbl   # change this and the other *_probs.pbl filenames below to your species
/IGenicModel/outfile        maize5_igenic_probs.pbl
/IGenicModel/patpseudocount 5.0
/IGenicModel/k              4        # order of the Markov chain for content model, keep equal to /ExonModel/k

# Properties for ExonModel
# ----------------------------
/ExonModel/verbosity          3
/ExonModel/infile             maize5_exon_probs.pbl
/ExonModel/outfile            maize5_exon_probs.pbl
/ExonModel/patpseudocount     3.3125
/ExonModel/minPatSum          475
/ExonModel/k                  4       # order of the Markov chain for content model
/ExonModel/etorder            2
/ExonModel/etpseudocount      3
/ExonModel/exonlengthD        2000    # beyond this the distribution is geometric
/ExonModel/maxexonlength      15000
/ExonModel/slope_of_bandwidth 0.1875
/ExonModel/minwindowcount     3
/ExonModel/tis_motif_memory   2
/ExonModel/tis_motif_radius   3

# Properties for IntronModel
# ----------------------------
/IntronModel/verbosity          0
/IntronModel/infile             maize5_intron_probs.pbl
/IntronModel/outfile            maize5_intron_probs.pbl
/IntronModel/patpseudocount     5.0
/IntronModel/k                  4     # order of the Markov chain for content model, keep equal to /ExonModel/k
/IntronModel/slope_of_bandwidth 0.1875
/IntronModel/minwindowcount     1
/IntronModel/asspseudocount     0.0005
/IntronModel/dsspseudocount     0.0002
/IntronModel/dssneighborfactor  0.00505
#/IntronModel/splicefile         maize5_splicefile.txt # this optional file contains additional windows around splice sites for training, uncomment if you have one
/IntronModel/sf_with_motif      false # if true the splice file is also used to train the branch point region
/IntronModel/d                  843   # constraint: this must be larger than 4 + /Constant/dss_end + /Constant/ass_upwindow_size + /Constant/ass_start
/IntronModel/ass_motif_memory   3
/IntronModel/ass_motif_radius   3
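# Worked check of the constraint on /IntronModel/d, using the values set in this file:
#   4 + /Constant/dss_end + /Constant/ass_upwindow_size + /Constant/ass_start
#     = 4 + 3 + 50 + 3 = 60
#   /IntronModel/d = 843 > 60, so the constraint holds.
# If any of the three constants is changed, this sum should be recomputed.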

# Properties for UtrModel
# ----------------------------
/UtrModel/verbosity             3
/UtrModel/infile                maize5_utr_probs.pbl
/UtrModel/outfile               maize5_utr_probs.pbl
/UtrModel/k                     4
/UtrModel/utr5patternweight     0.1
/UtrModel/utr3patternweight     0.5
/UtrModel/patpseudocount        1
/UtrModel/tssup_k               2
/UtrModel/tssup_patpseudocount  1
/UtrModel/slope_of_bandwidth    0.25
/UtrModel/minwindowcount        1
/UtrModel/exonlengthD           800
/UtrModel/maxexonlength         1800
/UtrModel/max3singlelength      2000
/UtrModel/max3termlength        1500
/UtrModel/tss_start             9
/UtrModel/tss_end               4
/UtrModel/tata_start            2
/UtrModel/tata_end              10
/UtrModel/tata_pseudocount      2
/UtrModel/d_tss_tata_min        26      # minimal distance between start of tata box (if existent) and tss
/UtrModel/d_tss_tata_max        37      # maximal distance between start of tata box (if existent) and tss
/UtrModel/polyasig_consensus    aataaa  # polyadenylation signal training not fully automated yet
/UtrModel/d_polyasig_cleavage   14      # the transcription end is predicted this many bases after the polyadenylation signal
/UtrModel/d_polya_cleavage_min  9
/UtrModel/d_polya_cleavage_max  19
/UtrModel/prob_polya            0
/UtrModel/tts_motif_memory      1