Member "augustus-3.3.3/config/species/s_pneumoniae/s_pneumoniae_parameters.cfg.orig2" (22 May 2019, 7564 Bytes) of package /linux/misc/augustus-3.3.3.tar.gz:


#
# s_pneumoniae parameters.
#
# date : 19.12.2006
#

#
# Properties for augustus
#------------------------------------
genemodel     bacterium   # no introns, overlapping genes, etc
translation_table 11 # prokaryotic genetic code
/augustus/verbosity 3     # 0-3, 0: only print the necessary
maxDNAPieceSize    2000000 # maximum segment that is predicted in one piece
stopCodonExcludedFromCDS false # make this 'true' if the CDS excludes the stop codon (training and prediction)

# gff output options:
protein             on    # output predicted protein sequence
codingseq           off   # output the coding sequence
cds                 on    # output 'cds' as feature for exons
start               on    # output start codons (translation start)
stop                on    # output stop codons (translation stop)
introns             on    # output introns
tss                 on    # output transcription start site
tts                 on    # output transcription termination site
print_utr           off   # output 5'UTR and 3'UTR lines in addition to exon lines

checkExAcc          off   # internal parameter for extrinsic accuracy

# alternative transcripts and posterior probabilities
sample                      100   # the number of sampling iterations
alternatives-from-sampling  false # output alternative suboptimal transcripts
alternatives-from-evidence  false # output alternative transcripts based on explicit evidence from hints
minexonintronprob           0.08  # minimal posterior probability of all (coding) exons
minmeanexonintronprob       0.4   # minimal geometric mean of the posterior probs of introns and exons
maxtracks                   -1    # maximum number of reported transcripts per gene (-1: no limit)
keep_viterbi                true  # set to true if all Viterbi transcripts should be reported
uniqueCDS                   true  # don't report transcripts that differ only in the UTR
UTR                         off   # predict untranslated regions

#
#
# The rest of the file contains mainly meta parameters used for training.
#

# global constants
# ----------------------------

/Constant/trans_init_window           24
/Constant/ass_upwindow_size           30
/Constant/ass_start                   3
/Constant/ass_end                     2
/Constant/dss_start                   3
/Constant/dss_end                     3
/Constant/init_coding_len             18
/Constant/intterm_coding_len          5
/Constant/tss_upwindow_size           45
/Constant/decomp_num_at               1
/Constant/decomp_num_gc               1
/Constant/gc_range_min                0.25   # This range has an effect only when decomp_num_steps>1.
/Constant/gc_range_max                0.75   # States the minimal and maximal fraction of C or G
/Constant/decomp_num_steps            3      # I recommend keeping this to 1 for most species.
/Constant/min_coding_len              61     # no gene with a coding sequence shorter than this is predicted
/Constant/probNinCoding               0.23   # divide this by 0.25 to get the penalty factor (here 0.92) for making one masked letter part of the coding sequence
/Constant/amberprob                   0.067  # Prob(stop codon = tag); if 0, tag is assumed to code for an amino acid
/Constant/ochreprob                   0.678  # Prob(stop codon = taa); if 0, taa is assumed to code for an amino acid
/Constant/opalprob                    0.255  # Prob(stop codon = tga); if 0, tga is assumed to code for an amino acid
/Constant/subopt_transcript_threshold 0.7
/Constant/almost_identical_maxdiff    10

# type of weighting, one of 1 = equalWeights, 2 = gcContentClasses, 3 = multiNormalKernel
/BaseCount/weighingType    3
# file with the weight matrix (only for multiNormalKernel type weighting)
/BaseCount/weightMatrixFile   s_pneumoniae_weightmatrix.txt # change this to your species if at all necessary

# Properties for IGenicModel
# ----------------------------
/IGenicModel/verbosity      0
/IGenicModel/infile         s_pneumoniae_igenic_probs.pbl   # change this and the other *_probs.pbl filenames below to your species
/IGenicModel/outfile        s_pneumoniae_igenic_probs.pbl
/IGenicModel/patpseudocount 5.0
/IGenicModel/k              4        # order of the Markov chain for the content model, keep equal to /ExonModel/k

# Properties for ExonModel
# ----------------------------
/ExonModel/verbosity          3
/ExonModel/infile             s_pneumoniae_exon_probs.pbl
/ExonModel/outfile            s_pneumoniae_exon_probs.pbl
/ExonModel/patpseudocount     0.5
/ExonModel/minPatSum          475
/ExonModel/k                  4       # order of the Markov chain for the content model
/ExonModel/etorder            2
/ExonModel/etpseudocount      3
/ExonModel/exonlengthD        2000    # beyond this length, the exon length distribution is geometric
/ExonModel/maxexonlength      35000
/ExonModel/slope_of_bandwidth 0.1875
/ExonModel/minwindowcount     1
/ExonModel/tis_motif_memory   3
/ExonModel/tis_motif_radius   0
/ExonModel/lenboostL          150     # (single and initial) exons longer than this are rewarded
/ExonModel/lenboostE          0.05    # by a factor of 1+lenboostE for each base by which they exceed lenboostL

# Properties for IntronModel
# ----------------------------
/IntronModel/verbosity          0
/IntronModel/infile             s_pneumoniae_intron_probs.pbl
/IntronModel/outfile            s_pneumoniae_intron_probs.pbl
/IntronModel/patpseudocount     5.0
/IntronModel/k                  4     # order of the Markov chain for the content model, keep equal to /ExonModel/k
/IntronModel/slope_of_bandwidth 0.4
/IntronModel/minwindowcount     4
/IntronModel/asspseudocount     0.00266
/IntronModel/dsspseudocount     0.0005
/IntronModel/dssneighborfactor  0.00173
#/IntronModel/splicefile         s_pneumoniae_splicefile.txt # this optional file contains additional windows around splice sites for training, uncomment if you have one
/IntronModel/sf_with_motif      false           # if true the splice file is also used to train the branch point region
/IntronModel/d                  100  # constraint: this must be larger than 4 + /Constant/dss_end + /Constant/ass_upwindow_size + /Constant/ass_start
/IntronModel/ass_motif_memory   3
/IntronModel/ass_motif_radius   3

# Properties for UtrModel
# ----------------------------
/UtrModel/verbosity             3
/UtrModel/infile                s_pneumoniae_utr_probs.pbl
/UtrModel/outfile               s_pneumoniae_utr_probs.pbl
/UtrModel/k                     4
/UtrModel/utr5patternweight     0.5
/UtrModel/utr3patternweight     0.5
/UtrModel/patpseudocount        1
/UtrModel/tssup_k               0
/UtrModel/tssup_patpseudocount  1
/UtrModel/slope_of_bandwidth    0.2375
/UtrModel/minwindowcount        3
/UtrModel/exonlengthD           800
/UtrModel/maxexonlength         1800
/UtrModel/max3singlelength      2000
/UtrModel/max3termlength        1500
/UtrModel/tss_start             8
/UtrModel/tss_end               5
/UtrModel/tata_start            2
/UtrModel/tata_end              10
/UtrModel/tata_pseudocount      2
/UtrModel/d_tss_tata_min        26      # minimal distance between the start of the TATA box (if present) and the TSS
/UtrModel/d_tss_tata_max        37      # maximal distance between the start of the TATA box (if present) and the TSS
/UtrModel/polyasig_consensus    aataaa  # training of the polyadenylation signal is not fully automated yet
/UtrModel/d_polyasig_cleavage   14      # the transcription end is predicted this many bases after the polyadenylation signal
/UtrModel/d_polya_cleavage_min  7
/UtrModel/d_polya_cleavage_max  19
/UtrModel/prob_polya            0.4
/UtrModel/tts_motif_memory      1
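This parameter file uses a simple line format: a parameter name, whitespace, a value, and an optional "#" comment; a line that starts with "#" (such as the commented-out /IntronModel/splicefile entry) is ignored. The Python sketch below reads that format into a dictionary, for example to inspect a species configuration before training. It is not AUGUSTUS's own parser: the function name parse_augustus_cfg is hypothetical, and it assumes that "#" never occurs inside a value.

```python
def parse_augustus_cfg(text):
    """Parse 'name value  # comment' lines into a dict of strings.

    Sketch under assumptions (not AUGUSTUS's parser): '#' always starts a
    comment, the first whitespace-separated token is the parameter name and
    the rest of the line before any comment is its value.
    """
    params = {}
    for line in text.splitlines():
        body = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not body:
            continue  # blank or comment-only line, e.g. a commented-out parameter
        parts = body.split(None, 1)  # name, then everything after it
        if len(parts) == 2:
            params[parts[0]] = parts[1].strip()
    return params


if __name__ == "__main__":
    sample = """genemodel     bacterium   # no introns, overlapping genes, etc
/Constant/amberprob                   0.067
#/IntronModel/splicefile         s_pneumoniae_splicefile.txt
/IntronModel/d                  100  # constraint
"""
    cfg = parse_augustus_cfg(sample)
    print(cfg["genemodel"])
    print(float(cfg["/Constant/amberprob"]))
    print(len(cfg))
```

Values stay strings on purpose, since the file mixes numbers, booleans, on/off switches and filenames; converting them is left to the caller.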
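The three /Constant/*prob entries give the probability that a stop codon is tag (amber), taa (ochre) or tga (opal); in this file 0.067 + 0.678 + 0.255 = 1.000. Because they describe a distribution over the three stop codons, a reasonable sanity check after editing them is that they are non-negative and sum to 1. The Python sketch below performs that check; the function name stop_codon_probs_ok is hypothetical, and this file does not say whether AUGUSTUS itself enforces or renormalizes the sum.

```python
import math

# Stop-codon probabilities copied from the /Constant/*prob lines above.
STOP_CODON_PROBS = {
    "tag": 0.067,  # /Constant/amberprob
    "taa": 0.678,  # /Constant/ochreprob
    "tga": 0.255,  # /Constant/opalprob
}


def stop_codon_probs_ok(probs, tol=1e-9):
    """Hypothetical check: every probability is >= 0 and they sum to 1 within tol."""
    if any(p < 0 for p in probs.values()):
        return False
    # abs_tol absorbs floating-point rounding in the sum
    return math.isclose(sum(probs.values()), 1.0, rel_tol=0.0, abs_tol=tol)


print(stop_codon_probs_ok(STOP_CODON_PROBS))
```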
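The comment on /IntronModel/d ties it to three other window parameters. With the values in this file, the lower bound is 4 + dss_end + ass_upwindow_size + ass_start = 4 + 3 + 30 + 3 = 40, so d = 100 satisfies it. A small Python sketch of this check, useful when changing any of the four parameters; the function name intron_d_lower_bound is hypothetical, and the formula is taken only from that comment.

```python
def intron_d_lower_bound(dss_end, ass_upwindow_size, ass_start):
    """Bound stated in the /IntronModel/d comment: d must be larger than this."""
    return 4 + dss_end + ass_upwindow_size + ass_start


# Values from /Constant/dss_end, /Constant/ass_upwindow_size and /Constant/ass_start.
bound = intron_d_lower_bound(dss_end=3, ass_upwindow_size=30, ass_start=3)
d = 100  # /IntronModel/d
print(bound, d > bound)
```

Note that the constraint is strict ("larger than"), so d = 40 would already violate it.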
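The lenboostL/lenboostE comments describe a length reward for single and initial exons. Read literally, an exon that is n bases longer than lenboostL receives the factor 1 + lenboostE once per extra base, i.e. (1 + lenboostE)^n. The Python sketch below computes that factor under this literal, compounding reading. It is an assumption for illustration only; AUGUSTUS's internal scoring may apply the boost differently, and the function name length_boost is hypothetical.

```python
def length_boost(exon_len, lenboost_l=150, lenboost_e=0.05):
    """Reward factor under the literal reading of the lenboost comments (assumption):
    a factor of (1 + lenboost_e) for each base beyond lenboost_l, compounding."""
    extra = max(0, exon_len - lenboost_l)  # exons not longer than lenboost_l get no boost
    return (1.0 + lenboost_e) ** extra


# Defaults match /ExonModel/lenboostL = 150 and /ExonModel/lenboostE = 0.05.
for length in (100, 150, 151, 160):
    print(length, round(length_boost(length), 4))
```

Under this reading the factor grows geometrically with exon length, which is why a small lenboostE such as 0.05 already gives a noticeable reward ten bases past lenboostL.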