!
! This is a sample prior file for use with the module sampler. It is essentially
! the same as the one used in Decoding Human Regulatory Circuits.
!
! The command line that we used with the module sampler is:
! /home/thompson/proj-bern/mpcluster/Gibbs.mpi.x86 10,10,8,10,10
! -E 9 -n -m -D -C 0.01 -f 0 -S 50 -p 100
! -P -B -o -X 2,5,1,75000 -K
!
! The -X option is used for simulated tempering. It only works with the MPI version
! of Gibbs. -K implements the sampling method for k, the number of sites/seq, described
! in the supplemental text. This parameter seems to improve the MAP solution in
! some cases. Leaving it off causes Gibbs to do a different inference on k before sampling.
!
! Information on obtaining the modular sampler may be found at
! http://bayesweb.wadsworth.org/gibbs/gibbs.html
! If you desire an MPI version, please include that in your request and please specify
! the platform you will be using.
!
! If you have questions about this file or general Gibbs questions, please contact
! me at thompson@wadsworth.org
!
! Bill Thompson

! Motif model priors - these are uniform for each of the 5 models.
! Since the default prior weight for Gibbs is 0.1, this prior model works
! out to be 1 pseudocount for each motif position (2.5 x 4 columns x 0.1 = 1).
! The order of the columns is A T C G.
>PSEUDO 1
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
>

>PSEUDO 2
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
>

>PSEUDO 3
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
>

>PSEUDO 4
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
>

>PSEUDO 5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
>

! The spacing model - 100 represents the inter-site distance from the end of
! a site to the start of the following site.
>FLAT 100
>

! The number of sequences followed by the expected number of sites for each motif type.
>Seq 48
36 24 26 14 12
>

! The probability of the number of sites, i.e. the probability of 0 sites, 1 site ... kmax sites.
! The program normalizes these values, so you can specify them as counts.
! The number following >BLOCK is a weight; the default is 0.1.
! Note - changing the weight will affect the MAP.
>BLOCK 1
0.1 8 4 12 16 6 0.1 0.1 2 0.1
>

! Prior transition, starting and ending probabilities. These do not have to be normalized.
! The transition matrix should be (no. of motifs) x (no. of motifs).
! The numbers after >TRANS are update and weight. Update should be 1 if you want Gibbs
! to calculate the transition probabilities; weight defaults to 0.8.
! Trans[t][t1] is the prior probability of a site of type t following
! a site of type t1. Thus, Trans[2][1] is the probability that a site of motif type 2
! will follow a site of motif type 1.
!
! In analyzing the human-mouse sequences, we didn't make any prior assumptions about
! order, hence the uniform priors.
>TRANS 0.1 1
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
>

! Starting and ending probabilities. Begin[3] is the prior probability that the
! first site in the sequence is type 3.
>BEGIN
0.2 0.2 0.2 0.2 0.2
>

>END
0.2 0.2 0.2 0.2 0.2
>

>COMMENT
Sample human-mouse prior file
uniform priors
5 models
weight on prior prob of sites/seq = 1
max sites/seq = 9
>

! The following options affect sampling. They may be useful to speed convergence.
! They are heuristics and are included for completeness.
! Remove the ! to use them.

! This parameter weights the alignment count for inference on the number of
! sites/sequence. It's not very useful because it's hard to estimate what the
! correct weights should be. However, it can be useful in certain circumstances when Gibbs
! seems to be overestimating the number of sites.
!>ALIGN
!1 1E+3 1E+3 1E+4 1E+4 1E+5 1E+5 1E+4
!>

! Gibbs has 2 modes of inference on the number of sites in a sequence. The method
! described in the supplement is turned on when the MAP > 0. These parameters
! control that. Setting them low, as in this example, may speed convergence at
! the risk of Gibbs getting stuck at a negative MAP.
!>KSAMPLEMAP
!-1000000
!>

!>MINSITEMAP
!-100 0
!>

! A sample Poisson distribution on the number of sites/seq.
! A Poisson distribution is a good estimate if you don't have any other
! information on the distribution of sites in the data.
! Poisson distribution - lambda = 3.5
!>BLOCK 1
!0.30197383 1.05690842 1.84958973 2.15785469 1.88812285 1.32168600 0.77098350 0.38549175 0.16865264 0.06558714
!>
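
! The scaled Poisson counts in the commented >BLOCK above can be regenerated for a
! different lambda or a different maximum number of sites/seq. The following is a
! minimal Python sketch, not part of Gibbs. The function name poisson_block_counts
! and the scale of 10 are assumptions made for illustration: the values above match
! 10 times the Poisson probabilities for lambda = 3.5, and since Gibbs normalizes
! the counts the scale should not matter. Run it outside the prior file and paste
! only its output on the line after >BLOCK.

```python
import math


def poisson_block_counts(lam, kmax, scale=10.0):
    """Return scale * P(K = k) for k = 0..kmax, with K ~ Poisson(lam).

    The scale factor is an illustrative assumption; the counts are
    normalized by the sampler, so only their ratios matter.
    """
    return [
        scale * math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(kmax + 1)
    ]


# lambda = 3.5 and kmax = 9 (0 to 9 sites/seq), as in the sample block above.
counts = poisson_block_counts(3.5, 9)
print(" ".join(f"{c:.8f}" for c in counts))
```

! To use a different expected number of sites/seq, change the first argument; the
! second argument should match the maximum number of sites/seq given to Gibbs.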