----------------------------------------------------------------------------
SUMMARY TABLES

These tables 'summarize' all of the expression and genomic data for all of the expressed signatures found in our libraries. For signatures that match uniquely to the genome, the table includes the gene ID and genomic information. For signatures duplicated in the genome, we have not stored all of the possible genes, locations, classes, etc., in this table.

To determine the normalized abundance value for each signature in a library, we merged the 2-step runs and, separately, the 4-step runs to create a raw abundance count for each stepper. The raw abundance count was calculated as the average across all 2-step runs or across all 4-step runs for a given signature. The average total number of signatures sequenced in all 2-step runs or in all 4-step runs was also calculated within the library. The final stage in merging the data within the steppers was to calculate the normalized abundance for each stepper. The normalized abundance is the raw abundance count divided by the average total number of signatures for both of the steppers in the library, multiplied by 10^6 to obtain a "transcripts per million" (TPM) value.

----------------------------------------------------------------------------
COLUMN HEADINGS

The summary table includes the following columns:

Signature sequence - 17 or 20 bases, depending on the file.

Stepper chosen - indicates, for each signature, whether the 2-step or 4-step abundances were higher across all libraries. The higher stepper is chosen and its normalized data are retained.

Reliability (true/false) - False = present in only one sequencing run. We believe it is best to remove the unreliable signatures because these likely result from sequencing errors.

Significance (S/null) - null means that the maximum expression level never exceeds 3 TPM in any library; S indicates that the normalized abundance of the signature was >=4 TPM in at least one library.
Non-significant signatures may be either weakly expressed transcripts or noise.

Library 1 normalized data - see the key to libraries on our web page. This value is the normalized abundance, in transcripts per million (TPM). The value was derived by a series of steps to merge the sequencing runs, described in more detail in Meyers et al. (2004, Genome Research).

Hits - the number of occurrences of the signature in the Arabidopsis genome (excluding matches to plastid sequences). If hits=1, the positional information is present.

Chromosome_id - the chromosome number.

Position - coordinates, if there is a single match in the genome for the signature (hits=1).

Strand - indicated as "W" for the top strand or "C" for the bottom strand.

Gene - the identifier for the Arabidopsis gene with which the signature is associated, if the signature has hits=1 and lies between the start coordinates and 500 bp 3' of the stop codon.

Class - described in more detail on our web page and in our papers. The class indicates the location of the match relative to annotated genes, introns, exons and 3' UTRs (again, this field is only filled in when hits=1). If no genomic match was found for a signature that occurred in the expression data, it is identified as a "class 0" signature.

----------------------------------------------------------------------------
Questions regarding these data should be directed to Blake Meyers (meyers@dbi.udel.edu).
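
----------------------------------------------------------------------------
WORKED EXAMPLE (ILLUSTRATIVE)

The normalization described under SUMMARY TABLES and the Significance flag can be sketched in a few lines of Python. This is a minimal sketch, not the pipeline that produced the tables. The function names and the sample numbers are hypothetical. It assumes that each stepper's raw abundance is divided by the average run total for that same stepper, which is one reading of the description above. It also assumes that only values of 4 TPM or more are marked "S", since the description does not say how values between 3 and 4 TPM are treated.

```python
from statistics import mean


def normalized_abundance(run_counts, run_totals):
    """Return TPM for one signature in one stepper of one library.

    run_counts: counts of the signature, one per sequencing run
    run_totals: total signatures sequenced, one per run (same order)
    Assumption: both lists cover the runs of a single stepper only.
    """
    if not run_counts or len(run_counts) != len(run_totals):
        raise ValueError("need one count and one total per run")
    raw_abundance = mean(run_counts)   # average count across the runs
    average_total = mean(run_totals)   # average number of signatures per run
    return raw_abundance * 1_000_000 / average_total


def significance(tpm_by_library):
    """Return "S" if any library reaches 4 TPM, otherwise None (null).

    Assumption: values between 3 and 4 TPM are treated as null.
    """
    return "S" if max(tpm_by_library) >= 4 else None


# Hypothetical signature seen in three 2-step runs of one library.
tpm = normalized_abundance([12, 8, 10], [500_000, 400_000, 600_000])
flag = significance([tpm, 1.5, 0.0])
```

With these made-up numbers the raw abundance is 10, the average run total is 500,000, and the normalized abundance is 20 TPM, so the signature would be flagged "S".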