ID-Align

Go to IdAlign on the Web or install IdAlign on your computer. Installable executables are also available for Windows and Mac.

Expected file format

The uploaded file is expected to be a tab-separated file. The first line contains the column names, and each subsequent line is either blank or contains exactly the same number of columns as the first line. The column names line must contain the header "Name", which indicates the column of metabolites, and the header "FileName" (case-sensitive), which indicates the name of the "file" from which the row's data was drawn. All other column names are arbitrary, although two or more columns with the same name will lead to undefined results.
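The format rules above can be checked with a few lines of Python. This is only a sketch, not the code used by ID-Align: the function name read_rows is hypothetical, it is written for a current Python 3 rather than Jython, and treating both required headers as case-sensitive is an assumption.

```python
import csv
import io

# Required headers from the format description (compared case-sensitively here).
REQUIRED_COLUMNS = ("Name", "FileName")


def read_rows(text):
    """Split tab-separated text into a header list and a list of data rows.

    Blank lines are skipped. Any other line must have exactly as many
    columns as the header line, otherwise ValueError is raised.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:
        raise ValueError("file is empty")
    for required in REQUIRED_COLUMNS:
        if required not in header:
            raise ValueError("missing required column: %s" % required)
    rows = []
    for row in reader:
        if not row or all(cell.strip() == "" for cell in row):
            continue  # blank lines are allowed
        if len(row) != len(header):
            raise ValueError(
                "line %d has %d columns, expected %d"
                % (reader.line_num, len(row), len(header))
            )
        rows.append(row)
    return header, rows
```

A file whose header is spelled "name" or "Filename" would be rejected by this sketch, which mirrors the case-sensitivity noted above.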

Except for the Name and FileName columns, all columns are scanned to see if their values can be converted into numbers (either floating point or integers). Currently no attempt is made to infer the meaning of column values from header names. Spaces are stripped, and a number can be prefixed by '<' or '>' and can have a "units" postfix that is one of (m/z | scans | %). If a value does not match this pattern, the entire value is taken to be a text string.
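The number-detection rule can be illustrated with a regular expression. This is a hedged sketch and not the parser ID-Align actually uses: the name parse_value is hypothetical, removing all spaces (not only leading and trailing ones) is an assumption, as is accepting exponent notation, and every number is returned as a float for simplicity.

```python
import re

# Optional '<' or '>' prefix, a number, then an optional units postfix.
_NUMBER = re.compile(
    r"^([<>]?)"                                      # optional comparison prefix
    r"([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)"   # integer or floating point
    r"(m/z|scans|%)?$"                               # optional units postfix
)


def parse_value(raw):
    """Return a float when raw looks numeric, otherwise the stripped text."""
    compact = raw.replace(" ", "")  # assumption: all spaces are removed
    match = _NUMBER.match(compact)
    if match is None:
        return raw.strip()  # not a number: keep the whole value as text
    return float(match.group(2))
```

With this sketch, " 12.5 m/z" and "<0.01%" both parse to numbers, while "not detected" stays a text string. The prefix and units are discarded here; whether the real software keeps them is not stated.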

Computation

The file is read and a list of metabolites is created. Each metabolite references a map (dictionary) of files, keyed on the filename supplied in the FileName column, which in turn contains a hash map of named values. Again, the names are supplied by the column header under which each value appeared. A normalizing metabolite is then found (initially the first in the list that matches ‘rutenol’).
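The nested structure described above might look like the following in Python. This is a minimal sketch under stated assumptions: build_index and find_normalizer are hypothetical names, "matches" is interpreted as a case-insensitive substring test, and a repeated (Name, FileName) pair simply overwrites the earlier row here instead of being averaged.

```python
from collections import defaultdict


def build_index(header, rows):
    """Return {metabolite: {filename: {column_name: value}}}."""
    name_at = header.index("Name")
    file_at = header.index("FileName")
    index = defaultdict(dict)
    for row in rows:
        # Every column other than Name and FileName becomes a named value.
        values = {
            column: cell
            for position, (column, cell) in enumerate(zip(header, row))
            if position not in (name_at, file_at)
        }
        index[row[name_at]][row[file_at]] = values
    return index


def find_normalizer(names, pattern="rutenol"):
    """First metabolite name containing pattern (case-insensitive), or None."""
    for name in names:
        if pattern.lower() in name.lower():
            return name
    return None
```

Looking up index["alanine"]["run1"]["Area"] then follows the same path as the description: metabolite, then file, then column name.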

The user can then supply a data name (‘selected Data’) to display in the table. Values that fall below a user-defined minimum are highlighted. The missing value for each column is calculated as half the smallest value found in that file/column (or 0.0 if the column has no values).
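The missing-value rule is simple enough to state as code. The sketch below assumes missing entries are represented as None; the function names are hypothetical.

```python
def missing_value(values):
    """Half the smallest value present, or 0.0 when no value is present."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return min(present) / 2.0


def fill_missing(values):
    """Replace each None in one file's column with that column's missing value."""
    fill = missing_value(values)
    return [fill if v is None else v for v in values]
```

For a column holding 4.0, a gap, and 10.0, the gap would be filled with 2.0.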

XLS Output

The output is a table containing filenames as columns and metabolite values as rows. The output uses Excel's Formula support to scale entire columns by the input of a single value. Multiple worksheets are created. The first sheet presents the table of values selected by the user. Those values that are missing are replaced by a formula referencing a missing value cell that appears above each column. This value is initially equal to half the smallest value found for that column or zero if no values exist.

The next worksheet is a table of formulas viz: normalized!A3 = rawdata!A3/normalized!A1 where A1 is a cell containing the normalization value specified by the user at the web interface. The final worksheet permits whole table scaling.
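The formula pattern above can be generated cell by cell. The following sketch only builds the formula strings and does not write a spreadsheet. The helper names are hypothetical, the sheet names are taken from the example above, and writing the normalization cell as the absolute reference $A$1 is an assumption made so that the formula can be copied across cells.

```python
def column_letters(index):
    """Convert a zero-based column index to Excel letters (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def normalized_formula(row, column, raw_sheet="rawdata", norm_sheet="normalized"):
    """Formula dividing one raw-data cell by the normalization cell A1.

    row is the one-based spreadsheet row, column the zero-based column index.
    """
    cell = "%s%d" % (column_letters(column), row)
    return "=%s!%s/%s!$A$1" % (raw_sheet, cell, norm_sheet)
```

Because every cell refers to the same normalization cell, changing that one value in Excel rescales the whole worksheet.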

Technology

The software is written entirely in Python. It is hosted as a webapp in Tomcat (http://tomcat.apache.org) Server 6.0 using Jython (www.jython.org) and the Apache Commons FileUpload library (http://commons.apache.org/fileupload/).

Known Issues

If two different values exist for the same (Name, FileName) pair, they are averaged.
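The averaging behaviour can be sketched as follows. This is an illustration only: the function name and the (metabolite, filename, value) tuple layout are assumptions, and the values are taken to be numeric already.

```python
from collections import defaultdict


def average_duplicates(records):
    """Average the values that share a (metabolite, filename) pair.

    records is an iterable of (metabolite, filename, value) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for name, filename, value in records:
        entry = totals[(name, filename)]
        entry[0] += value
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```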

The webapp stores the uploaded and parsed files server-side in the servlet’s session object. Currently up to ten files are stored per session, with the oldest files being “lost” and the entire session expiring after an idle time of 4 hours. The files are keyed on the filename sent by the browser during upload. Browsers differ on this point: for example, Firefox (http://www.mozilla.org/firefox) sends only the filename, whereas IE7 (http://www.microsoft.com/windows/products/winfamily/ie/default.mspx) sends the entire path name. Computational parameters (Data Value to display) specified by the user are stored in the session and applied to each computation and each file.
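A bounded store with this "oldest file is lost" behaviour could be sketched as below. This is not the servlet code: UploadStore is a hypothetical class, session expiry is not modelled, and reducing the uploaded name to its last path component (so that Firefox and IE7 uploads share a key) is a suggested workaround rather than something the webapp is stated to do.

```python
import ntpath
from collections import OrderedDict

MAX_FILES = 10  # per-session limit described above


class UploadStore(object):
    """Keep the most recent uploads, dropping the oldest beyond the limit."""

    def __init__(self, limit=MAX_FILES):
        self.limit = limit
        self.files = OrderedDict()

    def add(self, uploaded_name, parsed):
        # ntpath.basename splits on both '\\' and '/', so a full Windows
        # path and a bare filename map to the same key.
        key = ntpath.basename(uploaded_name)
        self.files.pop(key, None)  # a re-upload replaces the old entry
        self.files[key] = parsed
        while len(self.files) > self.limit:
            self.files.popitem(last=False)  # discard the oldest upload
        return key
```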

There are still unresolved issues about file character encoding. These seem to mainly affect the presentation of metabolite names under Firefox and not the computations.

When the software fails - such as when an incorrect file type is uploaded - it fails ungracefully and possibly confusingly. This is a UI issue that the author will improve if time permits.

Output of the XLS file uses a Python library written by the authors and based on the Perl library SpreadSheet::WriteExcel (http://homepage.tinet.ie/~jmcnamara/perl/WriteExcel.html). It has the benefit of being usable with CPython, Jython, and IronPython, but it currently outputs only the Excel97 format. For maintainability reasons, future implementations may move to the Apache POI library (http://poi.apache.org).

Update: the webstart, Windows, and Mac versions now use Apache POI to generate the Excel spreadsheets.

Undefined results will occur if two or more columns have the same name. Currently the last such column in the row will "overwrite" the earlier ones.
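The overwrite behaviour is what a plain Python dictionary does when the same key is assigned twice, and duplicated headers are easy to detect before upload. The sketch below is illustrative only; the function names are hypothetical and it is not claimed to match the internal implementation.

```python
from collections import Counter


def duplicate_columns(header):
    """Return the column names that appear more than once in the header."""
    counts = Counter(header)
    return [name for name in counts if counts[name] > 1]


def row_as_dict(header, row):
    """Map column names to cells; a repeated name keeps the last cell."""
    return dict(zip(header, row))
```

Running duplicate_columns on a header before uploading the file is one way to avoid the undefined results described above.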

It is assumed that each column has a homogeneous format: either all of its values parse to numbers or all are treated as text.

News
According to Time magazine:

Centre contributes to the second most important scientific discovery of 2009!


From the original Time article by EBEN HARRELL

The Top 10 of Everything for 2009

2. The Human Epigenome, Decoded


The decoding of the human genome nearly a decade ago fueled expectations that an understanding of all human hereditary influences was within sight. But the connections between genes and, say, disease turned out to be far more complicated than imagined. What has since emerged is a new frontier in the study of genetic signaling known as epigenetics, which holds that the behavior of genes can be modified by environmental influences and that those changes can be passed down through generations. So people who smoke cigarettes in their youth, for example, sustain certain epigenetic changes, which may then increase the risk that their children's children will reach puberty early. In October, a team led by Joseph Ecker at the Salk Institute in La Jolla, Calif., studied human skin and stem cells to produce the first detailed map of the human epigenome. By comparing this with the epigenomes of diseased cells, scientists will be able to work out how glitches in the epigenome may lead to cancers and other diseases. The study, which was published in the journal Nature, is a giant leap in geneticists' quest to better understand the strange witches' brew of nature and nurture that makes us who we are.


Nature Paper

Breaking News!

October 2009: After working on the Arabidopsis methylome, Ryan Lister, Julian Tonti-Filippini and Harvey Millar have co-authored a Nature paper about the human methylome!

An overview for the general reader can be found at the Economist
