Exercise 3: Textbook case

It is highly recommended that you work through the first two exercises before starting this one.

The self-organizing map (SOM) can be used for clustered datasets, where the data points can be divided into clear, non-overlapping categories. This is the typical textbook case of classification but, unfortunately, such situations are rare in the study of diabetic complications. Nevertheless, it is important to recognize whether a dataset has a clearly defined intrinsic structure. The goal of this exercise is to demonstrate how the Melikerion software can be used to describe the data space and to identify possible errors and peculiarities in the dataset.

Task 1: View the material

Simulated data are useful for technical demonstrations, since the author knows the "true" phenomenon beforehand and can manipulate the dataset to create specific effects for the analysis. Here, a dataset with clearly defined clusters of samples has been created. A few erroneous samples have also been added to make the exercise more instructive.
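To get a feel for how a dataset of this kind might be built, the Python sketch below simulates three well-separated clusters and then corrupts a few rows. It is a hypothetical illustration only: the cluster centers, the spread, the number of variables and the error model (a large offset added to one value, as from a data-entry mistake) are all assumptions, not a description of how the course file was actually generated.

```python
import random

# Hypothetical cluster centers in a 3-variable space (assumed, not taken from the course data).
CENTERS = [(0.0, 0.0, 0.0), (5.0, 5.0, 0.0), (0.0, 5.0, 5.0)]

def simulate_clusters(n_per_cluster=50, spread=0.5, n_errors=5, seed=1):
    """Return (rows, labels): Gaussian clusters plus a few deliberately corrupted rows."""
    rng = random.Random(seed)
    rows, labels = [], []
    for k, center in enumerate(CENTERS):
        for _ in range(n_per_cluster):
            # Each variable is drawn around the cluster center with the same spread.
            rows.append([rng.gauss(c, spread) for c in center])
            labels.append(f"cluster{k + 1}")
    # Simulate gross data-entry errors (an assumed error model): add a large
    # offset to one variable in a few distinct, randomly chosen rows.
    for idx in rng.sample(range(len(rows)), n_errors):
        j = rng.randrange(len(rows[idx]))
        rows[idx][j] += 50.0
        labels[idx] = "error"
    return rows, labels

if __name__ == "__main__":
    rows, labels = simulate_clusters()
    print(len(rows), labels.count("error"))  # 150 5
```

Because the corrupted rows sit far from every cluster center, a map trained on such data should describe them poorly, which is exactly the effect the exercise asks you to look for.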
Download data

Task 2: Create self-organizing map

Submit the configuration and data files to the online system and follow the links until the job is finished. When the results are ready, you should see a collection of map colorings and other images.

Find the image entitled 'qerrors'. You should see a histogram and a few highlighted values on the right. The histogram shows the distribution of the difference between the profile "predicted" by the SOM and the actual observed profile of each sample. Put differently, every sample has a numeric value that tells how accurately the SOM can describe it. A poor description means that there is something peculiar about the sample, which may indicate, for instance, a mistake in data collection. When you have finished looking at the colorings and the histograms, download the entire result archive (link on the right) for a more detailed inspection.

Task 3: Find errors

Download the ZIP archive onto your desktop, rename it to 'clusters_results1' and view its contents. There are, in fact, several files named 'qerrors'; choose the one compatible with Excel.

You should now see the so-called quantization errors for each sample as a spreadsheet. Sort the rows by the QERROR column in descending order so that the samples with the largest errors appear at the top (focus on the worst five). You can then go back to the original data file 'clusters.xls' and check whether these particular samples have, for instance, strange measurement values. Once you have located the suspicious data rows, you can remove them. If the values seem valid, the sample may simply have an atypical profile. A real-world equivalent could be a patient with a rare hereditary form of diabetes (MODY) who has an overall uncharacteristic metabolic profile but who, judging by blood glucose alone, might be classified as having type 1 or type 2 diabetes.

Task 4: Re-submit data

Save the updated spreadsheet and submit it to the Melikerion tool.
Compare the results with the first analysis.
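The bookkeeping behind Tasks 2 to 4 (computing a per-sample error, sorting by it, and dropping flagged rows) can be sketched in a few lines of Python. This is a minimal illustration, not Melikerion's implementation: it assumes the quantization error is the Euclidean distance between a sample and the prototype vector of its best-matching map unit, that the variables are already on comparable scales, and that the trained prototypes (`codebook`) are available as lists of numbers. All function names are hypothetical.

```python
import math

def quantization_errors(samples, codebook):
    """Return, for each sample, the Euclidean distance to its best-matching unit.

    Assumption: variables are already on comparable scales and `codebook`
    holds the prototype vectors of the trained map (hypothetical input).
    """
    # The best-matching unit (BMU) is the prototype closest to the sample,
    # so the quantization error is simply that smallest distance.
    return [min(math.dist(x, m) for m in codebook) for x in samples]

def worst_samples(sample_ids, errors, k=5):
    """Return the k (id, error) pairs with the largest error, worst first."""
    # Equivalent to sorting the QERROR column in descending order.
    ranked = sorted(zip(sample_ids, errors), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

def drop_samples(sample_ids, samples, to_drop):
    """Remove the flagged rows, keeping the original order of the rest."""
    drop = set(to_drop)
    kept = [(sid, x) for sid, x in zip(sample_ids, samples) if sid not in drop]
    return [sid for sid, _ in kept], [x for _, x in kept]

if __name__ == "__main__":
    codebook = [(0.0, 0.0), (5.0, 5.0)]  # hypothetical prototypes of a two-unit map
    ids = ["S1", "S2", "S3"]
    samples = [(0.1, 0.0), (5.0, 4.8), (20.0, 5.0)]
    errors = quantization_errors(samples, codebook)
    print(worst_samples(ids, errors, k=1))  # [('S3', 15.0)]
    print(drop_samples(ids, samples, ["S3"])[0])  # ['S1', 'S2']
```

After rows are removed, the map itself has to be retrained before the new errors can be compared with those of the first analysis; the sketch covers only the error bookkeeping, not SOM training.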
Updated 2009-11-27 by vpmakine.