Klinik und Poliklinik für Endokrinologie und Nephrologie - Sektion Nephrologie
 Universitätsmedizin Leipzig

easyLINKAGE - A graphical user interface for automated linkage analyses

easyLINKAGE is a joint project of the Institute of Human Genetics, Charité Virchow Campus Berlin and the Division of Nephrology, Department of Medicine, Neurology, and Dermatology, University Clinic Leipzig. We extended the original easyLINKAGE program to support linkage analyses of large-scale SNP data in addition to microsatellites. We implemented new modules for Allegro, Merlin, SimWalk, GeneHunter Imprinting, GeneHunter TwoLocus, and SuperLink, and extended FastSLink with automatic loop breaking and new outputs. We added conditional linkage analyses and multipoint simulation studies, and extended the error test routines to check for Mendelian and non-Mendelian genotyping errors and for deviations from Hardy–Weinberg equilibrium. Data can be analyzed in sets of markers, in defined centimorgan intervals, and with different allele frequency algorithms. The outputs consist of genome-wide and chromosomal PostScript plots of LOD scores, NPL scores, P-values and other parameters.

Download software and marker maps

Contact by email: Katrin Hoffmann / Tom H. Lindner (if you just want to know the email address, right-click the link and copy the email address)

General notes
Supported programs
Installation guide
Suggestions for large-scale SNP data
Acknowledgements
Maps containing marker positions
License agreement

Please cite the program for STRP analyses with

Lindner TH, Hoffmann K: easyLINKAGE: A PERL script for easy and automated two-/multi-point linkage analyses. Bioinformatics 21: 405-407, 2005 [PDF]

and/or for SNP analyses with

Hoffmann K, Lindner TH: easyLINKAGE Plus – Automated linkage analyses using large-scale SNP data. Bioinformatics 21: 3565-3567, 2005 [PDF]

General notes

easyLINKAGE was designed to make linkage programs user-friendly and to enable linkage analyses on Microsoft Windows based operating systems. The idea came up when the programmers had to run several linkage programs, such as those implemented in easyLINKAGE, to get their own linkage projects done. It quickly became obvious that none of the programs was easy to handle. Most programs used different input formats, and the appropriate files had to be generated in a time-consuming process. No program provided a marker database from which genetic positions could be drawn and used for automatic linkage analyses. Only a few programs had been recompiled for Windows systems, which is hard to understand since most marker genotypes were generated on such systems. Further, graphical outputs were almost completely missing.

easyLINKAGE overcomes all those pitfalls. Since the introduction of the program as version 2.01, major changes have been made. In general, the user can analyze individual chromosomes or even entire genomes. The user can furthermore select between at least 5 different allele frequency algorithms, which of course have to be used with care. For SNP projects, reference allele frequencies are provided for Asian, African American and Caucasian populations (kindly provided by Affymetrix Inc.). For projects other than SNP projects, the user can run analyses with sex-averaged, female, or male marker map positions. Allele frequency algorithms were programmed universally for all subsequent applications. However, every algorithm has its limitations, and the user should have an idea of which algorithm will deliver the best results. Details can be found in the user manual.

Some programs have trouble with certain family or subject identifiers. Therefore, easyLINKAGE offers the option to recode all identifiers to integers. After the program has run, the recoded identifiers are decoded back to the originals in all generated outputs.
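The identifier recoding described above can be illustrated with a minimal sketch. It is not easyLINKAGE's own code; the function name `recode_ids` and the sample identifiers are hypothetical, and the only assumption is that identifiers are numbered in order of first appearance.

```python
def recode_ids(ids):
    """Map arbitrary identifiers to integers 1, 2, 3, ... in order of first appearance.

    Returns a forward table (original -> integer) and a backward table
    (integer -> original) so that outputs can be decoded again.
    """
    forward = {}
    for ident in ids:
        if ident not in forward:
            forward[ident] = len(forward) + 1
    backward = {code: ident for ident, code in forward.items()}
    return forward, backward


# Hypothetical family identifiers that a linkage program might reject.
forward, backward = recode_ids(["FAM-A", "FAM-B", "FAM-A"])
recoded = [forward[i] for i in ["FAM-A", "FAM-B", "FAM-A"]]   # integers for the run
decoded = [backward[c] for c in recoded]                      # originals for the outputs
```

Keeping both tables makes the round trip lossless, which is what allows the outputs to show the user's original identifiers.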

Some programs also have problems with non-consecutive alleles (2, 10, 11 etc.). Recoding to consecutive alleles (1, 2, 3 etc.) can be activated and is strongly recommended. Programs tended to behave strangely if this option was not activated.
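A minimal sketch of such an allele recoding for a single marker follows. It is an illustration only, not the program's implementation; the function name is hypothetical, and it assumes that genotypes are pairs of integers and that 0 marks a missing allele.

```python
def recode_alleles(genotypes):
    """Recode the alleles of one marker to consecutive integers 1, 2, 3, ...

    genotypes: list of (allele1, allele2) integer pairs.
    Assumption: 0 marks a missing allele and is left unchanged.
    """
    observed = sorted({a for pair in genotypes for a in pair if a != 0})
    mapping = {allele: i for i, allele in enumerate(observed, start=1)}
    mapping[0] = 0
    recoded = [(mapping[a], mapping[b]) for a, b in genotypes]
    return recoded, mapping


# Alleles 2, 10, 11 become 1, 2, 3; the untyped subject stays (0, 0).
recoded, mapping = recode_alleles([(2, 10), (11, 2), (0, 0)])
```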

As an extension of the original version, the program runs PedCheck to identify Mendelian errors prior to running the subsequent linkage programs. This option can be deactivated. However, the check is usually not very time-consuming.
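To show what a Mendelian error is, here is a minimal sketch of the simplest possible check: a fully genotyped child and both parents at one autosomal marker. This is not PedCheck's algorithm, which handles whole pedigrees and untyped relatives; the function name is hypothetical.

```python
def is_mendelian_consistent(child, father, mother):
    """Return True if the child could have received one allele from each parent.

    Each genotype is a pair of alleles at a single autosomal marker, and all
    three individuals are assumed to be fully genotyped.
    """
    c1, c2 = child
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


ok = is_mendelian_consistent(child=(1, 2), father=(1, 1), mother=(2, 3))
bad = is_mendelian_consistent(child=(3, 3), father=(1, 3), mother=(1, 2))
```

In the second call the child would need allele 3 from the mother, who does not carry it, so the trio is inconsistent.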

For microsatellite projects the user has to provide 2 types of input files: a pedigree information file containing the structure of one or more pedigrees in general linkage format (column 1: Family ID, 2: Subject ID, 3: Father ID (0 denotes a founder), 4: Mother ID (0 denotes a founder), 5: Sex (1 = male, 2 = female), 6: Affection status (0 = unknown, 1 = unaffected, 2 = affected)), and marker files containing the genotypes of the individuals. Subject IDs have to be unique throughout the pedigree information file so that they can be matched correctly to the IDs in the marker files. The pedigree information file must contain an additional column 7 for DNA availability if the user wants to run single-point simulation studies (FastSLink). Marker genotypes have to be provided in individual marker files containing the marker name, the subject ID, and the genotypes in integer format.
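The six-column pedigree layout can be made concrete with a short reader. This is a sketch under assumptions: columns are taken to be whitespace-separated, any seventh column is ignored, and the names `Person` and `parse_pedigree` as well as the sample rows are hypothetical.

```python
from typing import NamedTuple


class Person(NamedTuple):
    family: str
    subject: str
    father: str      # "0" denotes a founder
    mother: str      # "0" denotes a founder
    sex: int         # 1 = male, 2 = female
    affection: int   # 0 = unknown, 1 = unaffected, 2 = affected


def parse_pedigree(text):
    """Parse whitespace-separated pedigree rows and check the documented codes."""
    people, seen = [], set()
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 6:
            raise ValueError(f"line {line_no}: expected at least 6 columns")
        family, subject, father, mother = fields[:4]
        sex, affection = int(fields[4]), int(fields[5])
        if sex not in (1, 2):
            raise ValueError(f"line {line_no}: sex must be 1 or 2")
        if affection not in (0, 1, 2):
            raise ValueError(f"line {line_no}: affection status must be 0, 1 or 2")
        if subject in seen:
            raise ValueError(f"line {line_no}: subject ID {subject} is not unique")
        seen.add(subject)
        people.append(Person(family, subject, father, mother, sex, affection))
    return people


sample = """\
1 101 0   0   1 1
1 102 0   0   2 2
1 103 101 102 1 2
"""
people = parse_pedigree(sample)
founders = [p.subject for p in people if p.father == "0" and p.mother == "0"]
```

The uniqueness check covers the whole file rather than each family, matching the requirement that subject IDs be unique throughout the pedigree information file.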

For SNP projects only two files are necessary: one file that contains all genotypes and one file with the pedigree information in linkage format.

A data file that contains the inheritance model is no longer needed. easyLINKAGE handles this internally in interaction with the user. All options can be “clicked”, edited, and set with the mouse in specific option menus.

After the program starts, all options set in the most recent analysis are restored automatically. The program “remembers” up to 20 directories in which analyses were performed earlier; adding a 21st entry deletes the oldest one. Each directory that is selected within the MAINSCREEN is screened for pedigree information files immediately. Only files that start with “p” and end with “pro” are recognized as pedigree information files. Spaces, tabs, commas, and dots are not allowed within the pedigree information file name; the underscore “_” can be used.
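The file name rule can be sketched as a small filter. The interpretation is an assumption: “pro” is read here as the file extension “.pro”, so the single dot before it is the only dot accepted, and the comparison is case-sensitive. The function name is hypothetical.

```python
def looks_like_pedigree_file(name):
    """Check a file name against the naming rule for pedigree information files.

    Assumptions: "pro" is the extension ".pro", and the part before it must
    not contain spaces, tabs, commas, or dots.
    """
    if not (name.startswith("p") and name.endswith(".pro")):
        return False
    stem = name[: -len(".pro")]
    return not any(ch in " \t,." for ch in stem)


candidates = ["p_family1.pro", "family1.pro", "p family1.pro", "p.v2.pro"]
accepted = [n for n in candidates if looks_like_pedigree_file(n)]
```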

A major benefit of easyLINKAGE is the generation of structured text outputs and graphical plots of LOD scores, P values, and many other parameters. Plots are provided as chromosomal or genome-wide plots. All plots display details of the inheritance model used, the marker map, sex-specific or sex-averaged marker positions, the number of known and unknown markers, in SNP projects also the number of uninformative SNPs, a table with the top five markers, the pedigree file used, date, time and elapsed time, directory, allele frequency algorithm, and other parameters. Plots can be generated as “TOTAL” plots averaging all families, optionally together with individual family plots.

Support for SNP projects derived from the Affymetrix 10k/50k/100k/250k/500k and Illumina 5k/25k/100k/240k/300k/550k/650k chips is another major step forward. Those projects can be analyzed with Allegro (single-/multipoint analyses), GeneHunter/-Plus, Merlin, FastLink, SimWalk and SuperLink. Some tricks have to be applied to make Allegro run correctly. All multipoint linkage programs assume linkage equilibrium between neighboring markers, and this assumption is easily violated by SNP data because many SNPs lie very close to each other in terms of their genetic positions. If such a case occurs, the program automatically sets the distance to a recombination fraction of 0.001. The user can also choose to calculate LODs without fully uninformative markers, i.e. markers with homozygous and identical genotypes in all tested individuals. However, removing the uninformative markers can lead to a substantial loss of information and is therefore not recommended. Allegro should not be limited in the number of markers; however, the pedigrees have to be of moderate size.
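Both ideas in the paragraph above can be sketched in a few lines. This is an illustration, not the program's code: it assumes the spacing is held as a list of recombination fractions between adjacent markers, that genotypes are allele pairs with 0 for a missing allele, and the function names are hypothetical.

```python
MIN_THETA = 0.001  # smallest recombination fraction allowed between neighbors


def floor_recombination_fractions(thetas, minimum=MIN_THETA):
    """Raise every inter-marker recombination fraction below `minimum` to `minimum`."""
    return [max(theta, minimum) for theta in thetas]


def is_fully_uninformative(genotypes):
    """True if every typed individual carries the same homozygous genotype.

    genotypes: list of (allele1, allele2) pairs for one marker.
    Assumption: a pair containing 0 is untyped and ignored; a marker with no
    typed individuals is treated as uninformative as well.
    """
    typed = [g for g in genotypes if 0 not in g]
    if not typed:
        return True
    first = typed[0]
    return first[0] == first[1] and all(g == first for g in typed)


spacing = floor_recombination_fractions([0.0, 0.0005, 0.02])
flag = is_fully_uninformative([(1, 1), (1, 1), (0, 0)])
```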

With regard to large-scale SNP data and multipoint analyses, an important issue has to be mentioned here: although the authors of Allegro state that Allegro is not limited in the number of markers, Allegro can run into severe problems in assigning correct haplotypes, and the user would not notice. It is possible to analyze 500 markers on a chromosome in one piece. However, we experienced a number of situations where the LOD score dropped significantly when many markers were used, sometimes from 3 to 0. The reasons are not yet clear to us. Most likely the drop is not due to the marker number but to the allele frequencies used. Haplotype assignment problems can occur when a rare allele is treated as common or a common allele as rare. Even population reference allele frequencies must be handled with care. However, we cannot rule out that the marker number itself or limitations of the Lander-Green algorithm play a role. To circumvent the problem, we implemented the option to analyze sets of markers and/or predefined chromosomal centimorgan intervals.
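Splitting a chromosome into centimorgan intervals can be sketched as follows. How easyLINKAGE forms its intervals is not described here, so this sketch simply assumes fixed-width intervals starting at 0 cM; the function name and marker names are hypothetical.

```python
def split_by_interval(markers, width_cm):
    """Group (name, position_cM) markers into consecutive intervals of `width_cm`.

    Assumption: intervals start at 0 cM and have a fixed width; empty
    intervals are skipped. Markers are sorted by position within each group.
    """
    bins = {}
    for name, pos in markers:
        bins.setdefault(int(pos // width_cm), []).append((name, pos))
    return [sorted(bins[key], key=lambda m: m[1]) for key in sorted(bins)]


markers = [("snp_a", 0.5), ("snp_b", 9.9), ("snp_c", 10.0), ("snp_d", 35.2)]
groups = split_by_interval(markers, width_cm=10)
```

Each group can then be passed to the linkage program as a separate analysis, which keeps the number of markers per run small.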

Supported programs

No.  Program                 Version    Supported analyses
1    FastLink                4.1        Parametric, single-point
2    SuperLink               1.6        Parametric, single-/multipoint
3    SPLink                  1.09       Nonparametric, single-point
4    GeneHunter              2.1r5      Nonpara-/parametric, single-/multipoint
5    GeneHunter Plus         1.2        Nonpara-/parametric, single-/multipoint
6    GeneHunter MOD          2.0.1      Nonpara-/parametric, single-/multipoint
7    GeneHunter Imprinting   2.1r3/1.3  Nonpara-/parametric, single-/multipoint
8    GeneHunter TwoLocus     1.3        Parametric, two-locus, single-/multipoint
9    Merlin                  1.0.1      Nonpara-/parametric, single-/multipoint
10   SimWalk                 2.9.1      Nonparametric, single-/multipoint
11   Allegro                 1.2c       Nonpara-/parametric, single-/multipoint
12   PedCheck                1.0        Mendelian error check
13   FastSLink               2.51       Simulation, single-/multipoint

GeneHunter, GeneHunter Plus, GeneHunter Imprinting/TwoLocus/MOD, and SPLink were recompiled (MinGW) for use on Microsoft Windows. FastLink, SuperLink, SLink, and Merlin are available as DOS runtimes over the internet and are also included in our software package. PedCheck is available after free registration. Academic users can obtain Allegro free of charge upon email request to allegro@decode.is (Allegro 2.0 is supported but has some severe bugs; use v1.2c instead). The same procedure applies to GeneHunter Imprinting/TwoLocus/MOD. However, the author of the GeneHunter Imprinting/TwoLocus/MOD extensions did not agree to our providing Windows runtimes for those programs in our setup package. The source code/executables can be obtained from http://www.staff.uni-marburg.de/~strauchk/software.html and must be installed manually.

If easyLINKAGE cannot find Allegro, PedCheck, or GeneHunter Imprinting/TwoLocus/MOD at the locations stated in the INI file, all options regarding those programs are deactivated. This start-up check is performed each time you start easyLINKAGE, so programs added later are detected right away. No further user interaction is required.
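The start-up check can be pictured as a lookup of each configured executable. This sketch does not read the real INI file; the dictionary of paths is a hypothetical stand-in for whatever the INI file specifies, and the function name is an assumption.

```python
import os


def program_status(paths):
    """Return {program name: True/False} depending on whether the file exists."""
    return {name: os.path.isfile(path) for name, path in paths.items()}


# Hypothetical locations; in practice they would come from the INI file.
configured = {
    "Allegro": r"C:\linkage\allegro\allegro.exe",
    "PedCheck": r"C:\linkage\pedcheck\pedcheck.exe",
}
status = program_status(configured)
disabled = [name for name, found in status.items() if not found]
```

Because the check runs on every start, a program installed later simply appears as available the next time, with no configuration change beyond its path.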

easyLINKAGE was tested under Microsoft Windows 2000/XP (any service pack). It might run on Vista, older NT versions, or Windows 95, but this has not been tested yet. Some source code changes may be necessary, in particular to the routines that use DOS commands.

Installation

Starting with version 3.0 we provide a precompiled version with a setup routine, which makes it convenient to get started. Just follow the instructions.

[Screenshots of setup steps 1 to 4. © Tom H. Lindner]

IMPORTANT! Several subdirectories will be created. Runtimes for Allegro, PedCheck, GeneHunter Imprinting, GeneHunter TwoLocus, and SimWalk cannot be provided. They must be obtained upon registration from the appropriate websites.

Once the installation is complete you can use the program right away. easyLINKAGE requires an INI file, “easyLINKAGE_setup.ini”, without which it will not run at all. The INI file can be edited manually, and the options are self-explanatory. In brief, you can predefine penetrances, models, the path of the linkage runtimes and other parameters as the default setup for easyLINKAGE.

easyLINKAGE provides many error checking routines and many other options that make the program really user-friendly. The software was extensively tested by the developers. The programmers are involved in linkage projects themselves and therefore always program very close to everyday needs.
 
Many users noted the limited pedigree drawing abilities of all programs. Only GeneHunter provides pedigree plots, and those are very limited. Therefore, easyLINKAGE extends the GeneHunter plots with the markers used and their genetic positions, and in addition provides input files for the software HaploPainter. This program draws very nice pedigrees including a colored presentation of markers, positions, and haplotypes plus recombination events.
 
We have realized that some scientists are not used to the rather complicated setup of the PERL interpreter and all the additional modules easyLINKAGE needs to run correctly. Therefore, we compiled the program into a Windows binary and packed all the necessary files into a user-friendly setup system. The use of the program is free of charge. We appreciate any comment or bug report. We would also appreciate it if you registered your email address with us so that we can keep you updated about bug fixes and newer versions. Good luck and have fun!

Suggestions for analyses of large-scale SNP data

  • Try to avoid SNPs that are very close to each other. Allegro, GeneHunter, Merlin and other programs assume linkage equilibrium between markers. That assumption is easily violated when many markers in close proximity are used. However, if such a situation occurs, easyLINKAGE will set the distance between such markers to 0.001 cM.
  • When analyzing an entire chromosome or even the whole genome perform more than just one analysis:
    • Analysis of whole chromosomes
    • Analysis of blocks of at most 100 markers
    • Analysis of blocks with a smaller marker number so that they overlap the boundaries of the 100-marker blocks
    • Analysis of the region where you observe peaks without using blocks of markers
    • Analysis of the region where you observe peaks using blocks of markers
    • Use different allele frequency algorithms (preferentially “codominant” (equal allele distribution) when dealing with inbred or larger families)
  • Analyze your data with and without fully uninformative markers, i.e. markers with homozygous, 1-allelic genotypes for every genotyped subject. Homozygous marker genotypes can add significant information to the results of multipoint analyses. They can be left aside in two-point linkage analyses.
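The block-wise analyses suggested in the list above can be sketched as follows. This is one possible way to form the blocks, not the program's own routine: the function name and the `offset` parameter are hypothetical, and the second pass uses shifted blocks of the same size rather than smaller ones so that the boundaries of the first pass fall inside a block.

```python
def marker_blocks(markers, size=100, offset=0):
    """Split an ordered marker list into consecutive blocks of at most `size`.

    `offset` (assumed to be smaller than `size`) shortens the first block so
    that the block boundaries of a second pass are shifted against the first.
    """
    blocks = []
    if offset:
        blocks.append(markers[:offset])
    for start in range(offset, len(markers), size):
        blocks.append(markers[start:start + size])
    return [block for block in blocks if block]


markers = [f"snp_{i}" for i in range(250)]      # hypothetical marker names
first_pass = marker_blocks(markers, size=100)    # boundaries after markers 100 and 200
second_pass = marker_blocks(markers, size=100, offset=50)  # boundaries after 50 and 150
```

A peak that sits on a boundary in the first pass lies in the middle of a block in the second pass, so comparing both runs helps to spot boundary artifacts.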

Maps containing marker positions

The program uses marker positions that were kindly provided by Affymetrix Inc. Please be aware that those positions are subject to change. Affymetrix updates its maps on a regular basis and can therefore not be held responsible for incorrect positions when older map data are used. The Affymetrix data we provide date back to November 2004.

Acknowledgements

We thank Drs. Alejandro Schaffer (FastLink), Jurg Ott (SLink), Michael L. Frigge (GeneHunter Plus), Leonid Kruglyak (GeneHunter), and David Clayton (SPLink) for permission to recompile the source code of their programs for use on Microsoft Windows and to publish the binaries on our website.

License agreement

IMPORTANT!!! There is a risk in running third-party binaries. Users are advised to compile their own binaries! THIS SOFTWARE IS PROVIDED “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Tom H. Lindner, Katrin Hoffmann, August 16, 2004: Division of Nephrology, Department of Medicine, Neurology, and Dermatology, University Clinic Leipzig, Liebigstr. 20, 04103 Leipzig, Germany and Institute of Medical Genetics, Charité Berlin, Humboldt University, Augustenburger Platz 1, 13353 Berlin, Germany

 
Last modified: 20 November 2015, 12:52