Users' documentation for PCPhaser

PCPhaser is an application that lets you use the EIGENSOFT package to obtain PCA graphs from individuals' haplotypes, rather than from their genotypes.

System requirements

PCPhaser runs only on Linux, and you will need to install the following packages:

You do not need to install EIGENSOFT, since a copy of the package (version 4.2) is included in the application folder. However, you must not move the script "PCPhaser.sh" out of that folder, or it will not work properly, since it uses relative paths to other scripts in the application folder.

Download PCPhaser

PCPhaser 1.2 can be downloaded here.

Install PCPhaser

Once the file PCPhaser.zip has been downloaded, execute these commands in a console to install it:

  1. cd [download folder] (to open the folder that contains the ".zip" file)
  2. unzip PCPhaser.zip (to uncompress the ".zip" file)
  3. mv PCPhaser [target folder] (to move the PCPhaser folder to another location)

Execute PCPhaser

PCPhaser requires five arguments and accepts one optional flag, as follows:

./PCPhaser.sh [INPUT_FOLDER] [OUTPUT_FOLDER] [OUTPUT_PREFIX] [WIDTH] [HEIGHT] [--silent]

File formats

PCPhaser uses the PLINK format for the input and output ".ped" and ".map" files, which is described here:

The 6th column of ".ped" files (the phenotype) should contain the name of the population to which the individual belongs. As an example, the following ".ped" file has 5 individuals, 3 belonging to the HapMap population CEPH (CEU) and 2 to the HapMap population from the Southwest USA with African ancestry (ASW):

2427 2427_19909 0 0 2 ASW 2 2 2 2 2 4 1 1 3 1 3 1 2 4 4 2 3 3 1 1 1 3 4 4 2 2 2 2 2 2 1 2 1 1 1 1 1 3 3 4
2446 2446_20127 0 0 2 ASW 2 2 2 2 2 2 1 1 3 3 3 3 2 2 4 4 3 3 1 3 1 1 4 4 2 2 1 2 4 2 2 1 3 1 1 1 3 3 4 4
1463 1463_12890 0 0 2 CEU 2 2 2 2 2 2 1 1 1 3 1 3 4 2 2 4 3 3 1 1 3 1 4 4 2 2 2 1 2 4 2 2 1 3 1 1 3 3 4 4
1463 1463_12889 0 0 1 CEU 2 2 2 2 2 2 1 1 3 3 3 3 2 2 4 4 3 1 3 1 1 1 4 4 2 2 2 2 2 2 1 1 1 1 1 1 3 3 4 4
1424 1424_11931 0 0 2 CEU 2 2 2 2 2 2 1 1 3 3 3 3 2 2 4 4 3 3 1 1 1 1 4 4 2 2 2 2 2 2 1 1 1 1 1 1 3 3 4 4
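Before running PCPhaser, it can be useful to check that the 6th column really holds population names. The following Python snippet is only a minimal sketch and is not part of PCPhaser; the function name population_counts is hypothetical, and the sample lines are the individuals above truncated to their first two markers for brevity.

```python
from collections import Counter


def population_counts(ped_lines):
    """Count individuals per population label (6th column of each line)."""
    counts = Counter()
    for line in ped_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        counts[fields[5]] += 1
    return counts


# The five individuals above, truncated to their first two markers.
example = [
    "2427 2427_19909 0 0 2 ASW 2 2 2 2",
    "2446 2446_20127 0 0 2 ASW 2 2 2 2",
    "1463 1463_12890 0 0 2 CEU 2 2 2 2",
    "1463 1463_12889 0 0 1 CEU 2 2 2 2",
    "1424 1424_11931 0 0 2 CEU 2 2 2 2",
]

print(dict(population_counts(example)))
```

An unexpected label in the result (for instance a numeric phenotype code) would suggest that the column has not been filled in as described.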

Input files

For PCPhaser to work, input ".ped" files must contain phased genotypes. PCPhaser therefore treats the two alleles at every marker as ordered: all alleles in the first position of a genotype belong to one homologous chromosome, and all alleles in the second position belong to the other. As an example, the second individual in the ".ped" file above (ID# 2446_20127) has haplotypes "22213324311421423134 / 22213324331422211134".
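The way the two haplotypes are read off a phased ".ped" line can be illustrated with a short Python sketch. It is an illustration only, not PCPhaser's own code, and the function name split_haplotypes is hypothetical; it assumes whitespace-separated fields with six leading columns followed by two alleles per marker.

```python
def split_haplotypes(ped_line):
    """Return the two haplotypes encoded in one phased .ped line."""
    fields = ped_line.split()
    alleles = fields[6:]  # everything after the six leading columns
    if len(alleles) % 2 != 0:
        raise ValueError("expected two alleles per marker")
    hap1 = "".join(alleles[0::2])  # first allele of every genotype
    hap2 = "".join(alleles[1::2])  # second allele of every genotype
    return hap1, hap2


line = ("2446 2446_20127 0 0 2 ASW 2 2 2 2 2 2 1 1 3 3 3 3 2 2 4 4 3 3 1 3 "
        "1 1 4 4 2 2 1 2 4 2 2 1 3 1 1 1 3 3 4 4")
hap1, hap2 = split_haplotypes(line)
print(hap1, "/", hap2)
```

Run on individual 2446_20127, this should reproduce the two haplotype strings quoted above.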

Output files

For each input ".ped" file with m lines corresponding to m different individuals, PCPhaser outputs another ".ped" file with 2m lines, i.e., 2 for each individual, one per haplotype, with each allele duplicated. As an example, if an individual haplotype is "24322", the haplotype with duplicated alleles will be "2244332222". The only reason to duplicate alleles is technical, as PCPhaser has been implemented as a wrapper of EIGENSOFT. For instance, the lines in the output ".ped" file corresponding to individual 2446_20127 have the following content:

2446 2446_20127_1 0 0 2 ASW 2 2 2 2 2 2 1 1 3 3 3 3 2 2 4 4 3 3 1 1 1 1 4 4 2 2 1 1 4 4 2 2 3 3 1 1 3 3 4 4
2446 2446_20127_2 0 0 2 ASW 2 2 2 2 2 2 1 1 3 3 3 3 2 2 4 4 3 3 3 3 1 1 4 4 2 2 2 2 2 2 1 1 1 1 1 1 3 3 4 4
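The transformation from one input line to two output lines can be sketched in Python as follows. This is a hedged illustration of the format described above, not the actual PCPhaser implementation; the function name haplotype_ped_lines is hypothetical, and the "_1" / "_2" suffixes are taken from the example output lines.

```python
def haplotype_ped_lines(ped_line):
    """Turn one phased .ped line into two lines, one per haplotype,
    with every allele duplicated so each marker looks homozygous."""
    fields = ped_line.split()
    family, individual = fields[0], fields[1]
    other = fields[2:6]   # parents, sex and population, copied unchanged
    alleles = fields[6:]
    out = []
    for index, offset in ((1, 0), (2, 1)):
        haplotype = alleles[offset::2]
        doubled = [a for allele in haplotype for a in (allele, allele)]
        out.append(" ".join([family, f"{individual}_{index}", *other, *doubled]))
    return out


line = ("2446 2446_20127 0 0 2 ASW 2 2 2 2 2 2 1 1 3 3 3 3 2 2 4 4 3 3 1 3 "
        "1 1 4 4 2 2 1 2 4 2 2 1 3 1 1 1 3 3 4 4")
for out_line in haplotype_ped_lines(line):
    print(out_line)
```

Applied to the input line of individual 2446_20127, the sketch should print the same two lines shown above.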

Usage example

You can test the application using the example data included in the application, by running the following command:

./PCPhaser.sh exampleIN/ exampleOUT/ ASW+CEU 1000 800

When the process is completed, a window showing the PCA graph will pop up by default. If you are going to work with very large data files, it is strongly recommended to run the program in the background and in silent mode:

nohup ./PCPhaser.sh exampleIN/ exampleOUT/ ASW+CEU 1000 800 --silent &

The plot drawn by PCPhaser from this example can be seen at http://bios.ugr.es/PCPhaser/ASW+CEU.png.