Roy Straver edited this page Nov 21, 2013 · 37 revisions

Welcome to the WISECONDOR Wiki

Hi, if you found this page you are probably looking for some additional information on WISECONDOR.

The algorithm used is described in the paper which can be found here:
WISECONDOR: detection of fetal aberrations from shallow sequencing maternal plasma based on a within-sample comparison scheme

For now, this page will just show some questions I've received about the script, which I'll answer as best I can.

Getting WISECONDOR to work

I'm getting this Tk-something error:

Plotting Z-Scores
Traceback (most recent call last):
File "test.py", line 562, in <module>
plotResults(sample,markedBins,kept,kept2,outputBase,zScoresDict,zSmoothDict,blindsDict)
File "test.py", line 344, in plotResults
ax = plt.figure(2)
File "/usr/lib/pymodules/python2.6/matplotlib/pyplot.py", line 254, in figure
**kwargs)

File "/usr/lib/pymodules/python2.6/matplotlib/backends/backend_tkagg.py", line 90, in new_figure_manager
window = Tk.Tk()
File "/usr/lib/python2.6/lib-tk/Tkinter.py", line 1646, in __init__
self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: no display name and no $DISPLAY environment variable

This appears to show up on computers that have no graphical display available (no $DISPLAY environment variable), which is often the case on compute nodes (headless servers): matplotlib's Tk backend tries to open a window and fails. An easy fix for this is to tell matplotlib to use a non-interactive backend for its graphical work:

  • Open test.py
  • Find the list of imports, somewhere around lines 25 to 35
  • Add these lines to the list, before any import of matplotlib.pyplot:
    import matplotlib
    matplotlib.use('Agg')

This fix was suggested by W.Y.Leung, thank you!

I specified an output, the script runs without errors but there is nothing in the output directory

When specifying the output for test.py, it assumes you provide a path including a filename without an extension. The plots take this path/file combination, add a dot, and then add the plot type and the pdf extension. As a result, providing a path such as ./output/ results in output files such as ./output/.zscores.pdf, which are hidden on Unix systems because their names start with a dot. Simply add a base file name to the output path, such as ./output/filename, and your files will show up. If you want to see the hidden files you already created, try pressing ctrl-h in a file browser to show hidden files.
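To make the naming rule concrete, here is a minimal sketch of how the output names come out. The helper names plot_path and is_hidden are made up for this illustration and are not functions from test.py; only the "path + dot + plot type + pdf" rule and the zscores example come from the explanation above.

```python
import os.path


def plot_path(output_base, plot_type):
    # Hypothetical helper: mimics the described rule,
    # output base + "." + plot type + ".pdf".
    return output_base + "." + plot_type + ".pdf"


def is_hidden(path):
    # On Unix, a file is hidden when its name starts with a dot.
    return os.path.basename(path).startswith(".")


for base in ("./output/", "./output/filename"):
    path = plot_path(base, "zscores")
    print(path, "hidden" if is_hidden(path) else "visible")
```

With ./output/ the file name itself becomes .zscores.pdf, which is why the directory looks empty; with ./output/filename it becomes filename.zscores.pdf.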

I get a nice text output in my terminal but I want to save this to a file instead

The script was written rather rapidly and this never really seemed a problem to me. If you want to save this text output, the usual Unix approach (redirecting the output to a file) should suffice: python test.py ./input.gcc ./output ./reference > ./output.txt

Understanding the input

There is no read depth normalization, how can you compare samples this way?

There is read depth normalization, but it's done implicitly by the LOWESS GC-Correction. WISECONDOR previously had a separate step to normalize the data (which was only useful after applying the RETRO-filter, due to the read-towers or spikes in the data), but I decided to remove it because I applied LOWESS using a division: correctedValue = sample[chrom][bin]/lowessCurve.pop(0). Since this step scales the actual read count of any bin to about 1, the separate normalization step became obsolete: results with and without normalizing the data prior to GC-Correction showed no differences.
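The following sketch illustrates why dividing by the fitted curve also normalizes for read depth. It is not WISECONDOR's code: to stay self-contained it swaps the LOWESS fit for a much cruder stand-in (the mean read count of all bins sharing the same GC value), and the function names and toy numbers are made up. The point it shows holds for any fitted curve that scales with the data: a sample sequenced three times deeper ends up with the same corrected values.

```python
def gc_bucket_curve(counts, gc_values):
    # Stand-in for the LOWESS curve (an assumption for this sketch):
    # expected count per bin = mean count of all bins with the same GC value.
    totals, sizes = {}, {}
    for count, gc in zip(counts, gc_values):
        totals[gc] = totals.get(gc, 0.0) + count
        sizes[gc] = sizes.get(gc, 0) + 1
    return [totals[gc] / sizes[gc] for gc in gc_values]


def gc_correct(counts, gc_values):
    # Same idea as correctedValue = binCount / curveValue:
    # every bin is divided by the value the curve expects for its GC content.
    curve = gc_bucket_curve(counts, gc_values)
    return [count / expected for count, expected in zip(counts, curve)]


# Toy data: GC percentage per bin and read counts at two sequencing depths.
gc_values = [40, 40, 45, 45, 50, 50]
shallow = [90.0, 110.0, 140.0, 160.0, 190.0, 210.0]
deep = [3 * count for count in shallow]

corrected_shallow = gc_correct(shallow, gc_values)
corrected_deep = gc_correct(deep, gc_values)

print(corrected_shallow)
print(corrected_deep)
```

Both lists hover around 1 and match each other, so no separate depth normalization is needed before comparing samples.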

What happens if there is not enough fetal DNA present?

Not much. WISECONDOR will probably not report any aberrated bins (none that are fetal, anyway), but it does not check the fetal percentage; it simply assumes there is enough.

Understanding the output

There is a list of calls that differ in number and results, what should I look at?

Indeed, there is a bit of a list:

  • Single bin, bin test
  • Single bin, aneuploidy test
  • Windowed, bin test
  • Windowed, aneuploidy test
  • Chromosome wide, aneuploidy test

And well, it depends on how strict you want to be when looking at the data. In general, the list right after Windowed, aneuploidy test should give the most reliable results for aneuploidy calling, while the Windowed, bin test attempts to provide a list of the areas on the chromosomes that are actually aberrated. The single bin methods are not very reliable because they are not sensitive enough for most samples: instead of combining the power of several bins, just a single bin is taken into account. The calls they make are usually strongly deviating values, while most fetal aberrations do not deviate that much.
The last one, Chromosome wide, aneuploidy test, appears to be way too sensitive for calling aneuploidy cases, but it has shown its use in a set of samples that had far too little fetal DNA: it was somehow able to point us to the right samples, as their aberrated chromosomes showed up as more deviating values than usual using this approach.

The plots are all the same size, it's hard to recognize chromosomes

True, during development I just wanted a quick overview of my results without wasting any space, so all chromosomes have their lengths scaled to the same size. An alternative was proposed by S.Ghesquiere, which shows every chromosome at its real length, plus additional, detailed plots for chromosomes 13, 18 and 21 next to their cytobands. We are looking into this approach and may incorporate it in the future. In the meantime, you are free to fork this project and make a pull request; all your input is welcome.

What do the lines in Z-Score plots actually show?

The deviation for any bin compared to its reference. The blue line shows the deviations as tested per bin. To get a rough idea of what this means, consider it this way: a high spike shows that the bin is strongly increased compared to what it should be, relative to the deviation that is expected. A bin whose reference set of bins has a lot of variation will therefore show a smaller spike than a bin that increased just as much but has a more stable set of reference bins. It is not a direct comparison, and it does not show the actual increase for any area; it shows how WISECONDOR looks at that area. Plots that show actual read frequencies for bins do not provide a lot of information, as the small change in read depth caused by a fetal aberration is often completely drowned out by natural fluctuations in the read depth data.
The red line shows the windowed approach, which basically just combines a set of results shown in the blue line and determines how much this set deviates. A group of deviating bins shown in blue will therefore result in a strongly deviating red line, hence the visual correlation between the two.
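As a rough illustration of both lines, the sketch below computes a plain z-score for a bin against its reference bins (the blue line idea) and combines the z-scores of a window with Stouffer's method (the red line idea). Both formulas are assumptions chosen for this example, and the function names and numbers are made up; the exact calculations in test.py may differ.

```python
import math
import statistics


def bin_z_score(value, reference_values):
    # Assumed here: an ordinary z-score, i.e. how many standard deviations
    # the bin lies from the mean of its reference bins.
    mean = statistics.mean(reference_values)
    spread = statistics.stdev(reference_values)
    return (value - mean) / spread


def windowed_z_score(z_scores):
    # Assumed here: Stouffer's method, sum of z-scores over sqrt(count).
    return sum(z_scores) / math.sqrt(len(z_scores))


# The same 5% increase, judged against a stable and a noisy reference set.
stable_reference = [0.99, 1.00, 1.01, 1.00, 1.00]
noisy_reference = [0.90, 1.10, 0.95, 1.05, 1.00]
z_stable = bin_z_score(1.05, stable_reference)
z_noisy = bin_z_score(1.05, noisy_reference)
print(z_stable, z_noisy)

# Five neighbouring bins that each deviate only moderately.
window = [1.5, 1.5, 1.5, 1.5, 1.5]
print(windowed_z_score(window))
```

The bin with the stable reference set gets a much larger z-score than the one with the noisy set, and five moderate bins together produce a window score well above any of the individual bins.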

I found a 10-20 Mb area using the sliding window, but the plots only show a huge narrow spike

That is considered an artifact. If it shows up in just one sample for that area, the pregnant woman is likely the cause of this:
If a maternal CNV is large enough to cover more than one bin and makes up a relatively large part of the total area covered by those bins, it will appear as an aberrated area in WISECONDOR. The windowed method removes the highest bin from its window, but two subsequent spiking bins will still leave the window with a strongly deviating value, which will then influence the total Z-Score for all bins close to it.
If the spike does show up in several samples, the reference set used may seem more stable in this area than the tested samples. This structural artifact can be removed by adding more samples to the reference set, allowing WISECONDOR to learn about the spikiness in this area.
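The effect of two adjacent spiking bins can be sketched as follows. This is only an illustration under assumptions: the window is combined with Stouffer's method, "the highest bin" is taken to be the one with the largest absolute z-score, and the function name and numbers are made up rather than taken from test.py.

```python
import math


def window_z_without_top_bin(z_scores):
    # Drop the single most deviating bin (largest absolute z-score, an
    # assumption), then combine the rest with Stouffer's method.
    kept = sorted(z_scores, key=abs)[:-1]
    return sum(kept) / math.sqrt(len(kept))


background = [0.2, -0.1, 0.3, -0.2, 0.1]

# One spiking bin is removed, so the window stays quiet.
one_spike = background + [12.0]
print(window_z_without_top_bin(one_spike))

# With two spiking bins only one is removed; the other dominates the window.
two_spikes = background + [12.0, 11.0]
print(window_z_without_top_bin(two_spikes))
```

Every window that contains both spiking bins ends up strongly deviating, which is how a narrow spike in the plot can turn into a call spanning a much wider area.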