Home
Hi, if you found this page you are probably looking for some additional information on WISECONDOR.
The algorithm used is described in the paper which can be found here:
WISECONDOR: detection of fetal aberrations from shallow sequencing maternal plasma based on a within-sample comparison scheme
For now, this page will just show some questions I've received about the script, which I'll answer as best I can.
Plotting Z-Scores
Traceback (most recent call last):
File "test.py", line 562, in <module>
plotResults(sample,markedBins,kept,kept2,outputBase,zScoresDict,zSmoothDict,blindsDict)
File "test.py", line 344, in plotResults
ax = plt.figure(2)
File "/usr/lib/pymodules/python2.6/matplotlib/pyplot.py", line 254, in figure
**kwargs)
File "/usr/lib/pymodules/python2.6/matplotlib/backends/backend_tkagg.py", line 90, in new_figure_manager
window = Tk.Tk()
File "/usr/lib/python2.6/lib-tk/Tkinter.py", line 1646, in __init__
self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: no display name and no $DISPLAY environment variable
This error shows up on computers that have no graphical display available (the $DISPLAY environment variable is not set), which is often the case on compute nodes (headless servers). The TkAgg backend seen in the traceback tries to open a Tk window and fails. An easy fix is to tell matplotlib to use a non-interactive backend for its graphical work:
- Open test.py
- Find the list of imports, somewhere around lines 25 to 35
- Add these lines to the list, before any import of matplotlib.pyplot:
import matplotlib
matplotlib.use('Agg')
This fix was suggested by W.Y.Leung, thank you!
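If you would rather keep the interactive backend on machines that do have a display, the fix above can be made conditional. This is only a minimal sketch: the helper names choose_backend and configure_matplotlib are hypothetical and not part of test.py, and the $DISPLAY check only makes sense on X11 systems.

```python
import os


def choose_backend(environ=None):
    """Return 'Agg' when no X display is advertised, else None (keep the default)."""
    if environ is None:
        environ = os.environ
    # A missing or empty DISPLAY is the situation reported by the TclError above.
    return None if environ.get("DISPLAY") else "Agg"


def configure_matplotlib(environ=None):
    """Select the backend; call this before anything imports matplotlib.pyplot."""
    import matplotlib  # imported here so choose_backend stays dependency-free

    backend = choose_backend(environ)
    if backend is not None:
        matplotlib.use(backend)
    return backend
```

Calling configure_matplotlib() at the top of the import list would then have the same effect as the two lines above on a headless server, and leave the default backend alone elsewhere.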
When specifying the output for test.py, the script assumes you provide a path that includes a file name without an extension. The plots take this path/file combination, add a dot, and then add the plot type and the pdf extension. As a result, providing a path such as ./output/ will result in output files such as ./output/.zscores.pdf, which end up hidden on Unix systems because their names start with a dot. Simply add a base file name to the output path, such as ./output/filename, and your files will show up. If you really need to see which hidden files you created, try pressing ctrl-h in a file browser to show hidden files, or run ls -a in a terminal.
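The naming rule described above can be sketched in a few lines. The function names here are hypothetical and only mirror the description (base path, a dot, the plot type, the pdf extension); they are not taken from test.py.

```python
import os


def plot_path(output_base, plot_type):
    """Build a plot file name as described: base + '.' + plot type + '.pdf'."""
    return output_base + "." + plot_type + ".pdf"


def is_hidden_on_unix(path):
    """True if the file name itself starts with a dot."""
    return os.path.basename(path).startswith(".")


def has_file_name(output_base):
    """True if the output base ends in a file name rather than a separator."""
    return os.path.basename(output_base) != ""


hidden = plot_path("./output/", "zscores")          # './output/.zscores.pdf'
visible = plot_path("./output/filename", "zscores")  # './output/filename.zscores.pdf'
```

A check like has_file_name could be used to warn early when the output argument ends in a slash.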
The script was written rather rapidly and this never really seemed a problem to me. If you want to save this text output, the usual Unix approach (redirecting the output to a file) should suffice:
python test.py ./input.gcc ./output ./reference > ./output.txt
There is read depth normalization, but it is done implicitly by the LOWESS GC-Correction. WISECONDOR previously had a separate step to normalize the data (which was only useful after applying the RETRO-filter, due to the read-towers or spikes in the data), but I decided to remove it because I apply LOWESS using a division:
correctedValue = sample[chrom][bin]/lowessCurve.pop(0)
As this step scales the read count of any bin to about 1, the separate normalization step became obsolete: results with and without normalizing the data prior to GC-Correction showed no noticeable differences.
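To see why dividing by a fitted curve also normalizes for read depth, consider this toy sketch. It does not use LOWESS: as a loudly labeled stand-in, the "curve" is simply the mean read count of all bins that share a GC value, and the function names and numbers are made up for illustration.

```python
from collections import defaultdict


def gc_curve(gc_values, counts):
    """Stand-in for the LOWESS fit: mean read count of bins sharing a GC value."""
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for gc, count in zip(gc_values, counts):
        totals[gc] += count
        sizes[gc] += 1
    return [totals[gc] / sizes[gc] for gc in gc_values]


def gc_correct(gc_values, counts):
    """Divide each bin by its fitted value, as in correctedValue = count / curve."""
    curve = gc_curve(gc_values, counts)
    return [count / fitted for count, fitted in zip(counts, curve)]


gc_values = [40, 40, 40, 50, 50, 50]
shallow = [90.0, 100.0, 110.0, 180.0, 200.0, 220.0]
deep = [3 * c for c in shallow]  # same sample at three times the read depth

corrected_shallow = gc_correct(gc_values, shallow)
corrected_deep = gc_correct(gc_values, deep)
```

Both corrected lists hover around 1 and match each other, because the fitted curve scales along with the read depth and the division cancels it out.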
The data we obtained from our lab contains enough reads to allow us to use this setting. We prefer using only the most reliably mapped data we can get over more, less reliably mapped data. Of course, you are free to test WISECONDOR with mismatches allowed; just make sure your reference set is built using the same settings as your test samples. WISECONDOR does not care about these mismatches: it only counts reads based on their position on the genome.
Not much. WISECONDOR will probably not report any aberrated bins (not fetal ones, anyway), but it does not check the fetal percentage; it simply assumes there is enough fetal DNA.
Indeed, there is a bit of a list:
- Single bin, bin test
- Single bin, aneuploidy test
- Windowed, bin test
- Windowed, aneuploidy test
- Chromosome wide, aneuploidy test
And well, it depends on how strict you want to be when looking at the data. In general, the list right after Windowed, aneuploidy test should give the most reliable results for aneuploidy calling, while the Windowed, bin test attempts to provide a list of the areas on the chromosomes that are actually aberrated. The single bin methods are not very reliable because they are not sensitive enough for most samples: instead of combining the power of several bins, they take just a single bin into account. The calls they make are usually strongly deviating values, while most fetal aberrations do not deviate that much.
The last one, Chromosome wide, aneuploidy test, appears to be way too sensitive for calling aneuploidy cases, but it has proven useful in a set of samples that had far too little fetal DNA: it was somehow able to point us to the right samples, as their aberrated chromosomes showed up as more strongly deviating values than usual using this approach.
True, during development I just wanted a quick overview of my results without wasting any space, so all chromosomes have their lengths scaled. An alternative was proposed by S.Ghesquiere, which shows every chromosome at its real length, plus additional, detailed plots for chromosomes 13, 18 and 21 next to their cytobands. We are looking into this approach and may incorporate it in the future. In the meantime, you are free to fork this project and make a pull request; all your input is welcome.
The lines show the deviation of each bin compared to its reference. The blue line shows the deviations tested per bin. To get a rough idea of what this means, consider it this way: a high spike shows that the bin is strongly increased compared to what it should be, taking into account how much deviation is expected. A bin whose reference set of bins has a lot of variation will therefore show a smaller spike than a bin that increased just as much but has a more stable set of reference bins. It is not a direct comparison, and it does not show the actual increase for any area; it shows how WISECONDOR looks at that area. Plots that show actual read frequencies for bins do not provide a lot of information, as the small change in read depth caused by a fetal aberration often gets completely drowned out by natural fluctuations in the read depth data.
The red line shows the windowed approach, which basically just combines a set of results shown in the blue line and determines how much this set deviates. A group of deviating bins shown in blue will therefore result in a strongly deviating red line, hence the visual correlation between the two.
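The effect of reference stability on the blue line can be illustrated with a plain z-score. This is a sketch under the assumption that a bin is scored as its distance from the mean of its reference bins, divided by their standard deviation; the function name and the numbers are made up, and the actual script may differ in detail.

```python
from statistics import mean, stdev


def bin_z_score(value, reference_values):
    """Z-score of one bin against the values of its reference bins."""
    return (value - mean(reference_values)) / stdev(reference_values)


# Both reference sets average 1.0, but one varies far more than the other.
stable_reference = [0.99, 1.00, 1.01, 1.00, 1.00]
noisy_reference = [0.90, 1.00, 1.10, 0.95, 1.05]

# The tested bin shows the same increase (1.05) in both cases.
z_stable = bin_z_score(1.05, stable_reference)
z_noisy = bin_z_score(1.05, noisy_reference)
```

Here z_stable comes out much larger than z_noisy, even though the bin itself increased by the same amount.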
That is considered an artifact. If it shows up in just one sample for that area, the pregnant woman is likely the cause of this:
If a maternal CNV is large enough to cover more than one bin and makes up a relatively large part of the bins' total covered area, it will appear as an aberrated area in WISECONDOR. The windowed method removes the highest bin from its window, but two subsequent spiking bins will leave the window with a strongly deviating value, which will then influence the total Z-Score for all bins close to it.
If the spike does show up in several samples, the reference set used may seem more stable in this area than the tested samples. This structural artifact can be removed by adding more samples to the reference set, allowing WISECONDOR to learn about the spikiness in this area.
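The single-sample case above (one spiking bin is dropped, two neighbouring spikes are not) can be sketched as follows. Two things here are assumptions rather than facts about the script: the per-bin z-scores are combined with Stouffer's method (sum divided by the square root of the count), and "highest" is taken to mean the largest absolute value. The function name and the numbers are hypothetical.

```python
from math import sqrt


def window_z_score(bin_z_scores, drop_highest=True):
    """Combine the per-bin z-scores of one window into a single value."""
    scores = list(bin_z_scores)
    if drop_highest and len(scores) > 1:
        # Assumption: "highest" means the largest absolute deviation.
        scores.remove(max(scores, key=abs))
    # Assumption: Stouffer's method is used to combine the remaining bins.
    return sum(scores) / sqrt(len(scores))


one_spike = [0.2, -0.1, 9.0, 0.1, -0.2]
two_spikes = [0.2, 9.0, 9.0, 0.1, -0.2]

z_one = window_z_score(one_spike)    # the single spike is dropped
z_two = window_z_score(two_spikes)   # one spike remains and dominates
```

With a single spike the window stays close to zero, while two adjacent spikes still leave a strongly deviating window value.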
If you run into issues, please create a ticket so I can take care of it.
If you have other trouble running WISECONDOR or any related questions, feel free to contact me through the e-mail address on my GitHub page.