
Processing Polyhedra data (5GQM & 5GQN)

keitaroyam edited this page Aug 28, 2017 · 8 revisions

The following describes how Polyhedra datasets can be processed using KAMO (documentation in Japanese / English).

References

  • Original paper
    • Abe et al. (2017) "Crystal Engineering of Self-Assembled Porous Protein Materials in Living Cells." ACS Nano doi: 10.1021/acsnano.6b06099
  • Structures described here: PDB 5GQM and 5GQN

Raw data

  • Available in Zenodo. DOI
  • EIGER X 9M detector, 5×5 μm² beam, 1 Å wavelength, 160.0 mm camera length
  • 5°/dataset, 1°/frame (shutterless)
  • 5GQM: 20 datasets collected manually from one cryoloop
    • Space group I23, a = 103.42 Å
  • 5GQN: 184 datasets collected automatically (ZOO) from two cryoloops
    • Space group I23, a = 103.32 Å

How data were processed in the original paper

Processing of individual datasets

The GUI command 'kamo' was used with default parameters; that is, XDS (version of Oct 15, 2015) was used for integration and no prior crystal information was given.

All 20 datasets of 5GQM were processed, whereas 155 of the 184 datasets of 5GQN were processed successfully.
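Whether a dataset was processed can be checked from the files XDS leaves behind. The sketch below assumes (this page does not say so) that each successfully processed dataset directory under a KAMO processing directory contains an XDS_ASCII.HKL file written by the XDS CORRECT step; count_processed and success_rate are hypothetical helpers, not KAMO commands.

```sh
#!/bin/sh
# count_processed TOPDIR  (hypothetical helper)
# Counts XDS_ASCII.HKL files below TOPDIR, assuming one such file per dataset
# that XDS processed through CORRECT. Adjust the file name if your layout differs.
count_processed() {
  find "$1" -type f -name XDS_ASCII.HKL | wc -l | tr -d ' '
}

# success_rate PROCESSED TOTAL  (hypothetical helper)
# Prints PROCESSED/TOTAL as a percentage with one decimal place.
success_rate() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.1f%%\n", 100 * p / n }'
}

# Example: count_processed /foo/_kamoproc/Auto
```

With the numbers above, success_rate 155 184 prints 84.2% for 5GQN, and success_rate 20 20 prints 100.0% for 5GQM.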

Merging

As there were many samples, the automatic merging command kamo.auto_multi_merge was used.

First, a CSV file like the one below was prepared and saved as targets.csv.

topdir,name
/foo/_kamoproc/CPS1963/01,WT in vivo
/foo/_kamoproc/Auto/abe-CPS1963-05,Del3 PhC in vivo
/foo/_kamoproc/Auto/abe-CPS1963-06,Del3 PhC in vivo
# omitted other sample information not described here
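The name column determines how datasets are grouped. Judging from the output directories listed further down (merge_WT_in_vivo, merge_Del3_PhC_in_vivo), rows sharing a name are merged together, and each name becomes one directory with the prefix= value prepended and spaces turned into underscores. The sketch below previews those directory names from targets.csv; the naming rule is inferred from this page rather than taken from KAMO's documentation, and merge_dirs is a hypothetical helper.

```sh
#!/bin/sh
# merge_dirs CSV PREFIX  (hypothetical helper)
# Prints the merge directory expected for each distinct sample name in CSV.
# Naming rule inferred from this page: spaces become underscores and PREFIX is
# prepended. Rows sharing a name (the two "Del3 PhC in vivo" topdirs) collapse
# into a single directory.
merge_dirs() {
  # skip the header, drop comment lines, keep the name column,
  # turn spaces into underscores, prepend the prefix, de-duplicate
  tail -n +2 "$1" | grep -v '^#' | cut -d, -f2- |
    sed "s/ /_/g; s/^/$2/" | sort -u
}
```

For the CSV above, merge_dirs targets.csv merge_ lists merge_Del3_PhC_in_vivo and merge_WT_in_vivo.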

Then, the following command was executed (NOTE: some of the command-line arguments here may not be valid in the current version of KAMO).

kamo.auto_multi_merge \
  csv=targets.csv \
  workdir=$PWD \
  prefix=merge_ \
  postrefine=False \
  reference=2oh6.mtz \
  merge.d_min_start=1.4 \
  merge.clustering=cc \
  merge.cc_clustering.min_acmpl=90 \
  merge.cc_clustering.min_aredun=2 \
  merge.anomalous=false \
  batch.engine=sge \
  merge.batch.par_run=merging

This command resolved the indexing ambiguity using the 2oh6 data as a reference, performed hierarchical clustering based on intensity correlation, and merged, with outlier rejection, each cluster with expected completeness > 90% and redundancy > 2. It then repeated the procedure at a lower resolution cutoff (starting from 1.4 Å).
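Backslash line continuations cannot carry comments, so the same arguments are repeated below inside a POSIX sh function, one argument per line, each annotated with what the paragraph above says it does. run_merge is a hypothetical wrapper: if kamo.auto_multi_merge is not on PATH it only prints the command line. The same caveat applies as above: some arguments may not be accepted by the current KAMO version.

```sh
#!/bin/sh
# run_merge  (hypothetical wrapper around the command above)
# Builds the argument list one line at a time so that each can carry a comment.
run_merge() {
  set -- csv=targets.csv                        # sample list (topdir,name)
  set -- "$@" workdir="$PWD"                    # merge_* directories are created here
  set -- "$@" prefix=merge_                     # prepended to each sample name
  set -- "$@" postrefine=False                  # as in the original run
  set -- "$@" reference=2oh6.mtz                # resolves the indexing ambiguity
  set -- "$@" merge.d_min_start=1.4             # first resolution cutoff (Å)
  set -- "$@" merge.clustering=cc               # hierarchical clustering on intensity CC
  set -- "$@" merge.cc_clustering.min_acmpl=90  # expected completeness > 90% ...
  set -- "$@" merge.cc_clustering.min_aredun=2  # ... and redundancy > 2
  set -- "$@" merge.anomalous=false             # Friedel pairs treated as equivalent
  set -- "$@" batch.engine=sge                  # jobs go to a Sun Grid Engine queue
  set -- "$@" merge.batch.par_run=merging       # as in the original run

  if command -v kamo.auto_multi_merge >/dev/null 2>&1; then
    kamo.auto_multi_merge "$@"
  else
    # KAMO is not on PATH: print the command line instead of running it
    printf 'kamo.auto_multi_merge'
    printf ' %s' "$@"
    printf '\n'
  fi
}
```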

This produced the following final results:

  • merge_WT_in_vivo/cc_1.68A_final
  • merge_Del3_PhC_in_vivo/cc_1.55A_final

from which the result with the largest CC1/2 was chosen:

  • merge_WT_in_vivo/cc_1.68A_final/cluster_0013/run_01/ (14 datasets merged)
 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

     5.04        6707     837       841       99.5%      11.5%     13.1%     6697   13.42     12.3%    98.7*    -3    0.720     472
     3.56       10623    1457      1465       99.5%      13.0%     12.9%    10599   12.94     14.0%    98.4*     2    0.816     870
     2.91       14938    1874      1877       99.8%      15.8%     15.7%    14910   10.68     16.9%    97.8*    -3    0.814    1206
     2.52       17920    2174      2189       99.3%      21.5%     21.2%    17888    8.36     22.9%    96.0*    -1    0.808    1497
     2.25       18900    2484      2518       98.6%      27.1%     26.7%    18838    6.59     29.1%    94.3*    -7    0.775    1680
     2.06       20375    2653      2671       99.3%      33.0%     32.9%    20324    5.57     35.4%    92.6*    -7    0.757    1944
     1.91       22829    2858      2870       99.6%      44.5%     46.9%    22774    4.16     47.5%    89.5*    -3    0.730    2145
     1.78       27235    3369      3376       99.8%      63.1%     73.2%    27174    2.90     67.3%    83.2*    -2    0.685    2501
     1.68       24782    3287      3305       99.5%      92.6%    115.0%    24711    1.86     99.3%    70.3*    -3    0.644    2337
    total      164309   20993     21112       99.4%      24.2%     25.4%   163915    6.06     25.9%    98.2*    -3    0.737   14652
  • merge_Del3_PhC_in_vivo/cc_1.55A_final/cluster_0100/run_02 (41 datasets merged)
 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

     4.66       23650    1053      1055       99.8%      24.7%     26.2%    23650   15.43     25.2%    98.3*     3    0.892     861
     3.30       36688    1828      1828      100.0%      28.8%     26.3%    36688   14.01     29.6%    98.2*     4    0.968    1633
     2.69       51964    2360      2360      100.0%      29.5%     28.5%    51964   12.52     30.2%    97.3*     0    0.960    2169
     2.33       62690    2778      2778      100.0%      35.6%     31.9%    62690   10.83     36.4%    96.3*     2    0.993    2583
     2.09       64967    3007      3007      100.0%      43.8%     37.0%    64967    9.15     44.9%    94.2*    -1    0.989    2822
     1.90       73769    3602      3603      100.0%      59.3%     45.0%    73769    7.24     60.8%    92.9*    -1    0.989    3398
     1.76       81043    3731      3733       99.9%      75.5%     73.2%    81043    5.62     77.3%    88.9*     0    0.918    3539
     1.65       84906    3849      3850      100.0%     111.1%    122.1%    84906    4.05    113.8%    82.9*     1    0.860    3664
     1.55       94115    4525      4525      100.0%     174.0%    208.9%    94115    2.71    178.4%    70.6*     2    0.805    4316
    total      573792   26733     26739      100.0%      40.3%     37.8%   573792    7.63     41.3%    98.3*     1    0.921   24985
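Choosing the result with the largest CC1/2 can be scripted by reading the overall CC(1/2) from the total row of each statistics table. This is a minimal sketch, assuming each candidate run directory holds an XSCALE-style log containing the "SIGNAL/NOISE >= -3.0" table shown above, with CC(1/2) as the 11th whitespace-separated field of the total row; total_cc_half, best_by_cc_half and the log location are hypothetical.

```sh
#!/bin/sh
# total_cc_half LOGFILE  (hypothetical helper)
# Prints the overall CC(1/2) from the "total" row of the first
# "SIGNAL/NOISE >= -3.0" table in an XSCALE-style log, without the "*" marker.
# Assumes the column layout shown above: CC(1/2) is the 11th field of that row.
total_cc_half() {
  awk '/SIGNAL\/NOISE >= -3\.0 AS FUNCTION OF RESOLUTION/ { intable = 1 }
       intable && $1 == "total" { cc = $11; sub(/\*$/, "", cc); print cc; exit }' "$1"
}

# best_by_cc_half LOGFILE...  (hypothetical helper)
# Prints "<CC(1/2)> <file>" for the log with the largest overall CC(1/2).
best_by_cc_half() {
  for f in "$@"; do
    printf '%s %s\n' "$(total_cc_half "$f")" "$f"
  done | LC_ALL=C sort -k1,1nr | head -n 1
}
```

For example, passing the log of every merge_WT_in_vivo/cc_1.68A_final/cluster_*/run_* directory to best_by_cc_half would report the run with the highest overall CC1/2 (the log file name inside each run directory is an assumption).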

NOTE: choosing a non-final result is no longer recommended.

The details may be added later.