Add compatibility (save/restore) between CudnnGRU and CudnnCompatibleGRUCell #7

Open
danielwatson6 opened this issue Jun 25, 2018 · 2 comments
Labels: feature (Add new feature), help wanted (Contributions are welcome)

Comments

@danielwatson6
Owner

Documentation has been kindly requested in the TensorFlow repo. See this issue.

@danielwatson6 added the "feature" and "help wanted" labels Jun 25, 2018
@SSUHan

SSUHan commented Nov 12, 2018

Has this problem been solved?
I'm running into the same problem :(
Here is my error log:

Traceback (most recent call last):
  File "evaluate.py", line 50, in <module>
    model.restore(FLAGS.model_path)
  File "/notebooks/skip-thoughts/skip_thoughts.py", line 239, in restore
    saver.restore(sess, ckpt.model_checkpoint_path)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1768, in restore
    six.reraise(exception_type, exception_value, exception_traceback)
  File "/usr/local/lib/python3.5/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1752, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key cudnn_gru/opaque_kernel not found in checkpoint
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "evaluate.py", line 50, in <module>
    model.restore(FLAGS.model_path)
  File "/notebooks/skip-thoughts/skip_thoughts.py", line 230, in restore
    saver = tf.train.Saver(max_to_keep=1)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1284, in __init__
    self.build()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1296, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1333, in _build
    build_save=build_save, build_restore=build_restore)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 781, in _build_internal
    restore_sequentially, reshape)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 400, in _AddRestoreOps
    restore_sequentially)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 832, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1463, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3414, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1740, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Key cudnn_gru/opaque_kernel not found in checkpoint
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
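
One way to narrow down a NotFoundError like the one above is to compare the variable names the graph expects with the names actually stored in the checkpoint. The sketch below assumes both sides are already available as name-to-shape mappings. In TensorFlow the checkpoint side can usually be obtained from `tf.train.list_variables`, which returns (name, shape) pairs, but check that against your installed version. The helper name `diff_variables` and every entry in the example data except `cudnn_gru/opaque_kernel` are hypothetical and not taken from the real checkpoint.

```python
def diff_variables(graph_vars, ckpt_vars):
    """Compare {name: shape} mappings for a graph and a checkpoint.

    Returns three sorted lists: names the graph wants but the checkpoint
    lacks, names only the checkpoint has, and names whose shapes differ.
    """
    graph_vars = {name: tuple(shape) for name, shape in graph_vars.items()}
    ckpt_vars = {name: tuple(shape) for name, shape in ckpt_vars.items()}

    missing = sorted(set(graph_vars) - set(ckpt_vars))
    unused = sorted(set(ckpt_vars) - set(graph_vars))
    mismatched = sorted(
        name for name in set(graph_vars) & set(ckpt_vars)
        if graph_vars[name] != ckpt_vars[name]
    )
    return missing, unused, mismatched


# Hypothetical example data: names and shapes are made up for illustration.
graph_vars = {"cudnn_gru/opaque_kernel": (1000,), "embeddings": (100000, 300)}
ckpt_vars = {"gru_cell/kernel": (812, 1024), "embeddings": (20000, 300)}

missing, unused, mismatched = diff_variables(graph_vars, ckpt_vars)
print("missing from checkpoint:", missing)
print("only in checkpoint:", unused)
print("shape mismatch:", mismatched)
```

A name in the first list corresponds to a "Key ... not found in checkpoint" error, while the third list points at hyperparameter mismatches between the saved model and the graph being restored.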

@SSUHan

SSUHan commented Nov 12, 2018

I found how to solve this problem.
Change the evaluate.py code like this.
Before:

with graph.as_default():
    model = SkipThoughts(w2v_model, vocabulary_size=100000, batch_size=2,
                         output_size=512, cuda=True)

After:

with graph.as_default():
    model = SkipThoughts(w2v_model, vocabulary_size=20000, batch_size=2,
                         output_size=512, cuda=True)

The default vocabulary_size in train.py is 20000 on the master branch, so the checkpoint does not match the graph built in evaluate.py with vocabulary_size=100000.
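
To keep the two scripts from drifting apart again, one option is to write the hyperparameters next to the checkpoint at training time and read them back before building the evaluation graph. This is only a minimal sketch of that idea, not how the repository does it: the file name `hparams.json` and the helpers `save_hparams` and `load_hparams` are hypothetical, and the values are the ones quoted above.

```python
import json
import os
import tempfile


def save_hparams(model_dir, **hparams):
    """Write the hyperparameters used for training as JSON in model_dir."""
    path = os.path.join(model_dir, "hparams.json")
    with open(path, "w") as f:
        json.dump(hparams, f, indent=2, sort_keys=True)
    return path


def load_hparams(model_dir):
    """Read back the hyperparameters stored by save_hparams."""
    with open(os.path.join(model_dir, "hparams.json")) as f:
        return json.load(f)


# A temporary directory stands in for the real checkpoint directory.
model_dir = tempfile.mkdtemp()

# Training side: record the values the graph was built with.
save_hparams(model_dir, vocabulary_size=20000, output_size=512)

# Evaluation side: reuse the same values instead of hard-coding them.
hparams = load_hparams(model_dir)
print(hparams["vocabulary_size"])
```

The evaluation script could then pass `**hparams` to the model constructor together with its own `batch_size`, so the restored graph always has the shapes that were saved.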
