I'm sorry to disturb you again, but I encountered another error when using the linkage method while running the following script:
```python
# -*- coding: utf-8 -*-
from sparsehc_dm import sparsehc_dm
import numpy as np
from con_sql import get_vectors_db
import resource
import time

# Test with 10,000 points
SIZE = 10000
SORT_RAM = int(0.5 * 1024 * 1024 * 1024)
INDEX_TO_PRINT = 20
HOST = 'XXX'
USER = 'XXX'
PASSWD = 'XXX'
DB = 'XXX'
GET_SENTENCES_SQL = 'XXX'
GET_VECTORS_URL = 'XXX'

def print_use(start_time):
    print('total time taken: %s seconds' % (time.time() - start_time))
    # ru_maxrss is reported in kilobytes on Linux
    print('memory usage: %f GB' % (resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024. / 1024.))

def get_distance(i, j, vectors):
    return np.linalg.norm(vectors[i] - vectors[j])

def cluster(size, sort_ram, index_to_print, vectors):
    distances = sparsehc_dm.InMatrix(sort_ram)
    # Push every pairwise distance of the upper triangle (i < j)
    for i in range(size - 1):
        for j in range(i + 1, size):
            distances.push(i, j, get_distance(i, j, vectors))
    l_history = sparsehc_dm.linkage(distances, 'average')
    print(l_history[:index_to_print])

def main():
    start_time = time.time()
    vectors = get_vectors_db(HOST, USER, PASSWD, DB, SIZE, GET_SENTENCES_SQL, GET_VECTORS_URL)
    print('time taken to read from database: %s seconds' % (time.time() - start_time))
    cluster(SIZE, SORT_RAM, INDEX_TO_PRINT, vectors)
    print_use(start_time)

if __name__ == '__main__':
    main()
```
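For reference, the double loop in `cluster` enumerates the condensed upper-triangular distances that get pushed into `InMatrix`. A minimal self-contained sketch of just that enumeration, with random data standing in for `get_vectors_db` (which needs the database and is not reproducible here), is:

```python
import numpy as np

# Random points stand in for the output of get_vectors_db
# (assumption: a 2-D float array, one point per row).
rng = np.random.default_rng(0)
n = 100
vectors = rng.standard_normal((n, 8))

# Same enumeration as the push loop in cluster(): every pair (i, j)
# with i < j, in row-major order of the upper triangle.
condensed = [np.linalg.norm(vectors[i] - vectors[j])
             for i in range(n - 1)
             for j in range(i + 1, n)]

# n points yield n*(n-1)/2 pairwise distances.
print(len(condensed))  # 4950
```

Checking that the number of pushed entries equals n*(n-1)/2 is a quick way to rule out a malformed input matrix before suspecting the linkage step itself.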
Basically, this script reads data from a database and processes it into a 2-D NumPy array (via `get_vectors_db`), whose rows are the points to be clustered, while time and memory usage are monitored. However, during execution the following was printed on screen:
```
Using 0.5GB RAM for on-disc sorting.
[STXXL-MSG] STXXL v1.4.99 (prerelease/Release) (git 263df0c54dc168212d1c7620e3c10c93791c9c29) + gnu parallel(20180415)
[STXXL-MSG] Disk '/tmp/stxxl' is allocated, space: 953 MiB, I/O implementation: syscall queue=0 devid=0 unlink_on_open
Traceback (most recent call last):
  File "cluster.py", line 95, in <module>
    main()
  File "cluster.py", line 91, in main
    cluster(SIZE, SORT_RAM, INDEX_TO_PRINT, vectors)
  File "cluster.py", line 85, in cluster
    l_history = sparsehc_dm.linkage(distances, 'average')
IndexError: vector::_M_range_check: __n (which is 8145) >= this->size() (which is 8145)
```
A similar error also occurred when I set SIZE = 5000:
```
Using 0.5GB RAM for on-disc sorting.
[STXXL-MSG] STXXL v1.4.99 (prerelease/Release) (git 263df0c54dc168212d1c7620e3c10c93791c9c29) + gnu parallel(20180415)
[STXXL-MSG] Disk '/tmp/stxxl' is allocated, space: 953 MiB, I/O implementation: syscall queue=0 devid=0 unlink_on_open
Traceback (most recent call last):
  File "cluster.py", line 92, in <module>
    main()
  File "cluster.py", line 88, in main
    cluster(SIZE, SORT_RAM, INDEX_TO_PRINT, vectors)
  File "cluster.py", line 82, in cluster
    l_history = sparsehc_dm.linkage(distances, 'average')
IndexError: vector::_M_range_check: __n (which is 3736) >= this->size() (which is 3736)
```
However, no error occurs if I replace `'average'` with `'complete'` in `sparsehc_dm.linkage`.
Actually, I also tried pushing a constant value for every entry of `sparsehc_dm.InMatrix()`, and I received the same error, this time with the index being 0:

```
IndexError: vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
```