Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prime Number Functions #400
base: develop
Are you sure you want to change the base?
Prime Number Functions #400
Changes from 6 commits
4015fbc
3ee737b
9512bb6
d762398
5375a1d
a684dbd
3e4db8a
dd8a61c
4debd0d
2a7e031
a68910e
386cc4d
d79eddb
3fdf917
6ca245b
7d3a520
a1ac504
35d2aa1
243a299
2eadeff
173ce0d
23fba36
f7b45fd
d687d5e
e51d727
1245d27
2052081
9f81e0d
55ea045
31a2105
f545928
d780db5
2639bed
94fc1ac
6ebb906
0521854
b233a80
f95c2cf
8e2e29a
1a24f16
c63f1f1
b7d4256
fbc38c8
2e46b81
6759ede
1b40403
97244be
fa04133
c4a89c8
91836f6
6d6b19f
81e4a6c
0b8f1d5
5ba0a1d
8e240f7
7c0cbbf
3d9b77c
302fb5f
9792a23
84a69f0
5dc3523
0dbe69c
eee2c86
f5d789a
f2277e3
1d2f03c
0d9d31b
c361cde
66c2642
de1f331
eaea5f9
e8196f3
cb5d978
830ccc4
4dc3eb2
90be100
a772782
2d1461f
e7cdb32
29eef88
b5a28b5
6c26b53
980bfe7
07e6f58
86b9e5a
f19149e
1ad0d51
5f6d06a
670b06d
7c7a491
e507ba5
c9cb41c
e8b71a0
d649f65
ed98892
c3b3934
e9aa05d
7c5d792
2bbfad2
ddc5c8a
f517c00
a356b47
2212e45
b12e2dc
f39a4d8
eafbefc
cb36b89
16c2354
0b1a690
8c883d1
f357e0f
a4f2d89
d16e562
91be2c3
d9ae8c2
7d22010
b541987
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would wager that this is a factor in the performance difference: Kim's implementation estimates the number of primes and reserves that number of elements upfront.
I didn't look into exactly how it is estimated but presumably it is something like this: https://en.wikipedia.org/wiki/Prime-counting_function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes; I would agree-I thought we had this earlier . . .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did. The snippets in question are below:
and this was in the main
prime_sieve
(now callable fromprime_reserve
)I thought making it a little less generic to gain performance was worthwhile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call me a purist, but you'll make it faster by making it more generic. Allocating memory for the output is not the responsibility of the algorithm.
Now that I've seen KW's implementation, I guess you essentially copied the interface so that it was easier to compare performance?
We don't have to copy the interface, ours can be better. And performance is not even the most important factor: allocating memory on the heap is simply a no-go in some contexts.
To make it concrete, if the system that I work on at work needed to generate primes, we could not use KW's or this implementation (as it is), we would have to write our own. I wasn't always aware of such systems and limitations, it just comes with experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially was returning via output iterator instead of using a provided container. Profiling showed that more than 10% of my runtime was the call to
std::move
for the output iterator so I changed approaches to the current one.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not convinced this interface change is worth ~5x more CPU runtime. Care to weigh in @pabristow @jzmaddock ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have a number of issues here:
Question: how far can we go with a range-based output? As in "put the next N primes here". The range could either be a pair of iterators, an iterator and the value N, or a boost::range. If that's not faster than a container based interface (no memory allocation) then there's something wrong somewhere.
But... I don't know how you would divide that up for multithreading?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mborland
I can't see the code that is testing the output iterator, but yeah, a 5x difference is not acceptable. But it's also bizarre. Why is
move
being used at all?int64_t
is a built-in data type, right, I thought it would just be copied? Is there the same huge difference forint32_t
?I would test it something like this. Create the vector to the size required outside the loop.
If you're doing something different, I'd be curious to see it and know why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jzmaddock What about building a wrapper class that would support range-based output and other functions like last/next value, random value, etc? I think that is currently the best way to go about supporting a wider variety of data structures. As for removing dynamic memory allocation? I think it is in the realm of the possible, but would require massive retooling. You would become highly dependent on mutex locking which is not a trivial operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jzmaddock
back_insert_iterator
onvector
.(Btw, I'm going to use x for prime numbers and
n
for data sizes.)It's funny, but the output iterator interface is almost a range interface. As you say, either an (iterator, iterator) or an (iterator,
n
) pair define a range. The interface to this algorithm is essentially (iterator, x), which means "generate prime numbers up to x", which translates to an output range of (iterator, prime_count(x)).So yeah, an alternative interface is (iterator,
n
), i.e., generate the firstn
prime numbers. Then it's up to the client to calculaten
from x if they know what x they want to calculate up to.