So, how does it shape up? On an i5-3210M (2.5GHz) Ivy Bridge (2 cores, 4 threads) I get a peak of ~99.6 million 64-bit rdrands per second with 4 threads, which equates to ~6.374 billion bits per second. Not bad at all.
With the 4-threaded i5-3210M CPU, maximum rdrand throughput is hit at 4 threads.
...and with an 8-threaded i7-3770 (3.4GHz) Ivy Bridge (4 cores, 8 threads) we again hit a peak throughput of 99.6 million 64-bit rdrands a second, this time with just 3 threads. One can therefore conclude that this is the peak rate of the DRNG on both CPUs tested. A 2-threaded i3 Ivy Bridge CPU won't be able to hit the peak rate of the DRNG, and a 4-threaded i5 can only just max out the DRNG with some hand-optimized code.
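For reference, the rough shape of such a multi-threaded benchmark can be sketched with the _rdrand64_step() intrinsic and one worker per thread. This is only an illustrative sketch, not the exact test rig used for the numbers above:

/*
 * Illustrative sketch only: time N threads each doing a fixed number
 * of 64-bit rdrand reads via the _rdrand64_step() intrinsic.
 *
 * Build (gcc/clang): gcc -O2 -mrdrnd -pthread rdrand-bench.c -o rdrand-bench
 * Run:               ./rdrand-bench 4
 */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
#include <pthread.h>
#include <immintrin.h>

#define READS_PER_THREAD (10000000ULL)  /* 10 million 64-bit reads per thread */
#define MAX_THREADS      (64)

static void *bench_thread(void *arg)
{
    unsigned long long r, sum = 0;

    (void)arg;
    for (uint64_t i = 0; i < READS_PER_THREAD; i++) {
        while (!_rdrand64_step(&r))
            ;                   /* rdrand can fail transiently; just retry */
        sum += r;               /* consume the data so the loop isn't optimised away */
    }
    return (void *)(uintptr_t)sum;
}

int main(int argc, char **argv)
{
    int nthreads = (argc > 1) ? atoi(argv[1]) : 4;
    pthread_t threads[MAX_THREADS];
    struct timespec t0, t1;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        nthreads = 4;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&threads[i], NULL, bench_thread, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double rate = (double)READS_PER_THREAD * nthreads / secs;

    printf("%d threads: %.1f million 64-bit rdrands/sec (%.3f Gbit/s)\n",
           nthreads, rate / 1e6, rate * 64.0 / 1e9);
    return 0;
}

The retry loop is there because _rdrand64_step() can return 0 if the DRNG cannot supply data at that instant; the rate is simply total reads divided by wall-clock time.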
Now, how random is this random data? There are several tests available; I chose to exercise the DRNG using the dieharder test suite. The test is relatively simple: install dieharder, do 64-bit rdrand reads, output these as a raw random number stream, and pipe this into dieharder:
sudo apt-get install dieharder
./rdrand-test | dieharder -g 200 -a
#=============================================================================#
# dieharder version 3.31.1 Copyright 2003 Robert G. Brown #
#=============================================================================#
rng_name |rands/second| Seed |
stdin_input_raw| 3.66e+07 | 639263374|
#=============================================================================#
test_name |ntup| tsamples |psamples| p-value |Assessment
#=============================================================================#
diehard_birthdays| 0| 100| 100|0.40629140| PASSED
diehard_operm5| 0| 1000000| 100|0.79942347| PASSED
diehard_rank_32x32| 0| 40000| 100|0.35142889| PASSED
diehard_rank_6x8| 0| 100000| 100|0.75739694| PASSED
diehard_bitstream| 0| 2097152| 100|0.65986567| PASSED
diehard_opso| 0| 2097152| 100|0.24791918| PASSED
diehard_oqso| 0| 2097152| 100|0.36850828| PASSED
diehard_dna| 0| 2097152| 100|0.52727856| PASSED
diehard_count_1s_str| 0| 256000| 100|0.08299753| PASSED
diehard_count_1s_byt| 0| 256000| 100|0.31139908| PASSED
diehard_parking_lot| 0| 12000| 100|0.47786440| PASSED
diehard_2dsphere| 2| 8000| 100|0.93639860| PASSED
diehard_3dsphere| 3| 4000| 100|0.43241488| PASSED
diehard_squeeze| 0| 100000| 100|0.99088862| PASSED
diehard_sums| 0| 100| 100|0.00422846| WEAK
diehard_runs| 0| 100000| 100|0.48432365| PASSED
..
dab_monobit2| 12| 65000000| 1|0.98439048| PASSED
...and leave to cook for about 45 minutes. The -g 200 option specifies that the random numbers come from stdin and the -a option runs all the dieharder tests. All the tests passed with the exception of the diehard_sums test, which produced a "weak" result; however, this test is known to be unreliable and it is recommended that it not be used. Quite honestly, I would have been surprised if the tests had failed, but you never know until you run them.
The CAcert research lab has an on-line random number generator analysis website that allows one to submit at least 12 MB of random numbers for testing. I submitted 32 MB of data and am currently waiting to see if I get any results back. Watch this space.
You can rest assured that I ran the DRNG's output through dieharder a few times during its development. The thing with dieharder is that on perfectly random data it will randomly throw up the odd 'weak' result. Run it a second time and you will get a different set of weak results. If you take enough p-values, some of them will land in the margins.
6.374 GBits/s is 796.75 MBytes/s, which is closer to the theoretical maximum of 800 MBytes/s than I achieved (I got about 780). So well done.
Can confirm.
I'm just working on improving dieharder. The problem of random WEAKs is due to the multiple testing problem (https://github.com/rurban/dieharder/issues/6), which dieharder by default does not mitigate against. Therefore it is currently recommended to use -Y1 in all cases where WEAK results are returned. This retries such WEAK results with a different seed for some time and checks whether they get better. A better idea would be to subtract the expected number of bad p-values (the alpha), or to check for outliers, as we do with smhasher.
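For example, using the same pipeline as in the post with that option enabled:

./rdrand-test | dieharder -g 200 -a -Y 1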
I also got crazily slow rdrand benchmarks on my AMD Ryzen 3: 72 ints/second, not the 36000 expected on Intel. But I'm really torturing rdrand, without any buffering.
I also got extremely bad test results with dieharder. Intel uses a good AES-CBC-MAC conditioner; I wonder what AMD uses. https://software.intel.com/content/www/us/en/develop/articles/intel-digital-random-number-generator-drng-software-implementation-guide.html?wapkw=rdrand64_step
I'm sure with some careful optimization I can get some more performance out of the test rig I hacked up.
On my Core i7-3520M (on the same X230) I top out at ~42M rdrands/s with 4 threads.
It seems you didn't commit rdrand-test to the same repository; I'd be interested in it too.
rdrand-test simply dumped the rdrand64() reads to stdout; it's quite trivial.
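To give an idea of its shape, a rough sketch along those lines (not the original rdrand-test) looks like this, buffering a block of reads rather than issuing a write per 64-bit value:

/*
 * Rough sketch of such a dumper: fill a buffer with 64-bit rdrand reads
 * and stream it to stdout, ready to be piped into dieharder -g 200.
 * Build: gcc -O2 -mrdrnd rdrand-dump.c -o rdrand-dump
 */
#include <stdio.h>
#include <immintrin.h>

#define BUF_QWORDS 8192          /* 64KB of random data per write */

int main(void)
{
    unsigned long long buf[BUF_QWORDS];

    for (;;) {
        for (size_t i = 0; i < BUF_QWORDS; i++) {
            while (!_rdrand64_step(&buf[i]))
                ;                /* retry the odd transient failure */
        }
        if (fwrite(buf, sizeof(buf[0]), BUF_QWORDS, stdout) != BUF_QWORDS)
            break;               /* reader has gone away, e.g. dieharder finished */
    }
    return 0;
}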
Good point. With this (gross) patch I manage to get ~10MB/s on that X230, but doing a write every 64 bits is quite a bad idea :)
http://paste.debian.net/204718/
Looks from the graph like, until the RNG bottleneck is hit at 99.6 MHz, it's taking 9 cycles per read.
On IVB, 800 MBytes/s is the limit (100MHz x 64 bits), imposed by the bus local to the DRNG. From my perspective, the CPU core is way over there on the other side of the chip. I don't care how fast it is, it's not getting more than 800 MB/s :)
There is a bug on some Ivy Bridge processors that causes rdrand to signal an illegal instruction exception. My new laptop has that problem and I am not happy. I would actually have a very particular use for that instruction.
This is described in note BV54 in http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/3rd-gen-core-desktop-specification-update.pdf
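As a general defensive measure (independent of the erratum itself), code can check the CPUID RDRAND feature bit before issuing the instruction, so a missing RDRAND shows up as a clean failure rather than an illegal instruction. A minimal sketch for gcc/clang:

/*
 * Sketch: report whether CPUID advertises RDRAND (CPUID.01H:ECX bit 30)
 * before any rdrand instruction is issued.
 * Build: gcc -O2 cpuid-rdrand.c -o cpuid-rdrand
 */
#include <stdio.h>
#include <cpuid.h>

static int have_rdrand(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx & (1U << 30)) != 0;  /* RDRAND feature flag */
}

int main(void)
{
    printf("rdrand is %s\n", have_rdrand() ? "available" : "not available");
    return 0;
}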
I have a question regarding the practical use of rdrand: is it better to use rdrand instead of the C rand() function? Will I get better performance? I need this for simple, non-critical apps like games and animations.