NOTE: Sorry for the delay in posting. School is hitting me rather hard as of late, and will continue to until the end of the month. So until September, posts are going to be a little sporadic... unfortunately. :(
I'm fascinated with the idea of doing real-world calculations on the GPU. The idea just makes sense to me. Almost every machine today ships with a piece of hardware that exists to do one thing really quickly: crunch numbers so video games can look good. And unless you're playing video games, that card is mostly sitting in the background sucking up energy and maybe providing some nice eye candy for your desktop.
But within the last couple of years this has been changing. People are figuring out ways to make this super calculator do more general-purpose math. And so now we have GPGPU, which stands for General Purpose computation on Graphics Processing Units. And of course, like a good geek, I want to play with it.
Of course, before I start to play with CUDA, I needed to come up with a baseline. Which led me to ask myself, “What am I going to program?” After some thought, I decided on a prime number generator. My next step was to code up a basic one, using just a simple algorithm.
Again, nothing very special going on here. I take some basic liberties with the calculations, like simply treating 2 and 3 as prime and only starting real work after 3. I also skip all even numbers entirely. Pretty simple stuff. I'm sure that if I wanted to, I could tweak the algorithm some more and squeeze out an extra drop or two of performance, but that wasn't the point. The program above just gives me a baseline, so as I start to play with CUDA and GPGPU I can get an idea of how much performance might be gained.
And of course, what would be the point of creating a baseline if I didn't include the numbers from running this program?
| Iterations | Time (in seconds) |
| ---------- | ----------------- |
And as the table shows, not only is the code simple... it gets really slow once the numbers get really large.
Next stop: somehow porting this code to CUDA and seeing what kind of performance improvement can be gained.