Categories
Fade2D Examples

2D Delaunay Triangulation Benchmark – Example1

Fade2D is a very fast Delaunay triangulation library. With this C++ Delaunay benchmark code you can test its performance on your own hardware.

Benchmark Code

// * 1 *   Create a Fade object and set numCPU
Fade_2D dt;
int numUsedCPU=dt.setNumCPU(0); // 0 means: autodetect
cout<<"Number of CPU cores: "<<numUsedCPU<<endl;
 
// * 2 *   Set up how many points to test
std::vector<std::pair<std::string,int> > vNumPoints;
vNumPoints.push_back(make_pair("numPoints: 1k",1000));
vNumPoints.push_back(make_pair("numPoints: 50k",50000));
vNumPoints.push_back(make_pair("numPoints: 100k",100000));
vNumPoints.push_back(make_pair("numPoints: 500k",500000));
vNumPoints.push_back(make_pair("numPoints: 1 mio",1000000));
vNumPoints.push_back(make_pair("numPoints: 10 mio",10000000));
 
// * 3 *   Test
for(size_t i=0;i<vNumPoints.size();++i)
{
    std::string label(vNumPoints[i].first);
    int numPoints(vNumPoints[i].second);
 
    // Prepare a vector of random points
    vector<Point2> vInPoints;
    generateRandomPoints(numPoints,0,100,vInPoints,1);
 
    // Insert the points and measure the time
    cout<<"\n"<<label<<": start"<<endl;
    double elapsed=dt.measureTriangulationTime(vInPoints);
    cout<<"Elapsed time: "<<elapsed<<endl<<endl;
}
  1. The above source code starts with a call to Fade_2D::setNumCPU(int numCPU):
    • numCPU=0 means autodetect
    • numCPU=1 means single-threaded (default)
    • numCPU>1 activates multithreading
  2. Then it sets up the test sizes
  3. Finally the loop creates random point sets and measures their triangulation time

“By default Fade runs single-threaded to avoid nested multithreading in parallel applications. Thus you need to activate multithreading with the Fade_2D::setNumCPU(..) function.”

Benchmark Results: Single- and multithreaded

The below diagram shows the triangulation runtime on a Core i7 6800K CPU for up to 250 million points (equivalent to 500 million triangles). The practical runtime is almost linear, theoretically O(n*log(n)), to the size n of the input point cloud. The multithreading-speedup on the 6-core CPU is 4.39.

Benchmark results: Delaunay triangulation of up to 250 million input points, singlethreaded and multithreaded
Benchmark results: Delaunay triangulation of up to 250 million input points, singlethreaded and multithreaded. The speedup with 6 cores is 4.39.

Raspberry PI2 – 1 Million Points on an ARM Cortex A7

Fade2D also runs on Raspberry PI computers. The RPI2 uses a Broadcom BCM2836 SoC, which contains a 32-bit ARM Cortex-A7 CPU. This small system gets by with 3 watts of power. The more astonishing is that this small board could triangulate one million points (2 million triangles) in 7 seconds.

Fade on Raspberry PI

Get true Benchmark Results

  • Compile an x64 application in Release mode. For better performance and because 32-bit applications suffer from the (MSVC 32-bit) 2GB memory limit.
  • Run the test directly in the command prompt, not in a debug window of your IDE
  • Do you have enough RAM for the benchmark? On 64-bit systems you need 212 MB per million points (2 million triangles).
  • Prevent cpu throttling: Disable energy saving options during the test. On Linux use the Performance governor. If you use a laptop plug in the power chord.
  • Fade still supports very old environments (VS2010, CentOS6.4) but multithreading is disabled there.

The next article Example2 – Traversing teaches you the important basics you need to access specific elements in a triangulation.

Leave a Reply

Your email address will not be published. Required fields are marked *