Categories
Fade2D Examples

2D Delaunay Triangulation Benchmark – Example1

Fade2D is a numerically robust C++ Delaunay triangulation library. But it is nevertheless extremely fast. Certainly you can verify it on your own hardware with the below Delaunay triangulation benchmark.

Benchmark Code

// * 1 *   Create a Fade object and set numCPU
Fade_2D dt;
int numUsedCPU=dt.setNumCPU(0); // 0 means: autodetect
cout<<"Number of CPU cores: "<<numUsedCPU<<endl;
 
// * 2 *   Set up how many points to test
std::vector<std::pair<std::string,int> > vNumPoints;
vNumPoints.push_back(make_pair("numPoints: 1k",1000));
vNumPoints.push_back(make_pair("numPoints: 50k",50000));
vNumPoints.push_back(make_pair("numPoints: 100k",100000));
vNumPoints.push_back(make_pair("numPoints: 500k",500000));
vNumPoints.push_back(make_pair("numPoints: 1 mio",1000000));
vNumPoints.push_back(make_pair("numPoints: 10 mio",10000000));
 
// * 3 *   Test
for(size_t i=0;i<vNumPoints.size();++i)
{
    std::string label(vNumPoints[i].first);
    int numPoints(vNumPoints[i].second);
 
    // Prepare a vector of random points
    vector<Point2> vInPoints;
    generateRandomPoints(numPoints,0,100,vInPoints,1);
 
    // Insert the points and measure the time
    cout<<"\n"<<label<<": start"<<endl;
    double elapsed=dt.measureTriangulationTime(vInPoints);
    cout<<"Elapsed time: "<<elapsed<<endl<<endl;
}
  1. Step 1 in the above code calls Fade_2D::setNumCPU(int numCPU) to enable multithreading. Possible values for numCPU are:
    • numCPU>1 specifies the number of cpu’s to be used
    • numCPU=0 means autodetect and use the available cpu’s
    • numCPU=1 means single-threaded (default)
  2. Afterwards Step 2 sets up the test data sizes
  3. Finally, Step 3 creates random point sets and measures their triangulation time

“By default Fade runs single-threaded. This avoids nested multithreading in parallel applications. Thus, if you want multithreading, you need to activate it explicitly using the Fade_2D::setNumCPU() function.”

Single- and multithreaded Delaunay triangulation benchmark

The below diagram shows benchmark results for up to 250 million points. This is equivalent to 500 million output triangles. The observed runtime is almost linear to point cloud’s size. More precisely, the upper bound on the algorithm’s runtime is O(n*log(n)). The multithreading-speedup on a 6-core CPU is 4.39.

Delaunay triangulation benchmark results for up to 250 million points, singlethreaded and multithreaded
Benchmark results: Delaunay triangulation of up to 250 million input points, singlethreaded and multithreaded. The speedup on a Corei7 6800K cpu with 6 cores is 4.39.

Raspberry PI2 – 1 Million Points on an ARM Cortex A7

Fade2D also runs on Raspberry PI computers. The RPI2 uses a Broadcom BCM2836 SoC, which contains a 32-bit ARM Cortex-A7 CPU. This small system gets by with 3 watts of power. The more astonishing is that this small board could triangulate one million points (2 million triangles) in 7 seconds.

Fade on Raspberry PI

Get true Delaunay Benchmark Results

  • Compile the Delaunay triangulation benchmark as x64 application in Release mode. For better performance and because 32-bit applications suffer from the (MSVC 32-bit) 2GB memory limit.
  • Run the test directly in the command prompt, not in a debug window of your IDE.
  • Do you have enough RAM for the benchmark? On 64-bit systems you need 212 MB per million points which correspond to 2 million triangles.
  • Prevent cpu throttling: Disable energy saving options during the test. Furthermore, use the Performance Governor under Linux and if you use a laptop, connect the power cord.
  • Fade still supports very old environments (VS2010, CentOS6.4) but multithreading is disabled there.

The next article Example2 – Traversing teaches you the important basics you need to access specific elements in a triangulation.

Leave a Reply

Your email address will not be published. Required fields are marked *