Categories
Fade2D Examples

2D Delaunay Triangulation Benchmark – Example1

Fade2D is a numerically robust and extremely fast C++ Delaunay triangulation library. Certainly you can verify that on your own hardware with the below Delaunay triangulation benchmark. You can find this code in examples2D/ex1_benchmark.cpp.

Benchmark Code

// * Init *   
std::vector<std::pair<std::string,int> > vNumPoints;
vNumPoints.push_back(make_pair("numPoints: 1k",1000));
vNumPoints.push_back(make_pair("numPoints: 50k",50000));
vNumPoints.push_back(make_pair("numPoints: 100k",100000));
vNumPoints.push_back(make_pair("numPoints: 500k",500000));
vNumPoints.push_back(make_pair("numPoints: 1 mio",1000000));
vNumPoints.push_back(make_pair("numPoints: 10 mio",10000000));
//vNumPoints.push_back(make_pair("numPoints: 50 mio (10.7 GB)",50000000));
//vNumPoints.push_back(make_pair("numPoints: 100 mio (21.3 GB)",100000000));
//vNumPoints.push_back(make_pair("numPoints: 250 mio (53 GB)",250000000));
//vNumPoints.push_back(make_pair("numPoints: 500 mio (109 GB)",500000000)); 

// * Test *
for(size_t i=0;i<vNumPoints.size();++i)
{
  std::string label(vNumPoints[i].first);
  int numPoints(vNumPoints[i].second);

  // * Step 1 *   Create a Fade object, make performance settings
  Fade_2D dt;
  dt.setFastMode(false); // Use 'true' for raster points (performance!)
  int numUsedCPU=dt.setNumCPU(0); // 0 means: autodetect

  // * Step 2 *   Prepare a vector of random points
  vector<Point2> vInPoints;
  generateRandomPoints(numPoints,0,100,vInPoints,1);

  // * Step 3 *   Measure the triangulation time
  cout<<"\n"<<label<<": start"<<endl;
  double elapsed=dt.measureTriangulationTime(vInPoints);
  cout<<"Elapsed time (cores: "<<numUsedCPU<<"): "<<elapsed<<endl<<endl;
}
  • Step 1 in the above code creates an empty Fade_2D object and makes two important performance settings:
    • Fade_2D::setFastMode(bool b) enables/disables a triangulation mode that greatly speeds up the triangulation of raster points.
    • Fade_2D::setNumCPU(int numCPU) sets the number of CPU cores to be used. The special value numCPU=0 means autodetect and use the available cores.
  • Step 2 prepares a vector of random points
  • Step 3 inserts the points and measures the triangulation time.

“The fast triangulation mode avoids computationally expensive predicates, which are especially frequent for points of a regular grid. These are rare for other settings and thus only triangulation of raster data is accelerated by this mode.”

“By default Fade runs single-threaded. This avoids nested multithreading in parallel applications. Thus, if you want multithreading, you need to activate it explicitly using Fade_2D::setNumCPU(..).”

Single- and multithreaded Delaunay triangulation benchmark

The below diagram shows benchmark results for up to 250 million points. This is equivalent to 500 million output triangles. The observed runtime is almost linear to point cloud’s size. More precisely, the upper bound on the algorithm’s runtime is O(n*log(n)). The multithreading-speedup on a 6-core CPU is 4.39.

Delaunay triangulation benchmark results for up to 250 million points, singlethreaded and multithreaded
Benchmark results: Delaunay triangulation of up to 250 million input points, singlethreaded and multithreaded. The speedup on a Corei7 6800K cpu with 6 cores is 4.39.

Raspberry PI2 – 1 Million Points on an ARM Cortex A7

Fade2D also runs on Raspberry PI computers. The RPI2 uses a Broadcom BCM2836 SoC, which contains a 32-bit ARM Cortex-A7 CPU. This small system gets by with 3 watts of power. The more astonishing is that this small board could triangulate one million points (2 million triangles) in 7 seconds.

Fade on Raspberry PI

Get true Delaunay Benchmark Results

  • Compile the Delaunay triangulation benchmark as x64 application in Release mode. For better performance and because 32-bit applications suffer from the (MSVC 32-bit) 2GB memory limit.
  • Run the test directly in the command prompt, not in a debug window of your IDE.
  • Do you have enough RAM for the benchmark? On 64-bit systems you need 212 MB per million points which correspond to 2 million triangles.
  • Prevent cpu throttling: Disable energy saving options during the test. Furthermore, use the Performance Governor under Linux and if you use a laptop, connect the power cord.
  • Fade still supports very old environments (VS2010, CentOS6.4) but multithreading is disabled there.

The next article Example2 – Traversing teaches you the important basics you need to access specific elements in a triangulation.

2 replies on “2D Delaunay Triangulation Benchmark – Example1”

Making Fade2D even faster would be nice. But it would also add dependencies and thereby worsen the usability and compatiblity. Another thought is that Fade2D creates already ~15 mio. triangles per second (on recent hardware) and I think only very few projects have so much data that they require a new algorithm.

Leave a Reply

Your email address will not be published. Required fields are marked *