Fade2D is a very fast multithreaded Delaunay triangulation library. With this Delaunay triangulation benchmark example you can test its performance on your own hardware.
Benchmark Code
// * 1 * Create a Fade object and set numCPU Fade_2D dt; int numUsedCPU=dt.setNumCPU(0); // 0 means: autodetect cout<<"Number of CPU cores: "<<numUsedCPU<<endl; // * 2 * Set up how many points to test std::vector<std::pair<std::string,int> > vNumPoints; vNumPoints.push_back(make_pair("numPoints: 1k",1000)); vNumPoints.push_back(make_pair("numPoints: 50k",50000)); vNumPoints.push_back(make_pair("numPoints: 100k",100000)); vNumPoints.push_back(make_pair("numPoints: 500k",500000)); vNumPoints.push_back(make_pair("numPoints: 1 mio",1000000)); vNumPoints.push_back(make_pair("numPoints: 10 mio",10000000)); // * 3 * Test for(size_t i=0;i<vNumPoints.size();++i) { std::string label(vNumPoints[i].first); int numPoints(vNumPoints[i].second); // Prepare a vector of random points vector<Point2> vInPoints; generateRandomPoints(numPoints,0,100,vInPoints,1); // Insert the points and measure the time cout<<"\n"<<label<<": start"<<endl; double elapsed=dt.measureTriangulationTime(vInPoints); cout<<"Elapsed time: "<<elapsed<<endl<<endl; }
- The above source code starts with a call to Fade_2D::setNumCPU(int numCPU):
- numCPU=0 means autodetect
- numCPU=1 means single-threaded (default)
- numCPU>1 activates multithreading
- Then it sets up the test sizes
- The final loop creates random point sets and measures their triangulation time
“By default Fade runs single-threaded to avoid nested multithreading in parallel applications”
Delaunay Triangulation Benchmark single- and multithreaded
A Core i7 6800K CPU was used to test Fade with large point clouds up to 250 million points (equivalent to 500 million triangles). The practical runtime is almost linear ( theoretically O(n*log(n)) ) to the number of points. The speedup on this 6-core machine is up to 4.39.

Raspberry PI2 – 1 Million Points on an ARM Cortex A7
Fade2D also runs on Raspberry PI computers. The RPI2 uses a Broadcom BCM2836 SoC, which contains a 32-bit ARM Cortex-A7 CPU. This small system gets by with 3 watts of power. The more astonishing is that this small board could triangulate one million points (2 million triangles) in 7 seconds.

Get true Benchmark Results
- Compile an x64 application in Release mode. For better performance and because 32-bit applications suffer from the (MSVC 32-bit) 2GB memory limit.
- Run the test directly in the command prompt, not in a debug window of your IDE
- Do you have enough RAM for the benchmark? On 64-bit systems you need 212 MB per million points (2 million triangles).
- Prevent cpu throttling: Disable energy saving options during the test. On Linux use the Performance governor. If you use a laptop plug in the power chord.
- Fade still supports very old environments (VS2010, CentOS6.4) but multithreading is disabled there.
The next article Example2 – Traversing teaches you the important basics you need to access specific elements in a triangulation.