2D Delaunay Triangulation Performance – Example1

Fade is a very fast multithreaded Delaunay triangulation library, see the benchmark diagram below. The present test code (example1.cpp, contained in the download) allows you to evaluate the performance on your own hardware.

  1. It starts with a call to Fade_2D::setNumCPU(int numCPU):
    • numCPU=1 means single-threaded execution (default behavior)
    • larger values activate multithreading and
    • the special value numCPU=0 means: autodetect and use all available CPU cores

    The command returns the number of used CPU cores (useful in case of autodetection).

  2. Then it sets up the size of the point sets to be triangulated:
    • You can activate the 100 and 250 million points lines in the source if you have enough memory.
  3. The final loop creates random point sets and measures their triangulation time
// * 1 *   Create a Fade object and call setNumCPU(..)
Fade_2D dt;
int numUsedCPU=dt.setNumCPU(0); // 0 means: autodetect
cout<<"Number of CPU cores: "<<numUsedCPU<<endl;

// * 2 *   Set up how many points to test
std::vector<std::pair<std::string,int> > vNumPoints;
vNumPoints.push_back(make_pair("numPoints: 1k",1000));
vNumPoints.push_back(make_pair("numPoints: 50k",50000));
vNumPoints.push_back(make_pair("numPoints: 100k",100000));
vNumPoints.push_back(make_pair("numPoints: 500k",500000));
vNumPoints.push_back(make_pair("numPoints: 1 mio",1000000));
vNumPoints.push_back(make_pair("numPoints: 10 mio",10000000));
vNumPoints.push_back(make_pair("numPoints: 50 mio (10.7 GB)",50000000));
//vNumPoints.push_back(make_pair("numPoints: 100 mio (21.3 GB)",100000000));
//vNumPoints.push_back(make_pair("numPoints: 250 mio (53 GB)",250000000));


// * 3 *   Test
for(size_t i=0;i<vNumPoints.size();++i)
{
	std::string label(vNumPoints[i].first);
	int numPoints(vNumPoints[i].second);

	// Prepare a vector of random points
	vector<Point2> vInPoints;
	generateRandomPoints(numPoints,0,100,vInPoints,1);

	// Insert the points and erase afterwards. The total time for
	// triangulation (without destruction time) is measured.
	cout<<"\n"<<label<<": start"<<endl;
	double elapsed=dt.measureTrianguluationTime(vInPoints);
	cout<<"Elapsed time: "<<elapsed<<endl<<endl;
}


1 Million Points tested on a Core i7 6800K

The above Delaunay triangulation benchmark has been run on a desktop CPU (Core i7 6800K, Linux x64): One million points (2 million triangles) take 0.17 seconds multithreaded (or 0.42 seconds single threaded).

Delaunay Triangulation Benchmark, fast single- and multithreaded execution

Delaunay Triangulation Performance, single- and multithreaded for 1 million points

Comparison: Multithreaded Delaunay vs. Singlethreaded Delaunay

The same machine has been used to test Fade2D with datasets up to 250 million points (500 million triangles). The runtime is roughly linear in the number of input points. The speedup from parallel execution on this 6-core machine is up to 4.39.

Delaunay, single- and multithreaded benchmark for huge point sets up to 250 million points

Delaunay Performance, single- and multithreaded for huge point sets, 250 Million Points


Raspberry PI2 – 1 Million Points on an ARM Cortex A7

Whether it is a fun benchmark or even a very important one: Fade2D has been run on a comparably weak Raspberry PI2. It uses a Broadcom BCM2836 SoC containing a 32-bit ARM Cortex-A7 cpu.

This poor little 3-watt device has fought incredibly brave, it was only 16 times slower than a today’s top desktop computer with a 140-watt TDP. This result encourages us to provide support for further embedded devices. If you have any suggestions feel free to add your comments below.
Raspberry PI Delaunay benchmark
Raspberry PI Delaunay Benchmark: 1 Million points on a weak embedded device

Delaunay Triangulation Performance on a Raspberry PI2

  • Make sure you have enough memory for this benchmark. In Visual Studio use the x64/Release configuration to avoid the Win32 memory limit (which is 2 GB)
  • Linux users may use the Performance-Governor for better repeatability
  • Fade still supports very old compilers. But the VS2008, VS2010 and CentOS6.4 versions are single-threaded.

The next article Example2 – Traversing teaches you the important basics you need to access specific elements in a triangulation.

Spread the word. Share this post!

Leave A Reply

Your email address will not be published. Required fields are marked *

*

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close