Recently I was writing code in Python and C to find prime numbers using a simple implementation of the Sieve of Eratosthemes (implementation details will follow in a later post). Up to 1 million candidates everything was fine, but beyond that my C program segfaulted before printing even the first prime number. Since the compiler wasn’t emitting any warnings (I use GCC with
-Wall -Wextra -Werror), a reasonable guess at to the likely cause was some sort of memory problem. I recompiled with extra debugging information (
-g) and ran my program through Valgrind, which produced the following output – cut down to the relevant parts:
Warning: client switching stacks? SP change: 0x7fefffec0 --> 0x7fa3b4ae8 to suppress, use: --max-stackframe=79999960 or greater Invalid write of size 8 at 0x4004F8: main (sieve.c:14) Address 0x7fa3b4a70 is on thread 1's stack Process terminating with default action of signal 11 (SIGSEGV) Access not within mapped region at address 0x7FA3B4A70 at 0x4004F8: main (sieve.c:14)
If you believe this happened as a result of a stack overflow in your program's main thread (unlikely but possible), you can try to increase the size of the main thread stack using the --main-stacksize= flag. The main thread stack size used in this run was 8388608.
Lines 13 and 14 of
unsigned long int prime_candidates[max_candidate + 1]; unsigned long int primes_found;
The first line reserves space on the stack for an array of integers, and the array size is known at runtime. The total number of bytes required is:
sizeof(unsigned long int) * (max_candidate + 1)
On my machine, the size of an unsigned long is 8 bytes. For small values of
max_candidate, this wasn’t a problem, but when I tried to find the prime numbers up to 10 million the program segfaulted (1 million was just under the allowable stack size). What Valgrind reported as ‘unlikely, but possible’ was the cause: a stack overflow in the program’s main thread, i.e. the stack had insufficient space for all the variables which were being pushed onto it.
The solution was to use the heap instead of the stack for this large array. Fortunately, single-dimension arrays can be created using a pointer to a continuous block of memory, so the initialisation line:
unsigned long int prime_candidates[max_candidate + 1];
unsigned long int prime_candidates = (unsigned long int *) malloc(sizeof(unsigned long int) * (max_candidate + 1));
Since this allocation is on the heap, we must manually free it after use (memory allocated on the stack is automatically freed):
As single-dimension arrays in C can be accessed using the
 notation (square brackets) regardless of whether they are on the stack or heap, there was no need to make any other changes to the code.
Why didn’t the same problem happen with Python? I can’t be sure without a bit more debugging, but I presume as I’m using a dynamic data structure (list) to store the candidates, Python also dynamically allocates the memory.