Linker trouble on a BlueGene/P system

Today I learned that a global variable in some C89 code that is not declared static can be problematic if that code is later linked together with a Fortran program containing a function with exactly the same name as the variable. Somehow the program ended up at the location of the variable, which of course ended in a crash with signal 4 (SIGILL, illegal instruction). It took me about 12 hours to find and fix that bug… in my defense, it was on a BlueGene system and I first thought the error had something to do with front-end/back-end problems (you cross-compile on the front-end and then execute the resulting binary via LoadLeveler)… Anyway, now it works and I can finally take measurements for the thesis.
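A minimal sketch of the fix, using hypothetical names (`relax`, `cmod_add_relax`) rather than the ones from my code: declaring the file-scope variable `static` gives it internal linkage, so its name never reaches the linker. My assumption about why the names could clash at all: if the Fortran code is built with IBM's XL Fortran compiler, which by default does not append a trailing underscore to external names (the `-qextname` option changes that), a Fortran routine `relax` and a C global `relax` map to the same symbol.

```c
/* Sketch with hypothetical names: keep C89 file-scope state out of
 * the global symbol table so it cannot collide with Fortran symbols. */

/* Problematic:
 *     int relax;
 * This external definition exports the linker symbol "relax". If a
 * Fortran routine RELAX is mapped to the same symbol name, a call to
 * that routine may end up in the variable's memory instead of code. */

/* Fix: `static` gives the variable internal linkage, so it is never
 * exported and cannot be confused with another object's symbol. */
static int relax = 0;

/* Other modules access the state through a prefixed function. */
int cmod_add_relax(int delta)
{
   relax += delta;
   return relax;
}
```

Other C files that need the value would call `cmod_add_relax()` (or a similar getter) instead of declaring `extern int relax;`.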

SIGPROF signal handler and pthreads

While working on my thesis, the question came up how signals generated by the SIGPROF timer are handled in multithreaded code. Signal handlers are process-wide, so it could have been that a single, arbitrary thread handled all of them. As I could not find a suitable explanation on the intertubes, I performed a small experiment.

My sample program installs a signal handler and starts an ITIMER_PROF timer with a frequency of 100 Hz. First, it counts the signals caught during ten seconds of busy-waiting in the main thread alone, and then during ten seconds of busy-waiting in each of four threads. The output is:
Signals caught after 10 seconds: 999
Creating 4 threads
Signals caught after 4x10 seconds by thread 0: 1000
Signals caught after 4x10 seconds by thread 1: 1000
Signals caught after 4x10 seconds by thread 2: 1000
Signals caught after 4x10 seconds by thread 3: 1000

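So apparently each thread handles its own share of the SIGPROF signals, which is quite nice for my purpose. Since ITIMER_PROF counts the CPU time of the whole process, four busy threads consume 4x10 seconds of CPU time and therefore receive four times as many signals in total.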

The source code is here (I just assume that pthread_self() is async-signal-safe even though the standard does not guarantee it; it appears, however, that most people working on this kind of stuff make the same assumption):
[sourcecode language="cpp"]#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <sys/time.h>

/* Stand-in for util_get_timestamp() from the original
   "../../util/util_time_measurement.h" (not shown here):
   returns the wall-clock time in seconds. */
static double get_timestamp(void)
{
   struct timeval tv;
   gettimeofday(&tv, NULL);
   return tv.tv_sec + tv.tv_usec / 1e6;
}

/* enum instead of const int, so that the array sizes below are
   constant expressions in C and not only in C++ */
enum { thread_count = 4 };

volatile sig_atomic_t signal_count[thread_count];

pthread_t threads[thread_count];

static void sigprof_handler(int sig_nr, siginfo_t* info, void *context)
{
   int t;
   (void)sig_nr; (void)info; (void)context;

   for(t = 0; t < thread_count; ++t)
   {
      /* pthread_t is opaque: compare with pthread_equal(), not == */
      if(pthread_equal(threads[t], pthread_self()))
      {
         signal_count[t]++;
         return;
      }
   }

   /* Main thread, or a worker whose ID has not been stored yet */
   signal_count[0]++;
}

void install_signal_handler(void)
{
   /* Install signal handler for SIGPROF event */
   struct sigaction sa;
   memset(&sa, 0, sizeof(sa));
   sa.sa_sigaction = sigprof_handler;
   sa.sa_flags = SA_RESTART | SA_SIGINFO;
   sigemptyset(&sa.sa_mask);

   if(sigaction(SIGPROF, &sa, NULL) != 0)
   {
      perror("sigaction");
   }
}

/* Busy-wait so that the calling thread consumes CPU time,
   which is what ITIMER_PROF measures */
void idle_time(int seconds)
{
   double start = get_timestamp();
   while(get_timestamp() <= start + seconds)
   {
      /* spin */
   }
}

void* thread_work(void* data)
{
   (void)data;
   idle_time(10);

   return NULL;
}

int main(void)
{
   static struct itimerval timer;
   int t;

   install_signal_handler();

   timer.it_interval.tv_sec = 0;
   timer.it_interval.tv_usec = 1000000 / 100; /* 100 Hz */
   timer.it_value = timer.it_interval;

   /* Reset count */
   for(t = 0; t < thread_count; ++t)
   {
      signal_count[t] = 0;
   }

   /* Install timer */
   if(setitimer(ITIMER_PROF, &timer, NULL) != 0)
   {
      printf("Timer could not be initialized\n");
      return 1;
   }

   /* Busy-wait for 10 seconds in the main thread only */
   idle_time(10);

   printf("Signals caught after 10 seconds: %d\n", (int)signal_count[0]);

   /* Reset count */
   for(t = 0; t < thread_count; ++t)
   {
      signal_count[t] = 0;
   }

   printf("Creating %d threads...\n", thread_count);

   for(t = 0; t < thread_count; ++t)
   {
      pthread_create(&threads[t], NULL, thread_work, NULL);
   }

   /* main only waits here and consumes almost no CPU time */
   for(t = 0; t < thread_count; ++t)
   {
      pthread_join(threads[t], NULL);
   }

   for(t = 0; t < thread_count; ++t)
   {
      printf("Signals caught after %dx10 seconds by thread %d: %d\n",
             thread_count, t, (int)signal_count[t]);
   }

   return 0;
}
[/sourcecode]

Export Multiple Charts from Excel (Office) To PDF For LaTeX Inclusion At Once

I still receive a lot of hits on my earlier post about how to export Excel charts to PDF in order to include them in a LaTeX document. The search keywords also indicate that a lot of people want to automate that process and export many charts at once. Luckily, that is now possible too. I have not checked the PC version yet (if you do, it would be great if you could tell me in the comments or via some other channel), but here is how to do it in Microsoft Excel 2011 for the Mac:

1. Create your charts (that step should be obvious).

2. Select all your charts (hold Shift and click each chart).

3. Select “Save as Picture” from the context menu.

4. Check the box “Save each graphic as a separate file” and select PDF as the format.

5. Enter a name. Note: Excel will a) create a directory with that name and b) save each selected chart in it as “<entered-name>-<number>.pdf”. For example, entering “Charts” with the two charts selected earlier results in a directory “Charts” containing one “Charts-<number>.pdf” file per chart.

6. All selected charts will be saved and automatically cropped, ready to be included in your document.

7. This works similarly in Microsoft Word and Microsoft PowerPoint.

Have fun.

First correct output of monitoring module

Woohoo... my thesis project is coming together: today I got the first output that actually provides information about the benchmarked application. A great accomplishment, even though it’s just a simple hybrid (OpenMP, MPI) Jacobi solver:

[lwm2] lwm2-analysis.c:11 --------------------------------------------------
[lwm2] lwm2-analysis.c:12 Run successfully completed
[lwm2] lwm2-analysis.c:21 Wallclock time: 6.34 s
[lwm2] lwm2-analysis.c:28 --------------------------------------------------
[lwm2] lwm2-analysis-mpi.c:25 MPI analysis:
[lwm2] lwm2-analysis-mpi.c:54 General event count for Event MPI_Allreduce: 3 Time: 1308671560.8108737469 Avg Time: 436223853.6036245823
[lwm2] lwm2-analysis-mpi.c:54 General event count for Event MPI_Bcast: 2 Time: 0.0001499653 Avg Time: 0.0000749826
[lwm2] lwm2-analysis-mpi.c:54 General event count for Event MPI_Comm_rank: 2 Time: 0.0000009537 Avg Time: 0.0000004768
[lwm2] lwm2-analysis-mpi.c:54 General event count for Event MPI_Comm_size: 2 Time: 0.0000009537 Avg Time: 0.0000004768
[lwm2] lwm2-analysis-mpi.c:54 General event count for Event MPI_Irecv: 2 Time: 0.0000290871 Avg Time: 0.0000145435
[lwm2] lwm2-analysis-mpi.c:54 General event count for Event MPI_Isend: 2 Time: 0.0000259876 Avg Time: 0.0000129938
[lwm2] lwm2-analysis-mpi.c:54 General event count for Event MPI_Reduce: 2 Time: 0.0001509190 Avg Time: 0.0000754595
[lwm2] lwm2-analysis-mpi.c:54 General event count for Event MPI_Type_commit: 2 Time: 0.0000061989 Avg Time: 0.0000030994
[lwm2] lwm2-analysis-mpi.c:54 General event count for Event MPI_Type_create_struct: 2 Time: 0.0000038147 Avg Time: 0.0000019073
[lwm2] lwm2-analysis-mpi.c:54 General event count for Event MPI_Waitall: 32 Time: 39260146779.9131774902 Avg Time: 1226879586.87228679
[lwm2] lwm2-analysis-omp.c:42 Program has spent 68.43 % of the overall time in OpenMP parallel regions [Omp Events: 362 General Events: 529]
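Each MPI line above reports a call count, a total time and an average time per call, and the averages are the total divided by the count (e.g. 0.0001499653 s / 2 ≈ 0.0000749826 s for MPI_Bcast). A minimal sketch of such per-event bookkeeping in C follows; `event_stats`, `event_record`, `event_avg_time` and `event_print` are hypothetical names, not lwm2's actual code, and how the durations are measured is left out.

```c
#include <stdio.h>

/* Hypothetical per-event record, not lwm2's actual data structure. */
typedef struct
{
   const char *name;     /* e.g. "MPI_Bcast" */
   unsigned long count;  /* number of calls observed */
   double total_time;    /* summed duration of all calls, in seconds */
} event_stats;

/* Add one observed call lasting `duration` seconds. */
void event_record(event_stats *ev, double duration)
{
   ev->count++;
   ev->total_time += duration;
}

/* Average time per call; 0 if the event never occurred. */
double event_avg_time(const event_stats *ev)
{
   return ev->count > 0 ? ev->total_time / (double)ev->count : 0.0;
}

/* Print one line in the spirit of the log above. */
void event_print(const event_stats *ev)
{
   printf("General event count for Event %s: %lu Time: %.10f Avg Time: %.10f\n",
          ev->name, ev->count, ev->total_time, event_avg_time(ev));
}
```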

In general, my thesis project is a “light-weight monitoring module” (short: lwm2) that is automatically attached to all jobs running on a high-performance computing cluster and collects various metrics. These are then shown to the user after the application has finished. Because the module must be light-weight (running applications should be influenced as little as possible), only certain metrics can be collected. Therefore the module cannot provide deep insight into the application, but it will detect certain (potential) problems, and these results are meant to act as a starting point for further analysis with more sophisticated tools. Examples of such tools are Scalasca, Vampir, ThreadSpotter, or other tools from the HOPSA package (lwm2 is also a small part of the HOPSA project).

Open questions that will (hopefully) be answered in the thesis are:

  • which metrics can be collected without compile-time instrumentation. For example, while MPI provides a profiling interface (PMPI), OpenMP has no standard support for tools apart from some vendor-specific tools.
  • which problems with parallel (and, of course, also serial) programs can be detected with an overhead of at most 1% of the application’s wall-clock time
  • how to rank and present these problems to users in such a way that they know what to do next
  • how to make sure that the module, despite pulling various tricks to gain insight into monitored programs without compile-time instrumentation, is as robust as possible and does not introduce bugs or instabilities into the monitored programs

Keep Outlook quiet during a pomodoro

At the moment I’m using the excellent Pomodoro application on my Mac to track all my pomodoros during the day. While it already supports changing your Adium/Skype status to DND (do not disturb) with a custom message during a pomodoro, I also wanted to suppress Outlook “new mail” notifications.

Fortunately, Pomodoro supports running AppleScript scripts for certain events, so I used the following snippets to enable/disable these Outlook notifications:

Disable on start/resume:

tell application "Microsoft Outlook"
  set display alerts to false
  set play sound on new message to false
end tell

Enable on end/reset:

tell application "Microsoft Outlook"
  set display alerts to true
  set play sound on new message to true
end tell

It’s working great so far.