Jupyter C++

March 25, 2021

One beauty of IPython is its interactivity. You can construct test cases and try them out on the spot. I wanted to experiment with the same workflow, but for C++.

For the Coursera Bioinformatics class, the difficulty has reached the point where a simple int main() program that reads from stdin and writes to stdout no longer cuts it: I need to be able to test all the pieces modularly.

One approach I have found is to use xeus-cling. It is quite easy to install using Miniconda. Just don’t think too hard about all the language boundaries between JavaScript (I/O), Python (backend), and Clang (compiling your C++ and interpreting it on the fly).

Now one can develop little C++ functions interactively.

For the Coursera class, where the autograder needs the code to read from stdin and write to stdout, a simple Go-inspired test harness like this works for me:

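#include <functional>
#include <iostream>
#include <string>
#include <vector>

struct TestCaseInputs {
    // Members are declared in the order the designated initializers
    // name them (desc, in, want), as standard C++20 requires.

    // Description of the input/test case.
    std::string desc;

    // The input to your logic function.
    std::string in;

    // What you want to get out of your logic function.
    std::string want;
};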
struct TestCases {
    std::string name;
    std::function<std::string(std::string)> logic;
    std::vector<TestCaseInputs> cases;
};

void RunTests(TestCases const& test_cases) {
    std::cout << "Testing: " << test_cases.name << std::endl;
    for (auto const& test : test_cases.cases) {
        std::cout << test.desc;
        std::string got = test_cases.logic(test.in);
        if (test.want != got) {
            std::cerr << "->FAIL: " << std::endl << test.desc << ": want=" <<  test.want << " got=" << got << std::endl;
        } else {
            std::cout << "->PASS" << std::endl;
        }
    }
}

The user specifies some string input, the wanted output, and the logic that should turn the input into that output. For example, to test my cyclospectrum logic, I use the harness with some inputs like this:

RunTests({
    .name = "CyclospectrumTest",
    .logic = [](std::string input) {
        auto m = PeptideToMasses(input, INTEGER_MASS_TABLE);
        return StrJoin(Cyclospectrum(m));
    },
    .cases = {
        {.desc = "Empty", .in = "", .want = "0"},
        {.desc = "Single", .in = "N", .want = "0 114"},
        {.desc = "Single", .in = "L", .want = "0 113"},
        {.desc = "Single", .in = "E", .want = "0 129"},
        {.desc = "Pair", .in = "NL", .want = "0 113 114 227"},
        {.desc = "Triple(rot0)", .in = "NLE", .want = "0 113 114 129 227 242 243 356"},
        {.desc = "Triple(rot1)", .in = "ENL", .want = "0 113 114 129 227 242 243 356"},
        {.desc = "Triple(rot2)", .in = "LEN", .want = "0 113 114 129 227 242 243 356"},
    }
});

Running the cell above produces output like this:

Testing: CyclospectrumTest
Empty->PASS
Single->PASS
Single->PASS
Single->PASS
Pair->PASS
Triple(rot0)->PASS
Triple(rot1)->PASS
Triple(rot2)->PASS
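
The helpers called in the test (PeptideToMasses, Cyclospectrum, StrJoin) are not shown in this post. For anyone who wants to try the harness, here is a minimal sketch of the last two that is consistent with the expected outputs above. The signatures (a std::vector<int> of residue masses in, a sorted std::vector<int> out) are assumptions for illustration, not necessarily how the real helpers look.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins integers with single spaces,
// e.g. {0, 113} becomes "0 113".
std::string StrJoin(const std::vector<int>& values) {
    std::string out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i > 0) out += ' ';
        out += std::to_string(values[i]);
    }
    return out;
}

// Hypothetical sketch: sorted masses of every contiguous fragment of a
// cyclic peptide, plus 0 for the empty fragment.
std::vector<int> Cyclospectrum(const std::vector<int>& masses) {
    const std::size_t n = masses.size();

    // prefix[i] is the total mass of the first i residues.
    std::vector<int> prefix(n + 1, 0);
    for (std::size_t i = 0; i < n; ++i) {
        prefix[i + 1] = prefix[i] + masses[i];
    }
    const int total = prefix[n];

    std::vector<int> spectrum = {0};
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j <= n; ++j) {
            const int linear = prefix[j] - prefix[i];
            spectrum.push_back(linear);
            // An interior fragment has a complement that wraps around
            // the end of the peptide; its mass is the remainder.
            if (i > 0 && j < n) {
                spectrum.push_back(total - linear);
            }
        }
    }
    std::sort(spectrum.begin(), spectrum.end());
    return spectrum;
}
```

The prefix sums keep each fragment mass to a single subtraction, and the wrap-around check is what distinguishes the cyclic spectrum from the linear one.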

There are certain C++ features that don’t seem to work for me in cling; for example, static data in functions has not worked for me yet (which makes sense, I think).

My biggest tip: if cling repeatedly fails to compile or evaluate a cell (with errors such as "Function definition not allowed here", or various type errors), the kernel may have ended up in a bad state, so just use Kernel -> Restart And Run All Cells.

Shared libraries

To build a shared library using clang++, you can run a command like:

clang++ -Wall -Werror -std=c++17 -g -fpic -shared util.cc -o libutil.so

Then, to load the library into your notebook, use the following in cling:

#pragma cling load("path/to/the/shared/library/libutil.so")
#include "path/to/the/shared/library/util.h"
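
For context, here is a rough sketch of what a util.h / util.cc pair could contain. The function TotalMass and the file contents are hypothetical, made up for illustration; the two files are shown in one listing, separated by comments.

```cpp
// ---- util.h (hypothetical contents) ----
#ifndef UTIL_H_
#define UTIL_H_

#include <vector>

// Declaration only: this is what the notebook sees through #include.
int TotalMass(const std::vector<int>& masses);

#endif  // UTIL_H_

// ---- util.cc (hypothetical contents) ----
// In the real file this would start with: #include "util.h"
#include <numeric>

// Definition: this is what gets compiled into libutil.so.
int TotalMass(const std::vector<int>& masses) {
    return std::accumulate(masses.begin(), masses.end(), 0);
}
```

The header gives cling the declarations it needs to type-check a call, while the loaded shared library supplies the compiled definition.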

Optimization

The “interpreter mode” may run slowly by default, at -O1. To set the optimization level, you can try using:

#pragma cling optimize(3)

There are, however, some inherent limitations to using C++ in an interpreted context. Many advanced optimizations (code inlining, devirtualization, etc.) are only really possible with more context.

Further, the wall time to run your notebook may go up: if your function is very simple and not computationally intensive, compilation time at -O3 may begin to dominate the runtime of the cell.
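
One way to see whether a higher optimization level pays off is to time the work directly. The helper below is a minimal sketch using std::chrono; the name TimeMs is hypothetical.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: runs `work` once and returns the elapsed wall
// time in milliseconds.
double TimeMs(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

In a notebook cell, something like TimeMs([] { /* call the function under test */ }) can be run once per optimization level. Whether JIT compilation time shows up in this number depends on when cling compiles the code, so it is worth also watching the cell's overall run time.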